What kind of uploads are you performing? How are you testing?
Have you looked at the admin sockets on any daemons yet? Examining the OSDs
to see if they're behaving differently on the different requests is one
angle of attack. The other is to look into whether the RGW daemons are
hitting throttler limits or something else that the RBD clients aren't.
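As a rough sketch of the throttle angle: "ceph daemon osd.N perf dump" (or the
same against an RGW client socket) prints JSON that includes "throttle-*"
sections, and any throttle whose wait count keeps climbing is a candidate.
Counter names below are illustrative; they vary between Ceph versions.

```python
# Sketch: flag throttles that have actually blocked requests, given the
# parsed JSON from "ceph daemon <name> perf dump" on the admin socket.
import json

def waiting_throttles(perf_dump):
    """Return {throttle_name: wait_count} for throttles that made I/O wait."""
    hits = {}
    for name, counters in perf_dump.items():
        if not name.startswith("throttle-"):
            continue
        wait = counters.get("wait", {})
        # "wait" is typically {"avgcount": N, "sum": seconds_blocked}
        count = wait.get("avgcount", 0) if isinstance(wait, dict) else wait
        if count:
            hits[name] = count
    return hits

# Feed it from the socket, e.g.:
#   ceph daemon osd.0 perf dump > osd0.json
#   then waiting_throttles(json.load(open("osd0.json")))
```

Running that against an RGW gateway's socket and a few OSDs, and comparing
which throttles show waits during an S3 upload versus an RBD write, should
narrow down where the slowdown lives.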
-Greg
On Thu, Dec 18, 2014 at 7:35 PM Sean Sullivan <seapasu...@uchicago.edu>
wrote:

> Hello Yall!
>
> I can't figure out why my gateways are performing so poorly, and I am not
> sure where to start looking. My RBD mounts seem to be performing fine
> (over 300 MB/s), while uploading a 5G file to Swift/S3 takes 2m32s
> (about 32 MB/s, I believe). If we try a 1G file it's closer to 8 MB/s.
> Testing with nuttcp shows that I can transfer from a client with a 10G
> interface to any node on the ceph cluster at the full 10G, and the ceph
> nodes can transfer close to 20G between themselves. Outside of another
> issue, which I will mention below, I am clueless as to where to start.
>
> I have a weird setup
> [osd nodes]
> 60 x 4TB 7200 RPM SATA Drives
> 12 x  400GB s3700 SSD drives
> 3 x SAS2308 PCI-Express Fusion-MPT cards (drives are split evenly across
> the 3 cards)
> 512 GB of RAM
> 2 x CPU E5-2670 v2 @ 2.50GHz
> 2 x 10G interfaces  LACP bonded for cluster traffic
> 2 x 10G interfaces LACP bonded for public traffic (so a total of 4 10G
> ports)
>
> [monitor nodes and gateway nodes]
> 4 x 300G 15000RPM SAS drives in RAID 10
> 1 x SAS 2208
> 64G of RAM
> 2 x CPU E5-2630 v2
> 2 x 10G interfaces LACP bonded for public traffic (total of 2 10G ports)
>
>
> Here is a pastebin dump of my details, I am running ceph giant 0.87
> (c51c8f9d80fa4e0168aa52685b8de40e42758578) and kernel 3.13.0-40-generic
> across the entire cluster.
>
> http://pastebin.com/XQ7USGUz -- ceph health detail
> http://pastebin.com/8DCzrnq1 -- /etc/ceph/ceph.conf
> http://pastebin.com/BC3gzWhT -- ceph osd tree
> http://pastebin.com/eRyY4H4c -- /var/log/radosgw/client.radosgw.rgw03.log
> http://paste.ubuntu.com/9565385/ -- crushmap (pastebin wouldn't let me)
>
>
> We ran into a few issues with density (conntrack limits, pid limits, and
> number of open files), all of which I addressed by bumping the ulimits in
> /etc/security/limits.d/ceph.conf or via sysctl. I am no longer seeing any
> signs of these limits being hit, so I have not included my limits or
> sysctl config. If you would like them as well, let me know and I can
> include them.
>
> One of the issues I am seeing is that OSDs have started to flap / be
> marked as slow. The cluster was HEALTH_OK with all of the disks added
> for over 3 weeks before this behaviour started. RBD transfers seem to be
> fine for the most part, which makes me think this has little bearing
> on the gateway issue, but it may be related. Rebooting the OSD seems to
> fix the issue.
>
> I would like to figure out the root cause of both of these issues and
> post the results back here if possible (perhaps it can help other
> people). I am really looking for a place to start, as the gateway just
> logs that it is posting data, and all of the other logs (outside of the
> monitors reporting down OSDs) seem to show a fully functioning cluster.
>
> Please help. I am in the #ceph room on OFTC every day as 'seapasulli' as
> well.
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>