I thought I'd follow up on this in case anyone else runs into similar
issues. We ended up increasing the tcmalloc thread cache size and saw a
huge improvement in latency. That got us out of the woods: performance
was finally good enough that it was no longer impacting services.
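
For reference, the change itself is just an environment variable that
tcmalloc reads when the OSD processes start. The file path and value
below are from our setup (and the 32MB default is what our gperftools
build shipped with), so treat this as a sketch rather than a drop-in
config:

    # /etc/sysconfig/ceph on RHEL/CentOS, /etc/default/ceph on
    # Debian/Ubuntu (paths/values from our environment; adjust for yours)
    # Raise tcmalloc's total thread cache from ~32MB to 128MB
    TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728

    # The OSDs have to be restarted to pick it up, e.g. on
    # systemd-managed hosts:
    systemctl restart ceph-osd.target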

The tcmalloc issues are pretty well documented on this mailing list,
and I don't believe they affect newer versions of Ceph, but I thought
I'd at least add a data point. After making this change our average
apply latency dropped to 3.46ms during peak business hours. To give you
an idea of how significant that is, here's a graph of the apply latency
prior to the change: https://imgur.com/KYUETvD
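
If you want to watch the same metric on your own cluster, Ceph will
report per-OSD commit and apply latency directly, which is an easy
sanity check before and after a change like this:

    # per-OSD fs_commit_latency / fs_apply_latency, in milliseconds
    ceph osd perf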

This, however, did not resolve all of our issues. We were still seeing
high iowait (repeated spikes up to 400ms) on all disks of three of our
OSD nodes. We tried replacing the RAID controller (PERC H730) on these
nodes, and while that resolved the issue on one server, the other two
remained problematic. Those two nodes were configured differently from
the rest: their disks were in non-RAID (pass-through) mode, while the
others were set up as individual RAID-0 virtual disks. That turned out
to be the problem. We removed the two nodes one at a time and rebuilt
them with their disks configured as individual RAID-0 instead of
non-RAID. After this change iowait rarely spikes above 15ms and
averages <1ms.
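
If you want to check for the same symptom, plain iostat from the
sysstat package is enough to see the per-disk latency spikes; something
like the following (column names vary slightly between versions):

    # extended per-device stats every 5 seconds;
    # the await / w_await columns are in milliseconds
    iostat -x 5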

I was really surprised by the performance impact of non-RAID mode.
While I realize non-RAID bypasses the controller cache, I still would
never have expected such high latency. Dell has a whitepaper that
recommends using individual RAID-0, but their own tests show only a
small performance advantage over non-RAID. Note that we are running SAS
disks; they actually recommend non-RAID mode for SATA, but I have not
tested this. You can view the whitepaper here:
http://en.community.dell.com/techcenter/cloud/m/dell_cloud_resources/20442913/download

I hope this helps someone.

John Petrini
