Re: RADOS + deep scrubbing performance issues in production environment

icq2206241 Fri, 10 Jul 2015 08:31:31 -0700

All IO drops to ZERO IOPS for 1-15 minutes during the deep-scrub on my cluster. 
There is clearly a locking bug!


I have VMs, and several times every day, sometimes on all of them at once, 
disk IO _completely_ stops. The disk queue grows, 0 IOPS are performed, and 
services die with timeouts... At the same time, the Ceph cluster (where the VM 
images are stored) is doing a deep scrub. No amount of fiddling with priorities 
or the number of threads helps. In fact, making the scrub slower makes those 
delays longer, so there is clearly a bug with locking.

I have been experiencing this for two years now. In that time we have tried 
everything and upgraded our cluster several times! Nothing helps!
