On Thursday, July 4, 2013, Sylvain Munaut wrote:
> Hi,
>
> I'm doing some tests on a cluster, or at least part of it. I have
> several CRUSH rules distributing data over different types of OSDs.
> The only part of interest today is composed of 768 PGs distributed
> over 4 servers and 8 OSDs (2 OSDs per server). This pool is almost
> empty; there is something like 5-10 GB of data in it.
>
> As a test, I shut down one of the servers and let the cluster
> redistribute the data.
>
> And it's taking forever... An hour after the start it's still not
> done, and the status is something like:
>
> 2013-07-04 13:53:42.393478 mon.0 [INF] pgmap v16947951: 12808 pgs:
> 12603 active+clean, 6 active+degraded+wait_backfill, 171
> active+recovery_wait, 1 active+degraded+backfilling, 1
> active+degraded+remapped+wait_backfill, 26 active+recovering; 796 GB
> data, 1877 GB used, 12419 GB / 14296 GB avail; 764KB/s rd, 122KB/s wr,
> 68op/s; 158783/2934125 degraded (5.412%); recovering 93 o/s, 1790KB/s
>
> The network, disk, and CPU usage on the OSDs is very low, so I'm not
> sure why it's so slow. Some throttling to serve real queries in
> priority is good, but here it's just a bit too much: it's doing less
> than 1 MB/s of network transfer or disk writes...
>
> Does anyone have an explanation and/or, better yet, a solution?
What's the average object size? It looks like you've got 27 PGs actively
doing recovery (the 1 backfilling plus the 26 recovering in your pgmap),
and at 93 objects/s they're each doing about 3 recoveries per second.
That's about the right PG count given the small number of OSDs in the
pool (based on the tunable recovery values), so it's just the speed
that's bad. If that's because the objects are small, Sam is doing work
now that I believe will land in Dumpling and should significantly
improve the situation.
-Greg

--
Software Engineer #42 @ http://inktank.com | http://ceph.com
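To make the first question actionable: one minimal way to estimate the
average object size is to divide a pool's space used by its object
count, both of which "rados df" reports per pool. The pool name "mypool"
below is a placeholder, and the awk column positions are an assumption
(the rados df layout varies between Ceph versions), so check the header
line first:

    # Per-pool usage; look for the KB and objects columns.
    rados df

    # Hypothetical one-liner for a pool named "mypool", assuming space
    # used (KB) is column 2 and object count is column 3 -- adjust the
    # field numbers to match your header.
    rados df | awk '$1 == "mypool" && $3 > 0 { printf "avg object size: %.1f KB\n", $2 / $3 }'

If the average comes out at a few KB, recovery is dominated by
per-object overhead rather than bandwidth, which would match the low
network and disk usage reported above.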
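The recovery throttles Greg alludes to can also be inspected and
loosened at runtime. The option names below (osd_recovery_max_active,
osd_max_backfills) are real Ceph settings, but the values shown are
purely illustrative and the defaults differ between releases; the
"ceph osd tell" form is the pre-Dumpling syntax, and the admin socket
path assumes the default layout:

    # Show the current recovery-related settings on one OSD via its
    # admin socket:
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show \
        | grep -E 'osd_recovery|osd_max_backfills'

    # Raise the throttles on all OSDs without restarting them:
    ceph osd tell \* injectargs '--osd-recovery-max-active 10 --osd-max-backfills 4'

Raising these trades client I/O latency for recovery speed, so on a busy
cluster they are usually turned back down once recovery finishes.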