Yeah, that hit the nail on the head. Significantly reducing/eliminating the recovery sleep times brings the recovery speed back up to (and beyond!) the levels I was expecting to see - recovery is almost an order of magnitude faster now. Thanks for educating me about those changes!
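For the archives, a sketch of the runtime adjustments involved - this assumes a Luminous cluster, and the specific values here are examples rather than recommendations (zeroing the sleeps prioritises recovery over client I/O, which was acceptable in my case):

```shell
# Inspect the current recovery sleep on one OSD
# (Luminous defaults: 0.1s for HDD-backed OSDs, 0s for SSD-backed).
ceph daemon osd.0 config get osd_recovery_sleep_hdd

# Eliminate the recovery sleep at runtime on all OSDs.
# WARNING: recovery will now compete much harder with client I/O.
ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 0 --osd_recovery_sleep_ssd 0'

# Optionally raise backfill parallelism as well (default is 1).
ceph tell osd.* injectargs '--osd_max_backfills 4'
```

Note that injectargs changes do not persist across OSD restarts; to make them permanent you'd also set the same options in ceph.conf.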
Rich

On 14/09/17 11:16, Richard Hesketh wrote:
> Hi Mark,
>
> No, I wasn't familiar with that work. I am in fact comparing speed of
> recovery to maintenance work I did while the cluster was in Jewel; I haven't
> manually done anything to sleep settings, only adjusted max backfills OSD
> settings. New options that introduce arbitrary slowdown to recovery
> operations to preserve client performance would explain what I'm seeing! I'll
> have a tinker with adjusting those values (in my particular case client load
> on the cluster is very low and I don't have to honour any guarantees about
> client performance - getting back into HEALTH_OK asap is preferable).
>
> Rich
>
> On 13/09/17 21:14, Mark Nelson wrote:
>> Hi Richard,
>>
>> Regarding recovery speed, have you looked through any of Neha's results on
>> recovery sleep testing earlier this summer?
>>
>> https://www.spinics.net/lists/ceph-devel/msg37665.html
>>
>> She tested bluestore and filestore under a couple of different scenarios.
>> The gist of it is that time to recover changes pretty dramatically depending
>> on the sleep setting.
>>
>> I don't recall if you said earlier, but are you comparing filestore and
>> bluestore recovery performance on the same version of ceph with the same
>> sleep settings?
>>
>> Mark
>>
>> On 09/12/2017 05:24 AM, Richard Hesketh wrote:
>>> Thanks for the links. That does seem to largely confirm that I haven't
>>> horribly misunderstood anything and I've not been doing anything obviously
>>> wrong while converting my disks; there's no point specifying separate
>>> WAL/DB partitions if they're going to go on the same device, throw as much
>>> space as you have available at the DB partitions and they'll use all the
>>> space they can, and significantly reduced I/O on the DB/WAL device compared
>>> to Filestore is expected since bluestore's nixed the write amplification as
>>> much as possible.
>>>
>>> I'm still seeing much reduced recovery speed on my newly Bluestored
>>> cluster, but I guess that's a tuning issue rather than evidence of
>>> catastrophe.
>>>
>>> Rich
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Richard Hesketh
Systems Engineer, Research Platforms
BBC Research & Development