Sage, 

would it help to add a cache pool to the cluster? Say we added a few TB of 
SSDs acting as a cache pool; would that help the guest VMs retain IO during 
data recovery or reshuffling? 
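
For context, the sort of setup I have in mind is roughly the following (just a 
sketch; 'rbd' and 'cache' are placeholder pool names and the numbers are 
examples, not recommendations): 

    # create the cache pool and attach it as a writeback tier in front of rbd
    # (assumes a crush rule that maps the cache pool onto the ssd osds)
    ceph osd pool create cache 512 512
    ceph osd tier add rbd cache
    ceph osd tier cache-mode cache writeback
    ceph osd tier set-overlay rbd cache
    ceph osd pool set cache hit_set_type bloom
    ceph osd pool set cache target_max_bytes 2000000000000   # ~2 TB, example only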

Over the past year and a half that we've been using Ceph, our experience has 
been positive for the majority of the time. The only downtime our VMs have had 
was when Ceph was doing recovery. It seems that regardless of the tuning 
options we've used, our VMs are still unable to get IO during recovery: they 
climb to 98-99% iowait and freeze. This has happened on the Dumpling, Emperor 
and now Firefly releases. Because of this I've set the noout flag on the 
cluster and have to keep an eye on the OSDs for manual intervention, which is 
far from ideal :( 
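
For what it's worth, this is roughly the manual routine I fall back to at the 
moment (plain ceph CLI, nothing exotic): 

    ceph osd set noout      # stop the cluster marking stopped osds out
    # ... restart / replace the affected osd(s), watch 'ceph -w' ...
    ceph osd unset noout    # let the cluster manage itself again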

Andrei 

-- 
Andrei Mikhailovsky 
Director 
Arhont Information Security 

Web: http://www.arhont.com 
http://www.wi-foo.com 
Tel: +44 (0)870 4431337 
Fax: +44 (0)208 429 3111 
PGP: Key ID - 0x2B3438DE 
PGP: Server - keyserver.pgp.com 


----- Original Message -----

From: "Sage Weil" <sw...@redhat.com> 
To: "Gregory Farnum" <g...@inktank.com> 
Cc: ceph-users@lists.ceph.com 
Sent: Thursday, 17 July, 2014 1:06:52 AM 
Subject: Re: [ceph-users] ceph osd crush tunables optimal AND add new OSD at 
the same time 

On Wed, 16 Jul 2014, Gregory Farnum wrote: 
> On Wed, Jul 16, 2014 at 4:45 PM, Craig Lewis <cle...@centraldesktop.com> 
> wrote: 
> > One of the things I've learned is that many small changes to the cluster 
> > are 
> > better than one large change. Adding 20% more OSDs? Don't add them all at 
> > once, trickle them in over time. Increasing pg_num & pgp_num from 128 to 
> > 1024? Go in steps, not one leap. 
> > 
> > I try to avoid operations that will touch more than 20% of the disks 
> > simultaneously. When I had journals on HDD, I tried to avoid going over 10% 
> > of the disks. 
> > 
> > 
> > Is there a way to execute `ceph osd crush tunables optimal` in a way that 
> > takes smaller steps? 
> 
> Unfortunately not; the crush tunables are changes to the core 
> placement algorithms at work. 
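
(As an aside, for anyone wanting to follow Craig's stepping advice literally, 
bumping pg_num/pgp_num in increments can look roughly like this; the pool name, 
step sizes and the crude health checks are only examples: 

    pool=rbd                    # example pool name
    for pg in 256 512 1024; do
        ceph osd pool set $pool pg_num $pg
        # wait until the new PGs are created before bumping pgp_num
        while ceph -s | grep -q creating; do sleep 30; done
        ceph osd pool set $pool pgp_num $pg
        # let backfill/recovery settle before the next step
        while ceph health | grep -qE 'backfill|recover'; do sleep 60; done
    done
)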

Well, there is one way, but it is only somewhat effective. If you 
decompile the crush maps for bobtail vs. firefly, the actual difference is 
tunable chooseleaf_vary_r 1 

and this is written such that a value of 1 is the optimal 'new' way, 0 is 
the legacy old way, but values > 1 are less-painful steps between the two 
(though mostly closer to the firefly value of 1). So, you could set 

tunable chooseleaf_vary_r 4 

wait for it to settle, and then do 

tunable chooseleaf_vary_r 3 

...and so forth down to 1. I did some limited testing of the data 
movement involved and noted it here: 

https://github.com/ceph/ceph/commit/37f840b499da1d39f74bfb057cf2b92ef4e84dc6 

In my test case, going from 0 to 4 was about 1/10th as bad as going 
straight from 0 to 1, but the final step from 2 to 1 is still about 1/2 as 
bad. I'm not sure whether that means it isn't worth the trouble and you should 
just jump straight to the firefly tunables, or whether it means legacy users 
should just set (and leave) this at 2, 3, or 4 and get almost all the benefit 
without the rebalance pain. 
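
Roughly, the edit-and-inject cycle for stepping the tunable by hand looks like 
this (from memory; double-check the crushtool syntax for your version): 

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # edit crushmap.txt and set, e.g.:  tunable chooseleaf_vary_r 4
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new
    # wait for the resulting data movement to settle, then repeat with 3, 2, 1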

sage 
_______________________________________________ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
