@Michael Kuriger: when ceph/librbd operate normally, I know that doubling
pg_num is the safe way. But when there is a problem, I think doubling it could
make very many VMs die (maybe >= 50%?).


On Mon, Mar 16, 2015 at 9:53 PM, Michael Kuriger <mk7...@yp.com> wrote:

>   I always keep my pg number a power of 2.  So I’d go from 2048 to 4096.
> I’m not sure if this is the safest way, but it’s worked for me.
>
>
>
>
> Michael Kuriger
>
> Sr. Unix Systems Engineer
>
> mk7...@yp.com | 818-649-7235
>
>   From: Chu Duc Minh <chu.ducm...@gmail.com>
> Date: Monday, March 16, 2015 at 7:49 AM
> To: Florent B <flor...@coppint.com>
> Cc: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
> Subject: Re: [ceph-users] [SPAM] Changing pg_num => RBD VM down !
>
>    I'm using the latest Giant and have the same issue. When I increase the
> pg_num of a pool from 2048 to 2148, my VMs are still OK. When I increase it
> from 2148 to 2400, some VMs die (their qemu-kvm processes die).
>  My physical servers (which host the VMs) run kernel 3.13 and use librbd.
>  I think it's a bug in librbd related to the crushmap.
>  (I set crush_tunables3 on my ceph cluster; does that make sense?)
>
> Do you know a way to safely increase pg_num? (I don't think increasing
> pg_num by 100 each time is a safe and good way.)
>
>  Regards,
>
> On Mon, Mar 16, 2015 at 8:50 PM, Florent B <flor...@coppint.com> wrote:
>
>> We are on Giant.
>>
>> On 03/16/2015 02:03 PM, Azad Aliyar wrote:
>> >
>> > May I know your Ceph version? The latest firefly release, 0.80.9, has
>> > patches to avoid excessive data migration when reweighting OSDs. You
>> > may need to set a tunable in order to make this fix active.
>> >
>> > This is a bugfix release for firefly.  It fixes a performance regression
>> > in librbd, an important CRUSH misbehavior (see below), and several RGW
>> > bugs.  We have also backported support for flock/fcntl locks to
>> > ceph-fuse and libcephfs.
>> >
>> > We recommend that all Firefly users upgrade.
>> >
>> > For more detailed information, see
>> >   http://docs.ceph.com/docs/master/_downloads/v0.80.9.txt
>> >
>> > Adjusting CRUSH maps
>> > --------------------
>> >
>> > * This point release fixes several issues with CRUSH that trigger
>> >   excessive data migration when adjusting OSD weights.  These are most
>> >   obvious when a very small weight change (e.g., a change from 0 to
>> >   .01) triggers a large amount of movement, but the same set of bugs
>> >   can also lead to excessive (though less noticeable) movement in
>> >   other cases.
>> >
>> >   However, because the bug may already have affected your cluster,
>> >   fixing it may trigger movement *back* to the more correct location.
>> >   For this reason, you must manually opt-in to the fixed behavior.
>> >
>> >   In order to set the new tunable to correct the behavior::
>> >
>> >      ceph osd crush set-tunable straw_calc_version 1
>> >
>> >   Note that this change will have no immediate effect.  However, from
>> >   this point forward, any 'straw' bucket in your CRUSH map that is
>> >   adjusted will get non-buggy internal weights, and that transition
>> >   may trigger some rebalancing.
>> >
>> >   You can estimate how much rebalancing will eventually be necessary
>> >   on your cluster with::
>> >
>> >      ceph osd getcrushmap -o /tmp/cm
>> >      crushtool -i /tmp/cm --num-rep 3 --test --show-mappings > /tmp/a 2>&1
>> >      crushtool -i /tmp/cm --set-straw-calc-version 1 -o /tmp/cm2
>> >      crushtool -i /tmp/cm2 --reweight -o /tmp/cm2
>> >      crushtool -i /tmp/cm2 --num-rep 3 --test --show-mappings > /tmp/b 2>&1
>> >      wc -l /tmp/a                          # num total mappings
>> >      diff -u /tmp/a /tmp/b | grep -c ^+    # num changed mappings
>> >
>> >    Divide the number of changed mappings by the total number of lines
>> >    in /tmp/a to get the fraction of mappings that will move.  We've
>> >    found that most clusters are under 10%.
>> >
>> >    You can force all of this rebalancing to happen at once with::
>> >
>> >      ceph osd crush reweight-all
>> >
>> >    Otherwise, it will happen at some unknown point in the future when
>> >    CRUSH weights are next adjusted.
>> >
>> > Notable Changes
>> > ---------------
>> >
>> > * ceph-fuse: flock, fcntl lock support (Yan, Zheng, Greg Farnum)
>> > * crush: fix straw bucket weight calculation, add straw_calc_version
>> >   tunable (#10095 Sage Weil)
>> > * crush: fix tree bucket (Rongzu Zhu)
>> > * crush: fix underflow of tree weights (Loic Dachary, Sage Weil)
>> > * crushtool: add --reweight (Sage Weil)
>> > * librbd: complete pending operations before losing image (#10299 Jason
>> >   Dillaman)
>> > * librbd: fix read caching performance regression (#9854 Jason Dillaman)
>> > * librbd: gracefully handle deleted/renamed pools (#10270 Jason
>> >   Dillaman)
>> > * mon: fix dump of chooseleaf_vary_r tunable (Sage Weil)
>> > * osd: fix PG ref leak in snaptrimmer on peering (#10421 Kefu Chai)
>> > * osd: handle no-op write with snapshot (#10262 Sage Weil)
>> > * radosgw-admi
>> >
>> >
>> >
>> >
>> > On 03/16/2015 12:37 PM, Alexandre DERUMIER wrote:
>> > >>> VMs are running on the same nodes as the OSDs
>> > > Are you sure you didn't hit some kind of out-of-memory condition?
>> > > PG rebalancing can be memory hungry (it depends on how many OSDs you have).
>> >
>> > 2 OSDs per host, and 5 hosts in this cluster.
>> > hosts h
>> >
>>
>>
>
>
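
P.S. Regarding the estimation steps in the quoted release notes: the last two
commands can be reduced to a rough percentage with a one-liner. Just a sketch,
assuming the /tmp/a and /tmp/b files produced by the quoted crushtool runs:

    total=$(wc -l < /tmp/a)                          # num total mappings
    changed=$(diff -u /tmp/a /tmp/b | grep -c '^+')  # num changed mappings
    echo "scale=1; 100 * $changed / $total" | bc     # percent that will move

If the result is large, note that 'ceph osd crush reweight-all' (quoted above)
triggers all of that movement at once, so it may be worth scheduling it for a
quiet period.
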
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
