Re: [ceph-users] PG status is "active+undersized+degraded"
Hi,

On 06/22/2018 08:06 AM, dave.c...@dell.com wrote:
> I saw this statement at the link
> (http://docs.ceph.com/docs/master/rados/operations/crush-map/); is that the
> reason which leads to the warning?
>
> "This, combined with the default CRUSH failure domain, ensures that replicas
> or erasure code shards are separated across hosts and a single host failure
> will not affect availability."
>
> Best Regards,
> Dave Chen
>
> -----Original Message-----
> From: Chen2, Dave
> Sent: Friday, June 22, 2018 1:59 PM
> To: 'Burkhard Linke'; ceph-users@lists.ceph.com
> Cc: Chen2, Dave
> Subject: RE: [ceph-users] PG status is "active+undersized+degraded"
>
> Hi Burkhard,
>
> Thanks for your explanation. I created a new 2 TB OSD on another node, and
> that indeed solved the issue; the status of the Ceph cluster is now
> "health HEALTH_OK".
>
> Another question: if three homogeneous OSDs are spread across only 2 nodes,
> I still get the warning and the status stays "active+undersized+degraded".
> Is spreading the three OSDs across 3 nodes a mandatory rule for Ceph, or is
> it only an HA consideration? Is there any official Ceph documentation with
> guidance on this?

The default Ceph CRUSH rules try to distribute PG replicas among hosts. With
the default replication number of 3 (pool size = 3), this requires at least
three hosts.

The pool also defines a minimum number of PG replicas that must be available
to allow I/O to a PG. This is usually set to 2 (pool min size = 2).

The above status thus means that there are enough copies for the min size
(-> active), but not enough for the size (-> undersized + degraded).

Using fewer than three hosts requires changing the pool size to 2. But this is
strongly discouraged, since a sane automatic recovery of data after a netsplit
or other temporary node failure is not possible. Do not do this in a
production setup.

For a production setup you should also consider node failures. The default
setup uses 3 replicas, so to tolerate a node failure you need 4 hosts;
otherwise the self-healing feature of Ceph cannot recreate the third replica.
You also need to closely monitor your cluster's free space to avoid a full
cluster due to re-replicated PGs after a node failure.

Regards,
Burkhard
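As a quick cross-check of the two settings discussed above, both values can be
read directly from the pool. A minimal sketch against the rbdbench pool from
the original post (the min_size output shown is the usual default for a size-3
pool, not taken from this thread):

$ ceph osd pool get rbdbench size
size: 3
$ ceph osd pool get rbdbench min_size
min_size: 2

Lowering the replica count ("ceph osd pool set rbdbench size 2") would silence
the warning on a two-host cluster, but as noted above this is strongly
discouraged outside of a lab.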
Re: [ceph-users] PG status is "active+undersized+degraded"
I saw this statement at the link
(http://docs.ceph.com/docs/master/rados/operations/crush-map/); is that the
reason which leads to the warning?

"This, combined with the default CRUSH failure domain, ensures that replicas
or erasure code shards are separated across hosts and a single host failure
will not affect availability."

Best Regards,
Dave Chen

-----Original Message-----
From: Chen2, Dave
Sent: Friday, June 22, 2018 1:59 PM
To: 'Burkhard Linke'; ceph-users@lists.ceph.com
Cc: Chen2, Dave
Subject: RE: [ceph-users] PG status is "active+undersized+degraded"

Hi Burkhard,

Thanks for your explanation. I created a new 2 TB OSD on another node, and
that indeed solved the issue; the status of the Ceph cluster is now
"health HEALTH_OK".

Another question: if three homogeneous OSDs are spread across only 2 nodes,
I still get the warning and the status stays "active+undersized+degraded".
Is spreading the three OSDs across 3 nodes a mandatory rule for Ceph, or is
it only an HA consideration? Is there any official Ceph documentation with
guidance on this?

$ ceph osd tree
ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 7.25439 root default
-2 1.81360     host ceph3
 2 1.81360         osd.2        up      1.0              1.0
-4 3.62720     host ceph1
 0 1.81360         osd.0        up      1.0              1.0
 1 1.81360         osd.1        up      1.0              1.0

Best Regards,
Dave Chen

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Burkhard Linke
Sent: Thursday, June 21, 2018 2:39 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] PG status is "active+undersized+degraded"

Hi,

On 06/21/2018 05:14 AM, dave.c...@dell.com wrote:
> Hi all,
>
> I have set up a Ceph cluster in my lab recently. The configuration, per my
> understanding, should be okay: 4 OSDs across 3 nodes, 3 replicas. But a
> couple of PGs are stuck in state "active+undersized+degraded". I think this
> should be a very generic issue; could anyone help me out?
>
> Here are the details about the Ceph cluster:
>
> $ ceph -v  (jewel)
> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>
> # ceph osd tree
> ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 5.89049 root default
> -2 1.81360     host ceph3
>  2 1.81360         osd.2        up      1.0              1.0
> -3 0.44969     host ceph4
>  3 0.44969         osd.3        up      1.0              1.0
> -4 3.62720     host ceph1
>  0 1.81360         osd.0        up      1.0              1.0
>  1 1.81360         osd.1        up      1.0              1.0

*snipsnap*

You have a large difference in the capacities of the nodes. This results in
different host weights, which in turn might lead to problems with the CRUSH
algorithm: it is not able to find three different hosts for OSD placement for
some of the PGs.

Ceph and CRUSH do not cope well with heterogeneous setups. I would suggest
moving one of the OSDs from host ceph1 to ceph4 to equalize the host weights.

Regards,
Burkhard
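As a footnote to the question above: whether host-level separation is required
is decided by the CRUSH rule, not hard-coded in Ceph. A minimal sketch of how
to inspect it (the rule name replicated_ruleset is the jewel default and is
assumed here; the relevant part is the chooseleaf step whose type is "host"):

$ ceph osd crush rule dump replicated_ruleset
...
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
...

If the type here were "osd" instead of "host", three OSDs on two hosts would
satisfy the rule, at the cost of losing host-level fault tolerance.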
Re: [ceph-users] PG status is "active+undersized+degraded"
Hi Burkhard,

Thanks for your explanation. I created a new 2 TB OSD on another node, and
that indeed solved the issue; the status of the Ceph cluster is now
"health HEALTH_OK".

Another question: if three homogeneous OSDs are spread across only 2 nodes,
I still get the warning and the status stays "active+undersized+degraded".
Is spreading the three OSDs across 3 nodes a mandatory rule for Ceph, or is
it only an HA consideration? Is there any official Ceph documentation with
guidance on this?

$ ceph osd tree
ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 7.25439 root default
-2 1.81360     host ceph3
 2 1.81360         osd.2        up      1.0              1.0
-4 3.62720     host ceph1
 0 1.81360         osd.0        up      1.0              1.0
 1 1.81360         osd.1        up      1.0              1.0

Best Regards,
Dave Chen

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Burkhard Linke
Sent: Thursday, June 21, 2018 2:39 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] PG status is "active+undersized+degraded"

Hi,

On 06/21/2018 05:14 AM, dave.c...@dell.com wrote:
> Hi all,
>
> I have set up a Ceph cluster in my lab recently. The configuration, per my
> understanding, should be okay: 4 OSDs across 3 nodes, 3 replicas. But a
> couple of PGs are stuck in state "active+undersized+degraded". I think this
> should be a very generic issue; could anyone help me out?
>
> Here are the details about the Ceph cluster:
>
> $ ceph -v  (jewel)
> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>
> # ceph osd tree
> ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 5.89049 root default
> -2 1.81360     host ceph3
>  2 1.81360         osd.2        up      1.0              1.0
> -3 0.44969     host ceph4
>  3 0.44969         osd.3        up      1.0              1.0
> -4 3.62720     host ceph1
>  0 1.81360         osd.0        up      1.0              1.0
>  1 1.81360         osd.1        up      1.0              1.0

*snipsnap*

You have a large difference in the capacities of the nodes. This results in
different host weights, which in turn might lead to problems with the CRUSH
algorithm: it is not able to find three different hosts for OSD placement for
some of the PGs.

Ceph and CRUSH do not cope well with heterogeneous setups. I would suggest
moving one of the OSDs from host ceph1 to ceph4 to equalize the host weights.

Regards,
Burkhard
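For completeness, the per-host weights that CRUSH actually balances can be
seen at a glance with the utilization view; a sketch (this subcommand should
be available on the jewel release used in this thread):

$ ceph osd df tree

The host rows aggregate the weights of their OSDs, and hosts are picked in
proportion to those aggregate weights, which is why a much lighter host can be
skipped for some PGs even though it is up.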
Re: [ceph-users] PG status is "active+undersized+degraded"
Hi,

On 06/21/2018 05:14 AM, dave.c...@dell.com wrote:
> Hi all,
>
> I have set up a Ceph cluster in my lab recently. The configuration, per my
> understanding, should be okay: 4 OSDs across 3 nodes, 3 replicas. But a
> couple of PGs are stuck in state "active+undersized+degraded". I think this
> should be a very generic issue; could anyone help me out?
>
> Here are the details about the Ceph cluster:
>
> $ ceph -v  (jewel)
> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>
> # ceph osd tree
> ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 5.89049 root default
> -2 1.81360     host ceph3
>  2 1.81360         osd.2        up      1.0              1.0
> -3 0.44969     host ceph4
>  3 0.44969         osd.3        up      1.0              1.0
> -4 3.62720     host ceph1
>  0 1.81360         osd.0        up      1.0              1.0
>  1 1.81360         osd.1        up      1.0              1.0

*snipsnap*

You have a large difference in the capacities of the nodes. This results in
different host weights, which in turn might lead to problems with the CRUSH
algorithm: it is not able to find three different hosts for OSD placement for
some of the PGs.

Ceph and CRUSH do not cope well with heterogeneous setups. I would suggest
moving one of the OSDs from host ceph1 to ceph4 to equalize the host weights.

Regards,
Burkhard
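If the OSD (disk and daemon) is physically moved as suggested, its CRUSH
location normally follows it on restart (with the default
"osd crush update on start = true"); it can also be re-homed explicitly. A
minimal sketch, assuming osd.1 keeps its weight of 1.81360 and ends up under
host ceph4:

# after relocating the disk/daemon to ceph4, update its CRUSH position
$ ceph osd crush set osd.1 1.81360 root=default host=ceph4

Expect data movement once the map changes, since PGs will be remapped to the
new host layout.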
[ceph-users] PG status is "active+undersized+degraded"
Hi all,

I have set up a Ceph cluster in my lab recently. The configuration, per my
understanding, should be okay: 4 OSDs across 3 nodes, 3 replicas. But a couple
of PGs are stuck in state "active+undersized+degraded". I think this should be
a very generic issue; could anyone help me out?

Here are the details about the Ceph cluster:

$ ceph -v  (jewel)
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)

# ceph osd tree
ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 5.89049 root default
-2 1.81360     host ceph3
 2 1.81360         osd.2        up      1.0              1.0
-3 0.44969     host ceph4
 3 0.44969         osd.3        up      1.0              1.0
-4 3.62720     host ceph1
 0 1.81360         osd.0        up      1.0              1.0
 1 1.81360         osd.1        up      1.0              1.0

# ceph health detail
HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 2 pgs stuck unclean; 2 pgs stuck undersized; 2 pgs undersized
pg 17.58 is stuck unclean for 61033.947719, current state active+undersized+degraded, last acting [2,0]
pg 17.16 is stuck unclean for 61033.948201, current state active+undersized+degraded, last acting [0,2]
pg 17.58 is stuck undersized for 61033.343824, current state active+undersized+degraded, last acting [2,0]
pg 17.16 is stuck undersized for 61033.327566, current state active+undersized+degraded, last acting [0,2]
pg 17.58 is stuck degraded for 61033.343835, current state active+undersized+degraded, last acting [2,0]
pg 17.16 is stuck degraded for 61033.327576, current state active+undersized+degraded, last acting [0,2]
pg 17.16 is active+undersized+degraded, acting [0,2]
pg 17.58 is active+undersized+degraded, acting [2,0]

# rados lspools
rbdbench

$ ceph osd pool get rbdbench size
size: 3

Where can I get more details about the issue? Appreciate any comments!

Best Regards,
Dave Chen
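One way to dig further into an individual stuck PG, taking pg 17.16 from the
health output above as the example, is to list the stuck PGs and then query
one directly; a sketch:

# list only the PGs that are stuck undersized
$ ceph pg dump_stuck undersized

# full state of one PG: look at the "up", "acting" and "recovery_state" fields
$ ceph pg 17.16 query

In this case the up/acting sets list only two OSDs ([0,2] or [2,0]), which is
exactly what "undersized" means for a pool with size 3.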