Re: [ceph-users] PG status is "active+undersized+degraded"

2018-06-25 Thread Burkhard Linke

Hi,


On 06/22/2018 08:06 AM, dave.c...@dell.com wrote:

I saw this statement at this link ( 
http://docs.ceph.com/docs/master/rados/operations/crush-map/ ); is that the 
reason for the warning?

" This, combined with the default CRUSH failure domain, ensures that replicas or 
erasure code shards are separated across hosts and a single host failure will not affect 
availability."

Best Regards,
Dave Chen

-Original Message-
From: Chen2, Dave
Sent: Friday, June 22, 2018 1:59 PM
To: 'Burkhard Linke'; ceph-users@lists.ceph.com
Cc: Chen2, Dave
Subject: RE: [ceph-users] PG status is "active+undersized+degraded"

Hi Burkhard,

Thanks for your explanation. I created a new 2TB OSD on another node, and it indeed 
solved the issue; the status of the Ceph cluster is "health HEALTH_OK" now.

Another question: if three homogeneous OSDs are spread across only 2 nodes, I still get the 
warning message and the status is "active+undersized+degraded". So is spreading the 
three OSDs across 3 nodes a mandatory rule for Ceph? Is that only for HA 
considerations? Does any official Ceph documentation offer guidance on this?


The default ceph crush rules try to distribute PG replicas among 
hosts. With a default replication number of 3 (pool size = 3), this 
requires at least three hosts. The pool also defines a minimum number of 
PG replicas that must be available to allow I/O to a PG. This is usually 
set to 2 (pool min_size = 2). The above status thus means that there are 
enough copies for the min_size (-> active), but not enough for the size 
(-> undersized + degraded).
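
For reference, both settings can be inspected per pool. A minimal check, using 
the pool name rbdbench from the original report (the min_size value shown here 
is the usual default, not taken from this thread), would be:

$ ceph osd pool get rbdbench size
size: 3
$ ceph osd pool get rbdbench min_size
min_size: 2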


Using fewer than three hosts requires changing the pool size to 2. But 
this is strongly discouraged, since a sane automatic recovery of data in 
case of a netsplit or other temporary node failure is not possible. Do 
not do this in a production setup.
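
For completeness, the setting in question would be changed roughly like this, 
again using the rbdbench pool name from the earlier mails; this is only an 
illustration of the knob, see the warning above:

$ ceph osd pool set rbdbench size 2   # NOT recommended outside a lab setup
$ ceph osd pool get rbdbench size     # verify the change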


For a production setup you should also consider node failures. The 
default setup uses 3 replicas, so to tolerate a node failure and still 
recover, you need 4 hosts. Otherwise the self-healing feature of ceph 
cannot re-create the third replica. You also need to closely monitor your 
cluster's free space, to avoid a full cluster when PGs are re-replicated 
after a node failure.
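
A minimal way to keep an eye on the free space, assuming a reasonably recent 
ceph CLI (ceph osd df is available from Hammer onwards), is:

$ ceph df            # cluster-wide and per-pool usage
$ ceph osd df tree   # per-OSD and per-host utilization and weights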


Regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG status is "active+undersized+degraded"

2018-06-22 Thread Dave.Chen
I saw this statement at this link ( 
http://docs.ceph.com/docs/master/rados/operations/crush-map/ ); is that the 
reason for the warning?

" This, combined with the default CRUSH failure domain, ensures that replicas 
or erasure code shards are separated across hosts and a single host failure 
will not affect availability."

Best Regards,
Dave Chen

-Original Message-
From: Chen2, Dave 
Sent: Friday, June 22, 2018 1:59 PM
To: 'Burkhard Linke'; ceph-users@lists.ceph.com
Cc: Chen2, Dave
Subject: RE: [ceph-users] PG status is "active+undersized+degraded"

Hi Burkhard,

Thanks for your explanation. I created a new 2TB OSD on another node, and it 
indeed solved the issue; the status of the Ceph cluster is "health HEALTH_OK" 
now.

Another question: if three homogeneous OSDs are spread across only 2 nodes, I 
still get the warning message and the status is "active+undersized+degraded". 
So is spreading the three OSDs across 3 nodes a mandatory rule for Ceph? Is 
that only for HA considerations? Does any official Ceph documentation offer 
guidance on this?


$ ceph osd tree
ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 7.25439 root default
-2 1.81360 host ceph3
 2 1.81360 osd.2   up  1.0  1.0
-4 3.62720 host ceph1
 0 1.81360 osd.0   up  1.0  1.0
 1 1.81360 osd.1   up  1.0  1.0


Best Regards,
Dave Chen

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Burkhard Linke
Sent: Thursday, June 21, 2018 2:39 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] PG status is "active+undersized+degraded"

Hi,


On 06/21/2018 05:14 AM, dave.c...@dell.com wrote:
> Hi all,
>
> I have set up a ceph cluster in my lab recently. The configuration per my 
> understanding should be okay: 4 OSDs across 3 nodes, 3 replicas, but a couple 
> of PGs are stuck in the state "active+undersized+degraded". I think this 
> should be a very generic issue; could anyone help me out?
>
> Here are the details about the ceph cluster,
>
> $ ceph -v  (jewel)
> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>
> # ceph osd tree
> ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 5.89049 root default
> -2 1.81360 host ceph3
> 2 1.81360 osd.2   up  1.0  1.0
> -3 0.44969 host ceph4
> 3 0.44969 osd.3   up  1.0  1.0
> -4 3.62720 host ceph1
> 0 1.81360 osd.0   up  1.0  1.0
> 1 1.81360 osd.1   up  1.0  1.0

*snipsnap*

You have a large difference in the capacities of the nodes. This results in 
different host weights, which in turn might lead to problems with the crush 
algorithm: it is not able to find three different hosts for OSD placement for 
some of the PGs.

CEPH and crush do not cope well with heterogeneous setups. I would suggest 
moving one of the OSDs from host ceph1 to ceph4 to equalize the host weights.

Regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG status is "active+undersized+degraded"

2018-06-22 Thread Dave.Chen
Hi Burkhard,

Thanks for your explanation. I created a new 2TB OSD on another node, and it 
indeed solved the issue; the status of the Ceph cluster is "health HEALTH_OK" 
now.

Another question: if three homogeneous OSDs are spread across only 2 nodes, I 
still get the warning message and the status is "active+undersized+degraded". 
So is spreading the three OSDs across 3 nodes a mandatory rule for Ceph? Is 
that only for HA considerations? Does any official Ceph documentation offer 
guidance on this?


$ ceph osd tree
ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 7.25439 root default
-2 1.81360 host ceph3
 2 1.81360 osd.2   up  1.0  1.0
-4 3.62720 host ceph1
 0 1.81360 osd.0   up  1.0  1.0
 1 1.81360 osd.1   up  1.0  1.0


Best Regards,
Dave Chen

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Burkhard Linke
Sent: Thursday, June 21, 2018 2:39 PM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] PG status is "active+undersized+degraded"

Hi,


On 06/21/2018 05:14 AM, dave.c...@dell.com wrote:
> Hi all,
>
> I have set up a ceph cluster in my lab recently. The configuration per my 
> understanding should be okay: 4 OSDs across 3 nodes, 3 replicas, but a couple 
> of PGs are stuck in the state "active+undersized+degraded". I think this 
> should be a very generic issue; could anyone help me out?
>
> Here are the details about the ceph cluster,
>
> $ ceph -v  (jewel)
> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>
> # ceph osd tree
> ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 5.89049 root default
> -2 1.81360 host ceph3
> 2 1.81360 osd.2   up  1.0  1.0
> -3 0.44969 host ceph4
> 3 0.44969 osd.3   up  1.0  1.0
> -4 3.62720 host ceph1
> 0 1.81360 osd.0   up  1.0  1.0
> 1 1.81360 osd.1   up  1.0  1.0

*snipsnap*

You have a large difference in the capacities of the nodes. This results in 
different host weights, which in turn might lead to problems with the crush 
algorithm: it is not able to find three different hosts for OSD placement for 
some of the PGs.

CEPH and crush do not cope well with heterogeneous setups. I would suggest 
moving one of the OSDs from host ceph1 to ceph4 to equalize the host weights.

Regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG status is "active+undersized+degraded"

2018-06-21 Thread Burkhard Linke

Hi,


On 06/21/2018 05:14 AM, dave.c...@dell.com wrote:

Hi all,

I have set up a ceph cluster in my lab recently. The configuration per my understanding 
should be okay: 4 OSDs across 3 nodes, 3 replicas, but a couple of PGs are stuck in the 
state "active+undersized+degraded". I think this should be a very generic issue; could 
anyone help me out?

Here are the details about the ceph cluster,

$ ceph -v  (jewel)
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)

# ceph osd tree
ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 5.89049 root default
-2 1.81360 host ceph3
2 1.81360 osd.2   up  1.0  1.0
-3 0.44969 host ceph4
3 0.44969 osd.3   up  1.0  1.0
-4 3.62720 host ceph1
0 1.81360 osd.0   up  1.0  1.0
1 1.81360 osd.1   up  1.0  1.0


*snipsnap*

You have a large difference in the capacities of the nodes. This results 
in different host weights, which in turn might lead to problems with 
the crush algorithm: it is not able to find three different hosts for OSD 
placement for some of the PGs.


CEPH and crush do not cope well with heterogeneous setups. I would 
suggest moving one of the OSDs from host ceph1 to ceph4 to equalize the 
host weights.
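
To see why crush cannot satisfy the rule, it can help to confirm that the 
failure domain really is the host level and to check where a stuck PG 
currently maps. A rough sketch, using pg 17.16 from the original report 
(exact output will differ per cluster):

$ ceph osd crush rule dump   # the replicated rule should choose leaves of type "host"
$ ceph pg map 17.16          # shows the up and acting OSD set for one of the stuck PGs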


Regards,
Burkhard
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PG status is "active+undersized+degraded"

2018-06-20 Thread Dave.Chen
Hi all,

I have set up a ceph cluster in my lab recently. The configuration per my 
understanding should be okay: 4 OSDs across 3 nodes, 3 replicas, but a couple of 
PGs are stuck in the state "active+undersized+degraded". I think this should be a 
very generic issue; could anyone help me out?

Here are the details about the ceph cluster,

$ ceph -v  (jewel)
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)

# ceph osd tree
ID WEIGHT  TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 5.89049 root default
-2 1.81360 host ceph3
2 1.81360 osd.2   up  1.0  1.0
-3 0.44969 host ceph4
3 0.44969 osd.3   up  1.0  1.0
-4 3.62720 host ceph1
0 1.81360 osd.0   up  1.0  1.0
1 1.81360 osd.1   up  1.0  1.0


# ceph health detail
HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 2 pgs stuck unclean; 2 pgs 
stuck undersized; 2 pgs undersized
pg 17.58 is stuck unclean for 61033.947719, current state 
active+undersized+degraded, last acting [2,0]
pg 17.16 is stuck unclean for 61033.948201, current state 
active+undersized+degraded, last acting [0,2]
pg 17.58 is stuck undersized for 61033.343824, current state 
active+undersized+degraded, last acting [2,0]
pg 17.16 is stuck undersized for 61033.327566, current state 
active+undersized+degraded, last acting [0,2]
pg 17.58 is stuck degraded for 61033.343835, current state 
active+undersized+degraded, last acting [2,0]
pg 17.16 is stuck degraded for 61033.327576, current state 
active+undersized+degraded, last acting [0,2]
pg 17.16 is active+undersized+degraded, acting [0,2]
pg 17.58 is active+undersized+degraded, acting [2,0]



# rados lspools
rbdbench


$ ceph osd pool get rbdbench size
size: 3



Where can I get more details about the issue? I'd appreciate any comments!
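
(For reference, two commonly used starting points, sketched here with a PG id 
taken from the health detail output above; exact output will differ per 
cluster:)

$ ceph pg dump_stuck unclean   # list the PGs that are stuck unclean
$ ceph pg 17.16 query          # detailed peering and recovery state for one PG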

Best Regards,
Dave Chen

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com