Hi,

On 06/22/2018 08:06 AM, dave.c...@dell.com wrote:
I saw this statement at the link below ( 
http://docs.ceph.com/docs/master/rados/operations/crush-map/ ); is that the 
reason for the warning?

" This, combined with the default CRUSH failure domain, ensures that replicas or 
erasure code shards are separated across hosts and a single host failure will not affect 
availability."
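
For reference, the failure domain a CRUSH rule enforces can be checked directly 
on the cluster; the rule name "replicated_rule" below is just the common default 
and may differ (e.g. "replicated_ruleset" on older releases):

    # list the rules, then dump the one the pool uses
    $ ceph osd crush rule ls
    $ ceph osd crush rule dump replicated_rule
    ...
        "steps": [
            { "op": "take", "item": -1, "item_name": "default" },
            { "op": "chooseleaf_firstn", "num": 0, "type": "host" },
            { "op": "emit" }
        ]
    ...
    # "type": "host" in the chooseleaf step is the failure domain the quoted
    # documentation refers to: each replica must land on a different host.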

Best Regards,
Dave Chen

-----Original Message-----
From: Chen2, Dave
Sent: Friday, June 22, 2018 1:59 PM
To: 'Burkhard Linke'; ceph-users@lists.ceph.com
Cc: Chen2, Dave
Subject: RE: [ceph-users] PG status is "active+undersized+degraded"

Hi Burkhard,

Thanks for your explanation. I created a new 2TB OSD on another node, and it indeed 
solved the issue; the status of the Ceph cluster is "health HEALTH_OK" now.

Another question: if three homogeneous OSDs are spread across only 2 nodes, I still get the 
warning message and the status is "active+undersized+degraded". Is spreading the 
three OSDs across 3 nodes a mandatory rule for Ceph? Is it only an HA 
consideration? Is there any official Ceph documentation with guidance on this?

The default Ceph CRUSH rules try to distribute PG replicas among hosts. With a default replication count of 3 (pool size = 3), this requires at least three hosts. The pool also defines the minimum number of PG replicas that must be available to allow I/O to a PG; this is usually set to 2 (pool min_size = 2). The above status therefore means that there are enough copies for the min_size (-> active), but not enough for the size (-> undersized + degraded).
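
If it helps, the relevant pool settings and the affected PGs can be inspected 
with something like the following ("rbd" is only an example pool name):

    # replica count and minimum replicas required for I/O
    $ ceph osd pool get rbd size
    $ ceph osd pool get rbd min_size

    # which PGs are undersized/degraded and why
    $ ceph health detail
    $ ceph pg dump_stuck undersized

    # how the OSDs are grouped under hosts (the CRUSH failure domain)
    $ ceph osd tree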

Using fewer than three hosts requires changing the pool size to 2. But this is strongly discouraged, since sane automatic recovery of data in case of a netsplit or other temporary node failure is not possible. Do not do this in a production setup.
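
For completeness, the change would look like the sketch below, but again: only 
consider it for a test cluster ("rbd" is only an example pool name):

    # reduce the replica count of one pool to 2 -- discouraged, see above
    $ ceph osd pool set rbd size 2
    # check what min_size remains before relying on the pool for anything
    $ ceph osd pool get rbd min_size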

For a production setup you should also consider node failures. The default setup uses 3 replicas, so to tolerate a node failure you need 4 hosts; otherwise the self-healing feature of Ceph cannot recreate the third replica. You also need to closely monitor your cluster's free space to avoid a full cluster due to re-replicated PGs after a node failure.
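
Cluster-wide and per-OSD utilisation can be watched with the usual commands, 
for example:

    # overall and per-pool usage
    $ ceph df
    # per-OSD fill level, grouped by the CRUSH tree (hosts)
    $ ceph osd df tree
    # rule of thumb: the surviving hosts need enough free space to absorb the
    # replicas of a failed host without hitting the nearfull/full ratios.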

Regards,
Burkhard
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
