Also, your min_size is set to 2.  This means that at least 2 copies of your
data must be up before clients can access it.  You do not want a min_size
of 1.  With min_size 1, a single copy can keep receiving writes on its own.
If that copy then goes down as well, nothing stops one of the other 2
copies from coming back up before it, without knowing about the writes it
missed.  Now your data is effectively corrupt, because the client cannot
tell what state the data is in.
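
To make the min_size rule concrete, here is a minimal Python sketch of the
check as described above.  The function name and the loop are made up for
illustration; this is not Ceph code, only the comparison it boils down to.

```python
def pg_accepts_io(up_replicas: int, min_size: int) -> bool:
    """Illustrative only: a placement group serves client I/O while at
    least min_size of its replicas are up."""
    return up_replicas >= min_size


SIZE = 3       # replica count of the pools in this thread
MIN_SIZE = 2   # min_size of the pools in this thread

# Walk from a healthy PG down to one with no replicas left.
for up in range(SIZE, -1, -1):
    state = "I/O allowed" if pg_accepts_io(up, MIN_SIZE) else "I/O blocked"
    print(f"{up} of {SIZE} replicas up: {state}")
```

With min_size 2, losing a second node blocks I/O, which matches the client
losing access to the mount.  With min_size 1 the single-replica case would
be allowed, and that is the risky case.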

On Fri, Jun 2, 2017 at 10:34 AM Ashley Merrick <ash...@amerrick.co.uk>
wrote:

> You only have 3 OSDs, so with one down you only have 2 left to hold the
> 3 replicas of each object.
>
> There is no spare OSD to place the 3rd replica on. If you were to add a 4th
> node, the issue would go away.
>
> ,Ashley
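
The arithmetic behind this can be sketched in a few lines of Python.  This
is a rough model, not Ceph's placement logic: it assumes each OSD holds at
most one replica of an object and that recovery has finished.  The function
name is hypothetical; the object count and replica size come from the
`ceph -s` and `ceph osd dump` output quoted below.

```python
def degraded_copies(objects: int, size: int, up_osds: int) -> tuple:
    """Illustrative only: return (missing copies, expected copies),
    assuming each OSD holds at most one replica of an object."""
    expected = objects * size
    placed = objects * min(size, up_osds)
    return expected - placed, expected


OBJECTS = 21  # object count from the `ceph -s` output in this thread
SIZE = 3      # replica count of both pools

for total_osds in (3, 4):
    up = total_osds - 1  # one OSD down
    missing, expected = degraded_copies(OBJECTS, SIZE, up)
    pct = 100 * missing / expected
    print(f"{total_osds} OSDs, 1 down: {missing}/{expected} copies missing ({pct:.3f}%)")
```

For 3 OSDs this gives 21/63 (33.333%), the same figure `ceph -s` reports as
degraded.  With a 4th OSD there is a spare target, so nothing stays missing
once recovery completes.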
> On 2 Jun 2017, at 10:31 PM, Oleg Obleukhov <leoleov...@gmail.com> wrote:
>
> Hello,
> I am playing around with Ceph (ceph version 10.2.7
> (50e863e0f4bc8f4b9e31156de690d765af245185)) on Debian Jessie and I built a
> test setup:
>
> $ ceph osd tree
> ID WEIGHT  TYPE NAME                  UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 0.01497 root default
> -2 0.00499     host af-staging-ceph01
>  0 0.00499         osd.0                   up  1.00000          1.00000
> -3 0.00499     host af-staging-ceph02
>  1 0.00499         osd.1                   up  1.00000          1.00000
> -4 0.00499     host af-staging-ceph03
>  2 0.00499         osd.2                   up  1.00000          1.00000
>
> So I have 3 OSDs on 3 servers.
> I also created 2 pools:
>
> ceph osd dump | grep 'replicated size'
> pool 1 'cephfs_data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 33 flags hashpspool crash_replay_interval 45 stripe_width 0
> pool 2 'cephfs_metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 31 flags hashpspool stripe_width 0
>
> Now I am testing failover and kill one of the servers:
> ceph osd tree
> ID WEIGHT  TYPE NAME                  UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 0.01497 root default
> -2 0.00499     host af-staging-ceph01
>  0 0.00499         osd.0                   up  1.00000          1.00000
> -3 0.00499     host af-staging-ceph02
>  1 0.00499         osd.1                 down  1.00000          1.00000
> -4 0.00499     host af-staging-ceph03
>  2 0.00499         osd.2                   up  1.00000          1.00000
>
> And now it is stuck in the recovery state:
> ceph -s
>     cluster 6b5ff07a-7232-4840-b486-6b7906248de7
>      health HEALTH_WARN
>             64 pgs degraded
>             18 pgs stuck unclean
>             64 pgs undersized
>             recovery 21/63 objects degraded (33.333%)
>             1/3 in osds are down
>             1 mons down, quorum 0,2 af-staging-ceph01,af-staging-ceph03
>      monmap e1: 3 mons at {af-staging-ceph01=10.36.0.121:6789/0,af-staging-ceph02=10.36.0.122:6789/0,af-staging-ceph03=10.36.0.123:6789/0}
>             election epoch 38, quorum 0,2 af-staging-ceph01,af-staging-ceph03
>       fsmap e29: 1/1/1 up {0=af-staging-ceph03.crm.ig.local=up:active}, 2 up:standby
>      osdmap e78: 3 osds: 2 up, 3 in; 64 remapped pgs
>             flags sortbitwise,require_jewel_osds
>       pgmap v334: 64 pgs, 2 pools, 47129 bytes data, 21 objects
>             122 MB used, 15204 MB / 15326 MB avail
>             21/63 objects degraded (33.333%)
>                   64 active+undersized+degraded
>
> And if I kill one more node, I lose access to the mounted file system on
> the client. Normally I would expect the replication factor to be respected,
> and Ceph should create the missing copies of the degraded PGs.
>
> I tried rebuilding the crush map so that the rule looks like this, but it
> did not help:
> rule replicated_ruleset {
>     ruleset 0
>     type replicated
>     min_size 1
>     max_size 10
>     step take default
>     step chooseleaf firstn 0 type osd
>     step emit
> }
>
> # end crush map
>
> I would very much appreciate any help.
> Thank you very much in advance,
> Oleg.
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com