[ceph-users] Re: PG inactive when host is down despite CRUSH failure domain being host

2021-03-10 Thread Eugen Block

Hi,

I only took a quick look, but is that pool configured with size 2?
The crush_rule says min_size 2, which would explain what you're
describing.
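
Something like the following should show what the pool actually has
configured (<pool> is just a placeholder for the affected pool's name):

    # list all pools with their size and min_size
    ceph osd pool ls detail

    # or query the suspected pool directly
    ceph osd pool get <pool> size
    ceph osd pool get <pool> min_size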




Quoting Janek Bevendorff:


Hi,

I am seeing a weird phenomenon that I am having trouble debugging. We
have 16 OSDs per host, so when I reboot one node, 16 OSDs are missing
for a short time. Since our CRUSH failure domain is host, this should
not cause any problems. Unfortunately, I always have a handful (1-5)
of PGs that become inactive nonetheless and are stuck in the state
undersized+degraded+peered until the host and its OSDs are back up.
The other 2000+ PGs that are also on these OSDs do not have this
problem. In total, we have between 110 and 150 PGs per OSD with a
configured maximum of 250, which should give us enough headroom.
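
The maximum I am referring to is, as far as I can tell, the usual
mon_max_pg_per_osd setting, which can be checked with:

    # show the configured per-OSD PG limit (defaults to 250)
    ceph config get mon mon_max_pg_per_osd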


The affected pools always seem to be RBD pools, or at least I haven't
seen it on our much larger RGW pool yet. The pool's CRUSH rule looks
like this:


rule rbd-data {
    id 8
    type replicated
    min_size 2
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
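
As far as I know, the min_size/max_size fields here are properties of
the rule itself (they only restrict which pool sizes may use it) and
are separate from the pool's own min_size. For completeness, the rule
as the cluster sees it can be dumped with:

    # dump the compiled rule, including its min_size/max_size fields
    ceph osd crush rule dump rbd-data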

ceph pg dump_stuck inactive gives me this:

PG_STAT  STATE                       UP          UP_PRIMARY  ACTING      ACTING_PRIMARY
115.3    undersized+degraded+peered  [194,267]   194         [194,267]   194
115.13   undersized+degraded+peered  [151,1122]  151         [151,1122]  151
116.12   undersized+degraded+peered  [288,726]   288         [288,726]   288


and when I query one of the inactive PGs, I see (among other things):

    "up": [
    288,
    726
    ],
    "acting": [
    288,
    726
    ],
    "acting_recovery_backfill": [
    "288",
    "726"
    ],

    "recovery_state": [
    {
    "name": "Started/Primary/Active",
    "enter_time": "2021-03-10T16:23:09.301174+0100",
    "might_have_unfound": [],
    "recovery_progress": {
    "backfill_targets": [],
    "waiting_on_backfill": [],
    "last_backfill_started": "MIN",
    "backfill_info": {
    "begin": "MIN",
    "end": "MIN",
    "objects": []
    },
    "peer_backfill_info": [],
    "backfills_in_flight": [],
    "recovering": [],
    "pg_backend": {
    "pull_from_peer": [],
    "pushing": []
    }
    }
    },
    {
    "name": "Started",
    "enter_time": "2021-03-10T16:23:08.297622+0100"
    }
    ],

So you can see that two out of three OSDs on other hosts are indeed
up and active. I can also see the ceph-osd daemons running on those
hosts, so the data is definitely there and the PG should be
available. Do you have any idea why these PGs become inactive
nonetheless? I suspect some kind of concurrency limit, but I wouldn't
know which one that could be.
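
If there is such a hard cap at play, my guess would be the PG-per-OSD
hard limit; assuming osd_max_pg_per_osd_hard_ratio is the relevant
knob, it could be checked with:

    # OSDs refuse to instantiate new PGs beyond
    # mon_max_pg_per_osd * this ratio (default 3)
    ceph config get osd osd_max_pg_per_osd_hard_ratio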


Thanks
Janek
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io





[ceph-users] Re: PG inactive when host is down despite CRUSH failure domain being host

2021-03-10 Thread Janek Bevendorff
No, the pool is size 3. But you put me on the right track: the pool
had an explicit min_size set that was equal to its size. No idea why
I didn't check that in the first place. Reducing it to 2 seems to
solve the problem. How embarrassing, thanks! :-D
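
For the record, the fix was roughly this (<pool> standing in for the
affected RBD pool):

    # min_size was set equal to size (3), which kept the PGs from
    # going active with one host down
    ceph osd pool get <pool> min_size

    # allow the PGs to stay active with one replica missing
    ceph osd pool set <pool> min_size 2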


May I suggest giving this a better error description?
undersized+degraded+peered is kind of a meaningless PG state, and
ceph pg query could have mentioned at least something.


Janek


On 10/03/2021 17:11, Eugen Block wrote:

Hi,

I only took a quick look, but is that pool configured with size 2?
The crush_rule says min_size 2, which would explain what you're
describing.





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io