Shain;

These lines look bad; a few commands for digging into them are sketched below the list:
14 scrub errors
Reduced data availability: 2 pgs inactive
Possible data damage: 8 pgs inconsistent
osd.95 (root=default,host=hqosd8) is down
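
For the inactive and inconsistent PGs, something like the following is a reasonable starting point. This is only a sketch, using PG IDs taken from your own health detail below (3.1521 is one of the activating+remapped PGs, 3.1ca one of the inconsistent ones); look at the output, and at the state of the underlying disks, before issuing any repair:

    # Ask a stuck PG why it is stuck (which OSDs it is waiting on, etc.)
    ceph pg 3.1521 query

    # For the scrub errors, list the inconsistent objects in a damaged PG
    rados list-inconsistent-obj 3.1ca --format=json-pretty

    # Once you understand the cause, have Ceph repair that PG
    ceph pg repair 3.1ca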

I suspect you ran into a hardware issue with one or more drives in some of the 
servers that did not go offline.

osd.95 is offline; you need to resolve this.
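
If the daemon simply died, you may be able to restart it; if the drive is failing, plan on replacing it and letting the cluster backfill. A rough sketch, assuming a systemd-managed install on hqosd8 (adjust unit names to your deployment):

    # Confirm which OSDs are down and where they sit in the CRUSH tree
    ceph osd tree | grep -w down

    # On hqosd8: check the OSD daemon, then the kernel log for disk errors
    systemctl status ceph-osd@95
    dmesg | grep -i error

    # If the disk looks healthy, try bringing the daemon back up
    systemctl restart ceph-osd@95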

You should fix your tunables when you can (probably not part of your current 
issues).
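
The tunables change itself is a one-liner, but raising the profile from argonaut-era tunables will remap a large share of your PGs, so wait until the cluster is back to HEALTH_OK and plan for a long rebalance window. A sketch (you could also step through the firefly/hammer/jewel profiles one at a time to spread out the data movement):

    # Only after the cluster is healthy again -- this triggers a large rebalance
    ceph osd crush tunables optimal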

Thank you,

Dominic L. Hilsbos, MBA 
Vice President – Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com

-----Original Message-----
From: Shain Miley [mailto:smi...@npr.org] 
Sent: Friday, July 23, 2021 10:48 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Luminous won't fully recover

We recently had a few Ceph nodes go offline, which required a reboot. I have 
been able to get the cluster back to the state listed below; however, it does 
not seem to progress past the point of 23473/287823588 objects misplaced.

Yesterday about 13% of the data was misplaced; this morning it has gotten down 
to 0.008%, but it has not moved past that point in about an hour.

Does anyone see anything in the output below that points to the problem, or 
have any suggestions for figuring out why the cluster health is not moving 
beyond this point?
---------------------------------------------------

root@rbd1:~# ceph -s
  cluster:
    id:     504b5794-34bd-44e7-a8c3-0494cf800c23
    health: HEALTH_ERR
            crush map has legacy tunables (require argonaut, min is firefly)
            23473/287823588 objects misplaced (0.008%)
            14 scrub errors
            Reduced data availability: 2 pgs inactive
            Possible data damage: 8 pgs inconsistent

  services:
    mon: 3 daemons, quorum hqceph1,hqceph2,hqceph3
    mgr: hqceph2(active), standbys: hqceph3
    osd: 288 osds: 270 up, 270 in; 2 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   17 pools, 9411 pgs
    objects: 95.95M objects, 309TiB
    usage:   936TiB used, 627TiB / 1.53PiB avail
    pgs:     0.021% pgs not active
             23473/287823588 objects misplaced (0.008%)
             9369 active+clean
             30   active+clean+scrubbing+deep
             8    active+clean+inconsistent
             2    activating+remapped
             2    active+clean+scrubbing

  io:
    client:   1000B/s rd, 0B/s wr, 0op/s rd, 0op/s wr

root@rbd1:~# ceph health detail
HEALTH_ERR crush map has legacy tunables (require argonaut, min is firefly); 1 osds down; 23473/287823588 objects misplaced (0.008%); 14 scrub errors; Reduced data availability: 3 pgs inactive, 13 pgs peering; Possible data damage: 8 pgs inconsistent; Degraded data redundancy: 408658/287823588 objects degraded (0.142%), 38 pgs degraded
OLD_CRUSH_TUNABLES crush map has legacy tunables (require argonaut, min is firefly)
    see http://docs.ceph.com/docs/master/rados/operations/crush-map/#tunables
OSD_DOWN 1 osds down
    osd.95 (root=default,host=hqosd8) is down
OBJECT_MISPLACED 23473/287823588 objects misplaced (0.008%)
OSD_SCRUB_ERRORS 14 scrub errors
PG_AVAILABILITY Reduced data availability: 3 pgs inactive, 13 pgs peering
    pg 3.b41 is stuck peering for 106.682058, current state peering, last acting [204,190]
    pg 3.c33 is stuck peering for 103.403643, current state peering, last acting [228,274]
    pg 3.d15 is stuck peering for 128.537454, current state peering, last acting [286,24]
    pg 3.fa9 is stuck peering for 106.526146, current state peering, last acting [286,47]
    pg 3.fb7 is stuck peering for 105.878878, current state peering, last acting [62,97]
    pg 3.13a2 is stuck peering for 106.491138, current state peering, last acting [270,219]
    pg 3.1521 is stuck inactive for 170180.165265, current state activating+remapped, last acting [94,186,188]
    pg 3.1565 is stuck peering for 106.782784, current state peering, last acting [121,60]
    pg 3.157c is stuck peering for 128.557448, current state peering, last acting [128,268]
    pg 3.1744 is stuck peering for 106.639603, current state peering, last acting [192,142]
    pg 3.1ac8 is stuck peering for 127.839550, current state peering, last acting [221,190]
    pg 3.1e24 is stuck peering for 128.201670, current state peering, last acting [118,158]
    pg 3.1e46 is stuck inactive for 169121.764376, current state activating+remapped, last acting [87,199,170]
    pg 18.36 is stuck peering for 128.554121, current state peering, last acting [204]
    pg 21.1ce is stuck peering for 106.582584, current state peering, last acting [266,192]
PG_DAMAGED Possible data damage: 8 pgs inconsistent
    pg 3.1ca is active+clean+inconsistent, acting [201,8,180]
    pg 3.56a is active+clean+inconsistent, acting [148,240,8]
    pg 3.b0f is active+clean+inconsistent, acting [148,260,8]
    pg 3.b56 is active+clean+inconsistent, acting [218,8,240]
    pg 3.10ff is active+clean+inconsistent, acting [262,8,211]
    pg 3.1192 is active+clean+inconsistent, acting [192,8,187]
    pg 3.124a is active+clean+inconsistent, acting [123,8,222]
    pg 3.1c55 is active+clean+inconsistent, acting [180,8,287]
PG_DEGRADED Degraded data redundancy: 408658/287823588 objects degraded (0.142%), 38 pgs degraded
    pg 3.8f is active+undersized+degraded, acting [163,149]
    pg 3.ba is active+undersized+degraded, acting [68,280]
    pg 3.1aa is active+undersized+degraded, acting [176,211]
    pg 3.29e is active+undersized+degraded, acting [241,194]
    pg 3.323 is active+undersized+degraded, acting [78,194]
    pg 3.343 is active+undersized+degraded, acting [242,144]
    pg 3.4ae is active+undersized+degraded, acting [153,237]
    pg 3.524 is active+undersized+degraded, acting [252,222]
    pg 3.5c9 is active+undersized+degraded, acting [272,252]
    pg 3.713 is active+undersized+degraded, acting [273,80]
    pg 3.730 is active+undersized+degraded, acting [235,212]
    pg 3.88f is active+undersized+degraded, acting [222,285]
    pg 3.8cb is active+undersized+degraded, acting [285,20]
    pg 3.9a0 is active+undersized+degraded, acting [240,200]
    pg 3.c19 is active+undersized+degraded, acting [165,276]
    pg 3.ec8 is active+undersized+degraded, acting [158,40]
    pg 3.1025 is active+undersized+degraded, acting [258,274]
    pg 3.1058 is active+undersized+degraded, acting [38,68]
    pg 3.14e4 is active+undersized+degraded, acting [185,39]
    pg 3.150c is active+undersized+degraded, acting [138,140]
    pg 3.1545 is active+undersized+degraded, acting [222,55]
    pg 3.15a6 is active+undersized+degraded, acting [242,272]
    pg 3.1620 is active+undersized+degraded, acting [200,164]
    pg 3.1710 is active+undersized+degraded, acting [176,285]
    pg 3.1792 is active+undersized+degraded, acting [190,11]
    pg 3.17bd is active+undersized+degraded, acting [207,15]
    pg 3.17da is active+undersized+degraded, acting [5,160]
    pg 3.183e is active+undersized+degraded, acting [273,136]
    pg 3.197d is active+undersized+degraded, acting [241,139]
    pg 3.1a3d is active+undersized+degraded, acting [184,121]
    pg 3.1ba6 is active+undersized+degraded, acting [47,249]
    pg 3.1c2b is active+undersized+degraded, acting [268,80]
    pg 3.1ca2 is active+undersized+degraded, acting [280,152]
    pg 3.1cd4 is active+undersized+degraded, acting [2,129]
    pg 3.1e13 is active+undersized+degraded, acting [247,114]
    pg 12.56 is active+undersized+degraded, acting [54]
    pg 18.8 is undersized+degraded+peered, acting [260]
    pg 21.9f is active+undersized+degraded, acting [215,201]
--------------------------------------------------------------------------------------------------


Thanks,
Shain

Shain Miley | Director of Platform and Infrastructure | Digital Media | 
smi...@npr.org
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io