Hi,

thank you, it worked. The PGs are no longer incomplete. Still, we have another problem: 7 PGs are inconsistent and a ceph pg repair is not doing anything. I just get "instructing pg 1.5dd on osd.24 to repair" and nothing happens. Does somebody know how we can get the PGs to repair?
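
A rough sketch of what we are planning to check next, in case someone can confirm this is the right direction (PG ID and OSD number are just the ones from the repair message above):

  # show what the last deep-scrub actually found on one of the inconsistent PGs
  rados list-inconsistent-obj 1.5dd --format=json-pretty

  # repairs are queued like scrubs, so they can sit behind other (deep-)scrubs
  # before the primary really starts working on them
  ceph pg repair 1.5dd
  ceph -w | grep 1.5dd   # watch whether osd.24 picks it up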

Regards,

Kevin

On 21.05.19 4:52 PM, Wido den Hollander wrote:

On 5/21/19 4:48 PM, Kevin Flöh wrote:
Hi,

we gave up on the incomplete pgs since we do not have enough complete
shards to restore them. What is the procedure to get rid of these pgs?

You need to start with marking the OSDs as 'lost' and then you can
force_create_pg to get the PGs back (empty).
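
Roughly something like this (OSD and PG IDs taken from your health detail, so double-check them; force-create-pg throws away whatever is still referenced by those PGs):

  ceph osd lost 4 --yes-i-really-mean-it
  ceph osd lost 23 --yes-i-really-mean-it

  # on luminous the command lives under 'ceph osd'; newer releases may also
  # ask for --yes-i-really-mean-it here
  ceph osd force-create-pg 1.5dd
  ceph osd force-create-pg 1.619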

Wido

regards,

Kevin

On 20.05.19 9:22 AM, Kevin Flöh wrote:
Hi Frederic,

we do not have access to the original OSDs. We exported the remaining
shards of the two pgs but we are only left with two shards (of
reasonable size) per pg. The rest of the shards displayed by ceph pg
query are empty. I guess marking the OSD as complete doesn't make
sense then.

Best,
Kevin

On 17.05.19 2:36 PM, Frédéric Nass wrote:

On 14/05/2019 at 10:04, Kevin Flöh wrote:
On 13.05.19 11:21 PM, Dan van der Ster wrote:
Presumably the 2 OSDs you marked as lost were hosting those
incomplete PGs?
It would be useful to double confirm that: check with `ceph pg <id>
query` and `ceph pg dump`.
(If so, this is why the ignore_history_les setting isn't helping; you
don't have the minimum 3 shards up for those 3+1 PGs.)
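
For example, something like (the PG ID is one of your incomplete ones):

  # which OSDs the PG would need to probe and why peering is blocked
  ceph pg 1.5dd query | grep -A5 down_osds_we_would_probe

  # state and up/acting sets of the incomplete PGs
  ceph pg dump pgs_brief | grep incomplete
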
yes, but as written in my other mail, we still have enough shards,
at least I think so.

If those "lost" OSDs by some miracle still have the PG data, you might
be able to export the relevant PG stripes with the
ceph-objectstore-tool. I've never tried this myself, but there have
been threads in the past where people export a PG from a nearly dead
hdd, import to another OSD, then backfilling works.
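
Very roughly, and again untested by me (paths, OSD numbers and the PG ID below are placeholders):

  # on the host of the dead/old OSD, with that OSD daemon stopped
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
    --pgid 1.5dd --op export --file /tmp/pg1.5dd.export

  # on a healthy OSD (also stopped), import the shard, then start the OSD
  # again and let backfill do the rest
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-50 \
    --op import --file /tmp/pg1.5dd.export
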
I guess that is not possible.
Hi Kevin,

You want to make sure of this.

Unless you recreated the OSDs 4 and 23 and had new data written on
them, they should still host the data you need.
What Dan suggested (export the 7 inconsistent PGs and import them on
a healthy OSD) seems to be the only way to recover your lost data, as
with 4 hosts and 2 OSDs lost, you're left with 2 chunks of
data/parity when you actually need 3 to access it. Reducing min_size
to 3 will not help.

Have a look here:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-July/019673.html

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/023736.html


This is probably the best path to follow from now on.

Regards,
Frédéric.

If OTOH those PGs are really lost forever (and someone else should
confirm what I say here), I think the next step would be to
force-recreate the incomplete PGs and then run a set of cephfs
scrub/repair disaster recovery commands to recover what you can from the cephfs.
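
The kind of commands I mean, taken from the cephfs disaster-recovery docs (I haven't run these myself, so check the docs for your release first; the MDS name is the one from your status output, the data pool is a placeholder):

  # let the MDS forward-scrub and repair what is still reachable
  ceph daemon mds.ceph-node02.etp.kit.edu scrub_path / recursive repair

  # if metadata is damaged beyond that, the offline tools are the next step
  cephfs-journal-tool journal inspect
  cephfs-data-scan scan_extents <data pool>
  cephfs-data-scan scan_inodes <data pool>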

-- dan
Would this let us recover at least some of the data on the PGs? If
not, we would just set up a new Ceph cluster directly, without fixing the old
one, and copy over whatever is left.

Best regards,

Kevin



On Mon, May 13, 2019 at 4:20 PM Kevin Flöh <kevin.fl...@kit.edu>
wrote:
Dear ceph experts,

we have several (maybe related) problems with our ceph cluster; let me
first show you the current ceph status:

     cluster:
       id:     23e72372-0d44-4cad-b24f-3641b14b86f4
       health: HEALTH_ERR
               1 MDSs report slow metadata IOs
               1 MDSs report slow requests
               1 MDSs behind on trimming
               1/126319678 objects unfound (0.000%)
               19 scrub errors
               Reduced data availability: 2 pgs inactive, 2 pgs
incomplete
               Possible data damage: 7 pgs inconsistent
               Degraded data redundancy: 1/500333881 objects degraded
(0.000%), 1 pg degraded
               118 stuck requests are blocked > 4096 sec.
Implicated osds
24,32,91

     services:
       mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
       mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
       mds: cephfs-1/1/1 up {0=ceph-node02.etp.kit.edu=up:active}, 3
up:standby
       osd: 96 osds: 96 up, 96 in

     data:
       pools:   2 pools, 4096 pgs
       objects: 126.32M objects, 260TiB
       usage:   372TiB used, 152TiB / 524TiB avail
       pgs:     0.049% pgs not active
                1/500333881 objects degraded (0.000%)
                1/126319678 objects unfound (0.000%)
                4076 active+clean
                10   active+clean+scrubbing+deep
                7    active+clean+inconsistent
                2    incomplete
                1    active+recovery_wait+degraded

     io:
       client:   449KiB/s rd, 42.9KiB/s wr, 152op/s rd, 0op/s wr


and ceph health detail:


HEALTH_ERR 1 MDSs report slow metadata IOs; 1 MDSs report slow
requests;
1 MDSs behind on trimming; 1/126319687 objects unfound (0.000%); 19
scrub errors; Reduced data availability: 2 pgs inactive, 2 pgs
incomplete; Possible data damage: 7 pgs inconsistent; Degraded data
redundancy: 1/500333908 objects degraded (0.000%), 1 pg degraded; 118
stuck requests are blocked > 4096 sec. Implicated osds 24,32,91
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
       mdsceph-node02.etp.kit.edu(mds.0): 100+ slow metadata IOs are
blocked > 30 secs, oldest blocked for 351193 secs
MDS_SLOW_REQUEST 1 MDSs report slow requests
       mdsceph-node02.etp.kit.edu(mds.0): 4 slow requests are
blocked > 30 sec
MDS_TRIM 1 MDSs behind on trimming
       mdsceph-node02.etp.kit.edu(mds.0): Behind on trimming
(46034/128)
max_segments: 128, num_segments: 46034
OBJECT_UNFOUND 1/126319687 objects unfound (0.000%)
       pg 1.24c has 1 unfound objects
OSD_SCRUB_ERRORS 19 scrub errors
PG_AVAILABILITY Reduced data availability: 2 pgs inactive, 2 pgs
incomplete
       pg 1.5dd is incomplete, acting [24,4,23,79] (reducing pool ec31
min_size from 3 may help; search ceph.com/docs for 'incomplete')
       pg 1.619 is incomplete, acting [91,23,4,81] (reducing pool ec31
min_size from 3 may help; search ceph.com/docs for 'incomplete')
PG_DAMAGED Possible data damage: 7 pgs inconsistent
       pg 1.17f is active+clean+inconsistent, acting [65,49,25,4]
       pg 1.1e0 is active+clean+inconsistent, acting [11,32,4,81]
       pg 1.203 is active+clean+inconsistent, acting [43,49,4,72]
       pg 1.5d3 is active+clean+inconsistent, acting [37,27,85,4]
       pg 1.779 is active+clean+inconsistent, acting [50,4,77,62]
       pg 1.77c is active+clean+inconsistent, acting [21,49,40,4]
       pg 1.7c3 is active+clean+inconsistent, acting [1,14,68,4]
PG_DEGRADED Degraded data redundancy: 1/500333908 objects degraded
(0.000%), 1 pg degraded
       pg 1.24c is active+recovery_wait+degraded, acting
[32,4,61,36], 1
unfound
REQUEST_STUCK 118 stuck requests are blocked > 4096 sec.
Implicated osds
24,32,91
       118 ops are blocked > 536871 sec
       osds 24,32,91 have stuck requests > 536871 sec


Let me briefly summarize the setup: we have 4 nodes with 24 OSDs each
and use 3+1 erasure coding. The nodes run CentOS 7 and, due to a major
mistake when setting up the cluster, we use more than one ceph version
on the nodes: 3 nodes run 12.2.12 and one runs 13.2.5. We currently do
not dare to update all nodes to 13.2.5. For all the version details see:

{
       "mon": {
           "ceph version 12.2.12
(1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 3
       },
       "mgr": {
           "ceph version 12.2.12
(1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 2
       },
       "osd": {
           "ceph version 12.2.12
(1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 72,
           "ceph version 13.2.5
(cbff874f9007f1869bfd3821b7e33b2a6ffd4988)
mimic (stable)": 24
       },
       "mds": {
           "ceph version 12.2.12
(1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 4
       },
       "overall": {
           "ceph version 12.2.12
(1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 81,
           "ceph version 13.2.5
(cbff874f9007f1869bfd3821b7e33b2a6ffd4988)
mimic (stable)": 24
       }
}

Here is what happened: one OSD daemon could not be started, so we
decided to mark the OSD as lost and set it up from scratch. Ceph
started recovering, and then we lost another OSD with the same behavior.
We did the same as for the first OSD, and now we are stuck with 2 PGs in
incomplete. ceph pg query shows the following problem:

               "down_osds_we_would_probe": [],
               "peering_blocked_by": [],
               "peering_blocked_by_detail": [
                   {
                       "detail":
"peering_blocked_by_history_les_bound"
                   }

We already tried to set "osd_find_best_info_ignore_history_les": "true"
for the affected OSDs, which had no effect. Furthermore, the cluster is
behind on trimming by more than 40,000 segments, and we have folders and
files which cannot be deleted or moved (and which are not on the 2
incomplete PGs). Is there any way to solve these problems?
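
For completeness, this is roughly how we think the setting has to be applied; if it needs to go into ceph.conf with an OSD restart instead, please tell us (the OSD IDs are the primaries from ceph health detail):

  ceph tell osd.24 injectargs '--osd_find_best_info_ignore_history_les=true'
  ceph tell osd.91 injectargs '--osd_find_best_info_ignore_history_les=true'

  # the option is only evaluated during peering, so the PGs have to re-peer
  # for it to make a difference
  ceph osd down 24
  ceph osd down 91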

Best regards,

Kevin

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com