[ceph-users] Major ceph disaster

2019-05-13 Thread Kevin Flöh

Dear ceph experts,

we have several (maybe related) problems with our ceph cluster, let me 
first show you the current ceph status:


  cluster:
    id: 23e72372-0d44-4cad-b24f-3641b14b86f4
    health: HEALTH_ERR
    1 MDSs report slow metadata IOs
    1 MDSs report slow requests
    1 MDSs behind on trimming
    1/126319678 objects unfound (0.000%)
    19 scrub errors
    Reduced data availability: 2 pgs inactive, 2 pgs incomplete
    Possible data damage: 7 pgs inconsistent
    Degraded data redundancy: 1/500333881 objects degraded 
(0.000%), 1 pg degraded
    118 stuck requests are blocked > 4096 sec. Implicated osds 
24,32,91


  services:
    mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
    mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
    mds: cephfs-1/1/1 up  {0=ceph-node02.etp.kit.edu=up:active}, 3 
up:standby

    osd: 96 osds: 96 up, 96 in

  data:
    pools:   2 pools, 4096 pgs
    objects: 126.32M objects, 260TiB
    usage:   372TiB used, 152TiB / 524TiB avail
    pgs: 0.049% pgs not active
         1/500333881 objects degraded (0.000%)
         1/126319678 objects unfound (0.000%)
         4076 active+clean
         10   active+clean+scrubbing+deep
         7    active+clean+inconsistent
         2    incomplete
         1    active+recovery_wait+degraded

  io:
    client:   449KiB/s rd, 42.9KiB/s wr, 152op/s rd, 0op/s wr


and ceph health detail:


HEALTH_ERR 1 MDSs report slow metadata IOs; 1 MDSs report slow requests; 
1 MDSs behind on trimming; 1/126319687 objects unfound (0.000%); 19 
scrub errors; Reduced data availability: 2 pgs inactive, 2 pgs 
incomplete; Possible data damage: 7 pgs inconsistent; Degraded data 
redundancy: 1/500333908 objects degraded (0.000%), 1 pg degraded; 118 
stuck requests are blocked > 4096 sec. Implicated osds 24,32,91

MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
    mdsceph-node02.etp.kit.edu(mds.0): 100+ slow metadata IOs are 
blocked > 30 secs, oldest blocked for 351193 secs

MDS_SLOW_REQUEST 1 MDSs report slow requests
    mdsceph-node02.etp.kit.edu(mds.0): 4 slow requests are blocked > 30 sec
MDS_TRIM 1 MDSs behind on trimming
    mdsceph-node02.etp.kit.edu(mds.0): Behind on trimming (46034/128) 
max_segments: 128, num_segments: 46034

OBJECT_UNFOUND 1/126319687 objects unfound (0.000%)
    pg 1.24c has 1 unfound objects
OSD_SCRUB_ERRORS 19 scrub errors
PG_AVAILABILITY Reduced data availability: 2 pgs inactive, 2 pgs incomplete
    pg 1.5dd is incomplete, acting [24,4,23,79] (reducing pool ec31 
min_size from 3 may help; search ceph.com/docs for 'incomplete')
    pg 1.619 is incomplete, acting [91,23,4,81] (reducing pool ec31 
min_size from 3 may help; search ceph.com/docs for 'incomplete')

PG_DAMAGED Possible data damage: 7 pgs inconsistent
    pg 1.17f is active+clean+inconsistent, acting [65,49,25,4]
    pg 1.1e0 is active+clean+inconsistent, acting [11,32,4,81]
    pg 1.203 is active+clean+inconsistent, acting [43,49,4,72]
    pg 1.5d3 is active+clean+inconsistent, acting [37,27,85,4]
    pg 1.779 is active+clean+inconsistent, acting [50,4,77,62]
    pg 1.77c is active+clean+inconsistent, acting [21,49,40,4]
    pg 1.7c3 is active+clean+inconsistent, acting [1,14,68,4]
PG_DEGRADED Degraded data redundancy: 1/500333908 objects degraded 
(0.000%), 1 pg degraded
    pg 1.24c is active+recovery_wait+degraded, acting [32,4,61,36], 1 
unfound
REQUEST_STUCK 118 stuck requests are blocked > 4096 sec. Implicated osds 
24,32,91

    118 ops are blocked > 536871 sec
    osds 24,32,91 have stuck requests > 536871 sec


Let me briefly summarize the setup: we have 4 nodes with 24 OSDs each 
and use 3+1 erasure coding. The nodes run CentOS 7 and, due to a major 
mistake when setting up the cluster, they do not all run the same Ceph 
version: 3 nodes run 12.2.12 and one runs 13.2.5. We currently do not 
dare to update all nodes to 13.2.5. For all the version details see:


{
    "mon": {
    "ceph version 12.2.12 
(1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 3

    },
    "mgr": {
    "ceph version 12.2.12 
(1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 2

    },
    "osd": {
    "ceph version 12.2.12 
(1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 72,
    "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) 
mimic (stable)": 24

    },
    "mds": {
    "ceph version 12.2.12 
(1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 4

    },
    "overall": {
    "ceph version 12.2.12 
(1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)": 81,
    "ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) 
mimic (stable)": 24

    }
}
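
(This per-daemon breakdown looks like the output of `ceph versions`; a minimal 
sketch for re-checking which daemons run which release, using only stock commands:)

    # cluster-wide version summary per daemon type (same JSON as above)
    ceph versions
    # query each OSD individually, e.g. to locate the 24 mimic OSDs
    ceph tell osd.* version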

Here is what happened: One osd daemon could not be started and therefore 
we decided to mark the osd as lost and set it up from scratch. Ceph 
started recovering and then we lost another osd with the same behavior. 
We did the same as for the first osd.

Re: [ceph-users] Major ceph disaster

2019-05-13 Thread Lionel Bouton
Le 13/05/2019 à 16:20, Kevin Flöh a écrit :
> Dear ceph experts,
>
> [...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...]
> Here is what happened: One osd daemon could not be started and
> therefore we decided to mark the osd as lost and set it up from
> scratch. Ceph started recovering and then we lost another osd with the
> same behavior. We did the same as for the first osd.

With 3+1 you only allow a single OSD failure per pg at a given time. You
have 4096 pgs and 96 osds; having 2 OSDs fail at the same time on 2
separate servers (assuming standard crush rules) is a death sentence for
the data on any pg using both of those OSDs (the ones not fully
recovered before the second failure).

Depending on the data stored (CephFS ?) you probably can recover most of
it but some of it is irremediably lost.

If you can recover the data from the failed OSDs as it was at the time they 
failed, you might be able to recover some of your lost data (with the help of 
Ceph devs); if not, there's nothing to do.

In the latter case I'd add a new server, use at least 3+2 for a fresh
pool instead of 3+1, and begin moving the data to it.
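
As a sketch of what such a 3+2 pool might look like once a fifth host is
available (the profile name, pool name and PG count below are placeholders,
not recommendations):

    # hypothetical 3+2 profile with one chunk per host
    ceph osd erasure-code-profile set ec32 k=3 m=2 crush-failure-domain=host
    # hypothetical new pool using that profile; the PG count must be sized for the real cluster
    ceph osd pool create ec32_data 1024 1024 erasure ec32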

The 12.2 + 13.2 mix is a potential problem in addition to the one above
but it's a different one.

Best regards,

Lionel


Re: [ceph-users] Major ceph disaster

2019-05-13 Thread Dan van der Ster
Presumably the 2 OSDs you marked as lost were hosting those incomplete PGs?
It would be useful to double confirm that: check with `ceph pg <pgid> 
query` and `ceph pg dump`.
(If so, this is why the ignore_history_les option isn't helping; you
don't have the minimum 3 stripes up for those 3+1 PGs.)
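
For example (a sketch using the PG IDs from the health detail above; jq is
only there to trim the output):

    # which OSDs each incomplete PG wants, and why peering is stuck
    ceph pg 1.5dd query | jq '.recovery_state'
    ceph pg 1.619 query | jq '.recovery_state'
    # cross-check which PGs map to a suspect OSD
    ceph pg ls-by-osd 24 | grep incomplete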

If those "lost" OSDs by some miracle still have the PG data, you might
be able to export the relevant PG stripes with the
ceph-objectstore-tool. I've never tried this myself, but there have
been threads in the past where people export a PG from a nearly dead
hdd, import to another OSD, then backfilling works.
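
A rough sketch of that export/import path, assuming the failed disks are still
readable; the OSD IDs, shard suffix and file paths are placeholders, and the
OSDs involved must be stopped while ceph-objectstore-tool runs:

    # on the host with the failed but readable disk: export one EC shard of the PG
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-4 \
        --pgid 1.5dds1 --op export --file /root/1.5dd.s1.export
    # on a healthy, stopped OSD: import the shard, then start the OSD and let it backfill
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-24 \
        --op import --file /root/1.5dd.s1.export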

If OTOH those PGs are really lost forever, and someone else should
confirm what I say here, I think the next step would be to force
recreate the incomplete PGs then run a set of cephfs scrub/repair
disaster recovery cmds to recover what you can from the cephfs.
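
If it comes to that, a hedged sketch of this last-resort path (it permanently
gives up whatever was in those PGs; the exact command spelling differs between
releases, and badly damaged filesystems may additionally need the offline
cephfs-data-scan/cephfs-journal-tool machinery):

    # recreate the incomplete PGs as empty PGs (luminous syntax)
    ceph osd force-create-pg 1.5dd
    ceph osd force-create-pg 1.619
    # then scrub/repair the CephFS tree from the active MDS (run on the host of the active MDS)
    ceph daemon mds.ceph-node02.etp.kit.edu scrub_path / recursive repair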

-- dan


On Mon, May 13, 2019 at 4:20 PM Kevin Flöh  wrote:
>
> Dear ceph experts,
>
> we have several (maybe related) problems with our ceph cluster, let me
> first show you the current ceph status:
>
>cluster:
>  id: 23e72372-0d44-4cad-b24f-3641b14b86f4
>  health: HEALTH_ERR
>  1 MDSs report slow metadata IOs
>  1 MDSs report slow requests
>  1 MDSs behind on trimming
>  1/126319678 objects unfound (0.000%)
>  19 scrub errors
>  Reduced data availability: 2 pgs inactive, 2 pgs incomplete
>  Possible data damage: 7 pgs inconsistent
>  Degraded data redundancy: 1/500333881 objects degraded
> (0.000%), 1 pg degraded
>  118 stuck requests are blocked > 4096 sec. Implicated osds
> 24,32,91
>
>services:
>  mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
>  mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
>  mds: cephfs-1/1/1 up  {0=ceph-node02.etp.kit.edu=up:active}, 3
> up:standby
>  osd: 96 osds: 96 up, 96 in
>
>data:
>  pools:   2 pools, 4096 pgs
>  objects: 126.32M objects, 260TiB
>  usage:   372TiB used, 152TiB / 524TiB avail
>  pgs: 0.049% pgs not active
>   1/500333881 objects degraded (0.000%)
>   1/126319678 objects unfound (0.000%)
>   4076 active+clean
>   10   active+clean+scrubbing+deep
>   7    active+clean+inconsistent
>   2    incomplete
>   1    active+recovery_wait+degraded
>
>io:
>  client:   449KiB/s rd, 42.9KiB/s wr, 152op/s rd, 0op/s wr
>
>
> and ceph health detail:
>
>
> HEALTH_ERR 1 MDSs report slow metadata IOs; 1 MDSs report slow requests;
> 1 MDSs behind on trimming; 1/126319687 objects unfound (0.000%); 19
> scrub errors; Reduced data availability: 2 pgs inactive, 2 pgs
> incomplete; Possible data damage: 7 pgs inconsistent; Degraded data
> redundancy: 1/500333908 objects degraded (0.000%), 1 pg degraded; 118
> stuck requests are blocked > 4096 sec. Implicated osds 24,32,91
> MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
>  mdsceph-node02.etp.kit.edu(mds.0): 100+ slow metadata IOs are
> blocked > 30 secs, oldest blocked for 351193 secs
> MDS_SLOW_REQUEST 1 MDSs report slow requests
>  mdsceph-node02.etp.kit.edu(mds.0): 4 slow requests are blocked > 30 sec
> MDS_TRIM 1 MDSs behind on trimming
>  mdsceph-node02.etp.kit.edu(mds.0): Behind on trimming (46034/128)
> max_segments: 128, num_segments: 46034
> OBJECT_UNFOUND 1/126319687 objects unfound (0.000%)
>  pg 1.24c has 1 unfound objects
> OSD_SCRUB_ERRORS 19 scrub errors
> PG_AVAILABILITY Reduced data availability: 2 pgs inactive, 2 pgs incomplete
>  pg 1.5dd is incomplete, acting [24,4,23,79] (reducing pool ec31
> min_size from 3 may help; search ceph.com/docs for 'incomplete')
>  pg 1.619 is incomplete, acting [91,23,4,81] (reducing pool ec31
> min_size from 3 may help; search ceph.com/docs for 'incomplete')
> PG_DAMAGED Possible data damage: 7 pgs inconsistent
>  pg 1.17f is active+clean+inconsistent, acting [65,49,25,4]
>  pg 1.1e0 is active+clean+inconsistent, acting [11,32,4,81]
>  pg 1.203 is active+clean+inconsistent, acting [43,49,4,72]
>  pg 1.5d3 is active+clean+inconsistent, acting [37,27,85,4]
>  pg 1.779 is active+clean+inconsistent, acting [50,4,77,62]
>  pg 1.77c is active+clean+inconsistent, acting [21,49,40,4]
>  pg 1.7c3 is active+clean+inconsistent, acting [1,14,68,4]
> PG_DEGRADED Degraded data redundancy: 1/500333908 objects degraded
> (0.000%), 1 pg degraded
>  pg 1.24c is active+recovery_wait+degraded, acting [32,4,61,36], 1
> unfound
> REQUEST_STUCK 118 stuck requests are blocked > 4096 sec. Implicated osds
> 24,32,91
>  118 ops are blocked > 536871 sec
>  osds 24,32,91 have stuck requests > 536871 sec
>
>
> Let me briefly summarize the setup: We have 4 nodes with 24 osds each
> and use 3+1 erasure coding. The nodes run on centos7 and we use, due to
> a major mistake when setting up the cluster, more than one cep

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh

On 13.05.19 10:51 nachm., Lionel Bouton wrote:

Le 13/05/2019 à 16:20, Kevin Flöh a écrit :

Dear ceph experts,

[...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...]
Here is what happened: One osd daemon could not be started and 
therefore we decided to mark the osd as lost and set it up from 
scratch. Ceph started recovering and then we lost another osd with 
the same behavior. We did the same as for the first osd.


With 3+1 you only allow a single OSD failure per pg at a given time. 
You have 4096 pgs and 96 osds, having 2 OSD fail at the same time on 2 
separate servers (assuming standard crush rules) is a death sentence 
for the data on some pgs using both of those OSD (the ones not fully 
recovered before the second failure).


OK, so the 2 OSDs (4,23) failed shortly one after the other, but we think 
that the recovery of the first had finished before the second failed. 
Nonetheless, both problematic pgs were on both OSDs. We think that 
we still have enough shards left. For one of the pgs, the recovery state 
looks like this:


    "recovery_state": [
    {
    "name": "Started/Primary/Peering/Incomplete",
    "enter_time": "2019-05-09 16:11:48.625966",
    "comment": "not enough complete instances of this PG"
    },
    {
    "name": "Started/Primary/Peering",
    "enter_time": "2019-05-09 16:11:48.611171",
    "past_intervals": [
    {
    "first": "49767",
    "last": "59313",
    "all_participants": [
    {
    "osd": 2,
    "shard": 0
    },
    {
    "osd": 4,
    "shard": 1
    },
    {
    "osd": 23,
    "shard": 2
    },
    {
    "osd": 24,
    "shard": 0
    },
    {
    "osd": 72,
    "shard": 1
    },
    {
    "osd": 79,
    "shard": 3
    }
    ],
    "intervals": [
    {
    "first": "58860",
    "last": "58861",
    "acting": "4(1),24(0),79(3)"
    },
    {
    "first": "58875",
    "last": "58877",
    "acting": "4(1),23(2),24(0)"
    },
    {
    "first": "59002",
    "last": "59009",
    "acting": "4(1),23(2),79(3)"
    },
    {
    "first": "59010",
    "last": "59012",
    "acting": "2(0),4(1),23(2),79(3)"
    },
    {
    "first": "59197",
    "last": "59233",
    "acting": "23(2),24(0),79(3)"
    },
    {
    "first": "59234",
    "last": "59313",
    "acting": "23(2),24(0),72(1),79(3)"
    }
    ]
    }
    ],
    "probing_osds": [
    "2(0)",
    "4(1)",
    "23(2)",
    "24(0)",
    "72(1)",
    "79(3)"
    ],
    "down_osds_we_would_probe": [],
    "peering_blocked_by": [],
    "peering_blocked_by_detail": [
    {
    "detail": "peering_blocked_by_history_les_bound"
    }
    ]
    },
    {
    "name": "Started",
    "enter_time": "2019-05-09 16:11:48.611121"
    }
    ],
Is there a chance to recover this pg from the shards on OSDs 2, 72, 79? 
ceph pg repair/deep-scrub/scrub did not work.
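
One way to check what those surviving OSDs actually hold for this pg (a
sketch; ceph-objectstore-tool requires the OSD to be stopped, and the data
path shown is just the default location):

    # list the PG shards present on a surviving OSD, e.g. osd.2
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 --op list-pgs | grep -E '1\.(5dd|619)'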


We are also worried about the MDS being behind on trimming, or is this 
not too problematic?



MDS_TRIM 1 MDSs behind on trimming
    mdsceph-node02.etp.kit.edu(mds.0): Behind on trimming (46178/128) 
max_segments: 128, num_segments: 46178



Depending on the data stored (CephFS ?) you probably can recover most 
of it but some of it is irremediably lost.


If you can recover the data from the failed OSD at the time they 
failed you might be able to recover some of your lost data (with the 
help of Ceph devs), if not there's nothing to do.


In the later case I'd add a new server to use at least 3+2 for a fresh 
p

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh


On 13.05.19 11:21 nachm., Dan van der Ster wrote:

Presumably the 2 OSDs you marked as lost were hosting those incomplete PGs?
It would be useful to double confirm that: check with `ceph pg 
query` and `ceph pg dump`.
(If so, this is why the ignore history les thing isn't helping; you
don't have the minimum 3 stripes up for those 3+1 PGs.)


yes, but as written in my other mail, we still have enough shards, at 
least I think so.




If those "lost" OSDs by some miracle still have the PG data, you might
be able to export the relevant PG stripes with the
ceph-objectstore-tool. I've never tried this myself, but there have
been threads in the past where people export a PG from a nearly dead
hdd, import to another OSD, then backfilling works.

I guess that is not possible.


If OTOH those PGs are really lost forever, and someone else should
confirm what I say here, I think the next step would be to force
recreate the incomplete PGs then run a set of cephfs scrub/repair
disaster recovery cmds to recover what you can from the cephfs.

-- dan


Would this let us recover at least some of the data on the pgs? If not, 
we would just set up a new ceph cluster directly, without fixing the old one, 
and copy over whatever is left.


Best regards,

Kevin





On Mon, May 13, 2019 at 4:20 PM Kevin Flöh  wrote:

Dear ceph experts,

we have several (maybe related) problems with our ceph cluster, let me
first show you the current ceph status:

cluster:
  id: 23e72372-0d44-4cad-b24f-3641b14b86f4
  health: HEALTH_ERR
  1 MDSs report slow metadata IOs
  1 MDSs report slow requests
  1 MDSs behind on trimming
  1/126319678 objects unfound (0.000%)
  19 scrub errors
  Reduced data availability: 2 pgs inactive, 2 pgs incomplete
  Possible data damage: 7 pgs inconsistent
  Degraded data redundancy: 1/500333881 objects degraded
(0.000%), 1 pg degraded
  118 stuck requests are blocked > 4096 sec. Implicated osds
24,32,91

services:
  mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
  mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
  mds: cephfs-1/1/1 up  {0=ceph-node02.etp.kit.edu=up:active}, 3
up:standby
  osd: 96 osds: 96 up, 96 in

data:
  pools:   2 pools, 4096 pgs
  objects: 126.32M objects, 260TiB
  usage:   372TiB used, 152TiB / 524TiB avail
  pgs: 0.049% pgs not active
   1/500333881 objects degraded (0.000%)
   1/126319678 objects unfound (0.000%)
   4076 active+clean
   10   active+clean+scrubbing+deep
   7    active+clean+inconsistent
   2    incomplete
   1    active+recovery_wait+degraded

io:
  client:   449KiB/s rd, 42.9KiB/s wr, 152op/s rd, 0op/s wr


and ceph health detail:


HEALTH_ERR 1 MDSs report slow metadata IOs; 1 MDSs report slow requests;
1 MDSs behind on trimming; 1/126319687 objects unfound (0.000%); 19
scrub errors; Reduced data availability: 2 pgs inactive, 2 pgs
incomplete; Possible data damage: 7 pgs inconsistent; Degraded data
redundancy: 1/500333908 objects degraded (0.000%), 1 pg degraded; 118
stuck requests are blocked > 4096 sec. Implicated osds 24,32,91
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
  mdsceph-node02.etp.kit.edu(mds.0): 100+ slow metadata IOs are
blocked > 30 secs, oldest blocked for 351193 secs
MDS_SLOW_REQUEST 1 MDSs report slow requests
  mdsceph-node02.etp.kit.edu(mds.0): 4 slow requests are blocked > 30 sec
MDS_TRIM 1 MDSs behind on trimming
  mdsceph-node02.etp.kit.edu(mds.0): Behind on trimming (46034/128)
max_segments: 128, num_segments: 46034
OBJECT_UNFOUND 1/126319687 objects unfound (0.000%)
  pg 1.24c has 1 unfound objects
OSD_SCRUB_ERRORS 19 scrub errors
PG_AVAILABILITY Reduced data availability: 2 pgs inactive, 2 pgs incomplete
  pg 1.5dd is incomplete, acting [24,4,23,79] (reducing pool ec31
min_size from 3 may help; search ceph.com/docs for 'incomplete')
  pg 1.619 is incomplete, acting [91,23,4,81] (reducing pool ec31
min_size from 3 may help; search ceph.com/docs for 'incomplete')
PG_DAMAGED Possible data damage: 7 pgs inconsistent
  pg 1.17f is active+clean+inconsistent, acting [65,49,25,4]
  pg 1.1e0 is active+clean+inconsistent, acting [11,32,4,81]
  pg 1.203 is active+clean+inconsistent, acting [43,49,4,72]
  pg 1.5d3 is active+clean+inconsistent, acting [37,27,85,4]
  pg 1.779 is active+clean+inconsistent, acting [50,4,77,62]
  pg 1.77c is active+clean+inconsistent, acting [21,49,40,4]
  pg 1.7c3 is active+clean+inconsistent, acting [1,14,68,4]
PG_DEGRADED Degraded data redundancy: 1/500333908 objects degraded
(0.000%), 1 pg degraded
  pg 1.24c is active+recovery_wait+degraded, acting [32,4,61,36], 1
unfound
REQUEST_STUCK 118 stuck requests are blocked > 4096 sec. Implicated osds
24,32,91
  118 ops are blocked > 536871 sec
  osd

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Dan van der Ster
On Tue, May 14, 2019 at 10:02 AM Kevin Flöh  wrote:
>
> On 13.05.19 10:51 nachm., Lionel Bouton wrote:
> > Le 13/05/2019 à 16:20, Kevin Flöh a écrit :
> >> Dear ceph experts,
> >>
> >> [...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...]
> >> Here is what happened: One osd daemon could not be started and
> >> therefore we decided to mark the osd as lost and set it up from
> >> scratch. Ceph started recovering and then we lost another osd with
> >> the same behavior. We did the same as for the first osd.
> >
> > With 3+1 you only allow a single OSD failure per pg at a given time.
> > You have 4096 pgs and 96 osds, having 2 OSD fail at the same time on 2
> > separate servers (assuming standard crush rules) is a death sentence
> > for the data on some pgs using both of those OSD (the ones not fully
> > recovered before the second failure).
>
> OK, so the 2 OSDs (4,23) failed shortly one after the other but we think
> that the recovery of the first was finished before the second failed.
> Nonetheless, both problematic pgs have been on both OSDs. We think, that
> we still have enough shards left. For one of the pgs, the recovery state
> looks like this:
>
>  "recovery_state": [
>  {
>  "name": "Started/Primary/Peering/Incomplete",
>  "enter_time": "2019-05-09 16:11:48.625966",
>  "comment": "not enough complete instances of this PG"
>  },
>  {
>  "name": "Started/Primary/Peering",
>  "enter_time": "2019-05-09 16:11:48.611171",
>  "past_intervals": [
>  {
>  "first": "49767",
>  "last": "59313",
>  "all_participants": [
>  {
>  "osd": 2,
>  "shard": 0
>  },
>  {
>  "osd": 4,
>  "shard": 1
>  },
>  {
>  "osd": 23,
>  "shard": 2
>  },
>  {
>  "osd": 24,
>  "shard": 0
>  },
>  {
>  "osd": 72,
>  "shard": 1
>  },
>  {
>  "osd": 79,
>  "shard": 3
>  }
>  ],
>  "intervals": [
>  {
>  "first": "58860",
>  "last": "58861",
>  "acting": "4(1),24(0),79(3)"
>  },
>  {
>  "first": "58875",
>  "last": "58877",
>  "acting": "4(1),23(2),24(0)"
>  },
>  {
>  "first": "59002",
>  "last": "59009",
>  "acting": "4(1),23(2),79(3)"
>  },
>  {
>  "first": "59010",
>  "last": "59012",
>  "acting": "2(0),4(1),23(2),79(3)"
>  },
>  {
>  "first": "59197",
>  "last": "59233",
>  "acting": "23(2),24(0),79(3)"
>  },
>  {
>  "first": "59234",
>  "last": "59313",
>  "acting": "23(2),24(0),72(1),79(3)"
>  }
>  ]
>  }
>  ],
>  "probing_osds": [
>  "2(0)",
>  "4(1)",
>  "23(2)",
>  "24(0)",
>  "72(1)",
>  "79(3)"
>  ],
>  "down_osds_we_would_probe": [],
>  "peering_blocked_by": [],
>  "peering_blocked_by_detail": [
>  {
>  "detail": "peering_blocked_by_history_les_bound"
>  }
>  ]
>  },
>  {
>  "name": "Started",
>  "enter_time": "2019-05-09 16:11:48.611121"
>  }
>  ],
> Is there a chance to recover this pg from the shards on OSDs 2, 72, 79?
> ceph pg repair/deep-scrub/scrub did not work.

repair/scrub are not related to this problem so they won't help.

How exactly did you use the osd_find_best_info_ignore_history_les option?

One correct procedure would be to set it to true in ceph.conf, then
restart each of the probing_osd's above. (Once the PG has peered, you need
to unset the option and restart those osds again.)
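
A minimal sketch of that procedure (systemd unit names assume a standard
CentOS 7 deployment; each OSD has to be restarted on the host it lives on):

    # ceph.conf on the hosts carrying the probing OSDs
    [osd]
    osd_find_best_info_ignore_history_les = true

    # then, per host, restart the probing OSDs from the pg query output, e.g.
    systemctl restart ceph-osd@2
    # once the PG has peered: remove the option from ceph.conf and restart the same OSDs again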

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh


On 14.05.19 10:08 vorm., Dan van der Ster wrote:

On Tue, May 14, 2019 at 10:02 AM Kevin Flöh  wrote:

On 13.05.19 10:51 nachm., Lionel Bouton wrote:

Le 13/05/2019 à 16:20, Kevin Flöh a écrit :

Dear ceph experts,

[...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...]
Here is what happened: One osd daemon could not be started and
therefore we decided to mark the osd as lost and set it up from
scratch. Ceph started recovering and then we lost another osd with
the same behavior. We did the same as for the first osd.

With 3+1 you only allow a single OSD failure per pg at a given time.
You have 4096 pgs and 96 osds, having 2 OSD fail at the same time on 2
separate servers (assuming standard crush rules) is a death sentence
for the data on some pgs using both of those OSD (the ones not fully
recovered before the second failure).

OK, so the 2 OSDs (4,23) failed shortly one after the other but we think
that the recovery of the first was finished before the second failed.
Nonetheless, both problematic pgs have been on both OSDs. We think, that
we still have enough shards left. For one of the pgs, the recovery state
looks like this:

  "recovery_state": [
  {
  "name": "Started/Primary/Peering/Incomplete",
  "enter_time": "2019-05-09 16:11:48.625966",
  "comment": "not enough complete instances of this PG"
  },
  {
  "name": "Started/Primary/Peering",
  "enter_time": "2019-05-09 16:11:48.611171",
  "past_intervals": [
  {
  "first": "49767",
  "last": "59313",
  "all_participants": [
  {
  "osd": 2,
  "shard": 0
  },
  {
  "osd": 4,
  "shard": 1
  },
  {
  "osd": 23,
  "shard": 2
  },
  {
  "osd": 24,
  "shard": 0
  },
  {
  "osd": 72,
  "shard": 1
  },
  {
  "osd": 79,
  "shard": 3
  }
  ],
  "intervals": [
  {
  "first": "58860",
  "last": "58861",
  "acting": "4(1),24(0),79(3)"
  },
  {
  "first": "58875",
  "last": "58877",
  "acting": "4(1),23(2),24(0)"
  },
  {
  "first": "59002",
  "last": "59009",
  "acting": "4(1),23(2),79(3)"
  },
  {
  "first": "59010",
  "last": "59012",
  "acting": "2(0),4(1),23(2),79(3)"
  },
  {
  "first": "59197",
  "last": "59233",
  "acting": "23(2),24(0),79(3)"
  },
  {
  "first": "59234",
  "last": "59313",
  "acting": "23(2),24(0),72(1),79(3)"
  }
  ]
  }
  ],
  "probing_osds": [
  "2(0)",
  "4(1)",
  "23(2)",
  "24(0)",
  "72(1)",
  "79(3)"
  ],
  "down_osds_we_would_probe": [],
  "peering_blocked_by": [],
  "peering_blocked_by_detail": [
  {
  "detail": "peering_blocked_by_history_les_bound"
  }
  ]
  },
  {
  "name": "Started",
  "enter_time": "2019-05-09 16:11:48.611121"
  }
  ],
Is there a chance to recover this pg from the shards on OSDs 2, 72, 79?
ceph pg repair/deep-scrub/scrub did not work.

repair/scrub are not related to this problem so they won't help.

How exactly did you use the osd_find_best_info_ignore_history_les option?

One correct procedure would be to set it to true in ceph.conf, then
restart each of the probing_osd's above.
(Once the PG has peered, you need to unset the option and restart
those osds again).


We execu

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Dan van der Ster
On Tue, May 14, 2019 at 10:59 AM Kevin Flöh  wrote:
>
>
> On 14.05.19 10:08 vorm., Dan van der Ster wrote:
>
> On Tue, May 14, 2019 at 10:02 AM Kevin Flöh  wrote:
>
> On 13.05.19 10:51 nachm., Lionel Bouton wrote:
>
> Le 13/05/2019 à 16:20, Kevin Flöh a écrit :
>
> Dear ceph experts,
>
> [...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...]
> Here is what happened: One osd daemon could not be started and
> therefore we decided to mark the osd as lost and set it up from
> scratch. Ceph started recovering and then we lost another osd with
> the same behavior. We did the same as for the first osd.
>
> With 3+1 you only allow a single OSD failure per pg at a given time.
> You have 4096 pgs and 96 osds, having 2 OSD fail at the same time on 2
> separate servers (assuming standard crush rules) is a death sentence
> for the data on some pgs using both of those OSD (the ones not fully
> recovered before the second failure).
>
> OK, so the 2 OSDs (4,23) failed shortly one after the other but we think
> that the recovery of the first was finished before the second failed.
> Nonetheless, both problematic pgs have been on both OSDs. We think, that
> we still have enough shards left. For one of the pgs, the recovery state
> looks like this:
>
>  "recovery_state": [
>  {
>  "name": "Started/Primary/Peering/Incomplete",
>  "enter_time": "2019-05-09 16:11:48.625966",
>  "comment": "not enough complete instances of this PG"
>  },
>  {
>  "name": "Started/Primary/Peering",
>  "enter_time": "2019-05-09 16:11:48.611171",
>  "past_intervals": [
>  {
>  "first": "49767",
>  "last": "59313",
>  "all_participants": [
>  {
>  "osd": 2,
>  "shard": 0
>  },
>  {
>  "osd": 4,
>  "shard": 1
>  },
>  {
>  "osd": 23,
>  "shard": 2
>  },
>  {
>  "osd": 24,
>  "shard": 0
>  },
>  {
>  "osd": 72,
>  "shard": 1
>  },
>  {
>  "osd": 79,
>  "shard": 3
>  }
>  ],
>  "intervals": [
>  {
>  "first": "58860",
>  "last": "58861",
>  "acting": "4(1),24(0),79(3)"
>  },
>  {
>  "first": "58875",
>  "last": "58877",
>  "acting": "4(1),23(2),24(0)"
>  },
>  {
>  "first": "59002",
>  "last": "59009",
>  "acting": "4(1),23(2),79(3)"
>  },
>  {
>  "first": "59010",
>  "last": "59012",
>  "acting": "2(0),4(1),23(2),79(3)"
>  },
>  {
>  "first": "59197",
>  "last": "59233",
>  "acting": "23(2),24(0),79(3)"
>  },
>  {
>  "first": "59234",
>  "last": "59313",
>  "acting": "23(2),24(0),72(1),79(3)"
>  }
>  ]
>  }
>  ],
>  "probing_osds": [
>  "2(0)",
>  "4(1)",
>  "23(2)",
>  "24(0)",
>  "72(1)",
>  "79(3)"
>  ],
>  "down_osds_we_would_probe": [],
>  "peering_blocked_by": [],
>  "peering_blocked_by_detail": [
>  {
>  "detail": "peering_blocked_by_history_les_bound"
>  }
>  ]
>  },
>  {
>  "name": "Started",
>  "enter_time": "2019-05-09 16:11:48.611121"
>  }
>  ],
> Is there a chance to recover this pg from the shards on OSDs 2, 72, 79?
> ceph pg repair/deep-scrub/scrub did not work.
>
> repair/scrub are not related to this problem so they won't help.
>
> How exactly did you use the osd_find_best_info_ignore_history_les option?

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh

OK, so now we see at least a difference in the recovery state:

    "recovery_state": [
    {
    "name": "Started/Primary/Peering/Incomplete",
    "enter_time": "2019-05-14 14:15:15.650517",
    "comment": "not enough complete instances of this PG"
    },
    {
    "name": "Started/Primary/Peering",
    "enter_time": "2019-05-14 14:15:15.243756",
    "past_intervals": [
    {
    "first": "49767",
    "last": "59580",
    "all_participants": [
    {
    "osd": 2,
    "shard": 0
    },
    {
    "osd": 4,
    "shard": 1
    },
    {
    "osd": 23,
    "shard": 2
    },
    {
    "osd": 24,
    "shard": 0
    },
    {
    "osd": 72,
    "shard": 1
    },
    {
    "osd": 79,
    "shard": 3
    }
    ],
    "intervals": [
    {
    "first": "59562",
    "last": "59563",
    "acting": "4(1),24(0),79(3)"
    },
    {
    "first": "59564",
    "last": "59567",
    "acting": "23(2),24(0),79(3)"
    },
    {
    "first": "59570",
    "last": "59574",
    "acting": "4(1),23(2),79(3)"
    },
    {
    "first": "59577",
    "last": "59580",
    "acting": "4(1),23(2),24(0)"
    }
    ]
    }
    ],
    "probing_osds": [
    "2(0)",
    "4(1)",
    "23(2)",
    "24(0)",
    "72(1)",
    "79(3)"
    ],
    "down_osds_we_would_probe": [],
    "peering_blocked_by": []
    },
    {
    "name": "Started",
    "enter_time": "2019-05-14 14:15:15.243663"
    }
    ],

The peering does not seem to be blocked anymore, but there is still no 
recovery going on. Is there anything else we can try?



On 14.05.19 11:02 vorm., Dan van der Ster wrote:

On Tue, May 14, 2019 at 10:59 AM Kevin Flöh  wrote:


On 14.05.19 10:08 vorm., Dan van der Ster wrote:

On Tue, May 14, 2019 at 10:02 AM Kevin Flöh  wrote:

On 13.05.19 10:51 nachm., Lionel Bouton wrote:

Le 13/05/2019 à 16:20, Kevin Flöh a écrit :

Dear ceph experts,

[...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...]
Here is what happened: One osd daemon could not be started and
therefore we decided to mark the osd as lost and set it up from
scratch. Ceph started recovering and then we lost another osd with
the same behavior. We did the same as for the first osd.

With 3+1 you only allow a single OSD failure per pg at a given time.
You have 4096 pgs and 96 osds, having 2 OSD fail at the same time on 2
separate servers (assuming standard crush rules) is a death sentence
for the data on some pgs using both of those OSD (the ones not fully
recovered before the second failure).

OK, so the 2 OSDs (4,23) failed shortly one after the other but we think
that the recovery of the first was finished before the second failed.
Nonetheless, both problematic pgs have been on both OSDs. We think, that
we still have enough shards left. For one of the pgs, the recovery state
looks like this:

  "recovery_state": [
  {
  "name": "Started/Primary/Peering/Incomplete",
  "enter_time": "2019-05-09 16:11:48.625966",
  "comment": "not enough complete instances of this PG"
  },
  {
  "name": "Started/Primary/Peering",
  "enter_time": "2019-05-09 16:11:48.611171",
  "past_intervals": [
  {
  "first": "49767",
  "last": "59313",
  "all_participants": [
  {
  "osd": 2,
  "shard": 0
  },
  {
  "osd": 4,
  "shard": 1
  },
  {
  "osd": 23,
  "shard": 2
  

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Dan van der Ster
On Tue, May 14, 2019 at 5:13 PM Kevin Flöh  wrote:
>
> ok, so now we see at least a diffrence in the recovery state:
>
>  "recovery_state": [
>  {
>  "name": "Started/Primary/Peering/Incomplete",
>  "enter_time": "2019-05-14 14:15:15.650517",
>  "comment": "not enough complete instances of this PG"
>  },
>  {
>  "name": "Started/Primary/Peering",
>  "enter_time": "2019-05-14 14:15:15.243756",
>  "past_intervals": [
>  {
>  "first": "49767",
>  "last": "59580",
>  "all_participants": [
>  {
>  "osd": 2,
>  "shard": 0
>  },
>  {
>  "osd": 4,
>  "shard": 1
>  },
>  {
>  "osd": 23,
>  "shard": 2
>  },
>  {
>  "osd": 24,
>  "shard": 0
>  },
>  {
>  "osd": 72,
>  "shard": 1
>  },
>  {
>  "osd": 79,
>  "shard": 3
>  }
>  ],
>  "intervals": [
>  {
>  "first": "59562",
>  "last": "59563",
>  "acting": "4(1),24(0),79(3)"
>  },
>  {
>  "first": "59564",
>  "last": "59567",
>  "acting": "23(2),24(0),79(3)"
>  },
>  {
>  "first": "59570",
>  "last": "59574",
>  "acting": "4(1),23(2),79(3)"
>  },
>  {
>  "first": "59577",
>  "last": "59580",
>  "acting": "4(1),23(2),24(0)"
>  }
>  ]
>  }
>  ],
>  "probing_osds": [
>  "2(0)",
>  "4(1)",
>  "23(2)",
>  "24(0)",
>  "72(1)",
>  "79(3)"
>  ],
>  "down_osds_we_would_probe": [],
>  "peering_blocked_by": []
>  },
>  {
>  "name": "Started",
>  "enter_time": "2019-05-14 14:15:15.243663"
>  }
>  ],
>
> the peering does not seem to be blocked anymore. But still there is no
> recovery going on. Is there anything else we can try?

What is the state of the hdd's which had osds 4 & 23?
You may be able to use ceph-objectstore-tool to export those PG shards
and import to another operable OSD.

-- dan



>
>
> On 14.05.19 11:02 vorm., Dan van der Ster wrote:
> > On Tue, May 14, 2019 at 10:59 AM Kevin Flöh  wrote:
> >>
> >> On 14.05.19 10:08 vorm., Dan van der Ster wrote:
> >>
> >> On Tue, May 14, 2019 at 10:02 AM Kevin Flöh  wrote:
> >>
> >> On 13.05.19 10:51 nachm., Lionel Bouton wrote:
> >>
> >> Le 13/05/2019 à 16:20, Kevin Flöh a écrit :
> >>
> >> Dear ceph experts,
> >>
> >> [...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...]
> >> Here is what happened: One osd daemon could not be started and
> >> therefore we decided to mark the osd as lost and set it up from
> >> scratch. Ceph started recovering and then we lost another osd with
> >> the same behavior. We did the same as for the first osd.
> >>
> >> With 3+1 you only allow a single OSD failure per pg at a given time.
> >> You have 4096 pgs and 96 osds, having 2 OSD fail at the same time on 2
> >> separate servers (assuming standard crush rules) is a death sentence
> >> for the data on some pgs using both of those OSD (the ones not fully
> >> recovered before the second failure).
> >>
> >> OK, so the 2 OSDs (4,23) failed shortly one after the other but we think
> >> that the recovery of the first was finished before the second failed.
> >> Nonetheless, both problematic pgs have been on both OSDs. We think, that
> >> we still have enough shards left. For one of the pgs, the recovery state
> >> looks like this:
> >>
> >>   "recovery_state": [
> >>   {
> >>   "name": "Started/Primary/Peering/Incomplete",
> >>   "enter_time": "2019-05-09 16:11:48.625966",
> >>   "comment": "not enough complete instances of this PG"
> >>   },
> >>   {
> >>   "name": "Started/Pri

Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Konstantin Shalygin

  peering does not seem to be blocked anymore. But still there is no
recovery going on. Is there anything else we can try?



Try to reduce min_size for the problem pool, as 'health detail' suggested: 
`ceph osd pool set ec31 min_size 2`.




k



Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh

Hi,

Since we have 3+1 EC I didn't try that before. But when I run the command you 
suggested I get the following error:


ceph osd pool set ec31 min_size 2
Error EINVAL: pool min_size must be between 3 and 4
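
That lower bound comes from the erasure-code profile: min_size cannot go below
k, and with k=3, m=1 the allowed range is exactly 3 to 4, which matches the
error. A quick way to confirm (assuming the profile is also named ec31, which
may not be the case):

    # which profile the pool uses, and its k/m values
    ceph osd pool get ec31 erasure_code_profile
    ceph osd erasure-code-profile get ec31
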

On 14.05.19 6:18 nachm., Konstantin Shalygin wrote:



  peering does not seem to be blocked anymore. But still there is no
recovery going on. Is there anything else we can try?



Try to reduce min_size for problem pool as 'health detail' suggested: 
`ceph osd pool set ec31 min_size 2`.




k



Re: [ceph-users] Major ceph disaster

2019-05-14 Thread Kevin Flöh
The HDDs of OSDs 4 and 23 are completely lost; we cannot access them in 
any way. Is it possible to use the shards that may still be stored on 
working OSDs, as shown in the all_participants list?


On 14.05.19 5:24 nachm., Dan van der Ster wrote:

On Tue, May 14, 2019 at 5:13 PM Kevin Flöh  wrote:

ok, so now we see at least a diffrence in the recovery state:

  "recovery_state": [
  {
  "name": "Started/Primary/Peering/Incomplete",
  "enter_time": "2019-05-14 14:15:15.650517",
  "comment": "not enough complete instances of this PG"
  },
  {
  "name": "Started/Primary/Peering",
  "enter_time": "2019-05-14 14:15:15.243756",
  "past_intervals": [
  {
  "first": "49767",
  "last": "59580",
  "all_participants": [
  {
  "osd": 2,
  "shard": 0
  },
  {
  "osd": 4,
  "shard": 1
  },
  {
  "osd": 23,
  "shard": 2
  },
  {
  "osd": 24,
  "shard": 0
  },
  {
  "osd": 72,
  "shard": 1
  },
  {
  "osd": 79,
  "shard": 3
  }
  ],
  "intervals": [
  {
  "first": "59562",
  "last": "59563",
  "acting": "4(1),24(0),79(3)"
  },
  {
  "first": "59564",
  "last": "59567",
  "acting": "23(2),24(0),79(3)"
  },
  {
  "first": "59570",
  "last": "59574",
  "acting": "4(1),23(2),79(3)"
  },
  {
  "first": "59577",
  "last": "59580",
  "acting": "4(1),23(2),24(0)"
  }
  ]
  }
  ],
  "probing_osds": [
  "2(0)",
  "4(1)",
  "23(2)",
  "24(0)",
  "72(1)",
  "79(3)"
  ],
  "down_osds_we_would_probe": [],
  "peering_blocked_by": []
  },
  {
  "name": "Started",
  "enter_time": "2019-05-14 14:15:15.243663"
  }
  ],

the peering does not seem to be blocked anymore. But still there is no
recovery going on. Is there anything else we can try?

What is the state of the hdd's which had osds 4 & 23?
You may be able to use ceph-objectstore-tool to export those PG shards
and import to another operable OSD.

-- dan





On 14.05.19 11:02 vorm., Dan van der Ster wrote:

On Tue, May 14, 2019 at 10:59 AM Kevin Flöh  wrote:

On 14.05.19 10:08 vorm., Dan van der Ster wrote:

On Tue, May 14, 2019 at 10:02 AM Kevin Flöh  wrote:

On 13.05.19 10:51 nachm., Lionel Bouton wrote:

Le 13/05/2019 à 16:20, Kevin Flöh a écrit :

Dear ceph experts,

[...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...]
Here is what happened: One osd daemon could not be started and
therefore we decided to mark the osd as lost and set it up from
scratch. Ceph started recovering and then we lost another osd with
the same behavior. We did the same as for the first osd.

With 3+1 you only allow a single OSD failure per pg at a given time.
You have 4096 pgs and 96 osds, having 2 OSD fail at the same time on 2
separate servers (assuming standard crush rules) is a death sentence
for the data on some pgs using both of those OSD (the ones not fully
recovered before the second failure).

OK, so the 2 OSDs (4,23) failed shortly one after the other but we think
that the recovery of the first was finished before the second failed.
Nonetheless, both problematic pgs have been on both OSDs. We think, that
we still have enough shards left. For one of the pgs, the recovery state
looks like this:

   "recovery_state": [
   {
   "name": "Started/Primary/Peering/Incomplete",
   "enter_time": "2019-05-09 16:11:48.625966",
   "comment": "not enough complete instances of this PG"
   },
   {
   "name": "Started/Primary/Peering",
  

Re: [ceph-users] Major ceph disaster

2019-05-15 Thread Konstantin Shalygin


On 5/15/19 1:49 PM, Kevin Flöh wrote:


since we have 3+1 ec I didn't try before. But when I run the command 
you suggested I get the following error:


ceph osd pool set ec31 min_size 2
Error EINVAL: pool min_size must be between 3 and 4



What is your current min size? `ceph osd pool get ec31 min_size`



k



Re: [ceph-users] Major ceph disaster

2019-05-15 Thread Kevin Flöh

ceph osd pool get ec31 min_size
min_size: 3

On 15.05.19 9:09 vorm., Konstantin Shalygin wrote:

ceph osd pool get ec31 min_size



Re: [ceph-users] Major ceph disaster

2019-05-17 Thread Kevin Flöh
We tried to export the shards from the OSDs, but there are only two 
shards left for each of the pgs, so we decided to give up on these pgs. 
Will the files of these pgs be deleted from the mds, or do we have to 
delete them manually? Is this the correct command to mark the pgs as lost:


ceph pg {pg-id} mark_unfound_lost revert|delete
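
For the single unfound object reported in 'ceph health detail' (pg 1.24c),
this would look roughly like the sketch below; 'delete' discards the unfound
object, 'revert' rolls back to an older version where one exists. Note that
mark_unfound_lost only deals with unfound objects, not with the two incomplete
pgs themselves:

    # illustration only: give up the one unfound object in pg 1.24c
    ceph pg 1.24c mark_unfound_lost delete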

Cheers,
Kevin

On 15.05.19 8:55 vorm., Kevin Flöh wrote:
The hdds of OSDs 4 and 23 are completely lost, we cannot access them 
in any way. Is it possible to use the shards which are maybe stored on 
working OSDs as shown in the all_participants list?


On 14.05.19 5:24 nachm., Dan van der Ster wrote:

On Tue, May 14, 2019 at 5:13 PM Kevin Flöh  wrote:

ok, so now we see at least a diffrence in the recovery state:

  "recovery_state": [
  {
  "name": "Started/Primary/Peering/Incomplete",
  "enter_time": "2019-05-14 14:15:15.650517",
  "comment": "not enough complete instances of this PG"
  },
  {
  "name": "Started/Primary/Peering",
  "enter_time": "2019-05-14 14:15:15.243756",
  "past_intervals": [
  {
  "first": "49767",
  "last": "59580",
  "all_participants": [
  {
  "osd": 2,
  "shard": 0
  },
  {
  "osd": 4,
  "shard": 1
  },
  {
  "osd": 23,
  "shard": 2
  },
  {
  "osd": 24,
  "shard": 0
  },
  {
  "osd": 72,
  "shard": 1
  },
  {
  "osd": 79,
  "shard": 3
  }
  ],
  "intervals": [
  {
  "first": "59562",
  "last": "59563",
  "acting": "4(1),24(0),79(3)"
  },
  {
  "first": "59564",
  "last": "59567",
  "acting": "23(2),24(0),79(3)"
  },
  {
  "first": "59570",
  "last": "59574",
  "acting": "4(1),23(2),79(3)"
  },
  {
  "first": "59577",
  "last": "59580",
  "acting": "4(1),23(2),24(0)"
  }
  ]
  }
  ],
  "probing_osds": [
  "2(0)",
  "4(1)",
  "23(2)",
  "24(0)",
  "72(1)",
  "79(3)"
  ],
  "down_osds_we_would_probe": [],
  "peering_blocked_by": []
  },
  {
  "name": "Started",
  "enter_time": "2019-05-14 14:15:15.243663"
  }
  ],

the peering does not seem to be blocked anymore. But still there is no
recovery going on. Is there anything else we can try?

What is the state of the hdd's which had osds 4 & 23?
You may be able to use ceph-objectstore-tool to export those PG shards
and import to another operable OSD.

-- dan





On 14.05.19 11:02 vorm., Dan van der Ster wrote:
On Tue, May 14, 2019 at 10:59 AM Kevin Flöh  
wrote:

On 14.05.19 10:08 vorm., Dan van der Ster wrote:

On Tue, May 14, 2019 at 10:02 AM Kevin Flöh  
wrote:


On 13.05.19 10:51 nachm., Lionel Bouton wrote:

Le 13/05/2019 à 16:20, Kevin Flöh a écrit :

Dear ceph experts,

[...] We have 4 nodes with 24 osds each and use 3+1 erasure 
coding. [...]

Here is what happened: One osd daemon could not be started and
therefore we decided to mark the osd as lost and set it up from
scratch. Ceph started recovering and then we lost another osd with
the same behavior. We did the same as for the first osd.

With 3+1 you only allow a single OSD failure per pg at a given time.
You have 4096 pgs and 96 osds, having 2 OSD fail at the same time 
on 2

separate servers (assuming standard crush rules) is a death sentence
for the data on some pgs using both of those OSD (the ones not fully
recovered before the second failure).

OK, so the 2 OSDs (4,23) failed shortly one after the other but we 
think

that the recovery of the first was finished before the second failed.
Nonetheless, both problematic pgs have been on both OSDs. We 
think, that

Re: [ceph-users] Major ceph disaster

2019-05-17 Thread Frédéric Nass



Le 14/05/2019 à 10:04, Kevin Flöh a écrit :


On 13.05.19 11:21 nachm., Dan van der Ster wrote:
Presumably the 2 OSDs you marked as lost were hosting those 
incomplete PGs?

It would be useful to double confirm that: check with `ceph pg 
query` and `ceph pg dump`.
(If so, this is why the ignore history les thing isn't helping; you
don't have the minimum 3 stripes up for those 3+1 PGs.)


yes, but as written in my other mail, we still have enough shards, at 
least I think so.




If those "lost" OSDs by some miracle still have the PG data, you might
be able to export the relevant PG stripes with the
ceph-objectstore-tool. I've never tried this myself, but there have
been threads in the past where people export a PG from a nearly dead
hdd, import to another OSD, then backfilling works.

guess that is not possible.


Hi Kevin,

You want to make sure of this.

Unless you recreated the OSDs 4 and 23 and had new data written on them, 
they should still host the data you need.
What Dan suggested (export the 7 inconsistent PGs and import them on a 
healthy OSD) seems to be the only way to recover your lost data, as with 
4 hosts and 2 OSDs lost, you're left with 2 chunks of data/parity when 
you actually need 3 to access it. Reducing min_size to 3 will not help.


Have a look here:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-July/019673.html
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/023736.html

This is probably the best way to follow from now on.

Regards,
Frédéric.



If OTOH those PGs are really lost forever, and someone else should
confirm what I say here, I think the next step would be to force
recreate the incomplete PGs then run a set of cephfs scrub/repair
disaster recovery cmds to recover what you can from the cephfs.

-- dan


would this let us recover at least some of the data on the pgs? If not 
we would just set up a new ceph directly without fixing the old one 
and copy whatever is left.


Best regards,

Kevin





On Mon, May 13, 2019 at 4:20 PM Kevin Flöh  wrote:

Dear ceph experts,

we have several (maybe related) problems with our ceph cluster, let me
first show you the current ceph status:

    cluster:
  id: 23e72372-0d44-4cad-b24f-3641b14b86f4
  health: HEALTH_ERR
  1 MDSs report slow metadata IOs
  1 MDSs report slow requests
  1 MDSs behind on trimming
  1/126319678 objects unfound (0.000%)
  19 scrub errors
  Reduced data availability: 2 pgs inactive, 2 pgs 
incomplete

  Possible data damage: 7 pgs inconsistent
  Degraded data redundancy: 1/500333881 objects degraded
(0.000%), 1 pg degraded
  118 stuck requests are blocked > 4096 sec. Implicated 
osds

24,32,91

    services:
  mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
  mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
  mds: cephfs-1/1/1 up {0=ceph-node02.etp.kit.edu=up:active}, 3
up:standby
  osd: 96 osds: 96 up, 96 in

    data:
  pools:   2 pools, 4096 pgs
  objects: 126.32M objects, 260TiB
  usage:   372TiB used, 152TiB / 524TiB avail
  pgs: 0.049% pgs not active
   1/500333881 objects degraded (0.000%)
   1/126319678 objects unfound (0.000%)
   4076 active+clean
   10   active+clean+scrubbing+deep
   7    active+clean+inconsistent
   2    incomplete
   1    active+recovery_wait+degraded

    io:
  client:   449KiB/s rd, 42.9KiB/s wr, 152op/s rd, 0op/s wr


and ceph health detail:


HEALTH_ERR 1 MDSs report slow metadata IOs; 1 MDSs report slow 
requests;

1 MDSs behind on trimming; 1/126319687 objects unfound (0.000%); 19
scrub errors; Reduced data availability: 2 pgs inactive, 2 pgs
incomplete; Possible data damage: 7 pgs inconsistent; Degraded data
redundancy: 1/500333908 objects degraded (0.000%), 1 pg degraded; 118
stuck requests are blocked > 4096 sec. Implicated osds 24,32,91
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
  mdsceph-node02.etp.kit.edu(mds.0): 100+ slow metadata IOs are
blocked > 30 secs, oldest blocked for 351193 secs
MDS_SLOW_REQUEST 1 MDSs report slow requests
  mdsceph-node02.etp.kit.edu(mds.0): 4 slow requests are blocked 
> 30 sec

MDS_TRIM 1 MDSs behind on trimming
  mdsceph-node02.etp.kit.edu(mds.0): Behind on trimming (46034/128)
max_segments: 128, num_segments: 46034
OBJECT_UNFOUND 1/126319687 objects unfound (0.000%)
  pg 1.24c has 1 unfound objects
OSD_SCRUB_ERRORS 19 scrub errors
PG_AVAILABILITY Reduced data availability: 2 pgs inactive, 2 pgs 
incomplete

  pg 1.5dd is incomplete, acting [24,4,23,79] (reducing pool ec31
min_size from 3 may help; search ceph.com/docs for 'incomplete')
  pg 1.619 is incomplete, acting [91,23,4,81] (reducing pool ec31
min_size from 3 may help; search ceph.com/docs for 'incomplete')
PG_DAMAGED Possible data damage: 7 pgs inco

Re: [ceph-users] Major ceph disaster

2019-05-20 Thread Kevin Flöh

Hi Frederic,

we do not have access to the original OSDs. We exported the remaining 
shards of the two pgs but we are only left with two shards (of 
reasonable size) per pg. The rest of the shards displayed by ceph pg 
query are empty. I guess marking the OSD as complete doesn't make sense 
then.


Best,
Kevin

On 17.05.19 2:36 nachm., Frédéric Nass wrote:



Le 14/05/2019 à 10:04, Kevin Flöh a écrit :


On 13.05.19 11:21 nachm., Dan van der Ster wrote:
Presumably the 2 OSDs you marked as lost were hosting those 
incomplete PGs?

It would be useful to double confirm that: check with `ceph pg 
query` and `ceph pg dump`.
(If so, this is why the ignore history les thing isn't helping; you
don't have the minimum 3 stripes up for those 3+1 PGs.)


yes, but as written in my other mail, we still have enough shards, at 
least I think so.




If those "lost" OSDs by some miracle still have the PG data, you might
be able to export the relevant PG stripes with the
ceph-objectstore-tool. I've never tried this myself, but there have
been threads in the past where people export a PG from a nearly dead
hdd, import to another OSD, then backfilling works.

guess that is not possible.


Hi Kevin,

You want to make sure of this.

Unless you recreated the OSDs 4 and 23 and had new data written on 
them, they should still host the data you need.
What Dan suggested (export the 7 inconsistent PGs and import them on a 
healthy OSD) seems to be the only way to recover your lost data, as 
with 4 hosts and 2 OSDs lost, you're left with 2 chunks of data/parity 
when you actually need 3 to access it. Reducing min_size to 3 will not 
help.


Have a look here:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-July/019673.html
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/023736.html 



This is probably the best way you want to follow form now on.

Regards,
Frédéric.



If OTOH those PGs are really lost forever, and someone else should
confirm what I say here, I think the next step would be to force
recreate the incomplete PGs then run a set of cephfs scrub/repair
disaster recovery cmds to recover what you can from the cephfs.

-- dan


would this let us recover at least some of the data on the pgs? If 
not we would just set up a new ceph directly without fixing the old 
one and copy whatever is left.


Best regards,

Kevin





On Mon, May 13, 2019 at 4:20 PM Kevin Flöh  wrote:

Dear ceph experts,

we have several (maybe related) problems with our ceph cluster, let me
first show you the current ceph status:

    cluster:
  id: 23e72372-0d44-4cad-b24f-3641b14b86f4
  health: HEALTH_ERR
  1 MDSs report slow metadata IOs
  1 MDSs report slow requests
  1 MDSs behind on trimming
  1/126319678 objects unfound (0.000%)
  19 scrub errors
  Reduced data availability: 2 pgs inactive, 2 pgs 
incomplete

  Possible data damage: 7 pgs inconsistent
  Degraded data redundancy: 1/500333881 objects degraded
(0.000%), 1 pg degraded
  118 stuck requests are blocked > 4096 sec. Implicated 
osds

24,32,91

    services:
  mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
  mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
  mds: cephfs-1/1/1 up {0=ceph-node02.etp.kit.edu=up:active}, 3
up:standby
  osd: 96 osds: 96 up, 96 in

    data:
  pools:   2 pools, 4096 pgs
  objects: 126.32M objects, 260TiB
  usage:   372TiB used, 152TiB / 524TiB avail
  pgs: 0.049% pgs not active
   1/500333881 objects degraded (0.000%)
   1/126319678 objects unfound (0.000%)
   4076 active+clean
   10   active+clean+scrubbing+deep
   7    active+clean+inconsistent
   2    incomplete
   1    active+recovery_wait+degraded

    io:
  client:   449KiB/s rd, 42.9KiB/s wr, 152op/s rd, 0op/s wr


and ceph health detail:


HEALTH_ERR 1 MDSs report slow metadata IOs; 1 MDSs report slow 
requests;

1 MDSs behind on trimming; 1/126319687 objects unfound (0.000%); 19
scrub errors; Reduced data availability: 2 pgs inactive, 2 pgs
incomplete; Possible data damage: 7 pgs inconsistent; Degraded data
redundancy: 1/500333908 objects degraded (0.000%), 1 pg degraded; 118
stuck requests are blocked > 4096 sec. Implicated osds 24,32,91
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
  mdsceph-node02.etp.kit.edu(mds.0): 100+ slow metadata IOs are
blocked > 30 secs, oldest blocked for 351193 secs
MDS_SLOW_REQUEST 1 MDSs report slow requests
  mdsceph-node02.etp.kit.edu(mds.0): 4 slow requests are 
blocked > 30 sec

MDS_TRIM 1 MDSs behind on trimming
  mdsceph-node02.etp.kit.edu(mds.0): Behind on trimming 
(46034/128)

max_segments: 128, num_segments: 46034
OBJECT_UNFOUND 1/126319687 objects unfound (0.000%)
  pg 1.24c has 1 unfound objects
OSD_SCRUB_ERRORS 19 scrub errors
PG_AVAILABILITY Reduced d

Re: [ceph-users] Major ceph disaster

2019-05-21 Thread Kevin Flöh

Hi,

we gave up on the incomplete pgs since we do not have enough complete 
shards to restore them. What is the procedure to get rid of these pgs?


regards,

Kevin

On 20.05.19 9:22 vorm., Kevin Flöh wrote:

Hi Frederic,

we do not have access to the original OSDs. We exported the remaining 
shards of the two pgs but we are only left with two shards (of 
reasonable size) per pg. The rest of the shards displayed by ceph pg 
query are empty. I guess marking the OSD as complete doesn't make 
sense then.


Best,
Kevin

On 17.05.19 2:36 nachm., Frédéric Nass wrote:



Le 14/05/2019 à 10:04, Kevin Flöh a écrit :


On 13.05.19 11:21 nachm., Dan van der Ster wrote:
Presumably the 2 OSDs you marked as lost were hosting those 
incomplete PGs?

It would be useful to double confirm that: check with `ceph pg 
query` and `ceph pg dump`.
(If so, this is why the ignore history les thing isn't helping; you
don't have the minimum 3 stripes up for those 3+1 PGs.)


yes, but as written in my other mail, we still have enough shards, 
at least I think so.




If those "lost" OSDs by some miracle still have the PG data, you might
be able to export the relevant PG stripes with the
ceph-objectstore-tool. I've never tried this myself, but there have
been threads in the past where people export a PG from a nearly dead
hdd, import to another OSD, then backfilling works.

guess that is not possible.


Hi Kevin,

You want to make sure of this.

Unless you recreated the OSDs 4 and 23 and had new data written on 
them, they should still host the data you need.
What Dan suggested (export the 7 inconsistent PGs and import them on 
a healthy OSD) seems to be the only way to recover your lost data, as 
with 4 hosts and 2 OSDs lost, you're left with 2 chunks of 
data/parity when you actually need 3 to access it. Reducing min_size 
to 3 will not help.


Have a look here:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-July/019673.html 

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/023736.html 



This is probably the best way you want to follow form now on.

Regards,
Frédéric.



If OTOH those PGs are really lost forever, and someone else should
confirm what I say here, I think the next step would be to force
recreate the incomplete PGs then run a set of cephfs scrub/repair
disaster recovery cmds to recover what you can from the cephfs.

-- dan


would this let us recover at least some of the data on the pgs? If 
not we would just set up a new ceph directly without fixing the old 
one and copy whatever is left.


Best regards,

Kevin





On Mon, May 13, 2019 at 4:20 PM Kevin Flöh  
wrote:

Dear ceph experts,

we have several (maybe related) problems with our ceph cluster, 
let me

first show you the current ceph status:

    cluster:
  id: 23e72372-0d44-4cad-b24f-3641b14b86f4
  health: HEALTH_ERR
  1 MDSs report slow metadata IOs
  1 MDSs report slow requests
  1 MDSs behind on trimming
  1/126319678 objects unfound (0.000%)
  19 scrub errors
  Reduced data availability: 2 pgs inactive, 2 pgs 
incomplete

  Possible data damage: 7 pgs inconsistent
  Degraded data redundancy: 1/500333881 objects degraded
(0.000%), 1 pg degraded
  118 stuck requests are blocked > 4096 sec. 
Implicated osds

24,32,91

    services:
  mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
  mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
  mds: cephfs-1/1/1 up {0=ceph-node02.etp.kit.edu=up:active}, 3
up:standby
  osd: 96 osds: 96 up, 96 in

    data:
  pools:   2 pools, 4096 pgs
  objects: 126.32M objects, 260TiB
  usage:   372TiB used, 152TiB / 524TiB avail
  pgs: 0.049% pgs not active
   1/500333881 objects degraded (0.000%)
   1/126319678 objects unfound (0.000%)
   4076 active+clean
   10   active+clean+scrubbing+deep
   7    active+clean+inconsistent
   2    incomplete
   1    active+recovery_wait+degraded

    io:
  client:   449KiB/s rd, 42.9KiB/s wr, 152op/s rd, 0op/s wr


and ceph health detail:


HEALTH_ERR 1 MDSs report slow metadata IOs; 1 MDSs report slow 
requests;

1 MDSs behind on trimming; 1/126319687 objects unfound (0.000%); 19
scrub errors; Reduced data availability: 2 pgs inactive, 2 pgs
incomplete; Possible data damage: 7 pgs inconsistent; Degraded data
redundancy: 1/500333908 objects degraded (0.000%), 1 pg degraded; 118
stuck requests are blocked > 4096 sec. Implicated osds 24,32,91
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
  mdsceph-node02.etp.kit.edu(mds.0): 100+ slow metadata IOs are
blocked > 30 secs, oldest blocked for 351193 secs
MDS_SLOW_REQUEST 1 MDSs report slow requests
  mdsceph-node02.etp.kit.edu(mds.0): 4 slow requests are 
blocked > 30 sec

MDS_TRIM 1 MDSs behind on trimming
  mdsceph-node02.etp.kit.edu(mds.0): Behin

Re: [ceph-users] Major ceph disaster

2019-05-21 Thread Wido den Hollander


On 5/21/19 4:48 PM, Kevin Flöh wrote:
> Hi,
> 
> we gave up on the incomplete pgs since we do not have enough complete
> shards to restore them. What is the procedure to get rid of these pgs?
> 

You need to start with marking the OSDs as 'lost' and then you can
force_create_pg to get the PGs back (empty).
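
A minimal sketch of that procedure (OSD and PG ids are placeholders; on some
releases force-create-pg also asks for --yes-i-really-mean-it):

    ceph osd lost <osd-id> --yes-i-really-mean-it
    ceph osd force-create-pg <pg-id>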

Wido

> regards,
> 
> Kevin
> 
> On 20.05.19 9:22 vorm., Kevin Flöh wrote:
>> Hi Frederic,
>>
>> we do not have access to the original OSDs. We exported the remaining
>> shards of the two pgs but we are only left with two shards (of
>> reasonable size) per pg. The rest of the shards displayed by ceph pg
>> query are empty. I guess marking the OSD as complete doesn't make
>> sense then.
>>
>> Best,
>> Kevin
>>
>> On 17.05.19 2:36 nachm., Frédéric Nass wrote:
>>>
>>>
>>> Le 14/05/2019 à 10:04, Kevin Flöh a écrit :

 On 13.05.19 11:21 nachm., Dan van der Ster wrote:
> Presumably the 2 OSDs you marked as lost were hosting those
> incomplete PGs?
> It would be useful to double confirm that: check with `ceph pg 
> query` and `ceph pg dump`.
> (If so, this is why the ignore history les thing isn't helping; you
> don't have the minimum 3 stripes up for those 3+1 PGs.)

 yes, but as written in my other mail, we still have enough shards,
 at least I think so.

>
> If those "lost" OSDs by some miracle still have the PG data, you might
> be able to export the relevant PG stripes with the
> ceph-objectstore-tool. I've never tried this myself, but there have
> been threads in the past where people export a PG from a nearly dead
> hdd, import to another OSD, then backfilling works.
 guess that is not possible.
>>>
>>> Hi Kevin,
>>>
>>> You want to make sure of this.
>>>
>>> Unless you recreated the OSDs 4 and 23 and had new data written on
>>> them, they should still host the data you need.
>>> What Dan suggested (export the 7 inconsistent PGs and import them on
>>> a healthy OSD) seems to be the only way to recover your lost data, as
>>> with 4 hosts and 2 OSDs lost, you're left with 2 chunks of
>>> data/parity when you actually need 3 to access it. Reducing min_size
>>> to 3 will not help.
>>>
>>> Have a look here:
>>>
>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-July/019673.html
>>>
>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/023736.html
>>>
>>>
>>> This is probably the best way to follow from now on.
>>>
>>> Regards,
>>> Frédéric.
>>>
>
> If OTOH those PGs are really lost forever, and someone else should
> confirm what I say here, I think the next step would be to force
> recreate the incomplete PGs then run a set of cephfs scrub/repair
> disaster recovery cmds to recover what you can from the cephfs.
>
> -- dan

 would this let us recover at least some of the data on the pgs? If
 not we would just set up a new ceph directly without fixing the old
 one and copy whatever is left.

 Best regards,

 Kevin



>
> On Mon, May 13, 2019 at 4:20 PM Kevin Flöh 
> wrote:
>> Dear ceph experts,
>>
>> we have several (maybe related) problems with our ceph cluster,
>> let me
>> first show you the current ceph status:
>>
>>     cluster:
>>   id: 23e72372-0d44-4cad-b24f-3641b14b86f4
>>   health: HEALTH_ERR
>>   1 MDSs report slow metadata IOs
>>   1 MDSs report slow requests
>>   1 MDSs behind on trimming
>>   1/126319678 objects unfound (0.000%)
>>   19 scrub errors
>>   Reduced data availability: 2 pgs inactive, 2 pgs
>> incomplete
>>   Possible data damage: 7 pgs inconsistent
>>   Degraded data redundancy: 1/500333881 objects degraded
>> (0.000%), 1 pg degraded
>>   118 stuck requests are blocked > 4096 sec.
>> Implicated osds
>> 24,32,91
>>
>>     services:
>>   mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
>>   mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
>>   mds: cephfs-1/1/1 up {0=ceph-node02.etp.kit.edu=up:active}, 3
>> up:standby
>>   osd: 96 osds: 96 up, 96 in
>>
>>     data:
>>   pools:   2 pools, 4096 pgs
>>   objects: 126.32M objects, 260TiB
>>   usage:   372TiB used, 152TiB / 524TiB avail
>>   pgs: 0.049% pgs not active
>>    1/500333881 objects degraded (0.000%)
>>    1/126319678 objects unfound (0.000%)
>>    4076 active+clean
>>    10   active+clean+scrubbing+deep
>>    7    active+clean+inconsistent
>>    2    incomplete
>>    1    active+recovery_wait+degraded
>>
>>     io:
>>   client:   449KiB/s rd, 42.9KiB/s wr, 152op/s rd, 0op/s wr
>>
>>
>> an

Re: [ceph-users] Major ceph disaster

2019-05-22 Thread Kevin Flöh

Hi,

thank you, it worked. The PGs are not incomplete anymore. Still, we have 
another problem: there are 7 inconsistent PGs and a ceph pg repair is 
not doing anything. I just get "instructing pg 1.5dd on osd.24 to 
repair" and nothing happens. Does somebody know how we can get the PGs 
to repair?


Regards,

Kevin

On 21.05.19 4:52 nachm., Wido den Hollander wrote:


On 5/21/19 4:48 PM, Kevin Flöh wrote:

Hi,

we gave up on the incomplete pgs since we do not have enough complete
shards to restore them. What is the procedure to get rid of these pgs?


You need to start with marking the OSDs as 'lost' and then you can
force_create_pg to get the PGs back (empty).

Wido


regards,

Kevin

On 20.05.19 9:22 vorm., Kevin Flöh wrote:

Hi Frederic,

we do not have access to the original OSDs. We exported the remaining
shards of the two pgs but we are only left with two shards (of
reasonable size) per pg. The rest of the shards displayed by ceph pg
query are empty. I guess marking the OSD as complete doesn't make
sense then.

Best,
Kevin

On 17.05.19 2:36 nachm., Frédéric Nass wrote:


Le 14/05/2019 à 10:04, Kevin Flöh a écrit :

On 13.05.19 11:21 nachm., Dan van der Ster wrote:

Presumably the 2 OSDs you marked as lost were hosting those
incomplete PGs?
It would be useful to double confirm that: check with `ceph pg 
query` and `ceph pg dump`.
(If so, this is why the ignore history les thing isn't helping; you
don't have the minimum 3 stripes up for those 3+1 PGs.)

yes, but as written in my other mail, we still have enough shards,
at least I think so.


If those "lost" OSDs by some miracle still have the PG data, you might
be able to export the relevant PG stripes with the
ceph-objectstore-tool. I've never tried this myself, but there have
been threads in the past where people export a PG from a nearly dead
hdd, import to another OSD, then backfilling works.

guess that is not possible.

Hi Kevin,

You want to make sure of this.

Unless you recreated the OSDs 4 and 23 and had new data written on
them, they should still host the data you need.
What Dan suggested (export the 7 inconsistent PGs and import them on
a healthy OSD) seems to be the only way to recover your lost data, as
with 4 hosts and 2 OSDs lost, you're left with 2 chunks of
data/parity when you actually need 3 to access it. Reducing min_size
to 3 will not help.

Have a look here:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-July/019673.html

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/023736.html


This is probably the best way to follow from now on.

Regards,
Frédéric.


If OTOH those PGs are really lost forever, and someone else should
confirm what I say here, I think the next step would be to force
recreate the incomplete PGs then run a set of cephfs scrub/repair
disaster recovery cmds to recover what you can from the cephfs.

-- dan

would this let us recover at least some of the data on the pgs? If
not we would just set up a new ceph directly without fixing the old
one and copy whatever is left.

Best regards,

Kevin




On Mon, May 13, 2019 at 4:20 PM Kevin Flöh 
wrote:

Dear ceph experts,

we have several (maybe related) problems with our ceph cluster,
let me
first show you the current ceph status:

     cluster:
   id: 23e72372-0d44-4cad-b24f-3641b14b86f4
   health: HEALTH_ERR
   1 MDSs report slow metadata IOs
   1 MDSs report slow requests
   1 MDSs behind on trimming
   1/126319678 objects unfound (0.000%)
   19 scrub errors
   Reduced data availability: 2 pgs inactive, 2 pgs
incomplete
   Possible data damage: 7 pgs inconsistent
   Degraded data redundancy: 1/500333881 objects degraded
(0.000%), 1 pg degraded
   118 stuck requests are blocked > 4096 sec.
Implicated osds
24,32,91

     services:
   mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
   mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
   mds: cephfs-1/1/1 up {0=ceph-node02.etp.kit.edu=up:active}, 3
up:standby
   osd: 96 osds: 96 up, 96 in

     data:
   pools:   2 pools, 4096 pgs
   objects: 126.32M objects, 260TiB
   usage:   372TiB used, 152TiB / 524TiB avail
   pgs: 0.049% pgs not active
    1/500333881 objects degraded (0.000%)
    1/126319678 objects unfound (0.000%)
    4076 active+clean
    10   active+clean+scrubbing+deep
    7    active+clean+inconsistent
    2    incomplete
    1    active+recovery_wait+degraded

     io:
   client:   449KiB/s rd, 42.9KiB/s wr, 152op/s rd, 0op/s wr


and ceph health detail:


HEALTH_ERR 1 MDSs report slow metadata IOs; 1 MDSs report slow
requests;
1 MDSs behind on trimming; 1/126319687 objects unfound (0.000%); 19
scrub errors; Reduced data availability: 2 pgs inactive, 2 pgs
incomplete; Possible data damage: 7 pgs inconsistent; De

Re: [ceph-users] Major ceph disaster

2019-05-22 Thread John Petrini
It's been suggested here in the past to disable deep scrubbing temporarily
before running the repair because it does not execute immediately but gets
queued up behind deep scrubs.
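
In practice that would look something like the sketch below (whether a
manually requested repair is itself held back by the flag can differ
between releases, so keep an eye on ceph -s):

    ceph osd set nodeep-scrub      # stop scheduling new deep scrubs
    ceph pg repair 1.5dd           # queue the repair
    ceph osd unset nodeep-scrub    # re-enable deep scrubbing afterwards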


Re: [ceph-users] Major ceph disaster

2019-05-22 Thread Robert LeBlanc
On Wed, May 22, 2019 at 4:31 AM Kevin Flöh  wrote:

> Hi,
>
> thank you, it worked. The PGs are not incomplete anymore. Still we have
> another problem, there are 7 PGs inconsistent and a ceph pg repair is
> not doing anything. I just get "instructing pg 1.5dd on osd.24 to
> repair" and nothing happens. Does somebody know how we can get the PGs
> to repair?
>
> Regards,
>
> Kevin
>

Kevin,

I just fixed an inconsistent PG yesterday. You will need to figure out why
they are inconsistent. Do these steps and then we can figure out how to
proceed.
1. Do a deep-scrub on each PG that is inconsistent. (This may fix some of
them)
2. Print out the inconsistent report for each inconsistent PG. `rados
list-inconsistent-obj  --format=json-pretty`
3. You will want to look at the error messages and see if all the shards
have the same data.
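
For steps 1 and 2 the commands would be roughly (using pg 1.5dd from earlier
in the thread as an example):

    ceph pg deep-scrub 1.5dd
    rados list-inconsistent-obj 1.5dd --format=json-pretty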

Robert LeBlanc


Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Kevin Flöh

Hi,

we have set the PGs to recover and now they are stuck in 
active+recovery_wait+degraded and instructing them to deep-scrub does 
not change anything. Hence, the rados report is empty. Is there a way to 
stop the recovery wait to start the deep-scrub and get the output? I 
guess the recovery_wait might be caused by missing objects. Do we need 
to delete them first to get the recovery going?


Kevin

On 22.05.19 6:03 nachm., Robert LeBlanc wrote:
On Wed, May 22, 2019 at 4:31 AM Kevin Flöh > wrote:


Hi,

thank you, it worked. The PGs are not incomplete anymore. Still we
have
another problem, there are 7 PGs inconsistent and a ceph pg repair is
not doing anything. I just get "instructing pg 1.5dd on osd.24 to
repair" and nothing happens. Does somebody know how we can get the
PGs
to repair?

Regards,

Kevin


Kevin,

I just fixed an inconsistent PG yesterday. You will need to figure out 
why they are inconsistent. Do these steps and then we can figure out 
how to proceed.
1. Do a deep-scrub on each PG that is inconsistent. (This may fix some 
of them)
2. Print out the inconsistent report for each inconsistent PG. `rados 
list-inconsistent-obj  --format=json-pretty`
3. You will want to look at the error messages and see if all the 
shards have the same data.


Robert LeBlanc


Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Dan van der Ster
What's the full ceph status?
Normally recovery_wait just means that the relevant osd's are busy
recovering/backfilling another PG.

On Thu, May 23, 2019 at 10:53 AM Kevin Flöh  wrote:
>
> Hi,
>
> we have set the PGs to recover and now they are stuck in 
> active+recovery_wait+degraded and instructing them to deep-scrub does not 
> change anything. Hence, the rados report is empty. Is there a way to stop the 
> recovery wait to start the deep-scrub and get the output? I guess the 
> recovery_wait might be caused by missing objects. Do we need to delete them 
> first to get the recovery going?
>
> Kevin
>
> On 22.05.19 6:03 nachm., Robert LeBlanc wrote:
>
> On Wed, May 22, 2019 at 4:31 AM Kevin Flöh  wrote:
>>
>> Hi,
>>
>> thank you, it worked. The PGs are not incomplete anymore. Still we have
>> another problem, there are 7 PGs inconsistent and a ceph pg repair is
>> not doing anything. I just get "instructing pg 1.5dd on osd.24 to
>> repair" and nothing happens. Does somebody know how we can get the PGs
>> to repair?
>>
>> Regards,
>>
>> Kevin
>
>
> Kevin,
>
> I just fixed an inconsistent PG yesterday. You will need to figure out why 
> they are inconsistent. Do these steps and then we can figure out how to 
> proceed.
> 1. Do a deep-scrub on each PG that is inconsistent. (This may fix some of 
> them)
> 2. Print out the inconsistent report for each inconsistent PG. `rados 
> list-inconsistent-obj  --format=json-pretty`
> 3. You will want to look at the error messages and see if all the shards have 
> the same data.
>
> Robert LeBlanc
>
>


Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Marc Roos


I have been following this thread for a while, and thought I need to have a 
"major ceph disaster" alert on the monitoring ;)
 http://www.f1-outsourcing.eu/files/ceph-disaster.mp4 




-Original Message-
From: Kevin Flöh [mailto:kevin.fl...@kit.edu] 
Sent: donderdag 23 mei 2019 10:51
To: Robert LeBlanc
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Major ceph disaster

Hi,

we have set the PGs to recover and now they are stuck in 
active+recovery_wait+degraded and instructing them to deep-scrub does 
not change anything. Hence, the rados report is empty. Is there a way to 
stop the recovery wait to start the deep-scrub and get the output? I 
guess the recovery_wait might be caused by missing objects. Do we need 
to delete them first to get the recovery going?


Kevin


On 22.05.19 6:03 nachm., Robert LeBlanc wrote:


On Wed, May 22, 2019 at 4:31 AM Kevin Flöh  
wrote:


Hi,

thank you, it worked. The PGs are not incomplete anymore. 
Still we have 
another problem, there are 7 PGs inconsistent and a ceph pg 
repair is 
not doing anything. I just get "instructing pg 1.5dd on osd.24 
to 
repair" and nothing happens. Does somebody know how we can get 
the PGs 
to repair?

Regards,

Kevin



Kevin,

I just fixed an inconsistent PG yesterday. You will need to figure 
out why they are inconsistent. Do these steps and then we can figure out 
how to proceed.
1. Do a deep-scrub on each PG that is inconsistent. (This may fix 
some of them)
2. Print out the inconsistent report for each inconsistent PG. 
`rados list-inconsistent-obj  --format=json-pretty`
3. You will want to look at the error messages and see if all the 
shards have the same data.

Robert LeBlanc
 




Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Kevin Flöh

This is the current status of ceph:


  cluster:
    id: 23e72372-0d44-4cad-b24f-3641b14b86f4
    health: HEALTH_ERR
    9/125481144 objects unfound (0.000%)
    Degraded data redundancy: 9/497011417 objects degraded 
(0.000%), 7 pgs degraded
    9 stuck requests are blocked > 4096 sec. Implicated osds 
1,11,21,32,43,50,65


  services:
    mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
    mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
    mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3 
up:standby

    osd: 96 osds: 96 up, 96 in

  data:
    pools:   2 pools, 4096 pgs
    objects: 125.48M objects, 259TiB
    usage:   370TiB used, 154TiB / 524TiB avail
    pgs: 9/497011417 objects degraded (0.000%)
 9/125481144 objects unfound (0.000%)
 4078 active+clean
 11   active+clean+scrubbing+deep
 7    active+recovery_wait+degraded

  io:
    client:   211KiB/s rd, 46.0KiB/s wr, 158op/s rd, 0op/s wr

On 23.05.19 10:54 vorm., Dan van der Ster wrote:

What's the full ceph status?
Normally recovery_wait just means that the relevant osd's are busy
recovering/backfilling another PG.

On Thu, May 23, 2019 at 10:53 AM Kevin Flöh  wrote:

Hi,

we have set the PGs to recover and now they are stuck in 
active+recovery_wait+degraded and instructing them to deep-scrub does not 
change anything. Hence, the rados report is empty. Is there a way to stop the 
recovery wait to start the deep-scrub and get the output? I guess the 
recovery_wait might be caused by missing objects. Do we need to delete them 
first to get the recovery going?

Kevin

On 22.05.19 6:03 nachm., Robert LeBlanc wrote:

On Wed, May 22, 2019 at 4:31 AM Kevin Flöh  wrote:

Hi,

thank you, it worked. The PGs are not incomplete anymore. Still we have
another problem, there are 7 PGs inconsistent and a ceph pg repair is
not doing anything. I just get "instructing pg 1.5dd on osd.24 to
repair" and nothing happens. Does somebody know how we can get the PGs
to repair?

Regards,

Kevin


Kevin,

I just fixed an inconsistent PG yesterday. You will need to figure out why they 
are inconsistent. Do these steps and then we can figure out how to proceed.
1. Do a deep-scrub on each PG that is inconsistent. (This may fix some of them)
2. Print out the inconsistent report for each inconsistent PG. `rados 
list-inconsistent-obj  --format=json-pretty`
3. You will want to look at the error messages and see if all the shards have 
the same data.

Robert LeBlanc




Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Dan van der Ster
I think those osds (1, 11, 21, 32, ...) need a little kick to re-peer
their degraded PGs.

Open a window with `watch ceph -s`, then in another window slowly do

ceph osd down 1
# then wait a minute or so for that osd.1 to re-peer fully.
ceph osd down 11
...

Continue that for each of the osds with stuck requests, or until there
are no more recovery_wait/degraded PGs.

After each `ceph osd down...`, you should expect to see several PGs
re-peer, and then ideally the slow requests will disappear and the
degraded PGs will become active+clean.
If anything else happens, you should stop and let us know.


-- dan

On Thu, May 23, 2019 at 10:59 AM Kevin Flöh  wrote:
>
> This is the current status of ceph:
>
>
>cluster:
>  id: 23e72372-0d44-4cad-b24f-3641b14b86f4
>  health: HEALTH_ERR
>  9/125481144 objects unfound (0.000%)
>  Degraded data redundancy: 9/497011417 objects degraded
> (0.000%), 7 pgs degraded
>  9 stuck requests are blocked > 4096 sec. Implicated osds
> 1,11,21,32,43,50,65
>
>services:
>  mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
>  mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
>  mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3
> up:standby
>  osd: 96 osds: 96 up, 96 in
>
>data:
>  pools:   2 pools, 4096 pgs
>  objects: 125.48M objects, 259TiB
>  usage:   370TiB used, 154TiB / 524TiB avail
>  pgs: 9/497011417 objects degraded (0.000%)
>   9/125481144 objects unfound (0.000%)
>   4078 active+clean
>   11   active+clean+scrubbing+deep
>   7active+recovery_wait+degraded
>
>io:
>  client:   211KiB/s rd, 46.0KiB/s wr, 158op/s rd, 0op/s wr
>
> On 23.05.19 10:54 vorm., Dan van der Ster wrote:
> > What's the full ceph status?
> > Normally recovery_wait just means that the relevant osd's are busy
> > recovering/backfilling another PG.
> >
> > On Thu, May 23, 2019 at 10:53 AM Kevin Flöh  wrote:
> >> Hi,
> >>
> >> we have set the PGs to recover and now they are stuck in 
> >> active+recovery_wait+degraded and instructing them to deep-scrub does not 
> >> change anything. Hence, the rados report is empty. Is there a way to stop 
> >> the recovery wait to start the deep-scrub and get the output? I guess the 
> >> recovery_wait might be caused by missing objects. Do we need to delete 
> >> them first to get the recovery going?
> >>
> >> Kevin
> >>
> >> On 22.05.19 6:03 nachm., Robert LeBlanc wrote:
> >>
> >> On Wed, May 22, 2019 at 4:31 AM Kevin Flöh  wrote:
> >>> Hi,
> >>>
> >>> thank you, it worked. The PGs are not incomplete anymore. Still we have
> >>> another problem, there are 7 PGs inconsistent and a ceph pg repair is
> >>> not doing anything. I just get "instructing pg 1.5dd on osd.24 to
> >>> repair" and nothing happens. Does somebody know how we can get the PGs
> >>> to repair?
> >>>
> >>> Regards,
> >>>
> >>> Kevin
> >>
> >> Kevin,
> >>
> >> I just fixed an inconsistent PG yesterday. You will need to figure out why 
> >> they are inconsistent. Do these steps and then we can figure out how to 
> >> proceed.
> >> 1. Do a deep-scrub on each PG that is inconsistent. (This may fix some of 
> >> them)
> >> 2. Print out the inconsistent report for each inconsistent PG. `rados 
> >> list-inconsistent-obj  --format=json-pretty`
> >> 3. You will want to look at the error messages and see if all the shards 
> >> have the same data.
> >>
> >> Robert LeBlanc
> >>
> >>


Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Kevin Flöh
thank you for this idea, it has improved the situation. Nevertheless, 
there are still 2 PGs in recovery_wait. ceph -s gives me:


  cluster:
    id: 23e72372-0d44-4cad-b24f-3641b14b86f4
    health: HEALTH_WARN
    3/125481112 objects unfound (0.000%)
    Degraded data redundancy: 3/497011315 objects degraded 
(0.000%), 2 pgs degraded


  services:
    mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
    mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
    mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3 
up:standby

    osd: 96 osds: 96 up, 96 in

  data:
    pools:   2 pools, 4096 pgs
    objects: 125.48M objects, 259TiB
    usage:   370TiB used, 154TiB / 524TiB avail
    pgs: 3/497011315 objects degraded (0.000%)
 3/125481112 objects unfound (0.000%)
 4083 active+clean
 10   active+clean+scrubbing+deep
 2    active+recovery_wait+degraded
 1    active+clean+scrubbing

  io:
    client:   318KiB/s rd, 77.0KiB/s wr, 190op/s rd, 0op/s wr


and ceph health detail:

HEALTH_WARN 3/125481112 objects unfound (0.000%); Degraded data 
redundancy: 3/497011315 objects degraded (0.000%), 2 pgs degraded
OBJECT_UNFOUND 3/125481112 objects unfound (0.000%)
    pg 1.24c has 1 unfound objects
    pg 1.779 has 2 unfound objects
PG_DEGRADED Degraded data redundancy: 3/497011315 objects degraded 
(0.000%), 2 pgs degraded
    pg 1.24c is active+recovery_wait+degraded, acting [32,4,61,36], 1 
unfound
    pg 1.779 is active+recovery_wait+degraded, acting [50,4,77,62], 2 
unfound



also the status changed from HEALTH_ERR to HEALTH_WARN. We also did ceph 
osd down for all OSDs of the degraded PGs. Do you have any further 
suggestions on how to proceed?


On 23.05.19 11:08 vorm., Dan van der Ster wrote:

I think those osds (1, 11, 21, 32, ...) need a little kick to re-peer
their degraded PGs.

Open a window with `watch ceph -s`, then in another window slowly do

 ceph osd down 1
 # then wait a minute or so for that osd.1 to re-peer fully.
 ceph osd down 11
 ...

Continue that for each of the osds with stuck requests, or until there
are no more recovery_wait/degraded PGs.

After each `ceph osd down...`, you should expect to see several PGs
re-peer, and then ideally the slow requests will disappear and the
degraded PGs will become active+clean.
If anything else happens, you should stop and let us know.


-- dan

On Thu, May 23, 2019 at 10:59 AM Kevin Flöh  wrote:

This is the current status of ceph:


cluster:
  id: 23e72372-0d44-4cad-b24f-3641b14b86f4
  health: HEALTH_ERR
  9/125481144 objects unfound (0.000%)
  Degraded data redundancy: 9/497011417 objects degraded
(0.000%), 7 pgs degraded
  9 stuck requests are blocked > 4096 sec. Implicated osds
1,11,21,32,43,50,65

services:
  mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
  mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
  mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3
up:standby
  osd: 96 osds: 96 up, 96 in

data:
  pools:   2 pools, 4096 pgs
  objects: 125.48M objects, 259TiB
  usage:   370TiB used, 154TiB / 524TiB avail
  pgs: 9/497011417 objects degraded (0.000%)
   9/125481144 objects unfound (0.000%)
   4078 active+clean
   11   active+clean+scrubbing+deep
   7active+recovery_wait+degraded

io:
  client:   211KiB/s rd, 46.0KiB/s wr, 158op/s rd, 0op/s wr

On 23.05.19 10:54 vorm., Dan van der Ster wrote:

What's the full ceph status?
Normally recovery_wait just means that the relevant osd's are busy
recovering/backfilling another PG.

On Thu, May 23, 2019 at 10:53 AM Kevin Flöh  wrote:

Hi,

we have set the PGs to recover and now they are stuck in 
active+recovery_wait+degraded and instructing them to deep-scrub does not 
change anything. Hence, the rados report is empty. Is there a way to stop the 
recovery wait to start the deep-scrub and get the output? I guess the 
recovery_wait might be caused by missing objects. Do we need to delete them 
first to get the recovery going?

Kevin

On 22.05.19 6:03 nachm., Robert LeBlanc wrote:

On Wed, May 22, 2019 at 4:31 AM Kevin Flöh  wrote:

Hi,

thank you, it worked. The PGs are not incomplete anymore. Still we have
another problem, there are 7 PGs inconsistent and a ceph pg repair is
not doing anything. I just get "instructing pg 1.5dd on osd.24 to
repair" and nothing happens. Does somebody know how we can get the PGs
to repair?

Regards,

Kevin

Kevin,

I just fixed an inconsistent PG yesterday. You will need to figure out why they 
are inconsistent. Do these steps and then we can figure out how to proceed.
1. Do a deep-scrub on each PG that is inconsistent. (This may fix some of them)
2. Print out the inconsistent report for each inconsistent PG. `rados 
list-inconsistent-obj  --format=json-pretty`
3

Re: [ceph-users] Major ceph disaster

2019-05-23 Thread Alexandre Marangone
The PGs will stay active+recovery_wait+degraded until you solve the unfound
objects issue.
You can follow this doc to look at which objects are unfound[1]  and if no
other recourse mark them lost

[1]
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#unfound-objects
.
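
A sketch of that workflow with the PG ids from this thread ('revert' may be
rejected on an erasure-coded pool, in which case 'delete', which gives the
objects up for good, is the only option):

    ceph health detail                      # which PGs report unfound objects
    ceph pg 1.24c list_missing              # which objects are unfound in that PG
    ceph pg 1.24c mark_unfound_lost revert  # or: ceph pg 1.24c mark_unfound_lost delete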

On Thu, May 23, 2019 at 5:47 AM Kevin Flöh  wrote:

> thank you for this idea, it has improved the situation. Nevertheless,
> there are still 2 PGs in recovery_wait. ceph -s gives me:
>
>cluster:
>  id: 23e72372-0d44-4cad-b24f-3641b14b86f4
>  health: HEALTH_WARN
>  3/125481112 objects unfound (0.000%)
>  Degraded data redundancy: 3/497011315 objects degraded
> (0.000%), 2 pgs degraded
>
>services:
>  mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
>  mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
>  mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3
> up:standby
>  osd: 96 osds: 96 up, 96 in
>
>data:
>  pools:   2 pools, 4096 pgs
>  objects: 125.48M objects, 259TiB
>  usage:   370TiB used, 154TiB / 524TiB avail
>  pgs: 3/497011315 objects degraded (0.000%)
>   3/125481112 objects unfound (0.000%)
>   4083 active+clean
>   10   active+clean+scrubbing+deep
>   2active+recovery_wait+degraded
>   1active+clean+scrubbing
>
>io:
>  client:   318KiB/s rd, 77.0KiB/s wr, 190op/s rd, 0op/s wr
>
>
> and ceph health detail:
>
> HEALTH_WARN 3/125481112 objects unfound (0.000%); Degraded data
> redundancy: 3/497011315 objects degraded (0.000%), 2 pgs degraded
> OBJECT_UNFOUND 3/125481112 objects unfound (0.000%)
>  pg 1.24c has 1 unfound objects
>  pg 1.779 has 2 unfound objects
> PG_DEGRADED Degraded data redundancy: 3/497011315 objects degraded
> (0.000%), 2 pgs degraded
>  pg 1.24c is active+recovery_wait+degraded, acting [32,4,61,36], 1
> unfound
>  pg 1.779 is active+recovery_wait+degraded, acting [50,4,77,62], 2
> unfound
>
>
> also the status changed from HEALTH_ERR to HEALTH_WARN. We also did ceph
> osd down for all OSDs of the degraded PGs. Do you have any further
> suggestions on how to proceed?
>
> On 23.05.19 11:08 vorm., Dan van der Ster wrote:
> > I think those osds (1, 11, 21, 32, ...) need a little kick to re-peer
> > their degraded PGs.
> >
> > Open a window with `watch ceph -s`, then in another window slowly do
> >
> >  ceph osd down 1
> >  # then wait a minute or so for that osd.1 to re-peer fully.
> >  ceph osd down 11
> >  ...
> >
> > Continue that for each of the osds with stuck requests, or until there
> > are no more recovery_wait/degraded PGs.
> >
> > After each `ceph osd down...`, you should expect to see several PGs
> > re-peer, and then ideally the slow requests will disappear and the
> > degraded PGs will become active+clean.
> > If anything else happens, you should stop and let us know.
> >
> >
> > -- dan
> >
> > On Thu, May 23, 2019 at 10:59 AM Kevin Flöh  wrote:
> >> This is the current status of ceph:
> >>
> >>
> >> cluster:
> >>   id: 23e72372-0d44-4cad-b24f-3641b14b86f4
> >>   health: HEALTH_ERR
> >>   9/125481144 objects unfound (0.000%)
> >>   Degraded data redundancy: 9/497011417 objects degraded
> >> (0.000%), 7 pgs degraded
> >>   9 stuck requests are blocked > 4096 sec. Implicated osds
> >> 1,11,21,32,43,50,65
> >>
> >> services:
> >>   mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
> >>   mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu
> >>   mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu=up:active}, 3
> >> up:standby
> >>   osd: 96 osds: 96 up, 96 in
> >>
> >> data:
> >>   pools:   2 pools, 4096 pgs
> >>   objects: 125.48M objects, 259TiB
> >>   usage:   370TiB used, 154TiB / 524TiB avail
> >>   pgs: 9/497011417 objects degraded (0.000%)
> >>9/125481144 objects unfound (0.000%)
> >>4078 active+clean
> >>11   active+clean+scrubbing+deep
> >>7active+recovery_wait+degraded
> >>
> >> io:
> >>   client:   211KiB/s rd, 46.0KiB/s wr, 158op/s rd, 0op/s wr
> >>
> >> On 23.05.19 10:54 vorm., Dan van der Ster wrote:
> >>> What's the full ceph status?
> >>> Normally recovery_wait just means that the relevant osd's are busy
> >>> recovering/backfilling another PG.
> >>>
> >>> On Thu, May 23, 2019 at 10:53 AM Kevin Flöh 
> wrote:
>  Hi,
> 
>  we have set the PGs to recover and now they are stuck in
> active+recovery_wait+degraded and instructing them to deep-scrub does not
> change anything. Hence, the rados report is empty. Is there a way to stop
> the recovery wait to start the deep-scrub and get the output? I guess the
> recovery_wait might be caused by missing objects. Do we need to delete them
> first to get the recovery going?
> 
>  Kevin
> 
>  On 

Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Kevin Flöh
We got the object ids of the missing objects with `ceph pg 1.24c list_missing`:

{
    "offset": {
    "oid": "",
    "key": "",
    "snapid": 0,
    "hash": 0,
    "max": 0,
    "pool": -9223372036854775808,
    "namespace": ""
    },
    "num_missing": 1,
    "num_unfound": 1,
    "objects": [
    {
    "oid": {
    "oid": "10004dfce92.003d",
    "key": "",
    "snapid": -2,
    "hash": 90219084,
    "max": 0,
    "pool": 1,
    "namespace": ""
    },
    "need": "46950'195355",
    "have": "0'0",
    "flags": "none",
    "locations": [
    "36(3)",
    "61(2)"
    ]
    }
    ],
    "more": false
}

we want to give up those objects with:

ceph pg 1.24c mark_unfound_lost revert

But first we would like to know which file(s) is affected. Is there a way 
to map the object id to the corresponding file?

On 23.05.19 3:52 nachm., Alexandre Marangone wrote:
The PGs will stay active+recovery_wait+degraded until you solve the 
unfound objects issue.
You can follow this doc to look at which objects are unfound[1]  and 
if no other recourse mark them lost


[1] 
http://docs.ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#unfound-objects. 



On Thu, May 23, 2019 at 5:47 AM Kevin Flöh > wrote:


thank you for this idea, it has improved the situation. Nevertheless,
there are still 2 PGs in recovery_wait. ceph -s gives me:

   cluster:
 id: 23e72372-0d44-4cad-b24f-3641b14b86f4
 health: HEALTH_WARN
 3/125481112 objects unfound (0.000%)
 Degraded data redundancy: 3/497011315 objects degraded
(0.000%), 2 pgs degraded

   services:
 mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
 mgr: ceph-node01(active), standbys: ceph-node01.etp.kit.edu

 mds: cephfs-1/1/1 up  {0=ceph-node03.etp.kit.edu
=up:active}, 3
up:standby
 osd: 96 osds: 96 up, 96 in

   data:
 pools:   2 pools, 4096 pgs
 objects: 125.48M objects, 259TiB
 usage:   370TiB used, 154TiB / 524TiB avail
 pgs: 3/497011315 objects degraded (0.000%)
  3/125481112 objects unfound (0.000%)
  4083 active+clean
  10   active+clean+scrubbing+deep
  2    active+recovery_wait+degraded
  1    active+clean+scrubbing

   io:
 client:   318KiB/s rd, 77.0KiB/s wr, 190op/s rd, 0op/s wr


and ceph health detail:

HEALTH_WARN 3/125481112 objects unfound (0.000%); Degraded data
redundancy: 3/497011315 objects degraded (0.000%), 2 p
gs degraded
OBJECT_UNFOUND 3/125481112 objects unfound (0.000%)
 pg 1.24c has 1 unfound objects
 pg 1.779 has 2 unfound objects
PG_DEGRADED Degraded data redundancy: 3/497011315 objects degraded
(0.000%), 2 pgs degraded
 pg 1.24c is active+recovery_wait+degraded, acting
[32,4,61,36], 1
unfound
 pg 1.779 is active+recovery_wait+degraded, acting
[50,4,77,62], 2
unfound


also the status changed form HEALTH_ERR to HEALTH_WARN. We also
did ceph
osd down for all OSDs of the degraded PGs. Do you have any further
suggestions on how to proceed?

On 23.05.19 11:08 vorm., Dan van der Ster wrote:
> I think those osds (1, 11, 21, 32, ...) need a little kick to
re-peer
> their degraded PGs.
>
> Open a window with `watch ceph -s`, then in another window slowly do
>
>      ceph osd down 1
>      # then wait a minute or so for that osd.1 to re-peer fully.
>      ceph osd down 11
>      ...
>
> Continue that for each of the osds with stuck requests, or until
there
> are no more recovery_wait/degraded PGs.
>
> After each `ceph osd down...`, you should expect to see several PGs
> re-peer, and then ideally the slow requests will disappear and the
> degraded PGs will become active+clean.
> If anything else happens, you should stop and let us know.
>
>
> -- dan
>
> On Thu, May 23, 2019 at 10:59 AM Kevin Flöh mailto:kevin.fl...@kit.edu>> wrote:
>> This is the current status of ceph:
>>
>>
>>     cluster:
>>       id:     23e72372-0d44-4cad-b24f-3641b14b86f4
>>       health: HEALTH_ERR
>>               9/125481144 objects unfound (0.000%)
>>               Degraded data redundancy: 9/497011417 objects
degraded
>> (0.000%), 7 pgs degraded
>>               9 stuck requests are blocked > 4096 sec.
Implicated osds
>> 1,11,21,32,43,50,65
>>
>>     services:
>>       mon: 3 daemons, quorum ceph-node03,ceph-node01,ceph-node02
>>       mgr: ceph-node01(active), s

Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Burkhard Linke

Hi,

On 5/24/19 9:48 AM, Kevin Flöh wrote:


We got the object ids of the missing objects with `ceph pg 1.24c list_missing`:

{
    "offset": {
    "oid": "",
    "key": "",
    "snapid": 0,
    "hash": 0,
    "max": 0,
    "pool": -9223372036854775808,
    "namespace": ""
    },
    "num_missing": 1,
    "num_unfound": 1,
    "objects": [
    {
    "oid": {
    "oid": "10004dfce92.003d",
    "key": "",
    "snapid": -2,
    "hash": 90219084,
    "max": 0,
    "pool": 1,
    "namespace": ""
    },
    "need": "46950'195355",
    "have": "0'0",
    "flags": "none",
    "locations": [
    "36(3)",
    "61(2)"
    ]
    }
    ],
    "more": false
}

we want to give up those objects with:

ceph pg 1.24c mark_unfound_lost revert

But first we would like to know which file(s) is affected. Is there a way 
to map the object id to the corresponding file?



The object name is composed of the file inode id and the chunk within 
the file. The first chunk has some metadata you can use to retrieve the 
filename. See the 'CephFS object mapping' thread on the mailing list for 
more information.
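
One way that lookup is usually done, as a sketch (the first chunk of a file
is assumed to be named <inode>.00000000, the pool name is a placeholder, and
ceph-dencoder has to be available on the host):

    rados -p <data-pool> getxattr 10004dfce92.00000000 parent > parent.bin
    ceph-dencoder type inode_backtrace_t import parent.bin decode dump_json

The decoded backtrace lists the ancestor directory entries, from which the
path can be reconstructed.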



Regards,

Burkhard




Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Kevin Flöh

Hi,

we already tried "rados -p ec31 getxattr 10004dfce92.003d parent" 
but this is just hanging forever if we are looking for unfound objects. 
It works fine for all other objects.


We also tried scanning the ceph directory with find -inum 1099593404050 
(decimal of 10004dfce92) and found nothing. This is also working for non 
unfound objects.
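
For reference, the decimal inode used above can be obtained directly in the
shell:

    printf '%d\n' 0x10004dfce92    # -> 1099593404050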


Is there another way to find the corresponding file?

On 24.05.19 11:12 vorm., Burkhard Linke wrote:


Hi,

On 5/24/19 9:48 AM, Kevin Flöh wrote:


We got the object ids of the missing objects with `ceph pg 1.24c list_missing`:


{
    "offset": {
    "oid": "",
    "key": "",
    "snapid": 0,
    "hash": 0,
    "max": 0,
    "pool": -9223372036854775808,
    "namespace": ""
    },
    "num_missing": 1,
    "num_unfound": 1,
    "objects": [
    {
    "oid": {
    "oid": "10004dfce92.003d",
    "key": "",
    "snapid": -2,
    "hash": 90219084,
    "max": 0,
    "pool": 1,
    "namespace": ""
    },
    "need": "46950'195355",
    "have": "0'0",
    "flags": "none",
    "locations": [
    "36(3)",
    "61(2)"
    ]
    }
    ],
    "more": false
}

we want to give up those objects with:

ceph  pg  1.24c  mark_unfound_lost  revert But first we would like to know which file(s) is affected. Is 
there a way to map the object id to the corresponding file?



The object name is composed of the file inode id and the chunk within 
the file. The first chunk has some metadata you can use to retrieve 
the filename. See the 'CephFS object mapping' thread on the mailing 
list for more information.



Regards,

Burkhard





Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Robert LeBlanc
You need to use the first stripe of the object as that is the only one with
the metadata.

Try "rados -p ec31 getxattr 10004dfce92. parent" instead.

Robert LeBlanc

Sent from a mobile device, please excuse any typos.

On Fri, May 24, 2019, 4:42 AM Kevin Flöh  wrote:

> Hi,
>
> we already tried "rados -p ec31 getxattr 10004dfce92.003d parent" but
> this is just hanging forever if we are looking for unfound objects. It
> works fine for all other objects.
>
> We also tried scanning the ceph directory with find -inum 1099593404050
> (decimal of 10004dfce92) and found nothing. This is also working for non
> unfound objects.
>
> Is there another way to find the corresponding file?
> On 24.05.19 11:12 vorm., Burkhard Linke wrote:
>
> Hi,
> On 5/24/19 9:48 AM, Kevin Flöh wrote:
>
> We got the object ids of the missing objects with ceph pg 1.24c
> list_missing:
>
> {
> "offset": {
> "oid": "",
> "key": "",
> "snapid": 0,
> "hash": 0,
> "max": 0,
> "pool": -9223372036854775808,
> "namespace": ""
> },
> "num_missing": 1,
> "num_unfound": 1,
> "objects": [
> {
> "oid": {
> "oid": "10004dfce92.003d",
> "key": "",
> "snapid": -2,
> "hash": 90219084,
> "max": 0,
> "pool": 1,
> "namespace": ""
> },
> "need": "46950'195355",
> "have": "0'0",
> "flags": "none",
> "locations": [
> "36(3)",
> "61(2)"
> ]
> }
> ],
> "more": false
> }
>
> we want to give up those objects with:
>
> ceph pg 1.24c mark_unfound_lost revert
>
> But first we would like to know which file(s) is affected. Is there a way to 
> map the object id to the corresponding file?
>
>
> The object name is composed of the file inode id and the chunk within the
> file. The first chunk has some metadata you can use to retrieve the
> filename. See the 'CephFS object mapping' thread on the mailing list for
> more information.
>
>
> Regards,
>
> Burkhard
>
>


Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Kevin Flöh

ok this just gives me:

error getting xattr ec31/10004dfce92./parent: (2) No such file 
or directory


Does this mean that the lost object isn't even a file that appears in 
the ceph directory? Maybe a leftover of a file that has not been deleted 
properly? It wouldn't be an issue to mark the object as lost in that case.


On 24.05.19 5:08 nachm., Robert LeBlanc wrote:
You need to use the first stripe of the object as that is the only one 
with the metadata.


Try "rados -p ec31 getxattr 10004dfce92. parent" instead.

Robert LeBlanc

Sent from a mobile device, please excuse any typos.

On Fri, May 24, 2019, 4:42 AM Kevin Flöh > wrote:


Hi,

we already tried "rados -p ec31 getxattr 10004dfce92.003d
parent" but this is just hanging forever if we are looking for
unfound objects. It works fine for all other objects.

We also tried scanning the ceph directory with find -inum
1099593404050 (decimal of 10004dfce92) and found nothing. This is
also working for non unfound objects.

Is there another way to find the corresponding file?

On 24.05.19 11:12 vorm., Burkhard Linke wrote:


Hi,

On 5/24/19 9:48 AM, Kevin Flöh wrote:


We got the object ids of the missing objects with `ceph pg 1.24c list_missing`:

{
    "offset": {
    "oid": "",
    "key": "",
    "snapid": 0,
    "hash": 0,
    "max": 0,
    "pool": -9223372036854775808,
    "namespace": ""
    },
    "num_missing": 1,
    "num_unfound": 1,
    "objects": [
    {
    "oid": {
    "oid": "10004dfce92.003d",
    "key": "",
    "snapid": -2,
    "hash": 90219084,
    "max": 0,
    "pool": 1,
    "namespace": ""
    },
    "need": "46950'195355",
    "have": "0'0",
    "flags": "none",
    "locations": [
    "36(3)",
    "61(2)"
    ]
    }
    ],
    "more": false
}

we want to give up those objects with:

ceph  pg  1.24c  mark_unfound_lost  revert But first we would like to know 
which file(s) is
affected. Is there a way to map the object id to the
corresponding file?



The object name is composed of the file inode id and the chunk
within the file. The first chunk has some metadata you can use to
retrieve the filename. See the 'CephFS object mapping' thread on
the mailing list for more information.


Regards,

Burkhard





Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Robert LeBlanc
I'd say that if you can't find that object in Rados, then your assumption
may be good. I haven't run into this problem before. Try doing a Rados get
for that object and see if you get anything. I've done a Rados list
grepping for the hex inode, but it took almost two days on our cluster that
had half a billion objects. Your cluster may be faster.
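
Roughly, with the pool and object name used earlier in the thread (the get
may block just like the getxattr did, and the listing can take a very long
time on a pool of this size):

    rados -p ec31 get 10004dfce92.003d /tmp/chunk    # try to read the unfound object
    rados -p ec31 ls | grep 10004dfce92              # look for any other objects of that inode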

Sent from a mobile device, please excuse any typos.

On Fri, May 24, 2019, 8:21 AM Kevin Flöh  wrote:

> ok this just gives me:
>
> error getting xattr ec31/10004dfce92./parent: (2) No such file or
> directory
>
> Does this mean that the lost object isn't even a file that appears in the
> ceph directory. Maybe a leftover of a file that has not been deleted
> properly? It wouldn't be an issue to mark the object as lost in that case.
> On 24.05.19 5:08 nachm., Robert LeBlanc wrote:
>
> You need to use the first stripe of the object as that is the only one
> with the metadata.
>
> Try "rados -p ec31 getxattr 10004dfce92. parent" instead.
>
> Robert LeBlanc
>
> Sent from a mobile device, please excuse any typos.
>
> On Fri, May 24, 2019, 4:42 AM Kevin Flöh  wrote:
>
>> Hi,
>>
>> we already tried "rados -p ec31 getxattr 10004dfce92.003d parent" but
>> this is just hanging forever if we are looking for unfound objects. It
>> works fine for all other objects.
>>
>> We also tried scanning the ceph directory with find -inum 1099593404050
>> (decimal of 10004dfce92) and found nothing. This is also working for non
>> unfound objects.
>>
>> Is there another way to find the corresponding file?
>> On 24.05.19 11:12 vorm., Burkhard Linke wrote:
>>
>> Hi,
>> On 5/24/19 9:48 AM, Kevin Flöh wrote:
>>
>> We got the object ids of the missing objects with ceph pg 1.24c
>> list_missing:
>>
>> {
>> "offset": {
>> "oid": "",
>> "key": "",
>> "snapid": 0,
>> "hash": 0,
>> "max": 0,
>> "pool": -9223372036854775808,
>> "namespace": ""
>> },
>> "num_missing": 1,
>> "num_unfound": 1,
>> "objects": [
>> {
>> "oid": {
>> "oid": "10004dfce92.003d",
>> "key": "",
>> "snapid": -2,
>> "hash": 90219084,
>> "max": 0,
>> "pool": 1,
>> "namespace": ""
>> },
>> "need": "46950'195355",
>> "have": "0'0",
>> "flags": "none",
>> "locations": [
>> "36(3)",
>> "61(2)"
>> ]
>> }
>> ],
>> "more": false
>> }
>>
>> we want to give up those objects with:
>>
>> ceph pg 1.24c mark_unfound_lost revert
>>
>> But first we would like to know which file(s) is affected. Is there a way to 
>> map the object id to the corresponding file?
>>
>>
>> The object name is composed of the file inode id and the chunk within the
>> file. The first chunk has some metadata you can use to retrieve the
>> filename. See the 'CephFS object mapping' thread on the mailing list for
>> more information.
>>
>>
>> Regards,
>>
>> Burkhard
>>
>>


Re: [ceph-users] Major ceph disaster

2019-05-25 Thread Paul Emmerich
On Fri, May 24, 2019 at 5:22 PM Kevin Flöh  wrote:

> ok this just gives me:
>
> error getting xattr ec31/10004dfce92./parent: (2) No such file or
> directory
>
Try to run it on the replicated main data pool, which contains an empty
object for each file; I'm not sure where the xattr is stored in a multi-pool
setup.
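
For example (the pool name is a placeholder for this cluster's replicated
default CephFS data pool, and the .00000000 suffix assumes the usual
first-chunk naming):

    rados -p <default-data-pool> getxattr 10004dfce92.00000000 parent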



-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


> Does this mean that the lost object isn't even a file that appears in the
> ceph directory. Maybe a leftover of a file that has not been deleted
> properly? It wouldn't be an issue to mark the object as lost in that case.
> On 24.05.19 5:08 nachm., Robert LeBlanc wrote:
>
> You need to use the first stripe of the object as that is the only one
> with the metadata.
>
> Try "rados -p ec31 getxattr 10004dfce92. parent" instead.
>
> Robert LeBlanc
>
> Sent from a mobile device, please excuse any typos.
>
> On Fri, May 24, 2019, 4:42 AM Kevin Flöh  wrote:
>
>> Hi,
>>
>> we already tried "rados -p ec31 getxattr 10004dfce92.003d parent" but
>> this is just hanging forever if we are looking for unfound objects. It
>> works fine for all other objects.
>>
>> We also tried scanning the ceph directory with find -inum 1099593404050
>> (decimal of 10004dfce92) and found nothing. This is also working for non
>> unfound objects.
>>
>> Is there another way to find the corresponding file?
>> On 24.05.19 11:12 vorm., Burkhard Linke wrote:
>>
>> Hi,
>> On 5/24/19 9:48 AM, Kevin Flöh wrote:
>>
>> We got the object ids of the missing objects with ceph pg 1.24c
>> list_missing:
>>
>> {
>> "offset": {
>> "oid": "",
>> "key": "",
>> "snapid": 0,
>> "hash": 0,
>> "max": 0,
>> "pool": -9223372036854775808,
>> "namespace": ""
>> },
>> "num_missing": 1,
>> "num_unfound": 1,
>> "objects": [
>> {
>> "oid": {
>> "oid": "10004dfce92.003d",
>> "key": "",
>> "snapid": -2,
>> "hash": 90219084,
>> "max": 0,
>> "pool": 1,
>> "namespace": ""
>> },
>> "need": "46950'195355",
>> "have": "0'0",
>> "flags": "none",
>> "locations": [
>> "36(3)",
>> "61(2)"
>> ]
>> }
>> ],
>> "more": false
>> }
>>
>> we want to give up those objects with:
>>
>> ceph pg 1.24c mark_unfound_lost revert
>>
>> But first we would like to know which file(s) is affected. Is there a way to 
>> map the object id to the corresponding file?
>>
>>
>> The object name is composed of the file inode id and the chunk within the
>> file. The first chunk has some metadata you can use to retrieve the
>> filename. See the 'CephFS object mapping' thread on the mailing list for
>> more information.
>>
>>
>> Regards,
>>
>> Burkhard
>>
>>


Re: [ceph-users] Major ceph disaster

2019-05-25 Thread Paul Emmerich
On Sat, May 25, 2019 at 7:45 PM Paul Emmerich 
wrote:

>
>
> On Fri, May 24, 2019 at 5:22 PM Kevin Flöh  wrote:
>
>> ok this just gives me:
>>
>> error getting xattr ec31/10004dfce92./parent: (2) No such file or
>> directory
>>
> Try to run it on the replicated main data pool which contains an empty
> object for each file, not sure where the xattr is stored in a multi-pool
> setup.
>

Also, you probably didn't lose all the chunks of the erasure coded data.
Check the list_missing output to see which chunks are still there and where
they are.
You can export the chunks that you still have using ceph-objectstore-tool.
The first 3 chunks will be the data of the object, so you might be able to
tell whether that file is important to you.
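
A sketch of such an export (the OSD id and shard id come from the
list_missing output above, the paths are assumptions, and the OSD has to be
stopped while ceph-objectstore-tool accesses its store):

    systemctl stop ceph-osd@36
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-36 \
        --pgid 1.24cs3 '10004dfce92.003d' get-bytes /tmp/10004dfce92.003d.shard3
    systemctl start ceph-osd@36

This yields shard 3 of that object as stored on osd.36, not the reassembled
file data.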


Paul


>
>
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
>> Does this mean that the lost object isn't even a file that appears in the
>> ceph directory. Maybe a leftover of a file that has not been deleted
>> properly? It wouldn't be an issue to mark the object as lost in that case.
>> On 24.05.19 5:08 nachm., Robert LeBlanc wrote:
>>
>> You need to use the first stripe of the object as that is the only one
>> with the metadata.
>>
>> Try "rados -p ec31 getxattr 10004dfce92. parent" instead.
>>
>> Robert LeBlanc
>>
>> Sent from a mobile device, please excuse any typos.
>>
>> On Fri, May 24, 2019, 4:42 AM Kevin Flöh  wrote:
>>
>>> Hi,
>>>
>>> we already tried "rados -p ec31 getxattr 10004dfce92.003d parent"
>>> but this is just hanging forever if we are looking for unfound objects. It
>>> works fine for all other objects.
>>>
>>> We also tried scanning the ceph directory with find -inum 1099593404050
>>> (decimal of 10004dfce92) and found nothing. This is also working for non
>>> unfound objects.
>>>
>>> Is there another way to find the corresponding file?
>>> On 24.05.19 11:12 vorm., Burkhard Linke wrote:
>>>
>>> Hi,
>>> On 5/24/19 9:48 AM, Kevin Flöh wrote:
>>>
>>> We got the object ids of the missing objects with ceph pg 1.24c
>>> list_missing:
>>>
>>> {
>>> "offset": {
>>> "oid": "",
>>> "key": "",
>>> "snapid": 0,
>>> "hash": 0,
>>> "max": 0,
>>> "pool": -9223372036854775808,
>>> "namespace": ""
>>> },
>>> "num_missing": 1,
>>> "num_unfound": 1,
>>> "objects": [
>>> {
>>> "oid": {
>>> "oid": "10004dfce92.003d",
>>> "key": "",
>>> "snapid": -2,
>>> "hash": 90219084,
>>> "max": 0,
>>> "pool": 1,
>>> "namespace": ""
>>> },
>>> "need": "46950'195355",
>>> "have": "0'0",
>>> "flags": "none",
>>> "locations": [
>>> "36(3)",
>>> "61(2)"
>>> ]
>>> }
>>> ],
>>> "more": false
>>> }
>>>
>>> we want to give up those objects with:
>>>
>>> ceph pg 1.24c mark_unfound_lost revert
>>>
>>> But first we would like to know which file(s) is affected. Is there a way 
>>> to map the object id to the corresponding file?
>>>
>>>
>>> The object name is composed of the file inode id and the chunk within
>>> the file. The first chunk has some metadata you can use to retrieve the
>>> filename. See the 'CephFS object mapping' thread on the mailing list for
>>> more information.
>>>
>>>
>>> Regards,
>>>
>>> Burkhard
>>>
>>>