[ceph-users] Re: Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent

2023-06-26 Thread Frank Schilder
Just in case, maybe this blog post contains some useful hints:
https://blog.noc.grnet.gr/2016/10/18/surviving-a-ceph-cluster-outage-the-hard-way/

It's on a rather old Ceph version, but the operations on objects might still
be relevant. It requires that at least one OSD has a valid copy, though.

You should try to find out which file/image this object belongs to from the 
user's perspective. If you have a backup/snapshot, you could mark the object as 
lost and restore a copy of the file/image from backup/snapshot. That's what 
others did in this situation.

You will need to search this list for how to find that information. I believe 
there was something with ceph-dencoder and low-level rados commands. Search for 
recovery_unfound and "unfound object"; there should be many posts.
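For an RBD pool, one common way to do that mapping is to match the object's
rbd_data. prefix against each image's block_name_prefix. This is a sketch, not
from the original thread; the pool name "rbd32" is a placeholder for pool 32:

```shell
POOL=rbd32               # placeholder: replace with the real name of pool 32
PREFIX=aedf52e8a44410    # from the unfound object rbd_data.aedf52e8a44410.*
for img in $(rbd ls "$POOL"); do
    # "rbd info" prints a block_name_prefix line for every image; the image
    # whose prefix matches owns the unfound data object.
    if rbd info "$POOL/$img" | grep -q "block_name_prefix: rbd_data.${PREFIX}$"; then
        echo "unfound object belongs to image: $POOL/$img"
    fi
done
```

On large pools this loop is slow, but it only has to be run once per incident.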

Best regards,
=============
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Frank Schilder 
Sent: Monday, June 26, 2023 12:18 PM
To: Jorge JP; Stefan Kooman; ceph-users@ceph.io
Subject: [ceph-users] Re: Possible data damage: 1 pg recovery_unfound, 1 pg 
inconsistent

Hi Jorge,

neither do I. You will need to wait for help on the list or try to figure 
something out with the docs.

Please be patient; a mark_unfound_lost is only needed once everything else has 
been tried and has failed. Until then, clients that don't access the broken 
object should work fine.
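For reference, the last-resort command alluded to above looks like this (a
sketch using the PG id from this thread; check the Ceph docs for your release
before running it):

```shell
# LAST RESORT: only after every OSD that might hold the object has been
# brought back up and queried. "revert" falls back to a previous version of
# the object if one exists; "delete" forgets the object entirely.
ceph pg 32.15c mark_unfound_lost revert
# If no prior version can exist (list_unfound shows "have": "0'0" and
# new_object: 1), revert has nothing to fall back to; use delete instead:
ceph pg 32.15c mark_unfound_lost delete
```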

Best regards,
=============
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Jorge JP 
Sent: Monday, June 26, 2023 11:56 AM
To: Frank Schilder; Stefan Kooman; ceph-users@ceph.io
Subject: RE: [ceph-users] Re: Possible data damage: 1 pg recovery_unfound, 1 pg 
inconsistent

Hello Frank,

Thank you. I ran the following command: ceph pg 32.15c list_unfound

I located the object, but I don't know how to solve this problem.

{
    "num_missing": 1,
    "num_unfound": 1,
    "objects": [
        {
            "oid": {
                "oid": "rbd_data.aedf52e8a44410.021f",
                "key": "",
                "snapid": -2,
                "hash": 358991196,
                "max": 0,
                "pool": 32,
                "namespace": ""
            },
            "need": "49128'125646582",
            "have": "0'0",
            "flags": "none",
            "clean_regions": "clean_offsets: [], clean_omap: 0, new_object: 1",
            "locations": []
        }
    ],
    "more": false
}
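For scripting, the same listing can be pulled apart with a few lines of Python.
This is a sketch against the JSON quoted above (clean_regions omitted for
brevity); in practice you would feed it the output of
`ceph pg 32.15c list_unfound --format=json` directly:

```python
import json

# Parse the list_unfound output and extract the RBD block-name prefix,
# which is what maps the unfound object back to an image.
listing = json.loads("""
{
    "num_missing": 1,
    "num_unfound": 1,
    "objects": [
        {
            "oid": {
                "oid": "rbd_data.aedf52e8a44410.021f",
                "key": "",
                "snapid": -2,
                "hash": 358991196,
                "max": 0,
                "pool": 32,
                "namespace": ""
            },
            "need": "49128'125646582",
            "have": "0'0",
            "flags": "none",
            "locations": []
        }
    ],
    "more": false
}
""")

for obj in listing["objects"]:
    name = obj["oid"]["oid"]
    prefix = name.rsplit(".", 1)[0]   # drop the per-object index suffix
    # "have": "0'0" means this PG has no local version of the object at all;
    # "locations": [] means no OSD is currently known to hold a copy.
    print(prefix, obj["have"], obj["locations"])
```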


Thank you.


From: Frank Schilder
Sent: Monday, June 26, 2023 11:43 AM
To: Jorge JP; Stefan Kooman; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Possible data damage: 1 pg recovery_unfound, 1 pg 
inconsistent

I don't think pg repair will work. It looks like a size 2 (min_size 1) 
replicated pool where both OSDs seem to have accepted writes while the other 
was down, and now the PG can't decide which is the true latest version.

Using size 2 min-size 1 comes with manual labor. As far as I can tell, you will 
need to figure out what files/objects are affected and either update the 
missing copy or delete the object manually.
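A sketch of what "update the missing copy or delete the object" could look
like with low-level rados commands. POOL is a placeholder, and the object name
is copied from the list_unfound output quoted above (it may be abbreviated by
the archive):

```shell
POOL=rbd32                            # placeholder: the name of pool 32
OBJ=rbd_data.aedf52e8a44410.021f      # from "ceph pg 32.15c list_unfound"
rados -p "$POOL" get "$OBJ" /tmp/obj.bin   # try to read whatever copy is left
rados -p "$POOL" put "$OBJ" /tmp/obj.bin   # write back a known-good copy
# rados -p "$POOL" rm "$OBJ"               # or drop the object entirely and
#                                          # restore the image from backup
```

The put only helps if you can obtain a good copy of the 4 MiB data chunk from
somewhere (a snapshot, a backup, or an exported OSD copy); otherwise deleting
and restoring the whole image is the cleaner path.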

Best regards,
=============
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Jorge JP 
Sent: Monday, June 26, 2023 11:34 AM
To: Stefan Kooman; ceph-users@ceph.io
Subject: [ceph-users] Re: Possible data damage: 1 pg recovery_unfound, 1 pg 
inconsistent

Hello Stefan,

I ran this command yesterday, but the status has not changed. Other PGs with 
status "inconsistent" were repaired after a day, but in this case it doesn't 
work.

instructing pg 32.15c on osd.49 to repair

Normally the PG would change to "repairing", but it didn't.
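When a repair doesn't seem to start, it can help to look at what the last
deep-scrub actually recorded and at the PG's own view of why recovery is
stuck. A sketch with standard commands:

```shell
# Show the per-object inconsistencies recorded by the last deep-scrub.
rados list-inconsistent-obj 32.15c --format=json-pretty
# Inspect the PG's recovery state; repair only runs once the primary OSD
# schedules the scrub, which can take a while on a busy cluster.
ceph pg 32.15c query
```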


From: Stefan Kooman
Sent: Monday, June 26, 2023 11:27 AM
To: Jorge JP; ceph-users@ceph.io
Subject: Re: [ceph-users] Possible data damage: 1 pg recovery_unfound, 1 pg 
inconsistent

On 6/26/23 08:38, Jorge JP wrote:
> Hello,
>
> After deep-scrub my cluster shown this error:
>
> HEALTH_ERR 1/38578006 objects unfound (0.000%); 1 scrub errors; Possible data 
> damage: 1 pg recovery_unfound, 1 pg inconsistent; Degraded data redundancy: 
> 2/77158878 objects degraded (0.000%), 1 pg degraded
> [WRN] OBJECT_UNFOUND: 1/38578006 objects unfound (0.000%)
>  pg 32.15c has 1 unfound objects
> [ERR] OSD_SCRUB_ERRORS: 1 scrub errors
> [ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound, 1 pg 
> inconsistent
>  pg 32.15c is active+recovery_unfound+degraded+inconsistent, acting 
> [49,47], 1 unfound
> [WRN] PG_DEGRADED: Degraded data redundancy: 2/77158878 objects degraded 
> (0.000%), 1 pg degraded
>  pg 32.15c is active+recovery_unfound+degraded+inconsistent, acting 
> [49,47], 1 unfound
>
>
> I have been searching the internet for how to solve this, but I'm confused.
>
> Can anyone help me?

Does "ceph pg repair 32.15c" work for you?

Gr. Stefan
