[ceph-users] Re: Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent
Just in case, maybe this blog post contains some useful hints:

https://blog.noc.grnet.gr/2016/10/18/surviving-a-ceph-cluster-outage-the-hard-way/

It is about a rather old Ceph version, but the operations on objects might still be relevant. It requires that at least one OSD has a valid copy, though.

You should try to find out which file/image this object belongs to from the user's perspective. If you have a backup/snapshot, you could mark the object as lost and restore a copy of the file/image from the backup/snapshot. That's what others did in this situation.

You will need to search this list for how to find that information. I believe there was something with ceph-dencoder and low-level rados commands. Search for "recovery_unfound" and "unfound object"; there should be many posts.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
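Frank's advice to work out which part of which image is affected can be sketched in a few lines: RBD data objects are named rbd_data.<image id>.<object number in hex>, and the object number times the object size gives the byte range inside the image. A minimal sketch, assuming the default 4 MiB object size (verify with `rbd info`, which reports the order); the object name is taken exactly as reported by list_unfound above, which may be truncated in the mail:

```python
# Map an unfound rbd_data object to the byte range it covers inside its
# RBD image, assuming the default 4 MiB object size (order 22).
OBJECT_SIZE = 4 * 1024 * 1024  # 2**22 bytes; check `rbd info <image>` first

def object_byte_range(oid: str, object_size: int = OBJECT_SIZE):
    """Return (offset, length) of an rbd_data object inside its image."""
    # oid looks like: rbd_data.<image id>.<object number, hex>
    obj_no = int(oid.rsplit(".", 1)[1], 16)
    return obj_no * object_size, object_size

# Unfound object from `ceph pg 32.15c list_unfound`, name as reported:
offset, length = object_byte_range("rbd_data.aedf52e8a44410.021f")
print(offset, length)
```

With the affected range known, one can check from inside the VM which file system blocks live there before deciding between repairing the copy and restoring from backup.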
[ceph-users] Re: Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent
From: Frank Schilder
Sent: Monday, June 26, 2023 12:18 PM
To: Jorge JP; Stefan Kooman; ceph-users@ceph.io

Hi Jorge,

neither do I. You will need to wait for help on the list or try to figure something out from the docs. Please be patient; a mark-unfound-lost is only needed after everything else has been tried and failed. Until then, clients that don't access the broken object should work fine.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
[ceph-users] Re: Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent
From: Jorge JP
Sent: Monday, June 26, 2023 11:56 AM
To: Frank Schilder; Stefan Kooman; ceph-users@ceph.io

Hello Frank,

Thank you. I ran the following command:

ceph pg 32.15c list_unfound

I located the object, but I don't know how to solve this problem:

{
    "num_missing": 1,
    "num_unfound": 1,
    "objects": [
        {
            "oid": {
                "oid": "rbd_data.aedf52e8a44410.021f",
                "key": "",
                "snapid": -2,
                "hash": 358991196,
                "max": 0,
                "pool": 32,
                "namespace": ""
            },
            "need": "49128'125646582",
            "have": "0'0",
            "flags": "none",
            "clean_regions": "clean_offsets: [], clean_omap: 0, new_object: 1",
            "locations": []
        }
    ],
    "more": false
}

Thank you.
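The interesting fields in that listing can be pulled out with a few lines of Python; a sketch, with the JSON copied verbatim from the output above ("have": "0'0" means no OSD in the acting set has any version of the object, and an empty "locations" list means recovery found no other candidate source either):

```python
import json

# Output of `ceph pg 32.15c list_unfound`, as posted above.
listing = json.loads("""
{
  "num_missing": 1,
  "num_unfound": 1,
  "objects": [
    {
      "oid": {"oid": "rbd_data.aedf52e8a44410.021f", "key": "", "snapid": -2,
              "hash": 358991196, "max": 0, "pool": 32, "namespace": ""},
      "need": "49128'125646582",
      "have": "0'0",
      "flags": "none",
      "clean_regions": "clean_offsets: [], clean_omap: 0, new_object: 1",
      "locations": []
    }
  ],
  "more": false
}
""")

for obj in listing["objects"]:
    # need/have are "epoch'version" pairs: the PG needs version 125646582
    # but has nothing at all, and knows of no OSD holding a copy.
    print(obj["oid"]["oid"],
          "need", obj["need"],
          "have", obj["have"],
          "sources", obj["locations"])
```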
[ceph-users] Re: Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent
From: Frank Schilder
Sent: Monday, June 26, 2023 11:43 AM
To: Jorge JP; Stefan Kooman; ceph-users@ceph.io

I don't think pg repair will work. It looks like a size 2, min_size 1 replicated pool where both OSDs accepted writes while the other was down, and now the PG can't decide which is the true latest version. Using size 2, min_size 1 comes with manual labor. As far as I can tell, you will need to figure out which files/objects are affected and either update the missing copy or delete the object manually.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
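Figuring out which image the object belongs to, as Frank suggests, usually comes down to matching the object's rbd_data prefix against each image's block_name_prefix as shown by `rbd info <pool>/<image>`. A minimal sketch with a hypothetical inventory (the image names are invented for illustration; in practice, build the mapping from `rbd ls` plus `rbd info`):

```python
# Match an unfound rbd_data object against a mapping of
# image name -> block_name_prefix to find the owning RBD image.
def owning_image(oid, prefixes):
    prefix = oid.rsplit(".", 1)[0]  # strip the object-number suffix
    for image, block_name_prefix in prefixes.items():
        if block_name_prefix == prefix:
            return image
    return None

# Hypothetical inventory, normally gathered with:
#   rbd ls <pool>
#   rbd info <pool>/<image> | grep block_name_prefix
inventory = {
    "vm-disk-1": "rbd_data.aedf52e8a44410",
    "vm-disk-2": "rbd_data.123456789abcde",
}
print(owning_image("rbd_data.aedf52e8a44410.021f", inventory))
```

Once the image is identified, the affected VM/file can be checked against backups before any mark_unfound_lost decision.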
[ceph-users] Re: Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent
From: Jorge JP
Sent: Monday, June 26, 2023 11:34 AM
To: Stefan Kooman; ceph-users@ceph.io

Hello Stefan,

I ran this command yesterday, but the status has not changed. Other PGs with status "inconsistent" were repaired after a day, but in this case it does not work:

instructing pg 32.15c on osd.49 to repair

Normally the PG would change to repairing, but it did not.
[ceph-users] Re: Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent
From: Stefan Kooman
Sent: Monday, June 26, 2023 11:27 AM
To: Jorge JP; ceph-users@ceph.io

On 6/26/23 08:38, Jorge JP wrote:
> Hello,
>
> After a deep-scrub my cluster shows this error:
>
> HEALTH_ERR 1/38578006 objects unfound (0.000%); 1 scrub errors; Possible data
> damage: 1 pg recovery_unfound, 1 pg inconsistent; Degraded data redundancy:
> 2/77158878 objects degraded (0.000%), 1 pg degraded
> [WRN] OBJECT_UNFOUND: 1/38578006 objects unfound (0.000%)
>     pg 32.15c has 1 unfound objects
> [ERR] OSD_SCRUB_ERRORS: 1 scrub errors
> [ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent
>     pg 32.15c is active+recovery_unfound+degraded+inconsistent, acting [49,47], 1 unfound
> [WRN] PG_DEGRADED: Degraded data redundancy: 2/77158878 objects degraded (0.000%), 1 pg degraded
>     pg 32.15c is active+recovery_unfound+degraded+inconsistent, acting [49,47], 1 unfound
>
> I am searching the internet for how to solve this, but I'm confused.
>
> Can anyone help me?

Does "ceph pg repair 32.15c" work for you?

Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io