[ceph-users] Re: PG inconsistent

2024-04-12 Thread Anthony D'Atri
If you're using an Icinga active check that just looks for 

SMART overall-health self-assessment test result: PASSED

then it's not doing much for you.  That pass/fail status can be shown for a drive
that is decidedly an ex-parrot.  You have to look at specific attributes, which is
thorny since they aren't implemented consistently.  drivedb.h is a downright
mess, which doesn't help.
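
For example, something along these lines looks at the attributes that actually matter
rather than the overall verdict.  Treat the attribute names as a starting point, not a
drop-in check, since they vary by vendor and SAS / NVMe devices report health
differently:

# raw counters that are usually worth alerting on when non-zero
smartctl -A /dev/sda | egrep -i 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|Reported_Uncorrect'

# dump everything, including vendor-specific and SAS/NVMe health pages
smartctl -x /dev/sda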

> 
> 
> 
> 
> - On 12 Apr 24, at 15:17, Albert Shih albert.s...@obspm.fr wrote:
> 
>> On 12/04/2024 at 12:56:12+0200, Frédéric Nass wrote:
>>> 
>> Hi,
>> 
>>> 
>>> Have you checked the hardware status of the involved drives with something other
>>> than smartctl? For instance the manufacturer's tools / WebUI (iDRAC / perccli
>>> for DELL hardware, for example).
>> 
>> Yes, all my disks are under periodic checks with smartctl + Icinga.
> 
> Actually, I meant lower level tools (drive / server vendor tools).
> 
>> 
>>> If these tools don't report any media errors (that is, bad blocks on disks), then
>>> you might just be facing the bit rot phenomenon. But this is very rare and
>>> should happen about as often in a sysadmin's lifetime as a Royal Flush hand in a
>>> professional poker player's lifetime. ;-)
>>> 
>>> If no media errors are reported, then you might want to check and update the
>>> firmware of all drives.
>> 
>> You're perfectly right.
>> 
>> It's just a newbie error: I checked the "main" OSD of the PG (meaning the
>> first in the list) but forgot to check the others.
>> 
> 
> Ok.
> 
>> On one server I do indeed get some errors on a disk.
>> 
>> But strangely smartctl reports nothing. I will add a check with dmesg.
> 
> That's why I pointed you to the drive / server vendor tools earlier as 
> sometimes smartctl is missing the information you want.
> 
>> 
>>> 
>>> Once you've figured it out, you may enable osd_scrub_auto_repair=true to have
>>> these inconsistencies repaired automatically on deep-scrubbing, but make sure
>>> you're using the alert module [1] so that you at least get informed about the
>>> scrub errors.
>> 
>> Thanks. I will look into it, since we already have Icinga2 on site and I use
>> it to check the cluster.
>> 
>> Is there a list of what the alert module is going to check?
> 
> Basically the module checks for ceph status (ceph -s) changes.
> 
> https://github.com/ceph/ceph/blob/main/src/pybind/mgr/alerts/module.py
> 
> Regards,
> Frédéric.
> 
>> 
>> 
>> Regards
>> 
>> JAS
>> --
>> Albert SHIH 嶺 
>> France
>> Heure locale/Local time:
>> ven. 12 avril 2024 15:13:13 CEST
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent

2024-04-12 Thread Frédéric Nass


- On 12 Apr 24, at 15:17, Albert Shih albert.s...@obspm.fr wrote:

> On 12/04/2024 at 12:56:12+0200, Frédéric Nass wrote:
>> 
> Hi,
> 
>> 
>> Have you checked the hardware status of the involved drives with something other
>> than smartctl? For instance the manufacturer's tools / WebUI (iDRAC / perccli
>> for DELL hardware, for example).
> 
> Yes, all my disks are under periodic checks with smartctl + Icinga.

Actually, I meant lower level tools (drive / server vendor tools).

> 
>> If these tools don't report any media errors (that is, bad blocks on disks), then
>> you might just be facing the bit rot phenomenon. But this is very rare and
>> should happen about as often in a sysadmin's lifetime as a Royal Flush hand in a
>> professional poker player's lifetime. ;-)
>> 
>> If no media errors are reported, then you might want to check and update the
>> firmware of all drives.
> 
> You're perfectly right.
> 
> It's just a newbie error: I checked the "main" OSD of the PG (meaning the
> first in the list) but forgot to check the others.
> 

Ok.

> On one server I do indeed get some errors on a disk.
> 
> But strangely smartctl reports nothing. I will add a check with dmesg.

That's why I pointed you to the drive / server vendor tools earlier as 
sometimes smartctl is missing the information you want.

> 
>> 
>> Once you've figured it out, you may enable osd_scrub_auto_repair=true to have
>> these inconsistencies repaired automatically on deep-scrubbing, but make sure
>> you're using the alert module [1] so that you at least get informed about the
>> scrub errors.
> 
> Thanks. I will look into it, since we already have Icinga2 on site and I use
> it to check the cluster.
> 
> Is there a list of what the alert module is going to check?

Basically the module checks for ceph status (ceph -s) changes.

https://github.com/ceph/ceph/blob/main/src/pybind/mgr/alerts/module.py
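
For reference, enabling it only takes a couple of commands; the SMTP values below are
illustrative placeholders to adapt (see the module documentation for the full option
list):

ceph mgr module enable alerts
ceph config set mgr mgr/alerts/smtp_host smtp.example.com
ceph config set mgr mgr/alerts/smtp_destination ceph-alerts@example.com
ceph config set mgr mgr/alerts/smtp_sender ceph@example.com
# send a report right away instead of waiting for the next interval
ceph alerts send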

Regards,
Frédéric.

> 
> 
> Regards
> 
> JAS
> --
> Albert SHIH 嶺 
> France
> Heure locale/Local time:
> ven. 12 avril 2024 15:13:13 CEST
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent

2024-04-12 Thread Albert Shih
On 12/04/2024 at 12:56:12+0200, Frédéric Nass wrote:
> 
Hi, 

> 
> Have you checked the hardware status of the involved drives with something other
> than smartctl? For instance the manufacturer's tools / WebUI (iDRAC / perccli
> for DELL hardware, for example).

Yes, all my disks are under periodic checks with smartctl + Icinga.

> If these tools don't report any media errors (that is, bad blocks on disks), then
> you might just be facing the bit rot phenomenon. But this is very rare and
> should happen about as often in a sysadmin's lifetime as a Royal Flush hand in a
> professional poker player's lifetime. ;-)
> 
> If no media errors are reported, then you might want to check and update the
> firmware of all drives.

You're perfectly right.

It's just a newbie error: I checked the "main" OSD of the PG (meaning the
first in the list) but forgot to check the others.

On one server I do indeed get some errors on a disk.

But strangely smartctl reports nothing. I will add a check with dmesg.
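
In case it's useful, a simple kernel-log check along these lines tends to catch media
errors that smartctl misses (the patterns are illustrative and may need tuning for
your controllers):

dmesg -T | egrep -i 'blk_update_request|I/O error|medium error|critical target error'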

> 
> Once you've figured it out, you may enable osd_scrub_auto_repair=true to have
> these inconsistencies repaired automatically on deep-scrubbing, but make sure
> you're using the alert module [1] so that you at least get informed about the
> scrub errors.

Thanks. I will look into it, since we already have Icinga2 on site and I use
it to check the cluster.

Is there a list of what the alert module is going to check?


Regards

JAS
-- 
Albert SHIH 嶺 
France
Heure locale/Local time:
ven. 12 avril 2024 15:13:13 CEST
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent

2024-04-12 Thread Wesley Dillingham
Check your ceph.log on the mons for "stat mismatch" and grep for the PG in
question for potentially more information.

Additionally, "rados list-inconsistent-obj {pgid}" will often show which OSDs
and objects are implicated in the inconsistency. If the acting set has
changed since the scrub in which the inconsistency was found (for example, an
OSD was removed or failed), this data won't be there any longer and you
would need to deep-scrub the PG again to get that information.
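
For example (the log path assumes a default package install; substitute your PG ID):

# on a mon host: find the scrub report for the PG
grep 'stat mismatch' /var/log/ceph/ceph.log | grep '{pgid}'

# see what the last deep-scrub recorded about the inconsistency
rados list-inconsistent-obj {pgid} --format=json-pretty

# if that comes back empty because the acting set changed, rescrub first
ceph pg deep-scrub {pgid}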

Respectfully,

*Wes Dillingham*
LinkedIn 
w...@wesdillingham.com




On Fri, Apr 12, 2024 at 6:56 AM Frédéric Nass <
frederic.n...@univ-lorraine.fr> wrote:

>
> Hello Albert,
>
> Have you checked the hardware status of the involved drives with something other
> than smartctl? For instance the manufacturer's tools / WebUI (iDRAC / perccli for
> DELL hardware, for example).
> If these tools don't report any media errors (that is, bad blocks on disks),
> then you might just be facing the bit rot phenomenon. But this is very rare
> and should happen about as often in a sysadmin's lifetime as a Royal Flush hand
> in a professional poker player's lifetime. ;-)
>
> If no media errors are reported, then you might want to check and update the
> firmware of all drives.
>
> Once you've figured it out, you may enable osd_scrub_auto_repair=true to have
> these inconsistencies repaired automatically on deep-scrubbing, but make
> sure you're using the alert module [1] so that you at least get informed about
> the scrub errors.
>
> Regards,
> Frédéric.
>
> [1] https://docs.ceph.com/en/latest/mgr/alerts/
>
> - On 12 Apr 24, at 11:59, Albert Shih albert.s...@obspm.fr wrote:
>
> > Hi everyone.
> >
> > I got a warning with
> >
> > root@cthulhu1:/etc/ceph# ceph -s
> >   cluster:
> >     id:     9c5bb196-c212-11ee-84f3-c3f2beae892d
> >     health: HEALTH_ERR
> >             1 scrub errors
> >             Possible data damage: 1 pg inconsistent
> >
> > So I found the PG with the issue and launched a pg repair (still waiting).
> >
> > But I'm trying to find out why, so I checked all the OSDs related to this PG and
> > didn't find anything: no errors from the OSD daemons, no errors from smartctl, no
> > errors in the kernel messages.
> >
> > So I'd just like to know whether that's normal or I should dig deeper.
> >
> > JAS
> > --
> > Albert SHIH 嶺 
> > France
> > Heure locale/Local time:
> > ven. 12 avril 2024 11:51:37 CEST
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent

2024-04-12 Thread Frédéric Nass

Hello Albert,

Have you checked the hardware status of the involved drives with something other than
smartctl? For instance the manufacturer's tools / WebUI (iDRAC / perccli for DELL
hardware, for example).
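
For example, on a PERC controller something like this should surface media errors and
predictive failures (the binary name and controller path are placeholders; adapt them
to your setup):

perccli64 /c0 /eall /sall show all | egrep -i 'media error|predictive failure|state'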
If these tools don't report any media errors (that is, bad blocks on disks), then
you might just be facing the bit rot phenomenon. But this is very rare and
should happen about as often in a sysadmin's lifetime as a Royal Flush hand in a
professional poker player's lifetime. ;-)

If no media errors are reported, then you might want to check and update the
firmware of all drives.

Once you've figured it out, you may enable osd_scrub_auto_repair=true to have
these inconsistencies repaired automatically on deep-scrubbing, but make sure
you're using the alert module [1] so that you at least get informed about the
scrub errors.
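
If it helps, that can be set cluster-wide from a mon node; note that auto-repair only
kicks in when the number of scrub errors stays below osd_scrub_auto_repair_num_errors:

ceph config set osd osd_scrub_auto_repair true
# verify the value the OSDs will pick up
ceph config get osd osd_scrub_auto_repair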

Regards,
Frédéric.

[1] https://docs.ceph.com/en/latest/mgr/alerts/

- On 12 Apr 24, at 11:59, Albert Shih albert.s...@obspm.fr wrote:

> Hi everyone.
> 
> I got a warning with
> 
> root@cthulhu1:/etc/ceph# ceph -s
>   cluster:
>     id:     9c5bb196-c212-11ee-84f3-c3f2beae892d
>     health: HEALTH_ERR
>             1 scrub errors
>             Possible data damage: 1 pg inconsistent
> 
> So I found the PG with the issue and launched a pg repair (still waiting).
> 
> But I'm trying to find out why, so I checked all the OSDs related to this PG and
> didn't find anything: no errors from the OSD daemons, no errors from smartctl, no
> errors in the kernel messages.
> 
> So I'd just like to know whether that's normal or I should dig deeper.
> 
> JAS
> --
> Albert SHIH 嶺 
> France
> Heure locale/Local time:
> ven. 12 avril 2024 11:51:37 CEST
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent+failed_repair

2021-06-24 Thread Vladimir Prokofev
Followup. This is what's written in logs when I try to fix one PG:
ceph pg repair 3.60

primary osd log:
2021-06-25 01:07:32.146 7fc006339700 -1 log_channel(cluster) log [ERR] :
repair 3.53 3:cb4336ff:::rbd_data.e2d302dd699130.69b3:6aa5 : is
an unexpected clone
2021-06-25 01:07:32.146 7fc006339700 -1 osd.6 pg_epoch: 210926 pg[3.53( v
210926'64271902 (210920'64268839,210926'64271902]
local-lis/les=210882/210883 n=6046 ec=56/56 lis/c 210882/210882 les/c/f
210883/210883/5620 210811/210882/210882) [6,22,12] r=0 lpr=210882
luod=210926'64271899 crt=210926'64271902 lcod 210926'64271898 mlcod
210926'64271898 active+clean+scrubbing+deep+inconsistent+repair]
_scan_snaps no clone_snaps for
3:cb4336ff:::rbd_data.e2d302dd699130.69b3:6aa5 in 6aa5=[6aa5]:{}

secondary osd 1:
2021-06-25 01:07:31.934 7f9eae8fa700 -1 osd.22 pg_epoch: 210926 pg[3.53( v
210926'64271899 (210920'64268839,210926'64271899]
local-lis/les=210882/210883 n=6046 ec=56/56 lis/c 210882/210882 les/c/f
210883/210883/5620 210811/210882/210882) [6,22,12] r=1 lpr=210882 luod=0'0
lua=210881'64265352 crt=210926'64271899 lcod 210926'64271898
active+inconsistent mbc={}] _scan_snaps no clone_snaps for
3:cb4336ff:::rbd_data.e2d302dd699130.69b3:6aa5 in 6aa5=[6aa5]:{}

secondary osd 2:
2021-06-25 01:07:30.828 7f94d6e61700 -1 osd.12 pg_epoch: 210926 pg[3.53( v
210926'64271899 (210920'64268839,210926'64271899]
local-lis/les=210882/210883 n=6046 ec=56/56 lis/c 210882/210882 les/c/f
210883/210883/5620 210811/210882/210882) [6,22,12] r=2 lpr=210882 luod=0'0
lua=210881'64265352 crt=210926'64271899 lcod 210926'64271898
active+inconsistent mbc={}] _scan_snaps no clone_snaps for
3:cb4336ff:::rbd_data.e2d302dd699130.69b3:6aa5 in 6aa5=[6aa5]:{}

And nothing happens, it's still in a failed_repair state.
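
For reference, one way to see what an OSD actually has on disk for such an object is
to list it with ceph-objectstore-tool (the OSD must be stopped first; the OSD id, PG
and image prefix below are just taken from the log above):

systemctl stop ceph-osd@6
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-6 --pgid 3.53 --op list | grep e2d302dd699130
systemctl start ceph-osd@6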

On Fri, 25 Jun 2021 at 00:36, Vladimir Prokofev wrote:

> Hello.
>
> Today we've experienced a complete CEPH cluster outage - total loss of
> power in the whole infrastructure.
> 6 osd nodes and 3 monitors went down at the same time. CEPH 14.2.10
>
> This resulted in unfound objects, which were "reverted" in a hurry with
> ceph pg  mark_unfound_lost revert
> In retrospect that was probably a mistake as the "have" part stated 0'0.
>
> But then deep-scrubs started and they found inconsistent PGs. We tried
> repairing them, but they just switched to failed_repair.
>
> Here's a log example:
> 2021-06-25 00:08:07.693645 osd.0 [ERR] 3.c shard 6
> 3:3163e703:::rbd_data.be08c566ef438d.2445:head : missing
> 2021-06-25 00:08:07.693710 osd.0 [ERR] repair 3.c
> 3:3163e2ee:::rbd_data.efa86358d15f4a.004b:6ab1 : is an
> unexpected clone
> 2021-06-25 00:11:55.128951 osd.0 [ERR] 3.c repair 1 missing, 0
> inconsistent objects
> 2021-06-25 00:11:55.128969 osd.0 [ERR] 3.c repair 2 errors, 1 fixed
>
> I tried manually deleting conflicting objects from secondary osds
> with ceph-objectstore-tool like this
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-22 --pgid 3.c
> rbd_data.efa86358d15f4a.004b:6ab1 remove
> it removes it but without any positive impact. Pretty sure I don't
> understand the concept.
>
> So currently I have the following thoughts:
>  - is there any doc on the object placement specifics and what all of
> those numbers in their names mean? I've seen objects with similar prefix/mid
> but different suffixes, and I have no idea what that means;
>  - I'm actually not sure what the production impact is at this point,
> because everything seems to work so far. So I'm wondering whether it's possible
> to kill the replicas on the secondary OSDs with ceph-objectstore-tool and just let
> Ceph recreate the replicas from the primary PG copy?
>
> I have 8 scrub errors and 4 inconsistent+failed_repair PGs, and I'm afraid
> that further deep scrubs will reveal more errors.
> Any thoughts appreciated.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent with empty inconsistent objects

2021-01-27 Thread Richard Bade
Thanks Dan and Anthony, your suggestions have pointed me in the right
direction. Looking back through the logs at when the first error was
detected, I found this:

ceph-osd: 2021-01-24 01:04:55.905 7f0c17821700 -1 log_channel(cluster)
log [ERR] : 17.7ffs0 scrub : stat mismatch, got 112867/112868 objects,
0/0 clones, 112867/112868 dirty, 0/0 omap, 0/0 pinned, 0/0
hit_set_archive, 0/0 whiteouts, 473372381184/473376575488 bytes, 0/0
manifest objects, 0/0 hit_set_archive bytes.

As Anthony suggested, the error is not in the RADOS objects but
actually in the PG stats.
I assume that a repair will fix this up?
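
For reference, a repair plus a follow-up check would look roughly like this:

ceph pg repair 17.7ff
# once the repair scrub finishes, confirm the error counters have cleared
ceph health detail
ceph pg 17.7ff query | grep -i scrub_errors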

Thanks again everyone.
Rich

On Thu, 28 Jan 2021 at 03:59, Dan van der Ster  wrote:
>
> Usually the ceph.log prints the reason for the inconsistency when it
> is first detected by scrubbing.
>
> -- dan
>
> On Wed, Jan 27, 2021 at 12:41 AM Richard Bade  wrote:
> >
> > Hi Everyone,
> > I have also seen this kind of inconsistency with an empty list when you run
> > list-inconsistent-obj
> >
> > $ sudo ceph health detail
> > HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent; 1
> > pgs not deep-scrubbed in time
> > OSD_SCRUB_ERRORS 1 scrub errors
> > PG_DAMAGED Possible data damage: 1 pg inconsistent
> > pg 17.7ff is active+clean+inconsistent, acting [232,242,34,280,266,21]
> > PG_NOT_DEEP_SCRUBBED 1 pgs not deep-scrubbed in time
> > pg 17.1c2 not deep-scrubbed since 2021-01-15 02:46:16.271811
> >
> > $ sudo rados list-inconsistent-obj 17.7ff --format=json-pretty
> > {
> > "epoch": 183807,
> > "inconsistents": []
> > }
> >
> > Usually these are caused by read errors on the disks, but I've checked
> > all OSD hosts that are part of this PG and there are no SMART or dmesg
> > errors.
> >
> > Rich
> >
> > --
> > >
> > > Date: Sun, 17 Jan 2021 14:00:01 +0330
> > > From: Seena Fallah 
> > > Subject: [ceph-users] Re: PG inconsistent with empty inconsistent
> > > objects
> > > To: "Alexander E. Patrakov" 
> > > Cc: ceph-users 
> > > Message-ID:
> > > 
> > > 
> > > Content-Type: text/plain; charset="UTF-8"
> > >
> > > It was a long time ago and I don't have the `ceph health detail` output!
> > >
> > > On Sat, Jan 16, 2021 at 9:42 PM Alexander E. Patrakov 
> > > wrote:
> > >
> > > > For a start, please post the "ceph health detail" output.
> > > >
> > > > On Sat, 19 Dec 2020 at 23:48, Seena Fallah wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I'm facing something strange! One of the PGs in my pool got inconsistent,
> > > > > and when I ran `rados list-inconsistent-obj $PG_ID --format=json-pretty`
> > > > > the `inconsistents` key was empty! What is this? Is it a bug in Ceph or..?
> > > > >
> > > > > Thanks.
> > > > > ___
> > > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > >
> > > >
> > > >
> > > > --
> > > > Alexander E. Patrakov
> > > > CV: http://u.pc.cd/wT8otalK
> > > >
> > >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent with empty inconsistent objects

2021-01-27 Thread Dan van der Ster
Usually the ceph.log prints the reason for the inconsistency when it
is first detected by scrubbing.

-- dan

On Wed, Jan 27, 2021 at 12:41 AM Richard Bade  wrote:
>
> Hi Everyone,
> I have also seen this kind of inconsistency with an empty list when you run
> list-inconsistent-obj
>
> $ sudo ceph health detail
> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent; 1
> pgs not deep-scrubbed in time
> OSD_SCRUB_ERRORS 1 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
> pg 17.7ff is active+clean+inconsistent, acting [232,242,34,280,266,21]
> PG_NOT_DEEP_SCRUBBED 1 pgs not deep-scrubbed in time
> pg 17.1c2 not deep-scrubbed since 2021-01-15 02:46:16.271811
>
> $ sudo rados list-inconsistent-obj 17.7ff --format=json-pretty
> {
> "epoch": 183807,
> "inconsistents": []
> }
>
> Usually these are caused by read errors on the disks, but I've checked
> all OSD hosts that are part of this PG and there are no SMART or dmesg
> errors.
>
> Rich
>
> ----------
> >
> > Date: Sun, 17 Jan 2021 14:00:01 +0330
> > From: Seena Fallah 
> > Subject: [ceph-users] Re: PG inconsistent with empty inconsistent
> > objects
> > To: "Alexander E. Patrakov" 
> > Cc: ceph-users 
> > Message-ID:
> > 
> > Content-Type: text/plain; charset="UTF-8"
> >
> > It was a long time ago and I don't have the `ceph health detail` output!
> >
> > On Sat, Jan 16, 2021 at 9:42 PM Alexander E. Patrakov 
> > wrote:
> >
> > > For a start, please post the "ceph health detail" output.
> > >
> > > On Sat, 19 Dec 2020 at 23:48, Seena Fallah wrote:
> > > >
> > > > Hi,
> > > >
> > > > I'm facing something strange! One of the PGs in my pool got inconsistent,
> > > > and when I ran `rados list-inconsistent-obj $PG_ID --format=json-pretty`
> > > > the `inconsistents` key was empty! What is this? Is it a bug in Ceph or..?
> > > >
> > > > Thanks.
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> > >
> > >
> > > --
> > > Alexander E. Patrakov
> > > CV: http://u.pc.cd/wT8otalK
> > >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent with empty inconsistent objects

2021-01-26 Thread Richard Bade
Thanks Joe for your reply.
Yes, I realise I can scrub the one that's behind; that's not my issue this
time. I'm interested in the inconsistent PG.
Usually the list-inconsistent-obj command shows which copy is wrong and
what the issue is. In this case it reports nothing.
I don't really want to blindly repair, as in the past the repair has copied
the primary over the other copies (this may have been fixed since).
Usually in this situation I match up the read errors with the failing disk,
set the disk out, run a deep scrub, and all is well again.

Ceph v14.2.13 by the way.

Rich

On Wed, 27 Jan 2021, 12:57 Joe Comeau,  wrote:

> just issue the commands
>
> ceph pg deep-scrub 17.1c2
> this will deep-scrub this pg
>
> ceph pg repair 17.7ff
> repairs the pg
>
>
>
>
>
> >>> Richard Bade  1/26/2021 3:40 PM >>>
> Hi Everyone,
> I have also seen this kind of inconsistency with an empty list when you run
> list-inconsistent-obj
>
> $ sudo ceph health detail
> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent; 1
> pgs not deep-scrubbed in time
> OSD_SCRUB_ERRORS 1 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
> pg 17.7ff is active+clean+inconsistent, acting [232,242,34,280,266,21]
> PG_NOT_DEEP_SCRUBBED 1 pgs not deep-scrubbed in time
> pg 17.1c2 not deep-scrubbed since 2021-01-15 02:46:16.271811
>
> $ sudo rados list-inconsistent-obj 17.7ff --format=json-pretty
> {
> "epoch": 183807,
> "inconsistents": []
> }
>
> Usually these are caused by read errors on the disks, but I've checked
> all OSD hosts that are part of this PG and there are no SMART or dmesg
> errors.
>
> Rich
>
> --
> >
> > Date: Sun, 17 Jan 2021 14:00:01 +0330
> > From: Seena Fallah 
> > Subject: [ceph-users] Re: PG inconsistent with empty inconsistent
> > objects
> > To: "Alexander E. Patrakov" 
> > Cc: ceph-users 
> > Message-ID:
> > <
> cak3+omxvdc_x2r-kox-ui4k3osdvxh4o8zeqybztbumqmye...@mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > It was a long time ago and I don't have the `ceph health detail`
> > output!
> >
> > On Sat, Jan 16, 2021 at 9:42 PM Alexander E. Patrakov <
> patra...@gmail.com>
> > wrote:
> >
> > > For a start, please post the "ceph health detail" output.
> > >
> > > On Sat, 19 Dec 2020 at 23:48, Seena Fallah wrote:
> > > >
> > > > Hi,
> > > >
> > > > I'm facing something strange! One of the PGs in my pool got inconsistent,
> > > > and when I ran `rados list-inconsistent-obj $PG_ID --format=json-pretty`
> > > > the `inconsistents` key was empty! What is this? Is it a bug in Ceph or..?
> > > >
> > > > Thanks.
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> > >
> > >
> > > --
> > > Alexander E. Patrakov
> > > CV: http://u.pc.cd/wT8otalK
> > >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent with empty inconsistent objects

2021-01-26 Thread Joe Comeau
just issue the commands
 
ceph pg deep-scrub 17.1c2
this will deep-scrub this pg
 
ceph pg repair 17.7ff
repairs the pg 
 
 
 


>>> Richard Bade  1/26/2021 3:40 PM >>>
Hi Everyone,
I have also seen this kind of inconsistency with an empty list when you run
list-inconsistent-obj

$ sudo ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent; 1
pgs not deep-scrubbed in time
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 17.7ff is active+clean+inconsistent, acting
[232,242,34,280,266,21]
PG_NOT_DEEP_SCRUBBED 1 pgs not deep-scrubbed in time
pg 17.1c2 not deep-scrubbed since 2021-01-15 02:46:16.271811

$ sudo rados list-inconsistent-obj 17.7ff --format=json-pretty
{
"epoch": 183807,
"inconsistents": []
}

Usually these are caused by read errors on the disks, but I've checked
all OSD hosts that are part of this PG and there are no SMART or dmesg
errors.

Rich

--
>
> Date: Sun, 17 Jan 2021 14:00:01 +0330
> From: Seena Fallah 
> Subject: [ceph-users] Re: PG inconsistent with empty inconsistent
>objects
> To: "Alexander E. Patrakov" 
> Cc: ceph-users 
> Message-ID:
>   

> Content-Type: text/plain; charset="UTF-8"
>
> It was a long time ago and I don't have the `ceph health detail`
> output!
>
> On Sat, Jan 16, 2021 at 9:42 PM Alexander E. Patrakov

> wrote:
>
> > For a start, please post the "ceph health detail" output.
> >
> > On Sat, 19 Dec 2020 at 23:48, Seena Fallah wrote:
> > >
> > > Hi,
> > >
> > > I'm facing something strange! One of the PGs in my pool got
inconsistent
> > > and when I run `rados list-inconsistent-obj $PG_ID
--format=json-pretty`
> > > the `inconsistents` key was empty! What is this? Is it a bug in
Ceph
> > or..?
> > >
> > > Thanks.
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> >
> > --
> > Alexander E. Patrakov
> > CV: http://u.pc.cd/wT8otalK
> >
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent with empty inconsistent objects

2021-01-26 Thread Richard Bade
Hi Everyone,
I have also seen this kind of inconsistency with an empty list when you run list-inconsistent-obj

$ sudo ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent; 1
pgs not deep-scrubbed in time
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 17.7ff is active+clean+inconsistent, acting [232,242,34,280,266,21]
PG_NOT_DEEP_SCRUBBED 1 pgs not deep-scrubbed in time
pg 17.1c2 not deep-scrubbed since 2021-01-15 02:46:16.271811

$ sudo rados list-inconsistent-obj 17.7ff --format=json-pretty
{
"epoch": 183807,
"inconsistents": []
}

Usually these are caused by read errors on the disks, but I've checked
all OSD hosts that are part of this PG and there are no SMART or dmesg
errors.

Rich

--
>
> Date: Sun, 17 Jan 2021 14:00:01 +0330
> From: Seena Fallah 
> Subject: [ceph-users] Re: PG inconsistent with empty inconsistent
> objects
> To: "Alexander E. Patrakov" 
> Cc: ceph-users 
> Message-ID:
> 
> Content-Type: text/plain; charset="UTF-8"
>
> It was a long time ago and I don't have the `ceph health detail` output!
>
> On Sat, Jan 16, 2021 at 9:42 PM Alexander E. Patrakov 
> wrote:
>
> > For a start, please post the "ceph health detail" output.
> >
> > On Sat, 19 Dec 2020 at 23:48, Seena Fallah wrote:
> > >
> > > Hi,
> > >
> > > I'm facing something strange! One of the PGs in my pool got inconsistent,
> > > and when I ran `rados list-inconsistent-obj $PG_ID --format=json-pretty`
> > > the `inconsistents` key was empty! What is this? Is it a bug in Ceph or..?
> > >
> > > Thanks.
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> >
> > --
> > Alexander E. Patrakov
> > CV: http://u.pc.cd/wT8otalK
> >
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent with empty inconsistent objects

2021-01-17 Thread Seena Fallah
It was a long time ago and I don't have the `ceph health detail` output!

On Sat, Jan 16, 2021 at 9:42 PM Alexander E. Patrakov 
wrote:

> For a start, please post the "ceph health detail" output.
>
> On Sat, 19 Dec 2020 at 23:48, Seena Fallah wrote:
> >
> > Hi,
> >
> > I'm facing something strange! One of the PGs in my pool got inconsistent,
> > and when I ran `rados list-inconsistent-obj $PG_ID --format=json-pretty`
> > the `inconsistents` key was empty! What is this? Is it a bug in Ceph or..?
> >
> > Thanks.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> --
> Alexander E. Patrakov
> CV: http://u.pc.cd/wT8otalK
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent with empty inconsistent objects

2021-01-16 Thread Alexander E. Patrakov
For a start, please post the "ceph health detail" output.

On Sat, 19 Dec 2020 at 23:48, Seena Fallah wrote:
>
> Hi,
>
> I'm facing something strange! One of the PGs in my pool got inconsistent,
> and when I ran `rados list-inconsistent-obj $PG_ID --format=json-pretty`
> the `inconsistents` key was empty! What is this? Is it a bug in Ceph or..?
>
> Thanks.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Alexander E. Patrakov
CV: http://u.pc.cd/wT8otalK
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent with empty inconsistent objects

2021-01-15 Thread Seena Fallah
All of my daemons are 14.2.24
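
(As an aside, `ceph versions` is a quick way to confirm that every daemon really runs
the same release:)

ceph versions
# per-daemon detail if something looks mixed
ceph tell 'osd.*' version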

On Sat, Jan 16, 2021 at 2:39 AM  wrote:

> Hello Seena,
>
> Which version of Ceph are you using?
>
> IIRC there was a bug in an older Luminous release which caused an empty list...
>
> HTH
> Mehmet
>
> On 19 December 2020 at 19:47:10 CET, Seena Fallah <
> seenafal...@gmail.com> wrote:
> >Hi,
> >
> >I'm facing something strange! One of the PGs in my pool got inconsistent,
> >and when I ran `rados list-inconsistent-obj $PG_ID --format=json-pretty`
> >the `inconsistents` key was empty! What is this? Is it a bug in Ceph or..?
> >
> >Thanks.
> >___
> >ceph-users mailing list -- ceph-users@ceph.io
> >To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent with empty inconsistent objects

2021-01-15 Thread ceph
Hello Seena,

Which version of Ceph are you using?

IIRC there was a bug in an older Luminous release which caused an empty list...

HTH
Mehmet

On 19 December 2020 at 19:47:10 CET, Seena Fallah wrote:
>Hi,
>
>I'm facing something strange! One of the PGs in my pool got inconsistent,
>and when I ran `rados list-inconsistent-obj $PG_ID --format=json-pretty`
>the `inconsistents` key was empty! What is this? Is it a bug in Ceph or..?
>
>Thanks.
>___
>ceph-users mailing list -- ceph-users@ceph.io
>To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io