[ceph-users] Re: Testing CEPH scrubbing / self-healing capabilities
Hello,

'ceph osd deep-scrub 5' deep-scrubs all PGs for which osd.5 is the primary (and only those). You can check that from ceph-osd.5.log by running:

for pg in $(grep 'deep-scrub starts' /var/log/ceph/*/ceph-osd.5.log | awk '{print $8}') ; do
    echo "pg: $pg, primary osd is osd.$(ceph pg $pg query -f json | jq '.info.stats.acting_primary')"
done

'ceph osd deep-scrub all', on the other hand, instructs all OSDs to start deep-scrubbing all PGs they are primary for, so in the end all of the cluster's PGs.

So if the data you overwrote on osd.5 with 'dd' was part of a PG for which osd.5 was not the primary OSD, then it wasn't deep-scrubbed.

The ceph(8) man page could rather say:

    Subcommand deep-scrub initiates a deep scrub on all PGs the OSD is primary for.
    Usage: ceph osd deep-scrub

Regards,
Frédéric.

> On Jun 10, 2024, at 16:51, Petr Bena <petr@bena.rocks> wrote:
>
> Most likely it wasn't; the ceph help and documentation are not very clear about this:
>
> osd deep-scrub
> initiate deep scrub on osd, or use to deep scrub all
>
> It doesn't say anything like "initiate deep scrub of primary PGs on osd".
>
> I assumed it just ran a scrub of everything on the given OSD.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
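As an aside, the primary PGs of osd.5 can also be listed directly, without grepping the OSD log; a minimal sketch, assuming a Ceph release that ships the `pg ls-by-primary` subcommand and a cluster where jq is available:

```shell
# List every PG whose acting primary is osd.5; only these PGs are
# touched by 'ceph osd deep-scrub 5'.
ceph pg ls-by-primary osd.5

# Cross-check a single PG's acting primary from its query output
# (4.1d is the PG id mentioned later in this thread):
ceph pg 4.1d query -f json | jq '.info.stats.acting_primary'
```

If the OSD you corrupted does not appear as acting primary for the PG holding the damaged object, a deep scrub initiated against that OSD will not examine it.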
[ceph-users] Re: Testing CEPH scrubbing / self-healing capabilities
Scrubs are of PGs, not OSDs: the lead OSD for a PG orchestrates subops to the secondary OSDs. If you can point me to where this is in the docs/src, I'll clarify it; ideally, please file a tracker ticket and send me a link. Scrubbing all PGs on an OSD at once, or even in sequence, would be impactful.

> On Jun 10, 2024, at 10:51, Petr Bena wrote:
>
> Most likely it wasn't; the ceph help and documentation are not very clear about this:
>
> osd deep-scrub
> initiate deep scrub on osd, or use to deep scrub all
>
> It doesn't say anything like "initiate deep scrub of primary PGs on osd".
>
> I assumed it just ran a scrub of everything on the given OSD.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
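Since scrubs operate on PGs, a targeted test can deep-scrub just the PG that holds the corrupted object instead of everything an OSD is primary for; a sketch, taking the PG id 4.1d from later in this thread as an example:

```shell
# Deep-scrub a single PG; its primary OSD coordinates the scrub and
# compares the object data held by the secondary OSDs against its own.
ceph pg deep-scrub 4.1d

# Watch the cluster log for the scrub to start and finish:
ceph -w
```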
[ceph-users] Re: Testing CEPH scrubbing / self-healing capabilities
Most likely it wasn't; the ceph help and documentation are not very clear about this:

osd deep-scrub
initiate deep scrub on osd, or use to deep scrub all

It doesn't say anything like "initiate deep scrub of primary PGs on osd".

I assumed it just ran a scrub of everything on the given OSD.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Testing CEPH scrubbing / self-healing capabilities
That would have been my next question: did you verify that the corrupted OSD was a primary? The default deep-scrub configuration scrubs all PGs within a week, so yes, it can take up to a week until the damage is detected. It could have been detected sooner if those objects had been in use by clients and needed to be updated (rewritten).

Quoting Petr Bena:

> Hello,
>
> No, I don't have osd_scrub_auto_repair enabled. Interestingly, about a week after I had forgotten about this, an error manifested:
>
> [ERR] OSD_SCRUB_ERRORS: 1 scrub errors
> [ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
>     pg 4.1d is active+clean+inconsistent, acting [4,2]
>
> which could be repaired as expected, since I damaged only 1 OSD. It's interesting that it took a whole week to find it. For some reason it seems that running a deep-scrub on an entire OSD only runs it for the PGs where that OSD is the primary, so maybe that's why it wasn't detected when I ran it manually?

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
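The one-week figure comes from the default deep-scrub interval. A sketch of how to inspect it, and shorten it for a lab, assuming a cluster using the centralized config database:

```shell
# Default is 604800 seconds, i.e. one week:
ceph config get osd osd_deep_scrub_interval

# For lab testing only: deep-scrub everything daily so corruption
# surfaces faster (not advisable on a busy production cluster).
ceph config set osd osd_deep_scrub_interval 86400
```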
[ceph-users] Re: Testing CEPH scrubbing / self-healing capabilities
Hello,

No, I don't have osd_scrub_auto_repair enabled. Interestingly, about a week after I had forgotten about this, an error manifested:

[ERR] OSD_SCRUB_ERRORS: 1 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
    pg 4.1d is active+clean+inconsistent, acting [4,2]

which could be repaired as expected, since I damaged only 1 OSD. It's interesting that it took a whole week to find it. For some reason it seems that running a deep-scrub on an entire OSD only runs it for the PGs where that OSD is the primary, so maybe that's why it wasn't detected when I ran it manually?

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
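Once a PG is flagged inconsistent, the damage can be inspected and repaired per PG; a minimal sketch using the PG id from the error above:

```shell
# Show which object/shard failed its checksum during the deep scrub:
rados list-inconsistent-obj 4.1d --format=json-pretty

# Ask the PG's primary to repair it from a healthy replica
# (safe here, since only one of the two copies was damaged):
ceph pg repair 4.1d
```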
[ceph-users] Re: Testing CEPH scrubbing / self-healing capabilities
Hello Petr,

On Jun 4, 2024, at 12:13, Petr Bena <petr@bena.rocks> wrote:

> Hello,
>
> I wanted to try out (in a lab Ceph setup) what exactly happens when part of the data on an OSD disk gets corrupted. I created a simple test where I went through the block device data until I found something that resembled user data (using dd and hexdump; /dev/sdd is the block device used by the OSD):
>
> INFRA [root@ceph-vm-lab5 ~]# dd if=/dev/sdd bs=32 count=1 skip=33920 | hexdump -C
> 00000000  6e 20 69 64 3d 30 20 65 78 65 3d 22 2f 75 73 72  |n id=0 exe="/usr|
> 00000010  2f 73 62 69 6e 2f 73 73 68 64 22 20 68 6f 73 74  |/sbin/sshd" host|
>
> Then I deliberately overwrote 32 bytes with random data:
>
> INFRA [root@ceph-vm-lab5 ~]# dd if=/dev/urandom of=/dev/sdd bs=32 count=1 seek=33920
>
> INFRA [root@ceph-vm-lab5 ~]# dd if=/dev/sdd bs=32 count=1 skip=33920 | hexdump -C
> 00000000  25 75 af 3e 87 b0 3b 04 78 ba 79 e3 64 fc 76 d2  |%u.>..;.x.y.d.v.|
> 00000010  9e 94 00 c2 45 a5 e1 d2 a8 86 f1 25 fc 18 07 5a  |....E......%...Z|
>
> At this point I would expect some sort of data corruption. I restarted the OSD daemon on this host to make sure it flushed any potentially buffered data. It restarted OK without noticing anything, which was expected.
>
> Then I ran:
>
> ceph osd scrub 5
> ceph osd deep-scrub 5
>
> and waited for all scheduled scrub operations on all PGs to finish.
>
> No inconsistency was found. No errors were reported; the scrubs just finished OK, and the data is still visibly corrupt via hexdump.
>
> Did I just hit a block of data that WAS used by the OSD but was marked deleted and therefore no longer used, or am I missing something?

Possibly, if you deep-scrubbed all PGs. I remember marking sectors as bad in the past and still getting an fsck success from ceph-bluestore-tool.

To be sure, you could overwrite the very same sector, stop the OSD and then run:

$ ceph-bluestore-tool fsck --deep yes --path /var/lib/ceph/osd/ceph-X/

or (in a containerized environment):

$ cephadm shell --name osd.X ceph-bluestore-tool fsck --deep yes --path /var/lib/ceph/osd/ceph-X/

osd.X being the OSD associated with drive /dev/sdd.

Regards,
Frédéric.

> I would expect Ceph to detect the disk corruption and automatically replace the invalid data with a valid copy?
>
> I use only replica pools in this lab setup, for RBD and CephFS.
>
> Thanks

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Testing CEPH scrubbing / self-healing capabilities
Do you have osd_scrub_auto_repair set to true?

Quoting Petr Bena:

> Hello,
>
> I wanted to try out (in a lab Ceph setup) what exactly happens when part of the data on an OSD disk gets corrupted. I created a simple test where I went through the block device data until I found something that resembled user data (using dd and hexdump; /dev/sdd is the block device used by the OSD):
>
> INFRA [root@ceph-vm-lab5 ~]# dd if=/dev/sdd bs=32 count=1 skip=33920 | hexdump -C
> 00000000  6e 20 69 64 3d 30 20 65 78 65 3d 22 2f 75 73 72  |n id=0 exe="/usr|
> 00000010  2f 73 62 69 6e 2f 73 73 68 64 22 20 68 6f 73 74  |/sbin/sshd" host|
>
> Then I deliberately overwrote 32 bytes with random data:
>
> INFRA [root@ceph-vm-lab5 ~]# dd if=/dev/urandom of=/dev/sdd bs=32 count=1 seek=33920
>
> INFRA [root@ceph-vm-lab5 ~]# dd if=/dev/sdd bs=32 count=1 skip=33920 | hexdump -C
> 00000000  25 75 af 3e 87 b0 3b 04 78 ba 79 e3 64 fc 76 d2  |%u.>..;.x.y.d.v.|
> 00000010  9e 94 00 c2 45 a5 e1 d2 a8 86 f1 25 fc 18 07 5a  |....E......%...Z|
>
> At this point I would expect some sort of data corruption. I restarted the OSD daemon on this host to make sure it flushed any potentially buffered data. It restarted OK without noticing anything, which was expected.
>
> Then I ran:
>
> ceph osd scrub 5
> ceph osd deep-scrub 5
>
> and waited for all scheduled scrub operations on all PGs to finish.
>
> No inconsistency was found. No errors were reported; the scrubs just finished OK, and the data is still visibly corrupt via hexdump.
>
> Did I just hit a block of data that WAS used by the OSD but was marked deleted and therefore no longer used, or am I missing something?
>
> I would expect Ceph to detect the disk corruption and automatically replace the invalid data with a valid copy?
>
> I use only replica pools in this lab setup, for RBD and CephFS.
>
> Thanks

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
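For completeness, a sketch of checking and enabling the auto-repair behavior asked about above, assuming a cluster using the centralized config database (a scrub still has to run and find the damage before anything is repaired):

```shell
# Defaults to false; when true, a scrub that finds no more than
# osd_scrub_auto_repair_num_errors errors triggers a repair by itself.
ceph config get osd osd_scrub_auto_repair
ceph config set osd osd_scrub_auto_repair true
```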