[ceph-users] Re: PG damaged "failed_repair"
Hi,

Sorry for the bad formatting. Here are the outputs again.

ceph osd df:

ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
3 hdd 1.81879 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
12 hdd 1.81879 1.0 1.8 TiB 385 GiB 383 GiB 6.7 MiB 1.4 GiB 1.4 TiB 20.66 1.73 18 up
13 hdd 1.81879 1.0 1.8 TiB 422 GiB 421 GiB 5.8 MiB 1.3 GiB 1.4 TiB 22.67 1.90 17 up
15 hdd 1.81879 1.0 1.8 TiB 264 GiB 263 GiB 4.6 MiB 1.1 GiB 1.6 TiB 14.17 1.19 14 up
16 hdd 9.09520 1.0 9.1 TiB 1.0 TiB 1023 GiB 8.8 MiB 2.6 GiB 8.1 TiB 11.01 0.92 65 up
17 hdd 1.81879 1.0 1.8 TiB 319 GiB 318 GiB 6.1 MiB 1.0 GiB 1.5 TiB 17.13 1.43 15 up
1 hdd 5.45749 1.0 5.5 TiB 546 GiB 544 GiB 7.8 MiB 1.4 GiB 4.9 TiB 9.76 0.82 29 up
4 hdd 5.45749 1.0 5.5 TiB 801 GiB 799 GiB 8.3 MiB 2.4 GiB 4.7 TiB 14.34 1.20 44 up
8 hdd 5.45749 1.0 5.5 TiB 708 GiB 706 GiB 9.7 MiB 2.1 GiB 4.8 TiB 12.67 1.06 36 up
11 hdd 5.45749 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
14 hdd 1.81879 1.0 1.8 TiB 200 GiB 198 GiB 3.8 MiB 1.3 GiB 1.6 TiB 10.71 0.90 10 up
0 hdd 9.09520 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 down
5 hdd 9.09520 1.0 9.1 TiB 859 GiB 857 GiB 17 MiB 2.1 GiB 8.3 TiB 9.23 0.77 46 up
9 hdd 9.09520 1.0 9.1 TiB 924 GiB 922 GiB 11 MiB 2.3 GiB 8.2 TiB 9.92 0.83 55 up
TOTAL 53 TiB 6.3 TiB 6.3 TiB 90 MiB 19 GiB 46 TiB 11.95
MIN/MAX VAR: 0.77/1.90 STDDEV: 4.74

ceph osd pool ls detail:

pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 32 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 2 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 9327 lfor 0/0/104 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 3 'images' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 9018 lfor 0/0/104 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 4 'vms' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 9149 lfor 0/0/106 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 5 'polyphoto_backup' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 372 lfor 0/0/362 flags hashpspool,selfmanaged_snaps stripe_width 0 compression_algorithm snappy compression_mode aggressive application rbd

The error seems to come from a software error in Ceph. In the logs, I get the message "FAILED ceph_assert(clone_overlap.count(clone))".

Thanks,
Romain Lebbadi-Breteau
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
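A note on that assert: the full backtrace behind "FAILED ceph_assert(clone_overlap.count(clone))" can usually be pulled from Ceph's crash module rather than from journalctl alone. A sketch, assuming the crash module is enabled (it is by default on recent releases):

sudo ceph crash ls
sudo ceph crash info <crash_id>

The crash metadata includes the assert, the backtrace, the Ceph version, and the daemon that hit it, which is useful when reporting this as a bug on the Ceph tracker.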
[ceph-users] Re: PG damaged "failed_repair"
Hi,

Yes, we're trying to remove osd.3. Here is the result of `ceph osd df`:

ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
3 hdd 1.81879 1.0 1.8 TiB 443 GiB 441 GiB 6.8 MiB 1.5 GiB 1.4 TiB 23.78 2.37 16 up
6 hdd 1.81879 1.0 1.8 TiB 114 GiB 114 GiB 981 KiB 343 MiB 1.7 TiB 6.14 0.61 8 up
12 hdd 1.81879 1.0 1.8 TiB 359 GiB 358 GiB 5.8 MiB 1.0 GiB 1.5 TiB 19.27 1.92 15 up
13 hdd 1.81879 1.0 1.8 TiB 331 GiB 330 GiB 3.9 MiB 1.5 GiB 1.5 TiB 17.77 1.77 15 up
15 hdd 1.81879 1.0 1.8 TiB 217 GiB 216 GiB 2.0 MiB 1.1 GiB 1.6 TiB 11.64 1.16 13 up
16 hdd 9.09520 1.0 9.1 TiB 785 GiB 783 GiB 8.8 MiB 1.9 GiB 8.3 TiB 8.43 0.84 51 up
17 hdd 1.81879 1.0 1.8 TiB 204 GiB 203 GiB 2.9 MiB 1.2 GiB 1.6 TiB 10.95 1.09 11 up
1 hdd 5.45749 1.0 5.5 TiB 428 GiB 427 GiB 4.9 MiB 876 MiB 5.0 TiB 7.66 0.76 24 up
4 hdd 5.45749 1.0 5.5 TiB 638 GiB 636 GiB 6.8 MiB 2.2 GiB 4.8 TiB 11.42 1.14 36 up
8 hdd 5.45749 1.0 5.5 TiB 594 GiB 591 GiB 8.7 MiB 2.2 GiB 4.9 TiB 10.62 1.06 30 up
11 hdd 5.45749 1.0 5.5 TiB 567 GiB 565 GiB 7.8 MiB 2.1 GiB 4.9 TiB 10.15 1.01 29 up
14 hdd 1.81879 1.0 1.8 TiB 197 GiB 195 GiB 2.9 MiB 1.2 GiB 1.6 TiB 10.55 1.05 10 up
0 hdd 9.09520 1.0 9.1 TiB 764 GiB 763 GiB 9.6 MiB 1.8 GiB 8.3 TiB 8.21 0.82 47 up
5 hdd 9.09520 1.0 9.1 TiB 791 GiB 789 GiB 11 MiB 2.6 GiB 8.3 TiB 8.50 0.85 38 up
9 hdd 9.09520 1.0 9.1 TiB 858 GiB 856 GiB 11 MiB 2.4 GiB 8.3 TiB 9.21 0.92 44 up
TOTAL 71 TiB 7.1 TiB 7.1 TiB 93 MiB 24 GiB 64 TiB 10.03
MIN/MAX VAR: 0.61/2.37 STDDEV: 4.97

And here is `ceph osd pool ls detail` (and yes, our replicated size is 3):

pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 32 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 2 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 9327 lfor 0/0/104 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 3 'images' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 9018 lfor 0/0/104 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 4 'vms' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 9149 lfor 0/0/106 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 5 'polyphoto_backup' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 372 lfor 0/0/362 flags hashpspool,selfmanaged_snaps stripe_width 0 compression_algorithm snappy compression_mode aggressive application rbd

And we're using Quincy:

romain:step@alpha-cen ~ $ sudo ceph --version
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)

All our physical disks are on their own RAID 0 using the built-in RAID controller (PERC H730 Mini).

For the logs, journalctl has rotated and we don't have them anymore. I recreated the situation where the three OSDs crash (shutting down osd.3 and marking it out), and here are the logs:

ceph -w: https://pastebin.com/A7gJ3ss2
osd.0, osd.3 and osd.11: https://gitlab.com/RomainL456/ceph-incident-logs/

I put the full logs (output of `sudo journalctl --since "23:00" -u ceph-9b4b12fe-4dc6-11ed-9ed9-d18a342d7c2b@osd.*`) in a public Git repo, and I also put a file for the logs right before osd.0 crashed.

Here is the timeline of events (local time):

23:27 : I manually shut down osd.3
23:46 : osd.0 crashes
23:46 : osd.11 crashes
23:48 : I start osd.3, it crashes in less than a minute
23:49 : After I mark osd.3 "in" and start it again, it comes back online, with osd.0 and osd.11 soon after

Best regards,
Romain Lebbadi-Breteau

On 2024-03-08 3:17 a.m., Eugen Block wrote:

Hi,

can you share more details? Which OSD are you trying to get out, the primary osd.3? Can you also share 'ceph osd df'?
It looks like a replicated pool with size 3, can you confirm with 'ceph osd pool ls detail'? Do you have logs from the crashing OSDs when you take out osd.3? Which ceph version is this?

Thanks,
Eugen

Quoting Romain Lebbadi-Breteau:

Hi,

We're a student club from Montréal where we host an OpenStack cloud with a Ceph backend for storage of virtual machines and volumes using rbd.

Two weeks ago we received an email from our Ceph cluster saying that some PGs were damaged. We ran "sudo ceph pg repair " but then there was an I/O error on the disk during the recovery ("An unrecoverable disk media error occurred on Disk 4 in Backplane 1 of Integrated RAID Controller 1." and "Bad block medium error is detected at block 0x1377e2ad on Virtual Disk 3 on Integrated RAID Controller 1." messages on iDRAC). After that, the PG we tried to repair was in the state "active+recovery_unfound+degraded".

After a week, we ran the command "sudo ceph pg 2.1b mark_unfound_lost revert" to try to recover the damaged PG. We tried to boot the virtual machine that had crashed because of this incident, but the volume seemed to have been completely erased; the "mount" command said there was no filesystem on it, so we recreated the VM from a backup.

A few days later, the same PG was once again d
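On the log collection above: for a future reproduction attempt, more useful OSD logs can be captured by raising the debug level on the peers that crash before taking osd.3 out, and setting noout so the down OSD doesn't trigger extra data movement. A sketch, assuming default settings otherwise; the changes are reverted afterwards:

sudo ceph osd set noout
sudo ceph config set osd.0 debug_osd 20
sudo ceph config set osd.11 debug_osd 20
(shut down osd.3, reproduce the crash, collect the journalctl output)
sudo ceph config rm osd.0 debug_osd
sudo ceph config rm osd.11 debug_osd
sudo ceph osd unset noout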
[ceph-users] PG damaged "failed_repair"
Hi,

We're a student club from Montréal where we host an OpenStack cloud with a Ceph backend for storage of virtual machines and volumes using rbd.

Two weeks ago we received an email from our Ceph cluster saying that some PGs were damaged. We ran "sudo ceph pg repair " but then there was an I/O error on the disk during the recovery ("An unrecoverable disk media error occurred on Disk 4 in Backplane 1 of Integrated RAID Controller 1." and "Bad block medium error is detected at block 0x1377e2ad on Virtual Disk 3 on Integrated RAID Controller 1." messages on iDRAC). After that, the PG we tried to repair was in the state "active+recovery_unfound+degraded".

After a week, we ran the command "sudo ceph pg 2.1b mark_unfound_lost revert" to try to recover the damaged PG. We tried to boot the virtual machine that had crashed because of this incident, but the volume seemed to have been completely erased; the "mount" command said there was no filesystem on it, so we recreated the VM from a backup.

A few days later, the same PG was once again damaged, and since we knew the physical disk on the OSD hosting one part of the PG had problems, we tried to "out" the OSD from the cluster. That caused the two other OSDs hosting copies of the problematic PG to go down, which caused timeouts on our virtual machines, so we put the OSD back in.

We then tried to repair the PG again, but that failed and the PG is now "active+clean+inconsistent+failed_repair". Whenever that OSD goes down, two other OSDs from two other hosts go down too after a few minutes, so it's impossible to replace the disk right now, even though we have new ones available.

We have backups for most of our services, but it would be very disruptive to delete the whole cluster, and we don't know what to do with the broken PG and the OSD that can't be shut down.

Any help would be really appreciated. We're not experts with Ceph and OpenStack, and it's likely we handled things wrong at some point, but we really want to get back to a healthy Ceph.

Here is some information about our cluster:

romain:step@alpha-cen ~ $ sudo ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
[ERR] OSD_SCRUB_ERRORS: 1 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
    pg 2.1b is active+clean+inconsistent+failed_repair, acting [3,11,0]

romain:step@alpha-cen ~ $ sudo ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 70.94226 root default
-7 20.00792     host alpha-cen
 3 hdd 1.81879      osd.3  up 1.0 1.0
 6 hdd 1.81879      osd.6  up 1.0 1.0
12 hdd 1.81879      osd.12 up 1.0 1.0
13 hdd 1.81879      osd.13 up 1.0 1.0
15 hdd 1.81879      osd.15 up 1.0 1.0
16 hdd 9.09520      osd.16 up 1.0 1.0
17 hdd 1.81879      osd.17 up 1.0 1.0
-5 23.64874     host beta-cen
 1 hdd 5.45749      osd.1  up 1.0 1.0
 4 hdd 5.45749      osd.4  up 1.0 1.0
 8 hdd 5.45749      osd.8  up 1.0 1.0
11 hdd 5.45749      osd.11 up 1.0 1.0
14 hdd 1.81879      osd.14 up 1.0 1.0
-3 27.28560     host gamma-cen
 0 hdd 9.09520      osd.0  up 1.0 1.0
 5 hdd 9.09520      osd.5  up 1.0 1.0
 9 hdd 9.09520      osd.9  up 1.0 1.0

romain:step@alpha-cen ~ $ sudo rados list-inconsistent-obj 2.1b
{"epoch":9787,"inconsistents":[]}

romain:step@alpha-cen ~ $ sudo ceph pg 2.1b query
https://pastebin.com/gsKCPCjr

Best regards,
Romain Lebbadi-Breteau
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
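As a general reference for this kind of state: the inconsistency records shown by list-inconsistent-obj come from the last completed deep scrub, and since the clone_overlap assert reported in the replies points at snapshot (clone) metadata, the snapset variant of the listing is worth checking as well. A sketch using the PG id from the output above:

sudo ceph pg deep-scrub 2.1b
sudo rados list-inconsistent-obj 2.1b --format=json-pretty
sudo rados list-inconsistent-snapset 2.1b --format=json-pretty

An empty "inconsistents" list from list-inconsistent-obj, as seen above, can simply mean that no deep scrub has completed since the last repair attempt, or that the damage sits in snapshot metadata rather than in the object data itself.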