I've never heard of or experienced such massive corruption, to be honest. Since you mentioned "yum update", I assume this is not a cephadm-managed cluster, is that correct? I wonder if there are leftovers from unfinished OSD maintenance, maybe some sort of mix-up between OSD directories and block devices (or their symlinks). It wouldn't be the first time: not too long ago I helped a customer get OSDs back up, and it turned out they had been down/out for quite a while but hadn't been cleaned up properly.
What you describe sounds different, though, but it might be worth checking anyway.
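
If you want to rule out such a mix-up, a quick look along these lines might already tell you something (just a sketch; osd.43 is only an example). Check where each data dir's block symlink points and what ceph-volume thinks belongs to which OSD:

# ls -l /var/lib/ceph/osd/ceph-*/block
# ceph-volume lvm list

and then compare the device label with the directory it is mounted under:

# cat /var/lib/ceph/osd/ceph-43/whoami
# ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-43/block

If the whoami/osd_uuid from the label don't match the directory, that would point to exactly that kind of mix-up.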

Quoting kefu chai <[email protected]>:

hi Eugen,

On Mon, Oct 13, 2025 at 2:03 AM Eugen Block <[email protected]> wrote:

Hi,

just a couple of days ago someone had the same issue:


https://lists.ceph.io/hyperkitty/list/[email protected]/thread/4D5QQGOKJNUITFVTZERGJXC5K3WY6FM4/#LAJSJRMXSFXFP4LUDYF5BHB4Y3MAZDSQ

Apparently, the cluster was upgraded while a pool deletion was in
progress. Is that the same case here?


Actually, that was my initial suspicion as well. I specifically asked them
about this possibility, but they confirmed that no pools were deleted
during the upgrade. Additionally, they mentioned that the system was
experiencing relatively low load at the time since the upgrade occurred
over the weekend.

However, the puzzling aspect is that several 'ghost' PGs appeared after the
upgrade. These weren't created due to misplacement; they seemingly
materialized out of nowhere, and some existing PGs disappeared. The only
plausible explanation is a corrupted objectstore, which scares me.
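
If we really are looking at a corrupted objectstore, one thing that might help narrow it down is a read-only consistency check on one of the affected OSDs (a sketch only; the OSD has to be stopped first, and osd.43 is just an example):

# ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-43

A deep fsck also exists but takes much longer, so the quick pass is probably the place to start.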


The OP of the other thread patched his OSD code to skip the check; I'm not
sure how risky that is. I'm also not sure how to get out of this situation.
One idea was to delete the PGs from the affected OSDs, but that can be
risky as well.
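
If anyone does go that route, the PG should at least be exported before it is removed, roughly like this (a sketch only; the OSD and PG IDs are placeholders taken from the logs further down, the OSD has to be stopped, and the export file needs enough free space):

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-51 --pgid 6.1f --op export --file /root/osd.51-pg6.1f.export
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-51 --pgid 6.1f --op remove --force

That way the PG could still be imported somewhere else later if it turns out to hold the only surviving copy of some objects.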

Btw., skipping a major release is supported and has been for a long
time, so upgrading from O to Q is in general totally okay. But one
should only upgrade if the cluster is healthy (all PGs active+clean).
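
A minimal pre-flight check for that before each batch could simply be (nothing fancy, just the usual status commands):

# ceph -s
# ceph pg stat

and only continue once everything reports active+clean.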


Thanks for pointing this out. I also realized that we do have a test for +2
upgrades; see https://github.com/ceph/ceph/tree/main/qa/suites/upgrade/reef-x.


Regards,
Eugen

Quoting kefu chai <[email protected]>:

> Hello Ceph community,
>
> I'm writing on behalf of a friend who is experiencing a critical cluster
> issue after upgrading and would appreciate any assistance.
>
> Environment:
>
>    - 5 MON nodes, 2 MGR nodes, 40 OSD servers (306 OSDs total)
>    - OS: CentOS 8.2 upgraded to 8.4
>    - Ceph: 15.2.17 upgraded to 17.2.7
>    - Upgrade method: yum update in rolling batches
>
> Timeline: The upgrade started on October 8th at 1:00 PM. We upgraded
> MON/MGR servers first, and then upgraded OSD nodes in batches of 5 nodes.
> The process appeared normal initially, but when approximately 10 OSD
> servers remained, OSDs began going down.
>
> MON Quorum Issue: When the OSDs began failing, the monitors failed to form
> a quorum. In an attempt to recover, we stopped 4 out of 5 monitors.
> However, the remaining monitor (mbjson20010) then failed to start due to a
> missing .ldb file. We eventually recovered this single monitor from the
> OSDs using the instructions at
> https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-mon/#mon-store-recovery-using-osds,
> so we now have only 1 MON in the cluster instead of the original 5.
>
> However, rebuilding the MON store did not help, and restarting the OSD
> servers also failed to resolve the issue. The cluster status remains
> problematic.
>
> Current Cluster Status:
>
>    - Only 1 MON daemon active (quorum: mbjson20010) - down from 5 MONs
>    - OSDs: 91 up / 229 in (out of 306 total)
>    - 88.872% of PGs are not active
>    - 4.779% of PGs are unknown
>    - 3,918 PGs down
>    - 1,311 PGs stale+down
>    - Only 12 PGs active+clean
>
> Critical Error: When examining OSD logs, we discovered that some OSDs are
> failing to start with the following error:
>
> osd.43 39677784 init missing pg_pool_t for deleted pool 9 for pg 9.3ds3;
> please downgrade to luminous and allow pg deletion to complete before
> upgrading
>
> Full error context from one of the failing OSDs:
>
> # tail  /var/log/ceph/ceph-osd.43.log
>
>     -7> 2025-10-12T13:40:05.987+0800 7fdd13259540  1
> bluestore(/var/lib/ceph/osd/ceph-43) _upgrade_super from 4, latest 4
>
>     -6> 2025-10-12T13:40:05.987+0800 7fdd13259540  1
> bluestore(/var/lib/ceph/osd/ceph-43) _upgrade_super done
>
>     -5> 2025-10-12T13:40:05.987+0800 7fdd13259540  2 osd.43 0 journal looks like ssd
>
>     -4> 2025-10-12T13:40:05.987+0800 7fdd13259540  2 osd.43 0 boot
>
>     -3> 2025-10-12T13:40:05.987+0800 7fdceb2cc700  5
> bluestore.MempoolThread(0x55c7b0c66b40) _resize_shards cache_size:
> 8589934592 kv_alloc: 1717986918 kv_used: 91136 kv_onode_alloc: 343597383
> kv_onode_used: 23328 meta_alloc: 6871947673 meta_used: 2984 data_alloc: 0
> data_used: 0
>
>     -2> 2025-10-12T13:40:05.989+0800 7fdd13259540 -1 osd.43 39677784 init
> missing pg_pool_t for deleted pool 9 for pg 9.3ds3; please downgrade to
> luminous and allow pg deletion to complete before upgrading
>
>     -1> 2025-10-12T13:40:05.991+0800 7fdd13259540 -1
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/osd/OSD.cc:
> In function 'int OSD::init()' thread 7fdd13259540 time 2025-10-12T13:40:05.990845+0800
>
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/osd/OSD.cc:
> 3735: ceph_abort_msg("abort() called")
>
> # tail  /var/log/ceph/ceph-osd.51.log
>   -7> 2025-10-12T13:39:36.739+0800 7f603e5f7540  1
> bluestore(/var/lib/ceph/osd/ceph-51) _upgrade_super from 4, latest 4
>     -6> 2025-10-12T13:39:36.739+0800 7f603e5f7540  1
> bluestore(/var/lib/ceph/osd/ceph-51) _upgrade_super done
>     -5> 2025-10-12T13:39:36.739+0800 7f603e5f7540  2 osd.51 0 journal looks like ssd
>     -4> 2025-10-12T13:39:36.739+0800 7f603e5f7540  2 osd.51 0 boot
>     -3> 2025-10-12T13:39:36.739+0800 7f6016669700  5
> bluestore.MempoolThread(0x55e839d4cb40) _resize_shards cache_size:
> 8589934592 kv_alloc: 1717986918 kv_used: 31232 kv_onode_alloc: 343597383
> kv_onode_used: 21584 meta_alloc: 6871947673 meta_used: 1168 data_alloc: 0
> data_used: 0
>     -2> 2025-10-12T13:39:36.741+0800 7f603e5f7540 -1 osd.51 39677784 init
> missing pg_pool_t for deleted pool 6 for pg 6.1f; please downgrade to
> luminous and allow pg deletion to complete before upgrading
>     -1> 2025-10-12T13:39:36.742+0800 7f603e5f7540 -1
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/osd/OSD.cc:
> In function 'int OSD::init()' thread 7f603e5f7540 time 2025-10-12T13:39:36.742527+0800
>
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.2.7/rpm/el8/BUILD/ceph-17.2.7/src/osd/OSD.cc:
> 3735: ceph_abort_msg("abort() called")
>
> Investigation Findings: We examined all OSD instances that failed to start.
> All of them exhibit the same error pattern in their logs, and all contain
> PG references to non-existent pools. For example, running
> "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-51 --op list-pgs"
> shows PG references to pools that no longer exist (e.g., pool 9, pool 10,
> pool 4, pool 6, pool 8), while the current pools are numbered 101, 140,
> 141, 149, 212, 213, 216, 217, 218, 219. Notably, each affected OSD contains
> only 2-3 PGs referencing these non-existent pools, which is significantly
> fewer than the hundreds of PGs a regular OSD typically contains.
>
> 2 PGs referencing non-existent pools were found in osd.51:
>
> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-51 --op list-pgs
> 1.0
> 6.1f
>
> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-51 --op list
> Error getting attr on : 1.0_head,#1:00000000::::head#, (61) No data available
> Error getting attr on : 6.1f_head,#6:f8000000::::head#, (61) No data available
> ["1.0",{"oid":"","key":"","snapid":-2,"hash":0,"max":0,"pool":1,"namespace":"","max":0}]
> ["1.0",{"oid":"main.db-journal.0000000000000000","key":"","snapid":-2,"hash":1969844440,"max":0,"pool":1,"namespace":"devicehealth","max":0}]
> ["1.0",{"oid":"main.db.0000000000000000","key":"","snapid":-2,"hash":1315310604,"max":0,"pool":1,"namespace":"devicehealth","max":0}]
> ["6.1f",{"oid":"","key":"","snapid":-2,"hash":31,"max":0,"pool":6,"namespace":"","max":0}]
>
> We also performed a comprehensive check by listing all PGs from all OSD
> nodes using "ceph-objectstore-tool --op list-pgs" and comparing the results
> with the output of "ceph pg dump". This comparison revealed that quite a
> few PGs are missing from the OSD listings. We suspect that some OSDs that
> previously held these missing PGs may now be corrupted, which would explain
> both the missing PGs and the widespread cluster degradation.
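>
> For illustration, one rough way to script that comparison (a sketch only;
> the paths are examples, ceph-objectstore-tool needs the OSDs stopped, and
> the per-node lists have to be merged before comparing). On each OSD node,
> collect the PGs that exist on disk:
>
> # for osd in /var/lib/ceph/osd/ceph-*; do ceph-objectstore-tool --data-path "$osd" --op list-pgs; done | sort -u > /tmp/pgs_on_disk.txt
>
> Then, on a node with an admin keyring, collect the PGs the cluster expects
> and diff the two lists:
>
> # ceph pg dump pgs_brief 2>/dev/null | awk '$1 ~ /^[0-9]+\./ {print $1}' | sort -u > /tmp/pgs_expected.txt
> # comm -13 /tmp/pgs_on_disk.txt /tmp/pgs_expected.txt
>
> The last command prints the PGs that the cluster expects but that no
> list-pgs output contains.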
>
> It appears the OSD objectstore's metadata has been corrupted or overwritten
> with stale references to deleted pools from previous operations, preventing
> these OSDs from starting and causing widespread PG state abnormalities
> across the cluster.
>
> Questions:
>
>    1. How can we safely restore the missing PGs from the OSDs without data
>    loss?
>    2. Has anyone encountered similar issues when upgrading from Octopus
>    (15.2.x) to Quincy (17.2.x)?
>
> We understand that skipping major versions may not be officially supported,
> but we urgently need guidance on the safest recovery path at this point.
>
> Any help would be greatly appreciated. Thank you in advance.
>
> --
> Regards
> Kefu Chai



--
Regards
Kefu Chai


_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
