Hi,

I'm a user who suffered a problem with orphans years ago.

After much research, we discovered that for some reason the WAL/DB (metadata) entries were being deleted or corrupted, but the data on the disks was never physically erased. Sometimes the garbage collector (deferred delete) would fail and skip the deletion, leaving hundreds of TB behind. Speaking with other heavy Ceph users, they were aware of this and couldn't find a great solution either; instead of replica 3 they simply used replica 4 (big customers with big budgets).

At the time we were presented with two options: wipe each disk and let Ceph rebuild only the data it knows is valid (that takes time, though in your case, with a full-NVMe cluster, it may not take too long), or create a new cluster and move the valid data over. In our case the Ceph orphan tool started looping due to bugs and didn't provide a real solution: a 1 PB cluster with roughly 300 TB orphaned. I remember the orphan tool running for weeks ☹. A rough time.

Our use case was S3 (RGW), on versions 12 to 14 (Luminous to Nautilus).

Sometimes, as administrators, we don't care about these issues until we need to wipe a lot of data, do the simple maths, and the numbers don't match.
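In case the "simple calc" comparison helps anyone, here is a rough sketch of what we mean. It assumes an RGW data pool named default.rgw.buckets.data and jq installed (both are assumptions, adjust for your cluster), and JSON field names vary a bit between releases:

    # What RADOS reports for the RGW data pool (look at stored / bytes_used)
    ceph df --format=json | jq '.pools[] | select(.name == "default.rgw.buckets.data") | .stats'

    # What RGW accounts for across all buckets (sum of size_actual per usage category)
    radosgw-admin bucket stats --format=json | jq '[.[].usage | to_entries[] | .value.size_actual // 0] | add'

    # Deferred deletes still waiting for the garbage collector
    radosgw-admin gc list --include-all | head

A gap between the first two numbers that keeps growing and is not explained by pending GC entries was exactly our symptom. On our releases the orphan scan was radosgw-admin orphans find / orphans finish; newer releases ship the rgw-orphan-list script instead. Both only produce candidate lists and run for a very long time on big pools, so review the output before deleting anything.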
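And on the rm_data_pool behaviour shown below: since the command will not stop you even when the pool is still referenced by files, a quick sanity check that the pool is really empty at the RADOS level is worth doing first. A minimal sketch, with the pool name taken from Alexander's test below:

    # Any output at all means objects still live in the pool
    rados -p badfs_evilpool ls | head

    # Per-pool object and byte counts
    rados df | grep badfs_evilpool

Only if both come back empty would I trust that removing the pool loses nothing.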
Regards,

-----Original Message-----
From: Alexander Patrakov <[email protected]>
Sent: Thursday, October 2, 2025 22:56
To: Anthony D'Atri <[email protected]>
Cc: [email protected]
Subject: [ceph-users] Re: Orphaned CephFS objects

On Thu, Oct 2, 2025 at 9:45 PM Anthony D'Atri <[email protected]> wrote:

> There is design work for a future ability to migrate a pool transparently,
> for example to effect a new EC profile, but that won't be available anytime
> soon.

This is, unfortunately, irrelevant in this case. Migrating a pool will migrate all the objects and their snapshots, even the unwanted ones. What Trey has (as far as I understood) is that there are some RADOS-level snapshots that do not correspond to any CephFS-level snapshots and are thus garbage, not to be migrated. That's why the talk is about file migration and not pool-level operations.

Now to the original question:

> will I be able to do 'ceph fs rm_data_pool' once there are no longer any
> objects associated with the CephFS instance on the pool, or will the MDS
> have ghost object records that cause the command to balk?

Just tested in a test cluster - it won't balk and won't demand force even if you remove a pool that is actually used by files. So beware.

$ ceph osd pool create badfs_evilpool 32 ssd-only
pool 'badfs_evilpool' created
$ ceph fs add_data_pool badfs badfs_evilpool
added data pool 38 to fsmap
$ ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data cephfs_data_wrongpool cephfs_data_rightpool cephfs_data_hdd ]
name: badfs, metadata pool: badfs_metadata, data pools: [badfs_data badfs_evilpool ]
$ cephfs-shell -f badfs
CephFS:~/>>> ls
dir1/  dir2/
CephFS:~/>>> mkdir evil
CephFS:~/>>> setxattr evil ceph.dir.layout.pool badfs_evilpool
ceph.dir.layout.pool is successfully set to badfs_evilpool
CephFS:~/>>> put /usr/bin/ls /evil/ls
$ ceph fs rm_data_pool badfs badfs_evilpool
removed data pool 38 from fsmap

--
Alexander Patrakov
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
