Hi, 

Here is a user who ran into a problem with orphans years ago.

Years ago, after much research, we discovered that for some reason the 
WAL/DB metadata entries were being deleted or corrupted, but the data on the 
disks wasn't physically erased. 
Sometimes the garbage collector (deferred delete) would fail and skip the 
deletion, leaving hundreds of TB behind.
Speaking with other heavy Ceph users, they were aware of this and couldn't find 
a great solution either; instead of replica 3 they simply used replica 4 (big 
customers with big budgets).
At the time, we were presented with two options: wipe each disk and let Ceph 
rebuild only the data it knows is valid, which takes time (though in your case, 
on full NVMe, it probably won't take too long), or create a new cluster and 
move the valid data across.
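
For reference, the per-disk wipe cycle on a replicated cluster looked roughly 
like the sketch below. This is from memory; the OSD id and device path are 
placeholders, so adapt it to your own deployment:

# take one OSD out and let Ceph backfill from the surviving replicas
$ ceph osd out 12
$ ceph -s    # wait until all PGs are active+clean again
# confirm Ceph considers the data safe elsewhere before destroying it
$ ceph osd safe-to-destroy osd.12
# stop, purge and wipe the device
$ systemctl stop ceph-osd@12
$ ceph osd purge 12 --yes-i-really-mean-it
$ ceph-volume lvm zap --destroy /dev/sdX
# recreate the OSD on the clean disk, then move on to the next one
$ ceph-volume lvm create --data /dev/sdX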

In our case the Ceph orphan tool started looping due to bugs and didn't provide 
a real solution; we had a 1 PB cluster with approximately 300 TB orphaned.

I remember the orphan tool running for weeks ☹ rough times.

Our Ceph use case: S3, on versions 12 (Luminous) through 14 (Nautilus)...
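
For anyone hitting the same thing: since our use case was S3, the orphan tool 
in question is the radosgw-admin orphans scan that shipped with those releases 
(later deprecated in favour of the rgw-orphan-list script). A minimal sketch of 
how it is driven; the pool and job names below are placeholders:

$ radosgw-admin orphans find --pool=default.rgw.buckets.data --job-id=orphan-scan-1
$ radosgw-admin orphans list-jobs
$ radosgw-admin orphans finish --job-id=orphan-scan-1

On a cluster of our size each find run took a very long time, which is why it 
ended up running for weeks.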

Sometimes we as administrators don't care about these issues until we need to 
wipe a lot of data, do the simple math, and the numbers don't match.

Regards,

-----Original Message-----
From: Alexander Patrakov <[email protected]> 
Sent: Thursday, October 2, 2025 22:56
To: Anthony D'Atri <[email protected]>
CC: [email protected]
Subject: [ceph-users] Re: Orphaned CephFS objects

On Thu, Oct 2, 2025 at 9:45 PM Anthony D'Atri <[email protected]> wrote:

> There is design work for a future ability to migrate a pool transparently, 
> for example to effect a new EC profile, but that won't be available anytime 
> soon.

This is, unfortunately, irrelevant in this case. Migrating a pool will
migrate all the objects and their snapshots, even the unwanted ones.
What Trey has (as far as I understood) is that there are some
RADOS-level snapshots that do not correspond to any CephFS-level
snapshots and are thus garbage, not to be migrated.
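(To make that concrete: CephFS data objects are named <inode-hex>.<block-hex>, 
and one way to see whether a particular object still carries RADOS-level 
snapshot clones is something like the following, with the pool and object name 
being examples only:

$ rados -p cephfs_data listsnaps 10000000000.00000000

An object whose listed clones do not map to any snapshot known to the MDS is 
exactly the kind of garbage that a pool migration would happily carry along.)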

That's why the talk is about file migration and not pool-level operations.

Now to the original question:

> will I be able to do 'ceph fs rm_data_pool'  once there are no longer any
> objects associated with the CephFS instance on the pool, or will the MDS
> have ghost object records that cause the command to balk?

Just tested in a test cluster - it won't balk and won't demand force
even if you remove a pool that is actually used by files. So beware.

$ ceph osd pool create badfs_evilpool 32 ssd-only
pool 'badfs_evilpool' created
$ ceph fs add_data_pool badfs badfs_evilpool
added data pool 38 to fsmap
$ ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data
cephfs_data_wrongpool cephfs_data_rightpool cephfs_data_hdd ]
name: badfs, metadata pool: badfs_metadata, data pools: [badfs_data
badfs_evilpool ]
$ cephfs-shell -f badfs
CephFS:~/>>> ls
dir1/   dir2/
CephFS:~/>>> mkdir evil
CephFS:~/>>> setxattr evil ceph.dir.layout.pool badfs_evilpool
ceph.dir.layout.pool is successfully set to badfs_evilpool
CephFS:~/>>> put /usr/bin/ls /evil/ls
$ ceph fs rm_data_pool badfs badfs_evilpool
removed data pool 38 from fsmap
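
So before running rm_data_pool for real, it is worth checking that nothing 
still points at the pool; a rough sketch, with the mount point, directory and 
pool name as placeholders:

$ getfattr -n ceph.dir.layout.pool /mnt/badfs/evil     # no directory layout should name the pool
$ getfattr -n ceph.file.layout.pool /mnt/badfs/evil/ls
$ rados -p badfs_evilpool ls | head                    # should print nothing if the pool is unused
$ ceph df                                              # the pool's object count should be zero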

-- 
Alexander Patrakov
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]