FWIW, my understanding is that this RGW issue was fixed several releases ago.  
The OP’s cluster IIRC is mostly CephFS, so I suspect something else is going on.


> On Oct 2, 2025, at 7:29 PM, Manuel Rios - EDH <[email protected]> 
> wrote:
> 
> Hi, 
> 
> Here's a user who suffered a problem with orphans years ago.
> 
> Years ago, after much research, we discovered that for some reason the 
> WAL/DB/metadata entries were being deleted and corrupted, but the data on the 
> disks wasn't physically erased. 
> Sometimes the garbage collector (deferred delete) would fail and skip the 
> deletion, leaving hundreds of TB behind.
> Speaking with other heavy Ceph users, they were aware of this and couldn't 
> find a great solution either; instead of replica 3 they simply used replica 4 
> (big customers with big budgets).
> At the time, we were presented with two options: wipe each disk and let Ceph 
> rebuild only the data it knows is valid, which takes time (though in your 
> case, on all-NVMe, it probably won't take too long), or create a new cluster 
> and move the valid data across.
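
Regarding the deferred-delete / GC failures described above: as a generic
sanity check (not specific to Manuel's cluster), the stock radosgw-admin GC
commands show whether deferred deletes are piling up:

$ radosgw-admin gc list --include-all   # pending deferred-delete entries
$ radosgw-admin gc process              # trigger a GC pass now

If the list keeps growing across passes, GC is falling behind or failing.
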
> 
> In our case the orphan tool started looping due to bugs and didn't provide a 
> real solution; we had roughly 1 PB of Ceph with approx. 300 TB orphaned.
> 
> I remember the orphan tool running for weeks ☹ a rough time.
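
For reference, and purely as a generic sketch rather than what Manuel ran: on
older releases (Luminous/Nautilus) the orphan scan lived in radosgw-admin,
while newer releases ship the rgw-orphan-list script instead. The pool name
below is just the default RGW data pool, as an example:

$ radosgw-admin orphans find --pool=default.rgw.buckets.data --job-id=orphans1
$ radosgw-admin orphans list-jobs
$ rgw-orphan-list default.rgw.buckets.data   # replacement tool in newer releases
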
> 
> Our Ceph use case: S3, on versions 12 (Luminous) through 14 (Nautilus)...
> 
> Sometimes, as administrators, we don't care about these issues until we need 
> to wipe a lot of data, do some simple math, and the numbers don't match.
> 
> Regards,
> 
> -----Original Message-----
> From: Alexander Patrakov <[email protected]> 
> Sent: Thursday, October 2, 2025 22:56
> To: Anthony D'Atri <[email protected]>
> Cc: [email protected]
> Subject: [ceph-users] Re: Orphaned CephFS objects
> 
> On Thu, Oct 2, 2025 at 9:45 PM Anthony D'Atri <[email protected]> wrote:
> 
>> There is design work for a future ability to migrate a pool transparently, 
>> for example to effect a new EC profile, but that won't be available anytime 
>> soon.
> 
> This is, unfortunately, irrelevant in this case. Migrating a pool will
> migrate all the objects and their snapshots, even the unwanted ones.
> What Trey has (as far as I understood) is that there are some
> RADOS-level snapshots that do not correspond to any CephFS-level
> snapshots and are thus garbage, not to be migrated.
> 
> That's why the talk is about file migration and not pool-level operations.
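
To make that concrete with a generic example (the object and pool names below
are hypothetical, not Trey's): each CephFS data object is named
<inode-in-hex>.<block-index>, and rados listsnaps shows whether it still
carries RADOS-level snapshot clones:

$ rados -p cephfs_data listsnaps 10000000000.00000000

Clones listed for snap IDs that no longer correspond to any CephFS snapshot
are exactly the garbage Alexander is describing.
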
> 
> Now to the original question:
> 
>> will I be able to do 'ceph fs rm_data_pool'  once there are no longer any
>> objects associated with the CephFS instance on the pool, or will the MDS
>> have ghost object records that cause the command to balk?
> 
> Just tested in a test cluster - it won't balk and won't demand force
> even if you remove a pool that is actually used by files. So beware.
> 
> $ ceph osd pool create badfs_evilpool 32 ssd-only
> pool 'badfs_evilpool' created
> $ ceph fs add_data_pool badfs badfs_evilpool
> added data pool 38 to fsmap
> $ ceph fs ls
> name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data
> cephfs_data_wrongpool cephfs_data_rightpool cephfs_data_hdd ]
> name: badfs, metadata pool: badfs_metadata, data pools: [badfs_data
> badfs_evilpool ]
> $ cephfs-shell -f badfs
> CephFS:~/>>> ls
> dir1/   dir2/
> CephFS:~/>>> mkdir evil
> CephFS:~/>>> setxattr evil ceph.dir.layout.pool badfs_evilpool
> ceph.dir.layout.pool is successfully set to badfs_evilpool
> CephFS:~/>>> put /usr/bin/ls /evil/ls
> $ ceph fs rm_data_pool badfs badfs_evilpool
> removed data pool 38 from fsmap
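
Not part of Alexander's test, just a sketch of the checks an administrator
might run before rm_data_pool, since the command itself won't balk (mount
point and paths below are examples):

$ ceph df detail | grep badfs_evilpool          # object count should be 0
$ getfattr -n ceph.dir.layout.pool /mnt/badfs/evil
$ getfattr -n ceph.file.layout.pool /mnt/badfs/evil/ls
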
> 
> -- 
> Alexander Patrakov
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
