Frédéric,

Thanks so much for looking into this.

The documentation isn't all that clear, but my impression has been that
pool snapshots are an entirely different thing from CephFS snapshots.

At least that's what the documentation suggests, and it sounds like the bug
report you posted is dealing with mon-managed snapshots?

To avoid snap id collision between mon-managed snapshots and file system
snapshots, pools with mon-managed snapshots are not allowed to be attached
to a file system. Also, mon-managed snapshots can’t be created in pools
already attached to a file system either.
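
For reference, this is roughly how I've been trying to tell the two apart
(the pool name is a placeholder, and I may well be misreading the output):

$ rados lssnap -p <pool_name>    # pool-level (mon-managed) snapshots
$ ceph fs ls                     # which pools are attached to a file system
$ ceph osd pool ls detail        # per-pool details (flags, snapshot-related fields)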

I'd love for my impression to be incorrect and to be able to fix it this
way, though!

Thanks again,

Trey



On Mon, Oct 6, 2025 at 10:26 AM Frédéric Nass <[email protected]>
wrote:

> Hi Greg,
>
> This one? https://tracker.ceph.com/issues/64646
>
> Symptoms:
> - CLONES are reported by 'rados df' while the pool has no snapshots.
> - 'rados lssnap -p <pool_name>' shows no snapshots, but some clones are
> listed by 'rados listsnaps -p <pool_name> <object_name>', sometimes even
> with no 'head' object.
>
> @Trey, if this is the one (make sure it is before running the command),
> running 'ceph osd pool force-remove-snap <pool_name>' should put all leaked
> clone objects back in the trim queue, and the OSDs should then get rid of
> them.
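>
> As a rough sketch of the checks to run first (pool and object names are
> placeholders):
>
> $ rados df                                      # look for a non-zero CLONES count on the pool
> $ rados lssnap -p <pool_name>                   # should show no snapshots
> $ rados listsnaps -p <pool_name> <object_name>  # clones listed despite no snapshots
> $ ceph osd pool force-remove-snap <pool_name>   # only if the symptoms above match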
>
> Regards,
> Frédéric.
>
>
> Frédéric Nass
>
> Senior Ceph Engineer
>
> Ceph Ambassador, France
>
>   +49 89 215252-751
>
>   [email protected]
>
>   www.clyso.com
>
>   Hohenzollernstr. 27, 80801 Munich
>
> Utting a. A. | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE2754306
>
>
>
>
> On Fri, Oct 3, 2025 at 05:19, Gregory Farnum <[email protected]> wrote:
>
>> There was a RADOS bug with trimming deleted snapshots, but I don’t have
>> the
>> tracker ticket for more details — it involved some subtleties with one of
>> the snaptrim reworks. Since CephFS doesn’t have any link to the snapshots,
>> I suspect it was that.
>> -Greg
>>
>> On Thu, Oct 2, 2025 at 5:08 PM Anthony D'Atri <[email protected]>
>> wrote:
>>
>> > FWIW, my understanding is that this RGW issue was fixed several releases
>> > ago. The OP’s cluster IIRC is mostly CephFS, so I suspect something else
>> > is going on.
>> >
>> >
>> > > On Oct 2, 2025, at 7:29 PM, Manuel Rios - EDH <[email protected]> wrote:
>> > >
>> > > Hi,
>> > >
>> > > Here's a user who suffered a problem with orphans years ago.
>> > >
>> > > Years ago, after much research, we discovered that for some reason the
>> > > WAL/DB metadata entries were being deleted and corrupted, but the data
>> > > on the disks wasn't physically erased.
>> > > Sometimes the garbage collector (deferred delete) would fail and skip
>> > > the deletion, leaving hundreds of TB behind.
>> > > Speaking with other heavy Ceph users, they were aware of this and
>> > > couldn't find a great solution either; instead of replica 3, they used
>> > > replica 4 (big customers with big budgets).
>> > > At the time, we were presented with two options: wipe each disk and let
>> > > Ceph rebuild only the data it knows is valid, which takes time (though
>> > > in your case, with a full-NVMe cluster, maybe not too much), or create
>> > > a new cluster and move the valid data.
>> > >
>> > > In our case, the orphan tool started looping due to bugs and didn't
>> > > provide a real solution; our cluster was 1 PB, with approximately
>> > > 300 TB orphaned.
>> > >
>> > > I remember the orphan tool running for weeks ☹ a rough time.
>> > >
>> > > Our Ceph use case: S3, versions 12 through 14 (Nautilus)...
>> > >
>> > > Sometimes, as administrators, we don't care about these issues until
>> > > we need to wipe a lot of data, do some simple math, and find the
>> > > numbers don't match.
>> > >
>> > > Regards,
>> > >
>> > > -----Original Message-----
>> > > From: Alexander Patrakov <[email protected]>
>> > > Sent: Thursday, October 2, 2025 22:56
>> > > To: Anthony D'Atri <[email protected]>
>> > > CC: [email protected]
>> > > Subject: [ceph-users] Re: Orphaned CephFS objects
>> > >
>> > > On Thu, Oct 2, 2025 at 9:45 PM Anthony D'Atri <[email protected]> wrote:
>> > >
>> > >> There is design work for a future ability to migrate a pool
>> > >> transparently, for example to effect a new EC profile, but that won't
>> > >> be available anytime soon.
>> > >
>> > > This is, unfortunately, irrelevant in this case. Migrating a pool will
>> > > migrate all the objects and their snapshots, even the unwanted ones.
>> > > What Trey has (as far as I understood) is that there are some
>> > > RADOS-level snapshots that do not correspond to any CephFS-level
>> > > snapshots and are thus garbage, not to be migrated.
>> > >
>> > > That's why the discussion is about file migration and not pool-level
>> > > operations.
>> > >
>> > > Now to the original question:
>> > >
>> > >> will I be able to do 'ceph fs rm_data_pool' once there are no longer
>> > >> any objects associated with the CephFS instance on the pool, or will
>> > >> the MDS have ghost object records that cause the command to balk?
>> > >
>> > > Just tested in a test cluster - it won't balk and won't demand force
>> > > even if you remove a pool that is actually used by files. So beware.
>> > >
>> > > $ ceph osd pool create badfs_evilpool 32 ssd-only
>> > > pool 'badfs_evilpool' created
>> > > $ ceph fs add_data_pool badfs badfs_evilpool
>> > > added data pool 38 to fsmap
>> > > $ ceph fs ls
>> > > name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data
>> > > cephfs_data_wrongpool cephfs_data_rightpool cephfs_data_hdd ]
>> > > name: badfs, metadata pool: badfs_metadata, data pools: [badfs_data
>> > > badfs_evilpool ]
>> > > $ cephfs-shell -f badfs
>> > > CephFS:~/>>> ls
>> > > dir1/   dir2/
>> > > CephFS:~/>>> mkdir evil
>> > > CephFS:~/>>> setxattr evil ceph.dir.layout.pool badfs_evilpool
>> > > ceph.dir.layout.pool is successfully set to badfs_evilpool
>> > > CephFS:~/>>> put /usr/bin/ls /evil/ls
>> > > $ ceph fs rm_data_pool badfs badfs_evilpool
>> > > removed data pool 38 from fsmap
>> > >
>> > > --
>> > > Alexander Patrakov
>> >
>>
>
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
