> There was a RADOS bug with trimming deleted snapshots, but I don’t have the
> tracker ticket for more details — it involved some subtleties with one of
> the snaptrim reworks. Since CephFS doesn’t have any link to the snapshots,
> I suspect it was that.
> -Greg

This sounds like exactly what it was.  We deleted even more snapshots on a
cluster running a newer version, and it didn't have this problem.
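
In case it's useful to anyone else hitting this, one way to spot-check for
leftover snapshot data at the RADOS level is roughly the following (the pool
and object names are just placeholders, not from our cluster):

$ rados -p cephfs_data listsnaps 10000000000.00000000   # clones still held for one data object
$ ceph osd pool ls detail                               # look for a non-empty removed_snaps_queue on the data pool

I'm not sure the removed_snaps_queue output looks identical on every release,
so treat that as a rough sanity check rather than gospel.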

It doesn't *completely* make sense to me that CephFS has no link to the
snapshots, given that you access them via directories and create and remove
them via mkdir and rmdir, which I would think must mean the MDS is aware of
them?
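
(To be concrete about what I mean by mkdir/rmdir: snapshots are driven
through the special .snap directory on the client mount. The paths below are
hypothetical.)

$ mkdir /mnt/cephfs/somedir/.snap/before-cleanup    # create a snapshot of somedir
$ ls /mnt/cephfs/somedir/.snap                      # list that directory's snapshots
$ rmdir /mnt/cephfs/somedir/.snap/before-cleanup    # delete the snapshot again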

But I claim no sophisticated knowledge.  I'm fairly new to CephFS; most of
my Ceph experience (going back almost a decade) is with RadosGW.
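
One more note, on the rm_data_pool question further down the thread: given
Alexander's test showing that 'ceph fs rm_data_pool' won't balk even when the
pool is still referenced, it's probably worth confirming the old pool is
actually empty before removing it.  Something along these lines (the pool and
filesystem names are examples only):

$ ceph df detail | grep cephfs_data_old        # object count for the pool should be at or near zero
$ rados -p cephfs_data_old ls --all | head     # double-check that nothing is still listed
$ ceph fs rm_data_pool cephfs cephfs_data_old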

Thanks,

Trey

On Thu, Oct 2, 2025 at 11:28 PM Gregory Farnum <[email protected]> wrote:

> There was a RADOS bug with trimming deleted snapshots, but I don’t have the
> tracker ticket for more details — it involved some subtleties with one of
> the snaptrim reworks. Since CephFS doesn’t have any link to the snapshots,
> I suspect it was that.
> -Greg
>
> On Thu, Oct 2, 2025 at 5:08 PM Anthony D'Atri <[email protected]>
> wrote:
>
> > FWIW, my understanding is that this RGW issue was fixed several releases
> > ago.  The OP’s cluster IIRC is mostly CephFS, so I suspect something else
> > is going on.
> >
> >
> > > On Oct 2, 2025, at 7:29 PM, Manuel Rios - EDH <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > Here's a user who went through a problem with orphans years ago.
> > >
> > > Years ago, after much research, we discovered that for some reason the
> > > WAL/DB metadata entries were being deleted or corrupted, but the data on
> > > the disks wasn't physically erased.
> > > Sometimes the garbage collector (deferred delete) would fail and skip
> > > the deletion, leaving hundreds of TB behind.
> > > Speaking with other heavy Ceph users, they were aware of this and
> > > couldn't find a great solution either; instead of replica 3 they simply
> > > used replica 4 (big customers with big budgets).
> > > At the time, we were presented with two options: wipe each disk and let
> > > Ceph rebuild only the data it knows is valid, which takes time (though in
> > > your case, on all-NVMe, it may not take that long), or create a new
> > > cluster and move the valid data over.
> > >
> > > In our case the orphan-finding tool started looping due to bugs and
> > > didn't provide a real solution; ours was a 1 PB cluster with roughly
> > > 300 TB orphaned.
> > >
> > > I remember the orphan tool running for weeks ☹, a rough time.
> > >
> > > Our Ceph use case: S3, versions 12 through 14 (Nautilus)...
> > >
> > > Sometimes, as administrators, we don't care about these issues until we
> > > need to wipe a lot of data, do the simple math, and the numbers don't match.
> > >
> > > Regards,
> > >
> > > -----Original Message-----
> > > From: Alexander Patrakov <[email protected]>
> > > Sent: Thursday, October 2, 2025 22:56
> > > To: Anthony D'Atri <[email protected]>
> > > CC: [email protected]
> > > Subject: [ceph-users] Re: Orphaned CephFS objects
> > >
> > > On Thu, Oct 2, 2025 at 9:45 PM Anthony D'Atri <[email protected]> wrote:
> > >
> > >> There is design work for a future ability to migrate a pool
> > >> transparently, for example to effect a new EC profile, but that won't be
> > >> available anytime soon.
> > >
> > > This is, unfortunately, irrelevant in this case. Migrating a pool will
> > > migrate all the objects and their snapshots, even the unwanted ones.
> > > What Trey has (as far as I understood) is that there are some
> > > RADOS-level snapshots that do not correspond to any CephFS-level
> > > snapshots and are thus garbage, not to be migrated.
> > >
> > > That's why the talk is about file migration and not pool-level operations.
> > >
> > > Now to the original question:
> > >
> > >> will I be able to do 'ceph fs rm_data_pool' once there are no longer any
> > >> objects associated with the CephFS instance on the pool, or will the MDS
> > >> have ghost object records that cause the command to balk?
> > >
> > > I just tested this in a test cluster: it won't balk and won't demand force,
> > > even if you remove a pool that is actually used by files. So beware.
> > >
> > > $ ceph osd pool create badfs_evilpool 32 ssd-only
> > > pool 'badfs_evilpool' created
> > > $ ceph fs add_data_pool badfs badfs_evilpool
> > > added data pool 38 to fsmap
> > > $ ceph fs ls
> > > name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data
> > > cephfs_data_wrongpool cephfs_data_rightpool cephfs_data_hdd ]
> > > name: badfs, metadata pool: badfs_metadata, data pools: [badfs_data
> > > badfs_evilpool ]
> > > $ cephfs-shell -f badfs
> > > CephFS:~/>>> ls
> > > dir1/   dir2/
> > > CephFS:~/>>> mkdir evil
> > > CephFS:~/>>> setxattr evil ceph.dir.layout.pool badfs_evilpool
> > > ceph.dir.layout.pool is successfully set to badfs_evilpool
> > > CephFS:~/>>> put /usr/bin/ls /evil/ls
> > > $ ceph fs rm_data_pool badfs badfs_evilpool
> > > removed data pool 38 from fsmap
> > >
> > > --
> > > Alexander Patrakov
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
