On Fri, Oct 3, 2025 at 8:26 AM Trey Palmer <[email protected]> wrote:

> > There was a RADOS bug with trimming deleted snapshots, but I don’t have
> > the tracker ticket for more details — it involved some subtleties with
> > one of the snaptrim reworks. Since CephFS doesn’t have any link to the
> > snapshots, I suspect it was that.
> > -Greg
>
> This sounds like exactly what it was.  And we deleted even more on a
> cluster on a newer version, and it didn't have this problem.
>
> It doesn't *completely* make sense to me that CephFS has no link to the
> snapshots, given that you access them via directories, and create and
> remove via mkdir and rmdir, which I would think surely must mean the MDS is
> aware of them?
>

The MDS/client created the snapshotted data, but deleting a snapshot is
just a metadata update: it tells the monitor “please delete snapshot ID
1234”, and the monitor puts 1234 in the osdmap as a snapshot to be deleted.
Then the OSDs do that work asynchronously, outside of CephFS's supervision.
So that deletion must have happened, and this bug in RADOS meant the
snapshot didn’t actually get (fully) trimmed.
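
If you want to check whether any of that trimming is still outstanding,
something like this should show it (a rough sketch; the pool name is just an
example and the exact output differs between releases):

$ ceph osd pool ls detail | grep -A1 cephfs_data
# look for removed_snaps / removed_snaps_queue entries on the data pool
$ ceph pg dump pgs_brief | grep snaptrim
# PGs still doing the trimming show up in "snaptrim" or "snaptrim_wait" state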
-Greg


> But I claim no sophisticated knowledge. I'm fairly new to CephFS; most of
> my Ceph experience (going back almost a decade) is with RadosGW.
>
> Thanks,
>
> Trey
>
> On Thu, Oct 2, 2025 at 11:28 PM Gregory Farnum <[email protected]> wrote:
>
>> There was a RADOS bug with trimming deleted snapshots, but I don’t have
>> the
>> tracker ticket for more details — it involved some subtleties with one of
>> the snaptrim reworks. Since CephFS doesn’t have any link to the snapshots,
>> I suspect it was that.
>> -Greg
>>
>> On Thu, Oct 2, 2025 at 5:08 PM Anthony D'Atri <[email protected]>
>> wrote:
>>
>> > FWIW, my understanding is that this RGW issue was fixed several releases
>> > ago. The OP’s cluster IIRC is mostly CephFS, so I suspect something else
>> > is going on.
>> >
>> >
>> > > On Oct 2, 2025, at 7:29 PM, Manuel Rios - EDH <[email protected]> wrote:
>> > >
>> > > Hi,
>> > >
>> > > Here is a user who suffered a problem with orphans years ago.
>> > >
>> > > Years ago, after much research, we discovered that for some reason the
>> > > WAL/DB metadata entries were being deleted and corrupted, but the data
>> > > on the disks was not physically erased.
>> > > Sometimes the garbage collector (deferred delete) would fail and skip
>> > > the deletion, leaving hundreds of TB behind.
>> > > Speaking with other heavy Ceph users, they were aware of this and
>> > > couldn't find a great solution either; instead of replica 3 they simply
>> > > used replica 4 (big customers with big budgets).
>> > > At the time, we were presented with two options: wipe each disk and let
>> > > Ceph rebuild only the data it knows is valid, which would take time
>> > > (though in your case, being all-NVMe, perhaps not too much), or create
>> > > a new cluster and move the valid data over.
>> > >
>> > > In our case the RGW orphans tool started looping due to bugs and didn't
>> > > provide a real solution; ours was a 1 PB cluster with approximately
>> > > 300 TB orphaned.
>> > >
>> > > I remember the orphans tool running for weeks ☹ a rough time.
>> > >
>> > > Our Ceph use case: S3, versions 12 (Luminous) to 14 (Nautilus)...
>> > >
>> > > Sometimes, as administrators, we don't care about these issues until we
>> > > need to wipe a lot of data, do the simple math, and the numbers don't
>> > > match.
>> > >
>> > > Regards,
>> > >
>> > > -----Original Message-----
>> > > From: Alexander Patrakov <[email protected]>
>> > > Sent: Thursday, October 2, 2025 22:56
>> > > To: Anthony D'Atri <[email protected]>
>> > > CC: [email protected]
>> > > Subject: [ceph-users] Re: Orphaned CephFS objects
>> > >
>> > > On Thu, Oct 2, 2025 at 9:45 PM Anthony D'Atri <[email protected]> wrote:
>> > >
>> > >> There is design work for a future ability to migrate a pool
>> > >> transparently, for example to effect a new EC profile, but that won't
>> > >> be available anytime soon.
>> > >
>> > > This is, unfortunately, irrelevant in this case. Migrating a pool will
>> > > migrate all the objects and their snapshots, even the unwanted ones.
>> > > What Trey has (as far as I understood) is that there are some
>> > > RADOS-level snapshots that do not correspond to any CephFS-level
>> > > snapshots and are thus garbage, not to be migrated.
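>> > >
>> > > One way to see that mismatch (just a sketch; the mount path and object
>> > > name here are only examples) is to compare the snapshots CephFS shows
>> > > to clients with the clones RADOS is still keeping for an object:
>> > >
>> > > $ ls /mnt/cephfs/some/dir/.snap   # CephFS-level snapshots
>> > > $ rados -p cephfs_data listsnaps 10000000abc.00000000   # RADOS-level clones of one object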
>> > >
>> > > That's why we are talking about file migration here and not pool-level
>> > > operations.
>> > >
>> > > Now to the original question:
>> > >
>> > >> will I be able to do 'ceph fs rm_data_pool' once there are no longer
>> > >> any objects associated with the CephFS instance on the pool, or will
>> > >> the MDS have ghost object records that cause the command to balk?
>> > >
>> > > Just tested this in a test cluster: it won't balk and won't demand force
>> > > even if you remove a pool that is actually used by files. So beware.
>> > >
>> > > $ ceph osd pool create badfs_evilpool 32 ssd-only
>> > > pool 'badfs_evilpool' created
>> > > $ ceph fs add_data_pool badfs badfs_evilpool
>> > > added data pool 38 to fsmap
>> > > $ ceph fs ls
>> > > name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data
>> > > cephfs_data_wrongpool cephfs_data_rightpool cephfs_data_hdd ]
>> > > name: badfs, metadata pool: badfs_metadata, data pools: [badfs_data
>> > > badfs_evilpool ]
>> > > $ cephfs-shell -f badfs
>> > > CephFS:~/>>> ls
>> > > dir1/   dir2/
>> > > CephFS:~/>>> mkdir evil
>> > > CephFS:~/>>> setxattr evil ceph.dir.layout.pool badfs_evilpool
>> > > ceph.dir.layout.pool is successfully set to badfs_evilpool
>> > > CephFS:~/>>> put /usr/bin/ls /evil/ls
>> > > $ ceph fs rm_data_pool badfs badfs_evilpool
>> > > removed data pool 38 from fsmap
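>> > >
>> > > So before actually removing a data pool you believe is empty, it is
>> > > worth checking it yourself first, for example (using the pool from the
>> > > test above):
>> > >
>> > > $ rados -p badfs_evilpool ls | head   # any output means objects still reference the pool
>> > > $ ceph df | grep badfs_evilpool       # stored bytes and object count should be zero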
>> > >
>> > > --
>> > > Alexander Patrakov
>
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
