On Fri, Oct 8, 2021 at 6:44 AM David Prude <da...@democracynow.org> wrote:
>
> Hello,
>
>     My apologies if this has been answered previously, but my attempts to
> find an answer have failed. I am trying to determine the canonical
> manner for determining how much storage space a cephfs snapshot is
> consuming. It seems that you can determine the size of the referenced
> data by pulling the ceph.dir.rbytes attribute for the snap
> directory; however, there does not seem to be an attribute which
> indicates the storage the snapshot itself is consuming:
>
> getfattr -d -m - daily_2021-10-07_191702
> # file: daily_2021-10-07_191702
> ceph.dir.entries="17"
> ceph.dir.files="0"
> ceph.dir.rbytes="6129426031788"
> ceph.dir.rctime="1633653849.686409000"
> ceph.dir.rentries="132588"
> ceph.dir.rfiles="97679"
> ceph.dir.rsubdirs="34909"
> ceph.dir.subdirs="17"

Yeah. Because all the allocations are handled by OSDs, and the OSDs
and the MDS don't communicate about individual objects, the
per-snapshot size differential is not actually tracked. Doing so is
infeasible: it's known only by the OSD and potentially changes on
every write to the live data, which is far too much communication to
make happen while keeping any of these systems functional.

>
> I have found in the documentation references to the command "ceph fs
> subvolume snapshot info" which should be able to give snapshot size in
> bytes for a snapshot of a subvolume, however we are not using
> subvolumes.

I am reasonably sure this doesn't do what you seem to want, either. I
think it's just plugging in the rbytes value (much of the subvolume
API exists so it can plug in to the OpenStack Manila interfaces).
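
For reference, the rbytes value discussed above can also be read from a
script rather than via getfattr. This is a minimal sketch, assuming a
Linux client with the cephfs mount available and assuming the attribute
value comes back as an ASCII decimal string (as the getfattr output
suggests). The function name and the injectable getxattr parameter are
hypothetical, added only so the parsing can be exercised without a
cephfs mount. Note that it reports the size of the data the snapshot
references, not the space the snapshot uniquely consumes.

```python
import os


def snapshot_referenced_bytes(snap_dir, getxattr=None):
    """Return the recursive byte count (ceph.dir.rbytes) for snap_dir.

    getxattr is a hypothetical hook for testing; by default the real
    os.getxattr (Linux only) is used.
    """
    if getxattr is None:
        getxattr = os.getxattr
    # Assumption: the virtual xattr is returned as ASCII decimal bytes.
    raw = getxattr(snap_dir, "ceph.dir.rbytes")
    return int(raw.decode("ascii"))
```

For example, calling it on volume/directory/.snap/snapshot should
return the same number getfattr prints for ceph.dir.rbytes.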

> If we assume a cephfs volume "volume" with a top-level
> directory "directory" and an associated snapshot "snapshot":
>
> volume/directory/.snap/snapshot
>
> What is the best way to determine the size consumed by snapshot?

If you really, REALLY need this, the only approach I can come up with
is to traverse the snapshot and the live tree and identify changed
files, and use some heuristic to guess about how much of the data is
actually changed between them.
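
A rough sketch of that traversal, under stated assumptions: it walks
the snapshot directory, looks up the same relative path in the live
tree, and counts a snapshot file's full size as "changed" if the live
file is missing or differs in size or mtime. That comparison is only
one possible heuristic, and it overestimates, since a file with one
modified block is counted in full. The function name and both path
parameters are hypothetical.

```python
import os


def estimate_snapshot_delta(snap_root, live_root):
    """Guess how many bytes in snap_root differ from live_root.

    Heuristic only: a file counts in full if it is missing from the
    live tree or its size/mtime differ from the live copy.
    """
    changed_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(snap_root):
        rel = os.path.relpath(dirpath, snap_root)
        for name in filenames:
            snap_path = os.path.join(dirpath, name)
            live_path = os.path.normpath(os.path.join(live_root, rel, name))
            snap_st = os.lstat(snap_path)
            try:
                live_st = os.lstat(live_path)
            except (FileNotFoundError, NotADirectoryError):
                # Deleted (or replaced by something else) since the snapshot.
                changed_bytes += snap_st.st_size
                continue
            snap_key = (snap_st.st_size, snap_st.st_mtime_ns)
            live_key = (live_st.st_size, live_st.st_mtime_ns)
            if snap_key != live_key:
                changed_bytes += snap_st.st_size
    return changed_bytes
```

With the layout from the question, this would be called with
volume/directory/.snap/snapshot as snap_root and volume/directory as
live_root. On a tree of this size (about 97k files) it means a stat
call per file on both sides, so expect it to be slow.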

But the basic problem is that data usage frequently doesn't belong to
a snapshot; it belongs to a SET of snapshots, so even if we did the
data gathering, we couldn't partition it out between them. If for
instance your data flow looks like this:
AAAA
 -- snapshot 1
BBBB
 -- snapshot 2
 -- snapshot 3
 -- snapshot 4
CCCC
 -- snapshot 5

Then you might say that snapshot 2 is size 4 and snapshots 3 and 4 are
size 0. But if you delete snapshot 2, you can't actually remove BBBB,
because it's required for snapshots 3 and 4.
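
To make the shared-ownership point concrete, here is a toy model of
the example above. It is not how Ceph tracks anything; it simply maps
each snapshot to the data version it references and asks how many
bytes could be released if a given set of snapshots were deleted. The
function name, the dictionary layout, and the assumption that the live
data is still CCCC are all illustrative.

```python
def bytes_freed(snapshots, live, to_delete):
    """Bytes released by deleting the snapshots named in to_delete.

    snapshots maps snapshot id -> the data version it references;
    live is the data version currently in the live tree.
    """
    to_delete = set(to_delete)
    # Data still referenced by a surviving snapshot or by the live tree.
    still_needed = {data for snap, data in snapshots.items()
                    if snap not in to_delete}
    still_needed.add(live)
    released = {snapshots[snap] for snap in to_delete} - still_needed
    return sum(len(data) for data in released)


# The data flow from the example above (live data assumed to be CCCC).
snapshots = {1: "AAAA", 2: "BBBB", 3: "BBBB", 4: "BBBB", 5: "CCCC"}
live = "CCCC"
```

In this model, bytes_freed(snapshots, live, [2]) is 0, while
bytes_freed(snapshots, live, [2, 3, 4]) is 4: the space belongs to
the set of snapshots, and no single member of the set can be charged
for it.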
-Greg

>
> Thank you,
>
> -David
>
>
> --
> David Prude
> Systems Administrator
> PGP Fingerprint: 1DAA 4418 7F7F B8AA F50C  6FDF C294 B58F A286 F847
> Democracy Now!
> www.democracynow.org
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
