Hi everyone!
Just thought I would let everyone know: The issue appears to have been
the Ceph NFS service associated with the filesystem.
I removed all the files, waited a while, disconnected all the clients,
waited a while, then deleted the NFS shares - the disk space and objects
abruptly dropped back to expected levels.
Grep through the "rados ls" output for 'rados bench' leftovers - it's easy
to leave them behind.
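For what it's worth, `rados bench` writes objects with a predictable name prefix, so any leftovers are easy to spot. A minimal sketch, assuming the data pool name from this thread:

```shell
# "rados bench" leftovers are named benchmark_data_<host>_<pid>_object<N>;
# list any that remain in the data pool:
rados ls -p cephfs.shared.data | grep '^benchmark_data'

# rados also has a built-in cleanup for its own bench objects:
rados -p cephfs.shared.data cleanup
```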
> On Mar 20, 2024, at 5:28 PM, Igor Fedotov wrote:
>
> Hi Thorne,
>
> unfortunately I'm unaware of any tools high-level enough to easily map files
> to rados objects without deep understanding of how this works.
Thorne,
if that's a bug in Ceph which causes space leakage you might be unable
to reclaim the space without total purge of the pool.
The problem is that we are still uncertain whether this is a leakage or
something else. Hence the need for more thorough research.
Thanks,
Igor
On 3/20/2024 9:13
Hi Thorne,
unfortunately I'm unaware of any tools high-level enough to easily map
files to rados objects without deep understanding of how this works. You
might want to try the "rados ls" command to get the list of all the objects
in the cephfs data pool, and then learn how that mapping is performed.
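As a sketch of that mapping: CephFS names its data objects `<inode number in hex>.<chunk index>`, so the objects backing a single file can be listed like this (the file path below is a placeholder):

```shell
# List the RADOS objects backing one CephFS file.
# Data objects are named "<hex inode>.<8-hex-digit chunk index>".
file=/mnt/cephfs/path/to/file            # placeholder path
ino_hex=$(printf '%x' "$(stat -c %i "$file")")
rados ls -p cephfs.shared.data | grep "^${ino_hex}\." | sort
```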
Alexander,
Thanks for explaining this. As I suspected, this is a highly abstract
pursuit of what caused the problem, and while I'm sure this makes sense
for Ceph developers, it isn't going to happen in this case.
I don't care how it got this way - the tools used to create this pool
will never
Hi Thorne,
The idea is quite simple. By retesting the leak with a separate pool, used
by nobody except you, if the leak exists and is reproducible (which is not
a given), you can definitely pinpoint it without giving any chance to the
alternate hypothesis "somebody wrote some data in
Alexander,
I'm happy to create a new pool if it will help, but I don't presently
see how creating a new pool will help us to identify the source of the
10TB discrepancy in this original cephfs pool.
Please help me to understand what you are hoping to find...?
On 20/03/2024 6:35 pm,
Thorne,
That's why I asked you to create a separate pool. All other writes go to
the original pool, and it is possible to see object counts per pool.
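A minimal sketch of that setup, assuming the filesystem is named `shared` and using placeholder pool/directory names: direct one test directory at a fresh data pool via a file layout, so only your own writes can change that pool's counters.

```shell
# Create a dedicated data pool and attach it to the filesystem
# ("shared" is an assumed fs name; adjust to your "ceph fs ls" output).
ceph osd pool create cephfs.leaktest.data 32
ceph fs add_data_pool shared cephfs.leaktest.data

# Point one directory at the new pool; new files under it land there.
mkdir /mnt/cephfs/leaktest
setfattr -n ceph.dir.layout.pool -v cephfs.leaktest.data /mnt/cephfs/leaktest

# Only your test writes can move these numbers now.
rados df -p cephfs.leaktest.data
```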
On Wed, Mar 20, 2024 at 6:32 AM Thorne Lawler wrote:
> Alexander,
>
> Thank you, but as I said to Igor: The 5.5TB of files on this filesystem
> are
> Those files are VM disk images, and they're under constant heavy use, so yes -
> there /is/ constant severe write load against this disk.
Why are you using CephFS for an RBD application?
___
ceph-users mailing list -- ceph-users@ceph.io
Alexander,
Thank you, but as I said to Igor: The 5.5TB of files on this filesystem
are virtual machine disks. They are under constant, heavy write load.
There is no way to turn this off.
On 19/03/2024 9:36 pm, Alexander E. Patrakov wrote:
Hello Thorne,
Here is one more suggestion on how to
Igor,
Those files are VM disk images, and they're under constant heavy use, so
yes - there /is/ constant severe write load against this disk.
Apart from writing more test files into the filesystems, there must be
Ceph diagnostic tools to describe what those objects are being used for,
surely?
Hello Thorne,
Here is one more suggestion on how to debug this. Right now, there is
uncertainty on whether there is really a disk space leak or if
something simply wrote new data during the test.
If you have at least three OSDs you can reassign, please set their
CRUSH device class to something
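The suggestion presumably continues along these lines; the class, rule, and pool names below are placeholders of mine, not from the thread:

```shell
# Give three OSDs their own device class...
ceph osd crush rm-device-class osd.0 osd.1 osd.2
ceph osd crush set-device-class isolated osd.0 osd.1 osd.2

# ...create a CRUSH rule that targets only that class...
ceph osd crush rule create-replicated isolated-rule default host isolated

# ...and back the test pool with it, so nothing else can touch those OSDs.
ceph osd pool create cephfs.isolated.data 32 32 replicated isolated-rule
```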
Hi Thorn,
given the amount of files at CephFS volume I presume you don't have
severe write load against it. Is that correct?
If so we can assume that the numbers you're sharing mostly refer to
your experiment. At peak I can see a bytes_used increase = 629,461,893,120
bytes (45978612027392
It's your pool replication (size = 3):
3886733 (number of objects) * 3 = 11660199
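So COPIES counts physical replicas rather than logical objects. The arithmetic, as a one-liner:

```shell
# COPIES = logical objects × replication size (pool size = 3)
objects=3886733
size=3
echo $((objects * size))    # prints 11660199
```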
Quoting Thorne Lawler:
Can anyone please tell me what "COPIES" means in this context?
[ceph: root@san2 /]# rados df -p cephfs.shared.data
POOL_NAME           USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS  RD  WR_OPS  WR  USED COMPR  UNDER COMPR
cephfs.shared.data  41
Thanks Igor,
I have tried that, and the number of objects and bytes_used took a long
time to drop, but they seem to have dropped back to almost the original
level:
* Before creating the file:
  o 3885835 objects
  o 45349150134272 bytes_used
* After creating the file:
  o 3931663
Hi Thorn,
so the problem is apparently bound to huge file sizes. I presume they're
split into multiple chunks on the Ceph side, hence producing millions of
objects. And possibly something is wrong with this mapping.
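For reference, CephFS stripes each file into chunks of the layout's object_size (4 MiB by default), so the expected object count can be estimated from file sizes alone. A sketch, assuming the default layout and a placeholder mount point:

```shell
# Estimate how many RADOS objects the files should need, assuming
# the default 4 MiB object size (a custom file layout changes this).
find /mnt/cephfs -type f -printf '%s\n' |
  awk '{ n += int(($1 + 4194303) / 4194304) } END { print n, "objects expected" }'
```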
If this pool has no write load at the moment, you might want to run the
following
Also, before anyone asks- I have just gone over every client attached to
this filesystem through native CephFS or NFS and checked for deleted
files. There are a total of three deleted files, amounting to about 200G.
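One way to double-check that on each native client (mount point is a placeholder): files that are unlinked but still held open keep consuming space until the last handle closes.

```shell
# List open-but-deleted files on this client's mount;
# "+L1" restricts lsof to files with a link count below 1.
lsof +L1 /mnt/cephfs
```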
On 15/03/2024 10:05 am, Thorne Lawler wrote:
Igor,
Yes. Just a bit.
-----Original Message-----
From: Igor Fedotov
Sent: March 14, 2024 1:37 PM
To: Thorne Lawler; ceph-users@ceph.io;
etienne.men...@ubisoft.com; vbog...@gmail.com
Subject: [ceph-users] Re: CephFS space usage
Thorn,
you might want to assess the number of files on the mounted fs by running
"du -h | wc".
Igor,
Yes. Just a bit.
root@pmx101:/mnt/pve/iso# du -h | wc -l
10
root@pmx101:/mnt/pve/iso# du -h
0 ./snippets
0 ./tmp
257M ./xcp_nfs_sr/2ba36cf5-291a-17d2-b510-db1a295ce0c2
5.5T ./xcp_nfs_sr/5aacaebb-4469-96f9-729e-fe45eef06a14
5.5T ./xcp_nfs_sr
0 ./failover_test
11G
Thorn,
you might want to assess the number of files on the mounted fs by running
"du -h | wc". Does it differ drastically from the number of objects in the
pool (~3.8 M)?
And just in case - please run "rados lssnap -p cephfs.shared.data".
Thanks,
Igor
On 3/14/2024 1:42 AM, Thorne Lawler wrote:
Igor, Etienne, Bogdan,
The system is a four-node cluster. Each node has 12 × 3.8 TB SSDs, and each
SSD is an OSD.
I have not defined any separate DB / WAL devices - this cluster is
mostly at cephadm defaults.
Everything is currently configured to have x3 replicas.
The system also does
Hi Thorn,
could you please share the output of the "ceph df detail" command
representing the problem?
And please give an overview of your OSD layout - number of OSDs, shared
or dedicated DB/WAL, main and DB volume sizes.
Thanks,
Igor
On 3/13/2024 5:58 AM, Thorne Lawler wrote:
Hi
Hi,
Not sure if it was mentioned, but you could also check the following:
1. Snapshots
Snapshots can consume a significant amount of space without being
immediately obvious. They preserve the state of the filesystem at various
points in time.
List snapshots: use the "ceph fs subvolume snapshot ls" command.
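A sketch of two ways to check, with placeholder volume/subvolume names:

```shell
# Per-subvolume snapshots (volume/subvolume names are placeholders):
ceph fs subvolume snapshot ls <vol_name> <subvol_name>

# Snapshots also appear under the hidden .snap directory of any
# directory on a mounted CephFS:
ls /mnt/cephfs/.snap
```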
Hi,
Check your replication/EC configuration. How do you get your different
sizes/usages?
Étienne
From: Thorne Lawler
Sent: Wednesday, 13 March 2024 03:58
To: ceph-users@ceph.io
Subject: [ceph-users] CephFS space usage