Re: [ceph-users] cephfs deleting files No space left on device

2019-05-10 Thread Rafael Lopez
Hey Kenneth,

We encountered this when the number of strays (unlinked files yet to be
purged) reached 1 million, the result of a very large number of file
removals happening on the fs repeatedly. It can also happen when a single
directory holds more than 100k files with default settings.

You can raise the limit via the 'mds_bal_fragment_size_max' setting on the
mds, either temporarily while you rm files or permanently. Beware of setting
it too high.

Check the stray count by running `ceph daemon mds.{mds name} perf dump`
and looking for num_strays in the mds_cache section. The 1 million limit is
derived from mds_bal_fragment_size_max: it is 10x that value (10 x 100k
with default settings).
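
If it helps, the check can be scripted. This is only a minimal sketch: it
assumes the perf dump output is JSON with an mds_cache section containing
num_strays, and that the limit is 10x mds_bal_fragment_size_max as described
in this message. The function name, the constant and the sample values are
made up for illustration, and the fragment size is passed in by hand rather
than read from the daemon.

```python
import json

# Assumption: the "10x" rule of thumb from this message.
STRAY_LIMIT_FACTOR = 10


def stray_usage(perf_dump_json, fragment_size_max=100_000):
    """Return (num_strays, limit, fraction_used) from `perf dump` JSON text."""
    perf = json.loads(perf_dump_json)
    num_strays = perf["mds_cache"]["num_strays"]
    limit = STRAY_LIMIT_FACTOR * fragment_size_max
    return num_strays, limit, num_strays / limit


if __name__ == "__main__":
    # Hypothetical sample; on a real host, feed in the output of
    #   ceph daemon mds.{mds name} perf dump
    sample = '{"mds_cache": {"num_strays": 950000}}'
    strays, limit, used = stray_usage(sample)
    print(f"{strays} strays of {limit} allowed ({used:.0%})")
```

If the fraction stays close to 1 while deletes are failing, the stray limit
is the likely culprit; if it is low, look elsewhere.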

Raf


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs deleting files No space left on device

2019-05-10 Thread Paul Emmerich
About how many files are we talking here?

An implementation detail on file deletion that helps explain why this might
happen: deletion is asynchronous. Deleting a file inserts it into the purge
queue, and the actual data is removed in the background.
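
To make the effect concrete, here is a toy model. It is not how the MDS is
implemented, just a sketch under simplifying assumptions: a single counter
of deleted-but-not-yet-purged files, a fixed limit, and a purge step that
runs separately from the deletes. The class and method names are
hypothetical.

```python
import errno


class ToyPurgeBacklog:
    """Toy model only; the real MDS bookkeeping is more involved."""

    def __init__(self, limit):
        self.limit = limit    # maximum number of files awaiting purge
        self.pending = 0      # files deleted but not yet purged

    def unlink(self):
        """Queue one deletion; fail with ENOSPC when the backlog is full."""
        if self.pending >= self.limit:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.pending += 1

    def purge(self, batch):
        """Background step: purge up to `batch` queued files."""
        done = min(batch, self.pending)
        self.pending -= done
        return done


if __name__ == "__main__":
    backlog = ToyPurgeBacklog(limit=3)
    for _ in range(3):
        backlog.unlink()
    try:
        backlog.unlink()          # arrives before any purge has run
    except OSError as e:
        print("rm failed:", e.strerror)
    backlog.purge(batch=2)        # background purge catches up
    backlog.unlink()              # the same delete now succeeds
    print("pending:", backlog.pending)
```

In this model a delete that fails succeeds again once the background purge
has caught up, which would match an rm that works a few minutes later.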

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90




[ceph-users] cephfs deleting files No space left on device

2019-05-10 Thread Kenneth Waegeman

Hi all,

I am seeing issues on cephfs running 13.2.5 when deleting files:

[root@osd006 ~]# rm /mnt/ceph/backups/osd006.gigalith.os-2b5a3740.1326700
rm: remove regular empty file ‘/mnt/ceph/backups/osd006.gigalith.os-2b5a3740.1326700’? y
rm: cannot remove ‘/mnt/ceph/backups/osd006.gigalith.os-2b5a3740.1326700’: No space left on device


A few minutes later, I can remove it without a problem. This happens
especially when a lot of files are deleted somewhere on the
filesystem around the same time.


We have already tuned our mds config:

[mds]
mds_cache_memory_limit=10737418240
mds_log_max_expiring=200
mds_log_max_segments=200
mds_max_purge_files=2560
mds_max_purge_ops=327600
mds_max_purge_ops_per_pg=20

ceph -s reports everything clean, file system space usage is below 50%,
and there are no full osds.


Is there a way to further debug what the bottleneck is when removing 
files that gives this 'no space left on device' error?



Thank you very much!

Kenneth
