Dear Patrick,

Thanks for letting me know.

Could you please consider making this a ceph client mount option, for example 
'-o fast_move', that enables a code path which enforces that an mv is a proper 
atomic mv, at the risk that in some corner cases the target quota is overrun? 
With this option enabled, a move should either be a move or fail outright with 
"out of disk quota" (no partial move, no cp+rm at all). The failure should only 
occur if it is absolutely obvious that the target quota would be exceeded. Any 
corner cases are the responsibility of the operator. Application crashes due to 
incorrect error handling are acceptable.
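
To make this concrete, usage could look like the following; 'fast_move' is of 
course only a name I'm suggesting here, not an existing option, and the 
monitor/mount details are placeholders:

    # kernel client (hypothetical option)
    mount -t ceph mon1:6789:/ /mnt/cephfs -o name=hpc,fast_move
    # fuse client (hypothetical option)
    ceph-fuse -o fast_move /mnt/cephfs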

Reasoning:

From a user's/operator's side, the preferred functionality is that in cases 
where a definite quota overrun can reliably be detected in advance, the move 
should actually fail with "out of disk quota" instead of resorting to cp+rm, 
which potentially leads to partial moves and a total mess for users/operators 
to clean up. In all other cases, the quota should simply be ignored and the 
move should be a complete atomic move, at the risk of exceeding the target 
quota and stalling IO. A temporary stall or failure of IO until the operator 
increases the quota again is, in my opinion and for my use case, highly 
preferable to the alternative of cp+rm. A quota overrun or a crashed job is 
fast to fix; a partial move is not.
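
As an illustration of the intended semantics (the paths are made up, and 
"Disk quota exceeded" is the standard EDQUOT error text):

    $ mv /mnt/cephfs/groupA/data /mnt/cephfs/groupB/
    # overrun not clearly detectable -> one atomic rename, quota may be overrun
    # overrun certain -> fail immediately, nothing moved or copied:
    mv: cannot move '/mnt/cephfs/groupA/data' to '/mnt/cephfs/groupB/data': Disk quota exceeded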

Some background:

We use ceph fs as an HPC home file system and as a back-end store. Being able 
to move data quickly across the entire file system is essential, because users 
quite often re-factor directory structures containing huge amounts of data for 
various reasons.

On our system, we set file system quotas mainly for psychological reasons. We 
run a cron job that adjusts the quotas every day so that each mount point shows 
between 20% and 30% free capacity. The psychological side is to give users an 
incentive to clean up temporary data. The quotas are not intended to seriously 
limit usage, only to act as a safeguard on what can happen between cron job 
runs. The pool quotas set the real hard limits.

I'm in the process of migrating 100+TB right now and am really happy that I 
still have a client where I can do an O(1) move. It would be a disaster if I 
now had to use rsync or similar, which would take weeks.

Please, in situations like this where developers seem to have to make a 
definitive choice, consider the possibility of letting operators choose the 
alternative that suits their use case best. Adding further options seems far 
better than limiting functionality in a way that becomes a terrible burden in 
certain, if not many, use cases.

In ceph fs there have been many such decisions that allow for different answers 
from a user/operator perspective. For example, I would prefer to be able to 
drop the higher POSIX compliance level that ceph fs attempts compared with 
Lustre, simply disable all the client-caps and cache-coherence management, and 
turn it into an awesome scale-out parallel file system. The attempt at 
POSIX-compliant handling of simultaneous writes to files offers nothing to us, 
but costs hugely in performance and forces users to move away from perfectly 
reasonable HPC workflows. Likewise, that it takes a TTL to expire before 
changes on one client become visible on another (unless direct_io is used for 
all IO) is perfectly acceptable for us, given the potential performance gain 
from simpler client-MDS communication.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Patrick Donnelly <pdonn...@redhat.com>
Sent: 24 June 2021 05:29:45
To: Frank Schilder
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] ceph fs mv does copy, not move

Hello Frank,

On Tue, Jun 22, 2021 at 2:16 AM Frank Schilder <fr...@dtu.dk> wrote:
>
> Dear all,
>
> some time ago I reported that the kernel client resorts to a copy instead of 
> move when moving a file across quota domains. I was told that the fuse client 
> does not have this problem. If enough space is available, a move should be a 
> move, not a copy.
>
> Today, I tried to move a large file across quota domains, testing both the 
> kernel- and the fuse client. Both still resort to a copy even though this 
> issue was addressed quite a while ago 
> (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/44AEIHNEGKV4VGCARRTARGFZ264CR4T7/#XY7ZCE3KWHI4QSUNZHDWL3QZQFOHXRQW).
>  The versions I'm using are (CentOS 7)
>
> # yum list installed | grep ceph-fuse
> ceph-fuse.x86_64                      2:13.2.10-0.el7               @ceph
>
> # uname -r
> 3.10.0-1160.31.1.el7.x86_64
>
> Any suggestions on how to get this to work? I have to move directories 
> containing 100+ TB.

ceph-fuse reverted this behavior in: https://tracker.ceph.com/issues/48203
The kernel had a patch around that time too.

In summary, it was not possible to accurately account for the quota
usage prior to doing the rename. Rather than allow a quota to
potentially be massively overrun, we fell back to the old behavior of
not allowing it.

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
