Dear Marc,

> Adding to this: I remember being surprised that, with a mv on cephfs
> between directories linked to different pools

This is documented behaviour and should not be surprising. Placement is
assigned at file creation time. Hence, placement changes only affect newly
created files; existing files retain their original placement. To migrate
existing data, a full copy must be performed.
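
For illustration, here is a minimal Python sketch (untested; the paths are
hypothetical) that makes this visible by reading the CephFS virtual xattr
ceph.file.layout.pool before and after a rename between directories whose
layouts point at different data pools:

    import os

    def file_pool(path):
        # ceph.file.layout.pool is a CephFS virtual xattr holding the name
        # of the data pool recorded in the file's layout
        return os.getxattr(path, "ceph.file.layout.pool").decode()

    src = "/mnt/cephfs/old/bigfile"   # hypothetical paths on a CephFS mount
    dst = "/mnt/cephfs/new/bigfile"

    print("pool before rename:", file_pool(src))
    os.rename(src, dst)  # metadata-only operation, no data movement
    print("pool after rename: ", file_pool(dst))  # unchanged: objects stay in the old pool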

This is, in fact, both expected and desired: only unavoidable data movement
should happen automatically; optional data movement should happen only on
explicit request. A move is not a file creation and therefore cannot request a
change of data placement. It is a request to re-link a file or directory to
another location in the directory tree (a move on a file system is a
pointer-like operation, not a data operation), which is why a move should be,
and usually is, atomic and O(1).
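
If the data is actually supposed to end up in the new pool, the migration has
to be requested explicitly as a copy: the copy creates a new file, which
inherits the target directory's layout, and only then is the original removed.
A minimal Python sketch of such an explicit migration (untested, hypothetical
paths, no error handling) could look like this:

    import os
    import shutil

    def migrate(src, dst_dir):
        # Copy into the target directory so the new file picks up its layout,
        # then remove the original only after the copy has completed.
        dst = os.path.join(dst_dir, os.path.basename(src))
        tmp = dst + ".migrating"
        shutil.copy2(src, tmp)   # full data copy into the new pool
        os.rename(tmp, dst)      # atomic rename within the target directory
        os.unlink(src)
        return dst

    migrate("/mnt/cephfs/old/bigfile", "/mnt/cephfs/new")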

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Marc <m...@f1-outsourcing.eu>
Sent: 25 June 2021 10:21:16
To: Frank Schilder; Patrick Donnelly
Cc: ceph-users@ceph.io
Subject: RE: [ceph-users] Re: ceph fs mv does copy, not move

Adding to this: I remember being surprised that, with a mv on cephfs between
directories linked to different pools, only some meta(?) data was moved/changed
and some data stayed in the old pool.
I am not sure whether this is still the same in newer ceph versions, but I
would rather see the data being moved completely. That is what everyone
expects, even if it takes more time when moving between different pools.


> -----Original Message-----
> From: Frank Schilder <fr...@dtu.dk>
> Sent: Thursday, 24 June 2021 17:34
> To: Patrick Donnelly <pdonn...@redhat.com>
> Cc: ceph-users@ceph.io
> Subject: [ceph-users] Re: ceph fs mv does copy, not move
>
> Dear Patrick,
>
> thanks for letting me know.
>
> Could you please consider making this a ceph client mount option, for
> example '-o fast_move', that enables a code path enforcing an mv to be a
> proper atomic mv, with the risk that in some corner cases the target
> quota is overrun? With this option enabled, a move should either be a
> move or fail outright with "out of disk quota" (no partial move, no
> cp+rm at all). The failure should only occur if it is absolutely obvious
> that the target quota will be exceeded. Any corner cases are the
> responsibility of the operator. Application crashes due to incorrect
> error handling are acceptable.
>
> Reasoning:
>
> From a user's/operator's perspective, the preferred behaviour is that in
> cases where a definite quota overrun can reliably be detected in
> advance, the move should actually fail with "out of disk quota" instead
> of resorting to cp+rm, which can lead to partial moves and a total
> mess for users/operators to clean up. In any other case, the quota
> should simply be ignored and the move should be a complete atomic move,
> with the risk of exceeding the target quota and stalling IO. A temporary
> stall or failure of IO until the operator increases the quota again is,
> in my opinion and use case, highly preferable to the alternative of
> cp+rm. A quota or a crashed job is fast to fix; a partial move is not.
>
> Some background:
>
> We use ceph fs as an HPC home file system and as a back-end store. Being
> able to move data quickly across the entire file system is essential,
> because users quite often re-factor directory structures containing huge
> amounts of data, for various reasons.
>
> On our system, we set file system quotas mainly for psychological
> reasons. We run a cron job that adjusts the quotas every day to show
> between 20% and 30% free capacity on the mount points. The psychological
> side here is to give users an incentive to clean up temporary data. The
> quotas are not intended to limit usage seriously, only to act as a
> safeguard limiting what can be done between cron job runs. The pool
> quotas set the real hard limits.
>
> I'm in the process of migrating 100+ TB right now and am really happy
> that I still have a client where I can do an O(1) move. It would be a
> disaster if I now had to use rsync or similar, which would take weeks.
>
> Please, in situations like this where developers seem to have to make a
> definite choice, consider offering operators the option of choosing the
> alternative that suits their use case best. Adding further options seems
> far better than limiting functionality in a way that becomes a terrible
> burden in certain, if not many, use cases.
>
> In ceph fs there have been many such decisions that allow for different
> answers from a user/operator perspective. For example, I would prefer to
> be able to get rid of the higher POSIX compliance level that ceph fs
> attempts compared with Lustre, just disable all the client-caps and
> cache-coherence management and turn it into an awesome scale-out
> parallel file system. The attempt at POSIX-compliant handling of
> simultaneous writes to files offers nothing to us, but comes at a huge
> performance cost and forces users to move away from perfectly reasonable
> HPC workflows. Also, having changes on one client become visible on
> another only after a TTL expires (unless direct_io is used for all IO)
> would be perfectly acceptable for us, given the potential performance
> gain from simpler client-MDS communication.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Patrick Donnelly <pdonn...@redhat.com>
> Sent: 24 June 2021 05:29:45
> To: Frank Schilder
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] ceph fs mv does copy, not move
>
> Hello Frank,
>
> On Tue, Jun 22, 2021 at 2:16 AM Frank Schilder <fr...@dtu.dk> wrote:
> >
> > Dear all,
> >
> > some time ago I reported that the kernel client resorts to a copy
> > instead of a move when moving a file across quota domains. I was told
> > that the fuse client does not have this problem. If enough space is
> > available, a move should be a move, not a copy.
> >
> > Today, I tried to move a large file across quota domains, testing both
> > the kernel and the fuse client. Both still resort to a copy, even
> > though this issue was addressed quite a while ago
> > (https://lists.ceph.io/hyperkitty/list/ceph-us...@ceph.io/thread/44AEIHNEGKV4VGCARRTARGFZ264CR4T7/#XY7ZCE3KWHI4QSUNZHDWL3QZQFOHXRQW).
> > The versions I'm using are (CentOS 7):
> >
> > # yum list installed | grep ceph-fuse
> > ceph-fuse.x86_64                      2:13.2.10-0.el7                 @ceph
> >
> > # uname -r
> > 3.10.0-1160.31.1.el7.x86_64
> >
> > Any suggestions on how to get this to work? I have to move directories
> > containing 100+ TB.
>
> ceph-fuse reverted this behavior in:
> https://tracker.ceph.com/issues/48203
> The kernel had a patch around that time too.
>
> In summary, it was not possible to accurately account for the quota
> usage prior to doing the rename. Rather than allow a quota to
> potentially be massively overrun, we fell back to the old behavior of
> not allowing it.
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Principal Software Engineer
> Red Hat Sunnyvale, CA
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
