Re: [lustre-discuss] Renaming or Moving directories on Lustre?

2023-02-28 Thread Grigory Shamov
Hi Andreas,

Thank you very much! Yes, this explains it.
It looks like rsync or lustre_rsync is still the best solution for now (we are on 
Lustre 2.12).


--
Grigory Shamov
Site Lead / HPC Specialist
University of Manitoba and DRI Alliance Canada


From: Andreas Dilger 
Date: Monday, February 27, 2023 at 11:36 PM
To: Grigory Shamov 
Cc: lustre-discuss 
Subject: Re: [lustre-discuss] Renaming or Moving directories on Lustre?


Re: [lustre-discuss] Renaming or Moving directories on Lustre?

2023-02-27 Thread Andreas Dilger via lustre-discuss
On Feb 27, 2023, at 11:57, Grigory Shamov <grigory.sha...@umanitoba.ca> wrote:

Hi All,

What happens if a directory on Lustre FS gets moved with a regular CentOS7 mv 
command, within the same filesystem? On CentOS 7, using mv from the distro, 
like this, as root:

mv /project/TEMP/user  /project/XYZ/user

It looks like the content gets copied entirely, which takes a long time for 
large data.
Is there a way to rename the Lustre directories (changing only the name of the 
top directory, without moving every object in it)?  Thanks!

Renaming a file or subdirectory tree between "regular" directories in Lustre 
works as you would expect for a local filesystem, even if the directories are 
on different MDTs.  What you are seeing (full copy of contents between 
directories) is really a result of the implementation/design of project quotas, 
and not directly a Lustre problem.  The same would happen if you have two 
directories using two different project IDs and the "PROJINHERIT" flag set with 
ext4 or XFS, since they also return "-EXDEV" if trying to move (rename) a file 
between directories that do not have the same project ID, and that causes "mv" 
to copy the whole directory tree.

Running the ext4 "mv" under strace shows this:

# df -T /mnt/tmp
Filesystem Type 1K-blocks  Used Available Use% Mounted on
/dev/mapper/vg_test-lvtest ext4  1633778852  15482492   1% /mnt/tmp
# mkdir /mnt/tmp/{dir1,dir2}
# chattr -P -p 1000 /mnt/tmp/dir1
# chattr -P -p 2000 /mnt/tmp/dir2
# cp /etc/hosts /mnt/tmp/dir1
# lsattr /mnt/tmp/dir1
--eP-- /mnt/tmp/dir1/hosts
# ls -li /mnt/tmp/dir1
total 8
655365 8 -rw-r--r--. 1 root root 7424 Oct 18 22:42 hosts
# strace mv /mnt/tmp/dir1/hosts /mnt/tmp/dir2/hosts
:
renameat2(AT_FDCWD, "/mnt/tmp/dir1/hosts", AT_FDCWD, "/mnt/tmp/dir2/hosts", 
RENAME_NOREPLACE) = -1 EXDEV (Invalid cross-device link)
stat("/mnt/tmp/dir2/hosts", 0x78a6c2b0) = -1 ENOENT (No such file or 
directory)
lstat("/mnt/tmp/dir1/hosts", {st_mode=S_IFREG|0644, st_size=7424, ...}) = 0
newfstatat(AT_FDCWD, "/mnt/tmp/dir2/hosts", 0x78a6bf90, 
AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or directory)
unlink("/mnt/tmp/dir2/hosts")   = -1 ENOENT (No such file or 
directory)
openat(AT_FDCWD, "/mnt/tmp/dir1/hosts", O_RDONLY|O_NOFOLLOW) = 3
openat(AT_FDCWD, "/mnt/tmp/dir2/hosts", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4
read(3, "##\n# Host Database\n#\n# Do not re"..., 131072) = 7424
write(4, "##\n# Host Database\n#\n# Do not re"..., 7424) = 7424
:
# lsattr -p /mnt/tmp/dir2
2000 --eP-- /mnt/tmp/dir2/hosts
# ls -li /mnt/tmp/dir2
total 8
786435 8 -rw-r--r--. 1 root root 7424 Oct 18 22:42 hosts

The reason for this limitation is that there is no way to atomically update the 
quota between the two project IDs when a whole subdirectory tree is being moved 
between projects.  There might be thousands of subdirectories and millions of 
files that are being moved, and the project ID needs to be updated on all of 
those files and directories.  This is too large to do atomically in a single 
filesystem transaction.  Rather than try to solve this directly in the kernel, 
the decision of the XFS developers (copied by ext4) is that cross-project 
renames will not be done by the kernel and instead be handled in userspace by 
the "mv" utility, the same way that renames across different filesystems are 
handled.
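The fallback visible in the strace output above can be modelled in a few lines of Python. This is a simplified sketch, not GNU coreutils' actual code: `move_like_mv()` and its `rename` parameter are hypothetical names, the parameter exists only so that EXDEV can be simulated without a project-quota filesystem, and the real mv also deals with ownership, xattrs, and partial failures that are omitted here.

```python
import errno
import os
import shutil


def move_like_mv(src, dst, rename=os.rename):
    """Simplified model of how mv reacts to EXDEV (hypothetical helper).

    Returns "renamed" when the kernel rename succeeded (metadata only),
    or "copied" when it failed with EXDEV and the data was copied instead.
    """
    try:
        rename(src, dst)  # corresponds to renameat2() in the strace
        return "renamed"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # any other error is a genuine failure
    # EXDEV: handle it the same way as a move between two filesystems.
    if os.path.isdir(src) and not os.path.islink(src):
        shutil.copytree(src, dst, symlinks=True)  # reads and writes every file
        shutil.rmtree(src)
    else:
        shutil.copy2(src, dst, follow_symlinks=False)  # open/read/write, as in the strace
        os.unlink(src)
    return "copied"
```

The copy branch touches every byte of every file in the tree, which is why "moving" a large project directory this way takes as long as copying it.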


In Lustre 2.15.0 and later, this cross-project rename constraint has been 
removed for *regular file* renames between directories with different project 
IDs.  This means the file is moved between directories and the project ID and 
associated quota accounting is updated in a single transaction without doing a 
data copy.  However, *directory* renames with PROJINHERIT still have this issue.
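The conditions described above can be summarised as a small predicate. This is an illustrative model of the behaviour as described in this thread (the destination directory has PROJINHERIT set and a project ID different from the inode being moved), not code taken from Lustre, ext4, or XFS; the function and parameter names are hypothetical.

```python
def rename_gets_exdev(dst_dir_projinherit, dst_dir_projid, src_projid,
                      src_is_dir, lustre_2_15_or_later=False):
    """Return True if rename() is expected to fail with EXDEV, which makes
    mv fall back to a full copy (illustrative model only)."""
    if not dst_dir_projinherit:
        return False  # destination does not enforce a project ID
    if src_projid == dst_dir_projid:
        return False  # IDs already match: a plain metadata rename
    if lustre_2_15_or_later and not src_is_dir:
        return False  # 2.15+: file re-tagged and quota moved in one transaction
    return True       # cross-project rename refused; mv copies instead
```

For the case in the original question (a directory moved between two project directories), the model returns True regardless of the Lustre version, since directory renames are still refused on 2.15.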

To work around this behavior, it is possible to use "chattr -p" (or "lfs 
project -p"; they do the same thing) to change the project ID of the source 
files and directories to match the target *before* they are renamed, so that 
the rename succeeds, no file data needs to be copied, and only the names are moved.
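A minimal sketch of that workaround, assuming the administrator already knows the destination's project ID (for example from `lfs project -d` on the destination directory). `retag_then_rename()` is a hypothetical helper, and the `lfs project -p <id> -r -s` flags (set the ID, recurse, set the inherit flag) should be checked against `lfs project --help` on your Lustre version before use. The `run` and `rename` parameters exist only so the sequence can be exercised without a Lustre mount.

```python
import os
import subprocess


def retag_then_rename(src, dst, dst_projid, run=subprocess.run, rename=os.rename):
    """Hypothetical helper: give the whole source tree the destination's
    project ID first, so the final rename is metadata-only (no EXDEV)."""
    # Assumed flags: -p <id> sets the project ID, -r recurses, -s sets PROJINHERIT.
    cmd = ["lfs", "project", "-p", str(dst_projid), "-r", "-s", src]
    run(cmd, check=True)  # re-tag every file and directory under src
    rename(src, dst)      # IDs now match, so no data is copied
    return cmd
```

For the directories in the original question this would be called as `retag_then_rename("/project/TEMP/user", "/project/XYZ/user", 2000)`, where 2000 stands in for whatever project ID /project/XYZ actually uses. Re-tagging moves the usage from the source project to the destination project, so the destination's project quota needs room for it.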

It might be possible to patch "mv" so that instead of bailing on "rename()" 
after the first EXDEV return, it creates the target directory and then tries to 
rename the files within the source directory to the target, before it does the 
file copy.  It is likely that ext4 could also be patched to allow regular file 
renames without returning EXDEV.
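The proposed mv change could look roughly like the recursive sketch below. It is a hypothetical illustration, not an existing coreutils patch: when a rename fails with EXDEV, it creates the target directory, tries to rename each entry individually, recurses into subdirectories that are also refused, and copies only the entries that still fail. On Lustre 2.15+, where regular files can be renamed across project IDs, this would move all file data without copying it and only recreate the directories.

```python
import errno
import os
import shutil


def move_tree_per_entry(src, dst, rename=os.rename):
    """Hypothetical 'smarter mv': on EXDEV, recreate the directory and move
    its entries one by one. Returns how many non-directory entries had to be
    copied rather than renamed."""
    try:
        rename(src, dst)
        return 0  # whole subtree renamed in one operation
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
    if not os.path.isdir(src) or os.path.islink(src):
        shutil.copy2(src, dst, follow_symlinks=False)  # last resort: copy the data
        os.unlink(src)
        return 1
    # On a project-quota filesystem the new directory would inherit the
    # destination's project ID via PROJINHERIT.
    os.mkdir(dst)
    copied = 0
    for name in os.listdir(src):
        copied += move_tree_per_entry(os.path.join(src, name),
                                      os.path.join(dst, name), rename)
    shutil.copystat(src, dst)  # restore mode/timestamps after filling dst
    os.rmdir(src)              # src is empty now
    return copied
```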

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud







___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Renaming or Moving directories on Lustre?

2023-02-27 Thread Philippe Weill via lustre-discuss



On 27/02/2023 at 19:57, Grigory Shamov wrote:

Hi All,

What happens if a directory on Lustre FS gets moved with a regular CentOS7 mv 
command, within the same filesystem? On CentOS 7, using mv from the distro, 
like this, as root:
  
mv /project/TEMP/user  /project/XYZ/user
  
It looks like the content gets copied entirely, which takes a long time for large data.


Are you using project quotas?


Is there a way to rename the Lustre directories (changing only the name of the 
top directory, without moving every object in it)?  Thanks!






[lustre-discuss] Renaming or Moving directories on Lustre?

2023-02-27 Thread Grigory Shamov
Hi All,

What happens if a directory on Lustre FS gets moved with a regular CentOS7 mv 
command, within the same filesystem? On CentOS 7, using mv from the distro, 
like this, as root:
 
mv /project/TEMP/user  /project/XYZ/user
 
It looks like the content gets copied entirely, which takes a long time for 
large data.
Is there a way to rename the Lustre directories (changing only the name of the 
top directory, without moving every object in it)?  Thanks!


-- 
Grigory Shamov 
Site Lead / HPC Specialist 
University of Manitoba and DRI Alliance Canada 





