Hello,

[resending as plain-text, sorry for the noise]

I noticed that the symlinkat() syscall behaves differently depending on
whether newpath (i.e. the link name) is in the dentry cache or not, when
newpath has a trailing slash and is the path to an existing directory residing
on NFS. I stumbled upon this in the context of a Yocto / OE-Core system where
I updated coreutils from version 8.30 to 8.31, which causes problems with ln.
I am currently using kernel 5.4.90.

What I observe is that symlinkat("name", AT_FDCWD, "/path/to/nfs/existing/dir/")
returns ENOENT when "/path/to/nfs/existing/dir/" is not in the dentry cache but
EEXIST when it is. This only happens when "/path/to/nfs/existing/dir/" is on
NFS (NFSv3 in my case). Note that if I remove the trailing slash from the
newpath argument then it returns EEXIST in all cases.

Since the following change in coreutils
https://github.com/coreutils/coreutils/commit/571f63f5010b047a8a3250304053f05949faded4
this makes "ln -sf name /path/to/nfs/existing/dir/" sometimes fail with a
"cannot overwrite directory" error (when the path is not in the dentry cache).
There was no problem before this change because ln did a stat of the link name
path before calling symlinkat, so the entry was already in the dentry cache
when symlinkat executed.

I have created a simple program to reproduce this more easily, which I have 
attached.

To reproduce:
  - Compile the attached symlinkat.c
  - Mount a NFSv3 filesystem at /mnt
  - mkdir /mnt/test
  - To test the error with no dentry cache and trailing slash:
    sync; echo 3 > /proc/sys/vm/drop_caches; ./symlinkat name /mnt/test/
    symlinkat name /mnt/test/ failed: No such file or directory (2)
  - To test with the dentry cache:
    ls -d /mnt/test/; ./symlinkat name /mnt/test/
    symlinkat name /mnt/test/ failed: File exists (17)
  - To test the error with no dentry cache and no trailing slash:
    sync; echo 3 > /proc/sys/vm/drop_caches; ./symlinkat name /mnt/test
    symlinkat name /mnt/test failed: File exists (17)

Although I'm no kernel expert, from what I've understood of the kernel code
this seems to be a bad interaction between the generic fs handling in
fs/namei.c and the NFS client implementation. The filename_create() function
calls __lookup_hash() after setting LOOKUP_EXCL in the flags. If the path is
not in the dentry cache then nfs_lookup() is called; it notices this flag in
the nfs_is_exclusive_create() test, optimizes away the lookup and does not
fill d_inode in the dentry. When execution returns to filename_create() the
special casing there sees a negative dentry, notices that is_dir is not set
and that last.name has a trailing slash, and thus returns ENOENT. Looking at
LOOKUP_EXCL usage in the kernel, only NFS does this kind of optimization in
current kernels, but in 3.5 and older the same optimization was also done by
CIFS.
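
My reading of that control flow can be condensed into the small userspace
model below. It is only a sketch of my understanding, not kernel code:
model_create_errno() and all of its parameters are names invented for this
example, and the final step assumes the server reports EEXIST when the
filesystem's create operation is eventually reached.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the decision described above.
 *   exists          - the last path component exists on the filesystem
 *   cached          - it is already present in the dentry cache
 *   trailing_slash  - newpath ends with '/'
 *   fs_skips_lookup - the filesystem optimizes away the lookup when
 *                     LOOKUP_EXCL is set (NFS, in my reading)
 *   is_dir          - the caller is creating a directory (false for
 *                     symlinkat)
 * Returns the errno value the syscall would report, or 0 on success.
 */
int model_create_errno(bool exists, bool cached, bool trailing_slash,
                       bool fs_skips_lookup, bool is_dir)
{
        /* The dentry is positive only if the lookup really happened,
         * either earlier (cached) or now (no optimization). */
        bool positive = exists && (cached || !fs_skips_lookup);

        if (positive)
                return EEXIST;

        /* Negative dentry plus trailing slash on a non-directory create:
         * the generic code gives up with ENOENT. */
        if (!is_dir && trailing_slash)
                return ENOENT;

        /* Otherwise the filesystem's create operation runs and the
         * server is assumed to notice an existing entry. */
        return exists ? EEXIST : 0;
}
```

Under these assumptions the model reproduces the three results from the
reproduction steps: ENOENT for the uncached path with a trailing slash, and
EEXIST for both the cached path and the uncached path without a trailing slash.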

According to the symlink and symlinkat man pages ENOENT is returned when a 
directory component of newpath does not exist or is a dangling symbolic link, 
which is not the case here.

What would be the best course of action to address this issue?

Thanks,

Diego
-- 
Diego Santa Cruz, PhD
Technology Architect
spinetix.com

#include <fcntl.h>
#include <unistd.h>
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(int argc, char *argv[])
{
        int r;

        if (argc != 3) {
                fprintf(stderr, "usage: symlinkat <oldpath> <newpath>\n");
                return EXIT_FAILURE;
        }
        r = symlinkat(argv[1], AT_FDCWD, argv[2]);
        if (r != 0) {
                fprintf(stderr, "symlinkat %s %s failed: %s (%i)\n",
                        argv[1], argv[2], strerror(errno), errno);
                return EXIT_FAILURE;
        }
        return EXIT_SUCCESS;
}
