Re: [ceph-users] strange error on link() for nfs over cephfs

2017-11-29 Thread Patrick Donnelly
On Wed, Nov 29, 2017 at 3:44 AM, Jens-U. Mozdzen wrote:
> Hi *,
>
> we recently switched to using CephFS (with Luminous 12.2.1). On one
> node, we're kernel-mounting the CephFS (kernel 4.4.75, openSUSE version) and
> exporting it via kernel nfsd. As we're transitioning right now, a number of
> machines still auto-mount users' home directories from that nfsd.

You need to try a newer kernel as there have been many fixes since 4.4
which probably have not been backported to your distribution's kernel.

-- 
Patrick Donnelly
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] strange error on link() for nfs over cephfs

2017-11-29 Thread Jens-U. Mozdzen

Hi *,

we recently switched to using CephFS (with Luminous 12.2.1). On
one node, we're kernel-mounting the CephFS (kernel 4.4.75, openSUSE
version) and exporting it via kernel nfsd. As we're transitioning right
now, a number of machines still auto-mount users' home directories from
that nfsd.


A strange error has recently surfaced that was not present when the
same nfsd exported local-disk-based file systems. The problem is most
visible to users when they run ssh-keygen to remove old keys from
their "known_hosts", but it seems likely that this error will occur
in other situations, too.


The error report from "ssh-keygen" is:

--- cut here ---
user@host:~> ssh-keygen -R somehost -f /home/user/.ssh/known_hosts
# Host somehost found: line 232
link /home/user/.ssh/known_hosts to /home/user/.ssh/known_hosts.old: Not a directory

user@host:~>
--- cut here ---

This error persists until the user lists the contents of the
directory containing the "known_hosts" file (~/.ssh). Once that is
done (i.e. "ls -l ~/.ssh"), ssh-keygen works as expected.
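
As a stopgap, the observation above could be turned into a retry: if link() fails with ENOTDIR, list the parent directory and try once more. The following is only a minimal sketch under that assumption. The helper name link_with_listing_retry is hypothetical, and it is untested against the failing NFS mount; it merely automates the "ls" step described above.

```python
import os


def link_with_listing_retry(src, dst):
    """Hard-link src to dst.

    If the first attempt fails with ENOTDIR, list the directory that
    contains dst (mimicking "ls -l ~/.ssh") and retry exactly once.
    Returns True if the retry path was taken, False otherwise.
    """
    try:
        os.link(src, dst)
        return False
    except NotADirectoryError:  # errno ENOTDIR
        # Assumption: listing the directory refreshes whatever stale
        # state made link() fail, as observed with ssh-keygen.
        os.listdir(os.path.dirname(os.path.abspath(dst)))
        os.link(src, dst)  # a second failure propagates to the caller
        return True
```

On a healthy file system the first link() succeeds and the function returns False, so the retry only comes into play where the error actually occurs.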


We've strace'd ssh-keygen and see the following steps (and more, of course):

- the original known_hosts file is opened successfully
- a temp file is created in .ssh (successfully)
- a previous backup copy (known_hosts.old) is unlink()ed (unsuccessfully,
  since it is not present)
- a link() from known_hosts to known_hosts.old is attempted and fails
  with ENOTDIR

--- cut here ---
[...]
unlink("/home/user/.ssh/known_hosts.old") = -1 ENOENT (No such file or directory)
link("/home/user/.ssh/known_hosts", "/home/user/.ssh/known_hosts.old") = -1 ENOTDIR (Not a directory)

--- cut here ---

Once the directory was listed, the link() call works nicely:

--- cut here ---
unlink("/home/user/.ssh/known_hosts.old") = -1 ENOENT (No such file or directory)
link("/home/user/.ssh/known_hosts", "/home/user/.ssh/known_hosts.old") = 0
rename("/home/user/.ssh/known_hosts.5trpXBpIgB", "/home/user/.ssh/known_hosts") = 0

--- cut here ---

When link() returns an error, the rename is not called, so every
attempt leaves another temporary file behind in .ssh that never gets
renamed.
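
For anyone who wants to try this without ssh-keygen, the system call sequence from the strace output can be approximated in a few lines of Python. This is a rough sketch of the observed calls only, not ssh-keygen's actual code; the function name replace_with_backup and the temp file naming are assumptions made for illustration.

```python
import os
import tempfile


def replace_with_backup(path, new_content):
    """Replace path with new_content, keeping the old file as path + ".old".

    Mirrors the traced sequence: create a temp file, unlink() the old
    backup, link() the current file to the backup name, rename() the
    temp file into place. Returns the backup path.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file next to the target, e.g. known_hosts.XXXXXXXX
    fd, tmp = tempfile.mkstemp(prefix=os.path.basename(path) + ".",
                               dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(new_content)

    backup = path + ".old"
    try:
        os.unlink(backup)
    except FileNotFoundError:
        pass  # ENOENT is expected when no previous backup exists

    # This is the call that returned ENOTDIR in the trace. If it raises,
    # the temp file is deliberately left behind, as seen in .ssh.
    os.link(path, backup)
    os.rename(tmp, path)
    return backup
```

Calling replace_with_backup("/home/user/.ssh/known_hosts", "...") on a freshly mounted client, before anything has listed ~/.ssh, should show whether the ENOTDIR can be triggered outside of ssh-keygen.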


This does sound like a bug to me. Has anybody else stumbled across
similar symptoms?


Regards,
Jens

