Re: [Gluster-devel] spurios failures in tests/encryption/crypt.t
On Wed, 21 May 2014 00:06:22 -0700 Anand Avati av...@gluster.org wrote:

> On Tue, May 20, 2014 at 10:54 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote:
>
> > - Original Message -
> > From: Anand Avati av...@gluster.org
> > To: Pranith Kumar Karampuri pkara...@redhat.com
> > Cc: Edward Shishkin edw...@redhat.com, Gluster Devel gluster-devel@gluster.org
> > Sent: Wednesday, May 21, 2014 10:53:54 AM
> > Subject: Re: [Gluster-devel] spurios failures in tests/encryption/crypt.t
> >
> > > There are a few suspicious things going on here..
> > >
> > > On Tue, May 20, 2014 at 10:07 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote:
> > >
> > > > hi,
> > > > crypt.t is failing regression builds once in a while, and most of the time it is because of the failures just after the remount in the script.
> > > >
> > > > TEST rm -f $M0/testfile-symlink
> > > > TEST rm -f $M0/testfile-link
> > > >
> > > > Both of these are failing with ENOTCONN. I got a chance to look at the logs. According to the brick logs, this is what I see:
> > > >
> > > > [2014-05-17 05:43:43.363979] E [posix.c:2272:posix_open] 0-patchy-posix: open on /d/backends/patchy1/testfile-symlink: Transport endpoint is not connected
> > >
> > > posix_open() happening on a symlink? This should NEVER happen. glusterfs itself should NEVER EVER be triggering symlink resolution on the server. In this case, for whatever reason an open() is attempted on a symlink, and it is getting followed back onto gluster's own mount point (test case is creating an absolute link).
> > >
> > > So first find out: who is triggering fop-open() on a symlink. Fix the caller.
> > >
> > > Next: add a check in posix_open() to fail with ELOOP or EINVAL if the inode is a symlink.
> >
> > I think I understood what you are saying. Open call for symlink on fuse mount led to an open call again for the target on the same fuse mount.
>
> It's not that simple. The client VFS is intelligent enough to resolve symlinks and send open() only on non-symlinks. And the test case script was doing an obvious unlink() (TEST rm -f filename), so it was not initiated by an open() attempt in the first place. My guess is that some xlator (probably crypt?) is doing an open() on an inode

Ah, it is quite possible that it is crypt. I'll take a look.
Thanks for the hint; I stupidly kept adding test cases with no chance of reproducing the problem.

> and that is going through unchecked in posix. It is a bug in both the caller and posix, but the onus/responsibility is on posix to disallow open() on anything but regular files (even open() on character or block devices should not happen in posix).
>
> > Which led to deadlock :). That is why we disallow opens on symlink in gluster?
>
> That's not just why open on symlink is disallowed in gluster; it is a more generic problem of following symlinks in general inside gluster. Symlink resolution must strictly happen only in the outermost VFS. Following symlinks inside the filesystem is not only an invalid operation, but can lead to all kinds of deadlocks, security holes (what if you opened a symlink which points to /etc/passwd: should it show the contents of the client machine's /etc/passwd or the server's? Now what if you wrote to the file through the symlink? etc. you get the idea..) and wrong/weird/dangerous behaviors. This is not just related to following symlinks, but also to open()ing special devices: e.g. if you create a char device file with the major/minor number of an audio device and write PCM data into it, should it play music on the client machine or on the server machine? etc.
>
> The summary is: following symlinks or opening non-regular files is a VFS/client operation and is invalid in a filesystem context.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurios failures in tests/encryption/crypt.t
- Original Message -
From: Anand Avati av...@gluster.org
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: Edward Shishkin edw...@redhat.com, Gluster Devel gluster-devel@gluster.org
Sent: Wednesday, May 21, 2014 12:36:22 PM
Subject: Re: [Gluster-devel] spurios failures in tests/encryption/crypt.t

> On Tue, May 20, 2014 at 10:54 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote:
>
> > - Original Message -
> > From: Anand Avati av...@gluster.org
> > To: Pranith Kumar Karampuri pkara...@redhat.com
> > Cc: Edward Shishkin edw...@redhat.com, Gluster Devel gluster-devel@gluster.org
> > Sent: Wednesday, May 21, 2014 10:53:54 AM
> > Subject: Re: [Gluster-devel] spurios failures in tests/encryption/crypt.t
> >
> > > There are a few suspicious things going on here..
> > >
> > > On Tue, May 20, 2014 at 10:07 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote:
> > >
> > > > hi,
> > > > crypt.t is failing regression builds once in a while, and most of the time it is because of the failures just after the remount in the script.
> > > >
> > > > TEST rm -f $M0/testfile-symlink
> > > > TEST rm -f $M0/testfile-link
> > > >
> > > > Both of these are failing with ENOTCONN. I got a chance to look at the logs. According to the brick logs, this is what I see:
> > > >
> > > > [2014-05-17 05:43:43.363979] E [posix.c:2272:posix_open] 0-patchy-posix: open on /d/backends/patchy1/testfile-symlink: Transport endpoint is not connected
> > >
> > > posix_open() happening on a symlink? This should NEVER happen. glusterfs itself should NEVER EVER be triggering symlink resolution on the server. In this case, for whatever reason an open() is attempted on a symlink, and it is getting followed back onto gluster's own mount point (test case is creating an absolute link).
> > >
> > > So first find out: who is triggering fop-open() on a symlink. Fix the caller.

http://review.gluster.org/7824

> > > Next: add a check in posix_open() to fail with ELOOP or EINVAL if the inode is a symlink.

http://review.gluster.org/7823

> > I think I understood what you are saying. Open call for symlink on fuse mount led to an open call again for the target on the same fuse mount.
>
> It's not that simple. The client VFS is intelligent enough to resolve symlinks and send open() only on non-symlinks. And the test case script was doing an obvious unlink() (TEST rm -f filename), so it was not initiated by an open() attempt in the first place. My guess is that some xlator (probably crypt?) is doing an open() on an inode and that is going through unchecked in posix. It is a bug in both the caller and posix, but the onus/responsibility is on posix to disallow open() on anything but regular files (even open() on character or block devices should not happen in posix).
>
> > Which led to deadlock :). That is why we disallow opens on symlink in gluster?
>
> That's not just why open on symlink is disallowed in gluster; it is a more generic problem of following symlinks in general inside gluster. Symlink resolution must strictly happen only in the outermost VFS. Following symlinks inside the filesystem is not only an invalid operation, but can lead to all kinds of deadlocks, security holes (what if you opened a symlink which points to /etc/passwd: should it show the contents of the client machine's /etc/passwd or the server's? Now what if you wrote to the file through the symlink? etc. you get the idea..) and wrong/weird/dangerous behaviors. This is not just related to following symlinks, but also to open()ing special devices: e.g. if you create a char device file with the major/minor number of an audio device and write PCM data into it, should it play music on the client machine or on the server machine? etc.
>
> The summary is: following symlinks or opening non-regular files is a VFS/client operation and is invalid in a filesystem context.

Now only one question remains. How could it not hang every time?

Pranith
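The check suggested in this thread for posix_open() (fail with ELOOP on a symlink, and refuse anything else that is not a regular file) could look roughly like the sketch below. This is only an illustration using plain POSIX calls, not the code from the reviews linked above and not GlusterFS internals; open_regular_only(), try_open() and make_fixture() are hypothetical names, and returning EINVAL for device files is an assumption based on the errno values mentioned in the thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for lstat, symlink, mkdtemp, O_NOFOLLOW */
#endif

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (NOT the real posix_open): open a path only if it
 * names a regular file. Returns an fd, or -1 with errno set. */
static int open_regular_only(const char *path, int flags)
{
    struct stat st;

    if (lstat(path, &st) != 0)        /* lstat() does not follow symlinks */
        return -1;
    if (S_ISLNK(st.st_mode)) {
        errno = ELOOP;                /* never resolve symlinks here */
        return -1;
    }
    if (!S_ISREG(st.st_mode)) {
        errno = EINVAL;               /* char/block devices, fifos, dirs... */
        return -1;
    }
    /* O_NOFOLLOW covers a path swapped for a symlink after the lstat(). */
    return open(path, flags | O_NOFOLLOW);
}

/* Returns 0 if the path could be opened, otherwise the errno of the failure. */
static int try_open(const char *path)
{
    int fd;

    assert(path != NULL);
    fd = open_regular_only(path, O_RDONLY);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* Creates a scratch directory holding a regular file and an absolute
 * symlink to it, similar in shape to what the test creates.
 * Returns 0 on success, -1 on failure. */
static int make_fixture(char *reg, char *lnk, size_t len)
{
    char dir[] = "/tmp/open-check-XXXXXX";
    int fd;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(reg, len, "%s/testfile", dir);
    snprintf(lnk, len, "%s/testfile-symlink", dir);

    fd = open(reg, O_CREAT | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    return symlink(reg, lnk);
}
```

Called from a small main(), try_open() on the regular file returns 0, on the symlink it returns ELOOP, and on a character device such as /dev/null it returns EINVAL. The lstat() test alone leaves a window between the check and the open(), which is why the sketch also passes O_NOFOLLOW.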
Re: [Gluster-devel] spurios failures in tests/encryption/crypt.t
On Sat, 17 May 2014 04:28:45 -0400 (EDT) Pranith Kumar Karampuri pkara...@redhat.com wrote:

> hi,
> crypt.t is failing regression builds once in a while, and most of the time it is because of the failures just after the remount in the script.
>
> TEST rm -f $M0/testfile-symlink
> TEST rm -f $M0/testfile-link
>
> Both of these are failing with ENOTCONN. I got a chance to look at the logs. According to the brick logs, this is what I see:
>
> [2014-05-17 05:43:43.363979] E [posix.c:2272:posix_open] 0-patchy-posix: open on /d/backends/patchy1/testfile-symlink: Transport endpoint is not connected
>
> This is the very first time I saw posix failing with ENOTCONN. Do we have these bricks on some other network mounts? I wonder why it fails with ENOTCONN. I also see that it happens right after a call_bail on the mount.
>
> Pranith

Hello.
OK, I'll try to reproduce it.
Thanks for the report!

Edward.
[Gluster-devel] spurios failures in tests/encryption/crypt.t
hi,
crypt.t is failing regression builds once in a while, and most of the time it is because of the failures just after the remount in the script.

TEST rm -f $M0/testfile-symlink
TEST rm -f $M0/testfile-link

Both of these are failing with ENOTCONN. I got a chance to look at the logs. According to the brick logs, this is what I see:

[2014-05-17 05:43:43.363979] E [posix.c:2272:posix_open] 0-patchy-posix: open on /d/backends/patchy1/testfile-symlink: Transport endpoint is not connected

This is the very first time I saw posix failing with ENOTCONN. Do we have these bricks on some other network mounts? I wonder why it fails with ENOTCONN. I also see that it happens right after a call_bail on the mount.

Pranith