Re: [Gluster-devel] spurios failures in tests/encryption/crypt.t

2014-05-21 Thread Edward Shishkin
On Wed, 21 May 2014 00:06:22 -0700
Anand Avati av...@gluster.org wrote:

 On Tue, May 20, 2014 at 10:54 PM, Pranith Kumar Karampuri 
 pkara...@redhat.com wrote:
 
 
 
  - Original Message -
   From: Anand Avati av...@gluster.org
   To: Pranith Kumar Karampuri pkara...@redhat.com
   Cc: Edward Shishkin edw...@redhat.com, Gluster Devel 
  gluster-devel@gluster.org
   Sent: Wednesday, May 21, 2014 10:53:54 AM
   Subject: Re: [Gluster-devel] spurios failures in
   tests/encryption/crypt.t
  
   There are a few suspicious things going on here..
  
   On Tue, May 20, 2014 at 10:07 PM, Pranith Kumar Karampuri 
   pkara...@redhat.com wrote:
  
   
  hi,
   crypt.t is failing regression builds once in a while, and most of
  the time it is because of failures just after the remount in the
  script.
 
  TEST rm -f $M0/testfile-symlink
  TEST rm -f $M0/testfile-link
 
  Both of these are failing with ENOTCONN. I got a chance to
  look at the logs. According to the brick logs, this is what
  I see: [2014-05-17 05:43:43.363979] E
  [posix.c:2272:posix_open] 0-patchy-posix: open
  on /d/backends/patchy1/testfile-symlink: Transport endpoint
  is not connected
   
  
   posix_open() happening on a symlink? This should NEVER happen.
   glusterfs itself should NEVER EVER be triggering symlink
   resolution on the server. In this case, for whatever reason an
   open() is attempted on a symlink, and it is getting followed back
   onto gluster's own mount point (test case is creating an absolute
   link).
  
   So first find out: who is triggering fop-open() on a symlink.
   Fix the caller.
  
   Next: add a check in posix_open() to fail with ELOOP or EINVAL if
   the inode is a symlink.
 
  I think I understood what you are saying. Open call for symlink on
  fuse mount led to an open call again for the target on the same
  fuse mount.
 
 
 It's not that simple. The client VFS is intelligent enough to resolve
 symlinks and send open() only on non-symlinks. And the test case
 script was doing an obvious unlink() (TEST rm -f filename), so it
 was not initiated by an open() attempt in the first place. My guess
 is that some xlator (probably crypt?) is doing an open() on an inode


Ah, it is quite possible, that it is the crypt.. I'll take a look.
Thanks for the hint, I stupidly increased the testcases without chances
to reproduce the problem..


 and that is going through unchecked in posix. It is a bug in both the
 caller and posix, but the onus/responsibility is on posix to disallow
 open() on anything but regular files (even open() on character or
 block devices should not happen in posix).
 
 
 
  Which led to deadlock :). That is why we disallow opens on symlink
  in gluster?
 
 
 That's not just why open on symlink is disallowed in gluster, it is a
 more generic problem of following symlinks in general inside gluster.
 Symlink resolution must strictly happen only in the outermost VFS.
 Following symlinks inside the filesystem is not only an invalid
 operation, but can lead to all kinds of deadlocks, security holes
 (what if you opened a symlink which points to /etc/passwd, should it
 show the contents of the client machine's /etc/passwd or the server?
 Now what if you wrote to the file through the symlink? etc. you get
 the idea..) and wrong/weird/dangerous behaviors. This is not just
 related to following symlinks, even open()ing special devices.. e.g
 if you create a char device file with major/minor number of an audio
 device and wrote pcm data into it, should it play music on the client
 machine or in the server machine? etc. The summary is, following
 symlinks and opening non-regular files are VFS/client operations and
 are invalid operations in a filesystem context.
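
The guard suggested above (reject open() on anything but a regular file)
could be sketched roughly as follows. This is only an illustration under
stated assumptions: open_regular_only() is a hypothetical helper name, the
ELOOP/EINVAL split follows the suggestion in this thread, and none of it is
the actual posix_open() code or the patch under review.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not glusterfs code): open `path` only if it names
 * a regular file. Returns a file descriptor on success, or -1 with errno
 * set (ELOOP for a symlink, EINVAL for any other non-regular file).
 */
int open_regular_only(const char *path, int flags)
{
    struct stat st;
    int fd;

    assert(path != NULL);

    /* lstat() examines the entry itself and never follows a symlink. */
    if (lstat(path, &st) != 0)
        return -1;

    if (S_ISLNK(st.st_mode)) {
        errno = ELOOP;
        return -1;
    }
    if (!S_ISREG(st.st_mode)) {
        errno = EINVAL;
        return -1;
    }

    /* O_NOFOLLOW makes open() fail if the entry was replaced by a
     * symlink between the lstat() above and this call. */
    fd = open(path, flags | O_NOFOLLOW);
    if (fd < 0)
        return -1;

    /* Re-check the type on the descriptor we actually obtained. */
    if (fstat(fd, &st) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    if (!S_ISREG(st.st_mode)) {
        close(fd);
        errno = EINVAL;
        return -1;
    }
    return fd;
}
```

With a guard of this shape, an open() that reaches the brick for
testfile-symlink would fail immediately with ELOOP instead of being
followed back onto the mount point.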

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurios failures in tests/encryption/crypt.t

2014-05-21 Thread Pranith Kumar Karampuri


- Original Message -
 From: Anand Avati av...@gluster.org
 To: Pranith Kumar Karampuri pkara...@redhat.com
 Cc: Edward Shishkin edw...@redhat.com, Gluster Devel 
 gluster-devel@gluster.org
 Sent: Wednesday, May 21, 2014 12:36:22 PM
 Subject: Re: [Gluster-devel] spurios failures in tests/encryption/crypt.t
 
 On Tue, May 20, 2014 at 10:54 PM, Pranith Kumar Karampuri 
 pkara...@redhat.com wrote:
 
 
 
  - Original Message -
   From: Anand Avati av...@gluster.org
   To: Pranith Kumar Karampuri pkara...@redhat.com
   Cc: Edward Shishkin edw...@redhat.com, Gluster Devel 
  gluster-devel@gluster.org
   Sent: Wednesday, May 21, 2014 10:53:54 AM
   Subject: Re: [Gluster-devel] spurios failures in tests/encryption/crypt.t
  
   There are a few suspicious things going on here..
  
   On Tue, May 20, 2014 at 10:07 PM, Pranith Kumar Karampuri 
   pkara...@redhat.com wrote:
  
   
  hi,
   crypt.t is failing regression builds once in a while, and most of
  the time it is because of failures just after the remount in the
  script.
 
  TEST rm -f $M0/testfile-symlink
  TEST rm -f $M0/testfile-link
 
  Both of these are failing with ENOTCONN. I got a chance to look at
  the logs. According to the brick logs, this is what I see:
  [2014-05-17 05:43:43.363979] E [posix.c:2272:posix_open]
  0-patchy-posix: open on /d/backends/patchy1/testfile-symlink:
  Transport endpoint is not connected
   
  
   posix_open() happening on a symlink? This should NEVER happen. glusterfs
   itself should NEVER EVER be triggering symlink resolution on the server.
   In this case, for whatever reason an open() is attempted on a symlink,
   and it is getting followed back onto gluster's own mount point (test
   case is creating an absolute link).
  
   So first find out: who is triggering fop-open() on a symlink. Fix the
   caller.

http://review.gluster.org/7824

  
   Next: add a check in posix_open() to fail with ELOOP or EINVAL if the
   inode is a symlink.

http://review.gluster.org/7823

 
  I think I understood what you are saying. Open call for symlink on fuse
  mount led to an open call again for the target on the same fuse mount.
 
 
 It's not that simple. The client VFS is intelligent enough to resolve
 symlinks and send open() only on non-symlinks. And the test case script was
 doing an obvious unlink() (TEST rm -f filename), so it was not initiated
 by an open() attempt in the first place. My guess is that some xlator
 (probably crypt?) is doing an open() on an inode and that is going through
 unchecked in posix. It is a bug in both the caller and posix, but the
 onus/responsibility is on posix to disallow open() on anything but regular
 files (even open() on character or block devices should not happen in
 posix).
 
 
 
  Which led to deadlock :). That is why we disallow opens on symlink in
  gluster?
 
 
 That's not just why open on symlink is disallowed in gluster, it is a more
 generic problem of following symlinks in general inside gluster. Symlink
 resolution must strictly happen only in the outermost VFS. Following
 symlinks inside the filesystem is not only an invalid operation, but can
 lead to all kinds of deadlocks, security holes (what if you opened a
 symlink which points to /etc/passwd, should it show the contents of the
 client machine's /etc/passwd or the server? Now what if you wrote to the
 file through the symlink? etc. you get the idea..) and
 wrong/weird/dangerous behaviors. This is not just related to following
 symlinks, even open()ing special devices.. e.g if you create a char device
 file with major/minor number of an audio device and wrote pcm data into it,
 should it play music on the client machine or in the server machine? etc.
 The summary is, following symlinks and opening non-regular files are
 VFS/client operations and are invalid operations in a filesystem context.
 

Now only one question remains. How could it not hang every time?

Pranith


Re: [Gluster-devel] spurios failures in tests/encryption/crypt.t

2014-05-19 Thread Edward Shishkin
On Sat, 17 May 2014 04:28:45 -0400 (EDT)
Pranith Kumar Karampuri pkara...@redhat.com wrote:

 hi,
  crypt.t is failing regression builds once in a while, and most of
 the time it is because of failures just after the remount in the
 script.
 
 TEST rm -f $M0/testfile-symlink
 TEST rm -f $M0/testfile-link
 
 Both of these are failing with ENOTCONN. I got a chance to look at
 the logs. According to the brick logs, this is what I see:
 [2014-05-17 05:43:43.363979] E [posix.c:2272:posix_open]
 0-patchy-posix: open on /d/backends/patchy1/testfile-symlink:
 Transport endpoint is not connected
 
 This is the very first time I saw posix failing with ENOTCONN. Do we
 have these bricks on some other network mounts? I wonder why it fails
 with ENOTCONN.
 
 I also see that it happens right after a call_bail on the mount.
 
 Pranith

Hello.
OK, I'll try to reproduce it.

Thanks for the report!
Edward.


[Gluster-devel] spurios failures in tests/encryption/crypt.t

2014-05-17 Thread Pranith Kumar Karampuri
hi,
 crypt.t is failing regression builds once in a while, and most of the time
it is because of failures just after the remount in the script.

TEST rm -f $M0/testfile-symlink
TEST rm -f $M0/testfile-link

Both of these are failing with ENOTCONN. I got a chance to look at the logs.
According to the brick logs, this is what I see:
[2014-05-17 05:43:43.363979] E [posix.c:2272:posix_open] 0-patchy-posix: open 
on /d/backends/patchy1/testfile-symlink: Transport endpoint is not connected

This is the very first time I saw posix failing with ENOTCONN. Do we have these 
bricks on some other network mounts? I wonder why it fails with ENOTCONN.

I also see that it happens right after a call_bail on the mount.

Pranith