[Gluster-users] Too many levels of symbolic links with glusterfs automounting

2012-06-19 Thread harry mangalam
(Apologies if this has already been posted, but I recently had to change
SMTP servers, which scrambled some list permissions, and I haven't seen it appear.)

I set up a 3.3 gluster volume for another sysadmin and he has added it
to his cluster via automount.  It worked initially, but after some
time (days) he is now regularly seeing this warning when he tries to
traverse the mounted filesystems:
Too many levels of symbolic links

$ df: `/share/gl': Too many levels of symbolic links

It's supposed to be mounted on /share/gl with a symlink to /gl,
i.e.:  /gl -> /share/gl
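A generic way to see where the loop is detected (my own diagnostic sketch,
assuming util-linux's namei is available on the client) is to walk the path
and the symlink from both ends:

  # walk /share/gl one component at a time; the looping component will show up
  namei -l /share/gl
  # resolve the symlink chain from the other direction
  readlink -f /gl
  # confirm what (if anything) is currently mounted at the automount point
  mount | grep /share/gl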

I've been using gluster with static mounts on a cluster and have never
seen this behavior; google does not seem to record anyone else seeing
this with gluster. However, I note that the Howto Automount GlusterFS
page at 
http://www.gluster.org/community/documentation/index.php/Howto_Automount_GlusterFS
has been deleted. Is automounting no longer supported?

His auto.master file is as follows (sorry for the wrapping):

w1         -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.2:/
w2         -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.3:/
mathbio    -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.2:/
tw         -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.4:/
shwstore   -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  shwraid.biomol.uci.edu:/
djtstore   -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid.biomol.uci.edu:/
djtstore2  -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid2.biomol.uci.edu:/djtraid2:/
djtstore3  -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid3.biomol.uci.edu:/djtraid3:/
kevin      -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.2.255.230:/
samlab     -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.2.255.237:/
new-data   -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  nas-1-1.ib:/
gl         -fstype=glusterfs  bs1:/
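For comparison, the usual shape of a native-GlusterFS automount entry is a
key, '-fstype=glusterfs', and a 'server:/volname' location.  A minimal sketch,
assuming an indirect map for /share and the volume name 'gl' shown by
'gluster volume info' further down (the map file name /etc/auto.share is
hypothetical):

  # /etc/auto.master
  /share   /etc/auto.share
  # /etc/auto.share -- the key 'gl' becomes /share/gl
  gl   -fstype=glusterfs   bs1:/gl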


He has never seen this behavior with the other automounted fs's.  The
system logs from the affected nodes do not have any gluster strings that
appear to be relevant, but /var/log/glusterfs/share-gl.log ends with
this series of odd lines:

[2012-06-18 08:57:38.964243] I
[client-handshake.c:453:client_set_lk_version_cbk] 0-gl-client-6: Server
lk version = 1
[2012-06-18 08:57:38.964507] I [fuse-bridge.c:3376:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13
kernel 7.16
[2012-06-18 09:16:48.692701] W
[client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote
operation failed: Stale NFS file handle.
Path: /tdlong/RILseq/makebam.commands
(90193380-d107-4b6c-b02f-ab53a0f65148)
[2012-06-18 09:16:48.693030] W
[client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote
operation failed: Stale NFS file handle.
Path: /tdlong/RILseq/makebam.commands
(90193380-d107-4b6c-b02f-ab53a0f65148)
[2012-06-18 09:16:48.693165] W
[client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote
operation failed: Stale NFS file handle.
Path: /tdlong/RILseq/makebam.commands
(90193380-d107-4b6c-b02f-ab53a0f65148)
[2012-06-18 09:16:48.693394] W
[client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote
operation failed: Stale NFS file handle.
Path: /tdlong/RILseq/makebam.commands
(90193380-d107-4b6c-b02f-ab53a0f65148)
[2012-06-18 10:56:32.756551] I [fuse-bridge.c:4037:fuse_thread_proc]
0-fuse: unmounting /share/gl
[2012-06-18 10:56:32.757148] W [glusterfsd.c:816:cleanup_and_exit]
(--/lib64/libc.so.6(clone+0x6d) [0x3829ed44bd]
(--/lib64/libpthread.so.0 [0x382aa0673d]
(--/usr/sbin/glusterfs(glusterfs_sigwaiter+0x17c) [0x40524c]))) 0-:
received signum (15), shutting down

Any hints as to why this is happening?




___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Too many levels of symbolic links with glusterfs automounting

2012-06-19 Thread Anand Avati
Can you post the complete logs? Is the 'Too many levels of symbolic links'
(ELOOP) error seen in the client log or in the brick logs?
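If it helps while gathering them, a rough sketch of where those logs usually
live on 3.3 (brick log names follow the brick paths, so the exact names may
differ):

  # client side: the FUSE mount log, named after the mount point
  less /var/log/glusterfs/share-gl.log
  # server side: one log per brick
  ls -l /var/log/glusterfs/bricks/
  # search both for the ELOOP message
  grep -ri "too many levels" /var/log/glusterfs/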

Avati

On Tue, Jun 19, 2012 at 11:22 AM, harry mangalam hjmanga...@gmail.com wrote:

 (Apologies if this already posted, but I recently had to change smtp
 servers
 which scrambled some list permissions, and I haven't seen it post)

 I set up a 3.3 gluster volume for another sysadmin and he has added it
 to his cluster via automount.  It seems to work initially but after some
 time (days) he is now regularly seeing this warning:
 Too many levels of symbolic links
 when he tries to traverse the mounted filesystems.

 $ df: `/share/gl': Too many levels of symbolic links

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Too many levels of symbolic links with glusterfs automounting

2012-06-19 Thread harry mangalam
One client log file is here:
http://goo.gl/FyYfy

On the server side, on bs1 and bs4, there is a huge, current nfs.log file
(odd, since I neither wanted nor configured an NFS export).  It is filled
entirely with these lines:
 tail -5 nfs.log
[2012-06-19 21:11:54.402567] E [rdma.c:4458:tcp_connect_finish]
0-gl-client-1: tcp connect to  failed (Connection refused)
[2012-06-19 21:11:54.406023] E [rdma.c:4458:tcp_connect_finish]
0-gl-client-2: tcp connect to  failed (Connection refused)
[2012-06-19 21:11:54.409486] E [rdma.c:4458:tcp_connect_finish]
0-gl-client-3: tcp connect to  failed (Connection refused)
[2012-06-19 21:11:54.412822] E [rdma.c:4458:tcp_connect_finish]
0-gl-client-6: tcp connect to 10.2.7.11:24008 failed (Connection
refused)
[2012-06-19 21:11:54.416231] E [rdma.c:4458:tcp_connect_finish]
0-gl-client-7: tcp connect to 10.2.7.11:24008 failed (Connection
refused)

On servers bs2 and bs3 there is a current, huge log consisting of this
single line, repeating every 3 seconds:

[2012-06-19 21:14:00.907387] I [socket.c:1798:socket_event_handler]
0-transport: disconnecting now
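Those nfs.log entries are plain connection-refused errors, so one thing worth
checking (my suggestion, assuming 24007 is glusterd's TCP port and 24008 its
RDMA-management counterpart) is whether anything is actually listening on
those ports on each server:

  # run on each of bs1..bs4
  netstat -ltn | egrep ':2400[78]'
  # and from a client, confirm a brick port from 'gluster volume status' is reachable
  telnet bs1 24013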



I was reminded as I was copying it that the client and servers are
slightly different: the client is 3.3.0qa42-1 while the servers are
3.3.0-1.  Is this enough version skew to cause a problem?  There are
no other problems that I'm aware of, but if even a slight version skew
can be problematic, I'll be careful to keep them exactly aligned.  I
think this was done because the final release binary did not support
the glibc we were using on the compute nodes and the 3.3.0qa42-1 build
did.  Perhaps too sloppy...?
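For what it's worth, a quick way to pin down exactly what is installed on
each side (a sketch assuming RPM-based hosts, which the 3.3.0-1 style version
strings suggest):

  # run on the client and on each of bs1..bs4
  glusterfs --version | head -1
  rpm -qa | grep -i gluster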

gluster volume info
 
Volume Name: gl
Type: Distribute
Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
Status: Started
Number of Bricks: 8
Transport-type: tcp,rdma
Bricks:
Brick1: bs2:/raid1
Brick2: bs2:/raid2
Brick3: bs3:/raid1
Brick4: bs3:/raid2
Brick5: bs4:/raid1
Brick6: bs4:/raid2
Brick7: bs1:/raid1
Brick8: bs1:/raid2
Options Reconfigured:
performance.io-cache: on
performance.quick-read: on
performance.io-thread-count: 64
auth.allow: 10.2.*.*,10.1.*.*

gluster volume status
Status of volume: gl
Gluster process                              Port    Online  Pid
------------------------------------------------------------------
Brick bs2:/raid1                             24009   Y       2908
Brick bs2:/raid2                             24011   Y       2914
Brick bs3:/raid1                             24009   Y       2860
Brick bs3:/raid2                             24011   Y       2866
Brick bs4:/raid1                             24009   Y       2992
Brick bs4:/raid2                             24011   Y       2998
Brick bs1:/raid1                             24013   Y       10122
Brick bs1:/raid2                             24015   Y       10154
NFS Server on localhost                      38467   Y       9475
NFS Server on 10.2.7.11                      38467   Y       10160
NFS Server on bs2                            38467   N       N/A
NFS Server on bs3                            38467   N       N/A


Hmm, sure enough, bs1 and bs4 (localhost in the above info) appear to be
running NFS servers, while bs2 and bs3 are not...?

OK - after some googling, the gluster NFS service can be shut off with:
gluster volume set gl nfs.disable on

and now the status looks like this:

gluster volume status
Status of volume: gl
Gluster process                              Port    Online  Pid
------------------------------------------------------------------
Brick bs2:/raid1                             24009   Y       2908
Brick bs2:/raid2                             24011   Y       2914
Brick bs3:/raid1                             24009   Y       2860
Brick bs3:/raid2                             24011   Y       2866
Brick bs4:/raid1                             24009   Y       2992
Brick bs4:/raid2                             24011   Y       2998
Brick bs1:/raid1                             24013   Y       10122
Brick bs1:/raid2                             24015   Y       10154
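As a sanity check, the reconfigured option should also be visible in the
volume options; a minimal sketch, with the expected value noted as an
assumption:

  gluster volume info gl | grep -i nfs.disable
  # expected, assuming the 'set' above took effect:  nfs.disable: on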



hjm

On Tue, 2012-06-19 at 13:05 -0700, Anand Avati wrote:
 Can you post the complete logs? Is the 'Too many levels of symbolic
 links' (ELOOP) error seen in the client log or in the brick logs?
 
 
 Avati
 
