Re: [Gluster-users] "Too many levels of symbolic links" with glusterfs automounting

2012-06-19 Thread harry mangalam
One client log file is here:
http://goo.gl/FyYfy

On the server side, on bs1 & bs4, there is a huge, current nfs.log file
(odd since I neither wanted nor configured an nfs export).  It is filled
entirely with these lines:
 tail -5 nfs.log
[2012-06-19 21:11:54.402567] E [rdma.c:4458:tcp_connect_finish]
0-gl-client-1: tcp connect to  failed (Connection refused)
[2012-06-19 21:11:54.406023] E [rdma.c:4458:tcp_connect_finish]
0-gl-client-2: tcp connect to  failed (Connection refused)
[2012-06-19 21:11:54.409486] E [rdma.c:4458:tcp_connect_finish]
0-gl-client-3: tcp connect to  failed (Connection refused)
[2012-06-19 21:11:54.412822] E [rdma.c:4458:tcp_connect_finish]
0-gl-client-6: tcp connect to 10.2.7.11:24008 failed (Connection
refused)
[2012-06-19 21:11:54.416231] E [rdma.c:4458:tcp_connect_finish]
0-gl-client-7: tcp connect to 10.2.7.11:24008 failed (Connection
refused)

On servers bs2 and bs3 there is an equally huge, current log consisting
entirely of this line, repeated every 3 s:

[2012-06-19 21:14:00.907387] I [socket.c:1798:socket_event_handler]
0-transport: disconnecting now
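
In case it's useful, here is roughly how I'd check whether anything is
actually listening on the ports the clients are trying to reach (just a
sketch - run it on each server; the port numbers come from the logs above
and the volume status below):

  # anything listening on 24008, the port the clients are refused on?
  netstat -tlnp | grep 24008

  # and on the brick ports reported by 'gluster volume status'
  netstat -tlnp | grep -E '24007|24009|24011|24013|24015'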



I was reminded as I was copying it that the client and servers are
slightly different - the client is "3.3.0qa42-1" while the servers are
"3.3.0-1".  Is this enough version skew to cause problems?  There are
no other issues that I'm aware of, but if even a slight version skew
can be problematic, I'll be careful to keep them exactly aligned.  I
think this was done because the final release binary did not support
the glibc we were using on the compute nodes, while the 3.3.0qa42-1
build did.  Perhaps too sloppy...?
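
A quick way to pin down the exact versions on every node (a rough
sketch - it assumes RPM-based hosts and ssh access to the servers;
package names may differ on your install):

  # on the client
  rpm -q glusterfs glusterfs-fuse
  glusterfs --version | head -1

  # on the servers
  for h in bs1 bs2 bs3 bs4; do
      ssh $h 'rpm -q glusterfs glusterfs-server; glusterfs --version | head -1'
  done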

gluster volume info
 
Volume Name: gl
Type: Distribute
Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
Status: Started
Number of Bricks: 8
Transport-type: tcp,rdma
Bricks:
Brick1: bs2:/raid1
Brick2: bs2:/raid2
Brick3: bs3:/raid1
Brick4: bs3:/raid2
Brick5: bs4:/raid1
Brick6: bs4:/raid2
Brick7: bs1:/raid1
Brick8: bs1:/raid2
Options Reconfigured:
performance.io-cache: on
performance.quick-read: on
performance.io-thread-count: 64
auth.allow: 10.2.*.*,10.1.*.*

gluster volume status
Status of volume: gl
Gluster process                              Port    Online  Pid
------------------------------------------------------------------------------
Brick bs2:/raid1                             24009   Y       2908
Brick bs2:/raid2                             24011   Y       2914
Brick bs3:/raid1                             24009   Y       2860
Brick bs3:/raid2                             24011   Y       2866
Brick bs4:/raid1                             24009   Y       2992
Brick bs4:/raid2                             24011   Y       2998
Brick bs1:/raid1                             24013   Y       10122
Brick bs1:/raid2                             24015   Y       10154
NFS Server on localhost                      38467   Y       9475
NFS Server on 10.2.7.11                      38467   Y       10160
NFS Server on bs2                            38467   N       N/A
NFS Server on bs3                            38467   N       N/A


Hmm sure enough, bs1 and bs4 (localhost in the above info) appear to be
running NFS servers, while bs2 & bs3 are not...?  

OK - after some googling, it turns out the gluster NFS service can be shut off with:

gluster volume set gl nfs.disable on

and now the status looks like this:

gluster volume status
Status of volume: gl
Gluster process                              Port    Online  Pid
------------------------------------------------------------------------------
Brick bs2:/raid1                             24009   Y       2908
Brick bs2:/raid2                             24011   Y       2914
Brick bs3:/raid1                             24009   Y       2860
Brick bs3:/raid2                             24011   Y       2866
Brick bs4:/raid1                             24009   Y       2992
Brick bs4:/raid2                             24011   Y       2998
Brick bs1:/raid1                             24013   Y       10122
Brick bs1:/raid2                             24015   Y       10154
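
For reference, the whole sequence was roughly this (a sketch - I'm
assuming the default /var/log/glusterfs log location, which may differ
on your install):

  # turn off the built-in gluster NFS server on the volume
  gluster volume set gl nfs.disable on

  # confirm the 'NFS Server' lines are gone
  gluster volume status gl

  # check that nfs.log on bs1/bs4 has stopped accumulating entries
  ssh bs1 'ls -l /var/log/glusterfs/nfs.log; tail -1 /var/log/glusterfs/nfs.log'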



hjm

On Tue, 2012-06-19 at 13:05 -0700, Anand Avati wrote:
> Can you post the complete logs? Are the 'Too many levels of symbolic
> links' (ELOOP) errors seen in the client log or in the brick logs?
> 
> 
> Avati
> 
> On Tue, Jun 19, 2012 at 11:22 AM, harry mangalam
>  wrote:
> (Apologies if this has already been posted; I recently had to change
> smtp servers, which scrambled some list permissions, and I haven't
> seen it appear.)
> 
> I set up a 3.3 gluster volume for another sysadmin and he has
> added it
> to his cluster via automount.  It seems to work initially but
> after some
> time (days) he is now regularly seeing this warning:
> "Too many levels of symbolic links"
> when he tries to traverse the mounted filesystems.
> 
> $ df

Re: [Gluster-users] "Too many levels of symbolic links" with glusterfs automounting

2012-06-19 Thread Anand Avati
Can you post the complete logs? Are the 'Too many levels of symbolic links'
(ELOOP) errors seen in the client log or in the brick logs?

Avati

On Tue, Jun 19, 2012 at 11:22 AM, harry mangalam wrote:

> (Apologies if this has already been posted; I recently had to change
> smtp servers, which scrambled some list permissions, and I haven't
> seen it appear.)
>
> I set up a 3.3 gluster volume for another sysadmin and he has added it
> to his cluster via automount.  It seems to work initially but after some
> time (days) he is now regularly seeing this warning:
> "Too many levels of symbolic links"
> when he tries to traverse the mounted filesystems.
>
> $ df: `/share/gl': Too many levels of symbolic links
>
> It's supposed to be mounted on /share/gl, with a symlink from /gl,
> i.e.  /gl -> /share/gl
>
> I've been using gluster with static mounts on a cluster and have never
> seen this behavior; google does not seem to record anyone else seeing
> this with gluster. However, I note that the "Howto Automount GlusterFS"
> page at
>
> http://www.gluster.org/community/documentation/index.php/Howto_Automount_GlusterFS
> has been deleted. Is automounting no longer supported?
>
> His auto.master file is as follows (sorry for the wrapping):
>
>  w1         -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.2:/&
>  w2         -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.3:/&
>  mathbio    -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.2:/&
>  tw         -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.4:/&
>  shwstore   -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  shwraid.biomol.uci.edu:/&
>  djtstore   -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid.biomol.uci.edu:/&
>  djtstore2  -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid2.biomol.uci.edu:/djtraid2:/&
>  djtstore3  -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid3.biomol.uci.edu:/djtraid3:/&
>  kevin      -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.2.255.230:/&
>  samlab     -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.2.255.237:/&
>  new-data   -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  nas-1-1.ib:/&
>  gl         -fstype=glusterfs  bs1:/&
>
>
> He has never seen this behavior with the other automounted fs's.  The
> system logs from the affected nodes do not have any gluster strings that
> appear to be relevant, but /var/log/glusterfs/share-gl.log ends with
> this series of odd lines:
>
> [2012-06-18 08:57:38.964243] I
> [client-handshake.c:453:client_set_lk_version_cbk] 0-gl-client-6: Server
> lk version = 1
> [2012-06-18 08:57:38.964507] I [fuse-bridge.c:3376:fuse_init]
> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13
> kernel 7.16
> [2012-06-18 09:16:48.692701] W
> [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote
> operation failed: Stale NFS file handle.
> Path: /tdlong/RILseq/makebam.commands
> (90193380-d107-4b6c-b02f-ab53a0f65148)
> [2012-06-18 09:16:48.693030] W
> [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote
> operation failed: Stale NFS file handle.
> Path: /tdlong/RILseq/makebam.commands
> (90193380-d107-4b6c-b02f-ab53a0f65148)
> [2012-06-18 09:16:48.693165] W
> [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote
> operation failed: Stale NFS file handle.
> Path: /tdlong/RILseq/makebam.commands
> (90193380-d107-4b6c-b02f-ab53a0f65148)
> [2012-06-18 09:16:48.693394] W
> [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote
> operation failed: Stale NFS file handle.
> Path: /tdlong/RILseq/makebam.commands
> (90193380-d107-4b6c-b02f-ab53a0f65148)
> [2012-06-18 10:56:32.756551] I [fuse-bridge.c:4037:fuse_thread_proc]
> 0-fuse: unmounting /share/gl
> [2012-06-18 10:56:32.757148] W [glusterfsd.c:816:cleanup_and_exit]
> (-->/lib64/libc.so.6(clone+0x6d) [0x3829ed44bd]
> (-->/lib64/libpthread.so.0 [0x382aa0673d]
> (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0x17c) [0x40524c]))) 0-:
> received signum (15), shutting down
>
> Any hints as to why this is happening?
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] "Too many levels of symbolic links" with glusterfs automounting

2012-06-19 Thread harry mangalam
(Apologies if this has already been posted; I recently had to change smtp
servers, which scrambled some list permissions, and I haven't seen it appear.)

I set up a 3.3 gluster volume for another sysadmin and he has added it
to his cluster via automount.  It seems to work initially but after some
time (days) he is now regularly seeing this warning:
"Too many levels of symbolic links"
when he tries to traverse the mounted filesystems.

$ df: `/share/gl': Too many levels of symbolic links

It's supposed to be mounted on /share/gl, with a symlink from /gl,
i.e.  /gl -> /share/gl
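
For what it's worth, this is roughly what I'd check on an affected node
to see where the loop comes from (a sketch - since autofs mounts come
and go, the results depend on whether the volume is mounted at the time):

  # expected layout: /gl is a symlink to the automounted path
  ls -ld /gl /share/gl
  readlink /gl

  # what is actually mounted there right now
  mount | grep share/gl

  # the command that trips the ELOOP
  df /share/gl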

I've been using gluster with static mounts on a cluster and have never
seen this behavior; google does not seem to record anyone else seeing
this with gluster. However, I note that the "Howto Automount GlusterFS"
page at 
http://www.gluster.org/community/documentation/index.php/Howto_Automount_GlusterFS
has been deleted. Is automounting no longer supported?

His auto.master file is as follows (sorry for the wrapping):

w1         -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.2:/&
w2         -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.3:/&
mathbio    -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.2:/&
tw         -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.4:/&
shwstore   -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  shwraid.biomol.uci.edu:/&
djtstore   -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid.biomol.uci.edu:/&
djtstore2  -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid2.biomol.uci.edu:/djtraid2:/&
djtstore3  -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid3.biomol.uci.edu:/djtraid3:/&
kevin      -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.2.255.230:/&
samlab     -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.2.255.237:/&
new-data   -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  nas-1-1.ib:/&
gl         -fstype=glusterfs  bs1:/&
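
As a cross-check, the kind of static mount I use on my own cluster for
the same volume would look roughly like this (a sketch - the option
names should be verified against your glusterfs version):

  # rough static-mount equivalent in /etc/fstab
  bs1:/gl   /share/gl   glusterfs   defaults,_netdev   0 0

  # then
  mount /share/gl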


He has never seen this behavior with the other automounted fs's.  The
system logs from the affected nodes do not have any gluster strings that
appear to be relevant, but /var/log/glusterfs/share-gl.log ends with
this series of odd lines:

[2012-06-18 08:57:38.964243] I
[client-handshake.c:453:client_set_lk_version_cbk] 0-gl-client-6: Server
lk version = 1
[2012-06-18 08:57:38.964507] I [fuse-bridge.c:3376:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13
kernel 7.16
[2012-06-18 09:16:48.692701] W
[client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote
operation failed: Stale NFS file handle.
Path: /tdlong/RILseq/makebam.commands
(90193380-d107-4b6c-b02f-ab53a0f65148)
[2012-06-18 09:16:48.693030] W
[client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote
operation failed: Stale NFS file handle.
Path: /tdlong/RILseq/makebam.commands
(90193380-d107-4b6c-b02f-ab53a0f65148)
[2012-06-18 09:16:48.693165] W
[client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote
operation failed: Stale NFS file handle.
Path: /tdlong/RILseq/makebam.commands
(90193380-d107-4b6c-b02f-ab53a0f65148)
[2012-06-18 09:16:48.693394] W
[client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote
operation failed: Stale NFS file handle.
Path: /tdlong/RILseq/makebam.commands
(90193380-d107-4b6c-b02f-ab53a0f65148)
[2012-06-18 10:56:32.756551] I [fuse-bridge.c:4037:fuse_thread_proc]
0-fuse: unmounting /share/gl
[2012-06-18 10:56:32.757148] W [glusterfsd.c:816:cleanup_and_exit]
(-->/lib64/libc.so.6(clone+0x6d) [0x3829ed44bd]
(-->/lib64/libpthread.so.0 [0x382aa0673d]
(-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0x17c) [0x40524c]))) 0-:
received signum (15), shutting down

Any hints as to why this is happening?




___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] "Too many levels of symbolic links" with glusterfs automounting

2012-06-18 Thread harry mangalam
I set up a 3.3 gluster volume for another sysadmin and he has added it
to his cluster via automount.  It seems to work initially but after some
time (days) he is now regularly seeing this warning when he tries to
traverse the mounted filesystems:

"Too many levels of symbolic links"

$ df: `/share/gl': Too many levels of symbolic links

I've been using gluster with static mounts on a cluster and have never
seen this behavior; google does not seem to record anyone else seeing
this with gluster. However, I note that the "Howto Automount GlusterFS"
page at 
http://www.gluster.org/community/documentation/index.php/Howto_Automount_GlusterFS
has been deleted. Is automounting no longer supported?

His auto.master file is as follows (sorry for the wrapping):

w1         -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.2:/&
w2         -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.3:/&
mathbio    -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.2:/&
tw         -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.4:/&
shwstore   -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  shwraid.biomol.uci.edu:/&
djtstore   -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid.biomol.uci.edu:/&
djtstore2  -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid2.biomol.uci.edu:/djtraid2:/&
djtstore3  -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid3.biomol.uci.edu:/djtraid3:/&
kevin      -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.2.255.230:/&
samlab     -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.2.255.237:/&
new-data   -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  nas-1-1.ib:/&
gl         -fstype=glusterfs  bs1:/&


He has never seen this behavior with the other automounted fs's.  The
system logs from the affected nodes do not have any gluster strings that
appear to be relevant, but /var/log/glusterfs/share-gl.log ends with
this series of odd lines:

[2012-06-18 08:57:38.964243] I
[client-handshake.c:453:client_set_lk_version_cbk] 0-gl-client-6: Server
lk version = 1
[2012-06-18 08:57:38.964507] I [fuse-bridge.c:3376:fuse_init]
0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13
kernel 7.16
[2012-06-18 09:16:48.692701] W
[client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote
operation failed: Stale NFS file handle.
Path: /tdlong/RILseq/makebam.commands
(90193380-d107-4b6c-b02f-ab53a0f65148)
[2012-06-18 09:16:48.693030] W
[client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote
operation failed: Stale NFS file handle.
Path: /tdlong/RILseq/makebam.commands
(90193380-d107-4b6c-b02f-ab53a0f65148)
[2012-06-18 09:16:48.693165] W
[client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote
operation failed: Stale NFS file handle.
Path: /tdlong/RILseq/makebam.commands
(90193380-d107-4b6c-b02f-ab53a0f65148)
[2012-06-18 09:16:48.693394] W
[client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote
operation failed: Stale NFS file handle.
Path: /tdlong/RILseq/makebam.commands
(90193380-d107-4b6c-b02f-ab53a0f65148)
[2012-06-18 10:56:32.756551] I [fuse-bridge.c:4037:fuse_thread_proc]
0-fuse: unmounting /share/gl
[2012-06-18 10:56:32.757148] W [glusterfsd.c:816:cleanup_and_exit]
(-->/lib64/libc.so.6(clone+0x6d) [0x3829ed44bd]
(-->/lib64/libpthread.so.0 [0x382aa0673d]
(-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0x17c) [0x40524c]))) 0-:
received signum (15), shutting down





___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users