Re: [Gluster-users] "Too many levels of symbolic links" with glusterfs automounting
One client log file is here: http://goo.gl/FyYfy

On the server side, on bs1 & bs4, there is a huge, current nfs.log file (odd, since I neither wanted nor configured an NFS export). It is filled entirely with these lines:

tail -5 nfs.log
[2012-06-19 21:11:54.402567] E [rdma.c:4458:tcp_connect_finish] 0-gl-client-1: tcp connect to failed (Connection refused)
[2012-06-19 21:11:54.406023] E [rdma.c:4458:tcp_connect_finish] 0-gl-client-2: tcp connect to failed (Connection refused)
[2012-06-19 21:11:54.409486] E [rdma.c:4458:tcp_connect_finish] 0-gl-client-3: tcp connect to failed (Connection refused)
[2012-06-19 21:11:54.412822] E [rdma.c:4458:tcp_connect_finish] 0-gl-client-6: tcp connect to 10.2.7.11:24008 failed (Connection refused)
[2012-06-19 21:11:54.416231] E [rdma.c:4458:tcp_connect_finish] 0-gl-client-7: tcp connect to 10.2.7.11:24008 failed (Connection refused)

On servers bs2 and bs3 there is a current, huge log consisting of this single line, repeating every 3 s:

[2012-06-19 21:14:00.907387] I [socket.c:1798:socket_event_handler] 0-transport: disconnecting now

I was reminded as I was copying it that the client and servers run slightly different versions - the client is "3.3.0qa42-1" while the servers are "3.3.0-1". Is this enough version skew to cause a difference? There are no other problems that I'm aware of, but if a slight version skew is likely to be problematic, I'll be careful to keep them exactly aligned. I think this was done because the final release binary did not support the glibc that we were using on the compute nodes, while 3.3.0qa42-1 did. Perhaps too sloppy...?

gluster volume info

Volume Name: gl
Type: Distribute
Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
Status: Started
Number of Bricks: 8
Transport-type: tcp,rdma
Bricks:
Brick1: bs2:/raid1
Brick2: bs2:/raid2
Brick3: bs3:/raid1
Brick4: bs3:/raid2
Brick5: bs4:/raid1
Brick6: bs4:/raid2
Brick7: bs1:/raid1
Brick8: bs1:/raid2
Options Reconfigured:
performance.io-cache: on
performance.quick-read: on
performance.io-thread-count: 64
auth.allow: 10.2.*.*,10.1.*.*

gluster volume status

Status of volume: gl
Gluster process              Port    Online  Pid
------------------------------------------------
Brick bs2:/raid1             24009   Y       2908
Brick bs2:/raid2             24011   Y       2914
Brick bs3:/raid1             24009   Y       2860
Brick bs3:/raid2             24011   Y       2866
Brick bs4:/raid1             24009   Y       2992
Brick bs4:/raid2             24011   Y       2998
Brick bs1:/raid1             24013   Y       10122
Brick bs1:/raid2             24015   Y       10154
NFS Server on localhost      38467   Y       9475
NFS Server on 10.2.7.11      38467   Y       10160
NFS Server on bs2            38467   N       N/A
NFS Server on bs3            38467   N       N/A

Hmm - sure enough, bs1 and bs4 (localhost in the above info) appear to be running NFS servers, while bs2 & bs3 are not...?

OK - after some googling, the gluster NFS service can be shut off with

gluster volume set gl nfs.disable on

and now the status looks like this:

gluster volume status

Status of volume: gl
Gluster process              Port    Online  Pid
------------------------------------------------
Brick bs2:/raid1             24009   Y       2908
Brick bs2:/raid2             24011   Y       2914
Brick bs3:/raid1             24009   Y       2860
Brick bs3:/raid2             24011   Y       2866
Brick bs4:/raid1             24009   Y       2992
Brick bs4:/raid2             24011   Y       2998
Brick bs1:/raid1             24013   Y       10122
Brick bs1:/raid2             24015   Y       10154

hjm

On Tue, 2012-06-19 at 13:05 -0700, Anand Avati wrote:
> Can you post the complete logs? Is the 'Too many levels of symbolic
> links' (or ELOOP) logs seen in the client log or brick logs?
>
> Avati
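As a sanity check on the version-skew question above, one quick test is to ask every node for its installed glusterfs version. A minimal sketch, assuming passwordless ssh and that the hostnames (the bs* servers plus an example compute node) are adjusted to match the actual cluster:

  # print the glusterfs version string reported by each node
  for h in bs1 bs2 bs3 bs4 compute-01; do
      printf '%-12s ' "$h"
      ssh "$h" 'glusterfs --version | head -1'
  done

Mixing a pre-release client (3.3.0qa42-1) with release servers (3.3.0-1) is exactly the kind of skew this makes visible at a glance.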
Re: [Gluster-users] "Too many levels of symbolic links" with glusterfs automounting
Can you post the complete logs? Is the 'Too many levels of symbolic links' (or ELOOP) logs seen in the client log or brick logs?

Avati
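A rough way to answer that question is to grep both sides for the error. A sketch, assuming the default log locations (the FUSE mount log under /var/log/glusterfs on the client, per-brick logs under /var/log/glusterfs/bricks/ on the servers; both paths are assumptions, adjust to the local install):

  # on the client: does the mount log for /share/gl record the ELOOP?
  grep -i 'symbolic links' /var/log/glusterfs/share-gl.log

  # on each server: do any brick logs record it?
  grep -il 'symbolic links' /var/log/glusterfs/bricks/*.log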
[Gluster-users] "Too many levels of symbolic links" with glusterfs automounting
(Apologies if this already posted, but I recently had to change smtp servers which scrambled some list permissions, and I haven't seen it post.)

I set up a 3.3 gluster volume for another sysadmin and he has added it to his cluster via automount. It seems to work initially, but after some time (days) he is now regularly seeing this warning when he tries to traverse the mounted filesystems:

"Too many levels of symbolic links"

$ df
df: `/share/gl': Too many levels of symbolic links

It's supposed to be mounted on /share/gl with a symlink to /gl, ie: /gl -> /share/gl

I've been using gluster with static mounts on a cluster and have never seen this behavior; google does not seem to record anyone else seeing this with gluster. However, I note that the "Howto Automount GlusterFS" page at
http://www.gluster.org/community/documentation/index.php/Howto_Automount_GlusterFS
has been deleted. Is automounting no longer supported?

His auto.master file is as follows (sorry for the wrapping):

w1        -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.2:/&
w2        -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.3:/&
mathbio   -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.2:/&
tw        -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.1.50.4:/&
shwstore  -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  shwraid.biomol.uci.edu:/&
djtstore  -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid.biomol.uci.edu:/&
djtstore2 -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid2.biomol.uci.edu:/djtraid2:/&
djtstore3 -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async  djtraid3.biomol.uci.edu:/djtraid3:/&
kevin     -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.2.255.230:/&
samlab    -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  10.2.255.237:/&
new-data  -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async  nas-1-1.ib:/&
gl        -fstype=glusterfs  bs1:/&

He has never seen this behavior with the other automounted fs's. The system logs from the affected nodes do not have any gluster strings that appear to be relevant, but /var/log/glusterfs/share-gl.log ends with this series of odd lines:

[2012-06-18 08:57:38.964243] I [client-handshake.c:453:client_set_lk_version_cbk] 0-gl-client-6: Server lk version = 1
[2012-06-18 08:57:38.964507] I [fuse-bridge.c:3376:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.16
[2012-06-18 09:16:48.692701] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote operation failed: Stale NFS file handle. Path: /tdlong/RILseq/makebam.commands (90193380-d107-4b6c-b02f-ab53a0f65148)
[2012-06-18 09:16:48.693030] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote operation failed: Stale NFS file handle. Path: /tdlong/RILseq/makebam.commands (90193380-d107-4b6c-b02f-ab53a0f65148)
[2012-06-18 09:16:48.693165] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote operation failed: Stale NFS file handle. Path: /tdlong/RILseq/makebam.commands (90193380-d107-4b6c-b02f-ab53a0f65148)
[2012-06-18 09:16:48.693394] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote operation failed: Stale NFS file handle. Path: /tdlong/RILseq/makebam.commands (90193380-d107-4b6c-b02f-ab53a0f65148)
[2012-06-18 10:56:32.756551] I [fuse-bridge.c:4037:fuse_thread_proc] 0-fuse: unmounting /share/gl
[2012-06-18 10:56:32.757148] W [glusterfsd.c:816:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3829ed44bd] (-->/lib64/libpthread.so.0 [0x382aa0673d] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0x17c) [0x40524c]))) 0-: received signum (15), shutting down

Any hints as to why this is happening?
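For reference, the layout described above (an automounted gluster volume under /share/gl plus a /gl symlink) is usually wired up as an indirect autofs map. A minimal sketch, with the file names /etc/auto.master and /etc/auto.share assumed rather than taken from the thread:

  # /etc/auto.master -- mount keys from auto.share under /share
  /share  /etc/auto.share

  # /etc/auto.share -- gluster entry; autofs substitutes the key 'gl' for '&'
  gl  -fstype=glusterfs  bs1:/&

  # convenience symlink so /gl resolves through the automount point
  ln -s /share/gl /gl

If both /gl and /share/gl end up as symlinks pointing at each other, the kernel reports exactly this ELOOP, so it is worth confirming on an affected node (e.g. with ls -ld /gl /share/gl) that only /gl is a link and /share/gl is the real automount point.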
[Gluster-users] "Too many levels of symbolic links" with glusterfs automounting
I set up a 3.3 gluster volume for another sysadmin and he has added it to his cluster via automount. It seems to work initially but after some time (days) he is now regularly seeing this warning: "Too many levels of symbolic links" $ df: `/share/gl': Too many levels of symbolic links when he tries to traverse the mounted filesystems. I've been using gluster with static mounts on a cluster and have never seen this behavior; google does not seem to record anyone else seeing this with gluster. However, I note that the "Howto Automount GlusterFS" page at http://www.gluster.org/community/documentation/index.php/Howto_Automount_GlusterFS has been deleted. Is automounting no longer supported? His auto.master file is as follows (sorry for the wrapping): w1 -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async 10.1.50.2:/& w2 -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async 10.1.50.3:/& mathbio -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async 10.1.50.2:/& tw -rw,intr,bg,v3,rsize=16384,wsize=16384,retrans=10,timeo=20,hard,lock,defaults,noatime,async 10.1.50.4:/& shwstore -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async shwraid.biomol.uci.edu:/& djtstore -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async djtraid.biomol.uci.edu:/& djtstore2 -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async djtraid2.biomol.uci.edu:/djtraid2:/& djtstore3 -rw,intr,bg,v3,rsize=16384,wsize=16384,lock,defaults,noatime,async djtraid3.biomol.uci.edu:/djtraid3:/& kevin -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async 10.2.255.230:/& samlab -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async 10.2.255.237:/& new-data -rw,intr,bg,rsize=65520,wsize=65520,retrans=10,timeo=20,hard,lock,defaults,noatime,async nas-1-1.ib:/& gl-fstype=glusterfs bs1:/& He has never seen this behavior with the other automounted fs's. The system logs from the affected nodes do not have any gluster strings that appear to be relevant, but /var/log/glusterfs/share-gl.log ends with this series of odd lines: [2012-06-18 08:57:38.964243] I [client-handshake.c:453:client_set_lk_version_cbk] 0-gl-client-6: Server lk version = 1 [2012-06-18 08:57:38.964507] I [fuse-bridge.c:3376:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.16 [2012-06-18 09:16:48.692701] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote operation failed: Stale NFS file handle. Path: /tdlong/RILseq/makebam.commands (90193380-d107-4b6c-b02f-ab53a0f65148) [2012-06-18 09:16:48.693030] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote operation failed: Stale NFS file handle. Path: /tdlong/RILseq/makebam.commands (90193380-d107-4b6c-b02f-ab53a0f65148) [2012-06-18 09:16:48.693165] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote operation failed: Stale NFS file handle. Path: /tdlong/RILseq/makebam.commands (90193380-d107-4b6c-b02f-ab53a0f65148) [2012-06-18 09:16:48.693394] W [client3_1-fops.c:2630:client3_1_lookup_cbk] 0-gl-client-4: remote operation failed: Stale NFS file handle. 
Path: /tdlong/RILseq/makebam.commands (90193380-d107-4b6c-b02f-ab53a0f65148) [2012-06-18 10:56:32.756551] I [fuse-bridge.c:4037:fuse_thread_proc] 0-fuse: unmounting /share/gl [2012-06-18 10:56:32.757148] W [glusterfsd.c:816:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x3829ed44bd] (-->/lib64/libpthread.so.0 [0x382aa0673d] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0x17c) [0x40524c]))) 0-: received signum (15), shutting down ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users