Re: [Gluster-users] Need help in understanding volume heal-info behavior
Thank you very much!

On Monday 28 April 2014 07:41 AM, Ravishankar N wrote:

> On 04/28/2014 01:30 AM, Chalcogen wrote:
>
>> Hi everyone,
>>
>> I have trouble understanding the following behavior. Suppose I have a replica 2 volume 'testvol' on two servers, server1 and server2, composed of server1:/bricks/testvol/brick and server2:/bricks/testvol/brick, and suppose it contains a good number of files. Now, assume I remove one of the two bricks:
>>
>>     root@server1~# gluster volume remove-brick testvol replica 1 server1:/bricks/testvol/brick
>>
>> Next, I unmount and delete the logical volume backing the brick, recreate it (with a different size), and mount it the same way as before (at /bricks/testvol/). Then, I re-add it:
>>
>>     root@server1~# gluster volume add-brick testvol replica 2 server1:/bricks/testvol/brick
>>
>> I observe that the brick on server1 does not contain any of the data that was in the volume:
>>
>>     root@server1~# ls /bricks/testvol/brick
>>     root@server1~#
>>
>> This is all right by me, since glusterfs needs some time to discover and sync the files that are absent from server1's brick. In fact, if I leave the setup undisturbed for fifteen minutes to half an hour, all the data appears on server1's brick, just as you would expect. Also, if I wish to speed up the process, I simply run 'ls -Ra' on the directory where the volume is mounted, and all files sync onto server1's brick. This, too, is very much as expected.
>>
>> However, during the period when the data is not yet available on server1's brick, if you query the heal info for the volume, the gluster CLI reports 'Number of entries: 0' for all of 'info', 'heal-failed', and 'split-brain'. This is what becomes a bit of a problem for me. The fact is, we are attempting to automate the monitoring of our glusterfs volumes, and we depend on heal info alone to decide whether the data on server1 and server2 is in sync. Could somebody, therefore, help me with the following questions?
>>
>> a) Which files exactly show up in heal info?
>
> The files which are healed either by the self-heal daemon or by the gluster heal commands.
>
>> b) What exactly should I look to monitor if we are to ascertain that data on our servers is in sync?
>
> After adding a new replica brick, you need to run a full heal (gluster volume heal <vol-name> full). Then the results will show up in the heal info output.

Thanks a lot for your responses!
Anirban

P.s. I am using glusterfs 3.4.2 over Linux kernel version 2.6.34.
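For our monitoring scripts, the advice above boils down to the sequence below. This is a minimal sketch, assuming the replica volume is named 'testvol' as in the thread; the loop simply keys off the 'Number of entries' lines in the CLI output.

    # Kick off a full heal after the replacement brick has been added,
    # so that files missing on the new brick are queued for self-heal:
    gluster volume heal testvol full

    # Poll heal status; the volume is in sync when every brick reports
    # 'Number of entries: 0' in all three views:
    gluster volume heal testvol info
    gluster volume heal testvol info heal-failed
    gluster volume heal testvol info split-brain

    # A crude watch loop for automation (exits when no entries remain):
    while gluster volume heal testvol info | grep -q 'Number of entries: [1-9]'; do
        sleep 10
    done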
Re: [Gluster-users] Command /etc/init.d/glusterd start failed
I have been plagued by errors of this kind every so often, mainly because we are in a development phase and reboot our servers frequently. If you start glusterd in debug mode:

    sh$ glusterd --debug

you can easily pinpoint exactly which volume/peer data is causing the initialization failure in mgmt/glusterd. In addition, from my own experience, two of the leading causes of failure are:

a) Bad peer data, if glusterd is somehow killed during an active peer probe operation, and

b) I have noticed that when glusterd needs to update the info for a volume/brick (say, the info for volume testvol) in /var/lib/glusterd, it first renames /var/lib/glusterd/vols/testvol/info to info.tmp, and then creates a new info file, which is presumably written afresh. If glusterd were to crash at this point, glusterd startup would keep failing until this is resolved manually. Usually, moving info.tmp back to info works for me.

Thanks,
Anirban

On Saturday 12 April 2014 08:45 AM, 吴保川 wrote:

> It is tcp.
>
>     [root@server1 wbc]# gluster volume info
>
>     Volume Name: gv_replica
>     Type: Replicate
>     Volume ID: 81014863-ee59-409b-8897-6485d411d14d
>     Status: Started
>     Number of Bricks: 1 x 2 = 2
>     Transport-type: tcp
>     Bricks:
>     Brick1: 192.168.1.3:/home/wbc/vdir/gv_replica
>     Brick2: 192.168.1.4:/home/wbc/vdir/gv_replica
>
>     Volume Name: gv1
>     Type: Distribute
>     Volume ID: cfe2b8a0-284b-489d-a153-21182933f266
>     Status: Started
>     Number of Bricks: 2
>     Transport-type: tcp
>     Bricks:
>     Brick1: 192.168.1.4:/home/wbc/vdir/gv1
>     Brick2: 192.168.1.3:/home/wbc/vdir/gv1
>
> Thanks,
> Baochuan Wu
>
> 2014-04-12 10:11 GMT+08:00 Nagaprasad Sathyanarayana <nsath...@redhat.com>:
>
>> If you run
>>
>>     # gluster volume info
>>
>> what is the value set for transport-type?
>>
>> Thanks
>> Naga
>>
>> On 12-Apr-2014, at 7:33 am, 吴保川 <wildpointe...@gmail.com> wrote:
>>
>>> Thanks, Joe. I found that one of my machines had been assigned a wrong IP address. This led to the error. Originally, I thought the following error was the critical one:
>>>
>>>     [2014-04-11 18:12:03.433371] E [rpc-transport.c:269:rpc_transport_load] 0-rpc-transport: /usr/local/lib/glusterfs/3.4.3/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
>>>
>>> 2014-04-12 5:34 GMT+08:00 Joe Julian <j...@julianfamily.org>:
>>>
>>>> On 04/11/2014 11:18 AM, 吴保川 wrote:
>>>>
>>>>>     [2014-04-11 18:12:05.165989] E [glusterd-store.c:2663:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
>>>>
>>>> I'm pretty sure that means that one of the bricks isn't resolved in your list of peers.
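Regarding (b), the manual fix I described can be scripted. The following is a rough sketch based purely on my own recovery experience, not an official procedure: it assumes the crash left an info.tmp behind alongside a missing or zero-length info file, and restores the old copy. Run it while glusterd is stopped.

    # For every volume, if a crash left an info.tmp behind and the
    # freshly-written info file is missing or empty, restore the
    # previous copy so that glusterd can load the volume again.
    for tmp in /var/lib/glusterd/vols/*/info.tmp; do
        [ -e "$tmp" ] || continue           # no leftovers for this volume
        info="${tmp%.tmp}"
        if [ ! -s "$info" ]; then           # info missing or zero-length
            mv "$tmp" "$info"
        fi
    done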
Re: [Gluster-users] One node goes offline, the other node loses its connection to its local Gluster volume
I'm not from the glusterfs development team or anything, but I, too, started with glusterfs somewhere around the time frame you mention, and I also work with a twin-replicated setup just like yours. When I do what you describe here on my setup, the command initially hangs on both servers for about as long as the ping timeout (network.ping-timeout, which defaults to 42 seconds). After that, it works. If there are new bugs in this setup I would be interested, in part because the stability of my product depends on this, too. Do you think you could share your 'gluster volume info' and 'gluster volume status' output? Also, what did heal info say before you performed this exercise?

Thanks,
Anirban

On Sunday 23 February 2014 07:14 AM, Greg Scott wrote:

> We first went down this path back in July 2013, and now I'm back again for more. It's a similar situation, but now with new versions of everything. I'm using glusterfs 3.4.2 with Fedora 20. I have 2 nodes named fw1 and fw2. When I ifdown the NIC I'm using for Gluster on either node, that node cannot see its Gluster volume, but the other node can see it after a timeout. As soon as I ifup that NIC, everyone can see everything again. Is this expected behavior? When that interconnect drops, I want both nodes to see their own local copy and then sync everything back up when the interconnect comes back. Here are the details.
>
> Node fw1 has an XFS filesystem named gluster-fw1. Node fw2 has an XFS filesystem named gluster-fw2. Those are both gluster bricks, and both nodes mount the volume as /firewall-scripts. So anything one node does in /firewall-scripts should also be on the other node within a few milliseconds. The test is to isolate the nodes from each other and see if they can still access their own local copy of /firewall-scripts. The easiest way to do this is to ifdown the interconnect NIC. But this doesn't work. Here is what happens when I ifdown the NIC on node fw1: node fw2 can see /firewall-scripts, but fw1 shows an error. When I ifdown on fw2, the behavior is identical, with fw1 and fw2 swapped. On fw1, after an ifdown, I lose the connection to my Gluster filesystem.
>     [root@stylmark-fw1 firewall-scripts]# ifdown enp5s4
>     [root@stylmark-fw1 firewall-scripts]# ls /firewall-scripts
>     ls: cannot access /firewall-scripts: Transport endpoint is not connected
>     [root@stylmark-fw1 firewall-scripts]# df -h
>     df: '/firewall-scripts': Transport endpoint is not connected
>     Filesystem                       Size  Used Avail Use% Mounted on
>     /dev/mapper/fedora-root           17G  2.2G   14G  14% /
>     devtmpfs                         989M     0  989M   0% /dev
>     tmpfs                            996M     0  996M   0% /dev/shm
>     tmpfs                            996M  564K  996M   1% /run
>     tmpfs                            996M     0  996M   0% /sys/fs/cgroup
>     tmpfs                            996M     0  996M   0% /tmp
>     /dev/sda2                        477M   87M  362M  20% /boot
>     /dev/sda1                        200M  9.6M  191M   5% /boot/efi
>     /dev/mapper/fedora-gluster--fw1  9.8G   33M  9.8G   1% /gluster-fw1
>     10.10.10.2:/fwmaster             214G   75G  128G  37% /mnt/fwmaster
>     [root@stylmark-fw1 firewall-scripts]#
>
> But on fw2, I can still look at it:
>
>     [root@stylmark-fw2 ~]# ls /firewall-scripts
>     allow-all           failover-monitor.sh  rcfirewall.conf
>     allow-all-with-nat  initial_rc.firewall  start-failover-monitor.sh
>     etc                 rc.firewall          var
>     [root@stylmark-fw2 ~]# df -h
>     Filesystem                       Size  Used Avail Use% Mounted on
>     /dev/mapper/fedora-root           17G  2.3G   14G  14% /
>     devtmpfs                         989M     0  989M   0% /dev
>     tmpfs                            996M     0  996M   0% /dev/shm
>     tmpfs                            996M  560K  996M   1% /run
>     tmpfs                            996M     0  996M   0% /sys/fs/cgroup
>     tmpfs                            996M     0  996M   0% /tmp
>     /dev/sda2                        477M   87M  362M  20% /boot
>     /dev/sda1                        200M  9.6M  191M   5% /boot/efi
>     /dev/mapper/fedora-gluster--fw2  9.8G   33M  9.8G   1% /gluster-fw2
>     192.168.253.2:/firewall-scripts  9.8G   33M  9.8G   1% /firewall-scripts
>     10.10.10.2:/fwmaster             214G   75G  128G  37% /mnt/fwmaster
>     [root@stylmark-fw2 ~]#
>
> And back to fw1 -- after an ifup, I can see it again:
>
>     [root@stylmark-fw1 firewall-scripts]# ifup enp5s4
>     [root@stylmark-fw1 firewall-scripts]# ls /firewall-scripts
>     allow-all           failover-monitor.sh  rcfirewall.conf
>     allow-all-with-nat  initial_rc.firewall  start-failover-monitor.sh
>     etc                 rc.firewall          var
>     [root@stylmark-fw1 firewall-scripts]# df -h
>     Filesystem                       Size  Used Avail Use% Mounted on
>     /dev/mapper/fedora-root           17G  2.2G   14G  14% /
>     devtmpfs                         989M     0  989M   0% /dev
>     tmpfs
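A side note on the timeout I mentioned at the top of this reply: the delay before the surviving node's mount becomes responsive again is governed by the volume's network.ping-timeout option. A hedged illustration follows (the volume name is taken from this thread; whether lowering the timeout is wise for your workload is a separate question):

    # Show current settings; ping-timeout appears in the output only if
    # it has been changed from the default:
    gluster volume info firewall-scripts

    # Lower the client ping timeout from the 42-second default, e.g. to 10s:
    gluster volume set firewall-scripts network.ping-timeout 10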
[Gluster-users] Failed cleanup on peer probe tmp file causes volume re-initialization problems
Hi everybody,

This is more of a part of a larger wishlist. I found out that when a peer probe is performed by the user, mgmt/glusterd writes a file named after the hostname of the peer in question. On a successful probe, this file is replaced with a file named after the UUID of the glusterd instance on the peer, while a failed probe causes the temp file to simply be deleted. Here's an illustration:

    root@someserver:/var/lib/glusterd/peers] gluster peer probe some_non_host &
    [1] 25918
    root@someserver:/var/lib/glusterd/peers] cat some_non_host
    uuid=00000000-0000-0000-0000-000000000000
    state=0
    hostname1=some_non_host
    root@someserver:/var/lib/glusterd/peers]
    peer probe: failed: Probe returned with unknown errno 107
    [1]+  Exit 1    gluster peer probe some_non_host
    root@someserver:/var/lib/glusterd/peers] ls
    root@someserver:/var/lib/glusterd/peers]

Here's the deal: when, for some reason, glusterd is killed off before it gets a chance to clean up the temp file (say, for a peer that doesn't really exist), and you then reboot your machine, the temporary file breaks mgmt/glusterd's recovery graph, and glusterd is unable to initialize any of the existing volumes until the temp file is deleted manually.

It seems to me that mgmt/glusterd should have the intelligence to distinguish between a genuine peer and a temp file created during a probe, so that the temp file cannot affect the recovery graph after a reboot. Something like a peer-name.tmp, perhaps? Preferably, it should also delete any temp file discovered during recovery at startup.

I reported a bug on this at bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1067733.

Thanks,
Anirban
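Until something along these lines lands, the manual cleanup can be scripted. A hedged sketch, based only on my observation above that an orphaned probe file carries the all-zero UUID; run it while glusterd is stopped:

    # Remove leftover probe temp files from an interrupted peer probe.
    # An orphaned file still carries the all-zero UUID written before
    # the probe completed; genuine peers have a real UUID in both the
    # file name and its contents.
    for f in /var/lib/glusterd/peers/*; do
        [ -e "$f" ] || continue
        if grep -q '^uuid=00000000-0000-0000-0000-000000000000$' "$f"; then
            echo "removing stale probe file: $f"
            rm "$f"
        fi
    done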
[Gluster-users] File (setuid) permission changes during volume heal - possible bug?
Hi,

I am working on a twin-replicated setup (server1 and server2) with glusterfs 3.4.0. I perform the following steps:

1. Create a distributed volume 'testvol' with the XFS brick server1:/brick/testvol on server1, and mount it using the glusterfs native client at /testvol.

2. Copy the following file to /testvol:

    server1:~$ ls -l /bin/su
    -rwsr-xr-x 1 root root 84742 Jan 17 2014 /bin/su
    server1:~$ cp -a /bin/su /testvol

3. Within /testvol, if I list the file I just copied, I find its attributes intact.

4. Now, I add the XFS brick server2:/brick/testvol:

    server2:~$ gluster volume add-brick testvol replica 2 server2:/brick/testvol

At this point, heal kicks in and the file is replicated on server2.

5. If I now list su in /testvol on either server, this is what I see:

    server1:~$ ls -l /testvol/su
    -rwsr-xr-x 1 root root 84742 Jan 17 2014 /testvol/su

    server2:~$ ls -l /testvol/su
    -rwxr-xr-x 1 root root 84742 Jan 17 2014 /testvol/su

That is, the setuid bit ('s') in the file mode gets changed to a plain 'x'; in other words, not all of the attributes are preserved upon heal completion. Would you consider this a bug? Is the behavior different in a later release?

Thanks a lot,
Anirban
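For anyone trying to reproduce this, the discrepancy is easiest to see by comparing modes straight off the backend bricks, bypassing the client mount. A small hedged sketch (hostnames and brick paths as in the steps above; assumes passwordless ssh between the servers):

    # Print the octal and symbolic mode of the healed file on each brick.
    # A faithful heal should report 4755 / -rwsr-xr-x on both servers;
    # in the case described above, the newly added brick shows
    # 755 / -rwxr-xr-x instead, i.e. the setuid bit has been dropped.
    for host in server1 server2; do
        ssh "$host" "stat -c '%a %A %n' /brick/testvol/su"
    done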
[Gluster-users] Passing noforget option to glusterfs native client mounts
Hi everybody,

A few months back I joined a project where people want to replace their legacy fuse-based (twin-server) replicated file-system with GlusterFS. They also have high-availability NFS server code tied to the kernel NFSD (the nfs-kernel-server, I mean) that they wish to retain. The reason they wish to retain the kernel NFS server, rather than use the NFS server that comes with GlusterFS, is mainly a bit of code that allows NFS IPs to be migrated from one host server to the other should one of them go down, plus tweaks to the export server configuration that allow the file handles to remain identical on the new host server.

The solution was to mount gluster volumes using the mount.glusterfs native client program and then export the directories over the kernel NFS server. This works most of the time, but on rare occasions 'stale file handle' is reported on certain clients, which really puts a damper on the 'high-availability' thing. After suitably instrumenting the nfsd/fuse code in the kernel, it appears that decoding of the file handle fails on the server because the inode record corresponding to the nodeid in the handle cannot be looked up. Combine this with the fact that a second attempt by the client to look up the same file succeeds, and one might suspect that the problem is identical to what many people attempting to export fuse mounts over the kernel's NFS server are facing: fuse 'forgets' the inode records, thereby causing ilookup5() to fail. Miklos and the other fuse developers/hackers would point towards '-o noforget' while mounting their fuse file-systems.

I tried passing '-o noforget' to mount.glusterfs, but it does not seem to recognize it. Could somebody help me out with the correct syntax to pass noforget to gluster volumes? Or with something we could pass to glusterfs that would instruct fuse to allocate a bigger cache for our inodes? Additionally, should you think that something else might be behind our problems, please do let me know.

Here's my configuration:

    Linux kernel version: 2.6.34.12
    GlusterFS version: 3.4.0
    nfs.disable option for volumes: OFF on all volumes

Thanks a lot for your time!
Anirban

P.s. I found quite a few pages on the web that admonish users that GlusterFS is not compatible with the kernel NFS server, but they do not really give much detail. Is this one of the reasons for saying so?
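For what it's worth while experimenting: mount.glusterfs is a shell wrapper around the glusterfs client binary, so one way to see exactly which options can and cannot be forwarded is to invoke the client directly. A hedged sketch follows (server and volume names are placeholders; this does not add a noforget option, it only shows where mount options would have to go):

    # Roughly what 'mount -t glusterfs server1:/testvol /mnt/testvol'
    # expands to; run 'glusterfs --help' to list the options the client
    # binary actually accepts:
    glusterfs --volfile-server=server1 --volfile-id=testvol /mnt/testvol

Separately, when exporting any FUSE-backed mount through the kernel NFS server, the export generally needs an explicit fsid= so that knfsd can construct stable file handles, e.g. in /etc/exports (the fsid value is arbitrary but must be unique and stable across servers):

    /mnt/testvol  *(rw,fsid=14,no_subtree_check)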
Re: [Gluster-users] Passing noforget option to glusterfs native client mounts
P.s. I think I need to clarify this: I am only reading from the mounts, and not modifying anything on the server, so the commonest causes of stale file handles do not apply.

Anirban