On 10/07/2013, at 7:59 PM, Rejy M Cyriac wrote:

> On 07/10/2013 11:38 AM, Frank Sonntag wrote:
>> Hi Greg,
>>
>> Try using the same server on both machines when mounting, instead of
>> mounting off the local gluster server on each. I used the same approach
>> as you in the past and ran into all kinds of split-brain problems.
>> The drawback, of course, is that mounts will fail if the machine you
>> chose is not available at mount time. One of my gripes with gluster is
>> that you cannot list more than one server in your mount command.
>>
>> Frank
>
> Would not the mount option 'backupvolfile-server=<secondary server>' help
> at mount time, in the case of the primary server not being available?
>
> - rejy (rmc)

I am still on 3.2, which does not have that option (as far as I know).
But thanks for bringing this up. Useful to know. And the OP can make use
of it, of course.
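[Archive note: the option Rejy mentions can be given either on the mount command line or in /etc/fstab. A minimal sketch, assuming a gluster 3.3+ client and reusing the server addresses and volume name from Greg's setup below; not something confirmed in this thread:]

```sh
# One-off mount: fetch the volfile from fw1, fall back to fw2 if fw1
# is unreachable at mount time.
mount -t glusterfs -o backupvolfile-server=192.168.253.2 \
    192.168.253.1:/firewall-scripts /firewall-scripts

# Equivalent /etc/fstab entry:
# 192.168.253.1:/firewall-scripts /firewall-scripts glusterfs defaults,_netdev,backupvolfile-server=192.168.253.2 0 0
```

[Note that this option only affects the initial volfile fetch; once mounted, the FUSE client connects to all bricks directly, so it does not change failover behaviour of an already-mounted volume.]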
Frank

>> On 10/07/2013, at 5:26 PM, Greg Scott wrote:
>>
>>> Bummer. Looks like I'm on my own with this one.
>>>
>>> - Greg
>>>
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Greg Scott
>>> Sent: Tuesday, July 09, 2013 12:37 PM
>>> To: '[email protected]'
>>> Subject: Re: [Gluster-users] One node goes offline, the other node
>>> can't see the replicated volume anymore
>>>
>>> No takers? I am running gluster 3.4beta3 that came with Fedora 19. Is
>>> my issue a consequence of some kind of quorum split-brain thing?
>>>
>>> thanks
>>>
>>> - Greg Scott
>>>
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Greg Scott
>>> Sent: Monday, July 08, 2013 8:17 PM
>>> To: '[email protected]'
>>> Subject: [Gluster-users] One node goes offline, the other node can't
>>> see the replicated volume anymore
>>>
>>> I don't get this. I have a replicated volume and 2 nodes. My challenge
>>> is, when I take one node offline, the other node can no longer access
>>> the volume until both nodes are back online again.
>>>
>>> Details:
>>>
>>> I have 2 nodes, fw1 and fw2. Each node has an XFS file system:
>>> /gluster-fw1 on node fw1 and /gluster-fw2 on node fw2. Node fw1 is at
>>> IP address 192.168.253.1; node fw2 is at 192.168.253.2.
>>>
>>> I create a gluster volume named firewall-scripts, which is a replica
>>> of those two XFS file systems. The volume holds a bunch of config
>>> files common to both fw1 and fw2. The application is an active/standby
>>> pair of firewalls, and the idea is to keep the config files in a
>>> gluster volume.
>>>
>>> When both nodes are online, everything works as expected. But when I
>>> take either node offline, node fw2 behaves badly:
>>>
>>> [root@chicago-fw2 ~]# ls /firewall-scripts
>>> ls: cannot access /firewall-scripts: Transport endpoint is not connected
>>>
>>> And when I bring the offline node back online, node fw2 eventually
>>> behaves normally again.
>>>
>>> What's up with that?
>>> Gluster is supposed to be resilient and self-healing, able to stand
>>> up to this sort of abuse, so I must be doing something wrong.
>>>
>>> Here is how I set everything up. It doesn't get much simpler than
>>> this, and my setup is right out of the Getting Started Guide, just
>>> using my own names.
>>>
>>> Here are the steps I followed, all from fw1:
>>>
>>> gluster peer probe 192.168.253.2
>>> gluster peer status
>>>
>>> Create and start the volume:
>>>
>>> gluster volume create firewall-scripts replica 2 transport tcp \
>>>     192.168.253.1:/gluster-fw1 192.168.253.2:/gluster-fw2
>>> gluster volume start firewall-scripts
>>>
>>> On fw1:
>>>
>>> mkdir /firewall-scripts
>>> mount -t glusterfs 192.168.253.1:/firewall-scripts /firewall-scripts
>>>
>>> and add this line to /etc/fstab:
>>>
>>> 192.168.253.1:/firewall-scripts /firewall-scripts glusterfs defaults,_netdev 0 0
>>>
>>> On fw2:
>>>
>>> mkdir /firewall-scripts
>>> mount -t glusterfs 192.168.253.2:/firewall-scripts /firewall-scripts
>>>
>>> and add this line to /etc/fstab:
>>>
>>> 192.168.253.2:/firewall-scripts /firewall-scripts glusterfs defaults,_netdev 0 0
>>>
>>> That's it. That's the whole setup. When both nodes are online,
>>> everything replicates beautifully. But take one node offline and it
>>> all falls apart.
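[Archive note: Greg asks earlier in the thread whether this is "some kind of quorum split-brain thing". A sketch of the two settings most often discussed for two-brick replicas, using the volume name from his setup; these are real `gluster volume set` options, but nothing in this thread confirms either one is the cause or the fix:]

```sh
# Inspect the volume's current configuration.
gluster volume info firewall-scripts

# Client-side quorum: with 'auto' on a 2-brick replica, writes are only
# allowed while the first brick is up -- this trades availability for
# split-brain protection, so it may be undesirable for a 2-node pair.
gluster volume set firewall-scripts cluster.quorum-type auto

# How long a client waits before declaring a brick dead (default 42s);
# until the timeout fires, operations on the mount can hang or error.
gluster volume set firewall-scripts network.ping-timeout 42
```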
>>> Here is the output from gluster volume info, identical on both nodes:
>>>
>>> [root@chicago-fw1 etc]# gluster volume info
>>>
>>> Volume Name: firewall-scripts
>>> Type: Replicate
>>> Volume ID: 239b6401-e873-449d-a2d3-1eb2f65a1d4c
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 192.168.253.1:/gluster-fw1
>>> Brick2: 192.168.253.2:/gluster-fw2
>>> [root@chicago-fw1 etc]#
>>>
>>> Looking at /var/log/glusterfs/firewall-scripts.log on fw2, I see
>>> errors like this every couple of seconds:
>>>
>>> [2013-07-09 00:59:04.706390] I [afr-common.c:3856:afr_local_init]
>>> 0-firewall-scripts-replicate-0: no subvolumes up
>>> [2013-07-09 00:59:04.706515] W [fuse-bridge.c:1132:fuse_err_cbk]
>>> 0-glusterfs-fuse: 3160: FLUSH() ERR => -1 (Transport endpoint is not
>>> connected)
>>>
>>> And then when I bring fw1 back online, I see these messages on fw2:
>>>
>>> [2013-07-09 01:01:35.006782] I [rpc-clnt.c:1648:rpc_clnt_reconfig]
>>> 0-firewall-scripts-client-0: changing port to 49152 (from 0)
>>> [2013-07-09 01:01:35.006932] W [socket.c:514:__socket_rwv]
>>> 0-firewall-scripts-client-0: readv failed (No data available)
>>> [2013-07-09 01:01:35.018546] I
>>> [client-handshake.c:1658:select_server_supported_programs]
>>> 0-firewall-scripts-client-0: Using Program GlusterFS 3.3, Num (1298437),
>>> Version (330)
>>> [2013-07-09 01:01:35.019273] I
>>> [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-0:
>>> Connected to 192.168.253.1:49152, attached to remote volume '/gluster-fw1'.
>>> [2013-07-09 01:01:35.019356] I
>>> [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-0:
>>> Server and Client lk-version numbers are not same, reopening the fds
>>> [2013-07-09 01:01:35.019441] I
>>> [client-handshake.c:1308:client_post_handshake]
>>> 0-firewall-scripts-client-0: 1 fds open - Delaying child_up until they are
>>> re-opened
>>> [2013-07-09 01:01:35.020070] I
>>> [client-handshake.c:930:client_child_up_reopen_done]
>>> 0-firewall-scripts-client-0: last fd open'd/lock-self-heal'd - notifying
>>> CHILD-UP
>>> [2013-07-09 01:01:35.020282] I [afr-common.c:3698:afr_notify]
>>> 0-firewall-scripts-replicate-0: Subvolume 'firewall-scripts-client-0' came
>>> back up; going online.
>>> [2013-07-09 01:01:35.020616] I
>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>> 0-firewall-scripts-client-0: Server lk version = 1
>>>
>>> So how do I make glusterfs survive a node failure, which is the whole
>>> point of all this?
>>>
>>> thanks
>>>
>>> - Greg Scott
>>>
>>> _______________________________________________
>>> Gluster-users mailing list
>>> [email protected]
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
