On 07/10/2013 11:38 AM, Frank Sonntag wrote:
> Hi Greg,
>
> Try using the same server on both machines when mounting, instead of mounting
> off the local gluster server on both.
> I've used the same approach like you in the past and got into all kinds of
> split-brain problems.
> The drawback of course is that mounts will fail if the machine you chose is
> not available at mount time. It's one of my gripes with gluster that you
> cannot list more than one server in your mount command.
>
> Frank
Would not the mount option 'backupvolfile-server=<secondary server>' help at
mount time, in the case of the primary server not being available?

- rejy (rmc)

>
>
>
> On 10/07/2013, at 5:26 PM, Greg Scott wrote:
>
>> Bummer. Looks like I’m on my own with this one.
>>
>> - Greg
>>
>> From: gluster-users-boun...@gluster.org
>> [mailto:gluster-users-boun...@gluster.org] On Behalf Of Greg Scott
>> Sent: Tuesday, July 09, 2013 12:37 PM
>> To: 'gluster-users@gluster.org'
>> Subject: Re: [Gluster-users] One node goes offline, the other node can't see
>> the replicated volume anymore
>>
>> No takers? I am running gluster 3.4beta3 that came with Fedora 19. Is my
>> issue a consequence of some kind of quorum split-brain thing?
>>
>> thanks
>>
>> - Greg Scott
>>
>> From: gluster-users-boun...@gluster.org
>> [mailto:gluster-users-boun...@gluster.org] On Behalf Of Greg Scott
>> Sent: Monday, July 08, 2013 8:17 PM
>> To: 'gluster-users@gluster.org'
>> Subject: [Gluster-users] One node goes offline, the other node can't see the
>> replicated volume anymore
>>
>> I don’t get this. I have a replicated volume and 2 nodes. My challenge is,
>> when I take one node offline, the other node can no longer access the volume
>> until both nodes are back online again.
>>
>> Details:
>>
>> I have 2 nodes, fw1 and fw2. Each node has an XFS file system,
>> /gluster-fw1 on node fw1 and /gluster-fw2 on node fw2. Node fw1 is at IP
>> Address 192.168.253.1. Node fw2 is at 192.168.253.2.
>>
>> I create a gluster volume named firewall-scripts which is a replica of those
>> two XFS file systems. The volume holds a bunch of config files common to
>> both fw1 and fw2. The application is an active/standby pair of firewalls
>> and the idea is to keep config files in a gluster volume.
>>
>> When both nodes are online, everything works as expected.
>> But when I take either node offline, node fw2 behaves badly:
>>
>> [root@chicago-fw2 ~]# ls /firewall-scripts
>> ls: cannot access /firewall-scripts: Transport endpoint is not connected
>>
>> And when I bring the offline node back online, node fw2 eventually behaves
>> normally again.
>>
>> What’s up with that? Gluster is supposed to be resilient and self-healing
>> and able to stand up to this sort of abuse. So I must be doing something
>> wrong.
>>
>> Here is how I set up everything – it doesn’t get much simpler than this and
>> my setup is right out of the Getting Started Guide but using my own names.
>>
>> Here are the steps I followed, all from fw1:
>>
>> gluster peer probe 192.168.253.2
>> gluster peer status
>>
>> Create and start the volume:
>>
>> gluster volume create firewall-scripts replica 2 transport tcp
>> 192.168.253.1:/gluster-fw1 192.168.253.2:/gluster-fw2
>> gluster volume start firewall-scripts
>>
>> On fw1:
>>
>> mkdir /firewall-scripts
>> mount -t glusterfs 192.168.253.1:/firewall-scripts /firewall-scripts
>>
>> and add this line to /etc/fstab:
>> 192.168.253.1:/firewall-scripts /firewall-scripts glusterfs defaults,_netdev
>> 0 0
>>
>> on fw2:
>>
>> mkdir /firewall-scripts
>> mount -t glusterfs 192.168.253.2:/firewall-scripts /firewall-scripts
>>
>> and add this line to /etc/fstab:
>> 192.168.253.2:/firewall-scripts /firewall-scripts glusterfs defaults,_netdev
>> 0 0
>>
>> That’s it. That’s the whole setup. When both nodes are online, everything
>> replicates beautifully. But take one node offline and it all falls apart.
>>
>> Here is the output from gluster volume info, identical on both nodes:
>>
>> [root@chicago-fw1 etc]# gluster volume info
>>
>> Volume Name: firewall-scripts
>> Type: Replicate
>> Volume ID: 239b6401-e873-449d-a2d3-1eb2f65a1d4c
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: 192.168.253.1:/gluster-fw1
>> Brick2: 192.168.253.2:/gluster-fw2
>> [root@chicago-fw1 etc]#
>>
>> Looking at /var/log/glusterfs/firewall-scripts.log on fw2, I see errors like
>> this every couple of seconds:
>>
>> [2013-07-09 00:59:04.706390] I [afr-common.c:3856:afr_local_init]
>> 0-firewall-scripts-replicate-0: no subvolumes up
>> [2013-07-09 00:59:04.706515] W [fuse-bridge.c:1132:fuse_err_cbk]
>> 0-glusterfs-fuse: 3160: FLUSH() ERR => -1 (Transport endpoint is not
>> connected)
>>
>> And then when I bring fw1 back online, I see these messages on fw2:
>>
>> [2013-07-09 01:01:35.006782] I [rpc-clnt.c:1648:rpc_clnt_reconfig]
>> 0-firewall-scripts-client-0: changing port to 49152 (from 0)
>> [2013-07-09 01:01:35.006932] W [socket.c:514:__socket_rwv]
>> 0-firewall-scripts-client-0: readv failed (No data available)
>> [2013-07-09 01:01:35.018546] I
>> [client-handshake.c:1658:select_server_supported_programs]
>> 0-firewall-scripts-client-0: Using Program GlusterFS 3.3, Num (1298437),
>> Version (330)
>> [2013-07-09 01:01:35.019273] I
>> [client-handshake.c:1456:client_setvolume_cbk] 0-firewall-scripts-client-0:
>> Connected to 192.168.253.1:49152, attached to remote volume '/gluster-fw1'.
>> [2013-07-09 01:01:35.019356] I
>> [client-handshake.c:1468:client_setvolume_cbk] 0-firewall-scripts-client-0:
>> Server and Client lk-version numbers are not same, reopening the fds
>> [2013-07-09 01:01:35.019441] I
>> [client-handshake.c:1308:client_post_handshake] 0-firewall-scripts-client-0:
>> 1 fds open - Delaying child_up until they are re-opened
>> [2013-07-09 01:01:35.020070] I
>> [client-handshake.c:930:client_child_up_reopen_done]
>> 0-firewall-scripts-client-0: last fd open'd/lock-self-heal'd - notifying
>> CHILD-UP
>> [2013-07-09 01:01:35.020282] I [afr-common.c:3698:afr_notify]
>> 0-firewall-scripts-replicate-0: Subvolume 'firewall-scripts-client-0' came
>> back up; going online.
>> [2013-07-09 01:01:35.020616] I
>> [client-handshake.c:450:client_set_lk_version_cbk]
>> 0-firewall-scripts-client-0: Server lk version = 1
>>
>> So how do I make glusterfs survive a node failure, which is the whole point
>> of all this?
>>
>> thanks
>>
>> - Greg Scott
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
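
To make the backupvolfile-server suggestion at the top of this message concrete, here is a minimal sketch using the addresses and volume name from Greg's setup. It only prints a mount command and a matching /etc/fstab entry so they can be reviewed first; it does not touch gluster. The helper names gluster_mount_cmd and gluster_fstab_line are made up for illustration, and whether the mount.glusterfs script shipped with 3.4beta3 honors the option is an assumption I have not checked. As far as I understand it, the option only affects where the client fetches the volume file at mount time, so on its own it may not explain the "Transport endpoint is not connected" errors seen after a node goes down.

```shell
#!/bin/sh
# Sketch only: prints text, runs no mount and edits no files.
# Assumption: the installed mount.glusterfs accepts backupvolfile-server=<host>.

# gluster_mount_cmd PRIMARY BACKUP VOLUME MOUNTPOINT  (hypothetical helper)
# Prints a mount command that names BACKUP as the fallback volfile server.
gluster_mount_cmd() {
    printf 'mount -t glusterfs -o backupvolfile-server=%s %s:/%s %s\n' \
        "$2" "$1" "$3" "$4"
}

# gluster_fstab_line PRIMARY BACKUP VOLUME MOUNTPOINT  (hypothetical helper)
# Prints the equivalent /etc/fstab entry, keeping Greg's defaults,_netdev.
gluster_fstab_line() {
    printf '%s:/%s %s glusterfs defaults,_netdev,backupvolfile-server=%s 0 0\n' \
        "$1" "$3" "$4" "$2"
}

# Same primary server on both machines, as Frank suggested, with fw2 as fallback.
gluster_mount_cmd 192.168.253.1 192.168.253.2 firewall-scripts /firewall-scripts
gluster_fstab_line 192.168.253.1 192.168.253.2 firewall-scripts /firewall-scripts
```

If the printed lines look right, they would be used in place of the plain mount command and fstab entry on both fw1 and fw2.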