Brian, I'm not ready to give up just yet.  

From Rejy:

>  Would not the mount option 'backupvolfile-server=<secondary server>' help
> at mount time, in the case of the primary server not being available?

Hmmm - this seems to be a step in the right direction.  On both nodes I did:

umount /firewall-scripts

Then on fw1:

[root@chicago-fw1 gregs]# mount -t glusterfs -o backupvolfile-server=192.168.253.2 192.168.253.1:/firewall-scripts /firewall-scripts

And on fw2:

[root@chicago-fw2 ~]# mount -t glusterfs -o backupvolfile-server=192.168.253.1 192.168.253.2:/firewall-scripts /firewall-scripts
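The two mount commands above follow one pattern: each node names itself as the volfile server and its peer as backupvolfile-server. A minimal sketch of that pattern as a bash function; the name mount_fw_scripts and the DRY_RUN switch are hypothetical, and the mount options are exactly the ones used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mount firewall-scripts using this node as the volfile
# server and the peer as backupvolfile-server, as in the commands above.
# With DRY_RUN=1 it only prints the command it would run.
mount_fw_scripts() {
    local local_ip="$1" peer_ip="$2"
    local cmd=(mount -t glusterfs
               -o "backupvolfile-server=${peer_ip}"
               "${local_ip}:/firewall-scripts" /firewall-scripts)
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example, on fw2: mount_fw_scripts 192.168.253.2 192.168.253.1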

For the test I just ran, each node still uses its local copy first. For my
application, I'm not super concerned about conflicts between one directory and
the other, because my /firewall-scripts directory will be read-mostly when this
is in production. As part of my startup, the node with the lowest IP address
takes itself offline for a few seconds so the other node detects that it's
down and can assume the primary role. That's what put me on to this Gluster
behavior in the first place: fw2 could not find its script to take control,
even though a copy of it was sitting right there on its local disk.
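The startup rule described above (the node with the lowest IP address steps aside briefly) needs a numeric comparison, because comparing dotted quads as strings would rank 192.168.253.10 below 192.168.253.2. A hypothetical bash sketch of that check; the function names are made up for illustration and this is not the actual startup script.

```shell
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to a 32-bit integer so that addresses
# compare numerically. The 10# prefix keeps octets like "08" from being
# read as octal.
ip_to_int() {
    local IFS=. a b c d
    read -r a b c d <<< "$1"
    echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# Hypothetical startup check: succeed (exit status 0) only if the first
# address is strictly lower than every peer address that follows it.
is_lowest_ip() {
    local mine peer
    mine=$(ip_to_int "$1")
    shift
    for peer in "$@"; do
        if [ "$(ip_to_int "$peer")" -le "$mine" ]; then
            return 1
        fi
    done
    return 0
}
```

For example, is_lowest_ip 192.168.253.1 192.168.253.2 succeeds on fw1 and the mirrored call fails on fw2.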

Anyway, this time, with the file system mounted as above, I took fw1 offline
and ran "ls /firewall-scripts" on fw2. fw2 waited several seconds and then
showed me the directory listing instead of blowing up with an error. That seems
strange to me, since I told fw2 that fw1 is its backupvolfile-server, and fw1 is
the node that went offline. So the behavior is definitely not intuitive.

One other detail that may be relevant: I take fw1 offline by inserting a
firewall rule that does a REJECT on that interface. That probably explains the
"Connection refused" message in the log extract below. I can try a different
test, changing the rule to DROP so fw1 really is offline, and see what
happens.
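To compare REJECT and DROP as proposed above, the isolation step can be scripted so the rules are always removed afterwards. A sketch, assuming iptables and a hypothetical interface name passed as the first argument; it blocks both directions on that interface, which may be stricter than the single rule used in the test so far.

```shell
#!/usr/bin/env bash
# Hypothetical test helper: cut this node off on one interface for a number of
# seconds, then remove the rules again. TARGET is REJECT (senders get an
# immediate error) or DROP (packets vanish silently, so peers must time out).
isolate_iface() {
    local iface="$1" target="$2" secs="$3"
    case "$target" in
        REJECT|DROP) ;;
        *) echo "usage: isolate_iface IFACE REJECT|DROP SECONDS" >&2; return 2 ;;
    esac
    # Insert at the top of each chain so these rules win over existing ACCEPTs.
    iptables -I INPUT -i "$iface" -j "$target"
    iptables -I OUTPUT -o "$iface" -j "$target"
    sleep "$secs"
    # Delete the same rules by specification.
    iptables -D INPUT -i "$iface" -j "$target"
    iptables -D OUTPUT -o "$iface" -j "$target"
}
```

For example, isolate_iface eth1 DROP 60 on fw1 (eth1 is a placeholder), then run ls on fw2 while it sleeps. Run it from the console or under nohup: if the shell is killed during the sleep, for instance because an SSH session over the blocked interface drops, the rules stay in place.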

The log on fw2 looks a little different this time. This tail was taken after
running ls on fw2. Pranith, is this the log you mean? If so, I can run the
tests again and keep a tail -f going in a different window when the other node
goes offline, so we catch the messages right at that event. Would that be
helpful? I can send tarballs of the whole log file, but it's huge, and finding
the key messages is like looking for a needle in a haystack.
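For the needle-in-a-haystack problem, two small filters may help: one cuts the full log down to a time window around the failover, and one collapses repeated messages (such as the long run of readv warnings below) into counts. A sketch, assuming bash, awk and sed, and that every entry starts with the bracketed timestamp shown in these logs; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the entries of a GlusterFS client log whose timestamp falls in
# [START, END). Entries begin with "[YYYY-MM-DD HH:MM:SS.micro]", so plain
# string comparison orders them by time; a prefix such as "2013-07-10 10:37"
# is enough for START and END.
log_window() {
    local file="$1" start="$2" end="$3"
    awk -v s="[$start" -v e="[$end" '$0 >= s && $0 < e' "$file"
}

# Drop the leading timestamp and count identical messages, most frequent
# first, so a long run of the same warning collapses into one counted line.
summarize_log() {
    sed 's/^\[[^]]*] //' | sort | uniq -c | sort -rn
}
```

For example: log_window /var/log/glusterfs/firewall-scripts.log '2013-07-10 10:37' '2013-07-10 10:40' | summarize_log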

[root@chicago-fw2 ~]# tail /var/log/glusterfs/firewall-scripts.log -f
[2013-07-10 10:37:59.446481] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (Connection reset by peer)
[2013-07-10 10:37:59.446558] W [socket.c:1962:__socket_proto_state_machine] 
0-firewall-scripts-client-0: reading from socket failed. Error (Connection 
reset by peer), peer (192.168.253.1:49152)
[2013-07-10 10:37:59.447322] E [rpc-clnt.c:368:saved_frames_unwind] 
(-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x48) [0x7f8974409b78] 
(-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb8) [0x7f8974408028] 
(-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f8974407f4e]))) 
0-firewall-scripts-client-0: forced unwinding frame type(GlusterFS 3.3) 
op(LOOKUP(27)) called at 2013-07-10 10:37:33.563280 (xid=0x24x)
[2013-07-10 10:37:59.447378] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 
0-firewall-scripts-client-0: remote operation failed: Transport endpoint is not 
connected. Path: / (00000000-0000-0000-0000-000000000001)
[2013-07-10 10:37:59.447716] E [rpc-clnt.c:368:saved_frames_unwind] 
(-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x48) [0x7f8974409b78] 
(-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb8) [0x7f8974408028] 
(-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f8974407f4e]))) 
0-firewall-scripts-client-0: forced unwinding frame type(GlusterFS Handshake) 
op(PING(3)) called at 2013-07-10 10:37:35.949434 (xid=0x25x)
[2013-07-10 10:37:59.447754] W [client-handshake.c:276:client_ping_cbk] 
0-firewall-scripts-client-0: timer must have expired
[2013-07-10 10:37:59.447821] I [client.c:2097:client_rpc_notify] 
0-firewall-scripts-client-0: disconnected
[2013-07-10 10:38:09.963388] E [socket.c:2157:socket_connect_finish] 
0-firewall-scripts-client-0: connection to 192.168.253.1:24007 failed 
(Connection refused)
[2013-07-10 10:38:09.963493] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:38:19.988428] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:38:53.044399] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:38:54.999683] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:38:58.010774] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:04.028362] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:07.033038] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:10.044094] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:16.060406] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:19.066521] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:22.077600] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:25.088684] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:28.099805] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:31.110840] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:34.121921] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:37.133003] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:40.144084] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:43.155168] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:46.166228] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:49.177270] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:52.188359] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
[2013-07-10 10:39:55.199451] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-0: readv failed (No data available)
^C
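The "several seconds" that fw2 waited can be read off this log: each "forced unwinding frame" entry records when the operation was called, and the entry's own timestamp is when it was unwound after the disconnect. A sketch of an awk filter that prints the difference, assuming each entry sits on one line in the real log file (it is wrapped in this email) and that both times fall on the same day; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# For each "forced unwinding frame" entry, print the operation and how long it
# had been outstanding: the entry timestamp minus its "called at" time.
# Assumes one entry per line and that both times fall on the same day.
unwind_delays() {
    awk '
    function secs(t,    p) {
        split(t, p, ":")
        return p[1] * 3600 + p[2] * 60 + p[3]
    }
    /forced unwinding frame/ && match($0, /op\([A-Za-z]+\([0-9]+\)\) called at [0-9-]+ [0-9:.]+/) {
        n = split(substr($0, RSTART, RLENGTH), f, " ")
        # $2 is e.g. "10:37:59.447322]"; drop the closing bracket
        logged = substr($2, 1, length($2) - 1)
        printf "%s %.6f s\n", f[1], secs(logged) - secs(f[n])
    }' "$@"
}
```

Applied to the fw2 log above, the LOOKUP entry comes out at about 25.9 seconds and the PING entry at about 23.5 seconds.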

And the log from fw1 looks like this:

[root@chicago-fw1 gregs]# tail /var/log/glusterfs/firewall-scripts.log -f
[2013-07-10 10:36:19.708342] I [client-handshake.c:1456:client_setvolume_cbk] 
0-firewall-scripts-client-1: Connected to 192.168.253.2:49152, attached to 
remote volume '/gluster-fw2'.
[2013-07-10 10:36:19.708372] I [client-handshake.c:1468:client_setvolume_cbk] 
0-firewall-scripts-client-1: Server and Client lk-version numbers are not same, 
reopening the fds
[2013-07-10 10:36:19.720679] I [fuse-bridge.c:4723:fuse_graph_setup] 0-fuse: 
switched to graph 0
[2013-07-10 10:36:19.721049] I 
[client-handshake.c:450:client_set_lk_version_cbk] 0-firewall-scripts-client-1: 
Server lk version = 1
[2013-07-10 10:36:19.721291] I [fuse-bridge.c:3680:fuse_init] 0-glusterfs-fuse: 
FUSE inited with protocol versions: glusterfs 7.13 kernel 7.21
[2013-07-10 10:36:19.722390] I 
[afr-common.c:2057:afr_set_root_inode_on_first_lookup] 
0-firewall-scripts-replicate-0: added root inode
[2013-07-10 10:36:19.723259] I [afr-common.c:2120:afr_discovery_cbk] 
0-firewall-scripts-replicate-0: selecting local read_child 
firewall-scripts-client-0
[2013-07-10 10:37:47.242308] W [socket.c:514:__socket_rwv] 
0-firewall-scripts-client-1: readv failed (Connection timed out)
[2013-07-10 10:37:47.242385] W [socket.c:1962:__socket_proto_state_machine] 
0-firewall-scripts-client-1: reading from socket failed. Error (Connection 
timed out), peer (192.168.253.2:49152)
[2013-07-10 10:37:47.242462] I [client.c:2097:client_rpc_notify] 
0-firewall-scripts-client-1: disconnected
^C
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
