Re: [Gluster-users] gluster peer probe error (v3.6.2)

2015-03-24 Thread Andreas
Sure I am. Unfortunately it didn't change the result...

# killall glusterd
# ps -ef | grep gluster
root 15755   657  0 18:35 ttyS0    00:00:00 grep gluster
# rm /var/lib/glusterd/peers/*  
#  /usr/sbin/glusterd -p /var/run/glusterd.pid
# gluster peer probe 10.32.1.144
#
(I killed glusterd and removed the files on both servers.)

Regards
Andreas


On 03/24/15 05:36, Atin Mukherjee wrote:
 If you are okay with doing a fresh setup, I would recommend cleaning up
 /var/lib/glusterd/peers/* and restarting glusterd on both nodes,
 and then trying the peer probe again.

 ~Atin


Re: [Gluster-users] gluster peer probe error (v3.6.2)

2015-03-24 Thread Andreas
Hi,

The problem seems to be solved now(!). We discovered that the global options
file (/var/lib/glusterd/options) generated an error in the log file:
 [1970-01-01 00:01:24.423024] E [glusterd-utils.c:5760:glusterd_compare_friend_data] 0-management: Importing global options failed
For some reason the file was missing previously, but that didn't cause any major
problems for glusterd (except for an error message about the missing file).
However, when an empty options file was created to get rid of that error
message, this new message appeared, which seems to be more serious than the
previous one. When the line 'global-option-version=0' was added to that options
file, all those error messages disappeared and 'gluster peer probe' started to
work as expected again. Not that obvious, at least not for me.
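
For anyone hitting the same thing, the fix described above can be sketched as a
small shell helper. This is only a sketch under assumptions: ensure_global_options
is a made-up name, and the example runs against a scratch directory so it is safe
to try. On the servers in this thread the directory would be /var/lib/glusterd.

```shell
#!/bin/sh
# Sketch: make sure the global options file exists and is not empty.
# GLUSTERD_DIR is a stand-in; it defaults to a scratch directory here.
# On the servers in this thread it would be /var/lib/glusterd.
GLUSTERD_DIR="${GLUSTERD_DIR:-$(mktemp -d)}"

ensure_global_options() {
    # $1: glusterd working directory
    options_file="$1/options"
    mkdir -p "$1"
    # -s is true only for an existing, non-empty file, so this covers both
    # the "missing" and the "empty" case; other content is left untouched.
    if [ ! -s "$options_file" ]; then
        echo 'global-option-version=0' > "$options_file"
    fi
}

ensure_global_options "$GLUSTERD_DIR"
cat "$GLUSTERD_DIR/options"
```

The helper deliberately leaves a non-empty options file alone, since the value 0
is only what happened to work on this fresh setup.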

Anyway, thanks for your efforts in trying to solve this problem.


Regards
Andreas

On 03/24/15 10:32, Andreas wrote:
 Sure I am. Unfortunately it didn't change the result...

 # killall glusterd
 # ps -ef | grep gluster
 root 15755   657  0 18:35 ttyS0    00:00:00 grep gluster
 # rm /var/lib/glusterd/peers/*  
 #  /usr/sbin/glusterd -p /var/run/glusterd.pid
 # gluster peer probe 10.32.1.144
 #
 (I killed glusterd and removed the files on both servers.)

 Regards
 Andreas



Re: [Gluster-users] gluster peer probe error (v3.6.2)

2015-03-23 Thread Andreas
Hi,

# gluster peer detach 10.32.1.144
(No output here. Similar to the problem with 'gluster peer probe'.)
# gluster peer detach 10.32.1.144 force
peer detach: failed: Peer is already being detached from cluster.
Check peer status by running gluster peer status
# gluster peer status
Number of Peers: 1

Hostname: 10.32.1.144
Uuid: 82cdb873-28cc-4ed0-8cfe-2b6275770429
State: Probe Sent to Peer (Connected)

# ping 10.32.1.144
PING 10.32.1.144 (10.32.1.144): 56 data bytes
64 bytes from 10.32.1.144: seq=0 ttl=64 time=1.811 ms
64 bytes from 10.32.1.144: seq=1 ttl=64 time=1.834 ms
^C
--- 10.32.1.144 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 1.811/1.822/1.834 ms


As previously stated, this problem seems to be similar to what I experienced
with 'gluster peer probe'. I can reboot the server, but the situation will be
the same (I've tried this many times).
Any ideas about which ports to investigate, and how to do it to get the most
reliable result?
Anything else that could cause this?
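
To keep an eye on the peer state while testing, the 'gluster peer status' output
shown above can be reduced to one line per peer. A minimal sketch, assuming the
output format stays as pasted here; peer_states is a made-up helper name and it
only parses text from stdin. A peer stuck in "Probe Sent to Peer" has not
finished the handshake, whereas a healthy peer is normally reported as
"Peer in Cluster (Connected)".

```shell
#!/bin/sh
# Sketch: print "hostname<TAB>state" for every peer found in
# `gluster peer status` output read from stdin.
# peer_states is a hypothetical helper, not part of the gluster CLI.
peer_states() {
    awk -F': ' '
        /^Hostname: / { host = $2 }
        /^State: /    { print host "\t" $2 }
    '
}

# Sample taken from the output above.
# On a live node: gluster peer status | peer_states
peer_states <<'EOF'
Number of Peers: 1

Hostname: 10.32.1.144
Uuid: 82cdb873-28cc-4ed0-8cfe-2b6275770429
State: Probe Sent to Peer (Connected)
EOF
```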



Regards
Andreas



Re: [Gluster-users] gluster peer probe error (v3.6.2)

2015-03-23 Thread Andreas Hollaus
Hi,

This network problem is persistent. However, I can ping the server, so I guess
it depends on the port number, right?
I tried to telnet to port 24007, but I was not sure how to interpret the result,
as I got no response and no timeout (it just seemed to be waiting for
something). That's why I decided to install nmap, but according to that tool the
port was accessible. Are there any other ports that are vital to gluster peer
probe?

When you say 'deprobe', I guess you mean 'gluster peer detach'? That command
shows similar behaviour to gluster peer probe.
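
A telnet session that just sits there has most likely connected and is waiting
for input, so it is hard to tell "open" from "hanging". A check with a clear
exit status is easier to interpret. The sketch below assumes bash and the
coreutils 'timeout' command are available; check_port is a made-up helper name.
Note that a successful connect only shows that the TCP handshake works, not
that the glusterd handshake on top of it completes.

```shell
#!/bin/sh
# Sketch: test whether a TCP connection to host:port can be established.
# check_port is a hypothetical helper. /dev/tcp is a bash feature, so bash is
# invoked explicitly; `timeout` bounds the wait if packets are silently dropped.
check_port() {
    host="$1"
    port="$2"
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "$host:$port open"
    else
        echo "$host:$port closed or filtered"
        return 1
    fi
}

# For the servers in this thread one would run: check_port 10.32.1.144 24007
# Port 1 on localhost is used here only as a harmless demonstration.
check_port 127.0.0.1 1 || true
```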


Regards
Andreas

On 03/23/15 05:34, Atin Mukherjee wrote:

 On 03/22/2015 07:11 PM, Andreas Hollaus wrote:
 Hi,

 I hope that these are the logs that you requested.

 Logs from 10.32.0.48:
 --
 # more /var/log/glusterfs/.cmd_log_history
 [2015-03-19 13:52:03.277438]  : peer probe 10.32.1.144 : FAILED : Probe returned with unknown errno -1

 # more /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
 [2015-03-19 13:41:31.241768] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
 [2015-03-19 13:41:31.245352] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
 [2015-03-19 13:41:31.245432] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
 [2015-03-19 13:41:31.247826] I [glusterd-store.c:2063:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 30600
 [2015-03-19 13:41:31.247902] I [glusterd-store.c:3497:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
 Final graph:
 +--+
   1: volume management
   2: type mgmt/glusterd
   3: option rpc-auth.auth-glusterfs on
   4: option rpc-auth.auth-unix on
   5: option rpc-auth.auth-null on
   6: option transport.socket.listen-backlog 128
   7: option ping-timeout 30
   8: option transport.socket.read-fail-log off
   9: option transport.socket.keepalive-interval 2
  10: option transport.socket.keepalive-time 10
  11: option transport-type socket
  12: option working-directory /var/lib/glusterd
  13: end-volume
  14: 
 +--+
 [2015-03-19 13:42:02.258403] I [glusterd-handler.c:1015:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 10.32.1.144 24007
 [2015-03-19 13:42:02.259456] I [glusterd-handler.c:3165:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 10.32.1.144 (24007)
 [2015-03-19 13:42:02.259664] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
 [2015-03-19 13:42:02.260488] I [glusterd-handler.c:3098:glusterd_friend_add] 0-management: connect returned 0
 [2015-03-19 13:42:02.270316] I [glusterd.c:176:glusterd_uuid_generate_save] 0-management: generated UUID: 4441e237-89d6-4cdf-a212-f17ecb953b58
 [2015-03-19 13:42:02.273427] I [glusterd-rpc-ops.c:244:__glusterd_probe_cbk] 0-management: Received probe resp from uuid: 82cdb873-28cc-4ed0-8cfe-2b6275770429, host: 10.32.1.144
 [2015-03-19 13:42:02.273681] I [glusterd-rpc-ops.c:386:__glusterd_probe_cbk] 0-glusterd: Received resp to probe req
 [2015-03-19 13:42:02.278863] I [glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30600
 [2015-03-19 13:52:03.277422] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(Peer mgmt) op(--(2)) xid = 0x6 sent = 2015-03-19 13:42:02.273482. timeout = 600 for 10.32.1.144:24007
 Here is the issue, there was some problem in the network at the time
 when peer probe was issued. This is why the call bail is seen. Could you
 try to deprobe and then probe it back again?
 [2015-03-19 13:52:03.277453] I [socket.c:3366:socket_submit_reply] 0-socket.management: not connected (priv-connected = 255)
 [2015-03-19 13:52:03.277468] E [rpcsvc.c:1247:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 1) to rpc-transport (socket.management)
 [2015-03-19 13:52:03.277483] E [glusterd-utils.c:387:glusterd_submit_reply] 0-: Reply submission failed
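
The call_bail entry in the log above can be cross-checked with a little
arithmetic: the request was sent at 13:42:02 and bailed out at 13:52:03, which
is just past the frame-timeout of 600 seconds set earlier in the same log. A
small sketch, assuming a 'date' that accepts -d 'YYYY-MM-DD HH:MM:SS' (GNU
coreutils or BusyBox); elapsed_seconds is a made-up helper name.

```shell
#!/bin/sh
# Sketch: whole seconds between two log timestamps (fractions dropped).
# Assumes `date -d 'YYYY-MM-DD HH:MM:SS'` is supported (GNU coreutils, BusyBox).
elapsed_seconds() {
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    echo $((end - start))
}

# "sent" time and call_bail time from the log above; prints 601
elapsed_seconds '2015-03-19 13:42:02' '2015-03-19 13:52:03'
```

So the probe request apparently never got an answer and was dropped once the
frame-timeout expired, which matches the 10 minute hang of the CLI command.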



 Logs from 10.32.1.144:
 -
 # more ./.cmd_log_history

 # more ./etc-glusterfs-glusterd.vol.log
 [1970-01-01 00:00:53.225739] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
 [1970-01-01 00:00:53.229222] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
 [1970-01-01 00:00:53.229301] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory

Re: [Gluster-users] gluster peer probe error (v3.6.2)

2015-03-23 Thread Andreas
Hi,

Thanks, but there is no firewall involved in my distro.

FYI:
# netstat -tan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN
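
The same check can be scripted so that it gives a yes/no answer. A minimal
sketch, assuming netstat output in the column layout shown above; is_listening
is a made-up helper name and it only parses text from stdin.

```shell
#!/bin/sh
# Sketch: succeed if netstat-style text on stdin shows a TCP listener on the
# given port. is_listening is a hypothetical helper.
is_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Sample line from the output above.
# On a live node: netstat -tan | is_listening 24007
if printf 'tcp        0      0 0.0.0.0:24007   0.0.0.0:*   LISTEN\n' | is_listening 24007; then
    echo "management port 24007 is listening"
fi
```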


Regards
Andreas

On 03/23/15 14:21, JF Le Fillâtre wrote:
 Hello,

 If you're running RHEL 7 or CentOS 7, what is the state of firewalld on
 your systems?

 firewall-cmd --list-all-zones

 Thanks,
 JF



Re: [Gluster-users] gluster peer probe error (v3.6.2)

2015-03-23 Thread Atin Mukherjee


On 03/23/2015 03:28 PM, Andreas Hollaus wrote:
 Hi,
 
 This network problem is persistent. However, I can ping the server, so I guess
 it depends on the port number, right?
 I tried to telnet to port 24007, but I was not sure how to interpret the result,
 as I got no response and no timeout (it just seemed to be waiting for
 something). That's why I decided to install nmap, but according to that tool the
 port was accessible. Are there any other ports that are vital to gluster peer
 probe?
 
 When you say 'deprobe', I guess you mean 'gluster peer detach'? That command
 shows similar behaviour to gluster peer probe.
Yes I meant peer detach. How about gluster peer detach force?
 
 
 Regards
 Andreas
 

Re: [Gluster-users] gluster peer probe error (v3.6.2)

2015-03-23 Thread Atin Mukherjee
If you are okay with doing a fresh setup, I would recommend cleaning up
/var/lib/glusterd/peers/* and restarting glusterd on both nodes,
and then trying the peer probe again.

~Atin

On 03/23/2015 06:44 PM, Andreas wrote:
 Hi,
 
 # gluster peer detach 10.32.1.144
 (No output here. Similar to the problem with 'gluster peer probe'.)
 # gluster peer detach 10.32.1.144 force
 peer detach: failed: Peer is already being detached from cluster.
 Check peer status by running gluster peer status
 # gluster peer status
 Number of Peers: 1
 
 Hostname: 10.32.1.144
 Uuid: 82cdb873-28cc-4ed0-8cfe-2b6275770429
 State: Probe Sent to Peer (Connected)
 
 # ping 10.32.1.144
 PING 10.32.1.144 (10.32.1.144): 56 data bytes
 64 bytes from 10.32.1.144: seq=0 ttl=64 time=1.811 ms
 64 bytes from 10.32.1.144: seq=1 ttl=64 time=1.834 ms
 ^C
 --- 10.32.1.144 ping statistics ---
 2 packets transmitted, 2 packets received, 0% packet loss
 round-trip min/avg/max = 1.811/1.822/1.834 ms
 
 
 As previously stated, this problem seems to be similar to what I experienced
 with 'gluster peer probe'. I can reboot the server, but the situation will be
 the same (I've tried this many times).
 Any ideas about which ports to investigate, and how to do it to get the most
 reliable result?
 Anything else that could cause this?
 
 
 
 Regards
 Andreas
 
 

Re: [Gluster-users] gluster peer probe error (v3.6.2)

2015-03-22 Thread Andreas Hollaus
Hi,

I hope that these are the logs that you requested.

Logs from 10.32.0.48:
--
# more /var/log/glusterfs/.cmd_log_history
[2015-03-19 13:52:03.277438]  : peer probe 10.32.1.144 : FAILED : Probe returned with unknown errno -1

# more /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
[2015-03-19 13:41:31.241768] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
[2015-03-19 13:41:31.245352] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
[2015-03-19 13:41:31.245432] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
[2015-03-19 13:41:31.247826] I [glusterd-store.c:2063:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 30600
[2015-03-19 13:41:31.247902] I [glusterd-store.c:3497:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
Final graph:
+--+
  1: volume management
  2: type mgmt/glusterd
  3: option rpc-auth.auth-glusterfs on
  4: option rpc-auth.auth-unix on
  5: option rpc-auth.auth-null on
  6: option transport.socket.listen-backlog 128
  7: option ping-timeout 30
  8: option transport.socket.read-fail-log off
  9: option transport.socket.keepalive-interval 2
 10: option transport.socket.keepalive-time 10
 11: option transport-type socket
 12: option working-directory /var/lib/glusterd
 13: end-volume
 14: 
+--+
[2015-03-19 13:42:02.258403] I [glusterd-handler.c:1015:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 10.32.1.144 24007
[2015-03-19 13:42:02.259456] I [glusterd-handler.c:3165:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 10.32.1.144 (24007)
[2015-03-19 13:42:02.259664] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2015-03-19 13:42:02.260488] I [glusterd-handler.c:3098:glusterd_friend_add] 0-management: connect returned 0
[2015-03-19 13:42:02.270316] I [glusterd.c:176:glusterd_uuid_generate_save] 0-management: generated UUID: 4441e237-89d6-4cdf-a212-f17ecb953b58
[2015-03-19 13:42:02.273427] I [glusterd-rpc-ops.c:244:__glusterd_probe_cbk] 0-management: Received probe resp from uuid: 82cdb873-28cc-4ed0-8cfe-2b6275770429, host: 10.32.1.144
[2015-03-19 13:42:02.273681] I [glusterd-rpc-ops.c:386:__glusterd_probe_cbk] 0-glusterd: Received resp to probe req
[2015-03-19 13:42:02.278863] I [glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30600
[2015-03-19 13:52:03.277422] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(Peer mgmt) op(--(2)) xid = 0x6 sent = 2015-03-19 13:42:02.273482. timeout = 600 for 10.32.1.144:24007
[2015-03-19 13:52:03.277453] I [socket.c:3366:socket_submit_reply] 0-socket.management: not connected (priv-connected = 255)
[2015-03-19 13:52:03.277468] E [rpcsvc.c:1247:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 1) to rpc-transport (socket.management)
[2015-03-19 13:52:03.277483] E [glusterd-utils.c:387:glusterd_submit_reply] 0-: Reply submission failed



Logs from 10.32.1.144:
-
# more ./.cmd_log_history

# more ./etc-glusterfs-glusterd.vol.log
[1970-01-01 00:00:53.225739] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
[1970-01-01 00:00:53.229222] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
[1970-01-01 00:00:53.229301] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
[1970-01-01 00:00:53.231653] I [glusterd-store.c:2063:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 30600
[1970-01-01 00:00:53.231730] I [glusterd-store.c:3497:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
Final graph:
+--+
  1: volume management
  2: type mgmt/glusterd
  3: option rpc-auth.auth-glusterfs on
  4: option rpc-auth.auth-unix on
  5: option rpc-auth.auth-null on
  6: option transport.socket.listen-backlog 128
  7: option ping-timeout 30
  8: option transport.socket.read-fail-log off
  9: option transport.socket.keepalive-interval 2
 10: option transport.socket.keepalive-time 10
 11: option transport-type socket
 12: option working-directory /var/lib/glusterd
 13: end-volume
 14: 

Re: [Gluster-users] gluster peer probe error (v3.6.2)

2015-03-22 Thread Atin Mukherjee


On 03/22/2015 07:11 PM, Andreas Hollaus wrote:
 Hi,
 
 I hope that these are the logs that you requested.
 
 Logs from 10.32.0.48:
 --
 # more /var/log/glusterfs/.cmd_log_history
 [2015-03-19 13:52:03.277438]  : peer probe 10.32.1.144 : FAILED : Probe returned with unknown errno -1
 
 # more /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
 [2015-03-19 13:41:31.241768] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
 [2015-03-19 13:41:31.245352] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
 [2015-03-19 13:41:31.245432] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
 [2015-03-19 13:41:31.247826] I [glusterd-store.c:2063:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 30600
 [2015-03-19 13:41:31.247902] I [glusterd-store.c:3497:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
 Final graph:
 +--+
   1: volume management
   2: type mgmt/glusterd
   3: option rpc-auth.auth-glusterfs on
   4: option rpc-auth.auth-unix on
   5: option rpc-auth.auth-null on
   6: option transport.socket.listen-backlog 128
   7: option ping-timeout 30
   8: option transport.socket.read-fail-log off
   9: option transport.socket.keepalive-interval 2
  10: option transport.socket.keepalive-time 10
  11: option transport-type socket
  12: option working-directory /var/lib/glusterd
  13: end-volume
  14: 
 +--+
 [2015-03-19 13:42:02.258403] I [glusterd-handler.c:1015:__glusterd_handle_cli_probe] 0-glusterd: Received CLI probe req 10.32.1.144 24007
 [2015-03-19 13:42:02.259456] I [glusterd-handler.c:3165:glusterd_probe_begin] 0-glusterd: Unable to find peerinfo for host: 10.32.1.144 (24007)
 [2015-03-19 13:42:02.259664] I [rpc-clnt.c:969:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
 [2015-03-19 13:42:02.260488] I [glusterd-handler.c:3098:glusterd_friend_add] 0-management: connect returned 0
 [2015-03-19 13:42:02.270316] I [glusterd.c:176:glusterd_uuid_generate_save] 0-management: generated UUID: 4441e237-89d6-4cdf-a212-f17ecb953b58
 [2015-03-19 13:42:02.273427] I [glusterd-rpc-ops.c:244:__glusterd_probe_cbk] 0-management: Received probe resp from uuid: 82cdb873-28cc-4ed0-8cfe-2b6275770429, host: 10.32.1.144
 [2015-03-19 13:42:02.273681] I [glusterd-rpc-ops.c:386:__glusterd_probe_cbk] 0-glusterd: Received resp to probe req
 [2015-03-19 13:42:02.278863] I [glusterd-handshake.c:1119:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30600
 [2015-03-19 13:52:03.277422] E [rpc-clnt.c:201:call_bail] 0-management: bailing out frame type(Peer mgmt) op(--(2)) xid = 0x6 sent = 2015-03-19 13:42:02.273482. timeout = 600 for 10.32.1.144:24007
Here is the issue: there was some problem in the network at the time the
peer probe was issued, which is why the call bail is seen. Could you try to
deprobe and then probe it again?
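
The timing in the call_bail line can be checked against the frame-timeout of
600 reported earlier in the same log. The Python sketch below takes the two
timestamps from that line (the "sent" time and the time the bail was logged)
and computes the gap between them. The helper name elapsed_seconds is
illustrative only. The gap comes out at roughly 601 seconds, which is
consistent with the request waiting out the full frame-timeout without a
reply, although it does not by itself show where the reply was lost.

```python
from datetime import datetime

# Timestamp layout used by the glusterd log lines quoted in this thread.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(sent, logged):
    """Seconds between two glusterd log timestamps given as strings."""
    start = datetime.strptime(sent, LOG_TIME_FORMAT)
    end = datetime.strptime(logged, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


# Values copied from the call_bail line above.
sent_at = "2015-03-19 13:42:02.273482"
bailed_at = "2015-03-19 13:52:03.277422"
frame_timeout = 600  # from "setting frame-timeout to 600"

gap = elapsed_seconds(sent_at, bailed_at)
print(f"gap = {gap:.3f} s, frame-timeout = {frame_timeout} s")
```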
 [2015-03-19 13:52:03.277453] I [socket.c:3366:socket_submit_reply] 0-socket.management: not connected (priv-connected = 255)
 [2015-03-19 13:52:03.277468] E [rpcsvc.c:1247:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 1) to rpc-transport (socket.management)
 [2015-03-19 13:52:03.277483] E [glusterd-utils.c:387:glusterd_submit_reply] 0-: Reply submission failed
 
 
 
 Logs from 10.32.1.144:
 -
 # more ./.cmd_log_history
 
 # more ./etc-glusterfs-glusterd.vol.log
 [1970-01-01 00:00:53.225739] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
 [1970-01-01 00:00:53.229222] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
 [1970-01-01 00:00:53.229301] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
 [1970-01-01 00:00:53.231653] I [glusterd-store.c:2063:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 30600
 [1970-01-01 00:00:53.231730] I [glusterd-store.c:3497:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
 Final graph:
 +--+
   1: volume management
   2: type mgmt/glusterd
   3: option rpc-auth.auth-glusterfs on
   4: option rpc-auth.auth-unix on
   5: option rpc-auth.auth-null on
   6: option