> What linux distro ?
>
> Anything special about your network configuration ?
>
> Any chance your server is taking too long to release networking and gluster
> is starting before network is ready ?
>
> Can you completely disable iptables and test again ?
Both nodes are CentOS 6.5 VMs running on VMware ESXi 5.5.0. There is nothing
special about the network configuration, just static IPs. Ping and ssh work
fine. I added "iptables -F" to /etc/rc.local. After a simultaneous reboot,
"gluster peer status" on both nodes shows the peer as connected and
replication works fine. But "gluster volume status" reports that the NFS
server and the self-heal daemon on one of the nodes are not running, so I
have to restart glusterd to bring them up.
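For reference, instead of flushing everything at boot, I could persist rules
that open only the Gluster ports. The sketch below is my own assumption based
on the GlusterFS 3.4 defaults (glusterd on 24007/24008, one brick port per
brick counting up from 49152, portmapper on 111, the built-in NFS server on
2049 plus 38465-38467), so the exact port ranges may need adjusting:

    # glusterd management ports (assumption: GlusterFS 3.4 defaults)
    iptables -A INPUT -p tcp --dport 24007:24008 -j ACCEPT
    # brick ports: 3.4 allocates one TCP port per brick starting at 49152
    iptables -A INPUT -p tcp --dport 49152:49153 -j ACCEPT
    # portmapper and the built-in Gluster NFS server
    iptables -A INPUT -p tcp --dport 111 -j ACCEPT
    iptables -A INPUT -p udp --dport 111 -j ACCEPT
    iptables -A INPUT -p tcp --dport 2049 -j ACCEPT
    iptables -A INPUT -p tcp --dport 38465:38467 -j ACCEPT
    # persist the rules across reboots on CentOS 6
    service iptables save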
Another issue: when everything is OK after "service glusterd restart" on
both nodes, I reboot one node and then see on the rebooted node (ipset02):

[root@ipset02 etc]# gluster peer status
Number of Peers: 1

Hostname: ipset01
Uuid: 6313a4dd-f736-46ff-9836-bdf05c886ffd
State: Peer in Cluster (Connected)

[root@ipset02 etc]# gluster volume status
Status of volume: ipset-gv
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick ipset01:/usr/local/etc/ipset                      49152   Y       1615
Brick ipset02:/usr/local/etc/ipset                      49152   Y       2282
NFS Server on localhost                                 2049    Y       2289
Self-heal Daemon on localhost                           N/A     Y       2296
NFS Server on ipset01                                   2049    Y       2258
Self-heal Daemon on ipset01                             N/A     Y       2262

There are no active volume tasks

[root@ipset02 etc]# tail -17 /var/log/glusterfs/glustershd.log
[2014-03-26 07:55:48.982456] E [client-handshake.c:1742:client_query_portmap_cbk] 0-ipset-gv-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2014-03-26 07:55:48.982532] W [socket.c:514:__socket_rwv] 0-ipset-gv-client-1: readv failed (No data available)
[2014-03-26 07:55:48.982555] I [client.c:2097:client_rpc_notify] 0-ipset-gv-client-1: disconnected
[2014-03-26 07:55:48.982572] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-ipset-gv-client-0: changing port to 49152 (from 0)
[2014-03-26 07:55:48.982627] W [socket.c:514:__socket_rwv] 0-ipset-gv-client-0: readv failed (No data available)
[2014-03-26 07:55:48.986252] I [client-handshake.c:1659:select_server_supported_programs] 0-ipset-gv-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-03-26 07:55:48.986551] I [client-handshake.c:1456:client_setvolume_cbk] 0-ipset-gv-client-0: Connected to 192.168.1.180:49152, attached to remote volume '/usr/local/etc/ipset'.
[2014-03-26 07:55:48.986566] I [client-handshake.c:1468:client_setvolume_cbk] 0-ipset-gv-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2014-03-26 07:55:48.986628] I [afr-common.c:3698:afr_notify] 0-ipset-gv-replicate-0: Subvolume 'ipset-gv-client-0' came back up; going online.
[2014-03-26 07:55:48.986743] I [client-handshake.c:450:client_set_lk_version_cbk] 0-ipset-gv-client-0: Server lk version = 1
[2014-03-26 07:55:52.975670] I [rpc-clnt.c:1676:rpc_clnt_reconfig] 0-ipset-gv-client-1: changing port to 49152 (from 0)
[2014-03-26 07:55:52.975717] W [socket.c:514:__socket_rwv] 0-ipset-gv-client-1: readv failed (No data available)
[2014-03-26 07:55:52.978961] I [client-handshake.c:1659:select_server_supported_programs] 0-ipset-gv-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2014-03-26 07:55:52.979128] I [client-handshake.c:1456:client_setvolume_cbk] 0-ipset-gv-client-1: Connected to 192.168.1.181:49152, attached to remote volume '/usr/local/etc/ipset'.
[2014-03-26 07:55:52.979143] I [client-handshake.c:1468:client_setvolume_cbk] 0-ipset-gv-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2014-03-26 07:55:52.979269] I [client-handshake.c:450:client_set_lk_version_cbk] 0-ipset-gv-client-1: Server lk version = 1
[2014-03-26 07:55:52.980284] I [afr-self-heald.c:1180:afr_dir_exclusive_crawl] 0-ipset-gv-replicate-0: Another crawl is in progress for ipset-gv-client-1

And on the node that wasn't rebooted:

[root@ipset01 ~]# gluster peer status
Number of Peers: 1

Hostname: ipset02
Uuid: ff14ab0e-53cf-4015-9e49-fb60698c56db
State: Peer in Cluster (Disconnected)

[root@ipset01 ~]# gluster volume status
Status of volume: ipset-gv
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick ipset01:/usr/local/etc/ipset                      49152   Y       1615
NFS Server on localhost                                 2049    Y       2258
Self-heal Daemon on localhost                           N/A     Y       2262

There are no active volume tasks

[root@ipset01 ~]# tail -3 /var/log/glusterfs/glustershd.log
[2014-03-26 07:50:28.881369] W [socket.c:514:__socket_rwv] 0-ipset-gv-client-1: readv failed (Connection reset by peer)
[2014-03-26 07:50:28.881421] W [socket.c:1962:__socket_proto_state_machine] 0-ipset-gv-client-1: reading from socket failed. Error (Connection reset by peer), peer (192.168.1.181:49152)
[2014-03-26 07:50:28.881463] I [client.c:2097:client_rpc_notify] 0-ipset-gv-client-1: disconnected

However, it seems that files replicate fine on both nodes. After "service
glusterd restart" on the first node (ipset01), "gluster peer status" shows
connected again. This behavior is strange.

> May not be cause of your problems but it does bad things and gluster
> sees this as a 'crash' even with graceful shutdown

I have no /var/lock/subsys/glusterfsd file either, but there is a
/var/lock/subsys/glusterd. As far as I know, newer versions of GlusterFS use
the glusterd init script instead of glusterfsd.

[root@ipset01 etc]# service glusterfsd status
glusterfsd (pid 2338) is running...
[root@ipset01 etc]# service glusterd stop
                                                           [  OK  ]
[root@ipset01 etc]# service glusterd status
glusterd dead but subsys locked
[root@ipset01 etc]# service glusterfsd status
glusterfsd (pid 2338) is running...

Is it OK that glusterfsd is still running?
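By the way, my working theory (an assumption on my side, based on how the
CentOS 6 rc scripts behave and on the bug Viktor linked, not an official fix)
is that without /var/lock/subsys/glusterfsd the shutdown sequence never runs
the glusterfsd stop script, so the brick processes are killed hard at reboot
and Gluster sees it as a crash. As an experiment I may create the lock file by
hand after boot:

    # assumption: CentOS 6 rc only runs a service's K-script at shutdown
    # if /var/lock/subsys/<name> exists, so mark glusterfsd as started
    touch /var/lock/subsys/glusterfsd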
2014-03-26 2:16 GMT+04:00 Viktor Villafuerte <viktor.villafue...@optusnet.com.au>:

> Also see this bug
> https://bugzilla.redhat.com/show_bug.cgi?id=1073217
>
> May not be cause of your problems but it does bad things and gluster
> sees this as a 'crash' even with graceful shutdown
>
> v
>
>
> On Tue 25 Mar 2014 22:24:22, Carlos Capriotti wrote:
> > Let's go with the data collection first.
> >
> > What linux distro ?
> >
> > Anything special about your network configuration ?
> >
> > Any chance your server is taking too long to release networking and
> > gluster is starting before network is ready ?
> >
> > Can you completely disable iptables and test again ?
> >
> > I am afraid quorum will not help you if you cannot get this issue
> > corrected.
> >
> >
> > On Tue, Mar 25, 2014 at 3:14 PM, Артём Конвалюк <art...@gmail.com> wrote:
> >
> > > Hello!
> > >
> > > I have 2 nodes with GlusterFS 3.4.2. I created one replica volume using 2
> > > bricks and enabled glusterd autostarts. Also firewall is configured and I
> > > have to run "iptables -F" on nodes after reboot. It is clear that firewall
> > > should be disabled in LAN, but I'm interested in my case.
> > >
> > > Problem: When I reboot both nodes and run "iptables -F" peer status is
> > > still disconnected. I wonder why. After "service glusterd restart" peer
> > > status is connected. But I have to run "gluster volume heal <volume-name>"
> > > to make both servers consistent and be able to replicate files. Is there
> > > any way to eliminate this problem?
> > >
> > > I read about server-quorum, but it needs 3 or more nodes. Am I right?
> > >
> > > Best Regards,
> > > Artem Konvalyuk
>
> --
> Regards
>
> Viktor Villafuerte
> Optus Internet Engineering
> t: 02 808-25265
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users