Re: [Gluster-users] Crossover cable: single point of failure?
Hi Whit,

Thanks for your reply.

> I do know that it's not the Gluster-standard thing to use a crossover
> link. (Seems to me it's the obvious best way to do it, but it's not a
> configuration they're committed to.) It's possible that if you were
> doing your replication over the LAN rather than the crossover that
> Gluster would handle a disconnected system better. Might be worth
> testing.

It is still the same even if no crossover cable is used and all traffic goes through an Ethernet switch: the client can't write to the Gluster volume anymore. I discovered that the NFS volume seems to be read-only in this state:

client01:~# rm debian-6.0.1a-i386-DVD-1.iso
rm: cannot remove `debian-6.0.1a-i386-DVD-1.iso': Read-only file system

So all traffic goes through one interface (NFS to the client, GlusterFS replication, corosync). I can reproduce the issue with the NFS client on VMware ESXi and with the NFS client on my Linux desktop.

My config:

Volume Name: vmware
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gluster1:/mnt/gvolumes/vmware
Brick2: gluster2:/mnt/gvolumes/vmware

Regards,
Daniel

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
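[Editor's note: the read-only symptom above can be probed from any Linux NFS client. A hypothetical sketch — the mount point is made up, the round-robin name comes from later in the thread, and Gluster's built-in NFS server speaks NFSv3 over TCP:]

```shell
# Mount the Gluster NFS export (NFSv3 over TCP; gluster.mycompany.tld is
# the DNS round-robin name mentioned later in the thread).
mount -t nfs -o vers=3,proto=tcp gluster.mycompany.tld:/vmware /mnt/vmware

# While the replication link is down, a write fails with EROFS even
# though the mount was made read-write:
touch /mnt/vmware/write-test
```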
Re: [Gluster-users] Crossover cable: single point of failure?
Daniel,

Can you confirm that your backend filesystem is healthy? Can you delete the file from the backend? Gluster does not return EROFS in any of the cases you described. Also, try setting a lower ping-timeout and see if it helps in the crossover-cable failover test.

Avati

On Tue, Jun 14, 2011 at 12:58 PM, Daniel Manser dan...@clienta.ch wrote:
> It is still the same, even if no crossover cable is used and all
> traffic goes through an ethernet switch. The client can't write to the
> gluster volume anymore. I discovered that the NFS volume seems to be
> read-only in this state:
>
> client01:~# rm debian-6.0.1a-i386-DVD-1.iso
> rm: cannot remove `debian-6.0.1a-i386-DVD-1.iso': Read-only file system
Re: [Gluster-users] Crossover cable: single point of failure?
Hi,

Thanks for your reply.

> Can you confirm if your backend filesystem is proper? Can you delete
> the file from the backend?

I was able to delete files on the server.

> Also, try setting a lower ping-timeout and see if it helps in case of
> the crossover-cable failover test.

I set it to 5 seconds, but the result is still the same.

Volume Name: vmware
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gluster1:/mnt/gvolumes/vmware
Brick2: gluster2:/mnt/gvolumes/vmware
Options Reconfigured:
network.ping-timeout: 5

Daniel
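[Editor's note: the ping-timeout shown under "Options Reconfigured" above is set with the standard gluster CLI. A minimal sketch, using the volume name from the config:]

```shell
# Lower the client-side ping timeout for the "vmware" volume to 5 seconds
# (the GlusterFS default is 42 seconds).
gluster volume set vmware network.ping-timeout 5

# The change should then appear under "Options Reconfigured":
gluster volume info vmware
```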
Re: [Gluster-users] Crossover cable: single point of failure?
On Tue, Jun 14, 2011 at 2:51 AM, Daniel Manser dan...@clienta.ch wrote:
> I was able to delete files on the server.
>
> I set it to 5 seconds, but the result is still the same.

It would be good to get to the bottom of this. Do you see any errors in the server logs? Is it possible to do the same test without VMware in between, just on bare metal?
Re: [Gluster-users] Crossover cable: single point of failure?
I disconnected the crossover (replication) link again, and the problem occurred again. When I reconnect the link afterwards, it takes a few seconds and then Gluster NFS works again. If this behavior is normal, then the replication link is a single point of failure. Any suggestions?
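[Editor's note: one way to watch the split and the recovery described above is the peer state from the gluster CLI. A sketch, with hostnames taken from the volume config earlier in the thread:]

```shell
# On gluster1, while the replication link is down, gluster2 should show
# as disconnected:
gluster peer status

# After reconnecting the cable, the peer should return to
# "Peer in Cluster (Connected)" within a few seconds:
gluster peer status
```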
Re: [Gluster-users] Crossover cable: single point of failure?
On Mon, Jun 13, 2011 at 11:04:18AM +0100, Daniel Manser wrote:
> I disconnected the crossover (replication) link again and it happened
> again. When I re-connect it afterwards, it takes some seconds and
> Gluster NFS works again. If this behavior is normal, then the
> replication link becomes a single point of failure. Any suggestions?

It's surely not the design, so it's a bug. I can't affirm from personal experience that filing bug reports against Gluster results in action, but it's probably where to go with this.

I do know that it's not the Gluster-standard thing to use a crossover link. (Seems to me it's the obvious best way to do it, but it's not a configuration they're committed to.) It's possible that if you were doing your replication over the LAN rather than the crossover, Gluster would handle a disconnected system better. Might be worth testing.

Whit
Re: [Gluster-users] Crossover cable: single point of failure?
Can you please share the NFS and brick logs from the period when the link went down? Gluster should have worked in the situation you described.

Avati

On Fri, Jun 10, 2011 at 3:27 PM, Daniel Manser dan...@clienta.ch wrote:
> Dear community,
>
> I have a 2-node Gluster cluster with one replicated volume shared to a
> client via NFS. If the replication link (an Ethernet crossover cable)
> between the Gluster nodes breaks, I discovered that my whole storage is
> no longer available.
>
> I am using Pacemaker/corosync with two virtual IPs (service IPs exposed
> to the clients), so each node has its corresponding virtual IP, and if
> one node fails, corosync assigns the failed node's IP to the remaining
> node. This mechanism has worked pretty well so far. So I have:
>
> gluster1: IP 10.196.150.251 and virtual IP 10.196.150.250
> gluster2: IP 10.196.150.252 and virtual IP 10.196.150.254
>
> I am using DNS round-robin to distribute the load across both Gluster
> nodes (name: gluster.mycompany.tld). If one node goes down, its virtual
> IP is handed over to the remaining node, and the client keeps working
> without any disruption.
>
> However, if the replication link between the Gluster nodes breaks, we
> observed a service disruption: the client was then unable to write data
> to the cluster. The replication link between the Gluster nodes seems to
> be a single point of failure. Is that correct?
>
> Daniel
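[Editor's note: the virtual-IP failover described above is typically built from IPaddr2 resources in Pacemaker. A minimal, hypothetical crm-shell sketch using the addresses from the mail — the resource and constraint names are made up:]

```shell
# Hypothetical Pacemaker (crm shell) configuration for the two virtual
# IPs; each VIP prefers its own node but may fail over to the other.
crm configure primitive vip_gluster1 ocf:heartbeat:IPaddr2 \
    params ip=10.196.150.250 cidr_netmask=24 \
    op monitor interval=10s
crm configure primitive vip_gluster2 ocf:heartbeat:IPaddr2 \
    params ip=10.196.150.254 cidr_netmask=24 \
    op monitor interval=10s
crm configure location vip1_prefers_gluster1 vip_gluster1 100: gluster1
crm configure location vip2_prefers_gluster2 vip_gluster2 100: gluster2
```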
Re: [Gluster-users] Crossover cable: single point of failure?
> Can you please share NFS and brick logs from the duration of the link
> going down? Gluster should have worked in the situation you described.

Brick log on gluster1:

[2011-06-10 13:12:08.57634] W [socket.c:204:__socket_rwv] 0-tcp.vmware-server: readv failed (Connection timed out)
[2011-06-10 13:12:08.57674] W [socket.c:1494:__socket_proto_state_machine] 0-tcp.vmware-server: reading from socket failed. Error (Connection timed out), peer (192.168.150.252:1022)
[2011-06-10 13:12:08.57712] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.vmx
[2011-06-10 13:12:08.57778] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/vmware-3.log
[2011-06-10 13:12:08.57796] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.vmdk
[2011-06-10 13:12:08.57820] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one-flat.vmdk
[2011-06-10 13:12:08.57848] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.nvram
[2011-06-10 13:12:08.57866] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/.lck-98894e05b84ec3a1
[2011-06-10 13:12:08.57887] I [server.c:438:server_rpc_notify] 0-vmware-server: disconnected connection from 192.168.150.252:1022
[2011-06-10 13:12:08.57933] I [server-helpers.c:783:server_connection_destroy] 0-vmware-server: destroyed connection of gluster2-4038-2011/06/10-08:31:09:180937-vmware-client-0
[2011-06-10 13:12:19.3036] I [server-handshake.c:534:server_setvolume] 0-vmware-server: accepted client from 192.168.150.252:1021

Brick log on gluster2:

[2011-06-10 13:12:01.467191] W [socket.c:204:__socket_rwv] 0-tcp.vmware-server: readv failed (Connection timed out)
[2011-06-10 13:12:01.467236] W [socket.c:1494:__socket_proto_state_machine] 0-tcp.vmware-server: reading from socket failed. Error (Connection timed out), peer (192.168.150.251:1021)
[2011-06-10 13:12:01.467279] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.vmx
[2011-06-10 13:12:01.467345] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.vmxf
[2011-06-10 13:12:01.467362] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.vmdk
[2011-06-10 13:12:01.467379] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one-flat.vmdk
[2011-06-10 13:12:01.467413] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/vmware-4.log
[2011-06-10 13:12:01.467431] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.nvram
[2011-06-10 13:12:01.467447] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/vmware-5.log
[2011-06-10 13:12:01.467463] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/vmware-6.log
[2011-06-10 13:12:01.467478] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/vmware.log
[2011-06-10 13:12:01.467494] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/.lck-98894e05b84ec3a1
[2011-06-10 13:12:01.467510] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/.lck-b14dcb6d11640f7e
[2011-06-10 13:12:01.467526] I [server.c:438:server_rpc_notify] 0-vmware-server: disconnected connection from 192.168.150.251:1021
[2011-06-10 13:12:01.467546] I [server-helpers.c:783:server_connection_destroy] 0-vmware-server: destroyed connection of gluster1-3974-2011/06/10-08:31:35:778159-vmware-client-1
[2011-06-10 13:12:18.705503] I [server-handshake.c:534:server_setvolume] 0-vmware-server: accepted client from 192.168.150.251:1021

NFS log on gluster2:

[2011-06-10 13:11:56.708490] W [socket.c:204:__socket_rwv] 0-testvolume-client-0: readv failed (Connection timed out)
[2011-06-10 13:11:56.708530] W [socket.c:1494:__socket_proto_state_machine] 0-testvolume-client-0: reading from socket failed. Error (Connection timed out), peer (192.168.150.251:24009)
[2011-06-10 13:11:56.708564] I [client.c:1883:client_rpc_notify] 0-testvolume-client-0: disconnected
[2011-06-10 13:11:56.709490] W [socket.c:204:__socket_rwv] 0-vmware-client-0: readv failed (Connection timed out)
[2011-06-10 13:11:56.709521] W [socket.c:1494:__socket_proto_state_machine] 0-vmware-client-0: reading from socket failed. Error (Connection timed out), peer (192.168.150.251:24013)
[2011-06-10 13:11:56.709553] I [client.c:1883:client_rpc_notify] 0-vmware-client-0: disconnected
[2011-06-10 13:12:07.701752] I [client-handshake.c:1080:select_server_supported_programs] 0-testvolume-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-06-10 13:12:07.702059] I [client-handshake.c:913:client_setvolume_cbk] 0-testvolume-client-0: Connected to