[Gluster-users] Can't set volume options: operation failed
We've had some strange issues on our two-node replicated cluster recently. Now I'm trying to reconfigure the network.ping-timeout setting on a volume, which results in "operation failed":

# gluster volume info vol0_web1

Volume Name: vol0_web1
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: glu1.example.org:/mnt/vol0/web1
Brick2: glu2.example.org:/mnt/vol0/web1
Options Reconfigured:
network.ping-timeout: 1

# gluster volume set vol0_web1 network.ping-timeout 20
operation failed

The log messages in /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:

[2011-11-10 14:11:19.464338] E [glusterd-handler.c:1900:glusterd_handle_set_volume] 0-: Unable to set cli op: 16
[2011-11-10 14:11:19.465716] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:987)

Does anyone have an idea what might be wrong here?

Daniel

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
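When glusterd rejects a `volume set` like this, one way to narrow it down is to pull only the error-level (" E [") lines out of the glusterd log and see whether the daemon itself is unhappy. A minimal sketch, with the log excerpt from above embedded so it runs stand-alone; on a live node you would grep /var/log/glusterfs/etc-glusterfs-glusterd.vol.log directly:

```shell
# Write the excerpt quoted above into a scratch file so the sketch is
# self-contained (the path is an assumption for this demo).
cat > /tmp/etc-glusterfs-glusterd.vol.log <<'EOF'
[2011-11-10 14:11:19.464338] E [glusterd-handler.c:1900:glusterd_handle_set_volume] 0-: Unable to set cli op: 16
[2011-11-10 14:11:19.465716] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:987)
EOF

# Keep only error-level lines; warnings (" W [") are filtered out.
grep -F ' E [' /tmp/etc-glusterfs-glusterd.vol.log
```

This prints just the "Unable to set cli op" line, which is the one worth searching the archives for.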
Re: [Gluster-users] Expand replicated bricks to more nodes/change replica count
> I believe you will have to re-create your volume. I have a feature
> request in for this too. [...]

Thanks Jeff. That's definitely a feature I'd love to see. I was expecting this to be possible without service downtime, since - IMO - replicated data should give you some flexibility to expand or shrink the cluster.
[Gluster-users] Expand replicated bricks to more nodes/change replica count
Hi list

I have set up several volumes on a two-node Gluster setup using replica 2 configurations. I would like to add two more nodes to the trusted pool so that all volumes are replicated on 4 nodes. I wonder if that can be done online, but after doing some research I didn't find evidence that it's possible to change the replica count _after_ the volume has been created. Would I have to stop the volumes, set up a new volume with a replica count of 4, and then start the volume again? Or is there a way to do it online?

Thanks
Daniel
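For reference, the offline route would look roughly like this (a sketch only: `myvol`, the brick paths, and the new nodes `gluster3`/`gluster4` are placeholders, and Gluster may refuse to reuse bricks that still carry metadata from the old volume):

```
# Stop and delete the old replica-2 volume (data stays on the bricks)
# gluster volume stop myvol
# gluster volume delete myvol

# Probe the two new nodes into the trusted pool
# gluster peer probe gluster3
# gluster peer probe gluster4

# Re-create with replica 4 across the old and new bricks, then start it
# gluster volume create myvol replica 4 \
#     gluster1:/mnt/brick gluster2:/mnt/brick \
#     gluster3:/mnt/brick gluster4:/mnt/brick
# gluster volume start myvol
```

Note that any reconfigured volume options would have to be re-applied afterwards. I've also read that later GlusterFS releases can change the replica count online via `gluster volume add-brick <vol> replica N <new bricks>`, but I can't confirm that for the version discussed here.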
Re: [Gluster-users] Crossover cable: single point of failure?
Hi Whit,

Thanks for your reply.

> I do know that it's not the Gluster-standard thing to use a crossover
> link. (Seems to me it's the obvious best way to do it, but it's not a
> configuration they're committed to.) It's possible that if you were
> doing your replication over the LAN rather than the crossover that
> Gluster would handle a disconnected system better. Might be worth
> testing.

It is still the same, even if no crossover cable is used and all traffic goes through an Ethernet switch. The client can't write to the Gluster volume anymore. I discovered that the NFS volume seems to be read-only in this state:

client01:~# rm debian-6.0.1a-i386-DVD-1.iso
rm: cannot remove `debian-6.0.1a-i386-DVD-1.iso': Read-only file system

So all traffic goes through one interface (NFS to the client, GlusterFS replication, corosync). I can reproduce the issue with the NFS client on VMware ESXi and with the NFS client on my Linux desktop.

My config:

Volume Name: vmware
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gluster1:/mnt/gvolumes/vmware
Brick2: gluster2:/mnt/gvolumes/vmware

Regards,
Daniel
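To detect the read-only condition from the client without waiting for an application to fail, a small writability probe can help. A sketch, where the mount path defaults to a scratch directory for demonstration; on the real client it would be the NFS mount point:

```shell
# Probe whether a mount point accepts writes by creating and removing a
# tiny test file. The default path is a demo directory; substitute the
# actual NFS mount (e.g. /mnt/vmware) on the client.
mnt="${1:-/tmp/probe-demo}"
mkdir -p "$mnt"
if touch "$mnt/.rw-probe" 2>/dev/null && rm -f "$mnt/.rw-probe"; then
    echo "writable"
else
    echo "read-only"
fi
```

Running this in a loop while pulling the replication link would show exactly when the volume flips to read-only and when it recovers.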
Re: [Gluster-users] Crossover cable: single point of failure?
Hi

Thanks for your reply.

> Can you confirm if your backend filesystem is proper? Can you delete
> the file from the backend?

I was able to delete files on the server.

> Also, try setting a lower ping-timeout and see if it helps in case of
> the crosscable failover test.

I set it to 5 seconds, but the result is still the same.

Volume Name: vmware
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gluster1:/mnt/gvolumes/vmware
Brick2: gluster2:/mnt/gvolumes/vmware
Options Reconfigured:
network.ping-timeout: 5

Daniel
Re: [Gluster-users] Crossover cable: single point of failure?
I disconnected the crossover (replication) link again and the problem reappeared. When I re-connect it, it takes a few seconds and Gluster NFS works again. If this behavior is normal, then the replication link is a single point of failure. Any suggestions?
[Gluster-users] Crossover cable: single point of failure?
Dear community,

I have a 2-node Gluster cluster with one replicated volume shared to a client via NFS. I discovered that if the replication link (an Ethernet crossover cable) between the Gluster nodes breaks, the whole storage becomes unavailable.

I am using Pacemaker/corosync with two virtual IPs (service IPs exposed to the clients), so each node has its corresponding virtual IP, and if one node fails, corosync assigns the failing node's IP to the remaining node. This mechanism has worked pretty well so far. So I have:

gluster1: IP 10.196.150.251 and virtual IP 10.196.150.250
gluster2: IP 10.196.150.252 and virtual IP 10.196.150.254

I am using DNS round-robin (name: gluster.mycompany.tld) to distribute the load across both Gluster nodes. If one node goes down, its virtual IP is handed over to the remaining node, and the client keeps working without any disruption.

However, if the replication link between the Gluster nodes breaks, we experienced a service disruption: the client was then unable to write data to the cluster. The replication link between the Gluster nodes seems to be a single point of failure. Is that correct?

Daniel

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
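One mitigation worth considering on the Pacemaker side is a connectivity resource, so the cluster at least notices when a node loses the replication path and can move the virtual IPs accordingly. A sketch in crm shell syntax; the host_list address and the `p_vip1` resource name are placeholders for the peer's replication-link IP and your existing virtual-IP primitive:

```
# Cloned ping resource: each node monitors its peer's replication-link IP
primitive p_ping ocf:pacemaker:ping \
    params host_list="192.168.0.2" multiplier="100" \
    op monitor interval="10s"
clone cl_ping p_ping

# Keep the virtual IP off nodes with no connectivity to the peer
location l_vip1_connected p_vip1 \
    rule -inf: not_defined pingd or pingd lte 0
```

Note that this only lets Pacemaker react to a broken link; it does not by itself change how Gluster behaves while the replication path is down.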
Re: [Gluster-users] Crossover cable: single point of failure?
Can you please share the NFS and brick logs from the duration of the link going down? Gluster should have worked in the situation you described.

Brick log on gluster1:

[2011-06-10 13:12:08.57634] W [socket.c:204:__socket_rwv] 0-tcp.vmware-server: readv failed (Connection timed out)
[2011-06-10 13:12:08.57674] W [socket.c:1494:__socket_proto_state_machine] 0-tcp.vmware-server: reading from socket failed. Error (Connection timed out), peer (192.168.150.252:1022)
[2011-06-10 13:12:08.57712] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.vmx
[2011-06-10 13:12:08.57778] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/vmware-3.log
[2011-06-10 13:12:08.57796] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.vmdk
[2011-06-10 13:12:08.57820] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one-flat.vmdk
[2011-06-10 13:12:08.57848] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.nvram
[2011-06-10 13:12:08.57866] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/.lck-98894e05b84ec3a1
[2011-06-10 13:12:08.57887] I [server.c:438:server_rpc_notify] 0-vmware-server: disconnected connection from 192.168.150.252:1022
[2011-06-10 13:12:08.57933] I [server-helpers.c:783:server_connection_destroy] 0-vmware-server: destroyed connection of gluster2-4038-2011/06/10-08:31:09:180937-vmware-client-0
[2011-06-10 13:12:19.3036] I [server-handshake.c:534:server_setvolume] 0-vmware-server: accepted client from 192.168.150.252:1021

Brick log on gluster2:

[2011-06-10 13:12:01.467191] W [socket.c:204:__socket_rwv] 0-tcp.vmware-server: readv failed (Connection timed out)
[2011-06-10 13:12:01.467236] W [socket.c:1494:__socket_proto_state_machine] 0-tcp.vmware-server: reading from socket failed. Error (Connection timed out), peer (192.168.150.251:1021)
[2011-06-10 13:12:01.467279] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.vmx
[2011-06-10 13:12:01.467345] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.vmxf
[2011-06-10 13:12:01.467362] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.vmdk
[2011-06-10 13:12:01.467379] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one-flat.vmdk
[2011-06-10 13:12:01.467413] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/vmware-4.log
[2011-06-10 13:12:01.467431] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.nvram
[2011-06-10 13:12:01.467447] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/vmware-5.log
[2011-06-10 13:12:01.467463] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/vmware-6.log
[2011-06-10 13:12:01.467478] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/vmware.log
[2011-06-10 13:12:01.467494] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/.lck-98894e05b84ec3a1
[2011-06-10 13:12:01.467510] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/.lck-b14dcb6d11640f7e
[2011-06-10 13:12:01.467526] I [server.c:438:server_rpc_notify] 0-vmware-server: disconnected connection from 192.168.150.251:1021
[2011-06-10 13:12:01.467546] I [server-helpers.c:783:server_connection_destroy] 0-vmware-server: destroyed connection of gluster1-3974-2011/06/10-08:31:35:778159-vmware-client-1
[2011-06-10 13:12:18.705503] I [server-handshake.c:534:server_setvolume] 0-vmware-server: accepted client from 192.168.150.251:1021

NFS log on gluster2:

[2011-06-10 13:11:56.708490] W [socket.c:204:__socket_rwv] 0-testvolume-client-0: readv failed (Connection timed out)
[2011-06-10 13:11:56.708530] W [socket.c:1494:__socket_proto_state_machine] 0-testvolume-client-0: reading from socket failed. Error (Connection timed out), peer (192.168.150.251:24009)
[2011-06-10 13:11:56.708564] I [client.c:1883:client_rpc_notify] 0-testvolume-client-0: disconnected
[2011-06-10 13:11:56.709490] W [socket.c:204:__socket_rwv] 0-vmware-client-0: readv failed (Connection timed out)
[2011-06-10 13:11:56.709521] W [socket.c:1494:__socket_proto_state_machine] 0-vmware-client-0: reading from socket failed. Error (Connection timed out), peer (192.168.150.251:24013)
[2011-06-10 13:11:56.709553] I [client.c:1883:client_rpc_notify] 0-vmware-client-0: disconnected
[2011-06-10 13:12:07.701752] I [client-handshake.c:1080:select_server_supported_programs] 0-testvolume-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-06-10 13:12:07.702059] I [client-handshake.c:913:client_setvolume_cbk] 0-testvolume-client-0: Connected to
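To bound the outage window, it helps to pull just the disconnect and reconnect timestamps out of a brick log. A sketch, with two lines from the gluster1 brick log above embedded so it runs stand-alone; on a live node you would point awk at the actual brick log:

```shell
# Embed the relevant disconnect/reconnect lines for a self-contained demo.
cat > /tmp/brick-excerpt.log <<'EOF'
[2011-06-10 13:12:08.57887] I [server.c:438:server_rpc_notify] 0-vmware-server: disconnected connection from 192.168.150.252:1022
[2011-06-10 13:12:19.3036] I [server-handshake.c:534:server_setvolume] 0-vmware-server: accepted client from 192.168.150.252:1021
EOF

# Print the time-of-day field and the peer address for each event.
awk '/disconnected connection|accepted client/ { gsub(/[\[\]]/, "", $2); print $2, $NF }' /tmp/brick-excerpt.log
```

Here the gap between the two timestamps (roughly 11 seconds) is how long the brick sat without its peer before the connection was re-established.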