Re: [Gluster-users] Crossover cable: single point of failure?

2011-06-14 Thread Daniel Manser

Hi Whit,

Thanks for your reply.

 I do know that it's not the Gluster-standard thing to use a crossover
 link. (Seems to me it's the obvious best way to do it, but it's not a
 configuration they're committed to.) It's possible that if you were
 doing your replication over the LAN rather than the crossover that
 Gluster would handle a disconnected system better. Might be worth
 testing.


It is still the same, even if no crossover cable is used and all 
traffic goes through an ethernet switch. The client can't write to the 
gluster volume anymore. I discovered that the NFS volume seems to be 
read-only in this state:


  client01:~# rm debian-6.0.1a-i386-DVD-1.iso
  rm: cannot remove `debian-6.0.1a-i386-DVD-1.iso': Read-only file system
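
A quick way to tell whether the NFS mount itself has been flagged
read-only, or whether the server is returning EROFS per operation, is to
check the mount flags and retry a write (a sketch, assuming a Linux NFS
client and a hypothetical mount point /mnt/vmware):

  client01:~# grep nfs /proc/mounts      # look for "ro" among the flags
  client01:~# touch /mnt/vmware/.writetest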


So all traffic goes through one interface (NFS to the client, glusterfs 
replication, corosync).


I can reproduce the issue with the NFS client on VMware ESXi and with 
the NFS client on my Linux desktop.


My config:

  Volume Name: vmware
  Type: Replicate
  Status: Started
  Number of Bricks: 2
  Transport-type: tcp
  Bricks:
  Brick1: gluster1:/mnt/gvolumes/vmware
  Brick2: gluster2:/mnt/gvolumes/vmware
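
For reference, a replicated volume with this layout would normally be
created and inspected roughly as follows (a sketch; the actual commands
used are not shown here):

  gluster volume create vmware replica 2 transport tcp \
      gluster1:/mnt/gvolumes/vmware gluster2:/mnt/gvolumes/vmware
  gluster volume start vmware
  gluster volume info vmware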

Regards,
Daniel


Re: [Gluster-users] Crossover cable: single point of failure?

2011-06-14 Thread Anand Avati
Daniel,
 Can you confirm that your backend filesystem is healthy? Can you delete
the file from the backend? Gluster does not return EROFS in any of the
cases you described. Also, try setting a lower ping-timeout and see if
it helps in the crossover-cable failover test.

Avati
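
A backend check along these lines might look as follows (a sketch, with
a hypothetical file name; the brick path is taken from Daniel's config):

  gluster1:~# ls -l /mnt/gvolumes/vmware/
  gluster1:~# rm /mnt/gvolumes/vmware/somefile.iso   # hypothetical file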



Re: [Gluster-users] Crossover cable: single point of failure?

2011-06-14 Thread Daniel Manser

Hi

Thanks for your reply.


 Can you confirm that your backend filesystem is healthy? Can you delete
 the file from the backend?


I was able to delete files on the server.


 Also, try setting a lower ping-timeout and see if it helps in the
 crossover-cable failover test.


I set it to 5 seconds, but the result is still the same.

  Volume Name: vmware
  Type: Replicate
  Status: Started
  Number of Bricks: 2
  Transport-type: tcp
  Bricks:
  Brick1: gluster1:/mnt/gvolumes/vmware
  Brick2: gluster2:/mnt/gvolumes/vmware
  Options Reconfigured:
  network.ping-timeout: 5
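
The value above would presumably have been applied with the standard
volume-set command, e.g. (a sketch):

  gluster volume set vmware network.ping-timeout 5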

Daniel


Re: [Gluster-users] Crossover cable: single point of failure?

2011-06-14 Thread Mohit Anchlia
On Tue, Jun 14, 2011 at 2:51 AM, Daniel Manser dan...@clienta.ch wrote:
 I set it to 5 seconds, but the result is still the same.

It will be good to get to the bottom of this. Do you see any errors in
the server logs? Is it possible to do the same test without VMware in
between, just on bare metal?
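
For the server logs, the brick and NFS logs usually live under
/var/log/glusterfs (a sketch, assuming default log locations; the brick
log file is named after the brick path):

  gluster1:~# tail -f /var/log/glusterfs/bricks/mnt-gvolumes-vmware.log
  gluster1:~# tail -f /var/log/glusterfs/nfs.log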




Re: [Gluster-users] Crossover cable: single point of failure?

2011-06-13 Thread Daniel Manser
I disconnected the crossover (replication) link again and the problem
reappeared. When I reconnect the link, it takes a few seconds and then
Gluster NFS works again. If this behavior is normal, then the
replication link becomes a single point of failure. Any suggestions?



Re: [Gluster-users] Crossover cable: single point of failure?

2011-06-13 Thread Whit Blauvelt
On Mon, Jun 13, 2011 at 11:04:18AM +0100, Daniel Manser wrote:
 If this behavior is normal, then the replication link becomes a
 single point of failure. Any suggestions?

It's sure not the design. So it's a bug. I can't affirm from personal
experience that filing bug reports against Gluster results in action, but
it's probably where to go with this.

I do know that it's not the Gluster-standard thing to use a crossover link.
(Seems to me it's the obvious best way to do it, but it's not a
configuration they're committed to.) It's possible that if you were doing
your replication over the LAN rather than the crossover that Gluster would
handle a disconnected system better. Might be worth testing.

Whit


Re: [Gluster-users] Crossover cable: single point of failure?

2011-06-10 Thread Anand Avati
Can you please share NFS and brick logs from the duration of the link going
down? Gluster should have worked in the situation you described.

Avati

On Fri, Jun 10, 2011 at 3:27 PM, Daniel Manser dan...@clienta.ch wrote:

 Dear community,

 I have a 2-node gluster cluster with one replicated volume shared to a
 client via NFS. I discovered that if the replication link (Ethernet
 crossover cable) between the Gluster nodes breaks, my whole storage is
 no longer available.

 I am using Pacemaker/corosync with two virtual IPs (service IPs exposed
 to the clients), so each node has its corresponding virtual IP, and if
 one node fails, corosync moves that node's virtual IP to the surviving
 node. This mechanism has worked pretty well so far.

 So, I have:

  gluster1: IP 10.196.150.251 and virtual IP 10.196.150.250
  gluster2: IP 10.196.150.252 and virtual IP 10.196.150.254
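
 A VIP resource of this kind is typically defined in Pacemaker roughly
 as follows (a sketch using the crm shell; the netmask and monitor
 interval are assumptions, only the IP comes from the setup above):

   crm configure primitive vip_gluster1 ocf:heartbeat:IPaddr2 \
       params ip="10.196.150.250" cidr_netmask="24" \
       op monitor interval="10s"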

 Now I am using DNS round-robin to distribute the load across both
 gluster nodes (name: gluster.mycompany.tld).
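
 DNS round-robin here just means two A records for the same name, one
 per virtual IP (a sketch in BIND zone-file syntax):

   gluster  IN  A  10.196.150.250
   gluster  IN  A  10.196.150.254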

 If one node goes down, its virtual IP is handed over to the remaining
 node, and the client keeps working without any disruption. However,
 when the replication link between the gluster nodes broke, we
 experienced a service disruption: the client was unable to write data
 to the cluster. The replication link between the gluster nodes seems to
 be a single point of failure. Is that correct?

 Daniel


Re: [Gluster-users] Crossover cable: single point of failure?

2011-06-10 Thread Daniel Manser

 Can you please share NFS and brick logs from the duration of the link
 going down? Gluster should have worked in the situation you described.


Brick log on gluster1:
[2011-06-10 13:12:08.57634] W [socket.c:204:__socket_rwv] 0-tcp.vmware-server: readv failed (Connection timed out)
[2011-06-10 13:12:08.57674] W [socket.c:1494:__socket_proto_state_machine] 0-tcp.vmware-server: reading from socket failed. Error (Connection timed out), peer (192.168.150.252:1022)
[2011-06-10 13:12:08.57712] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.vmx
[2011-06-10 13:12:08.57778] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/vmware-3.log
[2011-06-10 13:12:08.57796] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.vmdk
[2011-06-10 13:12:08.57820] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one-flat.vmdk
[2011-06-10 13:12:08.57848] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.nvram
[2011-06-10 13:12:08.57866] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/.lck-98894e05b84ec3a1
[2011-06-10 13:12:08.57887] I [server.c:438:server_rpc_notify] 0-vmware-server: disconnected connection from 192.168.150.252:1022
[2011-06-10 13:12:08.57933] I [server-helpers.c:783:server_connection_destroy] 0-vmware-server: destroyed connection of gluster2-4038-2011/06/10-08:31:09:180937-vmware-client-0
[2011-06-10 13:12:19.3036] I [server-handshake.c:534:server_setvolume] 0-vmware-server: accepted client from 192.168.150.252:1021


Brick log on gluster2:
[2011-06-10 13:12:01.467191] W [socket.c:204:__socket_rwv] 0-tcp.vmware-server: readv failed (Connection timed out)
[2011-06-10 13:12:01.467236] W [socket.c:1494:__socket_proto_state_machine] 0-tcp.vmware-server: reading from socket failed. Error (Connection timed out), peer (192.168.150.251:1021)
[2011-06-10 13:12:01.467279] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.vmx
[2011-06-10 13:12:01.467345] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.vmxf
[2011-06-10 13:12:01.467362] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.vmdk
[2011-06-10 13:12:01.467379] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one-flat.vmdk
[2011-06-10 13:12:01.467413] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/vmware-4.log
[2011-06-10 13:12:01.467431] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/debian one.nvram
[2011-06-10 13:12:01.467447] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/vmware-5.log
[2011-06-10 13:12:01.467463] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/vmware-6.log
[2011-06-10 13:12:01.467478] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/vmware.log
[2011-06-10 13:12:01.467494] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/.lck-98894e05b84ec3a1
[2011-06-10 13:12:01.467510] I [server-helpers.c:485:do_fd_cleanup] 0-vmware-server: fd cleanup on /debian one/.lck-b14dcb6d11640f7e
[2011-06-10 13:12:01.467526] I [server.c:438:server_rpc_notify] 0-vmware-server: disconnected connection from 192.168.150.251:1021
[2011-06-10 13:12:01.467546] I [server-helpers.c:783:server_connection_destroy] 0-vmware-server: destroyed connection of gluster1-3974-2011/06/10-08:31:35:778159-vmware-client-1
[2011-06-10 13:12:18.705503] I [server-handshake.c:534:server_setvolume] 0-vmware-server: accepted client from 192.168.150.251:1021


NFS log on gluster2:
[2011-06-10 13:11:56.708490] W [socket.c:204:__socket_rwv] 0-testvolume-client-0: readv failed (Connection timed out)
[2011-06-10 13:11:56.708530] W [socket.c:1494:__socket_proto_state_machine] 0-testvolume-client-0: reading from socket failed. Error (Connection timed out), peer (192.168.150.251:24009)
[2011-06-10 13:11:56.708564] I [client.c:1883:client_rpc_notify] 0-testvolume-client-0: disconnected
[2011-06-10 13:11:56.709490] W [socket.c:204:__socket_rwv] 0-vmware-client-0: readv failed (Connection timed out)
[2011-06-10 13:11:56.709521] W [socket.c:1494:__socket_proto_state_machine] 0-vmware-client-0: reading from socket failed. Error (Connection timed out), peer (192.168.150.251:24013)
[2011-06-10 13:11:56.709553] I [client.c:1883:client_rpc_notify] 0-vmware-client-0: disconnected
[2011-06-10 13:12:07.701752] I [client-handshake.c:1080:select_server_supported_programs] 0-testvolume-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2011-06-10 13:12:07.702059] I [client-handshake.c:913:client_setvolume_cbk] 0-testvolume-client-0: Connected to