Re: [Gluster-users] Recovery from network failure

2009-09-19 Thread Liam Slusser
You need to force replication to happen, i.e. run "ls -alR" on your gluster mount.
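
A minimal sketch of the same idea as a shell function, assuming (as the advice above implies) that looking up each file through the client mount is what prompts replication to re-check it. The function name force_heal and the mount path in the usage comment are hypothetical; find piped to stat is simply a quieter equivalent of "ls -alR".

```shell
# force_heal MOUNTPOINT
# Walk the whole tree under the GlusterFS client mount and stat every entry,
# discarding the output. Like "ls -alR", this must go through the client
# mount, not through the backend export directory.
force_heal() {
    mnt=$1
    [ -d "$mnt" ] || { echo "not a directory: $mnt" >&2; return 1; }
    find "$mnt" -print0 | xargs -0 stat > /dev/null
}

# Usage (hypothetical mount point):
#   force_heal /mnt/glusterfs
```

On a large volume this can take a while, since every file is visited once.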

liam


On Sat, Sep 19, 2009 at 1:57 PM, Georgecooldude wrote:
> Hi Guys
>
> I've been trying GlusterFS out today. I'm using Ubuntu 8 LTS, and Gluster
> 2.0.6. In my test lab I have two servers configured to mirror each other.
> Everything there works fine.
>
> However, during failover testing I moved a 1GB image to the shared mount
> point on server01. I watched about 250MB of data get replicated to server02
> and then pulled the network cable out of server02. When I plugged the cable
> back in about 1 minute later, I was expecting the file to continue
> replicating; however, it didn't. Instead server02 got a corrupt 250MB file.
>
> Are there any steps that need to be taken to cover a scenario like this?
>
> Thanks in advance
>
> George
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] Recovery from network failure

2009-09-19 Thread Georgecooldude
Hi Guys

I've been trying GlusterFS out today. I'm using Ubuntu 8 LTS, and Gluster
2.0.6. In my test lab I have two servers configured to mirror each other.
Everything there works fine.

However, during failover testing I moved a 1GB image to the shared mount
point on server01. I watched about 250MB of data get replicated to server02
and then pulled the network cable out of server02. When I plugged the cable
back in about 1 minute later, I was expecting the file to continue
replicating; however, it didn't. Instead server02 got a corrupt 250MB file.

Are there any steps that need to be taken to cover a scenario like this?

Thanks in advance

George


Re: [Gluster-users] The continuing story ...

2009-09-19 Thread Mark Mielke

On 09/19/2009 05:39 AM, Anand Avati wrote:
>> [r...@wcarh033]~# ps -ef | grep gluster
>> root      1548     1  0 21:00 ?        00:00:00 /opt/glusterfs/sbin/glusterfsd -f /etc/glusterfs/glusterfsd.vol
>> root      1861     1  0 21:00 ?        00:00:00 /opt/glusterfs/sbin/glusterfs --log-level=NORMAL --volfile=/etc/glusterfs/tools.vol /gluster/tools
>> root      1874  1861  0 21:00 ?        00:00:00 /bin/mount -i -f -t fuse.glusterfs -o rw,allow_other,default_permissions,max_read=131072 /etc/glusterfs/tools.vol /gluster/tools
>> root      2426  2395  0 21:02 pts/2    00:00:00 grep gluster
>> [r...@wcarh033]~# ls /gluster/tools
>> ^C^C
>>
>> Yep - all three nodes locked up. All it took was a simultaneous reboot
>> of all three machines.
>>
>> After I kill -9 1874 (kill 1874 without -9 has no effect) from a
>> different ssh session, I get:
>>
>> ls: cannot access /gluster/tools: Transport endpoint is not connected
>>
>> After this, mount works (unmount not necessary it turns out).
>>
>> I am unable to strace -p the mount -t fuse without it freezing up. I can
>> pstack, but it returns 0 lines of output fairly quickly.
>>
>> The symptoms are identical on all three machines. 3-way replication,
>> each server has both a server exposing one volume, and a client, with
>> cluster/replication and a preferred read of the local server.
>
> This is a strange hang. I have a few more questions -
>
> 1. is this off glusterfs.git master branch or release-2.0? If this is master,
> there have been heavy un-QA'ed modifications to get rid of libfuse dependency.

This is release-2.0 - 2.0.6 plus a few patches to pick up the other fix
you completed for my previous problem.

> 2. what happens if you try to start the three daemons together now when the
> system is not booting? Is this hang somehow related to the system booting?
>
> 3. can you provide dmesg output and glusterfs trace level logs of this scenario?


I will try these. Regular output is nothing for the glusterfs (client) 
mounts. It shows the volumes and then no other lines. I'll see about 
turning tracing on.
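
A sketch of what turning tracing on could look like here, assuming the 2.0.x client accepts --log-level=TRACE and --log-file (check glusterfs --help on your build). The binary, volfile and mount point are copied from the ps output quoted above; the log file path and the helper name trace_cmd are hypothetical. The function only prints the command so it can be reviewed before being run as root.

```shell
# Paths below come from the ps output in this thread, except TRACE_LOG,
# which is a hypothetical location for the trace log.
GLUSTERFS_BIN=/opt/glusterfs/sbin/glusterfs
VOLFILE=/etc/glusterfs/tools.vol
MOUNTPOINT=/gluster/tools
TRACE_LOG=/tmp/glusterfs-trace.log

# Print (do not run) the client command line with trace logging enabled.
trace_cmd() {
    printf '%s --log-level=TRACE --log-file=%s --volfile=%s %s\n' \
        "$GLUSTERFS_BIN" "$TRACE_LOG" "$VOLFILE" "$MOUNTPOINT"
}

# Usage:
#   trace_cmd        # review the printed command, then run it as root
```

Saving the output of dmesg to a file right after the hang reproduces would cover the kernel side of the request.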


Cheers,
mark

--
Mark Mielke



Re: [Gluster-users] Problems with XFS and Gluster 2.0.6

2009-09-19 Thread Liam Slusser
Have you checked that you have free inodes on your XFS partitions?

xfs_db -r -c sb -c p /dev/sda1 | egrep 'ifree|icount'
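
To read those two counters at a glance, here is a small sketch that turns them into one summary line. It assumes xfs_db prints superblock fields as "name = value" lines, which is what the egrep pattern above relies on; the function name inode_summary is hypothetical. In the XFS superblock, icount is the number of inodes allocated so far and ifree is how many of those are unused.

```shell
# inode_summary: read "icount = N" / "ifree = N" lines on stdin and print
# allocated, free and in-use inode counts. Exits non-zero if either
# counter is missing from the input.
inode_summary() {
    awk '$1 == "icount" { icount = $3 }
         $1 == "ifree"  { ifree  = $3 }
         END {
             if (icount == "" || ifree == "") {
                 print "counters not found"
                 exit 1
             }
             printf "allocated=%.0f free=%.0f used=%.0f\n",
                    icount, ifree, icount - ifree
         }'
}

# Usage (same device as in the command above):
#   xfs_db -r -c sb -c p /dev/sda1 | inode_summary
```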

If you're running low, you'll have to mount your partition with the
inode64 option. Note that it requires a 64-bit box, and all your
gluster clients will also need to be 64-bit for everything to work.

There was a thread here a few months back about inode64 and gluster.
Dig through the archives for it; there is lots of good info there. The
short version is that it works fine as long as everything is 64-bit.

liam

On Fri, Sep 18, 2009 at 5:44 PM, Nathan Stratton wrote:
>
> Anyone else running into problems with XFS and Gluster? Things run fine for
> a while, but then I get things like:
>
> ls: reading directory .: Input/output error
>
> I initially did not think it was a Gluster issue because I saw the errors on
> the raw XFS exported partition. However when I checked I found that the
> problem happened on all 4 nodes. I just don't know how 4 XFS partitions on 4
> different boxes could all become corrupted at one time.
>
> Whatever happens, it is badly wrong, because xfs can't even fix it:
>
> http://share.robotics.net/xfs-crash.txt
>
> Nathan Stratton                                CTO, BlinkMind, Inc.
> nathan at robotics.net                         nathan at blinkmind.com
> http://www.robotics.net                        http://www.blinkmind.com


Re: [Gluster-users] The continuing story ...

2009-09-19 Thread Anand Avati


> [r...@wcarh033]~# ps -ef | grep gluster
> root      1548     1  0 21:00 ?        00:00:00 /opt/glusterfs/sbin/glusterfsd -f /etc/glusterfs/glusterfsd.vol
> root      1861     1  0 21:00 ?        00:00:00 /opt/glusterfs/sbin/glusterfs --log-level=NORMAL --volfile=/etc/glusterfs/tools.vol /gluster/tools
> root      1874  1861  0 21:00 ?        00:00:00 /bin/mount -i -f -t fuse.glusterfs -o rw,allow_other,default_permissions,max_read=131072 /etc/glusterfs/tools.vol /gluster/tools
> root      2426  2395  0 21:02 pts/2    00:00:00 grep gluster
> [r...@wcarh033]~# ls /gluster/tools
> ^C^C
>
> Yep - all three nodes locked up. All it took was a simultaneous reboot
> of all three machines.
>
> After I kill -9 1874 (kill 1874 without -9 has no effect) from a
> different ssh session, I get:
>
> ls: cannot access /gluster/tools: Transport endpoint is not connected
>
> After this, mount works (unmount not necessary it turns out).
>
> I am unable to strace -p the mount -t fuse without it freezing up. I can
> pstack, but it returns 0 lines of output fairly quickly.
>
> The symptoms are identical on all three machines. 3-way replication,
> each server has both a server exposing one volume, and a client, with
> cluster/replication and a preferred read of the local server.


This is a strange hang. I have a few more questions -


1. is this off glusterfs.git master branch or release-2.0? If this is master, 
there have been heavy un-QA'ed modifications to get rid of libfuse dependency.

2. what happens if you try to start the three daemons together now when the 
system is not booting? Is this hang somehow related to the system booting?

3. can you provide dmesg output and glusterfs trace level logs of this scenario?
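
For question 2, it helps to check the mount without risking another stuck terminal. A minimal sketch, assuming coreutils timeout is installed; the function name probe_mount is hypothetical. The report above says a plain kill did not stop the stuck mount helper, so the sketch sends SIGKILL; whether that is always enough to stop a blocked ls is an assumption.

```shell
# probe_mount MOUNTPOINT [SECONDS]
# List the mount point under a time limit (default 5 seconds) so that a
# hung FUSE mount cannot block the calling shell. Returns 0 if the listing
# finished in time, non-zero if it failed or had to be killed.
probe_mount() {
    timeout -s KILL "${2:-5}" ls "$1" > /dev/null 2>&1
}

# Usage (mount point from this thread):
#   probe_mount /gluster/tools 10 && echo responsive || echo "hung or not connected"
```

Running this after starting the three daemons by hand, outside of boot, would show whether the hang depends on boot timing.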

Avati