Re: [Gluster-users] Gluster failure testing

2012-08-15 Thread Brian Candler
On Tue, Aug 14, 2012 at 08:19:27PM -0700, stephen pierce wrote:
> I let both clients run for a while, then I stop one client. I then
> reset the brick/server that is not active (the other one is servicing
> the HTTP traffic) now.

Do you mean that client1 sends HTTP traffic to brick/server1, and client2
sends HTTP traffic to brick/server2?

> While investigating, I discover that there are a lot of phantom
> files that are listed with just a filename, and lots of question marks
> when doing an ls -l. rm -rf * on the Gluster volume seems to
> complete, but leaves behind all the broken files.

It would be helpful if you could show the actual ls -l output, but my guess
is you are seeing something like this (demo on a local filesystem, not
gluster):

$ mkdir testdir
$ touch testdir/testfile
$ chmod -x testdir
$ ls -l testdir
ls: cannot access testdir/testfile: Permission denied
total 0
-????????? ? ? ? ?            ? testfile

If so, these aren't really phantom files, but the permissions of the
enclosing directory are set wrongly (which might be some intermediate state
in gluster replication, I don't know).

So an ls -ld of the parent directory would also be a good thing. Also, are
these filenames those you'd expect your application to create?
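To check the wrong-permissions theory across a whole tree rather than one directory at a time, something like the following shell sketch could help. It assumes a find(1) that understands symbolic -perm modes (GNU and BSD find both do); the function name find_unsearchable_dirs is made up for illustration, and you would point it at your gluster mount point instead of the scratch directory used in the demo.

```shell
#!/bin/sh
# Sketch: list directories under a path whose owner search (x) bit is
# missing, which makes `ls -l` show "?" in every field for the entries
# inside them (as in the testdir demo above).
find_unsearchable_dirs() {
    # -type d       only consider directories
    # ! -perm -u=x  keep those where the owner execute bit is NOT set
    # ls -ld        print the directory's own mode line, not its contents
    # 2>/dev/null   as non-root, find cannot descend into such directories
    #               and would otherwise print "Permission denied" for each
    find "$1" -type d ! -perm -u=x -exec ls -ld {} + 2>/dev/null
}

# Self-contained demo on a scratch directory (a local filesystem, not gluster)
demo=$(mktemp -d)
mkdir "$demo/ok" "$demo/broken"
touch "$demo/broken/testfile"
chmod a-x "$demo/broken"        # like `chmod -x testdir` above
find_unsearchable_dirs "$demo"  # lists only .../broken
chmod u+x "$demo/broken"        # restore search permission before cleanup
rm -rf "$demo"
```

On a real volume you would run e.g. find_unsearchable_dirs on the client mount point and compare the listed directories with what ls -ld shows on each brick.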

What might be helpful is to trace your backend application and find out what's
making it return a 500 server error, which may or may not be related to these
permissions. If you can see what file operations the backend is trying to
do and what filesystem error is being returned (e.g. with strace), this may
make it clearer what's going on. Then you can perhaps crank up the gluster
logging at the appropriate place too.
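As a sketch of the strace approach, assuming you know the PID of the backend process (BACKEND_PID below is a placeholder) and that /tmp/backend.strace is a writable scratch path: the commented command captures file-related syscalls, and the two hypothetical helper functions pull out the failed calls and tally their errno values, so you can see at a glance whether it is EACCES, ENOENT, EIO or something else.

```shell
#!/bin/sh
# Capture step (run while reproducing the 500 errors; BACKEND_PID is a
# placeholder for your application server's PID):
#
#   strace -f -tt -e trace=file -o /tmp/backend.strace -p "$BACKEND_PID"
#
# -f             follow forked children/threads (worker processes)
# -tt            timestamp each line, to correlate with the HTTP 500s
# -e trace=file  only syscalls that take a pathname (open, stat, unlink, ...)
# -o FILE        write the trace to FILE instead of stderr

# Failed syscalls end in "= -1 ERRNO (message)"; print only those lines.
failed_file_ops() {
    grep -E '= -1 E[A-Z0-9]+' "$1"
}

# Count how often each errno appears, most frequent first.
errno_counts() {
    grep -oE '= -1 E[A-Z0-9]+' "$1" | awk '{print $3}' | sort | uniq -c | sort -rn
}
```

Usage would be along the lines of errno_counts /tmp/backend.strace, then failed_file_ops /tmp/backend.strace | grep EACCES to see which paths are affected.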

Any log messages talking about split brain would be especially interesting.
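For the split-brain question, a rough way to find which log files mention it is below. It assumes the default log directory /var/log/glusterfs and that the messages contain "split-brain" or "split brain" in some capitalization; split_brain_mentions is a hypothetical helper name. On 3.3 the CLI also has gluster volume heal VOLNAME info split-brain (VOLNAME being your volume name), which lists the affected files directly.

```shell
#!/bin/sh
# Sketch: list GlusterFS log files that mention split-brain, with a count of
# matching lines per file. The default directory is an assumption; pass
# another one as the first argument if your logs live elsewhere.
split_brain_mentions() {
    dir=${1:-/var/log/glusterfs}
    # -r  recurse (brick logs are typically in a bricks/ subdirectory)
    # -i  case-insensitive; the pattern matches "split-brain" and "split brain"
    # -c  print "file:count" for every file; awk keeps only non-zero counts
    grep -r -i -c 'split[- ]brain' "$dir" 2>/dev/null | awk -F: '$NF > 0'
}
```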

Regards,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Problem mounting Gluster volume [3.3]

2012-08-15 Thread Brian Candler
On Wed, Aug 15, 2012 at 12:06:09AM +0200, Paolo Di Tommaso wrote:
> Hi,
>
> I'm mounting using the following
>
> sudo mount -t glusterfs master:/vol1 /soft
>
> The command should be right, since it works on the server node
> (master), but it is failing on a client.
>
> Also, I'm using the latest version, 3.3 (and it was working with
> 3.2).

Same version of Linux on the client and on the server?

What's the output of gluster volume info on the server? If you had a
replicated or distributed volume consisting of master:/brick and
slave:/brick, then the client needs to be able to resolve both master and
slave.
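A quick way to check the resolution part from the client is sketched below. It uses getent, which goes through the system resolver (/etc/hosts and DNS, as configured in /etc/nsswitch.conf); check_brick_hosts is a hypothetical helper, and master/slave are just the example names from this thread.

```shell
#!/bin/sh
# Sketch: confirm that this client can resolve every brick hostname.
# Pass the hostnames listed under "Bricks:" in `gluster volume info`.
check_brick_hosts() {
    status=0
    for host in "$@"; do
        # getent exits non-zero when the name cannot be resolved
        if addr=$(getent hosts "$host"); then
            echo "OK   $host -> $addr"
        else
            echo "FAIL $host does not resolve on this client"
            status=1
        fi
    done
    return $status
}
```

For the volume in this thread that would be check_brick_hosts master slave, run on the client; any FAIL line points at the host that needs an /etc/hosts or DNS entry.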

Otherwise, something odd is happening. I would first do

grep -i master /etc/hosts

and look for anything obvious (repeat for any other brick hostnames in the
volume).  Next I would run a DNS tcpdump on the client:

sudo tcpdump -i eth0 -nn -s0 udp port 53

and do the mount again in a different window, and go back to the DNS tcpdump
to see what's seen.

If neither of those turns up anything, I would strace the whole mount
process:

sudo strace -f mount -t glusterfs master:/vol1 /soft 2>strace.log

and then look at the tail end of this log, and see what it was doing just
before it wrote the error to the screen.
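Picking out what it did "just before it wrote the error" can be partly automated. The sketch below (context_before_error is a hypothetical name, and the 30-line window is arbitrary) finds the last write() to fd 1 or 2 in the log and prints the lines leading up to it. With 2>strace.log, mount's own stderr output lands in the same file, so the error text itself should show up as one of those write calls.

```shell
#!/bin/sh
# Sketch: print the lines of an strace log leading up to the last write to
# stdout or stderr (fd 1 or 2), usually where the error message was emitted.
context_before_error() {
    log=$1
    n=${2:-30}
    # line number of the last write(1, ...) or write(2, ...) call
    last=$(grep -n 'write([12],' "$log" | tail -n 1 | cut -d: -f1)
    if [ -z "$last" ]; then
        echo "no write to fd 1 or 2 found in $log" >&2
        return 1
    fi
    # start n lines earlier, but never before line 1
    start=$(( last > n ? last - n : 1 ))
    sed -n "${start},${last}p" "$log"
}
```

Usage: context_before_error strace.log, or context_before_error strace.log 100 for a wider window.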

HTH,

Brian.


[Gluster-users] ext4 issue explained

2012-08-15 Thread Joe Julian

Brian Candler asks:

> On Tue, Aug 14, 2012 at 12:19:10AM -0700, Joe Julian wrote:
>
> > I'm betting that your bricks are formatted ext4. If they are, you have
> > a bug due to a recent structure change in ext4. If that is the problem,
> > you can downgrade your kernel to before they backported the change (not
> > sure which version that is though), or reformat your bricks as xfs.
>
> Do you have a link to any info on that issue?
>
> Does it only affect RedHat, or does it also affect distros running new
> kernels?
>
> I am using ext4 rather than xfs because I was reliably able to make machines
> running xfs lock up (these are Ubuntu not RedHat BTW) just by throwing
> bonnie++ load at them, but not when running the same test on ext4.

I have written up an article at
http://joejulian.name/blog/glusterfs-bit-by-ext4-structure-change/
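To see whether the ext4 issue could apply to a given server, a small sketch that reports each brick's filesystem type together with the running kernel might help. It assumes GNU df (for -P -T); report_bricks and brick_fstype are hypothetical names and the brick paths are whatever you pass in. A kernel version alone is not conclusive, since as noted above the change can be backported into older-numbered distribution kernels.

```shell
#!/bin/sh
# Sketch: report the running kernel and the filesystem type of each brick.
brick_fstype() {
    # -P: one line per filesystem (no wrapping); -T: include the type column
    df -P -T "$1" | awk 'NR == 2 { print $2 }'
}

report_bricks() {
    echo "kernel: $(uname -r)"
    for brick in "$@"; do
        fstype=$(brick_fstype "$brick")
        case "$fstype" in
            ext4) echo "$brick: ext4 (the ext4 issue discussed here may apply)" ;;
            *)    echo "$brick: ${fstype:-unknown}" ;;
        esac
    done
}
```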


Re: [Gluster-users] ext4 issue explained

2012-08-15 Thread Brian Candler
On Wed, Aug 15, 2012 at 01:37:32AM -0700, Joe Julian wrote:
> > Do you have a link to any info on that issue?
> >
> > Does it only affect RedHat, or does it also affect distros running new
> > kernels?
> >
> > I am using ext4 rather than xfs because I was reliably able to make machines
> > running xfs lock up (these are Ubuntu not RedHat BTW) just by throwing
> > bonnie++ load at them, but not when running the same test on ext4.
>
> I have written up an article at
> http://joejulian.name/blog/glusterfs-bit-by-ext4-structure-change/

Many thanks. I'm on Ubuntu 12.04 with a 3.2.0 kernel - so this shouldn't
affect me, as long as the Ubuntu people don't backport this patch as well
:-)

Cheers,

Brian.


Re: [Gluster-users] ext4 issue explained

2012-08-15 Thread Jeff Darcy

On August 15, 2012 5:28:58 AM Brian Candler b.cand...@pobox.com wrote:

> Many thanks. I'm on Ubuntu 12.04 with a 3.2.0 kernel - so this shouldn't
> affect me, as long as the Ubuntu people don't backport this patch as well

That would be a forward port, and it's almost certain to occur.




Re: [Gluster-users] ext4 issue explained

2012-08-15 Thread Brian Candler
On Wed, Aug 15, 2012 at 08:19:16AM -0400, Jeff Darcy wrote:
> On August 15, 2012 5:28:58 AM Brian Candler b.cand...@pobox.com wrote:
> > Many thanks. I'm on Ubuntu 12.04 with a 3.2.0 kernel - so this shouldn't
> > affect me, as long as the Ubuntu people don't backport this patch as well
>
> That would be a forward port

Sorry, how do you mean? The change originated in 3.3.0-rc2, according to the
post, and I'm running a kernel based on 3.2.0.


Re: [Gluster-users] ext4 issue explained

2012-08-15 Thread Rodrigo Severo
On Wed, Aug 15, 2012 at 5:37 AM, Joe Julian j...@julianfamily.org wrote:

> Brian Candler asks:
>
> I have written up an article at
> http://joejulian.name/blog/glusterfs-bit-by-ext4-structure-change/


So Gluster 3.3.1 will be compatible with kernels 3.3+ using ext4 on bricks.
Gluster up to 3.3.0 is not. Is that it?

Any ETA for Gluster 3.3.1?


Regards,

Rodrigo Severo

-- 
---
Rodrigo Severo

Fábrica de Idéias
SBS Quadra 2 - Bloco S - Ed. Empire Center - Sala 1.301
Brasília - DF - CEP 70070-904
Tel. (61) 3321-1357   Fax (61) 3223-1712
---


Re: [Gluster-users] Gluster failure testing

2012-08-15 Thread Bryan Whitehead
Are you using ext4 with RedHat/CentOS? There is a previous thread that
shows some kind of bug with ext4 that causes similar-sounding
problems.

If you are using ext4, try using xfs.
