Re: [Gluster-users] Gluster failure testing
On Tue, Aug 14, 2012 at 08:19:27PM -0700, stephen pierce wrote:
> I let both clients run for a while, then I stop one client. I then reset the brick/server that is not active (the other one is servicing the HTTP traffic now).

Do you mean that client1 sends HTTP traffic to brick/server1, and client2 sends HTTP traffic to brick/server2?

> While investigating, I discover that there are a lot of phantom files that are listed with just a filename, and lots of question marks when doing an ls -l. rm -rf * on the Gluster volume seems to complete, but leaves behind all the broken files.

It would be helpful if you could show the actual ls -l output, but my guess is you are seeing something like this (demo on a local filesystem, not gluster):

    $ mkdir testdir
    $ touch testdir/testfile
    $ chmod -x testdir
    $ ls -l testdir
    ls: cannot access testdir/testfile: Permission denied
    total 0
    -????????? ? ? ? ?            ? testfile

If so, these aren't really phantom files, but the permissions of the enclosing directory are set wrongly (which might be some intermediate state in gluster replication, I don't know). So an ls -ld of the parent directory would also be a good thing. Also, are these filenames ones you'd expect your application to create?

What might be helpful is to trace your backend application and find out what's making it return a 500 server error, which may or may not be related to these permissions. If you can see what file operations the backend is trying to do and what filesystem error is being returned (e.g. with strace), this may make it clearer what's going on. Then you can perhaps crank up the gluster logs at the appropriate place too. Any log messages talking about split brain would be especially interesting.

Regards,

Brian.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
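A minimal shell sketch of the two checks suggested in the reply: looking at the mode of the parent directory (the ls -ld step) and pulling only the failing calls out of an strace log. The function names check_parent_dir and failed_syscalls are hypothetical, the sketch assumes GNU stat (for -c '%A'), and it only inspects the owner's search (x) bit, so a process running as another user may still be denied through the group or other bits.

```shell
#!/bin/sh
# Hypothetical helpers; not part of GlusterFS or any standard tool.

# Print "<parent dir> <mode>" for the directory containing $1, and warn if the
# owner lacks the search (x) bit. With read but no search permission, ls -l
# can list the names but cannot stat the entries, so every field shows "?".
# Assumes GNU stat; only the owner's bit is checked.
check_parent_dir() {
    parent=$(dirname "$1")
    mode=$(stat -c '%A' "$parent") || return 1
    echo "$parent $mode"
    case "$mode" in
        ???[xs]*) ;;  # owner can search (s = setuid bit plus x)
        *) echo "WARNING: owner has no search (x) permission on $parent" ;;
    esac
}

# Print only the failed system calls (return value -1 plus an errno name)
# from a log written by, for example:  strace -f -o app.trace -p <PID>
failed_syscalls() {
    grep -E '= -1 E[A-Z0-9]+' "$1"
}
```

For example, check_parent_dir could be pointed at one of the "?" files on the client mount, and failed_syscalls at the trace file after attaching strace to the backend process; the paths and PID are placeholders.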
Re: [Gluster-users] Problem mounting Gluster volume [3.3]
On Wed, Aug 15, 2012 at 12:06:09AM +0200, Paolo Di Tommaso wrote:
> Hi,
>
> I'm mounting using the following:
>
> sudo mount -t glusterfs master:/vol1 /soft
>
> The command should be right, since it works on the server node (master), but it is failing on a client. Also, I'm using the latest version, 3.3 (and it was working with 3.2).

Same version of Linux on the client and on the server?

What's the output of gluster volume info on the server? If you had a replicated or distributed volume consisting of master:/brick and slave:/brick, then the client needs to be able to resolve both master and slave. Otherwise, something odd is happening.

I would first do

    grep -i master /etc/hosts

and look for anything obvious (repeat for any other brick hostnames in the volume).

Next I would run a DNS tcpdump on the client:

    sudo tcpdump -i eth0 -nn -s0 udp port 53

then do the mount again in a different window, and go back to the tcpdump to see what DNS traffic was seen.

If neither of those turns up anything, I would strace the whole mount process:

    sudo strace -f mount -t glusterfs master:/vol1 /soft 2> strace.log

and then look at the tail end of this log to see what it was doing just before it wrote the error to the screen.

HTH,

Brian.
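The hostname checks above can be scripted. This is only a sketch: brick_hosts and check_resolves are hypothetical names, it assumes the usual "BrickN: host:/path" lines in gluster volume info output, and it uses getent hosts, which resolves through the system's NSS configuration (/etc/hosts, then DNS, as set in /etc/nsswitch.conf) rather than querying DNS directly.

```shell
#!/bin/sh
# Hypothetical helpers for checking brick hostname resolution on a client.

# Read "gluster volume info" output on stdin and print each brick hostname once.
# Assumes brick lines look like "Brick1: master:/brick".
brick_hosts() {
    sed -n 's/^Brick[0-9][0-9]*: *\([^:]*\):.*/\1/p' | sort -u
}

# For each hostname argument, print "OK <host> <address>" if it resolves via
# NSS (getent hosts), or "UNRESOLVED <host>" if it does not.
check_resolves() {
    for h in "$@"; do
        if addr=$(getent hosts "$h"); then
            echo "OK $h $(echo "$addr" | awk 'NR == 1 { print $1 }')"
        else
            echo "UNRESOLVED $h"
        fi
    done
}
```

For example, run gluster volume info vol1 | brick_hosts on the server, then check_resolves master slave on the client. An UNRESOLVED line points back at /etc/hosts or DNS, which is where the tcpdump step comes in.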
[Gluster-users] ext4 issue explained
Brian Candler asks:

> On Tue, Aug 14, 2012 at 12:19:10AM -0700, Joe Julian wrote:
>> I'm betting that your bricks are formatted ext4. If they are, you have a bug due to a recent structure change in ext4. If that is the problem, you can downgrade your kernel to before they backported the change (not sure which version that is though), or reformat your bricks as xfs.
>
> Do you have a link to any info on that issue? Does it only affect RedHat, or does it also affect distros running new kernels?
>
> I am using ext4 rather than xfs because I was reliably able to make machines running xfs lock up (these are Ubuntu, not RedHat, BTW) just by throwing bonnie++ load at them, but not when running the same test on ext4.

I have written up an article at http://joejulian.name/blog/glusterfs-bit-by-ext4-structure-change/
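To test Joe's guess that the bricks are ext4, something like the following prints the filesystem type behind each brick directory on a server. It is a sketch under assumptions: brick_fstype and the brick paths passed to it are hypothetical, and it relies on GNU df, where -P keeps each filesystem on one line and -T adds a Type column. (GNU stat -f -c %T is less useful here, because it reports ext4 as "ext2/ext3".)

```shell
#!/bin/sh
# Hypothetical helper: print the filesystem type (e.g. ext4, xfs) backing a path.
# Relies on GNU df: -P gives one line per filesystem, -T adds the Type column.
brick_fstype() {
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}

# Report each brick path given as an argument and flag ext4 bricks.
for brick in "$@"; do
    fstype=$(brick_fstype "$brick")
    echo "$brick $fstype"
    if [ "$fstype" = ext4 ]; then
        echo "  $brick is ext4: see Joe's ext4 structure-change write-up"
    fi
done
```

Saved as a script, it could be run as sh check-bricks.sh /export/brick1 /export/brick2, where both paths are placeholders for the real brick directories.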
Re: [Gluster-users] ext4 issue explained
On Wed, Aug 15, 2012 at 01:37:32AM -0700, Joe Julian wrote:
>> Do you have a link to any info on that issue? Does it only affect RedHat, or does it also affect distros running new kernels?
>>
>> I am using ext4 rather than xfs because I was reliably able to make machines running xfs lock up (these are Ubuntu, not RedHat, BTW) just by throwing bonnie++ load at them, but not when running the same test on ext4.
>
> I have written up an article at http://joejulian.name/blog/glusterfs-bit-by-ext4-structure-change/

Many thanks. I'm on Ubuntu 12.04 with a 3.2.0 kernel, so this shouldn't affect me, as long as the Ubuntu people don't backport this patch as well :-)

Cheers,

Brian.
Re: [Gluster-users] ext4 issue explained
On August 15, 2012 5:28:58 AM Brian Candler b.cand...@pobox.com wrote:
> Many thanks. I'm on Ubuntu 12.04 with a 3.2.0 kernel, so this shouldn't affect me, as long as the Ubuntu people don't backport this patch as well

That would be a forward port, and it's almost certain to occur.
Re: [Gluster-users] ext4 issue explained
On Wed, Aug 15, 2012 at 08:19:16AM -0400, Jeff Darcy wrote:
> On August 15, 2012 5:28:58 AM Brian Candler b.cand...@pobox.com wrote:
>> Many thanks. I'm on Ubuntu 12.04 with a 3.2.0 kernel, so this shouldn't affect me, as long as the Ubuntu people don't backport this patch as well
>
> That would be a forward port

Sorry, how do you mean? The change originated in 3.3.0-rc2, according to the post, and I'm running a kernel based on 3.2.0.
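Brian's reasoning compares his kernel version against 3.3.0-rc2, where the change first appeared in mainline. A rough sketch of that comparison is below; version_ge is a hypothetical helper built on GNU sort -V. As the thread points out, the version number is only part of the story: Joe's original advice refers to the change being backported into older-numbered kernels, and Jeff expects later Ubuntu kernels to pick it up, so a version below 3.3 is not proof that a kernel is unaffected.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 >= version $2 in GNU "sort -V" order.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

kernel=$(uname -r)
if version_ge "$kernel" 3.3; then
    echo "$kernel: 3.3 or later, so mainline already contains the ext4 change"
else
    echo "$kernel: below 3.3, but a vendor backport is still possible"
fi
```

For example, version_ge 3.2.0-29-generic 3.3 fails and version_ge 3.5.0-17-generic 3.3 succeeds. The comparison is approximate: sort -V treats a 3.3.0-rc1 kernel as newer than 3.3, even though the change only arrived in rc2.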
Re: [Gluster-users] ext4 issue explained
On Wed, Aug 15, 2012 at 5:37 AM, Joe Julian j...@julianfamily.org wrote:
> Brian Candler asks:
>
> I have written up an article at http://joejulian.name/blog/glusterfs-bit-by-ext4-structure-change/

So Gluster 3.3.1 will be compatible with kernels 3.3+ using ext4 on bricks, and Gluster up to 3.3.0 is not. Is that it?

Any ETA for Gluster 3.3.1?

Regards,

Rodrigo Severo

--
Rodrigo Severo
Fábrica de Idéias
SBS Quadra 2 - Bloco S - Ed. Empire Center - Sala 1.301
Brasília - DF - CEP 70070-904
Tel. (61) 3321-1357
Fax (61) 3223-1712
Re: [Gluster-users] Gluster failure testing
Are you using ext4 with RedHat/CentOS? There is a previous thread that shows some kind of bug with ext4 that causes similar-sounding problems. If you are using ext4, try using xfs.

On Tue, Aug 14, 2012 at 11:12 PM, Brian Candler b.cand...@pobox.com wrote:
> [snip]