Re: [Gluster-users] incomplete listing of a directory, sometimes getdents loops until out of memory

2013-06-14 Thread Vijay Bellur

On 06/13/2013 03:38 PM, John Brunelle wrote:

Hello,

We're having an issue with our distributed gluster filesystem:

* gluster 3.3.1 servers and clients
* distributed volume -- 69 bricks (4.6T each) split evenly across 3 nodes
* xfs backend
* nfs clients
* nfs.enable-ino32: On

* servers: CentOS 6.3, 2.6.32-279.14.1.el6.centos.plus.x86_64
* clients: CentOS 5.7, 2.6.18-274.12.1.el5

We have a directory containing 3,343 subdirectories.  On some clients,
ls lists only a subset of the directories (a different amount on
different clients).  On others, ls gets stuck in a getdents loop and
consumes more and more memory until it hits ENOMEM.  On yet others, it
works fine.  Having the bad clients remount or drop caches makes the
problem temporarily go away, but eventually it comes back.  The issue
sounds a lot like bug #838784, but we are using xfs on the backend,
and this seems like more of a client issue.


Turning on cluster.readdir-optimize can help readdir when a directory 
contains a large number of sub-directories and the volume has many 
bricks. Do you observe any change with this option enabled?
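
If it helps, this is roughly how the option would be set and verified
(VOLNAME below is a placeholder for the actual volume name):

# enable the readdir optimization on the volume
gluster volume set VOLNAME cluster.readdir-optimize on
# confirm it now appears among the volume's reconfigured options
gluster volume info VOLNAME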


-Vijay




Re: [Gluster-users] incomplete listing of a directory, sometimes getdents loops until out of memory

2013-06-14 Thread John Brunelle
Thanks for the reply, Vijay.  I set that parameter to On, but it hasn't
helped; in fact it seems a bit worse.  After making the change on the
volume and dropping caches on some test clients, some of them now see
no subdirectories at all.  In my tests before, clients went back to
seeing all the subdirectories after dropping caches, and only after a
while did entries start disappearing (and the listing had never gone to
zero before).
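
(By "dropping caches" I mean the usual procfs knob, roughly:)

# flush the page cache plus dentries and inodes before re-testing
sync
echo 3 > /proc/sys/vm/drop_caches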

Any other ideas?

Thanks,

John

On Fri, Jun 14, 2013 at 10:35 AM, Vijay Bellur vbel...@redhat.com wrote:
 On 06/13/2013 03:38 PM, John Brunelle wrote:

 Hello,

 We're having an issue with our distributed gluster filesystem:

 * gluster 3.3.1 servers and clients
 * distributed volume -- 69 bricks (4.6T each) split evenly across 3 nodes
 * xfs backend
 * nfs clients
 * nfs.enable-ino32: On

 * servers: CentOS 6.3, 2.6.32-279.14.1.el6.centos.plus.x86_64
 * clients: CentOS 5.7, 2.6.18-274.12.1.el5

 We have a directory containing 3,343 subdirectories.  On some clients,
 ls lists only a subset of the directories (a different amount on
 different clients).  On others, ls gets stuck in a getdents loop and
 consumes more and more memory until it hits ENOMEM.  On yet others, it
 works fine.  Having the bad clients remount or drop caches makes the
 problem temporarily go away, but eventually it comes back.  The issue
 sounds a lot like bug #838784, but we are using xfs on the backend,
 and this seems like more of a client issue.


 Turning on cluster.readdir-optimize can help readdir when a directory
 contains a large number of sub-directories and the volume has many
 bricks. Do you observe any change with this option enabled?

 -Vijay




Re: [Gluster-users] incomplete listing of a directory, sometimes getdents loops until out of memory

2013-06-14 Thread Jeff Darcy
On 06/13/2013 03:38 PM, John Brunelle wrote:
 We have a directory containing 3,343 subdirectories.  On some
 clients, ls lists only a subset of the directories (a different
 amount on different clients).  On others, ls gets stuck in a getdents
 loop and consumes more and more memory until it hits ENOMEM.  On yet
 others, it works fine.  Having the bad clients remount or drop caches
 makes the problem temporarily go away, but eventually it comes back.
 The issue sounds a lot like bug #838784, but we are using xfs on the
 backend, and this seems like more of a client issue.

The fact that drop_caches makes it go away temporarily suggests to me
that something's going on in FUSE.  The reference to #838784 might also
be significant even though you're not using ext4.  Even the fix for that
still makes some assumptions about how certain directory-entry fields
are used and might still be sensitive to changes in that usage by the
local FS or by FUSE.  That might explain both skipping and looping, as
you say you've seen.  Would it be possible for you to compile and run
the attached program on one of the affected directories so we can see
what d_off values are involved?

 But we are also getting some page allocation failures on the server 
 side, e.g. the stack trace below.  These are nearly identical to
 bug #842206 and bug #767127.  I'm trying to sort out if these are
 related to the above issue or just recoverable nic driver GFP_ATOMIC
 kmalloc failures as suggested in the comments.  Slab allocations for
 dentry, xfs_inode, fuse_inode, fuse_request, etc. are all at ~100%
 active, and the total number appears to be monotonically growing.
 Overall memory looks healthy (2/3 is buffers/cache, almost no swap is
 used).  I'd need some help to determine if the memory is overly
 fragmented or not, but looking at pagetypeinfo and zoneinfo it
 doesn't appear so to me, and the failures are order:1 anyway.

This one definitely seems like one of those "innocent victim" kinds of
things where the real problem is in the network code and we just happen
to be the app that's running.

#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int
main (int argc, char **argv)
{
        DIR             *fd;
        struct dirent   *ent;
        int              counter = 0;

        if (argc < 2) {
                fprintf(stderr, "usage: %s directory\n", argv[0]);
                return !0;
        }

        fd = opendir(argv[1]);
        if (!fd) {
                perror("opendir");
                return !0;
        }

        for (;;) {
                ent = readdir(fd);
                if (!ent) {
                        /* end of directory (or error) */
                        break;
                }
                /* print the d_off cookie, entry type, and name */
                printf("0x%016llx %d %.*s\n",
                       (unsigned long long) ent->d_off, ent->d_type,
                       (int) sizeof(ent->d_name), ent->d_name);
                if (++counter > 10) {
                        break;
                }
        }

        closedir(fd);
        return 0;
}
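
A minimal way to build and run the above, with a placeholder path:

cc -o readdir readdir.c
./readdir /path/to/affected/directory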

Re: [Gluster-users] incomplete listing of a directory, sometimes getdents loops until out of memory

2013-06-14 Thread Anand Avati
On Fri, Jun 14, 2013 at 10:04 AM, John Brunelle
john_brune...@harvard.edu wrote:

 Thanks, Jeff!  I ran readdir.c on all 23 bricks on the gluster nfs
 server to which my test clients are connected (one client that's
 working, and one that's not; and I ran it on those, too).  The results
 are attached.

 The values it prints are all well within 32 bits, *except* for one
 that's suspiciously the max 32-bit signed int:

 $ cat readdir.out.* | awk '{print $1}' | sort | uniq | tail
 0xfd59
 0xfd6b
 0xfd7d
 0xfd8f
 0xfda1
 0xfdb3
 0xfdc5
 0xfdd7
 0xfde8
 0x7fff

 That outlier is the same subdirectory on all 23 bricks.  Could this be
 the issue?

 Thanks,

 John



0x7fffffff is the EOF marker. You should find it as the last entry in
_every_ directory.
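
A quick sanity check, assuming the readdir.out.* files from your earlier
run: the last line of each brick's output should carry that marker.

# last entry per brick; the first column should be the EOF marker everywhere
for f in readdir.out.*; do tail -1 "$f"; done | awk '{print $1}' | sort | uniq -c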

Avati

Re: [Gluster-users] incomplete listing of a directory, sometimes getdents loops until out of memory

2013-06-14 Thread John Brunelle
Ah, I did not know that about 0x7fffffff.  Is it of note that the
clients do *not* get this value?

This is on an NFS mount, and the volume has nfs.enable-ino32 On.  (I
should've pointed that out again when Jeff mentioned FUSE.)

Side note -- we do have a couple of FUSE mounts, too, and I had not seen
this issue on any of them before, but when I checked just now, zero
subdirectories were listed on some.  Since I had only seen this on NFS
clients after setting cluster.readdir-optimize to On, I have now set that
back to Off.  The FUSE mounts are now behaving fine again.
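
One way to double-check the mount types and the current option values
(VOLNAME is a placeholder):

# on a client: is the mount NFS or the native FUSE client?
mount | grep -E 'fuse\.glusterfs|nfs'
# on a server: current values for the two options in question
gluster volume info VOLNAME | grep -iE 'readdir-optimize|enable-ino32'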

Thanks,

John

On Fri, Jun 14, 2013 at 2:17 PM, Anand Avati anand.av...@gmail.com wrote:
 Are the ls commands (which list partially, or loop and die of ENOMEM
 eventually) executed on an NFS mount or FUSE mount? Or does it happen on
 both?

 Avati


 On Fri, Jun 14, 2013 at 11:14 AM, Anand Avati anand.av...@gmail.com wrote:




 On Fri, Jun 14, 2013 at 10:04 AM, John Brunelle
 john_brune...@harvard.edu wrote:

 Thanks, Jeff!  I ran readdir.c on all 23 bricks on the gluster nfs
 server to which my test clients are connected (one client that's
 working, and one that's not; and I ran it on those, too).  The results
 are attached.

 The values it prints are all well within 32 bits, *except* for one
 that's suspiciously the max 32-bit signed int:

 $ cat readdir.out.* | awk '{print $1}' | sort | uniq | tail
 0xfd59
 0xfd6b
 0xfd7d
 0xfd8f
 0xfda1
 0xfdb3
 0xfdc5
 0xfdd7
 0xfde8
 0x7fff

 That outlier is the same subdirectory on all 23 bricks.  Could this be
 the issue?

 Thanks,

 John



 0x7fffffff is the EOF marker. You should find it as the last entry in
 _every_ directory.

 Avati




Re: [Gluster-users] incomplete listing of a directory, sometimes getdents loops until out of memory

2013-06-14 Thread Jeff Darcy
On 06/14/2013 01:04 PM, John Brunelle wrote:
 Thanks, Jeff!  I ran readdir.c on all 23 bricks on the gluster nfs
 server to which my test clients are connected (one client that's
 working, and one that's not; and I ran it on those, too).  The results
 are attached.
 
 The values it prints are all well within 32 bits, *except* for one
 that's suspiciously the max 32-bit signed int:
 
 $ cat readdir.out.* | awk '{print $1}' | sort | uniq | tail
 0xfd59
 0xfd6b
 0xfd7d
 0xfd8f
 0xfda1
 0xfdb3
 0xfdc5
 0xfdd7
 0xfde8
 0x7fff
 
 That outlier is the same subdirectory on all 23 bricks.  Could this be
 the issue?

As Avati points out, that's the EOF marker so it's fine.  It might be
interesting to run this on the client as well, to see how those values
relate to those on the bricks - especially at the point where you see
skipping or looping.  You might also want to modify it to print out
d_name as well, so that you can see the file names and correlate things
more easily.
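
One way to line the two up, assuming the readdir binary from earlier and
placeholder paths for a brick directory and a client mount:

# offset/name pairs as one brick sees them vs. as the client sees them
./readdir /path/to/brick/dir  | sort -k3 > brick.out
./readdir /path/to/client/dir | sort -k3 > client.out
# missing names or differing d_off values stand out in the diff
diff -u brick.out client.out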




[Gluster-users] incomplete listing of a directory, sometimes getdents loops until out of memory

2013-06-13 Thread John Brunelle
Hello,

We're having an issue with our distributed gluster filesystem:

* gluster 3.3.1 servers and clients
* distributed volume -- 69 bricks (4.6T each) split evenly across 3 nodes
* xfs backend
* nfs clients
* nfs.enable-ino32: On

* servers: CentOS 6.3, 2.6.32-279.14.1.el6.centos.plus.x86_64
* clients: CentOS 5.7, 2.6.18-274.12.1.el5

We have a directory containing 3,343 subdirectories.  On some clients,
ls lists only a subset of the directories (a different amount on
different clients).  On others, ls gets stuck in a getdents loop and
consumes more and more memory until it hits ENOMEM.  On yet others, it
works fine.  Having the bad clients remount or drop caches makes the
problem temporarily go away, but eventually it comes back.  The issue
sounds a lot like bug #838784, but we are using xfs on the backend,
and this seems like more of a client issue.
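
One way to watch the looping directly on an affected client (the path is
a placeholder) is to trace the getdents calls that ls makes:

# on a looping client these calls keep returning entries without ever
# reaching end-of-directory
strace -e trace=getdents,getdents64 ls /path/to/affected/directory > /dev/null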

But we are also getting some page allocation failures on the server
side, e.g. the stack trace below.  These are nearly identical to bug
#842206 and bug #767127.  I'm trying to sort out if these are related
to the above issue or just recoverable nic driver GFP_ATOMIC kmalloc
failures as suggested in the comments.  Slab allocations for dentry,
xfs_inode, fuse_inode, fuse_request, etc. are all at ~100% active, and
the total number appears to be monotonically growing.  Overall memory
looks healthy (2/3 is buffers/cache, almost no swap is used).  I'd
need some help to determine if the memory is overly fragmented or not,
but looking at pagetypeinfo and zoneinfo it doesn't appear so to me,
and the failures are order:1 anyway.
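
For reference, the slab and fragmentation numbers come from the usual
proc files, roughly:

# slab caches mentioned above: active vs. total object counts
grep -E 'dentry|xfs_inode|fuse_inode|fuse_request' /proc/slabinfo
# free-page and zone statistics used to judge fragmentation
cat /proc/pagetypeinfo /proc/zoneinfo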

Any suggestions for what might be the problem here?

Thanks,

John

Jun 13 09:41:18 myhost kernel: glusterfsd: page allocation failure.
order:1, mode:0x20
Jun 13 09:41:18 myhost kernel: Pid: 20498, comm: glusterfsd Not
tainted 2.6.32-279.14.1.el6.centos.plus.x86_64 #1
Jun 13 09:41:18 myhost kernel: Call Trace:
Jun 13 09:41:18 myhost kernel: IRQ  [8112790f] ?
__alloc_pages_nodemask+0x77f/0x940
Jun 13 09:41:18 myhost kernel: [81162382] ? kmem_getpages+0x62/0x170
Jun 13 09:41:18 myhost kernel: [81162f9a] ? fallback_alloc+0x1ba/0x270
Jun 13 09:41:18 myhost kernel: [811629ef] ? cache_grow+0x2cf/0x320
Jun 13 09:41:18 myhost kernel: [81162d19] ?
cache_alloc_node+0x99/0x160
Jun 13 09:41:18 myhost kernel: [81163afb] ?
kmem_cache_alloc+0x11b/0x190
Jun 13 09:41:18 myhost kernel: [81435298] ? sk_prot_alloc+0x48/0x1c0
Jun 13 09:41:18 myhost kernel: [81435562] ? sk_clone+0x22/0x2e0
Jun 13 09:41:18 myhost kernel: [814833a6] ? inet_csk_clone+0x16/0xd0
Jun 13 09:41:18 myhost kernel: [8149c383] ?
tcp_create_openreq_child+0x23/0x450
Jun 13 09:41:18 myhost kernel: [81499bed] ?
tcp_v4_syn_recv_sock+0x4d/0x310
Jun 13 09:41:18 myhost kernel: [8149c126] ? tcp_check_req+0x226/0x460
Jun 13 09:41:18 myhost kernel: [81437087] ? __kfree_skb+0x47/0xa0
Jun 13 09:41:18 myhost kernel: [8149960b] ? tcp_v4_do_rcv+0x35b/0x430
Jun 13 09:41:18 myhost kernel: [8149ae4e] ? tcp_v4_rcv+0x4fe/0x8d0
Jun 13 09:41:18 myhost kernel: [81432f6c] ? sk_reset_timer+0x1c/0x30
Jun 13 09:41:18 myhost kernel: [81478add] ?
ip_local_deliver_finish+0xdd/0x2d0
Jun 13 09:41:18 myhost kernel: [81478d68] ? ip_local_deliver+0x98/0xa0
Jun 13 09:41:18 myhost kernel: [8147822d] ? ip_rcv_finish+0x12d/0x440
Jun 13 09:41:18 myhost kernel: [814787b5] ? ip_rcv+0x275/0x350
Jun 13 09:41:18 myhost kernel: [81441deb] ?
__netif_receive_skb+0x49b/0x6f0
Jun 13 09:41:18 myhost kernel: [8149813a] ? tcp4_gro_receive+0x5a/0xd0
Jun 13 09:41:18 myhost kernel: [81444068] ?
netif_receive_skb+0x58/0x60
Jun 13 09:41:18 myhost kernel: [81444170] ? napi_skb_finish+0x50/0x70
Jun 13 09:41:18 myhost kernel: [814466a9] ? napi_gro_receive+0x39/0x50
Jun 13 09:41:18 myhost kernel: [a01303b4] ? igb_poll+0x864/0xb00 [igb]
Jun 13 09:41:18 myhost kernel: [810606ec] ?
rebalance_domains+0x3cc/0x5a0
Jun 13 09:41:18 myhost kernel: [814467c3] ? net_rx_action+0x103/0x2f0
Jun 13 09:41:18 myhost kernel: [81096523] ?
hrtimer_get_next_event+0xc3/0x100
Jun 13 09:41:18 myhost kernel: [81073f61] ? __do_softirq+0xc1/0x1e0
Jun 13 09:41:18 myhost kernel: [810dbb70] ?
handle_IRQ_event+0x60/0x170
Jun 13 09:41:18 myhost kernel: [8100c24c] ? call_softirq+0x1c/0x30
Jun 13 09:41:18 myhost kernel: [8100de85] ? do_softirq+0x65/0xa0
Jun 13 09:41:18 myhost kernel: [81073d45] ? irq_exit+0x85/0x90
Jun 13 09:41:18 myhost kernel: [8150d505] ? do_IRQ+0x75/0xf0
Jun 13 09:41:18 myhost kernel: [8100ba53] ? ret_from_intr+0x0/0x11