Re: [Gluster-users] incomplete listing of a directory, sometimes getdents loops until out of memory
On 06/14/2013 01:04 PM, John Brunelle wrote:
> Thanks, Jeff! I ran readdir.c on all 23 bricks on the gluster nfs
> server to which my test clients are connected (one client that's
> working, and one that's not; and I ran on those, too). The results
> are attached.
>
> The values it prints are all well within 32 bits, *except* for one
> that's suspiciously the max 32-bit signed int:
>
> $ cat readdir.out.* | awk '{print $1}' | sort | uniq | tail
> 0xfd59
> 0xfd6b
> 0xfd7d
> 0xfd8f
> 0xfda1
> 0xfdb3
> 0xfdc5
> 0xfdd7
> 0xfde8
> 0x7fffffff
>
> That outlier is the same subdirectory on all 23 bricks. Could this be
> the issue?

As Avati points out, that's the EOF marker, so it's fine. It might be interesting to run this on the client as well, to see how those values relate to those on the bricks, especially at the point where you see skipping or looping. You might also want to modify it to print out d_name as well, so that you can see the file names and correlate things more easily.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] incomplete listing of a directory, sometimes getdents loops until out of memory
Ah, I did not know that about 0x7fffffff. Is it of note that the clients do *not* get this? This is on an NFS mount, and the volume has nfs.enable-ino32 On. (I should've pointed that out again when Jeff mentioned FUSE.)

Side note -- we do have a couple FUSE mounts, too, and I had not seen this issue on any of them before, but when I checked now, zero subdirectories were listed on some. Since I had only seen this on NFS clients after setting cluster.readdir-optimize On, I have now set that back Off. FUSE mounts are now behaving fine again.

Thanks,
John

On Fri, Jun 14, 2013 at 2:17 PM, Anand Avati wrote:
> Are the ls commands (which list partially, or loop and die of ENOMEM
> eventually) executed on an NFS mount or FUSE mount? Or does it happen on
> both?
>
> Avati
>
> On Fri, Jun 14, 2013 at 11:14 AM, Anand Avati wrote:
>>
>> On Fri, Jun 14, 2013 at 10:04 AM, John Brunelle wrote:
>>>
>>> Thanks, Jeff! I ran readdir.c on all 23 bricks on the gluster nfs
>>> server to which my test clients are connected (one client that's
>>> working, and one that's not; and I ran on those, too). The results
>>> are attached.
>>>
>>> The values it prints are all well within 32 bits, *except* for one
>>> that's suspiciously the max 32-bit signed int:
>>>
>>> $ cat readdir.out.* | awk '{print $1}' | sort | uniq | tail
>>> 0xfd59
>>> 0xfd6b
>>> 0xfd7d
>>> 0xfd8f
>>> 0xfda1
>>> 0xfdb3
>>> 0xfdc5
>>> 0xfdd7
>>> 0xfde8
>>> 0x7fffffff
>>>
>>> That outlier is the same subdirectory on all 23 bricks. Could this be
>>> the issue?
>>>
>>> Thanks,
>>>
>>> John
>>
>> 0x7fffffff is the EOF marker. You should find that as the last entry in
>> _every_ directory.
>>
>> Avati
Re: [Gluster-users] incomplete listing of a directory, sometimes getdents loops until out of memory
Are the ls commands (which list partially, or loop and die of ENOMEM eventually) executed on an NFS mount or FUSE mount? Or does it happen on both?

Avati

On Fri, Jun 14, 2013 at 11:14 AM, Anand Avati wrote:
>
> On Fri, Jun 14, 2013 at 10:04 AM, John Brunelle wrote:
>
>> Thanks, Jeff! I ran readdir.c on all 23 bricks on the gluster nfs
>> server to which my test clients are connected (one client that's
>> working, and one that's not; and I ran on those, too). The results
>> are attached.
>>
>> The values it prints are all well within 32 bits, *except* for one
>> that's suspiciously the max 32-bit signed int:
>>
>> $ cat readdir.out.* | awk '{print $1}' | sort | uniq | tail
>> 0xfd59
>> 0xfd6b
>> 0xfd7d
>> 0xfd8f
>> 0xfda1
>> 0xfdb3
>> 0xfdc5
>> 0xfdd7
>> 0xfde8
>> 0x7fffffff
>>
>> That outlier is the same subdirectory on all 23 bricks. Could this be
>> the issue?
>>
>> Thanks,
>>
>> John
>
> 0x7fffffff is the EOF marker. You should find that as the last entry in
> _every_ directory.
>
> Avati
Re: [Gluster-users] incomplete listing of a directory, sometimes getdents loops until out of memory
On Fri, Jun 14, 2013 at 10:04 AM, John Brunelle wrote:
> Thanks, Jeff! I ran readdir.c on all 23 bricks on the gluster nfs
> server to which my test clients are connected (one client that's
> working, and one that's not; and I ran on those, too). The results
> are attached.
>
> The values it prints are all well within 32 bits, *except* for one
> that's suspiciously the max 32-bit signed int:
>
> $ cat readdir.out.* | awk '{print $1}' | sort | uniq | tail
> 0xfd59
> 0xfd6b
> 0xfd7d
> 0xfd8f
> 0xfda1
> 0xfdb3
> 0xfdc5
> 0xfdd7
> 0xfde8
> 0x7fffffff
>
> That outlier is the same subdirectory on all 23 bricks. Could this be
> the issue?
>
> Thanks,
>
> John

0x7fffffff is the EOF marker. You should find that as the last entry in _every_ directory.

Avati
Re: [Gluster-users] incomplete listing of a directory, sometimes getdents loops until out of memory
On 06/13/2013 03:38 PM, John Brunelle wrote:
> We have a directory containing 3,343 subdirectories. On some
> clients, ls lists only a subset of the directories (a different
> amount on different clients). On others, ls gets stuck in a getdents
> loop and consumes more and more memory until it hits ENOMEM. On yet
> others, it works fine. Having the bad clients remount or drop caches
> makes the problem temporarily go away, but eventually it comes back.
> The issue sounds a lot like bug #838784, but we are using xfs on the
> backend, and this seems like more of a client issue.

The fact that drop_caches makes it go away temporarily suggests to me that something's going on in FUSE. The reference to bug #838784 might also be significant even though you're not using ext4. Even the fix for that still makes some assumptions about how certain directory-entry fields are used and might still be sensitive to changes in that usage by the local FS or by FUSE. That might explain both the skipping and the looping you say you've seen. Would it be possible for you to compile and run the attached program on one of the affected directories so we can see what d_off values are involved?

> But we are also getting some page allocation failures on the server
> side, e.g. the stack trace below. These are nearly identical to
> bug #842206 and bug #767127. I'm trying to sort out if these are
> related to the above issue or just recoverable nic driver GFP_ATOMIC
> kmalloc failures as suggested in the comments. Slab allocations for
> dentry, xfs_inode, fuse_inode, fuse_request, etc. are all at ~100%
> active, and the total number appears to be monotonically growing.
> Overall memory looks healthy (2/3 is buffers/cache, almost no swap is
> used). I'd need some help to determine if the memory is overly
> fragmented or not, but looking at pagetypeinfo and zoneinfo it
> doesn't appear so to me, and the failures are order:1 anyways.
This one definitely seems like one of those "innocent victim" kind of things where the real problem is in the network code and we just happen to be the app that's running.

    #include <sys/types.h>
    #include <dirent.h>
    #include <stdio.h>

    int
    main (int argc, char **argv)
    {
            DIR            *fd;
            struct dirent  *ent;
            int             counter = 0;

            fd = opendir(argv[1]);
            if (!fd) {
                    perror("opendir");
                    return !0;
            }

            for (;;) {
                    ent = readdir(fd);
                    if (!ent) {
                            break;
                    }
                    printf("0x%016llx %d %.*s\n",
                           (unsigned long long)ent->d_off, ent->d_type,
                           (int)sizeof(ent->d_name), ent->d_name);
                    if (++counter > 10) {
                            break;
                    }
            }

            return 0;
    }
Re: [Gluster-users] incomplete listing of a directory, sometimes getdents loops until out of memory
Thanks for the reply, Vijay. I set that parameter "On", but it hasn't helped, and in fact it seems a bit worse. After making the change on the volume and dropping caches on some test clients, some are now seeing zero subdirectories at all. In my tests before, after dropping caches clients would go back to seeing all the subdirectories, and it's only after a while that they start disappearing (and they have never gone to zero before).

Any other ideas?

Thanks,
John

On Fri, Jun 14, 2013 at 10:35 AM, Vijay Bellur wrote:
> On 06/13/2013 03:38 PM, John Brunelle wrote:
>>
>> Hello,
>>
>> We're having an issue with our distributed gluster filesystem:
>>
>> * gluster 3.3.1 servers and clients
>> * distributed volume -- 69 bricks (4.6T each) split evenly across 3 nodes
>> * xfs backend
>> * nfs clients
>> * nfs.enable-ino32: On
>>
>> * servers: CentOS 6.3, 2.6.32-279.14.1.el6.centos.plus.x86_64
>> * clients: CentOS 5.7, 2.6.18-274.12.1.el5
>>
>> We have a directory containing 3,343 subdirectories. On some clients,
>> ls lists only a subset of the directories (a different amount on
>> different clients). On others, ls gets stuck in a getdents loop and
>> consumes more and more memory until it hits ENOMEM. On yet others, it
>> works fine. Having the bad clients remount or drop caches makes the
>> problem temporarily go away, but eventually it comes back. The issue
>> sounds a lot like bug #838784, but we are using xfs on the backend,
>> and this seems like more of a client issue.
>
> Turning on "cluster.readdir-optimize" can help readdir when a directory
> contains a number of sub-directories and there are more bricks in the
> volume. Do you observe any change with this option enabled?
>
> -Vijay
Re: [Gluster-users] incomplete listing of a directory, sometimes getdents loops until out of memory
On 06/13/2013 03:38 PM, John Brunelle wrote:
> Hello,
>
> We're having an issue with our distributed gluster filesystem:
>
> * gluster 3.3.1 servers and clients
> * distributed volume -- 69 bricks (4.6T each) split evenly across 3 nodes
> * xfs backend
> * nfs clients
> * nfs.enable-ino32: On
>
> * servers: CentOS 6.3, 2.6.32-279.14.1.el6.centos.plus.x86_64
> * clients: CentOS 5.7, 2.6.18-274.12.1.el5
>
> We have a directory containing 3,343 subdirectories. On some clients,
> ls lists only a subset of the directories (a different amount on
> different clients). On others, ls gets stuck in a getdents loop and
> consumes more and more memory until it hits ENOMEM. On yet others, it
> works fine. Having the bad clients remount or drop caches makes the
> problem temporarily go away, but eventually it comes back. The issue
> sounds a lot like bug #838784, but we are using xfs on the backend,
> and this seems like more of a client issue.

Turning on "cluster.readdir-optimize" can help readdir when a directory contains a number of sub-directories and there are more bricks in the volume. Do you observe any change with this option enabled?

-Vijay
[Gluster-users] incomplete listing of a directory, sometimes getdents loops until out of memory
Hello,

We're having an issue with our distributed gluster filesystem:

* gluster 3.3.1 servers and clients
* distributed volume -- 69 bricks (4.6T each) split evenly across 3 nodes
* xfs backend
* nfs clients
* nfs.enable-ino32: On

* servers: CentOS 6.3, 2.6.32-279.14.1.el6.centos.plus.x86_64
* clients: CentOS 5.7, 2.6.18-274.12.1.el5

We have a directory containing 3,343 subdirectories. On some clients, ls lists only a subset of the directories (a different amount on different clients). On others, ls gets stuck in a getdents loop and consumes more and more memory until it hits ENOMEM. On yet others, it works fine. Having the bad clients remount or drop caches makes the problem temporarily go away, but eventually it comes back. The issue sounds a lot like bug #838784, but we are using xfs on the backend, and this seems like more of a client issue.

But we are also getting some page allocation failures on the server side, e.g. the stack trace below. These are nearly identical to bug #842206 and bug #767127. I'm trying to sort out if these are related to the above issue or just recoverable nic driver GFP_ATOMIC kmalloc failures as suggested in the comments. Slab allocations for dentry, xfs_inode, fuse_inode, fuse_request, etc. are all at ~100% active, and the total number appears to be monotonically growing. Overall memory looks healthy (2/3 is buffers/cache, almost no swap is used). I'd need some help to determine if the memory is overly fragmented or not, but looking at pagetypeinfo and zoneinfo it doesn't appear so to me, and the failures are order:1 anyways.

Any suggestions for what might be the problem here?

Thanks,
John

Jun 13 09:41:18 myhost kernel: glusterfsd: page allocation failure. order:1, mode:0x20
Jun 13 09:41:18 myhost kernel: Pid: 20498, comm: glusterfsd Not tainted 2.6.32-279.14.1.el6.centos.plus.x86_64 #1
Jun 13 09:41:18 myhost kernel: Call Trace:
Jun 13 09:41:18 myhost kernel: [] ? __alloc_pages_nodemask+0x77f/0x940
Jun 13 09:41:18 myhost kernel: [] ? kmem_getpages+0x62/0x170
Jun 13 09:41:18 myhost kernel: [] ? fallback_alloc+0x1ba/0x270
Jun 13 09:41:18 myhost kernel: [] ? cache_grow+0x2cf/0x320
Jun 13 09:41:18 myhost kernel: [] ? cache_alloc_node+0x99/0x160
Jun 13 09:41:18 myhost kernel: [] ? kmem_cache_alloc+0x11b/0x190
Jun 13 09:41:18 myhost kernel: [] ? sk_prot_alloc+0x48/0x1c0
Jun 13 09:41:18 myhost kernel: [] ? sk_clone+0x22/0x2e0
Jun 13 09:41:18 myhost kernel: [] ? inet_csk_clone+0x16/0xd0
Jun 13 09:41:18 myhost kernel: [] ? tcp_create_openreq_child+0x23/0x450
Jun 13 09:41:18 myhost kernel: [] ? tcp_v4_syn_recv_sock+0x4d/0x310
Jun 13 09:41:18 myhost kernel: [] ? tcp_check_req+0x226/0x460
Jun 13 09:41:18 myhost kernel: [] ? __kfree_skb+0x47/0xa0
Jun 13 09:41:18 myhost kernel: [] ? tcp_v4_do_rcv+0x35b/0x430
Jun 13 09:41:18 myhost kernel: [] ? tcp_v4_rcv+0x4fe/0x8d0
Jun 13 09:41:18 myhost kernel: [] ? sk_reset_timer+0x1c/0x30
Jun 13 09:41:18 myhost kernel: [] ? ip_local_deliver_finish+0xdd/0x2d0
Jun 13 09:41:18 myhost kernel: [] ? ip_local_deliver+0x98/0xa0
Jun 13 09:41:18 myhost kernel: [] ? ip_rcv_finish+0x12d/0x440
Jun 13 09:41:18 myhost kernel: [] ? ip_rcv+0x275/0x350
Jun 13 09:41:18 myhost kernel: [] ? __netif_receive_skb+0x49b/0x6f0
Jun 13 09:41:18 myhost kernel: [] ? tcp4_gro_receive+0x5a/0xd0
Jun 13 09:41:18 myhost kernel: [] ? netif_receive_skb+0x58/0x60
Jun 13 09:41:18 myhost kernel: [] ? napi_skb_finish+0x50/0x70
Jun 13 09:41:18 myhost kernel: [] ? napi_gro_receive+0x39/0x50
Jun 13 09:41:18 myhost kernel: [] ? igb_poll+0x864/0xb00 [igb]
Jun 13 09:41:18 myhost kernel: [] ? rebalance_domains+0x3cc/0x5a0
Jun 13 09:41:18 myhost kernel: [] ? net_rx_action+0x103/0x2f0
Jun 13 09:41:18 myhost kernel: [] ? hrtimer_get_next_event+0xc3/0x100
Jun 13 09:41:18 myhost kernel: [] ? __do_softirq+0xc1/0x1e0
Jun 13 09:41:18 myhost kernel: [] ? handle_IRQ_event+0x60/0x170
Jun 13 09:41:18 myhost kernel: [] ? call_softirq+0x1c/0x30
Jun 13 09:41:18 myhost kernel: [] ? do_softirq+0x65/0xa0
Jun 13 09:41:18 myhost kernel: [] ? irq_exit+0x85/0x90
Jun 13 09:41:18 myhost kernel: [] ? do_IRQ+0x75/0xf0
Jun 13 09:41:18 myhost kernel: [] ? ret_from_intr+0x0/0x11