Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On Monday 21 July 2014 13:14:46 Jeff Darcy wrote:
> Perhaps it's time to revisit the idea of making assumptions about d_off
> values and twiddling them back and forth, vs. maintaining a precise mapping
> between our values and local-FS values.
>
> http://review.gluster.org/#/c/4675/
>
> That patch is old and probably incomplete, but at the time it worked just
> as well as the one that led us into the current situation.

I think directory handling has a lot of issues, not only the problem of big
offsets. The most important one will be scalability as the number of bricks
grows. Maybe we should try to find a better solution that addresses all of
these problems at once.

One possible solution is to convert directories into files managed by
storage/posix (some changes will probably also be required in dht and afr).
We would have full control over the format of this file, so we could use
whatever directory offsets we want and avoid interference with upper xlators
in readdir(p) calls. This would also allow us to optimize directory accesses
and even minimize or solve the problem of renames.

Additionally, this would give directories the same reliability that files
have (replicated or dispersed).

Obviously this is an important architectural change at the brick level, but I
think its benefits are worth it.

Xavi
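To make the "problem of big offsets" concrete, here is a minimal C sketch of
the kind of bit-stealing a cluster translator does when it has to encode a
child/subvolume id into the 64-bit d_off it hands back in readdir(p). The
names and the 4-bit split are hypothetical and purely illustrative; the real
transforms are the dht/afr ones discussed later in the thread.

/*
 * Illustrative sketch, not GlusterFS code: a translator that wants to
 * remember which child a directory entry came from has to steal bits from
 * the 64-bit d_off, and ext4's hash-based offsets already use almost all
 * of them.  SUBVOL_BITS and the layout below are hypothetical.
 */
#include <stdint.h>
#include <stdio.h>

#define SUBVOL_BITS 4   /* hypothetical: room for 16 children */

/* Drop the low bits of the local d_off to make room for a subvol id. */
static uint64_t pack_doff(uint64_t local_doff, unsigned subvol)
{
    return (local_doff >> SUBVOL_BITS) |
           ((uint64_t)subvol << (64 - SUBVOL_BITS));
}

/* Recover the subvol id and an *approximation* of the local d_off. */
static void unpack_doff(uint64_t d_off, uint64_t *local_doff, unsigned *subvol)
{
    *subvol     = (unsigned)(d_off >> (64 - SUBVOL_BITS));
    *local_doff = (d_off & ((1ULL << (64 - SUBVOL_BITS)) - 1)) << SUBVOL_BITS;
}

int main(void)
{
    uint64_t ext4_doff = 0x7fffffffffffffffULL; /* ext4 can hand out ~63-bit offsets */
    uint64_t recovered;
    unsigned subvol;

    unpack_doff(pack_doff(ext4_doff, 3), &recovered, &subvol);
    printf("local d_off %llx, recovered %llx (subvol %u)\n",
           (unsigned long long)ext4_doff,
           (unsigned long long)recovered, subvol);
    /* The low SUBVOL_BITS are gone: seeking to the recovered value only
       works if the filesystem tolerates a slightly-off offset, which is
       exactly the assumption debated in this thread. */
    return 0;
}

The point is the information loss: whatever bits are reused for the subvol id
can no longer reproduce the local filesystem's offset exactly, which is why
so much of this thread turns on how tolerant ext4 and XFS are of approximate
seekdir() values.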
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
> One possible solution is to convert directories into files managed by
> storage/posix (some changes will probably also be required in dht and afr).
> We would have full control over the format of this file, so we could use
> whatever directory offsets we want and avoid interference with upper
> xlators in readdir(p) calls. This would also allow us to optimize directory
> accesses and even minimize or solve the problem of renames.

Unfortunately, most of the problems with renames involve multiple directories
and/or multiple bricks, so changing how we store directory information within
a brick won't solve those particular problems.

> Additionally, this would give directories the same reliability that files
> have (replicated or dispersed).

If it's within storage/posix then it's well below either replication or
dispersal. I think there's the kernel of a good idea here, but it's going to
require changes to multiple components (and how they relate to one another).
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On 07/21/2014 05:03 PM, Anders Blomdell wrote:
> On 2014-07-19 04:43, Pranith Kumar Karampuri wrote:
>> On 07/18/2014 07:57 PM, Anders Blomdell wrote:
>>> During testing of a 3*4 gluster (from master as of yesterday), I
>>> encountered two major weirdnesses:
>>>
>>> 1. A 'rm -rf some_dir' needed several invocations to finish, each time
>>>    reporting a number of lines like these:
>>>      rm: cannot remove ‘a/b/c/d/e/f’: Directory not empty
>>>
>>> 2. After having successfully deleted all files from the volume, I have a
>>>    single directory that is duplicated in gluster-fuse, like this:
>>>      # ls -l /mnt/gluster
>>>      total 24
>>>      drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
>>>      drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
>>>
>>> Any idea on how to debug this issue?
>> What are the steps to recreate? We need to first find what led to this,
>> then probably which xlator leads to it.
> Would a pcap network dump + the result from 'tar -c --xattrs
> /brick/a/gluster' on all the hosts, before and after the following commands
> are run, be of any help?
>   # mount -t glusterfs gluster-host:/test /mnt/gluster
>   # mkdir /mnt/gluster/work2
>   # ls /mnt/gluster
>   work2  work2

Are you using ext4? Is this on latest upstream?

Pranith

> If so, where should I send them (size is 2*12*31MB [.tar] + 220kB [pcap])?
>
> /Anders
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On 2014-07-21 13:36, Pranith Kumar Karampuri wrote:
> [...]
>
> Are you using ext4?

Yes.

> Is this on latest upstream?

The kernel is 3.14.9-200.fc20.x86_64; whether that is latest upstream I don't
know. Gluster is from master as of the end of last week.

If there are known issues with ext4 I could switch to something else, but
during the last 15 years or so I have had very few problems with ext2/3/4,
which is the reason for choosing it.

/Anders
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On 07/21/2014 05:17 PM, Anders Blomdell wrote:
> On 2014-07-21 13:36, Pranith Kumar Karampuri wrote:
>> [...]
>>
>> Are you using ext4?
> Yes.
>> Is this on latest upstream?
> The kernel is 3.14.9-200.fc20.x86_64; whether that is latest upstream I
> don't know. Gluster is from master as of the end of last week.
>
> If there are known issues with ext4 I could switch to something else, but
> during the last 15 years or so I have had very few problems with ext2/3/4,
> which is the reason for choosing it.

The problem is afrv2 + dht + ext4 offsets. Soumya and Xavier were working on
it last I heard (CCed).

Pranith
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On 2014-07-21 13:49, Pranith Kumar Karampuri wrote:
> [...]
>
> The problem is afrv2 + dht + ext4 offsets. Soumya and Xavier were working
> on it last I heard (CCed).

Should I switch to xfs or be a guinea pig for testing a fixed version?

/Anders
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On Monday 21 July 2014 13:53:19 Anders Blomdell wrote:
> On 2014-07-21 13:49, Pranith Kumar Karampuri wrote:
>> [...]
>>
>> The problem is afrv2 + dht + ext4 offsets. Soumya and Xavier were working
>> on it last I heard (CCed).
> Should I switch to xfs or be a guinea pig for testing a fixed version?

There is a patch for this [1]. It should work for this particular
configuration, but there are some limitations in the general case, especially
for future scalability, that we tried to solve but which seem quite
difficult. Maybe Soumya has newer information about that.

XFS should work without problems if you need it.

Xavi

[1] http://review.gluster.org/8201/
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On 07/21/2014 07:33 PM, Anders Blomdell wrote:
> On 2014-07-21 14:36, Soumya Koduri wrote:
>> On 07/21/2014 05:35 PM, Xavier Hernandez wrote:
>>> On Monday 21 July 2014 13:53:19 Anders Blomdell wrote:
>>>> [...]
>>>>
>>>> Should I switch to xfs or be a guinea pig for testing a fixed version?
>>> There is a patch for this [1]. It should work for this particular
>>> configuration, but there are some limitations in the general case,
>>> especially for future scalability, that we tried to solve but which seem
>>> quite difficult. Maybe Soumya has newer information about that.
>>>
>>> XFS should work without problems if you need it.
> As long as it does not start using 64-bit offsets as well :-)
>
> Sounds like I should go for XFS right now? Tell me if you need testers.

Sure, yes :) XFS doesn't have this issue; it still seems to use 32-bit
offsets.

That's right: the patch works fine with the current supported/limited
configuration, but we need a much more generalized approach, or maybe a
design change as Xavi had suggested, to make it more scalable.

> Is that the patch in [1] you are referring to?

Yes. [1] is a possible solution for the current issue; the change is still
under review.

The problem in short: ext4 uses large offsets, i.e. the very bits that
GlusterFS needs in order to store a subvol id along with the offset. This can
end up with a few offsets being modified when they are given back to the
filesystem, resulting in missing files etc. Avati has proposed a solution to
overcome this issue, based on the assumption that both EXT4 and XFS are
tolerant in terms of the accuracy of the value presented back in seekdir(),
i.e. a seekdir(val) actually seeks to the entry which has the closest true
offset. For more info, please check http://review.gluster.org/#/c/4711/.

> This is AFAICT already in the version that failed, as commit
> e0616e9314c8323dc59fca7cad6972f08d72b936

That's right.
This change was done by Anand Avati in the dht translator, and it would have
worked as expected had AFR not come into the picture. When the same change
was made in the AFR(v2) translator, it resulted in the loss of the brick-id.
[1] is a potential fix for now; it changes the transform logic in these two
translators.

But, as Xavi mentioned, our goal is to come up with a solution that is
uniform across all the translators, loses no subvol-id, and keeps the offset
gaps to a minimum. This offset gap widens as more translators (which need to
store a subvol-id) get added to the gluster stack, which may eventually
result in the same kind of issue you are facing now.

Thanks,
Soumya

[1] http://review.gluster.org/8201/
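As a rough illustration of the widening offset gap described above, here is a
hedged C sketch of offset transforms chained once for AFR and once for DHT.
The function names and the multiply-and-add encoding are hypothetical
stand-ins for the real transform logic in the dht and afr translators; the
point is only that every translator added to the stack takes another bite out
of the same 64-bit offset budget that ext4 is already spending.

/*
 * Hedged sketch (hypothetical names, not GlusterFS source): each cluster
 * translator that multiplexes several children encodes a child index into
 * the 64-bit directory offset before passing it up.  Stacking AFR on top
 * of DHT means two such encodings, so the usable offset range shrinks with
 * every translator added to the stack.
 */
#include <stdint.h>
#include <stdio.h>

/* One common encoding style: interleave offsets by child count. */
static uint64_t itransform(uint64_t off, unsigned nchildren, unsigned child)
{
    return off * nchildren + child;
}

static void deitransform(uint64_t off, unsigned nchildren,
                         uint64_t *orig, unsigned *child)
{
    *child = (unsigned)(off % nchildren);
    *orig  = off / nchildren;
}

int main(void)
{
    /* A plausible large ext4 hash-based offset (illustrative value). */
    uint64_t ext4_off = 0x06ae32a9c0000000ULL;

    /* AFR with 4 replicas picks child 2, then DHT with 3 subvols picks 1. */
    uint64_t afr_off = itransform(ext4_off, 4, 2);
    uint64_t dht_off = itransform(afr_off, 3, 1);

    /* Walking back down the stack recovers the ids and the original offset
       only as long as nothing overflowed 64 bits on the way up. */
    uint64_t back_afr, back_ext4;
    unsigned dht_child, afr_child;
    deitransform(dht_off, 3, &back_afr, &dht_child);
    deitransform(back_afr, 4, &back_ext4, &afr_child);

    printf("ext4 %#llx -> client %#llx -> back %#llx (dht %u, afr %u)\n",
           (unsigned long long)ext4_off, (unsigned long long)dht_off,
           (unsigned long long)back_ext4, dht_child, afr_child);

    /* With 3 * 4 = 12 children in the stack, any local offset above
       2^64 / 12 no longer round-trips, and that is where entries start to
       go missing or show up twice. */
    return 0;
}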
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On 2014-07-21 19:14, Jeff Darcy wrote:
>> But this offset gap widens as more translators (which need to store a
>> subvol-id) get added to the gluster stack, which may eventually result in
>> the same kind of issue you are facing now.
> Perhaps it's time to revisit the idea of making assumptions about d_off
> values and twiddling them back and forth, vs. maintaining a precise mapping
> between our values and local-FS values.

+1 :-)

> http://review.gluster.org/#/c/4675/
>
> That patch is old and probably incomplete, but at the time it worked just
> as well as the one that led us into the current situation.

Seems a lot sounder than:

  "However both these filesystems (EXT4 more importantly) are tolerant in
   terms of the accuracy of the value presented back in seekdir(), i.e. a
   seekdir(val) actually seeks to the entry which has the closest true
   offset."

Let me know if you revisit this one.

Thanks

Anders
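For readers who have not looked at the change Jeff links, below is a minimal
sketch of what a "precise mapping" could look like. The structure and
function names are hypothetical (this is not the code in change 4675): the
offsets handed to applications become small synthetic cookies, and a
per-open-directory table maps them back to the exact (subvol, local d_off)
pairs.

/*
 * Hypothetical sketch of a precise d_off mapping: no arithmetic twiddling,
 * just a table kept per open directory.  Names and layout are invented for
 * illustration only.
 */
#include <stdint.h>
#include <stdlib.h>

struct doff_entry {
    int      subvol;      /* child the entry came from              */
    uint64_t local_doff;  /* d_off exactly as the brick returned it */
};

struct doff_map {
    struct doff_entry *entries;
    uint64_t           used;
    uint64_t           size;
};

/* Record a (subvol, d_off) pair; return the synthetic offset exposed via
 * readdirp.  Synthetic offsets are simply 1, 2, 3, ... so they never clash
 * with anything the local filesystem might hand out. */
uint64_t doff_map_add(struct doff_map *map, int subvol, uint64_t local_doff)
{
    if (map->used == map->size) {
        uint64_t nsize = map->size ? map->size * 2 : 64;
        struct doff_entry *tmp = realloc(map->entries, nsize * sizeof(*tmp));
        if (!tmp)
            return 0;                 /* 0 is never a valid cookie */
        map->entries = tmp;
        map->size = nsize;
    }
    map->entries[map->used].subvol = subvol;
    map->entries[map->used].local_doff = local_doff;
    return ++map->used;               /* 0 stays "start of directory" */
}

/* Resolve a synthetic offset coming back through seekdir()/readdirp. */
int doff_map_lookup(const struct doff_map *map, uint64_t synthetic,
                    int *subvol, uint64_t *local_doff)
{
    if (synthetic == 0 || synthetic > map->used)
        return -1;
    *subvol     = map->entries[synthetic - 1].subvol;
    *local_doff = map->entries[synthetic - 1].local_doff;
    return 0;
}

The obvious cost is memory proportional to the number of entries read per
open directory, which is presumably part of why the thread keeps weighing
this against the offset-twiddling approach.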
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On Fri, Jul 18, 2014 at 10:43 PM, Pranith Kumar Karampuri
pkara...@redhat.com wrote:
> On 07/18/2014 07:57 PM, Anders Blomdell wrote:
>> During testing of a 3*4 gluster (from master as of yesterday), I
>> encountered two major weirdnesses:
>>
>> 1. A 'rm -rf some_dir' needed several invocations to finish, each time
>>    reporting a number of lines like these:
>>      rm: cannot remove ‘a/b/c/d/e/f’: Directory not empty

This is reproducible for me when running dbench on nfs mounts. I think I may
have seen it on glusterfs mounts as well, but it seems more reproducible on
nfs. I should have caught it sooner, but it doesn't error out client side
when cleaning up, and on the next test I run the deletes are successful.

When this happens I see the following in nfs.log. This spams the log; from
what I can tell it happens when dbench is creating the files:

[2014-07-19 13:37:03.271651] I [MSGID: 109036]
[dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht:
Setting layout of /clients/client3/~dmtmp/SEED with [Subvol_name:
testvol-replicate-0, Err: -1 , Start: 2147483647 , Stop: 4294967295 ],
[Subvol_name: testvol-replicate-1, Err: -1 , Start: 0 , Stop: 2147483646 ],

Then, when the deletes fail, I see the following while the client is removing
the files:

[2014-07-18 23:31:44.272465] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs:
74a6541a: /run8063_dbench/clients => -1 (Directory not empty)
...
[2014-07-18 23:31:44.452988] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs:
7ea9541a: /run8063_dbench/clients => -1 (Directory not empty)
[2014-07-18 23:31:45.262651] W [client-rpc-fops.c:1354:client3_3_access_cbk]
0-testvol-client-0: remote operation failed: Stale file handle
[2014-07-18 23:31:45.263151] W [MSGID: 108008]
[afr-read-txn.c:218:afr_read_txn] 0-testvol-replicate-0: Unreadable subvolume
-1 found with event generation 2. (Possible split-brain)
[2014-07-18 23:31:45.264196] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs:
32ac541a: <gfid:b073a189-91ea-46b2-b757-5b320591b848> => -1 (Stale file
handle)
[2014-07-18 23:31:45.264217] W [nfs3-helpers.c:3401:nfs3_log_common_res]
0-nfs-nfsv3: XID: 32ac541a, ACCESS: NFS: 70(Invalid file handle), POSIX:
116(Stale file handle)
[2014-07-18 23:31:45.266818] W [nfs3.c:1532:nfs3svc_access_cbk] 0-nfs:
33ac541a: <gfid:b073a189-91ea-46b2-b757-5b320591b848> => -1 (Stale file
handle)
[2014-07-18 23:31:45.266853] W [nfs3-helpers.c:3401:nfs3_log_common_res]
0-nfs-nfsv3: XID: 33ac541a, ACCESS: NFS: 70(Invalid file handle), POSIX:
116(Stale file handle)

Occasionally I see:

[2014-07-19 13:50:46.091429] W [socket.c:529:__socket_rwv] 0-NLM-client:
readv on 192.168.11.102:45823 failed (No data available)
[2014-07-19 13:50:46.091570] E [rpc-transport.c:485:rpc_transport_unref]
(-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_rpcclnt_notify+0x5a)
[0x7f53775128ea]
(-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_unset_rpc_clnt+0x75)
[0x7f537750e3e5] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63)
[0x7f5388914693]))) 0-rpc_transport: invalid argument: this

I'm opening a BZ now. I'll leave the systems up and put the repro steps +
hostnames in the BZ in case anyone wants to poke around.

-b

>> 2. After having successfully deleted all files from the volume, I have a
>>    single directory that is duplicated in gluster-fuse, like this:
>>      # ls -l /mnt/gluster
>>      total 24
>>      drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
>>      drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
>>
>>    Any idea on how to debug this issue?
> What are the steps to recreate? We need to first find what led to this,
> then probably which xlator leads to it.

I have not seen this but I am running on a 6x2 volume. I wonder if it may
only happen with replica 2?
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On Sat, Jul 19, 2014 at 10:02:33AM -0400, Benjamin Turner wrote:
> On Fri, Jul 18, 2014 at 10:43 PM, Pranith Kumar Karampuri
> pkara...@redhat.com wrote:
>> On 07/18/2014 07:57 PM, Anders Blomdell wrote:
>>> During testing of a 3*4 gluster (from master as of yesterday), I
>>> encountered two major weirdnesses:
>>>
>>> 1. A 'rm -rf some_dir' needed several invocations to finish, each time
>>>    reporting a number of lines like these:
>>>      rm: cannot remove ‘a/b/c/d/e/f’: Directory not empty
> This is reproducible for me when running dbench on nfs mounts. I think I
> may have seen it on glusterfs mounts as well, but it seems more
> reproducible on nfs. I should have caught it sooner, but it doesn't error
> out client side when cleaning up, and on the next test I run the deletes
> are successful.
>
> When this happens I see the following in nfs.log. This spams the log; from
> what I can tell it happens when dbench is creating the files:
>
> [2014-07-19 13:37:03.271651] I [MSGID: 109036]
> [dht-common.c:5694:dht_log_new_layout_for_dir_selfheal] 0-testvol-dht:
> Setting layout of /clients/client3/~dmtmp/SEED with [Subvol_name:
> testvol-replicate-0, Err: -1 , Start: 2147483647 , Stop: 4294967295 ],
> [Subvol_name: testvol-replicate-1, Err: -1 , Start: 0 , Stop: 2147483646 ],

My guess is that DHT/AFR fail to create the whole directory structure on all
bricks (remember that directories should get created on all bricks, even for
a dht-only volume). If creating a directory fails on a particular brick,
self-heal should pick it up... But maybe self-heal is not run when deleting
directories, causing some directories to be non-empty on some bricks but
empty on others. It may be that this conflict is not handled correctly.

You could test with different volumes and narrow down where the issue occurs:
- a volume of one brick
- a replicate volume with two bricks
- a distribute volume with two bricks

Potentially increase the number of bricks when a 2-brick afr-only or dht-only
volume does not trigger the issue reliably or quickly.

> Occasionally I see:
>
> [2014-07-19 13:50:46.091429] W [socket.c:529:__socket_rwv] 0-NLM-client:
> readv on 192.168.11.102:45823 failed (No data available)
> [2014-07-19 13:50:46.091570] E [rpc-transport.c:485:rpc_transport_unref]
> (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_rpcclnt_notify+0x5a)
> [0x7f53775128ea]
> (-->/usr/lib64/glusterfs/3.5qa2/xlator/nfs/server.so(nlm_unset_rpc_clnt+0x75)
> [0x7f537750e3e5] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_unref+0x63)
> [0x7f5388914693]))) 0-rpc_transport: invalid argument: this

This looks like a bug in the NFS-server; I suggest filing it independently
from the directory tree create/delete issue.

> I'm opening a BZ now. I'll leave the systems up and put the repro steps +
> hostnames in the BZ in case anyone wants to poke around.

Thanks! The NFS problem does not need any checking on the running system.

Niels
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On 07/18/2014 07:57 PM, Anders Blomdell wrote:
> During testing of a 3*4 gluster (from master as of yesterday), I
> encountered two major weirdnesses:
>
> 1. A 'rm -rf some_dir' needed several invocations to finish, each time
>    reporting a number of lines like these:
>      rm: cannot remove ‘a/b/c/d/e/f’: Directory not empty
>
> 2. After having successfully deleted all files from the volume, I have a
>    single directory that is duplicated in gluster-fuse, like this:
>      # ls -l /mnt/gluster
>      total 24
>      drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
>      drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
>
> Any idea on how to debug this issue?
>
> /Anders

Anders,

Check the Gluster log files present in /var/log/glusterfs, specifically the
glusterd logfile, i.e. /var/log/glusterfs/etc-glusterfs-glusterd.vol.log. You
can also start glusterd in debug mode, i.e. $glusterd -L DEBUG, and check the
log files for more information.

Thanks,
Lala
Re: [Gluster-devel] Duplicate entries and other weirdness in a 3*4 volume
On 07/18/2014 07:57 PM, Anders Blomdell wrote:
> During testing of a 3*4 gluster (from master as of yesterday), I
> encountered two major weirdnesses:
>
> 1. A 'rm -rf some_dir' needed several invocations to finish, each time
>    reporting a number of lines like these:
>      rm: cannot remove ‘a/b/c/d/e/f’: Directory not empty
>
> 2. After having successfully deleted all files from the volume, I have a
>    single directory that is duplicated in gluster-fuse, like this:
>      # ls -l /mnt/gluster
>      total 24
>      drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
>      drwxr-xr-x 2 root root 12288 18 jul 16.17 work2/
>
> Any idea on how to debug this issue?
>
> /Anders

What are the steps to recreate? We need to first find what led to this, then
probably which xlator leads to it.

Pranith