Re: [Gluster-users] Structure needs cleaning on some files
I created a bug for this issue: https://bugzilla.redhat.com/show_bug.cgi?id=1041109

gr.
Johan

On 10-12-13 12:52, Johan Huysmans wrote:
Hi All,

It seems I can easily reproduce the problem:

* on node 1, create a file (touch, cat, ...)
* on node 2, take the md5sum of the file directly (md5sum /path/to/file)
* on node 1, rename the file (mv file file1)
* on node 2, take the md5sum of the file directly (md5sum /path/to/file); this still works, although the file is no longer there
* on node 1, change the file content
* on node 2, take the md5sum of the file directly (md5sum /path/to/file); this still works and shows a changed md5sum

This is really strange behaviour. Is this normal? Can it be altered with a setting?

Thanks for any info,
gr.
Johan

On 10-12-13 10:02, Johan Huysmans wrote:
I could reproduce this problem while my mount point is running in debug mode. The logfile is attached.

gr.
Johan Huysmans

On 10-12-13 09:30, Johan Huysmans wrote:
Hi All,

When reading some files we get this error:

md5sum: /path/to/file.xml: Structure needs cleaning

In /var/log/glusterfs/mnt-sharedfs.log we see these errors:

[2013-12-10 08:07:32.256910] W [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0: remote operation failed: No such file or directory
[2013-12-10 08:07:32.257436] W [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1: remote operation failed: No such file or directory
[2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk] 0-glusterfs-fuse: 8230: STAT() /path/to/file.xml = -1 (Structure needs cleaning)

We are using gluster 3.4.1-3 on CentOS 6. Our servers are 64-bit, our clients 32-bit (we are already using --enable-ino32 on the mountpoint).

This is my gluster configuration:

Volume Name: testvolume
Type: Replicate
Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: SRV-1:/gluster/brick1
Brick2: SRV-2:/gluster/brick2
Options Reconfigured:
performance.force-readdirp: on
performance.stat-prefetch: off
network.ping-timeout: 5

And this is how the applications work: we have two client nodes which both have a fuse.glusterfs mountpoint. On one client node an application writes files; on the other client node an application reads these files. On the node where the files are written we don't see any problem and can read the files without issues. On the other node we get the error messages above when reading them. The problem occurs when we run md5sum on the exact file; when we run md5sum on all files in that directory there is no problem.

How can we solve this problem, as it is quite annoying? It occurs after some time (can be days); an umount and mount of the mountpoint fixes it for some days. Once it occurs (and we don't remount), it occurs every time.

I hope someone can help me with this problem.

Thanks,
Johan Huysmans

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
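Johan's reproduction steps can be written out as the two-node shell session he describes. The node prompts and the mountpoint path /mnt/sharedfs are illustrative; only the sequence of operations comes from the report:

```shell
# node 1: create a file on the gluster mount
[node1]$ echo "version 1" > /mnt/sharedfs/file

# node 2: read the file directly by path
[node2]$ md5sum /mnt/sharedfs/file

# node 1: rename it
[node1]$ mv /mnt/sharedfs/file /mnt/sharedfs/file1

# node 2: the OLD path still answers, apparently served from cache
[node2]$ md5sum /mnt/sharedfs/file

# node 1: change the content of the renamed file
[node1]$ echo "version 2" > /mnt/sharedfs/file1

# node 2: the old path still answers, now with the new checksum
[node2]$ md5sum /mnt/sharedfs/file
```

The surprising part is the last two reads: a path that no longer exists keeps resolving on the second client, and even tracks content changes, until the mount is remounted.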
Re: [Gluster-users] Gluster Community Weekly Meeting
On 12/12/2013 10:42 AM, James wrote:
RE: meeting, sorry I couldn't make it, but I have some comments:

No problem. It would be really good to have everybody in the meeting, but if you cannot make it, comments are definitely welcome :).

1) About the pre-packaged VM comments: I've gotten Vagrant working on Fedora. I'm using this to rapidly spin up and test GlusterFS. https://ttboj.wordpress.com/2013/12/09/vagrant-on-fedora-with-libvirt/ In the coming week or so, I'll be publishing the Vagrant file for my GlusterFS setup, but if you really want it now I can send you an early version. This obviously integrates with Puppet-Gluster, but whether you use that or not is optional. I think this is the best way to test GlusterFS. If someone gives me hosting, I could publish pre-built images very easily. Let me know what you think.

Niels - do you have any thoughts here?

2) I never heard back on any of the action items from 2 weeks ago. I think someone was going to connect me with a way to get access to some VMs for testing stuff!

I see that there is an ongoing offline thread now. I think that should result in you getting those VMs.

3) Hagarth: RE: typos, I have at least one spell-check patch against 3.4.1. I sent it to the list before, but someone told me to enroll in the Jenkins thing, which wasn't worth it for a small patch. Let me know if you want it.

There are more typos now. I ran a cursory check with misspell-check [1] and found quite a few. Having that cleaned up on master and release-3.5 would be great. Since the number is larger, the patch would be non-trivial, and having it routed through Gerrit would be great. If you need a how-to on Gerrit, it is available at [2].

4a) Someone mentioned documentation. Please feel free to merge in https://github.com/purpleidea/puppet-gluster/blob/master/DOCUMENTATION.md (markdown format). I have gone to great lengths to format this so that it displays properly in GitHub markdown and standard (pandoc) markdown. This way it works on GitHub and can also be rendered to a PDF easily. Example: https://github.com/purpleidea/puppet-gluster/raw/master/puppet-gluster-documentation.pdf You can use the file as a template!

Again, having this in Gerrit would be useful for merging the puppet documentation.

4b) I think the documentation should be kept in the same repo as GlusterFS. This way, when you submit a feature branch, it can also come with documentation. Lots of people work this way. It helps you get minimal docs in there, and/or at least some example code or a few sentences. Also, looking at the docs, you can see which commits came with them.

I am with you on this one. After we are done with the planned documentation hackathon, let us open a new thread on this to get more opinions.

-Vijay

[1] https://github.com/lyda/misspell-check
[2] http://www.gluster.org/community/documentation/index.php/Development_Work_Flow

Thanks!
James

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Gluster Community Weekly Meeting
On Thu, Dec 12, 2013 at 1:43 PM, Vijay Bellur vbel...@redhat.com wrote:
Again, having this in Gerrit would be useful for merging the puppet documentation.

Okay, I'll try to look into Gerrit and maybe submit a fake patch for testing. When and where (in the tree) would be a good time to submit a doc patch? It's probably best to wait until after your docs hackathon, right?

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Gluster Community Weekly Meeting
On 12/13/2013 12:18 AM, James wrote:
Okay, I'll try to look into Gerrit and maybe submit a fake patch for testing. When and where (in the tree) would be a good time to submit a doc patch? It's probably best to wait until after your docs hackathon, right?

Just added a page in preparation for the documentation hackathon:
http://www.gluster.org/community/documentation/index.php/Submitting_Documentation_Patches

I think the puppet guide can go under a new hierarchy located at doc/deploy-guide/markdown/en-US/. You can certainly submit the puppet doc patch as part of the hackathon.

-Vijay

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Structure needs cleaning on some files
How do you mount your client? FUSE? I had similar problems when playing around with the timeout options for the FUSE mount. If they are too high, they cache the metadata for too long. When you move the file, the inode stays the same, and on the second node the old path stays in the cache for a while, so it still knows the inode for the moved file's old path and can act on the file without knowing its current path. The problems kick in when you delete a file and recreate it: the cache tries to access the old inode, which was deleted, thus throwing errors. If I recall correctly, "Structure needs cleaning" is one of two error messages I got, depending on which of the timeout mount options was set to a higher value.

-Original Mail-
From: Johan Huysmans [johan.huysm...@inuits.be]
Sent: 12.12.13 - 14:51:35
To: gluster-users@gluster.org [gluster-users@gluster.org]
Subject: Re: [Gluster-users] Structure needs cleaning on some files

[quoted text elided; see the original message earlier in this thread]

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
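Maik's theory about FUSE metadata caching can be tested by lowering the lookup/attribute timeouts at mount time, so stale inode-to-path mappings are re-validated on every access. A sketch, with the server, volume, and mountpoint names taken from Johan's volume info, and assuming this GlusterFS version's mount helper accepts these options (the native client binary certainly accepts the long-option forms):

```shell
# Mount with FUSE metadata caching effectively disabled
mount -t glusterfs -o entry-timeout=0,attribute-timeout=0 \
    SRV-1:/testvolume /mnt/sharedfs

# Equivalent direct invocation of the FUSE client
glusterfs --volfile-server=SRV-1 --volfile-id=testvolume \
    --entry-timeout=0 --attribute-timeout=0 /mnt/sharedfs
```

If the stale-md5sum reproduction stops working with the timeouts at 0, that points at the cached metadata rather than the bricks themselves.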
Re: [Gluster-users] Structure needs cleaning on some files
I have the same question. Do you have an excessively high --entry-timeout parameter on your FUSE mount? In any case, a "Structure needs cleaning" error should not surface up to FUSE, and that is still a bug.

On Thu, Dec 12, 2013 at 12:46 PM, Maik Kulbe i...@linux-web-development.de wrote:
[quoted text elided; see the original messages earlier in this thread]

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Structure needs cleaning on some files
Looks like your issue was fixed by patch http://review.gluster.org/4989/ in the master branch. Backporting this to release-3.4 now.

Thanks!
Avati

On Thu, Dec 12, 2013 at 1:26 PM, Anand Avati av...@gluster.org wrote:
[quoted text elided; see the original messages earlier in this thread]

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Gerrit doesn't use HTTPS
I just noticed that the Gluster Gerrit [1] doesn't use HTTPS! Can this be fixed ASAP?

Cheers,
James

[1] http://review.gluster.org/

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] gluster fails under heavy array job load
Hi All,

(Gluster volume details at bottom.)

I've posted some of this previously, but even after various upgrades, attempted fixes, etc., it remains a problem.

Short version: our gluster fs (~340TB) provides scratch space for a ~5000-core academic compute cluster. Much of our load is streaming IO from genomics work, and that is the load under which we saw this latest failure.

Under heavy batch load, especially array jobs, where there might be several 64-core nodes doing I/O on the 4 servers / 8 bricks, we often get job failures that have the following profile:

Client POV:
Here is a sampling of the client logs (/var/log/glusterfs/gl.log) for all compute nodes that indicated interaction with the user's files: http://pastie.org/8548781

Here are some client Info logs that seem fairly serious: http://pastie.org/8548785

The errors that referenced this user were gathered from all the nodes that were running his code (in compute*) and aggregated with:

cut -f2,3 -d']' compute* | cut -f1 -dP | sort | uniq -c | sort -gr

and placed here to show the profile of errors that his run generated: http://pastie.org/8548796

71 of them were:

W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-7: remote operation failed: Transport endpoint is not connected.

etc. We've seen this before and previously discounted it because it seemed to be related to the problem of spurious NFS-related bugs, but now I'm wondering whether it's a real problem. Also the "remote operation failed: Stale file handle." warnings.

There were no Errors logged per se, though some of the W's looked fairly nasty, like the 'dht_layout_dir_mismatch' ones.

From the server side, however, during the same period, there were:
0 Warnings about this user's files
0 Errors
458 Info lines, of which only 1 was not a 'cleanup' line like this:

10.2.7.11:[2013-12-12 21:22:01.064289] I [server-helpers.c:460:do_fd_cleanup] 0-gl-server: fd cleanup on /path/to/file

it was:

10.2.7.14:[2013-12-12 21:00:35.209015] I [server-rpc-fops.c:898:_gf_server_log_setxattr_failure] 0-gl-server: 113697332: SETXATTR /bio/tdlong/RNAseqIII/ckpt.1084030 (c9488341-c063-4175-8492-75e2e282f690) == trusted.glusterfs.dht

We're losing about 10% of these kinds of array jobs because of this, which is just not supportable.

Gluster details: servers and clients running gluster 3.4.0-8.el6 over QDR IB, IPoIB, through 2 Mellanox and 1 Voltaire switches, Mellanox cards, CentOS 6.4.

$ gluster volume info

Volume Name: gl
Type: Distribute
Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
Status: Started
Number of Bricks: 8
Transport-type: tcp,rdma
Bricks:
Brick1: bs2:/raid1
Brick2: bs2:/raid2
Brick3: bs3:/raid1
Brick4: bs3:/raid2
Brick5: bs4:/raid1
Brick6: bs4:/raid2
Brick7: bs1:/raid1
Brick8: bs1:/raid2
Options Reconfigured:
performance.write-behind-window-size: 1024MB
performance.flush-behind: on
performance.cache-size: 268435456
nfs.disable: on
performance.io-cache: on
performance.quick-read: on
performance.io-thread-count: 64
auth.allow: 10.2.*.*,10.1.*.*

'gluster volume status gl detail': http://pastie.org/8548826

---
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine [m/c 2225] / 92697
Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
---

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
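Harry's one-liner for profiling errors across the client logs can be demonstrated on synthetic data. The log lines below are fabricated stand-ins shaped like the real ones; the pipeline itself is the one from the message:

```shell
# Two fabricated client-log lines that differ only in the per-file path
cat > compute-1 <<'EOF'
[2013-12-12 21:00:01.000001] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-7: remote operation failed: Transport endpoint is not connected. Path: /bio/user/a
[2013-12-12 21:00:02.000002] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-7: remote operation failed: Transport endpoint is not connected. Path: /bio/user/b
EOF

# Field 1 (']'-delimited) is the timestamp: drop it and keep the message
# (fields 2,3). Then truncate at the first capital 'P' ("Path: ...") so
# identical errors against different files collapse into one bucket,
# and count the buckets, most frequent first.
cut -f2,3 -d']' compute* | cut -f1 -dP | sort | uniq -c | sort -gr
```

Both sample lines collapse into a single bucket with a count of 2, which is exactly the "profile of errors" view the pastie shows.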
[Gluster-users] qemu remote insecure connections
Hello,

I'm having a problem getting remote servers to connect to Gluster with qemu. I have 5 servers, 4 of which run Gluster and host a volume. The qemu user on all 5 servers has the same uid. storage.owner-uid and storage.owner-gid are set to that user. In addition, server.allow-insecure is on and is also set in the glusterd.vol file. glusterd has also been restarted (numerous times).

When attempting to create a qemu file by connecting to the same server, everything works:

qemu@192.168.1.11$ qemu-img create gluster://192.168.1.11/volumes/v.img 1M
Formatting 'gluster://192.168.1.11/volumes/v.img', fmt=raw size=1048576

But when trying to do it remotely, the command hangs indefinitely:

qemu@192.168.1.12$ qemu-img create gluster://192.168.1.11/volumes/v.img 1M
Formatting 'gluster://192.168.1.11/volumes/v.img', fmt=raw size=1048576
^C

Yet when 192.168.1.12 connects to gluster://192.168.1.12, the command works and the file shows up in the distributed volume. Further, when turning server.allow-insecure off, I get an immediate error no matter what the source and destination of the connection are:

qemu@192.168.1.12$ qemu-img create gluster://192.168.1.11/volumes/v.img 1M
Formatting 'gluster://192.168.1.11/volumes/v.img', fmt=raw size=1048576
qemu-img: Gluster connection failed for server=192.168.1.11 port=0 volume=volumes image=v.img transport=tcp
qemu-img: gluster://192.168.1.11/volumes/v.img: error while creating raw: No data available

Does anyone have any ideas how I can have an unprivileged user connect to remote gluster servers?

Thanks,
Joe

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
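For reference, the settings usually needed for unprivileged clients (which connect from non-reserved, "insecure" ports) are the ones Joe describes. A sketch of the full checklist, using the volume name "volumes" from the post; the one detail easy to miss is that brick processes reportedly only pick up server.allow-insecure after the volume itself is restarted, not just glusterd:

```shell
# 1) Allow insecure-port clients on the volume (repeat per volume)
gluster volume set volumes server.allow-insecure on

# 2) In /etc/glusterfs/glusterd.vol on EVERY server, inside the
#    "volume management" block, so glusterd also accepts connections
#    from unprivileged ports, then restart glusterd:
#      option rpc-auth-allow-insecure on

# 3) Restart the volume so the brick processes re-read the option
#    (a glusterd restart alone may not propagate it to the bricks):
gluster volume stop volumes
gluster volume start volumes
```

If step 3 was skipped, the observed asymmetry (local connections work, remote ones hang) is plausible, since only some brick processes may have the option in effect.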
[Gluster-users] Documentation hackathon for 3.5
Hi All,

The documentation hackathon for 3.5 is underway. You can find more details here [1]. Anybody who submits a documentation patch that gets accepted between now and next week will stand a chance to get some swag :).

Keep your patches coming!

Cheers,
Vijay

[1] http://www.gluster.org/community/documentation/index.php/3.5_Documentation_Hackathon

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] gluster fails under heavy array job load
Please provide the full client and server logs (in a bug report). The snippets give some hints, but are not very meaningful without the full context/history since mount time (they show after-the-fact symptoms, but not the part which shows why the disconnects happened).

Even before looking into the full logs, here are some quick observations:

- write-behind-window-size = 1024MB seems *excessively* high. Please set this to 1MB (the default) and check whether stability improves.

- I see RDMA is enabled on the volume. Are you mounting clients through RDMA? If so, for the purpose of diagnostics, can you mount through TCP and check whether stability improves?

If you are using RDMA with such a high write-behind-window-size, spurious ping-timeouts are almost a certainty during heavy writes. The RDMA driver has limited flow control, and setting such a high window size can easily congest all the RDMA buffers, resulting in spurious ping-timeouts and disconnections.

Avati

On Thu, Dec 12, 2013 at 5:03 PM, harry mangalam harry.manga...@uci.edu wrote:

Hi All, (Gluster Volume Details at bottom)

I've posted some of this previously, but even after various upgrades, attempted fixes, etc., it remains a problem.

Short version: Our gluster fs (~340TB) provides scratch space for a ~5000-core academic compute cluster. Much of our load is streaming IO from genomics work, and that is the load under which we saw this latest failure.
Under heavy batch load, especially array jobs where there might be several 64-core nodes doing I/O against the 4 servers / 8 bricks, we often get job failures that have the following profile:

Client POV: Here is a sampling of the client logs (/var/log/glusterfs/gl.log) for all compute nodes that indicated interaction with the user's files: http://pastie.org/8548781

Here are some client Info logs that seem fairly serious: http://pastie.org/8548785

The errors that referenced this user were gathered from all the nodes that were running his code (in compute*) and aggregated with:

    cut -f2,3 -d']' compute* | cut -f1 -dP | sort | uniq -c | sort -gr

and placed here to show the profile of errors that his run generated: http://pastie.org/8548796

So 71 of them were:

    W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-7: remote operation failed: Transport endpoint is not connected.

etc. We've seen this before and previously discounted it because it seemed to be related to the problem of spurious NFS-related bugs, but now I'm wondering whether it's a real problem. Also the 'remote operation failed: Stale file handle.' warnings. There were no Errors logged per se, though some of the Warnings looked fairly nasty, like the 'dht_layout_dir_mismatch'.

From the server side, however, during the same period, there were:

    0 Warnings about this user's files
    0 Errors
    458 Info lines

Of the Info lines, all but one were 'cleanup' lines like this:

    10.2.7.11:[2013-12-12 21:22:01.064289] I [server-helpers.c:460:do_fd_cleanup] 0-gl-server: fd cleanup on /path/to/file

The exception was:

    10.2.7.14:[2013-12-12 21:00:35.209015] I [server-rpc-fops.c:898:_gf_server_log_setxattr_failure] 0-gl-server: 113697332: SETXATTR /bio/tdlong/RNAseqIII/ckpt.1084030 (c9488341-c063-4175-8492-75e2e282f690) == trusted.glusterfs.dht

We're losing about 10% of these kinds of array jobs because of this, which is just not supportable.
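The cut/sort/uniq pipeline above can be mirrored in a few lines of Python, which makes the key extraction easier to tweak (e.g. to group by error message rather than by position of the first 'P'). A sketch under stated assumptions: the sample log lines in the test are fabricated for illustration, not taken from the pastie dumps.

```python
from collections import Counter

def error_profile(lines):
    """Mimic: cut -f2,3 -d']' | cut -f1 -dP | sort | uniq -c | sort -gr

    For each log line, keep fields 2-3 of the ']'-delimited split (the log
    level and message), truncate at the first literal 'P' (which drops
    trailing 'Path: ...' details), and count identical keys.
    """
    keys = []
    for line in lines:
        parts = line.split(']')
        if len(parts) < 2:
            # cut would print a delimiter-less line whole; skipping is
            # a deliberate deviation to ignore non-log noise.
            continue
        key = ']'.join(parts[1:3]).split('P', 1)[0]
        keys.append(key)
    # Like `sort | uniq -c | sort -gr`: (key, count) pairs, most frequent first.
    return Counter(keys).most_common()
```

Feeding it all the lines gathered from the compute* logs would yield the same (count, error) profile the pipeline produced, with the 'Transport endpoint is not connected' warning at the top.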
Gluster details: servers and clients running gluster 3.4.0-8.el6 over QDR IB (IPoIB), through 2 Mellanox and 1 Voltaire switches, Mellanox cards, CentOS 6.4.

$ gluster volume info

Volume Name: gl
Type: Distribute
Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
Status: Started
Number of Bricks: 8
Transport-type: tcp,rdma
Bricks:
Brick1: bs2:/raid1
Brick2: bs2:/raid2
Brick3: bs3:/raid1
Brick4: bs3:/raid2
Brick5: bs4:/raid1
Brick6: bs4:/raid2
Brick7: bs1:/raid1
Brick8: bs1:/raid2
Options Reconfigured:
performance.write-behind-window-size: 1024MB
performance.flush-behind: on
performance.cache-size: 268435456
nfs.disable: on
performance.io-cache: on
performance.quick-read: on
performance.io-thread-count: 64
auth.allow: 10.2.*.*,10.1.*.*

'gluster volume status gl detail': http://pastie.org/8548826

---
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine [m/c 2225] / 92697
Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
---
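The two diagnostic suggestions in Avati's reply translate into concrete commands against this volume. A sketch, assuming the volume name "gl" from the config above; the mount point /mnt/gl and server bs1 as the mount target are assumptions, and option names should be checked against the 3.4 documentation:

```shell
# Reset the write-behind window from 1024MB to the 1MB default.
gluster volume set gl performance.write-behind-window-size 1MB

# For the diagnostic run, remount a client over TCP instead of RDMA.
umount /mnt/gl
mount -t glusterfs -o transport=tcp bs1:/gl /mnt/gl
```

Since the volume's transport-type is tcp,rdma, clients can select either transport at mount time, which makes the TCP-only comparison possible without reconfiguring the volume.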