Re: [Gluster-users] Structure needs cleaning on some files

2013-12-12 Thread Johan Huysmans

I created a bug for this issue:

https://bugzilla.redhat.com/show_bug.cgi?id=1041109

gr.
Johan

On 10-12-13 12:52, Johan Huysmans wrote:

Hi All,

It seems I can easily reproduce the problem with the steps below (a scripted
version of the sequence follows the list).

* on node 1, create a file (touch, cat, ...)
* on node 2, take the md5sum of the file directly (md5sum /path/to/file)
* on node 1, move the file to another name (mv file file1)
* on node 2, take the md5sum of the file directly again (md5sum /path/to/file);
this still works, although the file is no longer there under that name

* on node 1, change the file content
* on node 2, take the md5sum of the file directly again (md5sum /path/to/file);
this still works and even shows a changed md5sum
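For convenience, here is a minimal sketch of the same sequence as a script
(the path below is a placeholder for our real one; the "node 1" steps are
shown as comments and are run on the writing node):

#!/bin/sh
# Sketch of the reproduction sequence; run the md5sum checks on node 2.
F=/mnt/sharedfs/testfile

# node 1:  touch "$F"
md5sum "$F"        # node 2: baseline checksum of the file

# node 1:  mv "$F" "$F.renamed"
md5sum "$F"        # node 2: unexpectedly still succeeds for the old name

# node 1:  echo "changed" >> "$F.renamed"
md5sum "$F"        # node 2: still succeeds, and the checksum even changes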


This is really strange behaviour.
Is this normal, and can it be changed with a setting?

Thanks for any info,
gr.
Johan

On 10-12-13 10:02, Johan Huysmans wrote:
I could reproduce this problem while my mount point was running
in debug mode.

logfile is attached.

gr.
Johan Huysmans

On 10-12-13 09:30, Johan Huysmans wrote:

Hi All,

When reading some files we get this error:
md5sum: /path/to/file.xml: Structure needs cleaning

in /var/log/glusterfs/mnt-sharedfs.log we see these errors:
[2013-12-10 08:07:32.256910] W 
[client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0: 
remote operation failed: No such file or directory
[2013-12-10 08:07:32.257436] W 
[client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1: 
remote operation failed: No such file or directory
[2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk] 
0-glusterfs-fuse: 8230: STAT() /path/to/file.xml = -1 (Structure 
needs cleaning)


We are using gluster 3.4.1-3 on CentOS6.
Our servers are 64-bit, our clients 32-bit (we are already using 
--enable-ino32 on the mountpoint)
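
For completeness, the client mount is an ordinary FUSE mount along these
lines (server name and mount point are illustrative, not our real ones):

# illustrative FUSE mount with 32-bit inode numbers enabled
mount -t glusterfs -o enable-ino32 SRV-1:/testvolume /mnt/sharedfs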


This is my gluster configuration:
Volume Name: testvolume
Type: Replicate
Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: SRV-1:/gluster/brick1
Brick2: SRV-2:/gluster/brick2
Options Reconfigured:
performance.force-readdirp: on
performance.stat-prefetch: off
network.ping-timeout: 5

And this is how the applications work:
We have 2 client nodes which both have a fuse.glusterfs mountpoint.
On 1 client node we have an application which writes files.
On the other client node we have an application which reads these files.
On the node where the files are written we don't see any problem,
and can read that file without problems.
On the other node we have problems (error messages above) reading
that file.
The problem occurs when we perform an md5sum on the exact file; when we
perform an md5sum on all files in that directory, there is no problem.



How can we solve this problem, as it is quite annoying?
The problem occurs after some time (it can take days); an umount and
remount of the mountpoint fixes it for some days.

Once it occurs (and we don't remount), it occurs every time.


I hope someone can help me with this problem.

Thanks,
Johan Huysmans
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster Community Weekly Meeting

2013-12-12 Thread Vijay Bellur

On 12/12/2013 10:42 AM, James wrote:

RE: meeting, sorry I couldn't make it, but I have some comments:


No problem. It would be really good to have everybody in the meeting, 
but if you cannot make it, comments are definitely welcome :).




1) About the pre-packaged VM comments: I've gotten Vagrant working on
Fedora. I'm using this to rapidly spin up and test GlusterFS.
https://ttboj.wordpress.com/2013/12/09/vagrant-on-fedora-with-libvirt/
In the coming week or so, I'll be publishing the Vagrant file for my
GlusterFS setup, but if you really want it now I can send you an early
version. This obviously integrates with Puppet-Gluster, but whether
you use that or not is optional. I think this is the best way to test
GlusterFS. If someone gives me hosting, I could publish pre-built
images very easily. Let me know what you think.


Niels - do you have any thoughts here?



2) I never heard back on any of the action items from 2 weeks ago. I think
someone was going to connect me with a way to get access to some VMs
for testing!


I see that there is an ongoing offline thread now. I think that should 
result in you getting those VMs.




3) Hagarth: RE: typos, I have at least one spell-check patch against
3.4.1. I sent it to the list before, but someone told me to enroll in the
Jenkins thing, which wasn't worth it for a small patch. Let me know if
you want it.


There are more typos now. I ran a cursory check with misspell-check [1] 
and found quite a few. Having those cleaned up on master and release-3.5 
would be great. Since the number is larger now, the patch will likely be 
non-trivial, so routing it through gerrit would be preferable. If you 
need a how-to on getting started with gerrit, it is available at [2].




4a) Someone mentioned documentation. Please feel free to merge in
https://github.com/purpleidea/puppet-gluster/blob/master/DOCUMENTATION.md
(markdown format). I have gone to great lengths to format this so that
it displays properly in github markdown, and standard (pandoc)
markdown. This way it works on github, and can also be rendered to a
pdf easily. Example:
https://github.com/purpleidea/puppet-gluster/raw/master/puppet-gluster-documentation.pdf
  You can use the file as a template!


Again having this in gerrit would be useful for merging the puppet 
documentation.




4b) I think the documentation should be kept in the same repo as
GlusterFS. This way, when you submit a feature branch, it can also
come with documentation. Lots of people work this way. It helps you
get minimal docs in there, and/or at least some example code or a few
sentences. Also, looking at the docs, you can see which commits came
with them.


I am with you on this one. After we are done with the planned 
documentation hackathon, let us open a new thread on this to get more 
opinions.


-Vijay

[1] https://github.com/lyda/misspell-check

[2] 
http://www.gluster.org/community/documentation/index.php/Development_Work_Flow




Thanks!

James




___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster Community Weekly Meeting

2013-12-12 Thread James
On Thu, Dec 12, 2013 at 1:43 PM, Vijay Bellur vbel...@redhat.com wrote:
 4a) Someone mentioned documentation. Please feel free to merge in
 https://github.com/purpleidea/puppet-gluster/blob/master/DOCUMENTATION.md
 (markdown format). I have gone to great lengths to format this so that
 it displays properly in github markdown, and standard (pandoc)
 markdown. This way it works on github, and can also be rendered to a
 pdf easily. Example:

 https://github.com/purpleidea/puppet-gluster/raw/master/puppet-gluster-documentation.pdf
   You can use the file as a template!


 Again having this in gerrit would be useful for merging the puppet
 documentation.


Okay, I'll try to look into Gerrit and maybe submit a fake patch for testing.
When would be a good time to submit a doc patch, and where in the tree
should it go? It's probably best to wait until after your docs hackathon,
right?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Gluster Community Weekly Meeting

2013-12-12 Thread Vijay Bellur

On 12/13/2013 12:18 AM, James wrote:

On Thu, Dec 12, 2013 at 1:43 PM, Vijay Bellur vbel...@redhat.com wrote:

4a) Someone mentioned documentation. Please feel free to merge in
https://github.com/purpleidea/puppet-gluster/blob/master/DOCUMENTATION.md
(markdown format). I have gone to great lengths to format this so that
it displays properly in github markdown, and standard (pandoc)
markdown. This way it works on github, and can also be rendered to a
pdf easily. Example:

https://github.com/purpleidea/puppet-gluster/raw/master/puppet-gluster-documentation.pdf
   You can use the file as a template!



Again having this in gerrit would be useful for merging the puppet
documentation.



Okay, I'll try to look into Gerrit and maybe submit a fake patch for testing.
When and where (in the tree) would be a good time to submit a doc
patch? It's probably best to wait until after your docs hackathon,
right?



Just added a page in preparation for the documentation hackathon:

http://www.gluster.org/community/documentation/index.php/Submitting_Documentation_Patches

I think the puppet guide can be under a new hierarchy located at 
doc/deploy-guide/markdown/en-US/. You can certainly submit the puppet 
doc patch as part of the hackathon.
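
As a rough sketch, and assuming the usual rfc.sh-based workflow described 
on that page, the submission could look something like this (the clone URL, 
branch and file names below are only examples):

# sketch only: place the guide under the proposed hierarchy and send it to gerrit
git clone https://github.com/gluster/glusterfs.git && cd glusterfs
git checkout -b puppet-deploy-guide origin/master
mkdir -p doc/deploy-guide/markdown/en-US
cp /path/to/DOCUMENTATION.md doc/deploy-guide/markdown/en-US/puppet-gluster.md
git add doc/deploy-guide/markdown/en-US/puppet-gluster.md
git commit -s -m 'doc: add Puppet-Gluster deploy guide'
./rfc.sh    # submits the change to gerrit for review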


-Vijay

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Structure needs cleaning on some files

2013-12-12 Thread Maik Kulbe

How do you mount your client? FUSE? I had similar problems when playing around 
with the timeout options for the FUSE mount. If they are too high, the metadata 
is cached for too long. When you move the file, the inode should stay the same, 
and on the second node the path should stay in cache for a while, so it still 
knows the inode for that moved file's old path and can thus act on the file 
without knowing its path.

The problems kick in when you delete a file and recreate it - the cache tries to access 
the old inode, which was deleted, thus throwing errors. If I recall correctly, 
'structure needs cleaning' is one of the two error messages I got, depending on 
which of the timeout mount options was set to a higher value.
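
If you want to rule that out, one thing to try is remounting with the FUSE
metadata caching effectively disabled, something along these lines (volume
and mount point names are just examples):

# example only: remount with entry/attribute caching disabled
umount /mnt/sharedfs
mount -t glusterfs -o entry-timeout=0,attribute-timeout=0 SRV-1:/testvolume /mnt/sharedfs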

-Original Mail-
From: Johan Huysmans [johan.huysm...@inuits.be]
Sent: 12.12.13 - 14:51:35
To: gluster-users@gluster.org [gluster-users@gluster.org]

Subject: Re: [Gluster-users] Structure needs cleaning on some files


I created a bug for this issue:

https://bugzilla.redhat.com/show_bug.cgi?id=1041109

gr.
Johan

On 10-12-13 12:52, Johan Huysmans wrote:

Hi All,

It seems I can easily reproduce the problem.

* on node 1 create a file (touch , cat , ...).
* on node 2 take md5sum of direct file (md5sum /path/to/file)
* on node 1 move file to other name (mv file file1)
* on node 2 take md5sum of direct file (md5sum /path/to/file), this is
still working although the file is not really there
* on node 1 change file content
* on node 2 take md5sum of direct file (md5sum /path/to/file), this is
still working and has a changed md5sum

This is really strange behaviour.
Is this normal, can this be altered with a a setting?

Thanks for any info,
gr.
Johan

On 10-12-13 10:02, Johan Huysmans wrote:

I could reproduce this problem with while my mount point is running in
debug mode.
logfile is attached.

gr.
Johan Huysmans

On 10-12-13 09:30, Johan Huysmans wrote:

Hi All,

When reading some files we get this error:
md5sum: /path/to/file.xml: Structure needs cleaning

in /var/log/glusterfs/mnt-sharedfs.log we see these errors:
[2013-12-10 08:07:32.256910] W
[client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0:
remote operation failed: No such file or directory
[2013-12-10 08:07:32.257436] W
[client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1:
remote operation failed: No such file or directory
[2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk]
0-glusterfs-fuse: 8230: STAT() /path/to/file.xml = -1 (Structure
needs cleaning)

We are using gluster 3.4.1-3 on CentOS6.
Our servers are 64-bit, our clients 32-bit (we are already using
--enable-ino32 on the mountpoint)

This is my gluster configuration:
Volume Name: testvolume
Type: Replicate
Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: SRV-1:/gluster/brick1
Brick2: SRV-2:/gluster/brick2
Options Reconfigured:
performance.force-readdirp: on
performance.stat-prefetch: off
network.ping-timeout: 5

And this is how the applications work:
We have 2 client nodes who both have a fuse.glusterfs mountpoint.
On 1 client node we have a application which writes files.
On the other client node we have a application which reads these
files.
On the node where the files are written we don't see any problem,
and can read that file without problems.
On the other node we have problems (error messages above) reading
that file.
The problem occurs when we perform a md5sum on the exact file, when
perform a md5sum on all files in that directory there is no problem.

How can we solve this problem as this is annoying.
The problem occurs after some time (can be days), an umount and
mount of the mountpoint solves it for some days.
Once it occurs (and we don't remount) it occurs every time.

I hope someone can help me with this problems.

Thanks,
Johan Huysmans
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Structure needs cleaning on some files

2013-12-12 Thread Anand Avati
I have the same question. Do you have an excessively high --entry-timeout
parameter on your FUSE mount? In any case, a 'Structure needs cleaning' error
should not surface up to FUSE, and that is still a bug.
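
One quick, generic way to check is to look at the arguments the client
process was started with; if no timeout options show up, the defaults are
in effect:

# look for explicit FUSE timeout overrides on the running client
ps ax | grep '[g]lusterfs' | grep -E 'entry-timeout|attribute-timeout'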


On Thu, Dec 12, 2013 at 12:46 PM, Maik Kulbe
i...@linux-web-development.dewrote:

 How do you mount your Client? FUSE? I had similar problems when playing
 around with the timeout options for the FUSE mount. If they are too high
 they cache the metadata for too long. When you move the file the inode
 should stay the same and on the second node the path should stay in cache
 for a while so it still knows the inode for that moved files old path thus
 can act on the file without knowing it's path.

 The problems kick in when you delete a file and recreate it - the cache
 tries to access the old inode, which was deleted, thus throwing errors. If
 I recall correctly the structure needs cleaning is one of two error
 messages I got, depending on which of the timeout mount options was set to
 a higher value.

 -Original Mail-
 From: Johan Huysmans [johan.huysm...@inuits.be]
 Sent: 12.12.13 - 14:51:35
 To: gluster-users@gluster.org [gluster-users@gluster.org]

 Subject: Re: [Gluster-users] Structure needs cleaning on some files


  I created a bug for this issue:

 https://bugzilla.redhat.com/show_bug.cgi?id=1041109

 gr.
 Johan

 On 10-12-13 12:52, Johan Huysmans wrote:

 Hi All,

 It seems I can easily reproduce the problem.

 * on node 1 create a file (touch , cat , ...).
 * on node 2 take md5sum of direct file (md5sum /path/to/file)
 * on node 1 move file to other name (mv file file1)
 * on node 2 take md5sum of direct file (md5sum /path/to/file), this is
 still working although the file is not really there
 * on node 1 change file content
 * on node 2 take md5sum of direct file (md5sum /path/to/file), this is
 still working and has a changed md5sum

 This is really strange behaviour.
 Is this normal, can this be altered with a a setting?

 Thanks for any info,
 gr.
 Johan

 On 10-12-13 10:02, Johan Huysmans wrote:

 I could reproduce this problem with while my mount point is running in
 debug mode.
 logfile is attached.

 gr.
 Johan Huysmans

 On 10-12-13 09:30, Johan Huysmans wrote:

 Hi All,

 When reading some files we get this error:
 md5sum: /path/to/file.xml: Structure needs cleaning

 in /var/log/glusterfs/mnt-sharedfs.log we see these errors:
 [2013-12-10 08:07:32.256910] W
 [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0:
 remote operation failed: No such file or directory
 [2013-12-10 08:07:32.257436] W
 [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1:
 remote operation failed: No such file or directory
 [2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk]
 0-glusterfs-fuse: 8230: STAT() /path/to/file.xml = -1 (Structure
 needs cleaning)

 We are using gluster 3.4.1-3 on CentOS6.
 Our servers are 64-bit, our clients 32-bit (we are already using
 --enable-ino32 on the mountpoint)

 This is my gluster configuration:
 Volume Name: testvolume
 Type: Replicate
 Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
 Status: Started
 Number of Bricks: 1 x 2 = 2
 Transport-type: tcp
 Bricks:
 Brick1: SRV-1:/gluster/brick1
 Brick2: SRV-2:/gluster/brick2
 Options Reconfigured:
 performance.force-readdirp: on
 performance.stat-prefetch: off
 network.ping-timeout: 5

 And this is how the applications work:
 We have 2 client nodes who both have a fuse.glusterfs mountpoint.
 On 1 client node we have a application which writes files.
 On the other client node we have a application which reads these
 files.
 On the node where the files are written we don't see any problem,
 and can read that file without problems.
 On the other node we have problems (error messages above) reading
 that file.
 The problem occurs when we perform a md5sum on the exact file, when
 perform a md5sum on all files in that directory there is no problem.

 How can we solve this problem as this is annoying.
 The problem occurs after some time (can be days), an umount and
 mount of the mountpoint solves it for some days.
 Once it occurs (and we don't remount) it occurs every time.

 I hope someone can help me with this problems.

 Thanks,
 Johan Huysmans
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Structure needs cleaning on some files

2013-12-12 Thread Anand Avati
Looks like your issue was fixed by patch http://review.gluster.org/4989/ in
the master branch. Backporting this to release-3.4 now.

Thanks!
Avati


On Thu, Dec 12, 2013 at 1:26 PM, Anand Avati av...@gluster.org wrote:

 I have the same question. Do you have excessively high --entry-timeout
 parameter to your FUSE mount? In any case, Structure needs cleaning error
 should not surface up to FUSE and that is still a bug.


 On Thu, Dec 12, 2013 at 12:46 PM, Maik Kulbe 
 i...@linux-web-development.de wrote:

 How do you mount your Client? FUSE? I had similar problems when playing
 around with the timeout options for the FUSE mount. If they are too high
 they cache the metadata for too long. When you move the file the inode
 should stay the same and on the second node the path should stay in cache
 for a while so it still knows the inode for that moved files old path thus
 can act on the file without knowing it's path.

 The problems kick in when you delete a file and recreate it - the cache
 tries to access the old inode, which was deleted, thus throwing errors. If
 I recall correctly the structure needs cleaning is one of two error
 messages I got, depending on which of the timeout mount options was set to
 a higher value.

 -Original Mail-
 From: Johan Huysmans [johan.huysm...@inuits.be]
 Sent: 12.12.13 - 14:51:35
 To: gluster-users@gluster.org [gluster-users@gluster.org]

 Subject: Re: [Gluster-users] Structure needs cleaning on some files


  I created a bug for this issue:

 https://bugzilla.redhat.com/show_bug.cgi?id=1041109

 gr.
 Johan

 On 10-12-13 12:52, Johan Huysmans wrote:

 Hi All,

 It seems I can easily reproduce the problem.

 * on node 1 create a file (touch , cat , ...).
 * on node 2 take md5sum of direct file (md5sum /path/to/file)
 * on node 1 move file to other name (mv file file1)
 * on node 2 take md5sum of direct file (md5sum /path/to/file), this is
 still working although the file is not really there
 * on node 1 change file content
 * on node 2 take md5sum of direct file (md5sum /path/to/file), this is
 still working and has a changed md5sum

 This is really strange behaviour.
 Is this normal, can this be altered with a a setting?

 Thanks for any info,
 gr.
 Johan

 On 10-12-13 10:02, Johan Huysmans wrote:

 I could reproduce this problem with while my mount point is running in
 debug mode.
 logfile is attached.

 gr.
 Johan Huysmans

 On 10-12-13 09:30, Johan Huysmans wrote:

 Hi All,

 When reading some files we get this error:
 md5sum: /path/to/file.xml: Structure needs cleaning

 in /var/log/glusterfs/mnt-sharedfs.log we see these errors:
 [2013-12-10 08:07:32.256910] W
 [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0:
 remote operation failed: No such file or directory
 [2013-12-10 08:07:32.257436] W
 [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1:
 remote operation failed: No such file or directory
 [2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk]
 0-glusterfs-fuse: 8230: STAT() /path/to/file.xml = -1 (Structure
 needs cleaning)

 We are using gluster 3.4.1-3 on CentOS6.
 Our servers are 64-bit, our clients 32-bit (we are already using
 --enable-ino32 on the mountpoint)

 This is my gluster configuration:
 Volume Name: testvolume
 Type: Replicate
 Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
 Status: Started
 Number of Bricks: 1 x 2 = 2
 Transport-type: tcp
 Bricks:
 Brick1: SRV-1:/gluster/brick1
 Brick2: SRV-2:/gluster/brick2
 Options Reconfigured:
 performance.force-readdirp: on
 performance.stat-prefetch: off
 network.ping-timeout: 5

 And this is how the applications work:
 We have 2 client nodes who both have a fuse.glusterfs mountpoint.
 On 1 client node we have a application which writes files.
 On the other client node we have a application which reads these
 files.
 On the node where the files are written we don't see any problem,
 and can read that file without problems.
 On the other node we have problems (error messages above) reading
 that file.
 The problem occurs when we perform a md5sum on the exact file, when
 perform a md5sum on all files in that directory there is no problem.

 How can we solve this problem as this is annoying.
 The problem occurs after some time (can be days), an umount and
 mount of the mountpoint solves it for some days.
 Once it occurs (and we don't remount) it occurs every time.

 I hope someone can help me with this problems.

 Thanks,
 Johan Huysmans
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Gerrit doesn't use HTTPS

2013-12-12 Thread James
I just noticed that the Gluster Gerrit [1] doesn't use HTTPS!

Can this be fixed ASAP?

Cheers,
James

[1] http://review.gluster.org/
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] gluster fails under heavy array job load load

2013-12-12 Thread harry mangalam
Hi All,
(Gluster Volume Details at bottom)

I've posted some of this previously, but even after various upgrades, 
attempted fixes, etc, it remains a problem.


Short version: Our gluster fs (~340TB) provides scratch space for a ~5000-core 
academic compute cluster.
Much of our load is streaming IO, doing a lot of genomics work, and that is 
the load under which we saw this latest failure.
Under heavy batch load, especially array jobs, where there might be several 
64-core nodes doing I/O against the 4 servers / 8 bricks, we often get job 
failures with the following profile:

Client POV:
Here is a sampling of the client logs (/var/log/glusterfs/gl.log) for all 
compute nodes that indicated interaction with the user's files
http://pastie.org/8548781

Here are some client Info logs that seem fairly serious:
http://pastie.org/8548785

The errors that referenced this user were gathered from all the nodes that 
were running his code (in compute*) and agglomerated with:

cut -f2,3 -d']' compute* |cut -f1 -dP | sort | uniq -c | sort -gr 

and placed here to show the profile of errors that his run generated.
http://pastie.org/8548796

so 71 of them were:
  W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-7: remote 
operation failed: Transport endpoint is not connected. 
etc

We've seen this before and previously discounted it because it seemed to be 
related to the problem of spurious NFS-related bugs, but now I'm wondering 
whether it's a real problem.
The same goes for the 'remote operation failed: Stale file handle' warnings.

There were no Errors logged per se, though some of the Warnings looked fairly 
nasty, like the 'dht_layout_dir_mismatch' ones.

From the server side, however, during the same period, there were:
0 Warnings about this user's files
0 Errors 
458 Info lines
of which only 1 line was not a 'cleanup' line like this:
---
10.2.7.11:[2013-12-12 21:22:01.064289] I [server-helpers.c:460:do_fd_cleanup] 
0-gl-server: fd cleanup on /path/to/file
---
it was:
---
10.2.7.14:[2013-12-12 21:00:35.209015] I [server-rpc-
fops.c:898:_gf_server_log_setxattr_failure] 0-gl-server: 113697332: SETXATTR 
/bio/tdlong/RNAseqIII/ckpt.1084030 (c9488341-c063-4175-8492-75e2e282f690) == 
trusted.glusterfs.dht
---

We're losing about 10% of these kinds of array jobs because of this, which is just 
not supportable.



Gluster details

servers and clients running gluster 3.4.0-8.el6 over QDR IB, IPoIB, through 2 
Mellanox and 1 Voltaire switches, Mellanox cards, CentOS 6.4

$ gluster volume info
 
Volume Name: gl
Type: Distribute
Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
Status: Started
Number of Bricks: 8
Transport-type: tcp,rdma
Bricks:
Brick1: bs2:/raid1
Brick2: bs2:/raid2
Brick3: bs3:/raid1
Brick4: bs3:/raid2
Brick5: bs4:/raid1
Brick6: bs4:/raid2
Brick7: bs1:/raid1
Brick8: bs1:/raid2
Options Reconfigured:
performance.write-behind-window-size: 1024MB
performance.flush-behind: on
performance.cache-size: 268435456
nfs.disable: on
performance.io-cache: on
performance.quick-read: on
performance.io-thread-count: 64
auth.allow: 10.2.*.*,10.1.*.*


'gluster volume status gl detail': 
http://pastie.org/8548826

---
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
---
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] qemu remote insecure connections

2013-12-12 Thread Joe Topjian
Hello,

I'm having a problem getting remote servers to connect to Gluster with qemu.

I have 5 servers, 4 of which run Gluster and host a volume. The qemu user
on all 5 servers has the same uid.

storage.owner-uid and storage.owner-gid are set to that user.

In addition, server.allow-insecure is on and is also set in the
glusterd.vol file. glusterd has also been restarted (numerous times).
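
For reference, this is roughly how those options were applied on the
servers (the uid/gid value below is only an example, matching the shared
qemu uid):

gluster volume set volumes storage.owner-uid 107
gluster volume set volumes storage.owner-gid 107
gluster volume set volumes server.allow-insecure on
# in /etc/glusterfs/glusterd.vol:  option rpc-auth-allow-insecure on
service glusterd restart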

When attempting to create a qemu file by connecting to the same server,
everything works:

qemu@192.168.1.11 qemu-img create gluster://192.168.1.11/volumes/v.img 1M
Formatting 'gluster://192.168.1.11/volumes/v.img', fmt=raw size=1048576
qemu@192.168.1.11

But when trying to do it remotely, the command hangs indefinitely:

qemu@192.168.1.12 qemu-img create gluster://192.168.1.11/volumes/v.img 1M
Formatting 'gluster://192.168.1.11/volumes/v.img', fmt=raw size=1048576
^C

Yet when 192.168.1.12 connects to gluster://192.168.1.12, the command works
and the file shows up in the distributed volume.

Further, when turning server.allow-insecure off, I get an immediate error
no matter what the source and destination connection is:

qemu@192.168.1.12 qemu-img create gluster://192.168.1.11/volumes/v.img 1M
Formatting 'gluster://192.168.1.11/volumes/v.img', fmt=raw size=1048576
qemu-img: Gluster connection failed for server=192.168.1.11 port=0
volume=volumes image=v.img transport=tcp
qemu-img: gluster://192.168.1.11/volumes/v.img: error while creating raw:
No data available

Does anyone have any ideas on how I can have an unprivileged user connect to
remote gluster servers?

Thanks,
Joe
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Documentation hackathon for 3.5

2013-12-12 Thread Vijay Bellur

Hi All,

The documentation hackathon for 3.5 is underway. You can find more 
details here [1].


Anybody who submits a documentation patch that gets accepted between now 
and next week will stand a chance to get some swag :).


Keep your patches coming!

Cheers,
Vijay

[1] 
http://www.gluster.org/community/documentation/index.php/3.5_Documentation_Hackathon

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] gluster fails under heavy array job load load

2013-12-12 Thread Anand Avati
Please provide the full client and server logs (in a bug report). The
snippets give some hints, but are not very meaningful without the full
context/history since mount time (they have after-the-fact symptoms, but
not the part which shows why the disconnects happened).

Even before looking into the full logs here are some quick observations:

- write-behind-window-size = 1024MB seems *excessively* high. Please set
this to 1MB (default) and check if the stability improves.

- I see RDMA is enabled on the volume. Are you mounting clients through
RDMA? If so, for the purpose of diagnostics, can you mount through TCP and
check whether the stability improves? If you are using RDMA with such a high
write-behind-window-size, spurious ping-timeouts are an almost certainty
during heavy writes. The RDMA driver has limited flow control, and setting
such a high window-size can easily congest all the RDMA buffers, resulting
in spurious ping-timeouts and disconnections.
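
For example, the window size can be brought back to the default and
verified with a volume set (the client mount point below is illustrative):

gluster volume set gl performance.write-behind-window-size 1MB
gluster volume info gl | grep write-behind-window-size
# and, for the TCP-vs-RDMA test, remount the clients over TCP, e.g.:
# mount -t glusterfs -o transport=tcp bs1:/gl /mnt/gl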

Avati


On Thu, Dec 12, 2013 at 5:03 PM, harry mangalam harry.manga...@uci.edu wrote:

 Hi All,
 (Gluster Volume Details at bottom)

 I've posted some of this previously, but even after various upgrades,
 attempted fixes, etc, it remains a problem.

 Short version: Our gluster fs (~340TB) provides scratch space for a
 ~5000core academic compute cluster.
 Much of our load is streaming IO, doing a lot of genomics work, and that
 is the load under which we saw this latest failure.
 Under heavy batch load, especially array jobs, where there might be
 several 64core nodes doing I/O on the 4servers/8bricks, we often get job
 failures that have the following profile:

 Client POV:
 Here is a sampling of the client logs (/var/log/glusterfs/gl.log) for all
 compute nodes that indicated interaction with the user's files
 http://pastie.org/8548781

 Here are some client Info logs that seem fairly serious:
 http://pastie.org/8548785

 The errors that referenced this user were gathered from all the nodes that
 were running his code (in compute*) and agglomerated with:

 cut -f2,3 -d']' compute* |cut -f1 -dP | sort | uniq -c | sort -gr

 and placed here to show the profile of errors that his run generated.
 http://pastie.org/8548796

 so 71 of them were:
 W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-7: remote
 operation failed: Transport endpoint is not connected.
 etc

 We've seen this before and previously discounted it bc it seems to have
 been related to the problem of spurious NFS-related bugs, but now I'm
 wondering whether it's a real problem.
 Also the 'remote operation failed: Stale file handle. ' warnings.

 There were no Errors logged per se, tho some of the W's looked fairly
 nasty, like the 'dht_layout_dir_mismatch'

 From the server side, however, during the same period, there were:
 0 Warnings about this user's files
 0 Errors
 458 Info lines
 of which only 1 line was not a 'cleanup' line like this:
 ---
 10.2.7.11:[2013-12-12 21:22:01.064289] I
 [server-helpers.c:460:do_fd_cleanup] 0-gl-server: fd cleanup on
 /path/to/file
 ---
 it was:
 ---
 10.2.7.14:[2013-12-12 21:00:35.209015] I
 [server-rpc-fops.c:898:_gf_server_log_setxattr_failure] 0-gl-server:
 113697332: SETXATTR /bio/tdlong/RNAseqIII/ckpt.1084030
 (c9488341-c063-4175-8492-75e2e282f690) == trusted.glusterfs.dht
 ---

 We're losing about 10% of these kinds of array jobs bc of this, which is
 just not supportable.

 Gluster details

 servers and clients running gluster 3.4.0-8.el6 over QDR IB, IPoIB, thru 2
 Mellanox, 1 Voltaire switches, Mellanox cards, CentOS 6.4

 $ gluster volume info

 Volume Name: gl
 Type: Distribute
 Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
 Status: Started
 Number of Bricks: 8
 Transport-type: tcp,rdma
 Bricks:
 Brick1: bs2:/raid1
 Brick2: bs2:/raid2
 Brick3: bs3:/raid1
 Brick4: bs3:/raid2
 Brick5: bs4:/raid1
 Brick6: bs4:/raid2
 Brick7: bs1:/raid1
 Brick8: bs1:/raid2
 Options Reconfigured:
 performance.write-behind-window-size: 1024MB
 performance.flush-behind: on
 performance.cache-size: 268435456
 nfs.disable: on
 performance.io-cache: on
 performance.quick-read: on
 performance.io-thread-count: 64
 auth.allow: 10.2.*.*,10.1.*.*

 'gluster volume status gl detail':
 http://pastie.org/8548826

 ---
 Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
 [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
 MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
 ---

 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users