On 02/07/2011 11:49 PM, Raghavendra G wrote:
Hi Steve,
Are the back-end filesystems working correctly? I am seeing many errors in the
server log files when the back-end filesystem is accessed.
gluster-01-brick.log.1:[2011-01-26 03:43:07.353445] E [posix.c:2193:posix_open] post-posix: open on /gluster/01/brick/home/lev/deltah/aadimers/serd/converge/0..75000/serd_phi-psi_hist.4deg.0..75000_map.cmd: Read-only file system
gluster-01-brick.log.1:[2011-01-26 03:43:07.353857] E [posix.c:678:posix_setattr] post-posix: setattr (utimes) on /gluster/01/brick/home/lev/deltah/aadimers/serd/converge/0..75000/serd_phi-psi_hist.4deg.0..75000_map.cmd failed: Read-only file system
gluster-01-brick.log.1:[2011-01-26 03:43:07.354827] E [posix.c:2318:posix_readv] post-posix: read failed on fd=0x7f28e50dc1c8: Input/output error
gluster-01-brick.log.1:[2011-01-26 03:43:07.357396] E [posix.c:2193:posix_open] post-posix: open on /gluster/01/brick/home/lev/deltah/aadimers/serd/converge/0..75000/serd_phi-psi_hist.4deg.0..75000_map.ps: Read-only file system
gluster-01-brick.log.1:[2011-01-26 03:43:07.357794] E [posix.c:678:posix_setattr] post-posix: setattr (utimes) on /gluster/01/brick/home/lev/deltah/aadimers/serd/converge/0..75000/serd_phi-psi_hist.4deg.0..75000_map.ps failed: Read-only file system
gluster-01-brick.log.1:[2011-01-26 03:43:07.358865] E [posix.c:2318:posix_readv] post-posix: read failed on fd=0x7f28e50dc1c8: Input/output error
gluster-01-brick.log.1:[2011-01-26 03:43:07.359264] E [posix.c:2318:posix_readv] post-posix: read failed on fd=0x7f28e50dc1c8: Input/output error
gluster-01-brick.log.1:[2011-01-26 03:43:07.359548] E [posix.c:2318:posix_readv] post-posix: read failed on fd=0x7f28e50dc1c8: Input/output error
gluster-01-brick.log.1:[2011-01-26 03:43:07.367163] E [posix.c:2318:posix_readv] post-posix: read failed on fd=0x7f
I am also seeing other errors, which indicate that the back-end filesystem is
read-only. Because of this, distribute and replicate cannot store their
metadata (in xattrs), which in turn causes many split-brain and layout NULL
errors. Can you please check the back-end filesystem?
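
One quick way to check for a read-only back end is to look at the mount options the kernel reports. The sketch below is only an illustration: the function name ro_mounts is hypothetical, and it assumes a Linux-style mounts table (such as /proc/mounts) in which the fourth field holds the comma-separated mount options.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS): print every mount point whose
# option list contains the exact option "ro".
# Expected input format, as in /proc/mounts:
#   device mountpoint fstype options dump pass
ro_mounts() {
    awk '{
        # Split the options field so that "errors=remount-ro" is not
        # mistaken for "ro".
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }' "${1:-/proc/mounts}"
}
```

Running ro_mounts with no argument reads /proc/mounts; any brick mount point that appears in the output is currently mounted read-only.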
regards,
Yes, the filesystem was read-only for a time when a disk failed. We
then rebuilt the brick on that disk from the corresponding brick in the
second server (with the volume stopped, of course) using:
rsync -aXv brick/ stanley:/gluster/06/brick/
Following some instructions we found on the mailing list, we then:
1) deleted the volume
2) ran "find /gluster -exec setfattr -x trusted.gfid \{\} \;" on the bricks
3) created the volume again
4) mounted the volume
5) ran "find . -print0 | xargs --null stat > /dev/null" on the mounted volume
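
The command in step 5 discards the normal output of stat but leaves any error messages scattered on the terminal. A variation that collects only the failures might look like the sketch below. The function name lookup_errors is hypothetical, and the sketch assumes GNU find, xargs, and stat.

```shell
#!/bin/sh
# Hypothetical helper: stat every path under directory "$1" and print only
# the error messages (from find or from stat); normal stat output is dropped.
lookup_errors() {
    # -print0 / -0 keep file names with spaces or newlines intact.
    # -r stops xargs from running stat when find produced no paths.
    { find "$1" -print0 | xargs -0 -r stat -- >/dev/null; } 2>&1
}
```

For example, "lookup_errors /net/post/lev" should print nothing when every lookup succeeds, and one message per failing path otherwise.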
This returned us to what seemed to be a stable state (i.e., no errors
from running "ls -alR" from the top of the volume). Then after putting
the volume back into service, these errors started occurring again. I
have noticed that turning off "performance.stat-prefetch" has brought
about a great improvement. We continue to see some errors like this on
one of the servers:
[2011-02-08 14:22:08.360799] I [dht-common.c:369:dht_revalidate_cbk] post-dht: subvolume post-replicate-1 returned -1 (Invalid argument)
[2011-02-08 14:22:08.836672] I [dht-common.c:369:dht_revalidate_cbk] post-dht: subvolume post-replicate-4 returned -1 (Invalid argument)
[2011-02-08 14:22:39.468388] I [dht-common.c:369:dht_revalidate_cbk] post-dht: subvolume post-replicate-0 returned -1 (Invalid argument)
[2011-02-08 14:22:39.468436] W [fuse-bridge.c:184:fuse_entry_cbk] glusterfs-fuse: 22465136: LOOKUP() /home/lev/.Xauthority => -1 (Invalid argument)
[2011-02-08 14:22:40.462910] I [dht-common.c:369:dht_revalidate_cbk] post-dht: subvolume post-replicate-5 returned -1 (Invalid argument)
[2011-02-08 14:22:40.462958] W [fuse-bridge.c:184:fuse_entry_cbk] glusterfs-fuse: 22466110: LOOKUP() /home/lev/.viminfo => -1 (Invalid argument)
And the user sees:
root@stanley:/net/post/lev# ls -al .viminfo .Xauthority
ls: cannot access .viminfo: Invalid argument
ls: cannot access .Xauthority: Invalid argument
But only from one client (which also happens to be the server giving the
errors above). Another client (the other server) shows these same files
without problem:
root@pablo:/net/post/lev# ls -al .viminfo .Xauthority
-rw--- 1 lev post 9400 2011-02-07 22:52 .viminfo
-rw--- 1 lev post 7401 2011-02-08 00:27 .Xauthority
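
To see whether the failures cluster on particular replicate subvolumes, the dht_revalidate_cbk messages in the client log can be tallied. This is a rough sketch only: the function name subvol_errors is hypothetical, and it assumes each log entry sits on a single line in the log file, in the "subvolume NAME returned -1" form shown above.

```shell
#!/bin/sh
# Hypothetical helper: count "subvolume NAME returned -1" messages per
# subvolume in the log file "$1", most frequent first.
subvol_errors() {
    grep -o 'subvolume [^ ]* returned -1' "$1" |
        awk '{ count[$2]++ } END { for (s in count) print count[s], s }' |
        sort -rn
}
```

Each output line is a count followed by a subvolume name, so a subvolume that dominates the list is a candidate for a closer look at its bricks.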
Steve
- Original Message -
From: "Steve Wilson"
To: "Lakshmipathi"
Cc: "Raghavendra G"
Sent: Thursday, February 3, 2011 7:21:36 PM
Subject: Re: [Gluster-users] 3.1.2 with "No such file" and "Invalid argument"
errors
Hi,
Thanks for looking into this. Any ideas so far? Or anything you'd like
me to try?
Here's some other perhaps relevant information:
* all bricks are formatted ext4 and mounted with the noatime option
in addition to default opt