Also, in the log files on the clients, it looks like I get messages of this type whenever I try to access a file that is no longer accessible.

2009-06-11 07:58:24 E [fuse-bridge.c:675:fuse_fd_cbk] glusterfs-fuse: 22068570: /hourlogs/myDir0/1243432800.log => -1 (5)
2009-06-11 07:58:24 E [fuse-bridge.c:436:fuse_entry_cbk] glusterfs-fuse: 22068579: /hourlogs/myDir1/1243400400.log => -1 (116)
2009-06-11 07:58:24 E [unify.c:850:unify_open] unify: /hourlogs/myDir1/1243400400.log: entry_count is 3
2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir1/1243400400.log: found on afr3
2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir1/1243400400.log: found on afr2
2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir1/1243400400.log: found on afr-ns
2009-06-11 07:58:24 E [fuse-bridge.c:675:fuse_fd_cbk] glusterfs-fuse: 22068580: /hourlogs/myDir1/1243400400.log => -1 (5)
2009-06-11 07:58:24 E [fuse-bridge.c:436:fuse_entry_cbk] glusterfs-fuse: 22068583: /hourlogs/myDir2/1243411200.log => -1 (116)
2009-06-11 07:58:24 E [unify.c:850:unify_open] unify: /hourlogs/myDir2/1243411200.log: entry_count is 3
2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir2/1243411200.log: found on afr1
2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir2/1243411200.log: found on afr3
2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir2/1243411200.log: found on afr-ns
2009-06-11 07:58:24 E [fuse-bridge.c:675:fuse_fd_cbk] glusterfs-fuse: 22068584: /hourlogs/myDir2/1243411200.log => -1 (5)
2009-06-11 07:58:24 E [fuse-bridge.c:436:fuse_entry_cbk] glusterfs-fuse: 22068599: /hourlogs/myDir3/1243472400.log => -1 (116)
2009-06-11 07:58:24 E [unify.c:850:unify_open] unify: /hourlogs/myDir3/1243472400.log: entry_count is 3
2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir3/1243472400.log: found on afr1
2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir3/1243472400.log: found on afr3
2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir3/1243472400.log: found on afr-ns
2009-06-11 07:58:24 E [fuse-bridge.c:675:fuse_fd_cbk] glusterfs-fuse: 22068600: /hourlogs/myDir3/1243472400.log => -1 (5)
2009-06-11 07:58:24 E [fuse-bridge.c:436:fuse_entry_cbk] glusterfs-fuse: 22068603: /hourlogs/myDir4/1243404000.log => -1 (116)
2009-06-11 07:58:24 E [unify.c:850:unify_open] unify: /hourlogs/myDir4/1243404000.log: entry_count is 3
2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir4/1243404000.log: found on afr1
2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir4/1243404000.log: found on afr-ns
2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir4/1243404000.log: found on afr3
2009-06-11 07:58:24 E [fuse-bridge.c:675:fuse_fd_cbk] glusterfs-fuse: 22068604: /hourlogs/myDir5/1243404000.log => -1 (5)
2009-06-11 07:58:24 E [fuse-bridge.c:436:fuse_entry_cbk] glusterfs-fuse: 22068619: /hourlogs/myDir5/1243447200.log => -1 (116)
2009-06-11 07:58:24 E [unify.c:850:unify_open] unify: /hourlogs/myDir5/1243447200.log: entry_count is 4
2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir5/1243447200.log: found on afr1
2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir5/1243447200.log: found on afr3
2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir5/1243447200.log: found on afr2
2009-06-11 07:58:24 E [unify.c:853:unify_open] unify: /hourlogs/myDir5/1243447200.log: found on afr-ns
2009-06-11 07:58:24 E [fuse-bridge.c:675:fuse_fd_cbk] glusterfs-fuse: 22068620: /hourlogs/myDir5/1243447200.log => -1 (5)
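A side note in case it helps whoever reads this: the numbers in parentheses look like Linux errno values, i.e. 5 would be EIO ("Input/output error", matching what tail reports below) and 116 would be ESTALE ("Stale NFS file handle"). They can be decoded like so, assuming python is on the box:

python -c 'import os; print os.strerror(5)'      # Input/output error
python -c 'import os; print os.strerror(116)'    # Stale NFS file handle

Also, as I understand it, unify expects each file to live on exactly one storage subvolume plus the namespace, so "entry_count is 3" with the file found on two afr volumes plus afr-ns would mean there are duplicate copies.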

On Jun 11, 2009, at 10:33 AM, Elbert Lai wrote:

elb...@host1:~$ dpkg -l|grep glusterfs
ii  glusterfs-client   1.3.8-0pre2   GlusterFS fuse client
ii  glusterfs-server   1.3.8-0pre2   GlusterFS fuse server
ii  libglusterfs0      1.3.8-0pre2   GlusterFS libraries and translator modules

I have 2 hosts set up to use AFR with the package versions listed above. I have been experiencing an issue where a file that is copied to glusterfs is readable/writable for a while, then at some point in time it ceases to be. Trying to access it yields only the error message, "cannot open `filename' for reading: Input/output error".

Files enter glusterfs either via the "cp" command from a client or via "rsync". In the case of cp, the clients are all local and copy across a very fast connection. In the case of rsync, the one sending host is itself a gluster client; we are testing out a later version of gluster on it, and it rsyncs across a VPN.
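Roughly what the ingestion looks like, with made-up paths and hostnames purely for illustration:

cp /data/hourlogs/myDir/1244682000.log /mnt/glusterfs/hourlogs/myDir/
rsync -av /data/hourlogs/ vpn-host:/mnt/glusterfs/hourlogs/    # vpn-host writes into its own gluster mount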

elb...@host2:~$ dpkg -l|grep glusterfs
ii  glusterfs-client      2.0.1-1   clustered file-system
ii  glusterfs-server      2.0.1-1   clustered file-system
ii  libglusterfs0         2.0.1-1   GlusterFS libraries and translator modules
ii  libglusterfsclient0   2.0.1-1   GlusterFS client library

=========
What causes files to become inaccessible? I read that fstat() had a bug in version 1.3.x whereas stat() did not, and that it was being worked on. Could this be related?
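For what it's worth, in the transcript below the failure appears at open() time: stat(1) on the path succeeds while tail cannot even open the file. A crude way to exercise the two calls separately on a suspect file, assuming python is installed:

python -c 'import os; print os.stat("1244682000.log").st_size'                                 # stat() on the path
python -c 'import os; f = os.open("1244682000.log", os.O_RDONLY); print os.fstat(f).st_size'   # open() + fstat()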

When a file becomes inaccessible, I have been manually removing it from the mount point and copying it back in via scp, after which the file is accessible again. Below I've pasted a sample of what I'm seeing.

elb...@tool3.:hourlogs$ cd myDir
elb...@tool3.:myDir$ ls 1244682000.log
1244682000.log
elb...@tool3.:myDir$ stat 1244682000.log
  File: `1244682000.log'
  Size: 40265114        Blocks: 78744      IO Block: 4096   regular file
Device: 15h/21d Inode: 42205749    Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 1003/ elbert) Gid: ( 6000/ ops)
Access: 2009-06-11 02:25:10.000000000 +0000
Modify: 2009-06-11 02:26:02.000000000 +0000
Change: 2009-06-11 02:26:02.000000000 +0000
elb...@tool3.:myDir$ tail 1244682000.log
tail: cannot open `1244682000.log' for reading: Input/output error

At this point, I am able to rm the file. Then, if I scp it back in, I am able to tail it successfully.
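Spelled out with example paths (mount point and backup host are made up):

rm /mnt/glusterfs/hourlogs/myDir/1244682000.log                                       # remove the broken copy via the mount
scp backuphost:/archive/hourlogs/myDir/1244682000.log /mnt/glusterfs/hourlogs/myDir/  # copy it back in
tail /mnt/glusterfs/hourlogs/myDir/1244682000.log                                     # readable again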

So, to sum up: I have also observed cases where the files had a Size of 0 but were otherwise in the same state. I'm not totally certain, but it looks like when a file gets into this state via rsync, either it is deposited in this state immediately (before I ever try to read it) or it enters this state very quickly. Generally, file sizes run from several MB up to 150 MB.
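To spot the zero-size cases I can scan the mount, e.g. (mount point is made up):

find /mnt/glusterfs/hourlogs -type f -size 0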

Here's my server config:
# Gluster Server configuration /etc/glusterfs/glusterfs-server.vol
# Configured for AFR & Unify features

volume brick
 type storage/posix
 option directory /var/gluster/data/
end-volume

volume brick-ns
 type storage/posix
 option directory /var/gluster/ns/
end-volume

volume server
 type protocol/server
 option transport-type tcp/server
 subvolumes brick brick-ns
 option auth.ip.brick.allow 165.193.245.*,10.11.*
 option auth.ip.brick-ns.allow 165.193.245.*,10.11.*
end-volume
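Since the bricks are just directories on the servers, the per-brick copies can also be inspected directly when a file goes bad, for example (the file path is made up; getfattr comes from the attr package, and trusted.afr.* is where AFR keeps its change-log metadata, as far as I know):

ls -l /var/gluster/data/hourlogs/myDir/1244682000.log
ls -l /var/gluster/ns/hourlogs/myDir/1244682000.log
getfattr -d -m trusted -e hex /var/gluster/data/hourlogs/myDir/1244682000.log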

Here's my client config:
# Gluster Client configuration /etc/glusterfs/glusterfs-client.vol
# Configured for AFR & Unify features

volume brick1
 type protocol/client
 option transport-type tcp/client     # for TCP/IP transport
 option remote-host 10.11.16.68    # IP address of the remote brick
 option remote-subvolume brick        # name of the remote volume
end-volume

volume brick2
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.11.16.71
 option remote-subvolume brick
end-volume

volume brick3
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.11.16.69
 option remote-subvolume brick
end-volume

volume brick4
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.11.16.70
 option remote-subvolume brick
end-volume

volume brick5
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.11.16.119
 option remote-subvolume brick
end-volume

volume brick6
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.11.16.120
 option remote-subvolume brick
end-volume

volume brick-ns1
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.11.16.68
 option remote-subvolume brick-ns     # Note the different remote volume name.
end-volume

volume brick-ns2
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.11.16.71
 option remote-subvolume brick-ns     # Note the different remote volume name.
end-volume

volume afr1
 type cluster/afr
 subvolumes brick1 brick2
end-volume

volume afr2
 type cluster/afr
 subvolumes brick3 brick4
end-volume

volume afr3
 type cluster/afr
 subvolumes brick5 brick6
end-volume

volume afr-ns
 type cluster/afr
 subvolumes brick-ns1 brick-ns2
end-volume

volume unify
 type cluster/unify
 subvolumes afr1 afr2 afr3
 option namespace afr-ns

 # use the ALU scheduler
 option scheduler alu

 # This option would make brick5 read-only, so that no new files are created on it.
 # option alu.read-only-subvolumes brick5

 # Don't create files on a volume with less than 10% free disk space
 option alu.limits.min-free-disk  10%

 # Don't create files on a volume with more than 10000 files open
 option alu.limits.max-open-files 10000

 # When deciding where to place a file, first look at the disk-usage, then at
 # read-usage, write-usage, open-files-usage, and finally the disk-speed-usage.
 option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage

# Kick in if the discrepancy in disk-usage between volumes is more than 2GB
 option alu.disk-usage.entry-threshold 2GB

# Don't stop writing to the least-used volume until the discrepancy is down to 1988MB (2GB - 60MB)
 option alu.disk-usage.exit-threshold  60MB

 # Kick in if the discrepancy in open files is 1024
 option alu.open-files-usage.entry-threshold 1024

 # Don't stop until 992 files (1024 - 32) have been written to the least-used volume
 option alu.open-files-usage.exit-threshold 32

 # Kick in when the read-usage discrepancy is 20%
 option alu.read-usage.entry-threshold 20%

 # Don't stop until the discrepancy has been reduced to 16% (20% - 4%)
 option alu.read-usage.exit-threshold 4%

 # Kick in when the write-usage discrepancy is 20%
 option alu.write-usage.entry-threshold 20%

 # Don't stop until the discrepancy has been reduced to 16% (20% - 4%)
 option alu.write-usage.exit-threshold 4%

 # Refresh the statistics used for decision-making every 10 seconds
 option alu.stat-refresh.interval 10sec

# Refresh the statistics used for decision-making after creating 10 files
# option alu.stat-refresh.num-file-create 10
end-volume


# write-behind improves write performance a lot
volume writebehind
  type performance/write-behind
  option aggregate-size 131072 # in bytes
  subvolumes unify
end-volume
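For completeness, the clients mount this volfile roughly like so (the mount point is just an example):

glusterfs -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs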

Has anyone seen this issue before? Any suggestions?

Thanks,
-elb-
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
