[Gluster-users] Issue with files on glusterfs becoming unreadable.

Elbert Lai Thu, 11 Jun 2009 10:37:42 -0700

elb...@host1:~$ dpkg -l|grep glusterfs

ii  glusterfs-client  1.3.8-0pre2  GlusterFS fuse client
ii  glusterfs-server  1.3.8-0pre2  GlusterFS fuse server
ii  libglusterfs0     1.3.8-0pre2  GlusterFS libraries and translator modules

I have 2 hosts set up to use AFR with the package versions listed above. I have been experiencing an issue where a file that is copied to glusterfs is readable/writable for a while, then at some point in time it ceases to be. Trying to access it only returns the error message "cannot open `filename' for reading: Input/output error".

Files enter glusterfs either via the "cp" command from a client or via "rsync". In the case of cp, the clients are all local and copying across a very fast connection. In the case of rsync, the one client is itself a gluster client. We are testing out a later version of gluster, and it rsyncs across a VPN.


elb...@host2:~$ dpkg -l|grep glusterfs

ii  glusterfs-client     2.0.1-1  clustered file-system
ii  glusterfs-server     2.0.1-1  clustered file-system
ii  libglusterfs0        2.0.1-1  GlusterFS libraries and translator modules
ii  libglusterfsclient0  2.0.1-1  GlusterFS client library


=========

What causes files to become inaccessible? I read that fstat() had a bug in version 1.3.x whereas stat() did not, and that it was being worked on. Could this be related?

When a file becomes inaccessible, I have been manually removing the file from the mount point, then copying it back in via scp. After that, the file is accessible again. Below I've pasted a sample of what I'm seeing.

elb...@tool3.:hourlogs$ cd myDir
ls 1244682000.log
elb...@tool3.:myDir$ ls 1244682000.log
1244682000.log
elb...@tool3.:myDir$ stat 1244682000.log
  File: `1244682000.log'
  Size: 40265114        Blocks: 78744      IO Block: 4096   regular file
Device: 15h/21d Inode: 42205749    Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 1003/ elbert) Gid: ( 6000/ops)
Access: 2009-06-11 02:25:10.000000000 +0000
Modify: 2009-06-11 02:26:02.000000000 +0000
Change: 2009-06-11 02:26:02.000000000 +0000
elb...@tool3.:myDir$ tail 1244682000.log
tail: cannot open `1244682000.log' for reading: Input/output error

At this point, I am able to rm the file. Then, if I scp it back in, I am able to successfully tail it.

I have also observed cases where the files had a Size of 0 but were otherwise in the same state. I'm not totally certain, but it looks like if a file gets into this state from rsync, either it gets deposited in this state immediately (before I try to read it), or else it quickly enters this state. Speaking generally, file sizes range from several MB up to 150 MB.
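
To find affected files without tailing each one by hand, a scan along the following lines could be used. This is only a minimal sketch and has not been run on these hosts: the function name find_unreadable is made up for illustration, and it assumes an affected file raises EIO ("Input/output error") on stat, open, or the first read, as tail reports above.

```python
import errno
import os


def find_unreadable(root):
    """Return paths under root whose stat/open/read raises EIO."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                os.stat(path)
                # Reading one byte is enough to trigger the open/read error.
                with open(path, "rb") as fh:
                    fh.read(1)
            except OSError as exc:
                # Only collect I/O errors; permission problems etc. are skipped.
                if exc.errno == errno.EIO:
                    bad.append(path)
    return bad
```

Calling find_unreadable() with the glusterfs mount point as its argument would return the list of files to remove and copy back in. Zero-size files that still open cleanly would not be caught by this check.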


Here's my server config:
# Gluster Server configuration /etc/glusterfs/glusterfs-server.vol
# Configured for AFR & Unify features

volume brick
 type storage/posix
 option directory /var/gluster/data/
end-volume

volume brick-ns
 type storage/posix
 option directory /var/gluster/ns/
end-volume

volume server
 type protocol/server
 option transport-type tcp/server
 subvolumes brick brick-ns
 option auth.ip.brick.allow 165.193.245.*,10.11.*
 option auth.ip.brick-ns.allow 165.193.245.*,10.11.*
end-volume

Here's my client config:
# Gluster Client configuration /etc/glusterfs/glusterfs-client.vol
# Configured for AFR & Unify features

volume brick1
 type protocol/client
 option transport-type tcp/client     # for TCP/IP transport
 option remote-host 10.11.16.68    # IP address of the remote brick
 option remote-subvolume brick        # name of the remote volume
end-volume

volume brick2
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.11.16.71
 option remote-subvolume brick
end-volume

volume brick3
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.11.16.69
 option remote-subvolume brick
end-volume

volume brick4
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.11.16.70
 option remote-subvolume brick
end-volume

volume brick5
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.11.16.119
 option remote-subvolume brick
end-volume

volume brick6
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.11.16.120
 option remote-subvolume brick
end-volume

volume brick-ns1
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.11.16.68
 option remote-subvolume brick-ns # Note the different remote volume name.
end-volume

volume brick-ns2
 type protocol/client
 option transport-type tcp/client
 option remote-host 10.11.16.71
 option remote-subvolume brick-ns # Note the different remote volume name.
end-volume

volume afr1
 type cluster/afr
 subvolumes brick1 brick2
end-volume

volume afr2
 type cluster/afr
 subvolumes brick3 brick4
end-volume

volume afr3
 type cluster/afr
 subvolumes brick5 brick6
end-volume

volume afr-ns
 type cluster/afr
 subvolumes brick-ns1 brick-ns2
end-volume

volume unify
 type cluster/unify
 subvolumes afr1 afr2 afr3
 option namespace afr-ns

 # use the ALU scheduler
 option scheduler alu

 # This option makes brick5 read-only, so that no new files are created on it.
 ##option alu.read-only-subvolumes brick5##

 # Don't create files on a volume with less than 10% free disk space
 option alu.limits.min-free-disk  10%

 # Don't create files on a volume with more than 10000 files open
 option alu.limits.max-open-files 10000

 # When deciding where to place a file, first look at the disk-usage, then at
 # read-usage, write-usage, open files, and finally the disk-speed-usage.
 option alu.order disk-usage:read-usage:write-usage:open-files-usage:disk-speed-usage

 # Kick in if the discrepancy in disk-usage between volumes is more than 2GB
 option alu.disk-usage.entry-threshold 2GB

 # Don't stop writing to the least-used volume until the discrepancy is 1988MB
 option alu.disk-usage.exit-threshold  60MB

 # Kick in if the discrepancy in open files is 1024
 option alu.open-files-usage.entry-threshold 1024

 # Don't stop until 992 files have been written to the least-used volume
 option alu.open-files-usage.exit-threshold 32

 # Kick in when the read-usage discrepancy is 20%
 option alu.read-usage.entry-threshold 20%

 # Don't stop until the discrepancy has been reduced to 16% (20% - 4%)
 option alu.read-usage.exit-threshold 4%

 # Kick in when the write-usage discrepancy is 20%
 option alu.write-usage.entry-threshold 20%

 # Don't stop until the discrepancy has been reduced to 16% (20% - 4%)
 option alu.write-usage.exit-threshold 4%

 # Refresh the statistics used for decision-making every 10 seconds
 option alu.stat-refresh.interval 10sec

 # Refresh the statistics used for decision-making after creating 10 files
 # option alu.stat-refresh.num-file-create 10
end-volume


#writebehind improves write performance a lot
volume writebehind
  type performance/write-behind
  option aggregate-size 131072 # in bytes
  subvolumes unify
end-volume
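
The figures quoted in the ALU comments above (1988MB, 992 files, 16%) all come from subtracting the exit-threshold from the entry-threshold. The snippet below is only a worked check of that arithmetic as the comments describe it, not a statement of how the scheduler is implemented; the helper name stop_point is invented for illustration, and it assumes 1GB = 1024MB, which is what the 1988MB figure implies.

```python
MB_PER_GB = 1024  # assumption: binary units, as implied by the 1988MB comment


def stop_point(entry_threshold, exit_threshold):
    """Discrepancy at which balancing stops, per the config comments."""
    return entry_threshold - exit_threshold


print(stop_point(2 * MB_PER_GB, 60))  # disk-usage, in MB -> 1988
print(stop_point(1024, 32))           # open files -> 992
print(stop_point(20, 4))              # read/write usage, in percent -> 16
```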

Has anyone seen this issue before? Any suggestions?

Thanks,
-elb-

