[Gluster-users] GlusterFS v3.1.5 Stable Configuration
Hi,

We've been using GlusterFS to manage shared files across a number of hosts for the past few months and have run into a few problems -- roughly one every month. The problems are occasionally extremely difficult to track down to GlusterFS, as they often masquerade as something else in our application log files. So far we have seen one instance of split-brain, a number of instances of "stuck" files (i.e. any stat call would block for an hour and then time out with an error), and a couple of instances of "ghost" files (the file is removed, but GlusterFS continues to show it for a little while until the cache times out).

We do *not* place a large amount of load on GlusterFS, and don't have any significant performance issues to deal with. With that in mind, the core question of this e-mail is: "How can I modify our configuration to be the absolute *most* stable (problem free) that it can be, even if it means sacrificing performance?"

In sum, I don't have any particular performance concerns at this moment, but the GlusterFS bugs that we encounter are quite problematic -- so I'm willing to entertain any suggested stability improvement, even if it has a negative impact on performance. I suspect that the answer here is just "turn off all performance-enhancing gluster caching", but I wanted to validate that this is actually true before going so far. Please suggest anything that could be done to improve the stability of our setup. As an aside, I think this would be an advantageous thing to add to the FAQ: right now the FAQ contains information for *performance* tuning, but not for *stability* tuning.

Thanks for any help that you can give/suggestions that you can make. Here are the details of our environment:

OS: RHEL5
GlusterFS Version: 3.1.5
Mount method: glusterfs/FUSE
GlusterFS Servers: web01, web02
GlusterFS Clients: web01, web02, dj01, dj02

$ sudo gluster volume info

Volume Name: shared-application-data
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: web01:/var/glusterfs/bricks/shared
Brick2: web02:/var/glusterfs/bricks/shared
Options Reconfigured:
network.ping-timeout: 5
nfs.disable: on

Configuration File Contents:

*/etc/glusterd/vols/shared-application-data/shared-application-data-fuse.vol*

volume shared-application-data-client-0
    type protocol/client
    option remote-host web01
    option remote-subvolume /var/glusterfs/bricks/shared
    option transport-type tcp
    option ping-timeout 5
end-volume

volume shared-application-data-client-1
    type protocol/client
    option remote-host web02
    option remote-subvolume /var/glusterfs/bricks/shared
    option transport-type tcp
    option ping-timeout 5
end-volume

volume shared-application-data-replicate-0
    type cluster/replicate
    subvolumes shared-application-data-client-0 shared-application-data-client-1
end-volume

volume shared-application-data-write-behind
    type performance/write-behind
    subvolumes shared-application-data-replicate-0
end-volume

volume shared-application-data-read-ahead
    type performance/read-ahead
    subvolumes shared-application-data-write-behind
end-volume

volume shared-application-data-io-cache
    type performance/io-cache
    subvolumes shared-application-data-read-ahead
end-volume

volume shared-application-data-quick-read
    type performance/quick-read
    subvolumes shared-application-data-io-cache
end-volume

volume shared-application-data-stat-prefetch
    type performance/stat-prefetch
    subvolumes shared-application-data-quick-read
end-volume

volume shared-application-data
    type debug/io-stats
    subvolumes shared-application-data-stat-prefetch
end-volume

*/etc/glusterfs/glusterd.vol*

volume management
    type mgmt/glusterd
    option working-directory /etc/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
end-volume

--
Remi Broemeling
System Administrator
Clio - Practice Management Simplified
1-888-858-2546 x(2^5) | r...@goclio.com
www.goclio.com | blog <http://www.goclio.com/blog> | twitter <http://www.twitter.com/goclio> | facebook <http://www.facebook.com/goclio>

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
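For reference, the "turn off all performance-enhancing caching" experiment maps onto the translator stack shown in the volfile above and can be attempted from the CLI. This is a sketch only -- the option names below are the commonly documented ones, but whether each is settable in a given 3.1.x build should be confirmed against the admin guide for that release:

$ sudo gluster volume set shared-application-data performance.quick-read off
$ sudo gluster volume set shared-application-data performance.stat-prefetch off
$ sudo gluster volume set shared-application-data performance.io-cache off
$ sudo gluster volume set shared-application-data performance.read-ahead off
$ sudo gluster volume set shared-application-data performance.write-behind off

After each change, `sudo gluster volume info shared-application-data` should list the new value under "Options Reconfigured", and connected clients receive the regenerated volfile.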
Re: [Gluster-users] Rebuild Distributed/Replicated Setup
Hi Pranith,

OK, thanks, I can do that. Is there any sign of _how_ we got into this situation? Anything that I can go back and look for in the logs that might tell us more about how this happened and how we can prevent it from happening again?

Thanks again,
Remi

On Thu, May 19, 2011 at 03:12, Pranith Kumar. Karampuri <prani...@gluster.com> wrote:

> Remi,
>     Sorry, I think you want to keep web02 as the source and web01 as the
> sink, so the commands need to be executed on web01:
>
> 1) sudo setfattr -n trusted.afr.shared-application-data-client-1 -v
> 0s<zeroed value> <file>.
> 2) then do a find on the <file>,
>
> Thanks
> Pranith
>
> ----- Original Message -----
> From: "Pranith Kumar. Karampuri"
> To: "Remi Broemeling"
> Cc: gluster-users@gluster.org
> Sent: Thursday, May 19, 2011 2:14:52 PM
> Subject: Re: [Gluster-users] Rebuild Distributed/Replicated Setup
>
> hi Remi,
>     This is a classic case of split-brain. See if the md5sum of the files in
> question matches on both web01, web02. If yes, you can safely reset the xattr
> of the file on one of the replicas to trigger self-heal. If the md5sums don't
> match, you will have to select the machine you want to keep as the source
> (in your case it is web01), go to the other machine (in your case it is
> web02) and execute the following commands:
>
> 1) sudo setfattr -n trusted.afr.shared-application-data-client-0 -v
> 0s<zeroed value> <file>.
> 2) then do a find on the <file>,
> that will trigger self-heal and both copies will be in replication again.
>
> Self-heal can cause a performance hit if you trigger self-heal for all the
> files at once if they are BIG files, so in that case trigger them one after
> the other upon completion.
>
> Let me know if you need any more help with this. Removing the whole web02
> data and triggering a total self-heal is a very expensive operation; I
> wouldn't do that.
>
> Pranith.
> ----- Original Message -----
> From: "Remi Broemeling"
> To: "Pranith Kumar. Karampuri"
> Cc: gluster-users@gluster.org
> Sent: Wednesday, May 18, 2011 8:21:33 PM
> Subject: Re: [Gluster-users] Rebuild Distributed/Replicated Setup
>
> Sure,
>
> These files are just a sampling -- a lot of other files are showing the
> same "split-brain" behaviour.
> [getfattr output for the affected files snipped -- quoted in full in the
> original message below]
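Spelling that recipe out concretely for one of the affected files -- a sketch only: the brick and mount paths below are illustrative, and 0sAAAAAAAAAAAAAAAA is simply the base64 encoding of a 12-byte all-zero changelog (verify the attribute names against your own getfattr output before running anything):

# 1) Confirm whether the two copies actually differ (run on web01 and web02):
sudo md5sum /var/glusterfs/bricks/shared/agc/production/log/809223185/contact.log

# 2) On web01 (the chosen sink), clear its accusation against web02:
sudo setfattr -n trusted.afr.shared-application-data-client-1 \
    -v 0sAAAAAAAAAAAAAAAA \
    /var/glusterfs/bricks/shared/agc/production/log/809223185/contact.log

# 3) Stat the file through a client mount (path assumed) to trigger self-heal:
stat /mnt/shared/agc/production/log/809223185/contact.log > /dev/null

Once both changelogs no longer blame each other, the next lookup lets replicate heal the file from the remaining good copy.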
Re: [Gluster-users] Rebuild Distributed/Replicated Setup
Sure,

These files are just a sampling -- a lot of other files are showing the same "split-brain" behaviour.

[14:42:45][root@web01:/var/glusterfs/bricks/shared]# getfattr -d -m "trusted.afr*" agc/production/log/809223185/contact.log
# file: agc/production/log/809223185/contact.log
trusted.afr.shared-application-data-client-0=0s
trusted.afr.shared-application-data-client-1=0sBQAA

[14:45:15][root@web02:/var/glusterfs/bricks/shared]# getfattr -d -m "trusted.afr*" agc/production/log/809223185/contact.log
# file: agc/production/log/809223185/contact.log
trusted.afr.shared-application-data-client-0=0sAAACOwAA
trusted.afr.shared-application-data-client-1=0s

[14:42:53][root@web01:/var/glusterfs/bricks/shared]# getfattr -d -m "trusted.afr*" agc/production/log/809223185/event.log
# file: agc/production/log/809223185/event.log
trusted.afr.shared-application-data-client-0=0s
trusted.afr.shared-application-data-client-1=0sDgAA

[14:45:24][root@web02:/var/glusterfs/bricks/shared]# getfattr -d -m "trusted.afr*" agc/production/log/809223185/event.log
# file: agc/production/log/809223185/event.log
trusted.afr.shared-application-data-client-0=0sAAAGXQAA
trusted.afr.shared-application-data-client-1=0s

[14:43:02][root@web01:/var/glusterfs/bricks/shared]# getfattr -d -m "trusted.afr*" agc/production/log/809223635/contact.log
# file: agc/production/log/809223635/contact.log
trusted.afr.shared-application-data-client-0=0s
trusted.afr.shared-application-data-client-1=0sCgAA

[14:45:28][root@web02:/var/glusterfs/bricks/shared]# getfattr -d -m "trusted.afr*" agc/production/log/809223635/contact.log
# file: agc/production/log/809223635/contact.log
trusted.afr.shared-application-data-client-0=0sAAAELQAA
trusted.afr.shared-application-data-client-1=0s

[14:43:39][root@web01:/var/glusterfs/bricks/shared]# getfattr -d -m "trusted.afr*" agc/production/log/809224061/contact.log
# file: agc/production/log/809224061/contact.log
trusted.afr.shared-application-data-client-0=0s
trusted.afr.shared-application-data-client-1=0sCQAA

[14:45:32][root@web02:/var/glusterfs/bricks/shared]# getfattr -d -m "trusted.afr*" agc/production/log/809224061/contact.log
# file: agc/production/log/809224061/contact.log
trusted.afr.shared-application-data-client-0=0sAAAD+AAA
trusted.afr.shared-application-data-client-1=0s

[14:43:42][root@web01:/var/glusterfs/bricks/shared]# getfattr -d -m "trusted.afr*" agc/production/log/809224321/contact.log
# file: agc/production/log/809224321/contact.log
trusted.afr.shared-application-data-client-0=0s
trusted.afr.shared-application-data-client-1=0sCAAA

[14:45:37][root@web02:/var/glusterfs/bricks/shared]# getfattr -d -m "trusted.afr*" agc/production/log/809224321/contact.log
# file: agc/production/log/809224321/contact.log
trusted.afr.shared-application-data-client-0=0sAAAERAAA
trusted.afr.shared-application-data-client-1=0s

[14:43:45][root@web01:/var/glusterfs/bricks/shared]# getfattr -d -m "trusted.afr*" agc/production/log/809215319/event.log
# file: agc/production/log/809215319/event.log
trusted.afr.shared-application-data-client-0=0s
trusted.afr.shared-application-data-client-1=0sBwAA

[14:45:45][root@web02:/var/glusterfs/bricks/shared]# getfattr -d -m "trusted.afr*" agc/production/log/809215319/event.log
# file: agc/production/log/809215319/event.log
trusted.afr.shared-application-data-client-0=0sAAAC/QAA
trusted.afr.shared-application-data-client-1=0s

On Wed, May 18, 2011 at 01:31, Pranith Kumar. Karampuri <prani...@gluster.com> wrote:

> hi Remi,
>     It seems the split-brain is detected on following files:
> /agc/production/log/809223185/contact.log
> /agc/production/log/809223185/event.log
> /agc/production/log/809223635/contact.log
> /agc/production/log/809224061/contact.log
> /agc/production/log/809224321/contact.log
> /agc/production/log/809215319/event.log
>
> Could you give the output of the following command for each file above on
> both the bricks in the replica pair:
>
> getfattr -d -m "trusted.afr*" <file>
>
> Thanks
> Pranith
>
> ----- Original Message -----
> From: "Remi Broemeling"
> To: gluster-users@gluster.org
> Sent: Tuesday, May 17, 2011 9:02:44 PM
> Subject: Re: [Gluster-users] Rebuild Distributed/Replicated Setup
>
> Hi Pranith. Sure, here is a pastebin sampling of logs from one of the
> hosts: http://pastebin.com/1U1ziwjC
>
> On Mon, May 16, 2011 at 20:48, Pranith Kumar. Karampuri <
> prani...@gluster.com> wrote:
>
> > hi Remi
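A note on reading these attributes: each trusted.afr.<volume>-client-N value is a 12-byte changelog of pending operations that the local brick holds against subvolume N -- three big-endian 32-bit counters for data, metadata, and entry operations -- and the 0s prefix marks a base64-encoded value. The values above appear truncated in the archive, so the decode below uses a plausible full-length value of the same shape (an assumption, not taken from the thread):

$ echo "AAACOwAAAAAAAAAA" | base64 -d | xxd
00000000: 0000 023b 0000 0000 0000 0000       ...;........

Here the first counter is 0x23b, i.e. 571 pending data operations blamed on the other brick. Split-brain is the situation shown above: each brick carries a non-zero count against the other, so neither copy can be chosen as the source automatically.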
Re: [Gluster-users] Rebuild Distributed/Replicated Setup
Hi Pranith. Sure, here is a pastebin sampling of logs from one of the hosts: http://pastebin.com/1U1ziwjC

On Mon, May 16, 2011 at 20:48, Pranith Kumar. Karampuri <prani...@gluster.com> wrote:

> hi Remi,
>     Would it be possible to post the logs on the client, so that we can find
> what issue you are running into.
>
> Pranith
> ----- Original Message -----
> From: "Remi Broemeling"
> To: gluster-users@gluster.org
> Sent: Monday, May 16, 2011 10:47:33 PM
> Subject: [Gluster-users] Rebuild Distributed/Replicated Setup
>
> Hi,
>
> I've got a distributed/replicated GlusterFS v3.1.2 (installed via RPM)
> setup across two servers (web01 and web02) with the following vol config:
>
> [volume configuration snipped -- identical to the one quoted in full in
> the original message below]
>
> In total, four servers mount this via GlusterFS FUSE. For whatever reason
> (I'm really not sure why), the GlusterFS filesystem has run into a bit of a
> split-brain nightmare (although to my knowledge an actual split-brain
> situation has never occurred in this environment), and I have been seeing
> corruption issues across the filesystem as well as complaints that the
> filesystem cannot be self-healed.
>
> What I would like to do is completely empty one of the two servers (here I
> am trying to empty server web01), making the other one (in this case web02)
> the authoritative source for the data, and then have web01 completely
> rebuild its mirror directly from web02.
>
> What's the easiest/safest way to do this? Is there a command that I can run
> that will force web01 to re-initialize its mirror directly from web02 (and
> thus completely eradicate all of the split-brain errors and data
> inconsistencies)?
>
> Thanks!
> --
> Remi Broemeling
> System Administrator
> Clio - Practice Management Simplified
> 1-888-858-2546 x(2^5) | r...@goclio.com
> www.goclio.com | blog | twitter | facebook
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

--
Remi Broemeling
System Administrator
Clio - Practice Management Simplified
1-888-858-2546 x(2^5) | r...@goclio.com
www.goclio.com | blog <http://www.goclio.com/blog> | twitter <http://www.twitter.com/goclio> | facebook <http://www.facebook.com/goclio>

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
[Gluster-users] Rebuild Distributed/Replicated Setup
Hi,

I've got a distributed/replicated GlusterFS v3.1.2 (installed via RPM) setup across two servers (web01 and web02) with the following vol config:

volume shared-application-data-client-0
    type protocol/client
    option remote-host web01
    option remote-subvolume /var/glusterfs/bricks/shared
    option transport-type tcp
    option ping-timeout 5
end-volume

volume shared-application-data-client-1
    type protocol/client
    option remote-host web02
    option remote-subvolume /var/glusterfs/bricks/shared
    option transport-type tcp
    option ping-timeout 5
end-volume

volume shared-application-data-replicate-0
    type cluster/replicate
    subvolumes shared-application-data-client-0 shared-application-data-client-1
end-volume

volume shared-application-data-write-behind
    type performance/write-behind
    subvolumes shared-application-data-replicate-0
end-volume

volume shared-application-data-read-ahead
    type performance/read-ahead
    subvolumes shared-application-data-write-behind
end-volume

volume shared-application-data-io-cache
    type performance/io-cache
    subvolumes shared-application-data-read-ahead
end-volume

volume shared-application-data-quick-read
    type performance/quick-read
    subvolumes shared-application-data-io-cache
end-volume

volume shared-application-data-stat-prefetch
    type performance/stat-prefetch
    subvolumes shared-application-data-quick-read
end-volume

volume shared-application-data
    type debug/io-stats
    subvolumes shared-application-data-stat-prefetch
end-volume

In total, four servers mount this via GlusterFS FUSE. For whatever reason (I'm really not sure why), the GlusterFS filesystem has run into a bit of a split-brain nightmare (although to my knowledge an actual split-brain situation has never occurred in this environment), and I have been seeing corruption issues across the filesystem as well as complaints that the filesystem cannot be self-healed.

What I would like to do is completely empty one of the two servers (here I am trying to empty server web01), making the other one (in this case web02) the authoritative source for the data, and then have web01 completely rebuild its mirror directly from web02.

What's the easiest/safest way to do this? Is there a command that I can run that will force web01 to re-initialize its mirror directly from web02 (and thus completely eradicate all of the split-brain errors and data inconsistencies)?

Thanks!

--
Remi Broemeling
System Administrator
Clio - Practice Management Simplified
1-888-858-2546 x(2^5) | r...@goclio.com
www.goclio.com | blog <http://www.goclio.com/blog> | twitter <http://www.twitter.com/goclio> | facebook <http://www.facebook.com/goclio>

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
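For the record, the 3.1-era admin guide's way to trigger self-heal across an entire replicated volume is a recursive walk from a client mount; a sketch, assuming the volume is mounted at /mnt/shared (illustrative path):

find /mnt/shared -noleaf -print0 | xargs --null stat > /dev/null

Every lookup the walk generates gives the replicate translator a chance to heal that file from the good copy. Note, though, that the replies above recommend the per-file xattr reset over emptying a brick entirely: re-mirroring everything at once is an expensive operation, and the walk cannot resolve files that are genuinely in split-brain (mutual blame) until their changelog xattrs are reset first.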
[Gluster-users] Glut of "pure path resolution" warnings
Hi!

I'm seeing a lot of "pure path resolution" warnings in my brick logfile on two servers, and wanted to enquire how I could fix the issue or otherwise get rid of them.

I'm using GlusterFS to replicate data across two systems. I'm accessing the data via the GlusterFS FUSE plugin, both from the two hosts involved (web01/web02) as well as from a few other hosts that need access to the data.

What I'm seeing is an absolute ton of these:

[2011-04-29 22:22:21.35197] W [server-resolve.c:565:server_resolve] shared-application-data-server: pure path resolution for /PATH/FOO/BAR (OPEN)
[2011-04-29 22:22:22.432584] W [server-resolve.c:565:server_resolve] shared-application-data-server: pure path resolution for /PATH/FOO/BAR2 (OPEN)
[2011-04-29 22:22:24.507079] W [server-resolve.c:565:server_resolve] shared-application-data-server: pure path resolution for /PATH/FOO/BAR3 (INODELK)

The setup/volfile is this:

Given volfile:
+------------------------------------------------------------------------------+
  1: volume shared-application-data-client-0
  2:     type protocol/client
  3:     option remote-host web01
  4:     option remote-subvolume /var/glusterfs/bricks/shared
  5:     option transport-type tcp
  6:     option ping-timeout 5
  7: end-volume
  8:
  9: volume shared-application-data-client-1
 10:     type protocol/client
 11:     option remote-host web02
 12:     option remote-subvolume /var/glusterfs/bricks/shared
 13:     option transport-type tcp
 14:     option ping-timeout 5
 15: end-volume
 16:
 17: volume shared-application-data-replicate-0
 18:     type cluster/replicate
 19:     subvolumes shared-application-data-client-0 shared-application-data-client-1
 20: end-volume
 21:
 22: volume shared-application-data-write-behind
 23:     type performance/write-behind
 24:     subvolumes shared-application-data-replicate-0
 25: end-volume
 26:
 27: volume shared-application-data-read-ahead
 28:     type performance/read-ahead
 29:     subvolumes shared-application-data-write-behind
 30: end-volume
 31:
 32: volume shared-application-data-io-cache
 33:     type performance/io-cache
 34:     subvolumes shared-application-data-read-ahead
 35: end-volume
 36:
 37: volume shared-application-data-quick-read
 38:     type performance/quick-read
 39:     subvolumes shared-application-data-io-cache
 40: end-volume
 41:
 42: volume shared-application-data-stat-prefetch
 43:     type performance/stat-prefetch
 44:     subvolumes shared-application-data-quick-read
 45: end-volume
 46:
 47: volume shared-application-data
 48:     type debug/io-stats
 49:     subvolumes shared-application-data-stat-prefetch
 50: end-volume
+------------------------------------------------------------------------------+

Any pointers on how to fix the pure path resolution warnings?

--
Remi Broemeling
System Administrator
Clio - Practice Management Simplified
1-888-858-2546 x(2^5) | r...@goclio.com
www.goclio.com | blog <http://www.goclio.com/blog> | twitter <http://www.twitter.com/goclio> | facebook <http://www.facebook.com/goclio>

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
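If the immediate goal is simply to keep these W-level messages out of the brick logs while the root cause is investigated, raising the brick log level should suppress them. A sketch, with the caveat that the availability of this diagnostics option in this exact 3.1.x build is an assumption worth verifying against the admin guide:

$ sudo gluster volume set shared-application-data diagnostics.brick-log-level ERROR

This only changes what gets logged; it does not change the path-resolution behaviour itself.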