Re: [Gluster-users] Rebuild Distributed/Replicated Setup

2011-05-19 Thread Pranith Kumar. Karampuri
hi Remi,
This is a classic case of split-brain: in your getfattr output each brick
blames the other (client-0 is non-zero on web02, client-1 is non-zero on
web01), so self-heal cannot pick a source automatically. Check whether the
md5sums of the files in question match on both web01 and web02. If they do,
you can safely reset the xattrs of the file on one of the replicas to
trigger self-heal. If the md5sums don't match, you will have to select the
machine you want to keep as the source (in your case web01), go to the
other machine (in your case web02) and execute the following commands:

1) sudo setfattr -n trusted.afr.shared-application-data-client-0 -v
0sAAAAAAAAAAAAAAAA file-name
2) then do a find (or stat) on the file-name;
 that will trigger self-heal, and both copies will be in sync again.
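
To make this concrete, here is a minimal sketch of the whole procedure for
one of the affected files, run from web01. The brick path comes from your
mails; the /mnt/shared client mount point, the passwordless ssh, and the
all-zero value 0sAAAAAAAAAAAAAAAA (base64 for twelve zero bytes) are
assumptions on my part, so verify them on one file before scripting this:

#!/bin/bash
# Sketch: compare the two replicas, zero web02's pending changelog for
# web01 (client-0), then access the file via a client mount to heal it.
BRICK=/var/glusterfs/bricks/shared   # brick path, from this thread
MOUNT=/mnt/shared                    # client mount point (assumption)
FILE=agc/production/log/809223185/contact.log

sum1=$(md5sum "$BRICK/$FILE" | awk '{print $1}')            # on web01
sum2=$(ssh web02 md5sum "$BRICK/$FILE" | awk '{print $1}')  # on web02
[ "$sum1" = "$sum2" ] && echo "copies match; either replica may be reset"

# web01 is the source, so clear web02's accusation of web01:
ssh web02 setfattr -n trusted.afr.shared-application-data-client-0 \
    -v 0sAAAAAAAAAAAAAAAA "$BRICK/$FILE"

# Look the file up through the GlusterFS mount to trigger self-heal:
stat "$MOUNT/$FILE" > /dev/null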

Self-heal can cause a performance hit if you trigger it for all the files
at once and they are BIG files, so in that case trigger the heals one
after the other, letting each complete before starting the next.
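
For example, a small loop like the following (split-files.txt is an
assumed list of affected paths relative to the mount, one per line) keeps
only one heal in flight at a time, since the triggering lookup generally
does not return until that file's heal has completed:

# Run on a client after the xattrs have been reset:
while read -r f; do
    echo "healing $f"
    stat "/mnt/shared/$f" > /dev/null
done < split-files.txt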

Let me know if you need any more help with this. Removing the whole of
web02's data and triggering a total self-heal is a very expensive
operation; I wouldn't do that.

Pranith.
- Original Message -
From: Remi Broemeling r...@goclio.com
To: Pranith Kumar. Karampuri prani...@gluster.com
Cc: gluster-users@gluster.org
Sent: Wednesday, May 18, 2011 8:21:33 PM
Subject: Re: [Gluster-users] Rebuild Distributed/Replicated Setup

[quoted getfattr output snipped]

Re: [Gluster-users] Rebuild Distributed/Replicated Setup

2011-05-19 Thread Pranith Kumar. Karampuri
Remi,
 Sorry, I think you want to keep web02 as the source and web01 as the sink,
so the commands need to be executed on web01:

1) sudo setfattr -n trusted.afr.shared-application-data-client-1 -v
0sAAAAAAAAAAAAAAAA file-name
2) then do a find (or stat) on the file-name
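
In loop form on web01, under the same assumptions as in my previous mail
(the all-zero base64 value, a /mnt/shared client mount, and an assumed
split-files.txt list of affected paths):

# On web01 (the sink): clear web01's accusation of web02, then look
# each file up through the mount to pull web02's copy across.
while read -r f; do
    sudo setfattr -n trusted.afr.shared-application-data-client-1 \
        -v 0sAAAAAAAAAAAAAAAA "/var/glusterfs/bricks/shared/$f"
    stat "/mnt/shared/$f" > /dev/null
done < split-files.txt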

Thanks
Pranith

- Original Message -
From: Pranith Kumar. Karampuri prani...@gluster.com
To: Remi Broemeling r...@goclio.com
Cc: gluster-users@gluster.org
Sent: Thursday, May 19, 2011 2:14:52 PM
Subject: Re: [Gluster-users] Rebuild Distributed/Replicated Setup

[quoted earlier messages snipped]

Re: [Gluster-users] Rebuild Distributed/Replicated Setup

2011-05-19 Thread Remi Broemeling
Hi Pranith,

OK, thanks, I can do that.

Is there any sign of _how_ we got into this situation?  Anything that I can
go back and look for in the logs that might tell us more about how this
happened and how we can prevent it from happening again?

Thanks again,

Remi

On Thu, May 19, 2011 at 03:12, Pranith Kumar. Karampuri 
prani...@gluster.com wrote:

 Remi,
 Sorry, I think you want to keep web02 as the source and web01 as the
 sink, so the commands need to be executed on web01:

 1) sudo setfattr -n trusted.afr.shared-application-data-client-1 -v
 0sAAAAAAAAAAAAAAAA file-name
 2) then do a find (or stat) on the file-name

 Thanks
 Pranith

 [earlier quoted messages snipped]

Re: [Gluster-users] Rebuild Distributed/Replicated Setup

2011-05-18 Thread Pranith Kumar. Karampuri
hi Remi,
  It seems split-brain has been detected on the following files:
/agc/production/log/809223185/contact.log
/agc/production/log/809223185/event.log
/agc/production/log/809223635/contact.log
/agc/production/log/809224061/contact.log
/agc/production/log/809224321/contact.log
/agc/production/log/809215319/event.log

Could you give the output of the following command for each file above, on
both bricks in the replica pair?

getfattr -d -m trusted.afr* filepath
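
For reference, each trusted.afr.* value is a 12-byte changelog: three
big-endian 32-bit counters of pending data, metadata and entry operations
recorded against the named subvolume. Hex output is easier to read than
base64; an illustrative run (the path and the counter value are made up
for the example):

getfattr -d -m trusted.afr -e hex /var/glusterfs/bricks/shared/some/file
# trusted.afr.shared-application-data-client-0=0x000000000000000000000000
# trusted.afr.shared-application-data-client-1=0x000000050000000000000000
#   = 5 pending data operations against client-1; when each brick shows a
#     non-zero counter for the other, the file is in split-brain.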

Thanks
Pranith

- Original Message -
From: Remi Broemeling r...@goclio.com
To: gluster-users@gluster.org
Sent: Tuesday, May 17, 2011 9:02:44 PM
Subject: Re: [Gluster-users] Rebuild Distributed/Replicated Setup


Hi Pranith. Sure, here is a pastebin sampling of logs from one of the hosts: 
http://pastebin.com/1U1ziwjC 


[earlier quoted messages snipped]
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Rebuild Distributed/Replicated Setup

2011-05-18 Thread Remi Broemeling
Sure,

These files are just a sampling -- a lot of other files are showing the same
split-brain behaviour.

[14:42:45][root@web01:/var/glusterfs/bricks/shared]# getfattr -d -m
trusted.afr* agc/production/log/809223185/contact.log
# file: agc/production/log/809223185/contact.log
trusted.afr.shared-application-data-client-0=0s
trusted.afr.shared-application-data-client-1=0sBQAA
[14:45:15][root@web02:/var/glusterfs/bricks/shared]# getfattr -d -m
trusted.afr* agc/production/log/809223185/contact.log
# file: agc/production/log/809223185/contact.log
trusted.afr.shared-application-data-client-0=0sAAACOwAA
trusted.afr.shared-application-data-client-1=0s

[14:42:53][root@web01:/var/glusterfs/bricks/shared]# getfattr -d -m
trusted.afr* agc/production/log/809223185/event.log
# file: agc/production/log/809223185/event.log
trusted.afr.shared-application-data-client-0=0s
trusted.afr.shared-application-data-client-1=0sDgAA
[14:45:24][root@web02:/var/glusterfs/bricks/shared]# getfattr -d -m
trusted.afr* agc/production/log/809223185/event.log
# file: agc/production/log/809223185/event.log
trusted.afr.shared-application-data-client-0=0sAAAGXQAA
trusted.afr.shared-application-data-client-1=0s

[14:43:02][root@web01:/var/glusterfs/bricks/shared]# getfattr -d -m
trusted.afr* agc/production/log/809223635/contact.log
# file: agc/production/log/809223635/contact.log
trusted.afr.shared-application-data-client-0=0s
trusted.afr.shared-application-data-client-1=0sCgAA
[14:45:28][root@web02:/var/glusterfs/bricks/shared]# getfattr -d -m
trusted.afr* agc/production/log/809223635/contact.log
# file: agc/production/log/809223635/contact.log
trusted.afr.shared-application-data-client-0=0sAAAELQAA
trusted.afr.shared-application-data-client-1=0s

[14:43:39][root@web01:/var/glusterfs/bricks/shared]# getfattr -d -m
trusted.afr* agc/production/log/809224061/contact.log
# file: agc/production/log/809224061/contact.log
trusted.afr.shared-application-data-client-0=0s
trusted.afr.shared-application-data-client-1=0sCQAA
[14:45:32][root@web02:/var/glusterfs/bricks/shared]# getfattr -d -m
trusted.afr* agc/production/log/809224061/contact.log
# file: agc/production/log/809224061/contact.log
trusted.afr.shared-application-data-client-0=0sAAAD+AAA
trusted.afr.shared-application-data-client-1=0s

[14:43:42][root@web01:/var/glusterfs/bricks/shared]# getfattr -d -m
trusted.afr* agc/production/log/809224321/contact.log
# file: agc/production/log/809224321/contact.log
trusted.afr.shared-application-data-client-0=0s
trusted.afr.shared-application-data-client-1=0sCAAA
[14:45:37][root@web02:/var/glusterfs/bricks/shared]# getfattr -d -m
trusted.afr* agc/production/log/809224321/contact.log
# file: agc/production/log/809224321/contact.log
trusted.afr.shared-application-data-client-0=0sAAAERAAA
trusted.afr.shared-application-data-client-1=0s

[14:43:45][root@web01:/var/glusterfs/bricks/shared]# getfattr -d -m
trusted.afr* agc/production/log/809215319/event.log
# file: agc/production/log/809215319/event.log
trusted.afr.shared-application-data-client-0=0s
trusted.afr.shared-application-data-client-1=0sBwAA
[14:45:45][root@web02:/var/glusterfs/bricks/shared]# getfattr -d -m
trusted.afr* agc/production/log/809215319/event.log
# file: agc/production/log/809215319/event.log
trusted.afr.shared-application-data-client-0=0sAAAC/QAA
trusted.afr.shared-application-data-client-1=0s

On Wed, May 18, 2011 at 01:31, Pranith Kumar. Karampuri 
prani...@gluster.com wrote:

 [quoted request snipped]

Re: [Gluster-users] Rebuild Distributed/Replicated Setup

2011-05-17 Thread Remi Broemeling
Hi Pranith.  Sure, here is a pastebin sampling of logs from one of the
hosts: http://pastebin.com/1U1ziwjC

On Mon, May 16, 2011 at 20:48, Pranith Kumar. Karampuri 
prani...@gluster.com wrote:

 hi Remi,
 Would it be possible to post the logs from the client, so that we can find
 what issue you are running into?

 Pranith
 [quoted original message snipped]




-- 
Remi Broemeling
System Administrator
Clio - Practice Management Simplified
1-888-858-2546 x(2^5) | r...@goclio.com
www.goclio.com | blog | twitter | facebook
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] Rebuild Distributed/Replicated Setup

2011-05-16 Thread Remi Broemeling
Hi,

I've got a distributed/replicated GlusterFS v3.1.2 (installed via RPM) setup
across two servers (web01 and web02) with the following vol config:

volume shared-application-data-client-0
type protocol/client
option remote-host web01
option remote-subvolume /var/glusterfs/bricks/shared
option transport-type tcp
option ping-timeout 5
end-volume

volume shared-application-data-client-1
type protocol/client
option remote-host web02
option remote-subvolume /var/glusterfs/bricks/shared
option transport-type tcp
option ping-timeout 5
end-volume

volume shared-application-data-replicate-0
type cluster/replicate
subvolumes shared-application-data-client-0
shared-application-data-client-1
end-volume

volume shared-application-data-write-behind
type performance/write-behind
subvolumes shared-application-data-replicate-0
end-volume

volume shared-application-data-read-ahead
type performance/read-ahead
subvolumes shared-application-data-write-behind
end-volume

volume shared-application-data-io-cache
type performance/io-cache
subvolumes shared-application-data-read-ahead
end-volume

volume shared-application-data-quick-read
type performance/quick-read
subvolumes shared-application-data-io-cache
end-volume

volume shared-application-data-stat-prefetch
type performance/stat-prefetch
subvolumes shared-application-data-quick-read
end-volume

volume shared-application-data
type debug/io-stats
subvolumes shared-application-data-stat-prefetch
end-volume

In total, four servers mount this via GlusterFS FUSE. For whatever reason
(I'm really not sure why), the GlusterFS filesystem has run into a bit of
a split-brain nightmare (although to my knowledge an actual split-brain
situation has never occurred in this environment), and I have been seeing
persistent corruption issues across the filesystem as well as complaints
that the filesystem cannot be self-healed.
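
Each of the four clients mounts the volume along these lines (the mount
point here is illustrative):

mount -t glusterfs web01:/shared-application-data /mnt/shared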

What I would like to do is completely empty one of the two servers (here I
am trying to empty server web01), making the other one (in this case web02)
the authoritative source for the data, and then have web01 completely
rebuild its mirror directly from web02.

What's the easiest/safest way to do this? Is there a command that I can run
that will force web01 to re-initialize its mirror directly from web02 (and
thus completely eradicate all of the split-brain errors and data
inconsistencies)?

Thanks!
-- 
Remi Broemeling
System Administrator
Clio - Practice Management Simplified
1-888-858-2546 x(2^5) | r...@goclio.com
www.goclio.com | blog | twitter | facebook
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Rebuild Distributed/Replicated Setup

2011-05-16 Thread Pranith Kumar. Karampuri
hi Remi,
Would it be possible to post the logs from the client, so that we can find
what issue you are running into?

Pranith
- Original Message -
From: Remi Broemeling r...@goclio.com
To: gluster-users@gluster.org
Sent: Monday, May 16, 2011 10:47:33 PM
Subject: [Gluster-users] Rebuild Distributed/Replicated Setup


[quoted original message snipped]
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users