Re: [Gluster-users] Split-brain

2014-02-20 Thread William Kwan
Hi all,

I really need some guidance here.  Is this a good direction?
http://www.gluster.org/2012/07/fixing-split-brain-with-glusterfs-3-3/





On Thursday, February 20, 2014 5:43 PM, William Kwan  wrote:
 
Hi all,

Running glusterfs-3.4.2-1.el6.x86_64 on CentOS 6.5

Some smart people screwed up the network connection on my nodes for I don't
know how long, and now I find my GlusterFS volume in split-brain.  I googled
and found different ways to clean this up, but I need some extra help with it.

# gluster volume heal kvm1 info split-brain
Gathering Heal info on volume kvm1 has been successful

Brick mgmt1:/gluster/brick1
Number of entries: 21
at    path on brick
---
2014-02-20 22:33:41 /d058a735-0fca-430a-a3d7-cf0a77097e5d/images/714c56a8-db1d-42d5-bf76-869bd6c87eef/0ea0a280-4c2c-48ab-ad95-8cb48e6cf02b
2014-02-20 22:33:41 /d058a735-0fca-430a-a3d7-cf0a77097e5d/images/20b728b6-dd39-4d2e-a5c0-2dee22df6e95/a6a9b083-b04c-4ac8-86cb-ed4eb697c2c3
2014-02-20 22:33:41 /d058a735-0fca-430a-a3d7-cf0a77097e5d/dom_md/ids
... 

Brick mgmt2:/gluster/brick1
Number of entries: 28
at    path on brick
---
2014-02-20 22:37:38 /d058a735-0fca-430a-a3d7-cf0a77097e5d/dom_md/ids
2014-02-20 22:37:38 /d058a735-0fca-430a-a3d7-cf0a77097e5d/images/714c56a8-db1d-42d5-bf76-869bd6c87eef/0ea0a280-4c2c-48ab-ad95-8cb48e6cf02b
2014-02-20 22:37:38 /d058a735-0fca-430a-a3d7-cf0a77097e5d/images/20b728b6-dd39-4d2e-a5c0-2dee22df6e95/a6a9b083-b04c-4ac8-86cb-ed4eb697c2c3
2014-02-20 22:27:38 /d058a735-0fca-430a-a3d7-cf0a77097e5d/dom_md/ids
2014-02-20 22:27:38 /d058a735-0fca-430a-a3d7-cf0
... 


1. What's the best way to fix this?

2. gluster volume heal doesn't really solve this, right?

3. I'm kind of shooting in the dark since I can't see the data content. The
volume holds VM images. Would picking the latest copies be good enough?
(One way to compare the copies is sketched below.)
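
A sketch for gathering that evidence before choosing, where the path is a
placeholder for one of the entries listed above:

# run as root on each brick host (mgmt1 and mgmt2) against the same file
F=/gluster/brick1/<path-from-the-heal-info-output>
getfattr -d -m . -e hex "$F"   # the trusted.afr.* pending counters show which
                               # replica each brick is accusing
stat "$F"                      # compare size and mtime across the bricks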

Thanks
Will
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Failed cleanup on peer probe tmp file causes volume re-initialization problems

2014-02-20 Thread Chalcogen

Hi everybody,

This is more of a part of a larger wishlist:

I found out that when a peer probe is performed by the user, 
mgmt/glusterd writes a file named after the hostname of the peer in 
question. On successful probes, this file is replaced with a file named 
after the UUID of the glusterd instance on the peer, while a failed 
probe causes the temp file to simply get deleted.


Here's an illustration:

root@someserver:/var/lib/glusterd/peers] gluster peer probe some_non_host &
[1] 25918
root@someserver:/var/lib/glusterd/peers] cat some_non_host
uuid=----
state=0
hostname1=ksome_non_host
root@someserver:/var/lib/glusterd/peers]
root@someserver:/var/lib/glusterd/peers] peer probe: failed: Probe 
returned with unknown errno 107


[1]+  Exit 1  gluster peer probe some_non_host
root@someserver:/var/lib/glusterd/peers] ls
root@someserver:/var/lib/glusterd/peers]

Here's the deal: when, for some reason, glusterd is killed off before it 
gets a chance to clean up the temp file (say, for a peer that really 
doesn't exist), and you then reboot the machine, the leftover temp file 
breaks mgmt/glusterd's recovery graph, and glusterd is unable to 
initialize any of the existing volumes until you delete the temp file 
manually.
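
In the meantime, a rough sketch of a workaround for spotting such leftovers,
based only on the naming convention described above (genuine peers are stored
under their UUID), so with glusterd stopped anything that is not UUID-named is
a candidate for manual cleanup:

cd /var/lib/glusterd/peers || exit 1
for f in *; do
    [ -e "$f" ] || continue
    # genuine peer entries are named after the peer's UUID
    if ! echo "$f" | grep -Eq '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'; then
        echo "non-UUID entry (possible leftover probe file): $f"
    fi
done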


It seems to me that mgmt/glusterd should be able to distinguish between a 
genuine peer and a temp file created during a probe, so that such a temp 
file cannot affect the recovery graph after a reboot. Something like a 
.tmp suffix? Preferably, glusterd should also delete any temp files it 
discovers during recovery at startup.


I reported a bug about this in Bugzilla: 
https://bugzilla.redhat.com/show_bug.cgi?id=1067733.


Thanks,
Anirban
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Split-brain

2014-02-20 Thread Joe Julian


On 02/20/2014 02:43 PM, William Kwan wrote:

Hi all,

Running glusterfs-3.4.2-1.el6.x86_64 on CentOS 6.5

Some smart people screwed up the network connection on my nodes for I 
don't know how long, and now I find my GlusterFS volume in split-brain. 
I googled and found different ways to clean this up, but I need some 
extra help with it.


# gluster volume heal kvm1 info split-brain
Gathering Heal info on volume kvm1 has been successful

Brick mgmt1:/gluster/brick1
Number of entries: 21
at    path on brick
---
2014-02-20 22:33:41 /d058a735-0fca-430a-a3d7-cf0a77097e5d/images/714c56a8-db1d-42d5-bf76-869bd6c87eef/0ea0a280-4c2c-48ab-ad95-8cb48e6cf02b
2014-02-20 22:33:41 /d058a735-0fca-430a-a3d7-cf0a77097e5d/images/20b728b6-dd39-4d2e-a5c0-2dee22df6e95/a6a9b083-b04c-4ac8-86cb-ed4eb697c2c3

2014-02-20 22:33:41 /d058a735-0fca-430a-a3d7-cf0a77097e5d/dom_md/ids
... 

Brick mgmt2:/gluster/brick1
Number of entries: 28
at    path on brick
---
2014-02-20 22:37:38 /d058a735-0fca-430a-a3d7-cf0a77097e5d/dom_md/ids
2014-02-20 22:37:38 /d058a735-0fca-430a-a3d7-cf0a77097e5d/images/714c56a8-db1d-42d5-bf76-869bd6c87eef/0ea0a280-4c2c-48ab-ad95-8cb48e6cf02b
2014-02-20 22:37:38 /d058a735-0fca-430a-a3d7-cf0a77097e5d/images/20b728b6-dd39-4d2e-a5c0-2dee22df6e95/a6a9b083-b04c-4ac8-86cb-ed4eb697c2c3

2014-02-20 22:27:38 /d058a735-0fca-430a-a3d7-cf0a77097e5d/dom_md/ids
2014-02-20 22:27:38 /d058a735-0fca-430a-a3d7-cf0
... 


1. What's the best way to fix this?
Here's the write-up I did about split-brain: 
http://joejulian.name/blog/fixing-split-brain-with-glusterfs-33/


2. gluster volume heal doesn't really solve this, right?
No, the nature of split-brain is such that there is no automated way to 
recover from it.
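
For a file like the ones above, the manual fix from that write-up boils down
to removing the copy you decide to discard from its brick, together with its
.glusterfs hard link, and then letting self-heal copy the good one back. A
condensed sketch (paths are placeholders; check the blog post for the full
reasoning before running anything):

# run on the brick whose copy you have decided to discard
BRICK=/gluster/brick1
FILE=path/relative/to/the/brick        # one entry from 'heal ... info split-brain'

# the gfid xattr tells us where the matching .glusterfs hard link lives
GFID=$(getfattr -n trusted.gfid -e hex "$BRICK/$FILE" | awk -F'0x' '/trusted.gfid/ {print $2}')

rm -f "$BRICK/$FILE"
rm -f "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/${GFID:0:8}-${GFID:8:4}-${GFID:12:4}-${GFID:16:4}-${GFID:20:12}"

# then stat the file through a client mount (or run 'gluster volume heal kvm1')
# so the surviving copy is replicated back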


3. I'm kind of shooting in the dark since I can't see the data content. The 
volume holds VM images. Would picking the latest copies be good enough?
That does seem a reasonably safe assumption, especially if your VMs are 
cattle instead of kittens.


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Split-brain

2014-02-20 Thread William Kwan
Hi all,

Running glusterfs-3.4.2-1.el6.x86_64 on CentOS 6.5

Some smart people screwed up the network connection on my nodes for I don't
know how long, and now I find my GlusterFS volume in split-brain.  I googled
and found different ways to clean this up, but I need some extra help with it.

# gluster volume heal kvm1 info split-brain
Gathering Heal info on volume kvm1 has been successful

Brick mgmt1:/gluster/brick1
Number of entries: 21
at    path on brick
---
2014-02-20 22:33:41 /d058a735-0fca-430a-a3d7-cf0a77097e5d/images/714c56a8-db1d-42d5-bf76-869bd6c87eef/0ea0a280-4c2c-48ab-ad95-8cb48e6cf02b
2014-02-20 22:33:41 /d058a735-0fca-430a-a3d7-cf0a77097e5d/images/20b728b6-dd39-4d2e-a5c0-2dee22df6e95/a6a9b083-b04c-4ac8-86cb-ed4eb697c2c3
2014-02-20 22:33:41 /d058a735-0fca-430a-a3d7-cf0a77097e5d/dom_md/ids
... 

Brick mgmt2:/gluster/brick1
Number of entries: 28
at    path on brick
---
2014-02-20 22:37:38 /d058a735-0fca-430a-a3d7-cf0a77097e5d/dom_md/ids
2014-02-20 22:37:38 /d058a735-0fca-430a-a3d7-cf0a77097e5d/images/714c56a8-db1d-42d5-bf76-869bd6c87eef/0ea0a280-4c2c-48ab-ad95-8cb48e6cf02b
2014-02-20 22:37:38 /d058a735-0fca-430a-a3d7-cf0a77097e5d/images/20b728b6-dd39-4d2e-a5c0-2dee22df6e95/a6a9b083-b04c-4ac8-86cb-ed4eb697c2c3
2014-02-20 22:27:38 /d058a735-0fca-430a-a3d7-cf0a77097e5d/dom_md/ids
2014-02-20 22:27:38 /d058a735-0fca-430a-a3d7-cf0
... 


1. What's the best way to fix this?

2. gluster volume heal doesn't really solve this, right?

3. I'm kind of shooting in the dark since I can't see the data content. The
volume holds VM images. Would picking the latest copies be good enough?

Thanks
Will
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] confused about replicated volumes and sparse files

2014-02-20 Thread Alastair Neil
I am trying to understand how to verify that a replicated volume is up to
date.


Here is my scenario.  I have a Gluster cluster with two nodes serving VM
images to oVirt.

I have a volume called vm-store with a brick from each of the nodes:

Volume Name: vm-store
> Type: Replicate
> Volume ID: 379e52d3-2622-4834-8aef-b255db1c67af
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: gluster1:/export/brick0
> Brick2: gluster0:/export/brick0
> Options Reconfigured:
> user.cifs: disable
> nfs.rpc-auth-allow: *
> auth.allow: *
> storage.owner-gid: 36
> storage.owner-uid: 36


The bricks are formatted with XFS using the same options on both servers, and
the two servers are identical in hardware and OS version and release (CentOS
6.5), with glusterfs v3.4.2 from bits.gluster.org.

I have a 20GB sparse disk image for a VM, but I am confused about why I see
different reported disk usage on each of the nodes:

[root@gluster0 ~]#  du -sh /export/brick0
> 48G /export/brick0
> [root@gluster0 ~]# du -sh
> /export/brick0/6d637c7f-a4ab-4510-a0d9-63a04c55d6d8/images/5dfc7c6f-d35d-4831-b2fb-ed9ab8e3392b/5933a44e-77d6-4606-b6a9-bbf7e4235b13
> 8.6G
> /export/brick0/6d637c7f-a4ab-4510-a0d9-63a04c55d6d8/images/5dfc7c6f-d35d-4831-b2fb-ed9ab8e3392b/5933a44e-77d6-4606-b6a9-bbf7e4235b13
> [root@gluster1 ~]# du -sh /export/brick0
> 52G /export/brick0
> [root@gluster1 ~]# du -sh
> /export/brick0/6d637c7f-a4ab-4510-a0d9-63a04c55d6d8/images/5dfc7c6f-d35d-4831-b2fb-ed9ab8e3392b/5933a44e-77d6-4606-b6a9-bbf7e4235b13
> 12G
> /export/brick0/6d637c7f-a4ab-4510-a0d9-63a04c55d6d8/images/5dfc7c6f-d35d-4831-b2fb-ed9ab8e3392b/5933a44e-77d6-4606-b6a9-bbf7e4235b13


Sure enough, stat also shows a different number of blocks:

[root@gluster0 ~]# stat
> /export/brick0/6d637c7f-a4ab-4510-a0d9-63a04c55d6d8/images/5dfc7c6f-d35d-4831-b2fb-ed9ab8e3392b/5933a44e-77d6-4606-b6a9-bbf7e4235b13
>   File:
> `/export/brick0/6d637c7f-a4ab-4510-a0d9-63a04c55d6d8/images/5dfc7c6f-d35d-4831-b2fb-ed9ab8e3392b/5933a44e-77d6-4606-b6a9-bbf7e4235b13'
>   Size: 21474836480 Blocks: 17927384   IO Block: 4096   regular file
> Device: fd03h/64771d Inode: 1610613256  Links: 2
> Access: (0660/-rw-rw)  Uid: (   36/vdsm)   Gid: (   36/ kvm)
> Access: 2014-02-18 17:06:30.661993000 -0500
> Modify: 2014-02-20 13:29:33.507966199 -0500
> Change: 2014-02-20 13:29:33.507966199 -0500
> [root@gluster1 ~]# stat
> /export/brick0/6d637c7f-a4ab-4510-a0d9-63a04c55d6d8/images/5dfc7c6f-d35d-4831-b2fb-ed9ab8e3392b/5933a44e-77d6-4606-b6a9-bbf7e4235b13
>   File:
> `/export/brick0/6d637c7f-a4ab-4510-a0d9-63a04c55d6d8/images/5dfc7c6f-d35d-4831-b2fb-ed9ab8e3392b/5933a44e-77d6-4606-b6a9-bbf7e4235b13'
>   Size: 21474836480 Blocks: 24735976   IO Block: 4096   regular file
> Device: fd03h/64771d Inode: 3758096942  Links: 2
> Access: (0660/-rw-rw)  Uid: (   36/vdsm)   Gid: (   36/ kvm)
> Access: 2014-02-20 09:30:38.490724245 -0500
> Modify: 2014-02-20 13:29:39.464913739 -0500
> Change: 2014-02-20 13:29:39.465913754 -0500
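
For reference, a sketch of a check that should show whether the two copies
hold the same data despite the differing block counts (the path is shortened
here, and checksums of a live VM image will only match if the guest is not
writing):

# run on each brick host against the same relative path
F=/export/brick0/<path-to-the-image-file>
du -sh --apparent-size "$F"   # logical size: should match on both bricks
du -sh "$F"                   # allocated size: may legitimately differ for sparse files
md5sum "$F"                   # identical output on both bricks means identical content

My guess is that block allocation is a local XFS detail, so replication keeps
the bytes in sync while each brick may allocate the sparse regions
differently, for example after a self-heal.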



Can someone clear up my understanding?

Thanks, Alastair
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Disable Linux OOM killer on GlusterFS FUSE client

2014-02-20 Thread Justin Dossey
Hi all,

Here's a tip for those of you running FUSE clients on memory-constrained
boxes.

Since the glusterfs client process is just a process, the OOM killer may
select it for killing if the system runs out of memory for some reason.
 This condition happens once in a while to me on my image thumbnailing
servers-- people upload all kinds of crazy images to our site and
occasionally one of them causes ImageMagick to suck up all available
memory.

If the kernel kills the glusterfs process, the FUSE mount in question
becomes inaccessible and must be unmounted and remounted.  This is bad.

Anyway, I use Puppet to disable the OOM killer on the glusterfs process.
 Here's the exec I use for it:

  exec { 'disable-oom-killer-on-glusterfs-client':
    path    => "/bin:/sbin:/usr/bin:/usr/sbin",
    command => 'echo -17 > /proc/$(pidof /usr/sbin/glusterfs)/oom_score_adj',
    onlyif  => 'test 0 = $(cat /proc/$(pidof /usr/sbin/glusterfs)/oom_score_adj)',
    require => Package['glusterfs-client'],
  }
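
For anyone not using Puppet, a rough plain-shell sketch of the same idea
(assuming the client binary is /usr/sbin/glusterfs). One note:
/proc/<pid>/oom_score_adj accepts values from -1000 to 1000, and -1000 exempts
the process entirely; -17 was the "disable" value for the older
/proc/<pid>/oom_adj interface.

# run as root; adjust every running glusterfs client process
for pid in $(pidof /usr/sbin/glusterfs); do
    # -1000 (OOM_SCORE_ADJ_MIN) tells the kernel never to pick this process
    echo -1000 > "/proc/${pid}/oom_score_adj"
done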

Another tip is to keep a sigil file in your volume, say "MOUNTED", which
you can test for with your monitoring agent to see if the filesystem is
accessible.  Of course, if you can't stat() the MOUNTED file, the volume is
not mounted.
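
A minimal sketch of such a check, in the exit-code convention a Nagios-style
agent expects (the mount point here is just an example):

MOUNTPOINT=/mnt/gluster
# 'timeout' guards against a hung FUSE mount blocking the check forever
if timeout 10 stat "${MOUNTPOINT}/MOUNTED" >/dev/null 2>&1; then
    echo "OK: gluster volume is mounted and responding"
    exit 0
else
    echo "CRITICAL: ${MOUNTPOINT}/MOUNTED is not accessible"
    exit 2
fi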

Hope this helps others!

-- 
Justin Dossey
CTO, PodOmatic
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] fractured/split glusterfs - 2 up, 2 down for an hour - SOLVED

2014-02-20 Thread Khoi Mai
Harry Mangalam,

http://supercolony.gluster.org/pipermail/gluster-users/2014-January/038492.html

Have you experienced the symptoms you described again?  I too am seeing 
this, and amazingly the "umount " and remount appeared to address it.  
But how does it get into that state, and why would one client require such 
action while others with the same glusterfs volume mounted do not exhibit 
the same behavior?

Khoi Mai
Union Pacific Railroad
Distributed Engineering & Architecture
Project Engineer



___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] gluster volume heal info take long time

2014-02-20 Thread Chiku

Hello,

Right now, gluster volume heal MyVOL info takes too long and displays 
nothing.

MyVOL was a replica 2 volume to which I added another brick, making it replica 3.
Since I added the new brick and ran find on the volume from a glusterfs 
client, the healing process has been triggered and the new brick is 
receiving the files.

But the heal info command seems to time out. Maybe gluster volume heal info 
has too many lines to display, hits a timeout, and so displays nothing.


Is it possible to fix this and get the command working again?

(glusterfs version is 3.3.2)


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users