Re: [Gluster-users] Question on the rename atomicity behavior in gluster

2016-03-22 Thread Mohammed Rafi K C


On 03/23/2016 04:10 AM, Rama Shenai wrote:
> Hi, We had some questions with respect to expectations of atomicity of
> rename in gluster.
>
> To elaborate : 
>
> We have a setup with two machines (let's call them M1 and M2) which
> essentially access a file (F) on a gluster volume (mounted by M1 and M2).
> A program does the following steps sequentially on each of the two
> machines (M1 & M2) in an infinite loop:
>  1) Opens the file F in O_RDWR|O_EXCL mode, reads some data and closes (F)
>  2) Renames some other file F' => F
> 
> Periodically either M1 or M2 sees a "Stale file handle" error when it
> tries to read the file (step (1)) after opening the file in
> O_RDWR|O_EXCL (the open is successful)
>
> The specific error reported the client volume logs
>  (/var/log/glusterfs/mnt-repos-volume1.log)
> [2016-03-21 16:53:17.897902] I [dht-rename.c:1344:dht_rename]
> 0-volume1-dht: renaming master.lock
> (hash=volume1-replicate-0/cache=volume1-replicate-0) => master
> (hash=volume1-replicate-0/cache=)
> [2016-03-21 16:53:18.735090] W
> [client-rpc-fops.c:504:client3_3_stat_cbk] 0-volume1-client-0: remote
> operation failed: Stale file handle

Hi Rama,

An ESTALE error during rename is normally generated when either the source
file is not resolvable (deleted or inaccessible) or when the parent of the
destination is not resolvable. It can happen when, say, file F' was present
when your application did a lookup before the rename, but it got renamed by
node M1 before M2 could rename it. Basically, a race between two renames of
the same file can result in ESTALE for either of them.

To confirm this, can you please paste the log messages from the brick
corresponding to "0-volume1-client-0"? You can find the brick name from the
volume graph.

Also, if you can share the program or snippet you used to reproduce this
issue, that would be great.
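
For reference, a minimal sketch of the kind of loop described above, assuming
a FUSE mount at /mnt/volume1 (the mount point and file names are assumptions):

    # Run one copy of this on each client that mounts the same volume.
    import os

    MOUNT = "/mnt/volume1"                        # hypothetical mount point
    TARGET = os.path.join(MOUNT, "master.lock")   # the shared file F
    TMP = os.path.join(MOUNT, "master.lock.tmp")  # some other file F'

    while True:
        # Step 1: open F with the flags from the report, read some data, close.
        fd = os.open(TARGET, os.O_RDWR | os.O_EXCL)
        try:
            os.read(fd, 4096)
        finally:
            os.close(fd)

        # Step 2: create F' and rename it over F.
        with open(TMP, "w") as f:
            f.write("new contents\n")
        os.rename(TMP, TARGET)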

Rafi KC



>
> We see no error when we have two processes of the above program running
> on the same machine (say on M1) accessing the file F on the gluster
> volume. We want to understand the expectations of atomicity in gluster,
> specifically for rename, and whether the above is a bug.
>
> Also  glusterfs --version => glusterfs 3.6.9 built on Mar  2 2016 18:21:14
>
> We would also like to know if there is any parameter in one of the
> translators that we can tweak to prevent this problem
>
> Any help or insights here is appreciated
>
> Thanks
> Rama
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] gluster volume heal info split brain command not showing files in split-brain

2016-03-22 Thread Anuradha Talur


- Original Message -
> From: "ABHISHEK PALIWAL" 
> To: "Anuradha Talur" 
> Cc: gluster-users@gluster.org, gluster-de...@gluster.org
> Sent: Monday, March 21, 2016 10:44:53 AM
> Subject: Re: [Gluster-users] gluster volume heal info split brain command not 
> showing files in split-brain
> 
> Hi Anuradha,
> 
> Have you got any pointers from the above scenarios?
> 
Hi Abhishek,

I went through all the logs that you have given.
There is only one brick's log in the info you provided,
and for only one day. Where is the other brick's logfile?

In the same log file I see a lot of connects and disconnects in quick
succession, which could be the cause of the gfid mismatch if I/O was going
on during that time.
The other logs that have been provided also do not have enough information to
determine how your setup could have ended up with no pending markers.

I understand that the output of heal info split-brain is an easier way to get
information about files in split-brain. But without pending markers, this
information cannot be obtained.

For the second scenario, is your self-heal-daemon on?
> Regards,
> Abhishek
> 
> On Fri, Mar 18, 2016 at 11:18 AM, ABHISHEK PALIWAL 
> wrote:
> 
> >
> >
> > On Fri, Mar 18, 2016 at 1:41 AM, Anuradha Talur  wrote:
> >
> >>
> >>
> >> - Original Message -
> >> > From: "ABHISHEK PALIWAL" 
> >> > To: "Anuradha Talur" 
> >> > Cc: gluster-users@gluster.org, gluster-de...@gluster.org
> >> > Sent: Thursday, March 17, 2016 4:00:58 PM
> >> > Subject: Re: [Gluster-users] gluster volume heal info split brain
> >> command not showing files in split-brain
> >> >
> >> > Hi Anuradha,
> >> >
> >> > Please confirm me, this is bug in glusterfs or we need to do something
> >> at
> >> > our end.
> >> >
> >> > Because this problem is stopping our development.
> >> Hi Abhishek,
> >>
> >> When you say file is not getting sync, do you mean that the files are not
> >> in sync after healing or that the existing GFID mismatch that you tried to
> >> heal failed?
> >> In one of the previous mails, you said that the GFID mismatch problem is
> >> resolved, is it not so?
> >>
> >
> > As I mentioned, I have two scenarios:
> > 1. The first scenario is where files are in split-brain but not recognized
> > by the split-brain and heal info commands. We are identifying those files
> > when I/O errors occur on them (the same method mentioned in the link which
> > you shared earlier), but this method is not reliable in our case because
> > other modules depend on these files and cannot wait until the heal is in
> > progress. In this case we need manual identification of the files that hit
> > I/O errors, which is not really the correct way. It would be better if the
> > split-brain or heal info command identified the files so that, based on
> > the output, we could perform the self-healing on those files only.
> >
> > 2. In the second scenario we have one log file which has a fixed size and
> > wraps its data, and which is continuously written by the system even when
> > the other brick is down or rebooting. In this case we have two bricks in
> > replica mode; when one goes down and comes back up, this file remains out
> > of sync. For this file:
> > A. It is not recognized by the split-brain and heal info commands.
> > B. We do not get any I/O error.
> > C. There is no GFID mismatch.
> >
> > Here, are the getfattr output of this file
> >
> > Brick B which rebooted and have the file out of sync
> >
> > getfattr -d -m . -e hex
> > opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> >
> > # file:
> > opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> > trusted.afr.c_glusterfs-client-1=0x
> > trusted.afr.dirty=0x
> > trusted.bit-rot.version=0x000b56d6dd1d000ec7a9
> > trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
> >
> >
> > Brick A where file was getting updated when Brick B was rebooting
> >
> > getfattr -d -m . -e hex
> > opt/lvmdir/c2/brick/logfiles/availability/CELLO_AVAILABILITY2_LOG.xml
> > trusted.afr.c_glusterfs-client-0=0x0008
> > trusted.afr.c_glusterfs-client-2=0x0002
> > trusted.afr.c_glusterfs-client-4=0x0002
> > trusted.afr.c_glusterfs-client-6=0x0002
> > trusted.afr.dirty=0x
> > trusted.bit-rot.version=0x000b56d6dcb7000c87e7
> > trusted.gfid=0x9f5e354ecfda40149ddce7d5ffe760ae
> >
> > This scenario is not 100% reproducible but out of 20 cycle we can
> > reproduce it one or two times.
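
For reference, trusted.afr.*-client-N changelog values of this kind can be
read as three big-endian 32-bit counters (pending data, metadata and entry
operations). A small sketch, assuming that standard 12-byte layout; the
example value below is made up:

    import struct

    def decode_afr_changelog(hex_value):
        """Split an AFR changelog xattr into (data, metadata, entry) counts."""
        raw = bytes.fromhex(hex_value[2:] if hex_value.startswith("0x") else hex_value)
        data, metadata, entry = struct.unpack(">III", raw[:12])
        return {"data": data, "metadata": metadata, "entry": entry}

    # Made-up example: eight pending data operations, nothing else.
    print(decode_afr_changelog("0x000000080000000000000000"))
    # -> {'data': 8, 'metadata': 0, 'entry': 0}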
> >
> >
> >> To your question about finding the files in split-brain, can you try
> >> running gluster volume heal <volname> info? Heal info is also supposed
> >> to show the files in split-brain.
> >
> >
> > This heal info command is also not working.
> >
> >
> >> If the GFID mismatch is not resolved yet, it would really help understand
> >> the underlying problem

Re: [Gluster-users] GlusterFS 3.7.9 released

2016-03-22 Thread Lindsay Mathieson
On 23 March 2016 at 15:41, Kaleb KEITHLEY  wrote:

> Looks like the build machine's pbuilder apt-cache got polluted somehow. I've
> rebuilt the apt-cache and rebuilt the packages.
>
> They're on download.gluster.org now.
>

Thanks Kaleb, that did the trick.


-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Peer probe succeeded, but "not in 'Peer in Cluster' state"

2016-03-22 Thread Russell Purinton
Strange, I’ve set up dozens of virtualization clusters and never used anything 
other than bridged networking.  I haven’t really met anyone using routed or NAT 
mode, but I guess there’s a first time for everything.

You might be able to get the NAT mode to work for you: to pass traffic between 
the networks, you probably only need to enable IP forwarding on the KVM host 
and make sure the FORWARD chain in iptables accepts the packets.
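
A rough sketch of those two steps, assuming libvirt bridges named virbr0 and
virbr1 (adjust the names to your networks; the settings below do not persist
across reboots):

    # Run as root on the KVM host.
    import subprocess

    # Turn on kernel IPv4 forwarding.
    subprocess.run(["sysctl", "-w", "net.ipv4.ip_forward=1"], check=True)

    # Let the FORWARD chain pass traffic between the two virtual networks.
    for inif, outif in (("virbr0", "virbr1"), ("virbr1", "virbr0")):
        subprocess.run(
            ["iptables", "-I", "FORWARD", "-i", inif, "-o", outif, "-j", "ACCEPT"],
            check=True,
        )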

Russ

> On Mar 23, 2016, at 12:10 AM, Joshua J. Kugler  wrote:
> 
> On Tuesday, March 22, 2016 23:47:05 Russell Purinton wrote:
>> The routing table looks normal.   That 3rd statement that Pawan mentioned is
>> just a normal default gateway.   Nothing wrong there.
>> 
>> I suspect the issue is at the virtual network layer…  <forward mode='route'/>  seems suspect.
>> 
>> http://serverfault.com/questions/270931/routing-networking-on-kvm
>> 
>> 
>> I think you’d want to setup Bridge mode interfaces.
> 
> Yeah, I think it is something in the network layer.  However, it worked fine 
> when the KVM config was using NAT. A bridged interface requires linking up 
> with 
> a physical interface on the host machine, and I can't do that on the machine 
> I'm on at the moment.
> 
> Really not sure what's going on here. I wish there was a way for two 'nat' 
> type virbr networks to talk to each other, I wouldn't need the 'route' type.
> 
> j
> 
> -- 
> Joshua J. Kugler - Fairbanks, Alaska
> Azariah Enterprises - Programming and Website Design
> jos...@azariah.com - Jabber: pedah...@gmail.com
> PGP Key: http://pgp.mit.edu/  ID 0x73B13B6A

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] GlusterFS 3.7.9 released

2016-03-22 Thread Kaleb KEITHLEY

On 03/23/2016 06:13 AM, Alan Millar wrote:

Anyone have any success in updating to 3.7.9 on Debian Jessie?

I'm seeing dependency problems, when trying to install 3.7.9 using the Debian 
Jessie packages on download.gluster.org.


For example, it says it wants liburcu4.

  Depends: liburcu4 (>= 0.8.4) but it is not installable

I can only find liburcu2 for Jessie.


https://packages.debian.org/search?searchon=names&keywords=liburcu

It looks similar for some of the other dependencies also, like libtinfo5 and 
libssl1.0.2

Did the Jessie packages accidentally get built with the spec file for sid or 
stretch, possibly?  Or is my system broken and I'm looking at the wrong thing?


Looks like the build machine's pbuilder apt-cache got polluted somehow. I've 
rebuilt the apt-cache and rebuilt the packages.


They're on download.gluster.org now.

--

Kaleb

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Peer probe succeeded, but "not in 'Peer in Cluster' state"

2016-03-22 Thread Joshua J. Kugler
On Tuesday, March 22, 2016 23:47:05 Russell Purinton wrote:
> The routing table looks normal.   That 3rd statement that Pawan mentioned is
> just a normal default gateway.   Nothing wrong there.
> 
> I suspect the issue is at the virtual network layer…  <forward mode='route'/>  seems suspect.
> 
> http://serverfault.com/questions/270931/routing-networking-on-kvm
> 
> 
> I think you’d want to setup Bridge mode interfaces.

Yeah, I think it is something in the network layer.  However, it worked fine 
when the KVM config was using NAT. A bridged interface requires linking up with 
a physical interface on the host machine, and I can't do that on the machine 
I'm on at the moment.

Really not sure what's going on here. I wish there was a way for two 'nat' 
type virbr networks to talk to each other, I wouldn't need the 'route' type.

j

-- 
Joshua J. Kugler - Fairbanks, Alaska
Azariah Enterprises - Programming and Website Design
jos...@azariah.com - Jabber: pedah...@gmail.com
PGP Key: http://pgp.mit.edu/  ID 0x73B13B6A
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Peer probe succeeded, but "not in 'Peer in Cluster' state"

2016-03-22 Thread Russell Purinton
The routing table looks normal.   That 3rd statement that Pawan mentioned is 
just a normal default gateway.   Nothing wrong there.

I suspect the issue is at the virtual network layer…   
> Cheers
> Dev
> 
> On Wed, Mar 23, 2016 at 4:12 PM, Joshua J. Kugler  > wrote:
> On Tuesday, March 22, 2016 22:12:00 Russell Purinton wrote:
> > If the subnet mask is wrong on 122.11 it may forward all traffic to the
> > default gateway.  The default gateway may be configured to NAT traffic from
> > the LAN, so the response packet would be seen by .10 as coming from .1.
> 
> So, it turns out the subnet isn't wrong, but for some reason, it's still
> routing through the gateway, and appears to be coming from .1, instead of .11.
> I'm not sure why.  This is a libvirt network, configured thus:
> 
> 
>   default
>   f137a5c4-1dd2-453a-a6e6-c161f2918d41
>   
>   
>   
>   
> 
>   
> 
>   
> 
> 
> When this was working, I was using forward mode=nat, but then two different
> libvirt networks couldn't talk to each other.  The two machines are on the
> same segment, on the same virtual switch. I'm not sure why they are getting
> routed through the gateway.  Off to do more troubleshooting! :)
> 
> j
> 
> --
> Joshua J. Kugler - Fairbanks, Alaska
> Azariah Enterprises - Programming and Website Design
> jos...@azariah.com  - Jabber: pedah...@gmail.com 
> 
> PGP Key: http://pgp.mit.edu/   ID 0x73B13B6A
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org 
> http://www.gluster.org/mailman/listinfo/gluster-users 
> 
> 

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Peer probe succeeded, but "not in 'Peer in Cluster' state"

2016-03-22 Thread Pawan Devaiah
The third statement in your route table states that for any network the
default gateway is 192.168.122.1

I would suggest you either remove that statement or increase its metric
from 0; you already have a route to 192.168.122.0, which is in the same
broadcast domain, so you don't have to go through the gateway.

Cheers
Dev

On Wed, Mar 23, 2016 at 4:12 PM, Joshua J. Kugler 
wrote:

> On Tuesday, March 22, 2016 22:12:00 Russell Purinton wrote:
> > If the subnet mask is wrong on 122.11 it may forward all traffic to the
> > default gateway.  The default gateway may be configured to NAT traffic
> from
> > the LAN, so the response packet would be seen by .10 as coming from .1.
>
> So, it turns out the subnet isn't wrong, but for some reason, it's still
> routing through the gateway, and appears to be coming from .1, instead of
> .11.
> I'm not sure why.  This is a libvirt network, configured thus:
>
> 
>   default
>   f137a5c4-1dd2-453a-a6e6-c161f2918d41
>   
>   
>   
>   
> 
>   
> 
>   
> 
>
> When this was working, I was using forward mode=nat, but then two different
> libvirt networks couldn't talk to each other.  The two machines are on the
> same segment, on the same virtual switch. I'm not sure why they are getting
> routed through the gateway.  Off to do more troubleshooting! :)
>
> j
>
> --
> Joshua J. Kugler - Fairbanks, Alaska
> Azariah Enterprises - Programming and Website Design
> jos...@azariah.com - Jabber: pedah...@gmail.com
> PGP Key: http://pgp.mit.edu/  ID 0x73B13B6A
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Peer probe succeeded, but "not in 'Peer in Cluster' state"

2016-03-22 Thread Joshua J. Kugler
On Tuesday, March 22, 2016 22:12:00 Russell Purinton wrote:
> If the subnet mask is wrong on 122.11 it may forward all traffic to the
> default gateway.  The default gateway may be configured to NAT traffic from
> the LAN, so the response packet would be seen by .10 as coming from .1.

So, it turns out the subnet isn't wrong, but for some reason, it's still 
routing through the gateway, and appears to be coming from .1, instead of .11.  
I'm not sure why.  This is a libvirt network, configured thus:


  default
  f137a5c4-1dd2-453a-a6e6-c161f2918d41
  
  
  
  

  

  


When this was working, I was using forward mode=nat, but then two different 
libvirt networks couldn't talk to each other.  The two machines are on the 
same segment, on the same virtual switch. I'm not sure why they are getting 
routed through the gateway.  Off to do more troubleshooting! :)

j

-- 
Joshua J. Kugler - Fairbanks, Alaska
Azariah Enterprises - Programming and Website Design
jos...@azariah.com - Jabber: pedah...@gmail.com
PGP Key: http://pgp.mit.edu/  ID 0x73B13B6A
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Peer probe succeeded, but "not in 'Peer in Cluster' state"

2016-03-22 Thread Joshua J. Kugler
On Tuesday, March 22, 2016 22:12:00 Russell Purinton wrote:
> If the subnet mask is wrong on 122.11 it may forward all traffic to the
> default gateway.  The default gateway may be configured to NAT traffic from
> the LAN, so the response packet would be seen by .10 as coming from .1.

Thanks for the suggestion, but sadly that wasn't it.  This is the route table 
on both hosts:

Kernel IP routing table
Destination Gateway Genmask Flags Metric RefUse Iface
192.168.130.0   0.0.0.0 255.255.255.0   U 0  00 br1
192.168.122.0   0.0.0.0 255.255.255.0   U 0  00 br0
0.0.0.0 192.168.122.1   0.0.0.0 UG0  00 br0

I haven't ruled it out, though. I am going to do some more log investigation.

j

-- 
Joshua J. Kugler - Fairbanks, Alaska
Azariah Enterprises - Programming and Website Design
jos...@azariah.com - Jabber: pedah...@gmail.com
PGP Key: http://pgp.mit.edu/  ID 0x73B13B6A
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Peer probe succeeded, but "not in 'Peer in Cluster' state"

2016-03-22 Thread Russell Purinton
If the subnet mask is wrong on 122.11 it may forward all traffic to the default 
gateway.  The default gateway may be configured to NAT traffic from the LAN, so 
the response packet would be seen by .10 as coming from .1.

Russ

> On Mar 22, 2016, at 10:10 PM, Joshua J. Kugler  wrote:
> 
> Hmm...I'm wondering if my networking is messed up somehow. But why would 
> that 
> cause host b to see host a as the gateway and not as the proper IP?
> 
> j
> 
> On Tuesday, March 22, 2016 17:33:03 Joshua J. Kugler wrote:
>> On Tuesday, March 22, 2016 18:27:46 Atin Mukherjee wrote:
>>> This is the problem, peer handshaking hasn't finished yet. To get to
>>> know the reason I'd need to get the glusterd log file from 
>>> 192.168.122.10.
>> 
>> Here's the log from the other machine (.10).
>> 
>>> As a workaround can you do the following?
>>> 
>>> 1. From node 1 open /var/lib/glusterd/peers/, modify state=3
>>> 2. Repeat step 1 for node 2 as well if state is different
>>> 3. restart both the glusterd instances.
>> 
>> Hmm, just realized this:
>> 
>> This is "Box A"
>> [root@vmserver-a peers]# ls -l
>> total 4
>> -rw--- 1 root root 74 Mar 22 17:15 fe722085-ac0f-4449-a43f-2dc9dd1fd8fb
>> [root@vmserver-a peers]# cat fe722085-ac0f-4449-a43f-2dc9dd1fd8fb
>> uuid=fe722085-ac0f-4449-a43f-2dc9dd1fd8fb
>> state=3
>> hostname1=192.168.122.1
>> 
>> This is "Box B" (from where I'm running the gluster create command):
>> [root@vmserver-b peers]# ls -l
>> total 4
>> -rw--- 1 root root 75 Mar 22 17:15 d8e1d7a0-077a-4a50-93f6-d3922e3b96b9
>> [root@vmserver-b peers]# cat d8e1d7a0-077a-4a50-93f6-d3922e3b96b9
>> uuid=d8e1d7a0-077a-4a50-93f6-d3922e3b96b9
>> state=3
>> hostname1=192.168.122.10
>> 
>> Why is the gateway in the peer list?  To my knowledge, that's not getting
>> added anywhere. This is weird.
>> 
>> Also odd:
>> [root@vmserver-a peers]# gluster peer status
>> Number of Peers: 1
>> 
>> Hostname: 192.168.122.1
>> Uuid: fe722085-ac0f-4449-a43f-2dc9dd1fd8fb
>> State: Peer in Cluster (Disconnected)
>> 
>> [root@vmserver-b peers]# gluster peer status
>> Number of Peers: 1
>> 
>> Hostname: 192.168.122.10
>> Uuid: d8e1d7a0-077a-4a50-93f6-d3922e3b96b9
>> State: Peer in Cluster (Connected)
>> 
>> So, somehow, vmserver-a is getting 192.168.122.1 in its peer list instead
>> of 192.168.122.11
>> 
>> Very strange.
>> 
>> j
>> 
>>> ~Atin
>>> 
> Also send the glusterd log of the node where the commands have failed.
 
 The two logs are attached.  The peer status says connected. The log file
 says "FAILED : Host 192.168.122.10 is not in 'Peer in Cluster' state"
 I'm confused. :)
 
 Thanks for your help on this!
 
 j
 
 
 
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
> 
> -- 
> Joshua J. Kugler - Fairbanks, Alaska
> Azariah Enterprises - Programming and Website Design
> jos...@azariah.com - Jabber: pedah...@gmail.com
> PGP Key: http://pgp.mit.edu/  ID 0x73B13B6A
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Peer probe succeeded, but "not in 'Peer in Cluster' state"

2016-03-22 Thread Joshua J. Kugler
Hmm...I'm wondering if my networking is messed up somehow. But why would that 
cause host b to see host a as the gateway and not as the proper IP?

j

On Tuesday, March 22, 2016 17:33:03 Joshua J. Kugler wrote:
> On Tuesday, March 22, 2016 18:27:46 Atin Mukherjee wrote:
> > This is the problem, peer handshaking hasn't finished yet. To get to
> > know the reason I'd need to get the glusterd log file from 
> > 192.168.122.10.
> 
> Here's the log from the other machine (.10).
> 
> > As a workaround can you do the following?
> > 
> > 1. From node 1 open /var/lib/glusterd/peers/, modify state=3
> > 2. Repeat step 1 for node 2 as well if state is different
> > 3. restart both the glusterd instances.
> 
> Hmm, just realized this:
> 
> This is "Box A"
> [root@vmserver-a peers]# ls -l
> total 4
> -rw--- 1 root root 74 Mar 22 17:15 fe722085-ac0f-4449-a43f-2dc9dd1fd8fb
> [root@vmserver-a peers]# cat fe722085-ac0f-4449-a43f-2dc9dd1fd8fb
> uuid=fe722085-ac0f-4449-a43f-2dc9dd1fd8fb
> state=3
> hostname1=192.168.122.1
> 
> This is "Box B" (from where I'm running the gluster create command):
> [root@vmserver-b peers]# ls -l
> total 4
> -rw--- 1 root root 75 Mar 22 17:15 d8e1d7a0-077a-4a50-93f6-d3922e3b96b9
> [root@vmserver-b peers]# cat d8e1d7a0-077a-4a50-93f6-d3922e3b96b9
> uuid=d8e1d7a0-077a-4a50-93f6-d3922e3b96b9
> state=3
> hostname1=192.168.122.10
> 
> Why is the gateway in the peer list?  To my knowledge, that's not getting
> added anywhere. This is weird.
> 
> Also odd:
> [root@vmserver-a peers]# gluster peer status
> Number of Peers: 1
> 
> Hostname: 192.168.122.1
> Uuid: fe722085-ac0f-4449-a43f-2dc9dd1fd8fb
> State: Peer in Cluster (Disconnected)
> 
> [root@vmserver-b peers]# gluster peer status
> Number of Peers: 1
> 
> Hostname: 192.168.122.10
> Uuid: d8e1d7a0-077a-4a50-93f6-d3922e3b96b9
> State: Peer in Cluster (Connected)
> 
> So, somehow, vmserver-a is getting 192.168.122.1 in its peer list instead
> of 192.168.122.11
> 
> Very strange.
> 
> j
> 
> > ~Atin
> > 
> > >> Also send the glusterd log of the node where the commands have failed.
> > > 
> > > The two logs are attached.  The peer status says connected. The log file
> > > says "FAILED : Host 192.168.122.10 is not in 'Peer in Cluster' state"
> > > I'm confused. :)
> > > 
> > > Thanks for your help on this!
> > > 
> > > j
> > > 
> > > 
> > > 
> > > ___
> > > Gluster-users mailing list
> > > Gluster-users@gluster.org
> > > http://www.gluster.org/mailman/listinfo/gluster-users

-- 
Joshua J. Kugler - Fairbanks, Alaska
Azariah Enterprises - Programming and Website Design
jos...@azariah.com - Jabber: pedah...@gmail.com
PGP Key: http://pgp.mit.edu/  ID 0x73B13B6A
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Gluster Monthly Newsletter, March 2016

2016-03-22 Thread Amye Scavarda
Gluster Monthly Newsletter, March 2016

Great things this month, and a major upcoming event:
3.7.9 release on March 22 -
http://www.gluster.org/pipermail/gluster-users/2016-March/025922.html
3.8 roadmap and release planning
-http://www.gluster.org/pipermail/gluster-devel/2016-March/048728.html
Niels de Vos is kicking off a great deal of our work on 3.8
Previously:
Automated Tiering in Gluster:
http://blog.gluster.org/2016/03/automated-tiering-in-gluster/
Gluster at FAST: http://blog.gluster.org/2016/02/gluster-at-fast/
Gluster at NOS: http://blog.gluster.org/2016/03/nos-conf-2016/

Upcoming event:
Vault - http://events.linuxfoundation.org/events/vault

http://vault2016.sched.org/event/68kI/glusterfs-and-its-distribution-model-sakshi-bansal-red-hat
- GlusterFS and its Distribution Model - Sakshi Bansal
http://vault2016.sched.org/event/68kA/glusterfs-facebook-richard-warning-facebook
- GlusterFS @ Facebook - Richard Wareing
http://vault2016.sched.org/event/68kY/arbiter-based-replication-in-gluster-without-3x-storage-cost-and-zero-split-brains-ravishankar-n-red-hat
-  Arbiter based Replication in Gluster- without 3x Storage Cost and
Zero Split-Brains! - Ravishankar N.
http://vault2016.sched.org/event/68kW/tiering-in-glusterfs-hardware-config-considerations-veda-shankar-red-hat
- Tiering in GlusterFS: Hardware Config Considerations - Veda Shankar
http://vault2016.sched.org/event/68km/ganesha-gluster-scale-out-nfsv4-kaleb-keithley-red-hat-gluster-storage
- Ganesha + Gluster scale out NFSv4 - Kaleb Keithley
http://vault2016.sched.org/event/68kZ/huge-indexes-algorithms-to-track-objects-in-cache-tiers-dan-lambright-red-hat
-  Huge Indexes: Algorithms to Track Objects in Cache Tiers - Dan
Lambright
http://vault2016.sched.org/event/68kl/glusterd-20-managing-distributed-file-system-using-a-centralized-store-atin-mukherjee-red-hat
-  GlusterD 2.0 - Managing Distributed File System Using a Centralized
Store - Atin Mukherjee
http://vault2016.sched.org/event/68k9/understanding-client-side-shared-cache-with-pblcacle-luis-pabon-red-hat
-  Understanding Client Side Shared Cache with Pblcache - Luis Pabon
http://vault2016.sched.org/event/68kM/deploying-pnfs-over-distributed-file-storage-jiffin-tony-thottan-red-hat
-  Deploying pNFS over Distributed File Storage - Jiffin Tony Thottan
http://vault2016.sched.org/event/68vm/storage-as-a-service-with-gluster-vijay-bellur-red-hat
- Storage as a Service with Gluster - Vijay Bellur, Red Hat
http://vault2016.sched.org/event/6Qy3/lessons-learned-containerizing-glusterfs-and-ceph-with-docker-and-kubernetes-huamin-chen-red-hat
- Lessons Learned Containerizing GlusterFS and Ceph with Docker and
Kubernetes - Huamin Chen

Noteworthy threads:

++ gluster-users ++
GlusterFS FUSE Client Performance Issues  - Ravishankar N comments
that the FUSE client Performance Issues will be resolved with the
3.7.9 release -
http://www.gluster.org/pipermail/gluster-users/2016-February/025576.html
SELinux support in the near future!!! -  Manikandan S outlines support
for SELinux in upcoming releases -
http://www.gluster.org/pipermail/gluster-users/2016-March/025919.html
Default quorum for 2 way replication   -  Pranith kicks off a
conversation about quorum in 2 way replication,
http://www.gluster.org/pipermail/gluster-users/2016-March/025672.html

++ gluster-devel ++
 Quality of Service in Glusterfs - Raghavendra Gowdappa kicks off a
discussion on QoS   -
http://www.gluster.org/pipermail/gluster-devel/2016-March/048539.html
Updates on GD2 from Kaushal -
http://www.gluster.org/pipermail/gluster-devel/2016-March/048755.html
 GD2 ETCD Bootstrapping  - Atin provides an update on GlusterD 2.0  -
http://www.gluster.org/pipermail/gluster-devel/2016-March/048759.html
On backporting fixes - Raghavendra Talur begins a discussion on
backporting patches and tests. -
http://www.gluster.org/pipermail/gluster-devel/2016-March/048782.html
Improving subdir export for NFS-Ganesha
 - Jiffin Tony Thottan starts a discussion if this should be in 3.7.9
or 3.8 http://www.gluster.org/pipermail/gluster-devel/2016-March/048746.html
 Fuse Subdirectory mounts, access-control and sub-directory
geo-replication, snapshot features -  Pranith Kumar Karampuri (and
Kaushal) gives a two-part update on design.
http://www.gluster.org/pipermail/gluster-devel/2016-March/048537.html
http://www.gluster.org/pipermail/gluster-devel/2016-March/048639.html

 Gluster Top 5 Contributors in the last 30 days:
Niels de Vos, Mohammed Rafi KC,  Kaleb Keithley, Soumya Koduri, Sakshi Bansal

Upcoming CFPs:
 Flock: http://www.flocktofedora.org  - April 8
 LinuxCon Japan:
http://events.linuxfoundation.org/events/linuxcon-japan/program/cfp  -
May 6
 LinuxCon North America:
http://events.linuxfoundation.org/events/linuxcon-north-america/program/cfp
 - April 26th
 LinuxCon Europe:
http://events.linuxfoundation.org/events/linuxcon-europe/program/cfp -
June 17
 LISA: https://www.usenix.org/conference/lisa16/call-for-participation
- April 25th

 Want to see something i

Re: [Gluster-users] Peer probe succeeded, but "not in 'Peer in Cluster' state"

2016-03-22 Thread Joshua J. Kugler
On Tuesday, March 22, 2016 18:27:46 Atin Mukherjee wrote:
> This is the problem, peer handshaking hasn't finished yet. To get to
> know the reason I'd need to get the glusterd log file from  192.168.122.10.

Here's the log from the other machine (.10).

> As a workaround can you do the following?
> 
> 1. From node 1 open /var/lib/glusterd/peers/, modify state=3
> 2. Repeat step 1 for node 2 as well if state is different
> 3. restart both the glusterd instances.

Hmm, just realized this:

This is "Box A"
[root@vmserver-a peers]# ls -l
total 4
-rw--- 1 root root 74 Mar 22 17:15 fe722085-ac0f-4449-a43f-2dc9dd1fd8fb
[root@vmserver-a peers]# cat fe722085-ac0f-4449-a43f-2dc9dd1fd8fb 
uuid=fe722085-ac0f-4449-a43f-2dc9dd1fd8fb
state=3
hostname1=192.168.122.1

This is "Box B" (from where I'm running the gluster create command):
[root@vmserver-b peers]# ls -l
total 4
-rw--- 1 root root 75 Mar 22 17:15 d8e1d7a0-077a-4a50-93f6-d3922e3b96b9
[root@vmserver-b peers]# cat d8e1d7a0-077a-4a50-93f6-d3922e3b96b9 
uuid=d8e1d7a0-077a-4a50-93f6-d3922e3b96b9
state=3
hostname1=192.168.122.10

Why is the gateway in the peer list?  To my knowledge, that's not getting 
added anywhere. This is weird.

Also odd:
[root@vmserver-a peers]# gluster peer status
Number of Peers: 1

Hostname: 192.168.122.1
Uuid: fe722085-ac0f-4449-a43f-2dc9dd1fd8fb
State: Peer in Cluster (Disconnected)

[root@vmserver-b peers]# gluster peer status
Number of Peers: 1

Hostname: 192.168.122.10
Uuid: d8e1d7a0-077a-4a50-93f6-d3922e3b96b9
State: Peer in Cluster (Connected)

So, somehow, vmserver-a is getting 192.168.122.1 in its peer list instead 
of 192.168.122.11

Very strange.

j


> ~Atin
> 
> >> Also send the glusterd log of the node where the commands have failed.
> > 
> > The two logs are attached.  The peer status says connected. The log file
> > says "FAILED : Host 192.168.122.10 is not in 'Peer in Cluster' state" 
> > I'm confused. :)
> > 
> > Thanks for your help on this!
> > 
> > j
> > 
> > 
> > 
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users

-- 
Joshua J. Kugler - Fairbanks, Alaska
Azariah Enterprises - Programming and Website Design
jos...@azariah.com - Jabber: pedah...@gmail.com
PGP Key: http://pgp.mit.edu/  ID 0x73B13B6A

[2016-03-22 05:23:28.500587] I [MSGID: 100030] [glusterfsd.c:2035:main] 0-glusterd: Started running glusterd version 3.6.9 (args: glusterd --xlator-option *.upgrade=on -N)
[2016-03-22 05:23:28.505311] I [graph.c:269:gf_add_cmdline_options] 0-management: adding option 'upgrade' for volume 'management' with value 'on'
[2016-03-22 05:23:28.505375] I [glusterd.c:1214:init] 0-management: Maximum allowed open file descriptors set to 65536
[2016-03-22 05:23:28.505390] I [glusterd.c:1259:init] 0-management: Using /var/lib/glusterd as working directory
[2016-03-22 05:23:28.508992] E [rpc-transport.c:266:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.6.9/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2016-03-22 05:23:28.509015] W [rpc-transport.c:270:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2016-03-22 05:23:28.509022] W [rpcsvc.c:1524:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2016-03-22 05:23:28.509652] I [glusterd.c:413:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system
[2016-03-22 05:23:28.509954] E [store.c:432:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info, returned error: (No such file or directory)
[2016-03-22 05:23:28.509969] E [store.c:432:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/glusterd.info, returned error: (No such file or directory)
[2016-03-22 05:23:28.509975] I [glusterd-store.c:2068:glusterd_restore_op_version] 0-management: Detected new install. Setting op-version to maximum : 30603
[2016-03-22 05:23:28.510038] I [glusterd-store.c:3502:glusterd_store_retrieve_missed_snaps_list] 0-management: No missed snaps list.
[2016-03-22 05:23:28.510091] E [store.c:432:gf_store_handle_retrieve] 0-: Path corresponding to /var/lib/glusterd/options, returned error: (No such file or directory)
Final graph:
+--+
  1: volume management
  2: type mgmt/glusterd
  3: option rpc-auth.auth-glusterfs on
  4: option rpc-auth.auth-unix on
  5: option rpc-auth.auth-null on
  6: option transport.socket.listen-backlog 128
  7: option upgrade on
  8: option ping-timeout 30
  9: option transport.socket.read-fail-log off
 10: option transport.socket.keepalive-interval 2
 11: option transport.socket.keepalive-time 10
 12: option transport-type rdma
 13: option working-directory /var/lib/glusterd
 14: end-volume
 15:

Re: [Gluster-users] GlusterFS 3.7.9 released

2016-03-22 Thread Lindsay Mathieson
On 23 March 2016 at 10:43, Alan Millar  wrote:

> Anyone have any success in updating to 3.7.9 on Debian Jessie?
>
> I'm seeing dependency problems, when trying to install 3.7.9 using the
> Debian Jessie packages on download.gluster.org.
>



Same issues here on Proxmox 4, which is based on debian jessie


-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] GlusterFS 3.7.9 released

2016-03-22 Thread Alan Millar
Anyone have any success in updating to 3.7.9 on Debian Jessie?

I'm seeing dependency problems, when trying to install 3.7.9 using the Debian 
Jessie packages on download.gluster.org.  


For example, it says it wants liburcu4.

 Depends: liburcu4 (>= 0.8.4) but it is not installable

I can only find liburcu2 for Jessie.


https://packages.debian.org/search?searchon=names&keywords=liburcu

It looks similar for some of the other dependencies also, like libtinfo5 and 
libssl1.0.2

Did the Jessie packages accidentally get built with the spec file for sid or 
stretch, possibly?  Or is my system broken and I'm looking at the wrong thing?  


Any help appreciated.  Thanks!

- Alan




- Original Message -
> From: Vijay Bellur 
> Subject: [Gluster-users] GlusterFS 3.7.9 released

> GlusterFS 3.7.9 has been released and the tarball can be found at [1]. 
> Release 
> notes will appear at [2] once the patch [3] gets merged into the repository.
> 
> Fedora-22, EPEL-[567], and Debian {Jessie,Stretch} packages are on 
> download.gluster.org
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Question on the rename atomicity behavior in gluster

2016-03-22 Thread Rama Shenai
Hi, We had some questions with respect to expectations of atomicity of
rename in gluster.

To elaborate :

We have a setup with two machines (let's call them M1 and M2) which
essentially access a file (F) on a gluster volume (mounted by M1 and M2).
A program does the following steps sequentially on each of the two machines
(M1 & M2) in an infinite loop:
 1) Opens the file F in O_RDWR|O_EXCL mode, reads some data and closes (F)
 2) Renames some other file F' => F

Periodically either M1 or M2 sees a "Stale file handle" error when it tries
to read the file (step (1)) after opening the file in O_RDWR|O_EXCL (the
open is successful)

The specific error reported the client volume logs
 (/var/log/glusterfs/mnt-repos-volume1.log)
[2016-03-21 16:53:17.897902] I [dht-rename.c:1344:dht_rename]
0-volume1-dht: renaming master.lock
(hash=volume1-replicate-0/cache=volume1-replicate-0) => master
(hash=volume1-replicate-0/cache=)
[2016-03-21 16:53:18.735090] W [client-rpc-fops.c:504:client3_3_stat_cbk]
0-volume1-client-0: remote operation failed: Stale file handle

We see no error when we have two processes of the above program running on
the same machine (say on M1) accessing the file F on the gluster volume.
We want to understand the expectations of atomicity in gluster, specifically
for rename, and whether the above is a bug.

Also  glusterfs --version => glusterfs 3.6.9 built on Mar  2 2016 18:21:14

We would also like to know if there is any parameter in one of the
translators that we can tweak to prevent this problem

Any help or insights here is appreciated

Thanks
Rama
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] nfs-ganesha volume null errors

2016-03-22 Thread prmarino1
Have you tried CTDB from the Samba project?
I have used it with nfs-ganesha and samba4 before with great success;
furthermore, you do not need to enable samba to run it.

  Original Message  
From: Soumya Koduri
Sent: Tuesday, March 22, 2016 09:18
To: ML Wong
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] nfs-ganesha volume null errors



On 03/21/2016 02:16 AM, ML Wong wrote:
> Hello Soumya,
> Thanks for answering my questions.
> Question 1) I am still puzzled what VOL is still referring to. Is that a
> variable/parameter that i can specify somewhere in the ganesha-ha.conf?
> Any pointers will be very much appreciated.

No it doesn't refer to any volume as it is a global option. The log 
message is misleading.
>
> 1) Those 3 test systems do not have firewalld running and SELinux
> running. And i also verify corosync.conf is now empty.
> # sestatus
> SELinux status: disabled
>
> # firewall-cmd --zone=public --list-all
> FirewallD is not running
>
> # ls -al /etc/corosync/corosync.conf
> -rw-r--r-- 1 root root 0 Mar 20 12:54 /etc/corosync/corosync.conf
>
> 2) I also do not find pacemaker.log under /var/log, but i found the
> following. Will these be the same:
> # ls -al /var/log/pcsd/pcsd.log
> -rw-r--r--. 1 root root 162322 Mar 20 13:26 /var/log/pcsd/pcsd.log
>
> In any case, that log is full of the following:
> +++
> I, [2016-03-20T13:33:34.982311 #939] INFO -- : Running:
> /usr/sbin/corosync-cmapctl totem.cluster_name
> I, [2016-03-20T13:33:34.982459 #939] INFO -- : CIB USER: hacluster,
> groups:
> I, [2016-03-20T13:33:34.985984 #939] INFO -- : Return Value: 1
> +++

There should be pacemaker.log too. Which version of pacemaker are you 
using?

>
> 3) /var/log/messages - it does not look like ganesha is passing logs to this
> file. But I see /var/log/ganesha.log - to which I found out logging is
> sent, per /etc/sysconfig/ganesha (OPTIONS="-L
> /var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_FULL_DEBUG")
>
> After it failed to acquire the volume, the server will be filled with
> the following in "ganesha.log", but the other 2 nodes in the cluster do
> not have anything logged in ganesha.log. The other nodes have "E
> [MSGID: 106062] [glusterd-op-sm.c:3728:glusterd_op_ac_unlock]
> 0-management: Unable to acquire volname" logged in the
> "etc-glusterfs-glusterd.vol.log"
> +++
> 20/03/2016 13:37:32 : epoch 56ef059d : mlw-fusion1 :
> ganesha.nfsd-5215[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of
> poll loop
> 20/03/2016 13:37:32 : epoch 56ef059d : mlw-fusion1 :
> ganesha.nfsd-5215[dbus_heartbeat] gsh_dbus_thread :RW LOCK :F_DBG
> :Acquired mutex 0x7fd38e3fe080 (&dbus_bcast_lock) at
> /builddir/build/BUILD/nfs-ganesha-2.3.0/src/dbus/dbus_server.c:689
> 20/03/2016 13:37:32 : epoch 56ef059d : mlw-fusion1 :
> ganesha.nfsd-5215[dbus_heartbeat] gsh_dbus_thread :RW LOCK :F_DBG
> :Released mutex 0x7fd38e3fe080 (&dbus_bcast_lock) at
> /builddir/build/BUILD/nfs-ganesha-2.3.0/src/dbus/dbus_server.c:739
> +++
>

'/var/log/messages' is where cluster-setup related errors are logged.

To debug, you could probably try the steps below -
* bring up nfs-ganesha server on all the nodes
#systemctl start nfs-ganesha
* Check if nfs-ganesha is successfully started
* On one of the nodes,
# cd '/usr/libexec/ganesha'
# bash -x ./ganesha.sh --setup /etc/ganesha

This would throw the errors returned by the script on the console during 
cluster setup.

Please give a try and let me know if you see any errors.

Thanks,
Soumya

> Testing Environment: Running CentOS Linux release 7.2.1511, glusterfs
> 3.7.8 (glusterfs-server-3.7.8-2.el7.x86_64),
> nfs-ganesha-gluster-2.3.0-1.el7.x86_64
>
>
> On Mon, Mar 14, 2016 at 2:05 AM, Soumya Koduri  > wrote:
>
> Hi,
>
>
> On 03/14/2016 04:06 AM, ML Wong wrote:
>
> Running CentOS Linux release 7.2.1511, glusterfs 3.7.8
> (glusterfs-server-3.7.8-2.el7.x86_64),
> nfs-ganesha-gluster-2.3.0-1.el7.x86_64
>
> 1) Ensured the connectivity between gluster nodes by using PING
> 2) Disabled NetworkManager (Loaded: loaded
> (/usr/lib/systemd/system/NetworkManager.service; disabled)
> 3) Gluster 'gluster_shared_storage' is created by using (gluster
> volume
> set all cluster.enable-shared-storage enable), and are all
> mounted under
> /run/gluster/shared_storage, and nfs-ganesha directory is also
> created
> after the feature being enabled
> 4) Emtpy out /etc/ganesha/ganesha.conf (have tested ganesha
> running as a
> stand-alone NFS server)
> 5) Installed pacemaker, corosync, and resource-agents
> 6) Reset 'hacluster' system-user password to be the same:
> # pcs cluster auth -u hacluster mlw-fusion1
> mlw-fusion2 mlw-fusion3
> Password:
> mlw-fusion2: Authorized
> mlw-fusion3: Authorized
> mlw-fusion1: Authorized
> 7) IPv6 is enabled - (IPV6INIT=yes in
> /etc/sysconfig/network-scripts/ifcfg-en*)
> 8) Started pcsd, and corosync
> 9) Created /var/lib/glusterd/nfs/secret.pem, and transfer to the
> other 2
> nodes
> # ssh -i secret.pem root@mlw-fusion3 "echo

Re: [Gluster-users] EC volume size calculation

2016-03-22 Thread Serkan Çoban
I mount the volume using NFS and df shows 9PB for the volume size.
I think something is wrong with fuse/df.
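
As a rough check, assuming 8TB bricks and that only the 16 data bricks of
each (16+4) disperse subvolume count toward usable capacity:

    # Expected usable size of a 78 x (16+4) distributed-disperse volume.
    subvolumes = 78
    brick_tb = 8

    total_bricks = subvolumes * (16 + 4)              # 1560 = 60 nodes * 26 disks
    raw_pb = total_bricks * brick_tb / 1000.0         # ~12.48 PB raw
    usable_pb = subvolumes * 16 * brick_tb / 1000.0   # ~9.98 PB expected usable

    print("raw %.2f PB, expected usable %.2f PB" % (raw_pb, usable_pb))

So roughly 9.98PB of usable space is expected, which is why the 5.7PB figure
from the fuse mount looks wrong.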


On Tue, Mar 22, 2016 at 5:59 PM, Serkan Çoban  wrote:
> Hi,
> I just set up a glusterfs cluster with 60 nodes, each with 26x8TB disks.
> The volume is 78 x (16+4) distributed disperse.
> My calculation for the volume size is:
> 48 servers * 26 Disks * 8TB = 9.98PB
> But when I mount the volume from client it shows 5.7PB.
> Am I doing anything wrong?
> Gluster client and server version is 3.7.9
> Servers are Rhel 7.2, Client is Rhel 6.7
>
> Thanks,
> Serkan
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] EC volume size calculation

2016-03-22 Thread Serkan Çoban
Hi,
I just set up a glusterfs cluster with 60 nodes, each with 26x8TB disks.
The volume is 78 x (16+4) distributed disperse.
My calculation for the volume size is:
48 servers * 26 Disks * 8TB = 9.98PB
But when I mount the volume from client it shows 5.7PB.
Am I doing anything wrong?
Gluster client and server version is 3.7.9
Servers are Rhel 7.2, Client is Rhel 6.7

Thanks,
Serkan
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS cluster peer stuck in state: Sent and Received peer request (Connected)

2016-03-22 Thread Atin Mukherjee
-Atin
Sent from one plus one
On 22-Mar-2016 6:54 pm, "tommy.yard...@baesystems.com" <
tommy.yard...@baesystems.com> wrote:
>
> Hi Atin,
>
> Setting 'state=3' on the instances and restarting the service seems to
have fixed the problem.

Great!

>
> Is this an 'issue' with glusterfs?

No, it's not. As I said, the handshaking was incomplete, which led to this
issue. I'd also recommend you upgrade to the latest gluster version, i.e.
3.7.x.

>
> I will implement an automated solution for my problem, just would be good
to know if this is something that will be patched in the future?
>
> Thanks,
> Tommy
>
> -Original Message-
> From: Atin Mukherjee [mailto:amukh...@redhat.com]
> Sent: 22 March 2016 13:10
> To: Yardley, Tommy (UK Guildford); gluster-users@gluster.org
> Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state: Sent
and Received peer request (Connected)
>
> Tommy,
>
> It seems that there were frequent disconnect events which may have
caused the peer handshaking to remain incomplete, leading to an
inconsistency in the cluster state.
>
> Further follow up questions:
>
> 1. Restarting glusterd instances doesn't solve the problem?
>
> 2. If the answer to 1 is yes, can we try to set state=3 in all the
/var/lib/glusterd/peers/ files and then restart glusterd to see
whether the problem persists?
>
> If the above still doesn't solve the problem, the output of 'cat
/var/lib/glusterd/peers/*' from all the nodes should help us in figuring
out the correct workaround.
>
> ~Atin
>
> On 03/22/2016 02:51 PM, Atin Mukherjee wrote:
> > Gaurav is looking into it and he will get back with his analysis.
> >
> > ~Atin
> >
> > On 03/22/2016 02:42 PM, tommy.yard...@baesystems.com wrote:
> >> Hi,
> >>
> >> Is anyone able to help with this issue?
> >>
> >> Thanks,
> >> Tommy
> >>
> >> -Original Message-
> >> From: gluster-users-boun...@gluster.org
> >> [mailto:gluster-users-boun...@gluster.org] On Behalf Of
> >> tommy.yard...@baesystems.com
> >> Sent: 17 March 2016 08:49
> >> To: gluster-users@gluster.org
> >> Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state:
> >> Sent and Received peer request (Connected)
> >>
> >> Hi,
> >>
> >> Sorry I had sent them directly to Atin
> >>
> >> I've trimmed down the larger log files a bit and attached all of them
to this email.
> >>
> >> Many thanks,
> >> Tommy
> >>
> >> -Original Message-
> >> From: Gaurav Garg [mailto:gg...@redhat.com]
> >> Sent: 17 March 2016 07:07
> >> To: Yardley, Tommy (UK Guildford)
> >> Cc: gluster-users@gluster.org
> >> Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state:
> >> Sent and Received peer request (Connected)
> >>
> > I’ve sent the logs directly as they push this message over the size
limit.
> >>
> >> Where have you sent the logs? I could not find them. Could you send the
glusterd logs so that we can start analyzing this issue.
> >>
> >> Thanks,
> >>
> >> Regards,
> >> Gaurav
> >>
> >> - Original Message -
> >> From: "Atin Mukherjee" 
> >> To: "tommy yardley" ,
> >> gluster-users@gluster.org
> >> Sent: Wednesday, March 16, 2016 5:49:05 PM
> >> Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state:
> >> Sent and Received peer request (Connected)
> >>
> >> I couldn't look into this today, sorry about that. I can only look
into this case on Monday. Anyone else to take this up?
> >>
> >> ~Atin
> >>
> >> On 03/15/2016 09:57 PM, tommy.yard...@baesystems.com wrote:
> >>> Hi Atin,
> >>>
> >>>
> >>>
> >>> All nodes are running 3.5.8 – the probe sequence is:
> >>> 172.31.30.64
> >>>
> >>> 172.31.27.27 (node having issue)
> >>>
> >>> 172.31.26.134 (node the peer probe is ran on)
> >>>
> >>> 172.31.19.46
> >>>
> >>>
> >>>
> >>> I’ve sent the logs directly as they push this message over the size
limit.
> >>>
> >>>
> >>>
> >>> look forward to your reply,
> >>>
> >>>
> >>>
> >>> Tommy
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> *From:*Atin Mukherjee [mailto:atin.mukherje...@gmail.com]
> >>> *Sent:* 15 March 2016 15:58
> >>> *To:* Yardley, Tommy (UK Guildford)
> >>> *Cc:* gluster-users@gluster.org
> >>> *Subject:* Re: [Gluster-users] GlusterFS cluster peer stuck in state:
> >>> Sent and Received peer request (Connected)
> >>>
> >>>
> >>>
> >>> This indicates the peer handshaking didn't go through properly and
> >>> your cluster is messed up. Are you running 3.5.8 version in all the
nodes?
> >>> Could you get me the glusterd log from all the nodes and mention the
> >>> peer probe sequence? I'd be able to look at it tomorrow only and get
back.
> >>>
> >>> -Atin
> >>> Sent from one plus one
> >>>
> >>> On 15-Mar-2016 9:16 pm, "tommy.yard...@baesystems.com
> >>> "  >>> > wrote:
> >>>
> >>> Hi All,
> >>>
> >>>
> >>>
> >>> I’m running GlusterFS on a cluster hosted in AWS. I have a script
> >>> which provisions my instances and thus will set up GlusterFS
(specifically:
> >>> glusterfs 3.5.8).
> >>>
> >>> My issue is that this only wor

Re: [Gluster-users] Trying XenServer again with Gluster

2016-03-22 Thread Russell Purinton
Thanks Andre,

Citrix XenServer does not have qemu support for libgfapi unfortunately,
though I have posted a Feature Request with them to possibly support it in
the future.  Not sure if they will.

That's unfortunate that it can't be done with 2 servers.  It makes sense
though.   Do you think it would work with 4 servers in the pool but still
using Replica 2, or is Replica 3 the minimum?   We've got a large amount of
data, and using replica 2 would cost us about $878 per month whereas
replica 3 would cost us about $1317/mo for the same amount of storage...

On Tue, Mar 22, 2016 at 8:48 AM, André Bauer  wrote:

> Hi Russell,
>
> I'm a KVM user, but imho XEN also supports accessing vm images through
> libgfapi, so you don't need to mount via NFS or the fuse client.
>
> Infos:
>
> http://www.gluster.org/community/documentation/index.php/Libgfapi_with_qemu_libvirt
>
> Second point is that you need to have at least 3 replicas to get a
> working HA setup, because server quorum does not work for 2 replicas.
>
> Infos:
> https://www.gluster.org/pipermail/gluster-users/2015-November/024189.html
>
> Regards
> André
>
>
> Am 20.03.2016 um 19:41 schrieb Russell Purinton:
> > Hi all, Once again I’m trying to get XenServer working reliably with
> > GlusterFS storage for the VHDs. I’m mainly interested in the ability to
> > have a pair of storage servers, where if one goes down, the VMs can keep
> > running uninterrupted on the other server. So, we’ll be using the
> > replicate translator to make sure all the data resides on both servers.
> >
> > So initially, I tried using the Gluster NFS server. XenServer supports
> > NFS out of the box, so this seemed like a good way to go without having
> > to hack XenServer much. I found some major performance issues with this
> > however.
> >
> > I’m using a server with 12 SAS drives on a single RAID card, with dual
> > 10GbE NICs. Without Gluster, using the normal Kernel NFS server, I can
> > read and write to this server at over 400MB/sec. VMS run well. However
> > when I switch to Gluster for the NFS server, my write performance drops
> > to 20MB/sec. Read performance remains high. I found out this is due to
> > XenServer’s use of O_DIRECT for VHD access. It helped a lot when the
> > server had DDR cache on the RAID card, but for servers without that the
> > performance was unusable.
> >
> > So I installed the gluster-client in XenServer itself, and mounted the
> > volume in dom0. I then created a SR of type “file”. Success, sort of! I
> > can do just about everything on that SR, VMs run nicely, and performance
> > is acceptable at 270MB/sec, BUT…. I have a problem when I transfer an
> > existing VM to it. The transfer gets only so far along then data stops
> > moving. XenServer still says it’s copying, but no data is being sent. I
> > have to force restart the XenHost to clear the issue (and the VM isn’t
> > moved). Other file access to the FUSE mount still works, and other VMs
> > are unaffected.
> >
> > I think the problem may possibly involve file locks or perhaps a
> > performance translator. I’ve tried disabling as many performance
> > translators as I can, but no luck.
> >
> > I didn’t find anything interesting in the logs, and no crash dumps. I
> > tried to do a volume statedump to see the list of locks, but it seemed
> > to only output some cpu stats in /tmp.
> >
> > Is there a generally accepted list of volume options to use with Gluster
> > for volumes meant to store VHDs? Has anyone else had a similar
> > experience with VHD access locking up?
> >
> > Russell
> >
> >
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users
> >
>
>
> --
> Mit freundlichen Grüßen
> André Bauer
>
> MAGIX Software GmbH
> André Bauer
> Administrator
> August-Bebel-Straße 48
> 01219 Dresden
> GERMANY
>
> tel.: 0351 41884875
> e-mail: aba...@magix.net
> aba...@magix.net 
> www.magix.com 
>
> Geschäftsführer | Managing Directors: Dr. Arnd Schröder, Klaus Schmidt
> Amtsgericht | Commercial Register: Berlin Charlottenburg, HRB 127205
>
> Find us on:
>
>  
>  
> --
> The information in this email is intended only for the addressee named
> above. Access to this email by anyone else is unauthorized. If you are
> not the intended recipient of this message any disclosure, copying,
> distribution or any action taken in reliance on it is prohibited and
> may be unlawful. MAGIX does not warrant that any attachments are free
> from viruses or other defects and accepts no liability for any losses
> resulting from infected email transmissions. Please note that any
> views expressed in this email may be those of the originator and do
> not necessarily represent the agend

Re: [Gluster-users] GlusterFS cluster peer stuck in state: Sent and Received peer request (Connected)

2016-03-22 Thread tommy.yard...@baesystems.com
Hi Atin,

Setting 'state=3' on the instances and restarting the service seems to have 
fixed the problem.

Is this an 'issue' with glusterfs?

I will implement an automated solution for my problem; it would just be good to 
know if this is something that will be patched in the future.

Thanks,
Tommy

-Original Message-
From: Atin Mukherjee [mailto:amukh...@redhat.com]
Sent: 22 March 2016 13:10
To: Yardley, Tommy (UK Guildford); gluster-users@gluster.org
Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state: Sent and 
Received peer request (Connected)

Tommy,

It seems that there were frequent disconnect events which may have caused 
the peer handshaking to remain incomplete, leading to an inconsistency in 
the cluster state.

Further follow up questions:

1. Restarting glusterd instances doesn't solve the problem?

2. If the answer to 1 is yes, can we try to set state=3 in all the 
/var/lib/glusterd/peers/ files and then restart glusterd to see whether 
the problem persists?

If the above still doesn't solve the problem, the output of 'cat 
/var/lib/glusterd/peers/*' from all the nodes should help us in figuring out 
the correct workaround.
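
A rough sketch of how that workaround could be scripted, assuming a systemd
host (back up the peer files first, and use your distro's service manager if
it differs):

    import glob
    import re
    import subprocess

    # Force every peer entry to state=3, as suggested above.
    for peer_file in glob.glob("/var/lib/glusterd/peers/*"):
        with open(peer_file) as f:
            contents = f.read()
        contents = re.sub(r"^state=\d+$", "state=3", contents, flags=re.MULTILINE)
        with open(peer_file, "w") as f:
            f.write(contents)

    # Restart glusterd so it picks up the edited peer state.
    subprocess.run(["systemctl", "restart", "glusterd"], check=True)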

~Atin

On 03/22/2016 02:51 PM, Atin Mukherjee wrote:
> Gaurav is looking into it and he will get back with his analysis.
>
> ~Atin
>
> On 03/22/2016 02:42 PM, tommy.yard...@baesystems.com wrote:
>> Hi,
>>
>> Is anyone able to help with this issue?
>>
>> Thanks,
>> Tommy
>>
>> -Original Message-
>> From: gluster-users-boun...@gluster.org
>> [mailto:gluster-users-boun...@gluster.org] On Behalf Of
>> tommy.yard...@baesystems.com
>> Sent: 17 March 2016 08:49
>> To: gluster-users@gluster.org
>> Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state:
>> Sent and Received peer request (Connected)
>>
>> Hi,
>>
>> Sorry I had sent them directly to Atin
>>
>> I've trimmed down the larger log files a bit and attached all of them to 
>> this email.
>>
>> Many thanks,
>> Tommy
>>
>> -Original Message-
>> From: Gaurav Garg [mailto:gg...@redhat.com]
>> Sent: 17 March 2016 07:07
>> To: Yardley, Tommy (UK Guildford)
>> Cc: gluster-users@gluster.org
>> Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state:
>> Sent and Received peer request (Connected)
>>
> I’ve sent the logs directly as they push this message over the size limit.
>>
>> Where have you sent the logs? I could not find them. Could you send the glusterd 
>> logs so that we can start analyzing this issue.
>>
>> Thanks,
>>
>> Regards,
>> Gaurav
>>
>> - Original Message -
>> From: "Atin Mukherjee" 
>> To: "tommy yardley" ,
>> gluster-users@gluster.org
>> Sent: Wednesday, March 16, 2016 5:49:05 PM
>> Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state:
>> Sent and Received peer request (Connected)
>>
>> I couldn't look into this today, sorry about that. I can only look into this 
>> case on Monday. Anyone else to take this up?
>>
>> ~Atin
>>
>> On 03/15/2016 09:57 PM, tommy.yard...@baesystems.com wrote:
>>> Hi Atin,
>>>
>>>
>>>
>>> All nodes are running 3.5.8 – the probe sequence is:
>>> 172.31.30.64
>>>
>>> 172.31.27.27 (node having issue)
>>>
>>> 172.31.26.134 (node the peer probe is ran on)
>>>
>>> 172.31.19.46
>>>
>>>
>>>
>>> I’ve sent the logs directly as they push this message over the size limit.
>>>
>>>
>>>
>>> look forward to your reply,
>>>
>>>
>>>
>>> Tommy
>>>
>>>
>>>
>>>
>>>
>>> *From:*Atin Mukherjee [mailto:atin.mukherje...@gmail.com]
>>> *Sent:* 15 March 2016 15:58
>>> *To:* Yardley, Tommy (UK Guildford)
>>> *Cc:* gluster-users@gluster.org
>>> *Subject:* Re: [Gluster-users] GlusterFS cluster peer stuck in state:
>>> Sent and Received peer request (Connected)
>>>
>>>
>>>
>>> This indicates the peer handshaking didn't go through properly and
>>> your cluster is messed up. Are you running 3.5.8 version in all the nodes?
>>> Could you get me the glusterd log from all the nodes and mention the
>>> peer probe sequence? I'd be able to look at it tomorrow only and get back.
>>>
>>> -Atin
>>> Sent from one plus one
>>>
>>> On 15-Mar-2016 9:16 pm, "tommy.yard...@baesystems.com" wrote:
>>>
>>> Hi All,
>>>
>>>
>>>
>>> I’m running GlusterFS on a cluster hosted in AWS. I have a script
>>> which provisions my instances and thus will set up GlusterFS (specifically:
>>> glusterfs 3.5.8).
>>>
>>> My issue is that this only works ~50% of the time and the other 50%
>>> of the time one of the peers will be ‘stuck’ in the following state:
>>>
>>> /root@ip-xx-xx-xx-1:/home/ubuntu# gluster peer status/
>>>
>>> /Number of Peers: 3/
>>>
>>> / /
>>>
>>> /Hostname: xx.xx.xx.2/
>>>
>>> /Uuid: 3b4c1fb9-b325-4204-98fd-2eb739fa867f/
>>>
>>> /State: Peer in Cluster (Connected)/
>>>
>>> / /
>>>
>>> /Hostname: xx.xx.xx.3/
>>>
>>> /Uuid: acfc1794-9080-4eb0-8f69-3abe78bbee16/
>>>
>>> /State: Sent and Received peer request (Connected)/
>>>
>>> / /
>>>
>>> /Hostname: xx.xx.xx.4/
>>>
>>> /Uuid

Re: [Gluster-users] nfs-ganesha volume null errors

2016-03-22 Thread Soumya Koduri



On 03/21/2016 02:16 AM, ML Wong wrote:

Hello Soumya,
Thanks for answering my questions.
Question 1) I am still puzzled what VOL is still referring to. Is that a
variable/parameter that i can specify somewhere in the  ganesha-ha.conf?
Any pointers will be very much appreciated.


No it doesn't refer to any volume as it is a global option. The log 
message is misleading.


1) Those 3 test systems do not have firewalld running and SELinux
running. And i also verify corosync.conf is now empty.
# sestatus
SELinux status: disabled

# firewall-cmd --zone=public --list-all
FirewallD is not running

# ls -al /etc/corosync/corosync.conf
-rw-r--r-- 1 root root 0 Mar 20 12:54 /etc/corosync/corosync.conf

2) I also do not find pacemaker.log under /var/log, but i found the
following. Will these be the same:
# ls -al /var/log/pcsd/pcsd.log
-rw-r--r--. 1 root root 162322 Mar 20 13:26 /var/log/pcsd/pcsd.log

In any case, that log is full of the following:
+++
I, [2016-03-20T13:33:34.982311 #939]  INFO -- : Running:
/usr/sbin/corosync-cmapctl totem.cluster_name
I, [2016-03-20T13:33:34.982459 #939]  INFO -- : CIB USER: hacluster,
groups:
I, [2016-03-20T13:33:34.985984 #939]  INFO -- : Return Value: 1
+++


There should be a pacemaker.log too. Which version of pacemaker are you 
using?




3) /var/log/messages - it does not look ganesha passing the logs to this
file. But i see /var/log/ganesha.log  - which i found out logging seem
to be sent to there from /etc/sysconfig/ganesha (OPTIONS="-L
/var/log/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_FULL_DEBUG""

After it failed to acquire the volume, the server will be filled with
the following in "ganesha.log", but the other 2 nodes in the cluster do
not have anything logged in ganesha.log.  The other nodes have "E
[MSGID: 106062] [glusterd-op-sm.c:3728:glusterd_op_ac_unlock]
0-management: Unable to acquire volname" logged in the
"etc-glusterfs-glusterd.vol.log"
+++
20/03/2016 13:37:32 : epoch 56ef059d : mlw-fusion1 :
ganesha.nfsd-5215[dbus_heartbeat] gsh_dbus_thread :DBUS :F_DBG :top of
poll loop
20/03/2016 13:37:32 : epoch 56ef059d : mlw-fusion1 :
ganesha.nfsd-5215[dbus_heartbeat] gsh_dbus_thread :RW LOCK :F_DBG
:Acquired mutex 0x7fd38e3fe080 (&dbus_bcast_lock) at
/builddir/build/BUILD/nfs-ganesha-2.3.0/src/dbus/dbus_server.c:689
20/03/2016 13:37:32 : epoch 56ef059d : mlw-fusion1 :
ganesha.nfsd-5215[dbus_heartbeat] gsh_dbus_thread :RW LOCK :F_DBG
:Released mutex 0x7fd38e3fe080 (&dbus_bcast_lock) at
/builddir/build/BUILD/nfs-ganesha-2.3.0/src/dbus/dbus_server.c:739
+++



'/var/log/messages' is where cluster-setup related errors are logged.

Probably to debug you could try below steps -
* bring up nfs-ganesha server on all the nodes
 #systemctl start nfs-ganesha
* Check if nfs-ganesha is successfully started
* On one of the nodes,
# cd '/usr/libexec/ganesha'
# bash -x ./ganesha.sh --setup /etc/ganesha

This would throw the errors returned by the script on the console during 
cluster setup.
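
Putting those steps together, the sequence is roughly as follows (service name
and script path as used above; adjust to your installation):

  # on every node of the intended HA cluster
  systemctl start nfs-ganesha
  systemctl status nfs-ganesha --no-pager    # confirm the daemon stays up

  # on one node only: re-run the HA setup with shell tracing enabled
  cd /usr/libexec/ganesha
  bash -x ./ganesha.sh --setup /etc/ganesha

  # once the script succeeds, the cluster resources should show up in
  pcs status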


Please give it a try and let me know if you see any errors.

Thanks,
Soumya


Testing Environment: Running CentOS Linux release 7.2.1511, glusterfs
3.7.8 (glusterfs-server-3.7.8-2.el7.x86_64),
nfs-ganesha-gluster-2.3.0-1.el7.x86_64


On Mon, Mar 14, 2016 at 2:05 AM, Soumya Koduri wrote:

Hi,


On 03/14/2016 04:06 AM, ML Wong wrote:

Running CentOS Linux release 7.2.1511, glusterfs 3.7.8
(glusterfs-server-3.7.8-2.el7.x86_64),
nfs-ganesha-gluster-2.3.0-1.el7.x86_64

1) Ensured the connectivity between gluster nodes by using PING
2) Disabled NetworkManager (Loaded: loaded
(/usr/lib/systemd/system/NetworkManager.service; disabled)
3) Gluster 'gluster_shared_storage' is created by using (gluster
volume
set all cluster.enable-shared-storage enable), and are all
mounted under
/run/gluster/shared_storage, and nfs-ganesha directory is also
created
after the feature being enabled
4) Emtpy out /etc/ganesha/ganesha.conf (have tested ganesha
running as a
stand-alone NFS server)
5) Installed pacemaker, corosync, and resource-agents
6) Reset 'hacluster' system-user password to be the same:
  # pcs cluster auth -u hacluster mlw-fusion1
mlw-fusion2 mlw-fusion3
  Password:
  mlw-fusion2: Authorized
  mlw-fusion3: Authorized
  mlw-fusion1: Authorized
7) IPv6 is enabled - (IPV6INIT=yes in
/etc/sysconfig/network-scripts/ifcfg-en*)
8) Started pcsd, and corosync
9) Created /var/lib/glusterd/nfs/secret.pem, and transfer to the
other 2
nodes
  # ssh -i secret.pem root@mlw-fusion3 "echo helloworld"
  helloworld
9) Transfer the following ganesha-ha.conf to the other nodes in the
cluster, but change the HA_VOL_SERVER value accordingly to
 

Re: [Gluster-users] GlusterFS cluster peer stuck in state: Sent and Received peer request (Connected)

2016-03-22 Thread Atin Mukherjee
Tommy,

It seems that there were frequent disconnect events which may have
caused the peer handshaking to remain incomplete, leading to an
inconsistency in the cluster state.

Further follow up questions:

1. Restarting glusterd instances doesn't solve the problem?

2. If the answer to 1 is yes, can we try setting state=3 in all the
/var/lib/glusterd/peers/ files and then restarting glusterd to see
whether the problem persists?

If the above still doesn't solve the problem, the output of 'cat
/var/lib/glusterd/peers/*' from all the nodes should help us figure
out the correct workaround.

~Atin

On 03/22/2016 02:51 PM, Atin Mukherjee wrote:
> Gaurav is looking into it and he will get back with his analysis.
> 
> ~Atin
> 
> On 03/22/2016 02:42 PM, tommy.yard...@baesystems.com wrote:
>> Hi,
>>
>> Is anyone able to help with this issue?
>>
>> Thanks,
>> Tommy
>>
>> -Original Message-
>> From: gluster-users-boun...@gluster.org 
>> [mailto:gluster-users-boun...@gluster.org] On Behalf Of 
>> tommy.yard...@baesystems.com
>> Sent: 17 March 2016 08:49
>> To: gluster-users@gluster.org
>> Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state: Sent and 
>> Received peer request (Connected)
>>
>> Hi,
>>
>> Sorry I had sent them directly to Atin
>>
>> I've trimmed down the larger log files a bit and attached all of them to 
>> this email.
>>
>> Many thanks,
>> Tommy
>>
>> -Original Message-
>> From: Gaurav Garg [mailto:gg...@redhat.com]
>> Sent: 17 March 2016 07:07
>> To: Yardley, Tommy (UK Guildford)
>> Cc: gluster-users@gluster.org
>> Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state: Sent and 
>> Received peer request (Connected)
>>
> I’ve sent the logs directly as they push this message over the size limit.
>>
>> Where have you sent the logs? I could not find them. Could you send the 
>> glusterd logs so that we can start analyzing this issue.
>>
>> Thanks,
>>
>> Regards,
>> Gaurav
>>
>> - Original Message -
>> From: "Atin Mukherjee" 
>> To: "tommy yardley" , gluster-users@gluster.org
>> Sent: Wednesday, March 16, 2016 5:49:05 PM
>> Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state: Sent and 
>> Received peer request (Connected)
>>
>> I couldn't look into this today, sorry about that. I can only look into this 
>> case on Monday. Anyone else to take this up?
>>
>> ~Atin
>>
>> On 03/15/2016 09:57 PM, tommy.yard...@baesystems.com wrote:
>>> Hi Atin,
>>>
>>>
>>>
>>> All nodes are running 3.5.8 – the probe sequence is:
>>> 172.31.30.64
>>>
>>> 172.31.27.27 (node having issue)
>>>
>>> 172.31.26.134 (node the peer probe is ran on)
>>>
>>> 172.31.19.46
>>>
>>>
>>>
>>> I’ve sent the logs directly as they push this message over the size limit.
>>>
>>>
>>>
>>> look forward to your reply,
>>>
>>>
>>>
>>> Tommy
>>>
>>>
>>>
>>>
>>>
>>> *From:*Atin Mukherjee [mailto:atin.mukherje...@gmail.com]
>>> *Sent:* 15 March 2016 15:58
>>> *To:* Yardley, Tommy (UK Guildford)
>>> *Cc:* gluster-users@gluster.org
>>> *Subject:* Re: [Gluster-users] GlusterFS cluster peer stuck in state:
>>> Sent and Received peer request (Connected)
>>>
>>>
>>>
>>> This indicates the peer handshaking didn't go through properly and
>>> your cluster is messed up. Are you running 3.5.8 version in all the nodes?
>>> Could you get me the glusterd log from all the nodes and mention the
>>> peer probe sequence? I'd be able to look at it tomorrow only and get back.
>>>
>>> -Atin
>>> Sent from one plus one
>>>
>>> On 15-Mar-2016 9:16 pm, "tommy.yard...@baesystems.com" wrote:
>>>
>>> Hi All,
>>>
>>>
>>>
>>> I’m running GlusterFS on a cluster hosted in AWS. I have a script
>>> which provisions my instances and thus will set up GlusterFS (specifically:
>>> glusterfs 3.5.8).
>>>
>>> My issue is that this only works ~50% of the time and the other 50% of
>>> the time one of the peers will be ‘stuck’ in the following state:
>>>
>>> /root@ip-xx-xx-xx-1:/home/ubuntu# gluster peer status/
>>>
>>> /Number of Peers: 3/
>>>
>>> / /
>>>
>>> /Hostname: xx.xx.xx.2/
>>>
>>> /Uuid: 3b4c1fb9-b325-4204-98fd-2eb739fa867f/
>>>
>>> /State: Peer in Cluster (Connected)/
>>>
>>> / /
>>>
>>> /Hostname: xx.xx.xx.3/
>>>
>>> /Uuid: acfc1794-9080-4eb0-8f69-3abe78bbee16/
>>>
>>> /State: Sent and Received peer request (Connected)/
>>>
>>> / /
>>>
>>> /Hostname: xx.xx.xx.4/
>>>
>>> /Uuid: af33463d-1b32-4ffb-a4f0-46ce16151e2f/
>>>
>>> /State: Peer in Cluster (Connected)/
>>>
>>>
>>>
>>> Running gluster peer status on the instance that is affected yields:
>>>
>>>
>>>
>>> /root@ip-xx-xx-xx-3:/var/log/glusterfs# gluster peer status Number of
>>> Peers: 1/
>>>
>>> / /
>>>
>>> /Hostname: xx.xx.xx.1/
>>>
>>> /Uuid: c4f17e9a-893b-48f0-a014-1a05cca09d01/
>>>
>>> /State: Peer is connected and Accepted (Connected)/
>>>
>>> / /
>>>
>>> Of which the status (Connected) in this case, will fluctuate between
>>> ‘Connected’ and ‘Disconnected’.
>>>
>>>
>>>
>>> I have been

[Gluster-users] Minutes of today's Gluster Community Bug Triage meeting (Mar 22 2016)

2016-03-22 Thread Saravanakumar Arumugam

Hi,

Please find the minutes of today's Gluster Community Bug Triage meeting 
below. Thanks to everyone who have attended the meeting.


Minutes: 
https://meetbot.fedoraproject.org/gluster-meeting/2016-03-22/gluster_bug_triage.2016-03-22-12.00.html
Minutes (text): 
https://meetbot.fedoraproject.org/gluster-meeting/2016-03-22/gluster_bug_triage.2016-03-22-12.00.txt
Log: 
https://meetbot.fedoraproject.org/gluster-meeting/2016-03-22/gluster_bug_triage.2016-03-22-12.00.log.html



#gluster-meeting: Gluster Bug Triage

Meeting started by Saravanakmr at 12:00:03 UTC (full logs: see the links above).


 Meeting summary

1. agenda: https://public.pad.fsfe.org/p/gluster-bug-triage (Saravanakmr, 12:00:19)
2. Roll Call (Saravanakmr, 12:00:28)
3. kkeithley_ will come up with a proposal to reduce the number of bugs
   against "mainline" in NEW state (Saravanakmr, 12:04:42)
4. hagarth start/sync email on regular (nightly) automated tests
   (Saravanakmr, 12:05:36)
5. msvbhat will look into using nightly builds for automated testing,
   and will report issues/success to the mailinglist (Saravanakmr, 12:06:57)
6. msvbhat will look into lalatenduM's automated Coverity setup in
   Jenkins which need assistance from an admin with more permissions
   (Saravanakmr, 12:09:41)
   - ACTION: msvbhat will look into lalatenduM's automated Coverity
     setup in Jenkins which need assistance from an admin with more
     permissions (Saravanakmr, 12:13:42)
7. msvbhat and ndevos need to think about and decide how to
   provide/use debug builds (Saravanakmr, 12:13:55)
   - ACTION: ndevos need to think about and decide how to
     provide/use debug builds (Saravanakmr, 12:17:02)
8. ndevos to propose some test-cases for minimal libgfapi test
   (Saravanakmr, 12:17:15)
9. Manikandan and Nandaja will update on bug automation (Saravanakmr, 12:19:16)
   - ACTION: Manikandan and Nandaja will update on bug automation
     (Saravanakmr, 12:20:17)
   - ACTION: kkeithley_ will come up with a proposal to reduce the
     number of bugs against "mainline" in NEW state (Saravanakmr, 12:23:53)
   - ACTION: hagarth start/sync email on regular (nightly)
     automated tests (Saravanakmr, 12:24:03)
   - https://public.pad.fsfe.org/p/gluster-bugs-to-triage (Saravanakmr, 12:25:04)
   - http://www.gluster.org/community/documentation/index.php/Bug_triage
     (Manikandan, 12:26:37)
10. Open Floor (Saravanakmr, 12:39:37)


Meeting ended at 12:41:49 UTC (full logs: see the links above).


 Action items

1. msvbhat will look into lalatenduM's automated Coverity setup in
   Jenkins which need assistance from an admin with more permissions

Re: [Gluster-users] Peer probe succeeded, but "not in 'Peer in Cluster' state"

2016-03-22 Thread Atin Mukherjee


On 03/22/2016 01:14 PM, Joshua J. Kugler wrote:
> On Sunday, March 20, 2016 14:44:18 Atin Mukherjee wrote:
>> What does gluster peer status output say?
> 
>>From the node that failed:
> 
> [root@vmserver-b ~]# gluster peer status
> Number of Peers: 1
> 
> Hostname: 192.168.122.10
> Uuid: d8e1d7a0-077a-4a50-93f6-d3922e3b96b9
> State: Accepted peer request (Connected)
This is the problem: peer handshaking hasn't finished yet. To find out
the reason I'd need the glusterd log file from 192.168.122.10.

As a workaround can you do the following?

1. From node 1 open /var/lib/glusterd/peers/, modify state=3
2. Repeat step 1 for node 2 as well if state is different
3. restart both the glusterd instances.
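
For reference, the peer file for 192.168.122.10 on this node would look roughly
like the following after the edit (field layout as glusterd stores it; extra
fields can vary between versions):

  # /var/lib/glusterd/peers/d8e1d7a0-077a-4a50-93f6-d3922e3b96b9
  uuid=d8e1d7a0-077a-4a50-93f6-d3922e3b96b9
  state=3
  hostname1=192.168.122.10

After saving the file on both nodes, restart glusterd (e.g. 'systemctl restart
glusterd' or 'service glusterd restart', depending on the distribution) and
re-check 'gluster peer status'.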

~Atin

> 
>> Also send the glusterd log of the node where the commands have failed.
> 
> The two logs are attached.  The peer status says connected. The log file says 
> "FAILED : Host 192.168.122.10 is not in 'Peer in Cluster' state"  I'm 
> confused. :)
> 
> Thanks for your help on this!
> 
> j
> 
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Trying XenServer again with Gluster

2016-03-22 Thread André Bauer
Hi Russell,

I'm a KVM user, but IMHO Xen also supports accessing VM images through
libgfapi, so you don't need to mount via NFS or the FUSE client.

Infos:
http://www.gluster.org/community/documentation/index.php/Libgfapi_with_qemu_libvirt
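
For example, on the KVM/libvirt side a disk served straight from a Gluster
volume over libgfapi looks roughly like this (volume name, image path and host
below are placeholders; the page linked above covers the full setup):

  <disk type='network' device='disk'>
    <driver name='qemu' type='qcow2' cache='none'/>
    <source protocol='gluster' name='vmvol/images/vm1.qcow2'>
      <host name='gluster1.example.com' port='24007'/>
    </source>
    <target dev='vda' bus='virtio'/>
  </disk>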

The second point is that you need at least 3 replicas to get a
working HA setup, because server quorum does not work with 2 replicas.

Infos:
https://www.gluster.org/pipermail/gluster-users/2015-November/024189.html
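
The quorum settings themselves are ordinary volume options, roughly as below
(option names from the 3.x docs and the volume name is a placeholder, so
double-check on your release):

  gluster volume set myvol cluster.server-quorum-type server
  gluster volume set all cluster.server-quorum-ratio 51%
  gluster volume set myvol cluster.quorum-type auto     # client-side quorum for the replica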

Regards
André


Am 20.03.2016 um 19:41 schrieb Russell Purinton:
> Hi all, Once again I’m trying to get XenServer working reliably with
> GlusterFS storage for the VHDs. I’m mainly interested in the ability to
> have a pair of storage servers, where if one goes down, the VMs can keep
> running uninterrupted on the other server. So, we’ll be using the
> replicate translator to make sure all the data resides on both servers.
> 
> So initially, I tried using the Gluster NFS server. XenServer supports
> NFS out of the box, so this seemed like a good way to go without having
> to hack XenServer much. I found some major performance issues with this
> however.
> 
> I’m using a server with 12 SAS drives on a single RAID card, with dual
> 10GbE NICs. Without Gluster, using the normal Kernel NFS server, I can
> read and write to this server at over 400MB/sec. VMS run well. However
> when I switch to Gluster for the NFS server, my write performance drops
> to 20MB/sec. Read performance remains high. I found out this is due to
> XenServer’s use of O_DIRECT for VHD access. It helped a lot when the
> server had DDR cache on the RAID card, but for servers without that the
> performance was unusable.
> 
> So I installed the gluster-client in XenServer itself, and mounted the
> volume in dom0. I then created a SR of type “file”. Success, sort of! I
> can do just about everything on that SR, VMs run nicely, and performance
> is acceptable at 270MB/sec, BUT…. I have a problem when I transfer an
> existing VM to it. The transfer gets only so far along then data stops
> moving. XenServer still says it’s copying, but no data is being sent. I
> have to force restart the XenHost to clear the issue (and the VM isn’t
> moved). Other file access to the FUSE mount still works, and other VMs
> are unaffected.
> 
> I think the problem may possibly involve file locks or perhaps a
> performance translator. I’ve tried disabling as many performance
> translators as I can, but no luck.
> 
> I didn’t find anything interesting in the logs, and no crash dumps. I
> tried to do a volume statedump to see the list of locks, but it seemed
> to only output some cpu stats in /tmp.
> 
> Is there a generally accepted list of volume options to use with Gluster
> for volumes meant to store VHDs? Has anyone else had a similar
> experience with VHD access locking up?
> 
> Russell
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
> 
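
On the statedump question quoted above: the dump is requested per volume and
the files are written on the brick servers rather than on the client, so
something like the following should produce the lock information (volume name
is a placeholder; the default dump directory is taken from the 3.x docs, so
verify on your version):

  gluster volume statedump vhd-volume       # dumps all state, including locks, per brick
  ls /var/run/gluster/*.dump.*              # inspect the dump files on each brick server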


-- 
Mit freundlichen Grüßen
André Bauer

MAGIX Software GmbH
André Bauer
Administrator
August-Bebel-Straße 48
01219 Dresden
GERMANY

tel.: 0351 41884875
e-mail: aba...@magix.net
aba...@magix.net 
www.magix.com 

Geschäftsführer | Managing Directors: Dr. Arnd Schröder, Klaus Schmidt
Amtsgericht | Commercial Register: Berlin Charlottenburg, HRB 127205

Find us on:

 
 
--
The information in this email is intended only for the addressee named
above. Access to this email by anyone else is unauthorized. If you are
not the intended recipient of this message any disclosure, copying,
distribution or any action taken in reliance on it is prohibited and
may be unlawful. MAGIX does not warrant that any attachments are free
from viruses or other defects and accepts no liability for any losses
resulting from infected email transmissions. Please note that any
views expressed in this email may be those of the originator and do
not necessarily represent the agenda of the company.
--
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Convert existing volume to shard volume

2016-03-22 Thread André Bauer
Thanks for the Info...

Am 17.03.2016 um 18:21 schrieb Krutika Dhananjay:
> If you want the existing files in your volume to get sharded, you would
> need to
> a. enable sharding on the volume and configure block size, both of which
> you have already done,
> b. cp the file(s) into the same volume with temporary names
> c. once done, you can rename the temporary paths back to their old names.
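
Concretely, steps (b) and (c) above amount to something like the following
sketch (mount point and file names are placeholders; the VMs using these
images should be shut down while the copy runs):

  cd /mnt/dis-rep/images                    # FUSE mount of the volume
  for f in *.img; do
      cp --sparse=always "$f" "$f.tmp"      # the new copy is written as shards
      mv "$f.tmp" "$f"                      # rename back over the original name
  done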
> 
> HTH,
> Krutika
> 
> On Thu, Mar 17, 2016 at 9:51 PM, André Bauer wrote:
> 
> Hi List,
> 
> i just upgraded from 3.5.8 to 3.7.8 and want to convert my existing VM
> Images volume to a shard volume now:
> 
> gluster volume set dis-rep features.shard on
> gluster volume set dis-rep features.shard-block-size 16MB
> 
> How are the existing image files handled?
> Do i need to start rebalance to convert existing files?
> 
> Or is it better to start with an empty volume?
> If so, why?
> 
> 
> --
> Regards
> André
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org 
> http://www.gluster.org/mailman/listinfo/gluster-users
> 
> 


-- 
Mit freundlichen Grüßen
André Bauer

MAGIX Software GmbH
André Bauer
Administrator
August-Bebel-Straße 48
01219 Dresden
GERMANY

tel.: 0351 41884875
e-mail: aba...@magix.net
aba...@magix.net 
www.magix.com 

Geschäftsführer | Managing Directors: Dr. Arnd Schröder, Klaus Schmidt
Amtsgericht | Commercial Register: Berlin Charlottenburg, HRB 127205

Find us on:

 
 
--
The information in this email is intended only for the addressee named
above. Access to this email by anyone else is unauthorized. If you are
not the intended recipient of this message any disclosure, copying,
distribution or any action taken in reliance on it is prohibited and
may be unlawful. MAGIX does not warrant that any attachments are free
from viruses or other defects and accepts no liability for any losses
resulting from infected email transmissions. Please note that any
views expressed in this email may be those of the originator and do
not necessarily represent the agenda of the company.
--
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] REMINDER: Gluster Community Bug Triage meeting at 12:00 UTC

2016-03-22 Thread Saravanakumar Arumugam

Hi,

This meeting is scheduled for anyone that is interested in learning more
about, or assisting with the Bug Triage.

Meeting details:
- location: #gluster-meeting on Freenode IRC
(https://webchat.freenode.net/?channels=gluster-meeting )
- date: every Tuesday
- time: 12:00 UTC
 (in your terminal, run: date -d "12:00 UTC")
- agenda: https://public.pad.fsfe.org/p/gluster-bug-triage

Currently the following items are listed:
* Roll Call
* Status of last weeks action items
* Group Triage
* Open Floor

The last two topics have space for additions. If you have a suitable bug
or topic to discuss, please add it to the agenda.

Appreciate your participation.

Thanks,
Saravana
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] question about gluster volume heal info split-brain

2016-03-22 Thread songxin
Hi,
I have a question about heal info split-brain.
I know that a gfid mismatch is a kind of split-brain and that the parent directory 
should be shown as being in split-brain.
In my case "gluster volume heal info split-brain" shows that no file is in 
split-brain, even though the same filename has a different gfid on the two bricks of a 
replicate volume.
And accessing the file gives an Input/output error.




precondition:
2.A node ip:10.32.0.48;
3.B node ip:10.32.1.144
4.A brick: /opt/lvmdir/c2/brick on A node
5.B brick: /opt/lvmdir/c2/brick on B node


reproduce:
1. create a replicate volume using the two bricks, A brick and B brick   (on A node)
2. start the volume                                                      (on A node)
3. mount the mount point (/mnt/c) on the volume                          (on A node)
4. mount the mount point (/mnt/c) on the volume                          (on B node)
5. access the mount point                                                (on A node and B node)
6. reboot B node
7. start glusterd                                                        (on B node)
8. remove B brick from the replicate volume                              (on A node)
9. peer detach 10.32.1.144                                               (on A node)
10. peer probe 10.32.1.144                                               (on A node)
11. add B brick to the volume                                            (on A node)
12. after some time, go back to step 6 (reboot B node)
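
One way to compare the gfid of the same path on both bricks in one shot
(assuming root ssh access between the nodes; paths and addresses as above):

  for h in 10.32.0.48 10.32.1.144; do
      echo "== $h =="
      ssh root@$h getfattr -n trusted.gfid -e hex \
          /opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
  done

If the two trusted.gfid values differ, as in the outputs below, the file has
mismatching gfids on the two bricks.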






logs on A node:


stat: cannot stat '/mnt/c/public_html/cello/ior_files/nameroot.ior': 
Input/output error


getfattr -d -m . -e hex 
opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
# file: opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
trusted.afr.dirty=0x
trusted.bit-rot.version=0x000256e812da0007bf13
trusted.gfid=0xc18f775d94de42879235d1331d85c860


getfattr -d -m . -e hex opt/lvmdir/c2/brick/public_html/cello/ior_files
# file: opt/lvmdir/c2/brick/public_html/cello/ior_files
trusted.afr.c_glusterfs-client-1=0x
trusted.afr.c_glusterfs-client-207=0x0002
trusted.afr.c_glusterfs-client-209=0x
trusted.afr.c_glusterfs-client-215=0x
trusted.afr.c_glusterfs-client-39=0x
trusted.afr.c_glusterfs-client-47=0x
trusted.afr.c_glusterfs-client-49=0x0002
trusted.afr.c_glusterfs-client-51=0x
trusted.afr.dirty=0x
trusted.gfid=0xd9cd3be03fa44d1e8a8da8523535ef0a
trusted.glusterfs.dht=0x0001




logs on B node:
stat: cannot stat '/mnt/c/public_html/cello/ior_files/nameroot.ior': 
Input/output error


getfattr -d -m . -e hex 
opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior 
# file: opt/lvmdir/c2/brick/public_html/cello/ior_files/nameroot.ior
trusted.bit-rot.version=0x000256e813c50008b4e2
trusted.gfid=0x32145e0378864767989335f37c108409


getfattr -d -m . -e hex opt/lvmdir/c2/brick/public_html/cello/ior_files 
# file: opt/lvmdir/c2/brick/public_html/cello/ior_files
trusted.afr.c_glusterfs-client-112=0x
trusted.afr.c_glusterfs-client-116=0x
trusted.afr.c_glusterfs-client-128=0x
trusted.afr.c_glusterfs-client-130=0x
trusted.afr.c_glusterfs-client-150=0x
trusted.afr.c_glusterfs-client-164=0x
trusted.afr.c_glusterfs-client-166=0x
trusted.afr.c_glusterfs-client-194=0x
trusted.afr.c_glusterfs-client-196=0x
trusted.afr.c_glusterfs-client-200=0x
trusted.afr.c_glusterfs-client-224=0x
trusted.afr.c_glusterfs-client-26=0x
trusted.afr.c_glusterfs-client-36=0x
trusted.afr.c_glusterfs-client-38=0x
trusted.afr.c_glusterfs-client-40=0x
trusted.afr.c_glusterfs-client-50=0x
trusted.afr.c_glusterfs-client-54=0x
trusted.afr.c_glusterfs-client-58=0x0002
trusted.afr.c_glusterfs-client-64=0x
trusted.afr.c_glusterfs-client-66=0x
trusted.afr.c_glusterfs-client-70=0x
trusted.afr.c_glusterfs-client-76=0x
trusted.afr.c_glusterfs-client-84=0x
trusted.afr.c_glusterfs-client-90=0x
trusted.afr.c_glusterfs-client-98=0x
trusted.afr.dirty=0x
trusted.gfid=0xd9cd3be03fa44d1e8a8da8523535ef0a
tru

Re: [Gluster-users] GlusterFS cluster peer stuck in state: Sent and Received peer request (Connected)

2016-03-22 Thread Atin Mukherjee
Gaurav is looking into it and he will get back with his analysis.

~Atin

On 03/22/2016 02:42 PM, tommy.yard...@baesystems.com wrote:
> Hi,
> 
> Is anyone able to help with this issue?
> 
> Thanks,
> Tommy
> 
> -Original Message-
> From: gluster-users-boun...@gluster.org 
> [mailto:gluster-users-boun...@gluster.org] On Behalf Of 
> tommy.yard...@baesystems.com
> Sent: 17 March 2016 08:49
> To: gluster-users@gluster.org
> Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state: Sent and 
> Received peer request (Connected)
> 
> Hi,
> 
> Sorry I had sent them directly to Atin
> 
> I've trimmed down the larger log files a bit and attached all of them to this 
> email.
> 
> Many thanks,
> Tommy
> 
> -Original Message-
> From: Gaurav Garg [mailto:gg...@redhat.com]
> Sent: 17 March 2016 07:07
> To: Yardley, Tommy (UK Guildford)
> Cc: gluster-users@gluster.org
> Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state: Sent and 
> Received peer request (Connected)
> 
 I’ve sent the logs directly as they push this message over the size limit.
> 
> Where have you sent the logs? I could not find them. Could you send the 
> glusterd logs so that we can start analyzing this issue.
> 
> Thanks,
> 
> Regards,
> Gaurav
> 
> - Original Message -
> From: "Atin Mukherjee" 
> To: "tommy yardley" , gluster-users@gluster.org
> Sent: Wednesday, March 16, 2016 5:49:05 PM
> Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state: Sent and 
> Received peer request (Connected)
> 
> I couldn't look into this today, sorry about that. I can only look into this 
> case on Monday. Anyone else to take this up?
> 
> ~Atin
> 
> On 03/15/2016 09:57 PM, tommy.yard...@baesystems.com wrote:
>> Hi Atin,
>>
>>
>>
>> All nodes are running 3.5.8 – the probe sequence is:
>> 172.31.30.64
>>
>> 172.31.27.27 (node having issue)
>>
>> 172.31.26.134 (node the peer probe is ran on)
>>
>> 172.31.19.46
>>
>>
>>
>> I’ve sent the logs directly as they push this message over the size limit.
>>
>>
>>
>> look forward to your reply,
>>
>>
>>
>> Tommy
>>
>>
>>
>>
>>
>> *From:*Atin Mukherjee [mailto:atin.mukherje...@gmail.com]
>> *Sent:* 15 March 2016 15:58
>> *To:* Yardley, Tommy (UK Guildford)
>> *Cc:* gluster-users@gluster.org
>> *Subject:* Re: [Gluster-users] GlusterFS cluster peer stuck in state:
>> Sent and Received peer request (Connected)
>>
>>
>>
>> This indicates the peer handshaking didn't go through properly and
>> your cluster is messed up. Are you running 3.5.8 version in all the nodes?
>> Could you get me the glusterd log from all the nodes and mention the
>> peer probe sequence? I'd be able to look at it tomorrow only and get back.
>>
>> -Atin
>> Sent from one plus one
>>
>> On 15-Mar-2016 9:16 pm, "tommy.yard...@baesystems.com" wrote:
>>
>> Hi All,
>>
>>
>>
>> I’m running GlusterFS on a cluster hosted in AWS. I have a script
>> which provisions my instances and thus will set up GlusterFS (specifically:
>> glusterfs 3.5.8).
>>
>> My issue is that this only works ~50% of the time and the other 50% of
>> the time one of the peers will be ‘stuck’ in the following state:
>>
>> /root@ip-xx-xx-xx-1:/home/ubuntu# gluster peer status/
>>
>> /Number of Peers: 3/
>>
>> / /
>>
>> /Hostname: xx.xx.xx.2/
>>
>> /Uuid: 3b4c1fb9-b325-4204-98fd-2eb739fa867f/
>>
>> /State: Peer in Cluster (Connected)/
>>
>> / /
>>
>> /Hostname: xx.xx.xx.3/
>>
>> /Uuid: acfc1794-9080-4eb0-8f69-3abe78bbee16/
>>
>> /State: Sent and Received peer request (Connected)/
>>
>> / /
>>
>> /Hostname: xx.xx.xx.4/
>>
>> /Uuid: af33463d-1b32-4ffb-a4f0-46ce16151e2f/
>>
>> /State: Peer in Cluster (Connected)/
>>
>>
>>
>> Running gluster peer status on the instance that is affected yields:
>>
>>
>>
>> /root@ip-xx-xx-xx-3:/var/log/glusterfs# gluster peer status Number of
>> Peers: 1/
>>
>> / /
>>
>> /Hostname: xx.xx.xx.1/
>>
>> /Uuid: c4f17e9a-893b-48f0-a014-1a05cca09d01/
>>
>> /State: Peer is connected and Accepted (Connected)/
>>
>> / /
>>
>> Of which the status (Connected) in this case, will fluctuate between
>> ‘Connected’ and ‘Disconnected’.
>>
>>
>>
>> I have been unable to locate the cause of this issue. Has this been
>> encountered before, and if so is there a general fix? I haven’t been
>> able to find anything as of yet.
>>
>>
>>
>> Many thanks,
>>
>>
>>
>> *Tommy*
>>
>>
>>
>> Please consider the environment before printing this email. This
>> message should be regarded as confidential. If you have received this
>> email in error please notify the sender and destroy it immediately.
>> Statements of intent shall only become binding when confirmed in hard
>> copy by an authorised signatory. The contents of this email may relate
>> to dealings with other companies under the control of BAE Systems
>> Applied Intelligence Limited, details of which can be found at
>> http://www.baesystems.com/Businesses/index.htm.
>>
>>
>> ___

Re: [Gluster-users] GlusterFS cluster peer stuck in state: Sent and Received peer request (Connected)

2016-03-22 Thread tommy.yard...@baesystems.com
Hi,

Is anyone able to help with this issue?

Thanks,
Tommy

-Original Message-
From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of 
tommy.yard...@baesystems.com
Sent: 17 March 2016 08:49
To: gluster-users@gluster.org
Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state: Sent and 
Received peer request (Connected)

Hi,

Sorry I had sent them directly to Atin

I've trimmed down the larger log files a bit and attached all of them to this 
email.

Many thanks,
Tommy

-Original Message-
From: Gaurav Garg [mailto:gg...@redhat.com]
Sent: 17 March 2016 07:07
To: Yardley, Tommy (UK Guildford)
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state: Sent and 
Received peer request (Connected)

>> > I’ve sent the logs directly as they push this message over the size limit.

Where have you sent the logs? I could not find them. Could you send the 
glusterd logs so that we can start analyzing this issue.

Thanks,

Regards,
Gaurav

- Original Message -
From: "Atin Mukherjee" 
To: "tommy yardley" , gluster-users@gluster.org
Sent: Wednesday, March 16, 2016 5:49:05 PM
Subject: Re: [Gluster-users] GlusterFS cluster peer stuck in state: Sent and 
Received peer request (Connected)

I couldn't look into this today, sorry about that. I can only look into this 
case on Monday. Anyone else to take this up?

~Atin

On 03/15/2016 09:57 PM, tommy.yard...@baesystems.com wrote:
> Hi Atin,
>
>
>
> All nodes are running 3.5.8 – the probe sequence is:
> 172.31.30.64
>
> 172.31.27.27 (node having issue)
>
> 172.31.26.134 (node the peer probe is ran on)
>
> 172.31.19.46
>
>
>
> I’ve sent the logs directly as they push this message over the size limit.
>
>
>
> look forward to your reply,
>
>
>
> Tommy
>
>
>
>
>
> *From:*Atin Mukherjee [mailto:atin.mukherje...@gmail.com]
> *Sent:* 15 March 2016 15:58
> *To:* Yardley, Tommy (UK Guildford)
> *Cc:* gluster-users@gluster.org
> *Subject:* Re: [Gluster-users] GlusterFS cluster peer stuck in state:
> Sent and Received peer request (Connected)
>
>
>
> This indicates the peer handshaking didn't go through properly and
> your cluster is messed up. Are you running 3.5.8 version in all the nodes?
> Could you get me the glusterd log from all the nodes and mention the
> peer probe sequence? I'd be able to look at it tomorrow only and get back.
>
> -Atin
> Sent from one plus one
>
> On 15-Mar-2016 9:16 pm, "tommy.yard...@baesystems.com" wrote:
>
> Hi All,
>
>
>
> I’m running GlusterFS on a cluster hosted in AWS. I have a script
> which provisions my instances and thus will set up GlusterFS (specifically:
> glusterfs 3.5.8).
>
> My issue is that this only works ~50% of the time and the other 50% of
> the time one of the peers will be ‘stuck’ in the following state:
>
> /root@ip-xx-xx-xx-1:/home/ubuntu# gluster peer status/
>
> /Number of Peers: 3/
>
> / /
>
> /Hostname: xx.xx.xx.2/
>
> /Uuid: 3b4c1fb9-b325-4204-98fd-2eb739fa867f/
>
> /State: Peer in Cluster (Connected)/
>
> / /
>
> /Hostname: xx.xx.xx.3/
>
> /Uuid: acfc1794-9080-4eb0-8f69-3abe78bbee16/
>
> /State: Sent and Received peer request (Connected)/
>
> / /
>
> /Hostname: xx.xx.xx.4/
>
> /Uuid: af33463d-1b32-4ffb-a4f0-46ce16151e2f/
>
> /State: Peer in Cluster (Connected)/
>
>
>
> Running gluster peer status on the instance that is affected yields:
>
>
>
> /root@ip-xx-xx-xx-3:/var/log/glusterfs# gluster peer status Number of
> Peers: 1/
>
> / /
>
> /Hostname: xx.xx.xx.1/
>
> /Uuid: c4f17e9a-893b-48f0-a014-1a05cca09d01/
>
> /State: Peer is connected and Accepted (Connected)/
>
> / /
>
> Of which the status (Connected) in this case, will fluctuate between
> ‘Connected’ and ‘Disconnected’.
>
>
>
> I have been unable to locate the cause of this issue. Has this been
> encountered before, and if so is there a general fix? I haven’t been
> able to find anything as of yet.
>
>
>
> Many thanks,
>
>
>
> *Tommy*
>
>
>
> Please consider the environment before printing this email. This
> message should be regarded as confidential. If you have received this
> email in error please notify the sender and destroy it immediately.
> Statements of intent shall only become binding when confirmed in hard
> copy by an authorised signatory. The contents of this email may relate
> to dealings with other companies under the control of BAE Systems
> Applied Intelligence Limited, details of which can be found at
> http://www.baesystems.com/Businesses/index.htm.
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org 
> http://www.gluster.org/mailman/listinfo/gluster-users
>
> Please consider the environment before printing this email. This
> message should be regarded as confidential. If you have received this
> email in error please notify the sender and destroy it immediately.
> Statements of 

Re: [Gluster-users] GlusterFS 3.7.9 released

2016-03-22 Thread Kaushal M
On Tue, Mar 22, 2016 at 12:10 PM, Kaleb KEITHLEY  wrote:
> On 03/22/2016 11:55 AM, ML mail wrote:
>>
>> And a thank you from me too for this release, I am looking forward to a
>> working geo-replication...
>>
>> btw: where can I find the changelog for this release? I always somehow
>> forget where it is located.
>>
>
> Footnote [3], below, has the URL of the patch that will become the release
> notes.

Just merged the change. The release notes are available at
https://github.com/gluster/glusterfs/blob/release-3.7/doc/release-notes/3.7.9.md
now.

>
>
>>
>>
>>
>> On Tuesday, March 22, 2016 4:19 AM, Vijay Bellur 
>> wrote:
>> Hi all,
>>
>> GlusterFS 3.7.9 has been released and the tarball can be found at [1].
>> Release notes will appear at [2] once the patch [3] gets merged into the
>> repository.
>>
>> Fedora-22, EPEL-[567], and Debian {Jessie,Stretch} packages are on
>> download.gluster.org
>> (wheezy coming soon).
>>
>> Packages are in Ubuntu Launchpad for Trusty and Wily.
>>
>> Packages are in SuSE Build System for Leap42.1, OpenSuSE, and SLES-12.
>>
>> Packages for Fedora 23 are queued for testing, and packages for Fedora
>> {24,25} are live.
>>
>> Appreciate your feedback about this release as ever.
>>
>> Thanks,
>> Vijay
>>
>> [1]
>> https://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.9/glusterfs-3.7.9.tar.gz
>>
>> [2]
>> https://github.com/gluster/glusterfs/blob/release-3.7/doc/release-notes/3.7.9.md
>>
>> [3] http://review.gluster.org/13802
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Peer probe succeeded, but "not in 'Peer in Cluster' state"

2016-03-22 Thread Joshua J. Kugler
On Sunday, March 20, 2016 14:44:18 Atin Mukherjee wrote:
> What does gluster peer status output say?

>From the node that failed:

[root@vmserver-b ~]# gluster peer status
Number of Peers: 1

Hostname: 192.168.122.10
Uuid: d8e1d7a0-077a-4a50-93f6-d3922e3b96b9
State: Accepted peer request (Connected)

> Also send the glusterd log of the node where the commands have failed.

The two logs are attached.  The peer status says connected. The log file says 
"FAILED : Host 192.168.122.10 is not in 'Peer in Cluster' state"  I'm 
confused. :)

Thanks for your help on this!

j

-- 
Joshua J. Kugler - Fairbanks, Alaska
Azariah Enterprises - Programming and Website Design
jos...@azariah.com - Jabber: pedah...@gmail.com
PGP Key: http://pgp.mit.edu/  ID 0x73B13B6A

[2016-03-22 05:26:40.008570] I [cli-cmd-volume.c:1804:cli_check_gsync_present] 0-: geo-replication not installed
[2016-03-22 05:26:40.008867] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:26:40.009394] I [input.c:36:cli_batch] 0-: Exiting with: 0
[2016-03-22 05:26:40.016177] I [cli-cmd-volume.c:1804:cli_check_gsync_present] 0-: geo-replication not installed
[2016-03-22 05:26:40.016484] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:26:40.017430] I [cli-rpc-ops.c:588:gf_cli_get_volume_cbk] 0-cli: Received resp to get vol: 0
[2016-03-22 05:26:40.017541] I [input.c:36:cli_batch] 0-: Exiting with: 0
[2016-03-22 05:26:40.027743] I [cli-cmd-volume.c:1804:cli_check_gsync_present] 0-: geo-replication not installed
[2016-03-22 05:26:40.028861] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:26:43.027142] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:26:46.027459] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:26:49.027722] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:26:52.028026] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:26:55.028325] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:26:58.028535] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:01.028799] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:04.029077] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:07.029260] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:10.031382] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:13.031663] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:16.031968] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:19.032261] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:22.032594] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:25.032751] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:28.033138] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:31.033366] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:34.033645] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:37.033880] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:40.034188] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:43.034510] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:43.370555] I [cli-rpc-ops.c:131:gf_cli_probe_cbk] 0-cli: Received resp to probe
[2016-03-22 05:27:43.370659] I [input.c:36:cli_batch] 0-: Exiting with: 0
[2016-03-22 05:27:43.922428] I [cli-cmd-volume.c:1804:cli_check_gsync_present] 0-: geo-replication not installed
[2016-03-22 05:27:43.922737] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:43.972665] I [cli-rpc-ops.c:131:gf_cli_probe_cbk] 0-cli: Received resp to probe
[2016-03-22 05:27:43.972731] I [input.c:36:cli_batch] 0-: Exiting with: 0
[2016-03-22 05:27:43.985455] I [cli-cmd-volume.c:1804:cli_check_gsync_present] 0-: geo-replication not installed
[2016-03-22 05:27:43.985690] I [socket.c:2353:socket_event_handler] 0-transport: disconnecting now
[2016-03-22 05:27:43.994157] I [cli-cmd-volume.c:402:cli_cmd_volume_create_cbk] 0-cli: Replicate cluster type found. Checking brick order.
[2016-03-22 05:27:43.994315] I [cli-cmd-volume.c:309:cli_cmd_check_brick_order] 0-cli: Brick order okay
[2016-03-22 05:27:44.002605] I [cli-rpc-ops.c:892:gf_cli_create_volume_cbk] 0-cli: Received resp to create volume
[2016-03-22 05:27:44.002843] I [input.c:36:cli_batch] 0-: Exiting with: -1
[2016-03-22 05:27:44.712793] I [cli-cmd-volume.c:1804:cli_check_gsync

Re: [Gluster-users] Impact of force option in remove-brick

2016-03-22 Thread ABHISHEK PALIWAL
Ok thanks.

On Tue, Mar 22, 2016 at 12:45 PM, Atin Mukherjee 
wrote:

>
>
> On 03/22/2016 12:42 PM, ABHISHEK PALIWAL wrote:
> >
> >
> > On Tue, Mar 22, 2016 at 12:38 PM, Atin Mukherjee  > > wrote:
> >
> >
> >
> > On 03/22/2016 12:23 PM, ABHISHEK PALIWAL wrote:
> > >
> > > On Tue, Mar 22, 2016 at 12:14 PM, Gaurav Garg  
> > > >> wrote:
> > >
> > > >> I just want to know what is the difference in the following
> > scenario:
> > >
> > > 1. remove-brick without the force option
> > > 2. remove-brick with the force option
> > >
> > >
> > > remove-brick without force option will perform task based on
> your
> > > option,
> > > for eg. remove-brick start option will start migration of file
> > from
> > > given
> > > remove-brick to other available bricks in the cluster. you can
> > check
> > > status
> > > of this remove-brick task by issuing remove-brick status
> command.
> > >
> > > But remove-brick with force option will just forcefully remove
> > brick
> > > from the cluster.
> > > It will result in data loss in case of distributed volume,
> because
> > > it will not migrate file
> > > from given remove-brick to other available bricks in the
> > cluster. In
> > > case of replicate volume
> > > you might not have problem by doing remove-brick force because
> > later
> > > on after adding brick you
> > > can issue heal command and migrate file from first replica set
> to
> > > this newly added brick.\
> > >
> > >
> > > so when you are saying the forcefully remove the brick means it
> will
> > > remove the brick even when
> > > that brick is not available or available but have the different
> > uuid of
> > > peers, without generating any error?
> > As I mentioned earlier it doesn't make sense to differentiate between
> > these two behaviors until the UUID mismatch issue is resolved.
> >
> >
> > Yes, I agree. but how we can resolve that uuid mismatch issue is there
> > any way for the same in running system?
> I've explained the case why you are running into multiple UUIDs here [1].
>
> [1] http://www.gluster.org/pipermail/gluster-users/2016-March/025912.html
> >
> > >
> > >
> > > Thanks,
> > >
> > > ~Gaurav
> > >
> > > - Original Message -
> > > From: "ABHISHEK PALIWAL"  abhishpali...@gmail.com>
> > > >>
> > > To: gluster-users@gluster.org  gluster-users@gluster.org>
> >  >>,
> > > gluster-de...@gluster.org 
> >  >>
> > > Sent: Tuesday, March 22, 2016 11:35:52 AM
> > > Subject: [Gluster-users] Impact of force option in remove-brick
> > >
> > > Hi Team,
> > >
> > > I have the following scenario:
> > >
> > > 1. I have one replica 2 volume in which two brick are
> available.
> > > 2. in such permutation and combination I got the UUID of peers
> mismatch.
> > > 3. Because of UUID mismatch when I tried to remove brick on the
> > > second board I am getting the Incorrect Brick failure.
> > >
> > > Now, I have the question if I am using the remove-brick
> command with
> > > the 'force' option it means it should remove the brick in any
> > > situation either the brick is available or its UUID is
> mismatch.
> > >
> > > I just want to know what is the difference in the following
> scenario:
> > >
> > > 1. remove-brick without the force option
> > > 2. remove-brick with the force option
> > >
> > >
> > > Regards
> > > Abhishek
> > >
> > > ___
> > > Gluster-users mailing list
> > > Gluster-users@gluster.org 
> >  >>
> > > http://www.gluster.org/mailman/listinfo/gluster-users
> > >
> > >
> > >
> > >
> > > --
> > >
> > >
> > >
> > >
> > > Regards
> > > Abhishek Paliwal
> > >
> > >
> > > ___
> > > Gluster-users mailing list
> > > Gluster-users@gluster.org 
> > > http://www.gluster.org/mailman/listinfo/gluster-users
> > >
> >
> >
> >
> >
> > --
> >
> >
> >
> >
> > Regards
> > Abhishek Paliwal
>



-- 




Regards
Abhishek Paliwal
___

Re: [Gluster-users] Impact of force option in remove-brick

2016-03-22 Thread Atin Mukherjee


On 03/22/2016 12:42 PM, ABHISHEK PALIWAL wrote:
> 
> 
> On Tue, Mar 22, 2016 at 12:38 PM, Atin Mukherjee  > wrote:
> 
> 
> 
> On 03/22/2016 12:23 PM, ABHISHEK PALIWAL wrote:
> >
> > On Tue, Mar 22, 2016 at 12:14 PM, Gaurav Garg  
> > >> wrote:
> >
> > >> I just want to know what is the difference in the following
> scenario:
> >
> > 1. remove-brick without the force option
> > 2. remove-brick with the force option
> >
> >
> > remove-brick without force option will perform task based on your
> > option,
> > for eg. remove-brick start option will start migration of file
> from
> > given
> > remove-brick to other available bricks in the cluster. you can
> check
> > status
> > of this remove-brick task by issuing remove-brick status command.
> >
> > But remove-brick with force option will just forcefully remove
> brick
> > from the cluster.
> > It will result in data loss in case of distributed volume, because
> > it will not migrate file
> > from given remove-brick to other available bricks in the
> cluster. In
> > case of replicate volume
> > you might not have problem by doing remove-brick force because
> later
> > on after adding brick you
> > can issue heal command and migrate file from first replica set to
> > this newly added brick.\
> >
> >
> > so when you are saying the forcefully remove the brick means it will
> > remove the brick even when
> > that brick is not available or available but have the different
> uuid of
> > peers, without generating any error?
> As I mentioned earlier it doesn't make sense to differentiate between
> these two behaviors until the UUID mismatch issue is resolved.
> 
> 
> Yes, I agree. but how we can resolve that uuid mismatch issue is there
> any way for the same in running system?
I've explained why you are running into multiple UUIDs here [1].

[1] http://www.gluster.org/pipermail/gluster-users/2016-March/025912.html
> 
> >
> >
> > Thanks,
> >
> > ~Gaurav
> >
> > - Original Message -
> > From: "ABHISHEK PALIWAL"  
> > >>
> > To: gluster-users@gluster.org 
> >,
> > gluster-de...@gluster.org 
> >
> > Sent: Tuesday, March 22, 2016 11:35:52 AM
> > Subject: [Gluster-users] Impact of force option in remove-brick
> >
> > Hi Team,
> >
> > I have the following scenario:
> >
> > 1. I have one replica 2 volume in which two brick are available.
> > 2. in such permutation and combination I got the UUID of peers 
> mismatch.
> > 3. Because of UUID mismatch when I tried to remove brick on the
> > second board I am getting the Incorrect Brick failure.
> >
> > Now, I have the question if I am using the remove-brick command with
> > the 'force' option it means it should remove the brick in any
> > situation either the brick is available or its UUID is mismatch.
> >
> > I just want to know what is the difference in the following 
> scenario:
> >
> > 1. remove-brick without the force option
> > 2. remove-brick with the force option
> >
> >
> > Regards
> > Abhishek
> >
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org 
> >
> > http://www.gluster.org/mailman/listinfo/gluster-users
> >
> >
> >
> >
> > --
> >
> >
> >
> >
> > Regards
> > Abhishek Paliwal
> >
> >
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org 
> > http://www.gluster.org/mailman/listinfo/gluster-users
> >
> 
> 
> 
> 
> -- 
> 
> 
> 
> 
> Regards
> Abhishek Paliwal
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Impact of force option in remove-brick

2016-03-22 Thread ABHISHEK PALIWAL
On Tue, Mar 22, 2016 at 12:38 PM, Atin Mukherjee 
wrote:

>
>
> On 03/22/2016 12:23 PM, ABHISHEK PALIWAL wrote:
> >
> > On Tue, Mar 22, 2016 at 12:14 PM, Gaurav Garg  > > wrote:
> >
> > >> I just want to know what is the difference in the following
> scenario:
> >
> > 1. remove-brick without the force option
> > 2. remove-brick with the force option
> >
> >
> > remove-brick without force option will perform task based on your
> > option,
> > for eg. remove-brick start option will start migration of file from
> > given
> > remove-brick to other available bricks in the cluster. you can check
> > status
> > of this remove-brick task by issuing remove-brick status command.
> >
> > But remove-brick with force option will just forcefully remove brick
> > from the cluster.
> > It will result in data loss in case of distributed volume, because
> > it will not migrate file
> > from given remove-brick to other available bricks in the cluster. In
> > case of replicate volume
> > you might not have problem by doing remove-brick force because later
> > on after adding brick you
> > can issue heal command and migrate file from first replica set to
> > this newly added brick.\
> >
> >
> > so when you are saying the forcefully remove the brick means it will
> > remove the brick even when
> > that brick is not available or available but have the different uuid of
> > peers, without generating any error?
> As I mentioned earlier it doesn't make sense to differentiate between
> these two behaviors until the UUID mismatch issue is resolved.
>

Yes, I agree. But how can we resolve that UUID mismatch issue? Is there any
way to do it on a running system?

> >
> >
> > Thanks,
> >
> > ~Gaurav
> >
> > - Original Message -
> > From: "ABHISHEK PALIWAL"  > >
> > To: gluster-users@gluster.org ,
> > gluster-de...@gluster.org 
> > Sent: Tuesday, March 22, 2016 11:35:52 AM
> > Subject: [Gluster-users] Impact of force option in remove-brick
> >
> > Hi Team,
> >
> > I have the following scenario:
> >
> > 1. I have one replica 2 volume in which two brick are available.
> > 2. in such permutation and combination I got the UUID of peers
> mismatch.
> > 3. Because of UUID mismatch when I tried to remove brick on the
> > second board I am getting the Incorrect Brick failure.
> >
> > Now, I have the question if I am using the remove-brick command with
> > the 'force' option it means it should remove the brick in any
> > situation either the brick is available or its UUID is mismatch.
> >
> > I just want to know what is the difference in the following scenario:
> >
> > 1. remove-brick without the force option
> > 2. remove-brick with the force option
> >
> >
> > Regards
> > Abhishek
> >
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org 
> > http://www.gluster.org/mailman/listinfo/gluster-users
> >
> >
> >
> >
> > --
> >
> >
> >
> >
> > Regards
> > Abhishek Paliwal
> >
> >
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users
> >
>



-- 




Regards
Abhishek Paliwal
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Impact of force option in remove-brick

2016-03-22 Thread Atin Mukherjee


On 03/22/2016 12:23 PM, ABHISHEK PALIWAL wrote:
> 
> On Tue, Mar 22, 2016 at 12:14 PM, Gaurav Garg  > wrote:
> 
> >> I just want to know what is the difference in the following scenario:
> 
> 1. remove-brick without the force option
> 2. remove-brick with the force option
> 
> 
> remove-brick without force option will perform task based on your
> option,
> for eg. remove-brick start option will start migration of file from
> given
> remove-brick to other available bricks in the cluster. you can check
> status
> of this remove-brick task by issuing remove-brick status command.
> 
> But remove-brick with force option will just forcefully remove brick
> from the cluster.
> It will result in data loss in case of distributed volume, because
> it will not migrate file
> from given remove-brick to other available bricks in the cluster. In
> case of replicate volume
> you might not have problem by doing remove-brick force because later
> on after adding brick you
> can issue heal command and migrate file from first replica set to
> this newly added brick.\
> 
> 
> so when you are saying the forcefully remove the brick means it will
> remove the brick even when
> that brick is not available or available but have the different uuid of
> peers, without generating any error?
As I mentioned earlier it doesn't make sense to differentiate between
these two behaviors until the UUID mismatch issue is resolved.
> 
> 
> Thanks,
> 
> ~Gaurav
> 
> - Original Message -
> From: "ABHISHEK PALIWAL"  >
> To: gluster-users@gluster.org ,
> gluster-de...@gluster.org 
> Sent: Tuesday, March 22, 2016 11:35:52 AM
> Subject: [Gluster-users] Impact of force option in remove-brick
> 
> Hi Team,
> 
> I have the following scenario:
> 
> 1. I have one replica 2 volume in which two bricks are available.
> 2. In one such permutation and combination the peer UUIDs ended up
> mismatched.
> 3. Because of the UUID mismatch, when I tried to remove the brick on the
> second board I got an "Incorrect Brick" failure.
> 
> Now, my question is: if I use the remove-brick command with the 'force'
> option, does that mean it should remove the brick in any situation,
> whether the brick is available or its peer UUID is mismatched?
> 
> I just want to know what the difference is between the following scenarios:
> 
> 1. remove-brick without the force option
> 2. remove-brick with the force option
> 
> 
> Regards
> Abhishek
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org 
> http://www.gluster.org/mailman/listinfo/gluster-users
> 
> 
> 
> 
> -- 
> 
> 
> 
> 
> Regards
> Abhishek Paliwal
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Impact of force option in remove-brick

2016-03-22 Thread Gaurav Garg
comment inline.

- Original Message -
From: "ABHISHEK PALIWAL" 
To: "Gaurav Garg" 
Cc: gluster-users@gluster.org, gluster-de...@gluster.org
Sent: Tuesday, March 22, 2016 12:23:25 PM
Subject: Re: [Gluster-users] Impact of force option in remove-brick

On Tue, Mar 22, 2016 at 12:14 PM, Gaurav Garg  wrote:

> >> I just want to know what is the difference in the following scenario:
>
> 1. remove-brick without the force option
> 2. remove-brick with the force option
>
>
> remove-brick without the force option performs the task according to the
> sub-command you give; for example, remove-brick start will start
> migrating files from the brick being removed to the other available
> bricks in the cluster. You can check the status of this remove-brick
> task by issuing the remove-brick status command.
> 
> But remove-brick with the force option will just forcefully remove the
> brick from the cluster. It will result in data loss in the case of a
> distributed volume, because it will not migrate files from the removed
> brick to the other available bricks in the cluster. In the case of a
> replicate volume you might not have a problem with remove-brick force,
> because later on, after adding a brick back, you can issue the heal
> command and migrate files from the first replica set to the newly
> added brick.
>

So when you say forcefully remove the brick, do you mean it will remove
the brick even when that brick is not available, or is available but the
peer UUIDs are mismatched, without generating any error?


Yes, it will remove the brick even when the brick is not available (that
is, when the brick is hosted on a node which is down or unreachable).

I didn't get your point, though: how are you ending up with mismatched
peer UUIDs?

~Gaurav
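
As a rough sketch of the replicate case described above (volume name,
brick paths and the replica-count keywords are assumptions; adjust them
to your setup), removing one brick of a replica 2 volume by force and
later adding it back would look something like:

  # gluster volume remove-brick repvol replica 1 server2:/bricks/b1 force
  # gluster volume add-brick repvol replica 2 server2:/bricks/b1
  # gluster volume heal repvol full
  # gluster volume heal repvol info

The surviving brick still holds a full copy of the data, so the heal just
copies the files onto the re-added brick.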

>
> Thanks,
>
> ~Gaurav
>
> - Original Message -
> From: "ABHISHEK PALIWAL" 
> To: gluster-users@gluster.org, gluster-de...@gluster.org
> Sent: Tuesday, March 22, 2016 11:35:52 AM
> Subject: [Gluster-users] Impact of force option in remove-brick
>
> Hi Team,
>
> I have the following scenario:
>
> 1. I have one replica 2 volume in which two bricks are available.
> 2. In one such permutation and combination the peer UUIDs ended up
> mismatched.
> 3. Because of the UUID mismatch, when I tried to remove the brick on the
> second board I got an "Incorrect Brick" failure.
> 
> Now, my question is: if I use the remove-brick command with the 'force'
> option, does that mean it should remove the brick in any situation,
> whether the brick is available or its peer UUID is mismatched?
> 
> I just want to know what the difference is between the following scenarios:
>
> 1. remove-brick without the force option
> 2. remove-brick with the force option
>
>
> Regards
> Abhishek
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>



-- 




Regards
Abhishek Paliwal
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users