Re: [Gluster-users] Slow write times to gluster disk

2017-06-01 Thread Ben Turner
Are you sure using conv=sync is what you want?  I normally use conv=fdatasync; I'll look up the difference between the two and see if it affects your test.
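
From memory, the difference is roughly this: conv=sync pads every input block with NULs up to the
input block size, while conv=fdatasync physically writes the output file data to disk before dd
finishes. For a throughput test the two variants would look like:

  # original test: padded input blocks; the data can still be sitting in the
  # page cache when dd reports its numbers
  dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync

  # flushes the written data to disk before dd exits, so the reported
  # throughput includes the actual write-back
  dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=fdatasync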


-b

- Original Message -
> From: "Pat Haley" 
> To: "Pranith Kumar Karampuri" 
> Cc: "Ravishankar N" , gluster-users@gluster.org, 
> "Steve Postma" , "Ben
> Turner" 
> Sent: Tuesday, May 30, 2017 9:40:34 PM
> Subject: Re: [Gluster-users] Slow write times to gluster disk
> 
> 
> Hi Pranith,
> 
> The "dd" command was:
> 
>  dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
> 
> There were 2 instances where dd reported 22 seconds. The output from the
> dd tests is in
> 
> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
> 
> Pat
> 
> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
> > Pat,
> > What is the command you used? As per the following output, it
> > seems like at least one write operation took 16 seconds, which is
> > really bad.
> >   96.39    1165.10 us    89.00 us    *16487014.00 us*    393212    WRITE
> >
> >
> > On Tue, May 30, 2017 at 10:36 PM, Pat Haley  > > wrote:
> >
> >
> > Hi Pranith,
> >
> > I ran the same 'dd' test both in the gluster test volume and in
> > the .glusterfs directory of each brick.  The median results (12 dd
> > trials in each test) are similar to before
> >
> >   * gluster test volume: 586.5 MB/s
> >   * bricks (in .glusterfs): 1.4 GB/s
> >
> > The profile for the gluster test-volume is in
> >
> > 
> > http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
> > 
> > 
> >
> > Thanks
> >
> > Pat
> >
> >
> >
> >
> > On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
> >> Let's start with the same 'dd' test we were testing with, to see
> >> what the numbers are. Please provide profile numbers for the
> >> same. From there on we will start tuning the volume to see what
> >> we can do.
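> >>
> >> A minimal sketch of how to capture those numbers (assuming the volume is
> >> named test-volume and /path/on/gluster stands in for its mount point):
> >>
> >>     gluster volume profile test-volume start
> >>     dd if=/dev/zero count=4096 bs=1048576 of=/path/on/gluster/zeros.txt conv=sync
> >>     gluster volume profile test-volume info > profile_testvol_gluster.txt
> >>     gluster volume profile test-volume stop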
> >>
> >> On Tue, May 30, 2017 at 9:16 PM, Pat Haley  >> > wrote:
> >>
> >>
> >> Hi Pranith,
> >>
> >> Thanks for the tip.  We now have the gluster volume mounted
> >> under /home.  What tests do you recommend we run?
> >>
> >> Thanks
> >>
> >> Pat
> >>
> >>
> >>
> >> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
> >>>
> >>>
> >>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley  >>> > wrote:
> >>>
> >>>
> >>> Hi Pranith,
> >>>
> >>> Sorry for the delay.  I never received your reply
> >>> (but I did receive Ben Turner's follow-up to your
> >>> reply).  So we tried to create a gluster volume under
> >>> /home using different variations of
> >>>
> >>> gluster volume create test-volume
> >>> mseas-data2:/home/gbrick_test_1
> >>> mseas-data2:/home/gbrick_test_2 transport tcp
> >>>
> >>> However we keep getting errors of the form
> >>>
> >>> Wrong brick type: transport, use
> >>> <HOSTNAME>:<export-dir-abs-path>
> >>>
> >>> Any thoughts on what we're doing wrong?
> >>>
> >>>
> >>> You should give transport tcp at the beginning, I think.
> >>> Anyway, transport tcp is the default, so there is no need to
> >>> specify it; just remove those two words from the CLI.
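> >>>
> >>> For example, the same create command without those two words (just a
> >>> sketch of the command you already posted):
> >>>
> >>>     gluster volume create test-volume mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2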
> >>>
> >>>
> >>> Also, do you have a list of the tests we should be running
> >>> once we get this volume created?  Given the time-zone
> >>> difference it might help if we can run a small battery
> >>> of tests and post the results rather than test-post-new
> >>> test-post... .
> >>>
> >>>
> >>> This is the first time I am doing performance analysis for
> >>> users, as far as I remember. In our team there are separate
> >>> engineers who do these tests. Ben, who replied earlier, is one
> >>> such engineer.
> >>>
> >>> Ben,
> >>> Have any suggestions?
> >>>
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>>
> >>>
> >>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
> 
> 
>  On Thu, May 11, 2017 at 9:32 PM, Pat Haley
>  > wrote:
> 
> 
>  Hi Pranith,
> 
>  The /home partition is mounted as ext4
>  /home ext4 defaults,usrquota,grpquota   1 2
> 
>  The brick partitions are mounted as xfs
>  /mnt/brick1 xfs defaults 0 0
>  /mnt/brick2 xfs defaults 0 0
> 

Re: [Gluster-users] Heal operation detail of EC volumes

2017-06-01 Thread Serkan Çoban
>Is it possible that this matches your observations ?
Yes, that matches what I see. So 19 files are being healed in parallel by 19
SHD processes. I thought only one file was being healed at a time.
Then what is the meaning of the disperse.shd-max-threads parameter? If I
set it to 2, will each SHD heal two files at the same time?
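
For reference, I assume it would be set with something like this (v0 being a
placeholder volume name):

    gluster volume get v0 disperse.shd-max-threads
    gluster volume set v0 disperse.shd-max-threads 2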

>How many IOPS can your bricks handle?
Bricks are 7200RPM NL-SAS disks. 70-80 random IOPS max. But write
pattern seems sequential, 30-40MB bulk writes every 4-5 seconds.
This is what iostat shows.

>Do you have a test environment where we could check all this ?
Not currently, but I will in 4-5 weeks. New servers are arriving; I
will add this test to my notes.

> There's a feature that allows configuring the self-heal block size to optimize 
> these cases. The feature is available in 3.11.
I did not see this in the 3.11 release notes; what parameter name should I look for?



On Thu, Jun 1, 2017 at 10:30 AM, Xavier Hernandez  wrote:
> Hi Serkan,
>
> On 30/05/17 10:22, Serkan Çoban wrote:
>>
>> OK, I understand that the heal operation takes place on the server side. In
>> this case I should see X KB of outbound network traffic from each of 16
>> servers and 16X KB of inbound traffic to the failed brick server, right?
>> So that process will get 16 chunks, recalculate the missing chunk and
>> write it to disk.
>
>
> That should be the normal operation for a single heal.
>
>> The problem is I am not seeing that kind of traffic on the servers. In my
>> configuration (16+4 EC), all 20 servers have 7-8MB of outbound
>> traffic and none of them has more than 10MB of incoming traffic.
>> Only the heal operation is happening on the cluster right now, no client/other
>> traffic. I see a constant 7-8MB write to the healing brick's disk. So where is
>> the missing traffic?
>
>
> Not sure about your configuration, but probably you are seeing the result of
> having the SHD of each server doing heals. That would explain the network
> traffic you have.
>
> Suppose that all SHD but the one on the damaged brick are working. In this
> case 19 servers will pick 16 fragments each. This gives 19 * 16 = 304
> fragments to be requested. EC balances the reads among all available
> servers, and there's a chance (1/19) that a fragment is local to the server
> asking it. So we'll need a total of 304 - 304 / 19 = 288 network requests,
> 288 / 19 = 15.2 sent by each server.
>
> If we have a total of 288 requests, it means that each server will answer
> 288 / 19 = 15.2 requests. The net effect of all this is that each healthy
> server is sending 15.2*X bytes of data and each server is receiving 15.2*X
> bytes of data.
>
> Now we need to account for the writes to the damaged brick. We have 19
> simultaneous heals. This means that the damaged brick will receive 19*X
> bytes of data, and each healthy server will send X additional bytes of data.
>
> So:
>
> A healthy server receives 15.2*X bytes of data
> A healthy server sends 16.2*X bytes of data
> A damaged server receives 19*X bytes of data
> A damaged server sends few bytes of data (communication and synchronization
> overhead basically)
>
> As you can see, in this configuration each server has almost the same amount
> of inbound and outbound traffic. The only big difference is the damaged brick,
> which should receive a little more traffic, but should send much less.
>
> Is it possible that this matches your observations ?
>
> There's one more thing to consider here, and it's the apparent low
> throughput of self-heal. One possible thing to check is the small size and
> random behavior of the requests.
>
> Assuming that each request has a size of ~128KB / 16 = 8KB, at a rate of ~8
> MB/s the servers are processing ~1000 IOPS. Since requests are going to 19
> different files, even if each file is accessed sequentially, the real effect
> will be like random access (some read-ahead on the filesystem can improve
> reads a bit, but writes won't benefit so much).
>
> How many IOPS can your bricks handle?
>
> Do you have a test environment where we could check all this ? if possible
> it would be interesting to have only a single SHD (kill all SHD from all
> servers but one). In this situation, without client accesses, we should see
> the 16/1 ratio of reads vs writes on the network. We should also see a
> similar or even a little better speed because all reads and writes will be
> sequential, optimizing available IOPS.
>
> There's a feature to allow to configure the self-heal block size to optimize
> these cases. The feature is available on 3.11.
>
> Best regards,
>
> Xavi
>
>
>>
>> On Tue, May 30, 2017 at 10:25 AM, Ashish Pandey 
>> wrote:
>>>
>>>
>>> When we say client side heal or server side heal, we are basically talking
>>> about
>>> the side which "triggers" heal of a file.
>>>
>>> 1 - server side heal - shd scans indices and triggers heal
>>>
>>> 2 - client side heal - a fop finds that file needs heal and it triggers
>>> heal
>>> for that file.
>>>
>>> Now, what happens when 

[Gluster-users] Who's using OpenStack Cinder & Gluster? [ Was Re: [Gluster-devel] Fwd: Re: GlusterFS removal from Openstack Cinder]

2017-06-01 Thread Vijay Bellur
Joe,

Agree with you on turning this around into something more positive.

One aspect that would really help us decide on our next steps here is the
actual number of deployments that will be affected by the removal of the
gluster driver in Cinder. If you are running or aware of a deployment of
OpenStack Cinder & Gluster, can you please respond on this thread or to me
& Niels in private providing more details about your deployment? Details
like OpenStack & Gluster versions, number of Gluster nodes & total storage
capacity would be very useful to us.

Thanks!
Vijay


On Tue, May 30, 2017 at 7:22 PM, Joe Julian  wrote:

> On 05/30/2017 03:52 PM, Ric Wheeler wrote:
>
> On 05/30/2017 06:37 PM, Joe Julian wrote:
>
> On 05/30/2017 03:24 PM, Ric Wheeler wrote:
>
> On 05/27/2017 03:02 AM, Joe Julian wrote:
>
> On 05/26/2017 11:38 PM, Pranith Kumar Karampuri wrote:
>
>
>
> On Wed, May 24, 2017 at 9:10 PM, Joe Julian   > wrote:
>
> Forwarded for posterity and follow-up.
>
>
>  Forwarded Message 
> Subject: Re: GlusterFS removal from Openstack Cinder
> Date: Fri, 05 May 2017 21:07:27 +
> From: Amye Scavarda  
>  
> To: Eric Harney  
>  , Joe
> Julian  
>  , Vijay Bellur
>   
> 
> CC: Amye Scavarda  
>  
>
>
>
> Eric,
> I'm sorry to hear this.
> I'm reaching out internally (within Gluster CI team and CentOS CI
> which
> supports Gluster) to get an idea of the level of effort we'll need to
> provide to resolve this.
> It'll take me a few days to get this, but this is on my radar. In the
> meantime, is there somewhere I should be looking at for requirements
> to
> meet this gateway?
>
> Thanks!
> -- amye
>
> On Fri, May 5, 2017 at 16:09 Joe Julian   > wrote:
>
> On 05/05/2017 12:54 PM, Eric Harney wrote:
> >> On 04/28/2017 12:41 PM, Joe Julian wrote:
> >>> I learned, today, that GlusterFS was deprecated and removed
> from
> >>> Cinder as one of our #gluster (freenode) users was attempting
> to
> >>> upgrade openstack. I could find no rationale nor discussion of
> that
> >>> removal. Could you please educate me about that decision?
> >>>
> >
> > Hi Joe,
> >
> > I can fill in on the rationale here.
> >
> > Keeping a driver in the Cinder tree requires running a CI
> platform to
> > test that driver and report results against all patchsets
> submitted to
> > Cinder.  This is a fairly large burden, which we could not meet
> once the
> > Gluster Cinder driver was no longer an active development target
> at
> Red Hat.
> >
> > This was communicated via a warning issued by the driver for
> anyone
> > running the OpenStack Newton code, and via the Cinder release
> notes for
> > the Ocata release.  (I can see in retrospect that this was
> > probably not
> > communicated widely enough.)
> >
> > I apologize for not reaching out to the Gluster community about
> this.
> >
> > If someone from the Gluster world is interested in bringing this
> driver
> > back, I can help coordinate there.  But it will require someone
> stepping
> > in in a big way to maintain it.
> >
> > Thanks,
> > Eric
>
> Ah, Red Hat's statement that the acquisition of InkTank was not an
> abandonment of Gluster seems rather disingenuous now. I'm
> disappointed.
>
>
> I am a Red Hat employee working on gluster and I am happy with the kind of
> investments the company did in GlusterFS. Still am. It is a pretty good
> company and really open. I never had any trouble saying something the
> management did was wrong when I strongly felt so, and they would give a decent
> reason for their decision.
>
>
> Happy to hear that. Still looks like meddling to an outsider. Not the
> Gluster team's fault though (although more participation of the developers
> in community meetings would probably help with that feeling of being
> disconnected, in my own personal opinion).
>
>
> As a community, each member needs to make sure that their specific use
> case has the resources it needs to flourish. If some team cares about
> Gluster in openstack, they should step forward and 

Re: [Gluster-users] Gluster client mount fails in mid flight with signum 15

2017-06-01 Thread Niels de Vos
On Thu, Jun 01, 2017 at 01:52:23PM +, Gabriel Lindeborg wrote:
> This has been solved, as far as we can tell.
> 
> The problem was with KillUserProcesses=1 in logind.conf. This has been shown to
> kill mounts made using mount -a, both by root and by any user with
> sudo, at session logout.

Ah, yes, that could well be the cause of the problem.

> Hope this will help anybody else who runs into this.

Care to share how you solved it? Just disabling the option might not be
the most suitable approach. Did you convert it to systemd.mount units,
or maybe set up automounting with x-systemd.automount or autofs? Were
there considerations that made you choose one solution over another?
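
For example, a sketch of the x-systemd.automount variant in /etc/fstab (server,
volume and mountpoint names are placeholders):

    server1:/myvol  /mnt/myvol  glusterfs  defaults,_netdev,noauto,x-systemd.automount  0 0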

Thanks!
Niels


> Thanks 4 all your help and
> cheers
> Gabbe
> 
> On 1 June 2017 at 09:24, Gabriel Lindeborg wrote:
> 
> All four clients did run 3.10.2 as well
> 
> The volumes had been running fine until we upgraded to 3.10, when we hit some 
> issues with port mismatches. We restarted all the volumes, the servers and 
> the clients and now hit this issue.
> We’ve since backed up the files, removed the volumes, removed the bricks, 
> removed gluster, installed glusterfs 3.7.20, created new volumes on new 
> bricks, restored the files and still hit the same issue at clients on the 
> nodes that also run the servers. We’ve got two clients connected to one of 
> the volumes that have been working fine all the time.
> 
> These are the debug logs from one of the mounts as the client gets disconnected:
> The message "D [MSGID: 0] [dht-common.c:979:dht_revalidate_cbk] 0-mule-dht: 
> revalidate lookup of / returned with op_ret 0 [Structure needs cleaning]" 
> repeated 26 times between [2017-05-31 13:48:51.680757] and [2017-05-31 
> 13:50:46.325368]
> /DAEMON/DEBUG [2017-05-31T15:50:50.589272+02:00] [] [] 
> [logging.c:1830:gf_log_flush_timeout_cbk] 0-logging-infra: Log timer timed 
> out. About to flush outstanding messages if present
> /DAEMON/DEBUG [2017-05-31T15:50:50.589520+02:00] [] [] 
> [logging.c:1792:__gf_log_inject_timer_event] 0-logging-infra: Starting timer 
> now. Timeout = 120, current buf size = 5
> [2017-05-31 13:50:51.908797] D [MSGID: 0] 
> [dht-common.c:979:dht_revalidate_cbk] 0-mule-dht: revalidate lookup of / 
> returned with op_ret 0 [Structure needs cleaning]
> /DAEMON/DEBUG [2017-05-31T15:51:24.592190+02:00] [] [] 
> [rpc-clnt-ping.c:300:rpc_clnt_start_ping] 0-mule-client-0: returning as 
> transport is already disconnected OR there are no frames (0 || 0)
> /DAEMON/DEBUG [2017-05-31T15:51:24.592469+02:00] [] [] 
> [rpc-clnt-ping.c:300:rpc_clnt_start_ping] 0-mule-client-1: returning as 
> transport is already disconnected OR there are no frames (0 || 0)
> /DAEMON/DEBUG [2017-05-31T15:51:26.324867+02:00] [] [] 
> [rpc-clnt-ping.c:98:rpc_clnt_remove_ping_timer_locked] (--> 
> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f36b3260192] (--> 
> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7f36b302f9db] 
> (--> /lib
> 64/libgfrpc.so.0(+0x13fd4)[0x7f36b302ffd4] (--> 
> /lib64/libgfrpc.so.0(rpc_clnt_submit+0x451)[0x7f36b302cf01] (--> 
> /usr/lib64/glusterfs/3.7.20/xlator/protocol/client.so(client_submit_request+0x1fc)[0x7f36a599c33c]
>  ) 0-: 10.3.48.179:49155: ping timer event already remove
> d
> /DAEMON/DEBUG [2017-05-31T15:51:26.325230+02:00] [] [] 
> [rpc-clnt-ping.c:98:rpc_clnt_remove_ping_timer_locked] (--> 
> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f36b3260192] (--> 
> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7f36b302f9db] 
> (--> /lib
> 64/libgfrpc.so.0(+0x13fd4)[0x7f36b302ffd4] (--> 
> /lib64/libgfrpc.so.0(rpc_clnt_submit+0x451)[0x7f36b302cf01] (--> 
> /usr/lib64/glusterfs/3.7.20/xlator/protocol/client.so(client_submit_request+0x1fc)[0x7f36a599c33c]
>  ) 0-: 10.3.48.180:49155: ping timer event already remove
> d
> /DAEMON/DEBUG [2017-05-31T15:52:08.595536+02:00] [] [] 
> [rpc-clnt-ping.c:300:rpc_clnt_start_ping] 0-mule-client-0: returning as 
> transport is already disconnected OR there are no frames (0 || 0)
> /DAEMON/DEBUG [2017-05-31T15:52:08.595735+02:00] [] [] 
> [rpc-clnt-ping.c:300:rpc_clnt_start_ping] 0-mule-client-1: returning as 
> transport is already disconnected OR there are no frames (0 || 0)
> /DAEMON/DEBUG [2017-05-31T15:52:12.059895+02:00] [] [] 
> [rpc-clnt-ping.c:98:rpc_clnt_remove_ping_timer_locked] (--> 
> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f36b3260192] (--> 
> /lib64/libgfrpc.so.0(rpc_clnt_remove_ping_timer_locked+0x8b)[0x7f36b302f9db] 
> (--> /lib
> 64/libgfrpc.so.0(+0x13fd4)[0x7f36b302ffd4] (--> 
> /lib64/libgfrpc.so.0(rpc_clnt_submit+0x451)[0x7f36b302cf01] (--> 
> /usr/lib64/glusterfs/3.7.20/xlator/protocol/client.so(client_submit_request+0x1fc)[0x7f36a599c33c]
>  ) 0-: 10.3.48.179:49155: ping timer event already remove
> d
> /DAEMON/DEBUG [2017-05-31T15:52:12.060170+02:00] [] [] 
> [rpc-clnt-ping.c:98:rpc_clnt_remove_ping_timer_locked] 

[Gluster-users] Release 3.12: Scope and calendar!

2017-06-01 Thread Shyam

Hi,

Here are some top reminders for the 3.12 release:

1) When 3.12 is released, 3.8 will be EOL'd; hence users are encouraged 
to prepare for that as per the calendar posted here.


2) 3.12 is a long term maintenance (LTM) release, and potentially the 
last in the 3.x line of Gluster!


3) From this release onward, the feature freeze date is moved to ~45 days 
before the release. Hence, for this one release you will have less time 
to get your features into the release.


Release calendar:

- Feature freeze, or branching date: July 17th, 2017
   - Any feature landing after this date needs an exception granted to make 
it into the 3.12 release


- Release date: August 30th, 2017

Release owners:

- Shyam
-  Any volunteers?

Features and major changes process in a nutshell:
1) Open a github issue

2) Refer to the issue # in the commit messages of all changes against the 
feature (specs, code, tests, docs, release notes); refer to the issue as 
"updates gluster/glusterfs#N", where N is the issue number.


3) We will ease the release-notes update process from this release onward. 
We are still thinking about how to get that done, but the intention is that a 
contributor can update release notes before/on/after completion of the 
feature and not worry about branching dates etc. IOW, you control 
when you are done, rather than the release dates controlling that for you.


Thanks,
Shyam
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] upgrade from 3.8.12 to 3.9.x - to do or...

2017-06-01 Thread Kaleb S. KEITHLEY

On 06/01/2017 06:29 AM, lejeczek wrote:

hi everybody

I'd like to ask before I migrate - any issues with upping to 3.9.x from 
3.8.12 ?

Anything especially important in changelog?
Or maybe go to 3.10.x ? For it has something great & new?


3.9 was a Short Term Maintenance (STM) release. It reached end-of-life 
when 3.10 was released.


I suggest you upgrade to 3.10.

--

Kaleb

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [Gluster-devel] Empty info file preventing glusterd from starting

2017-06-01 Thread ABHISHEK PALIWAL
Hi Niels,

I have backported that patch to Gluster 3.7.6 and we haven't seen any other
issue due to that patch.

Everything has been fine so far in our testing, and the testing is going on extensively.

Regards,
Abhishek

On Thu, Jun 1, 2017 at 1:46 PM, Niels de Vos  wrote:

> On Thu, Jun 01, 2017 at 01:03:25PM +0530, ABHISHEK PALIWAL wrote:
> > Hi Niels,
> >
> > No problem, we will try to backport that patch to 3.7.6.
> >
> > Could you please let me know in which release Gluster community is going
> to
> > provide this patch and date of that release?
>
> It really depends on when someone has time to work on it. Our releases
> are time-based, and will happen even when a bugfix/feature is not merged
> or implemented. We can't give any guarantees about the availability of
> final patches (or backports).
>
> The best you can do is help test a potential fix, and work with the
> developer(s) of that patch to improve it and get it accepted in the master
> branch. If the developers do not have time to work on it, or progress is
> slow, you can ask them if you can take it over from them, if you are
> comfortable with writing the code.
>
> Niels
>
>
> >
> > Regards,
> > Abhishek
> >
> > On Wed, May 31, 2017 at 10:05 PM, Niels de Vos 
> wrote:
> >
> > > On Wed, May 31, 2017 at 04:08:06PM +0530, ABHISHEK PALIWAL wrote:
> > > > We are using 3.7.6 and on link https://review.gluster.org/#/c/16279
> > > status
> > > > is "can't merge"
> > >
> > > Note that 3.7.x will not get any updates anymore. We currently maintain
> > > version 3.8.x, 3.10.x and 3.11.x. See the release schedele for more
> > > details:
> > >   https://www.gluster.org/community/release-schedule/
> > >
> > > Niels
> > >
> > >
> > > >
> > > > On Wed, May 31, 2017 at 4:05 PM, Amar Tumballi 
> > > wrote:
> > > >
> > > > > This is already part of 3.11.0 release?
> > > > >
> > > > > On Wed, May 31, 2017 at 3:47 PM, ABHISHEK PALIWAL <
> > > abhishpali...@gmail.com
> > > > > > wrote:
> > > > >
> > > > >> Hi Atin,
> > > > >>
> > > > >> Could you please let us know any time plan for deliver of this
> patch.
> > > > >>
> > > > >> Regards,
> > > > >> Abhishek
> > > > >>
> > > > >> On Tue, May 9, 2017 at 6:37 PM, ABHISHEK PALIWAL <
> > > abhishpali...@gmail.com
> > > > >> > wrote:
> > > > >>
> > > > >>> Actually it is very risky if it will reproduce in production
> thats is
> > > > >>> why I said it is on high priority as want to resolve it before
> > > production.
> > > > >>>
> > > > >>> On Tue, May 9, 2017 at 6:20 PM, Atin Mukherjee <
> amukh...@redhat.com>
> > > > >>> wrote:
> > > > >>>
> > > > 
> > > > 
> > > >  On Tue, May 9, 2017 at 6:10 PM, ABHISHEK PALIWAL <
> > > >  abhishpali...@gmail.com> wrote:
> > > > 
> > > > > Hi Atin,
> > > > >
> > > > > Thanks for your reply.
> > > > >
> > > > >
> > > > > Its urgent because this error is very rarely reproducible we
> have
> > > seen
> > > > > this 2 3 times in our system till now.
> > > > >
> > > > > We have delivery in near future so that we want it asap. Please
> > > try to
> > > > > review it internally.
> > > > >
> > > > 
> > > >  I don't think your statements justified the reason of urgency
> as (a)
> > > >  you have mentioned it to be *rarely* reproducible and (b) I am
> still
> > > >  waiting for a real use case where glusterd will go through
> multiple
> > > >  restarts in a loop?
> > > > 
> > > > 
> > > > > Regards,
> > > > > Abhishek
> > > > >
> > > > > On Tue, May 9, 2017 at 5:58 PM, Atin Mukherjee <
> > > amukh...@redhat.com>
> > > > > wrote:
> > > > >
> > > > >>
> > > > >>
> > > > >> On Tue, May 9, 2017 at 3:37 PM, ABHISHEK PALIWAL <
> > > > >> abhishpali...@gmail.com> wrote:
> > > > >>
> > > > >>> + Muthu-vingeshwaran
> > > > >>>
> > > > >>> On Tue, May 9, 2017 at 11:30 AM, ABHISHEK PALIWAL <
> > > > >>> abhishpali...@gmail.com> wrote:
> > > > >>>
> > > >  Hi Atin/Team,
> > > > 
> > > >  We are using gluster-3.7.6 with setup of two brick and while
> > > >  restart of system I have seen that the glusterd daemon is
> > > getting failed
> > > >  from start.
> > > > 
> > > > 
> > > >  At the time of analyzing the logs from
> etc-glusterfs...log
> > > file
> > > >  I have received the below logs
> > > > 
> > > > 
> > > >  [2017-05-06 03:33:39.798087] I [MSGID: 100030]
> > > >  [glusterfsd.c:2348:main] 0-/usr/sbin/glusterd: Started
> running
> > > >  /usr/sbin/glusterd version 3.7.6 (args: /usr/sbin/glusterd
> -p
> > > >  /var/run/glusterd.pid --log-level INFO)
> > > >  [2017-05-06 03:33:39.807859] I [MSGID: 106478]
> > > >  [glusterd.c:1350:init] 0-management: Maximum allowed open
> file
> > > descriptors
> > > >  set to 65536
> > > >  [2017-05-06 03:33:39.807974] I [MSGID: 

Re: [Gluster-users] [Gluster-devel] Empty info file preventing glusterd from starting

2017-06-01 Thread Niels de Vos
On Thu, Jun 01, 2017 at 01:03:25PM +0530, ABHISHEK PALIWAL wrote:
> Hi Niels,
> 
> No problem, we will try to backport that patch to 3.7.6.
> 
> Could you please let me know in which release Gluster community is going to
> provide this patch and date of that release?

It really depends on when someone has time to work on it. Our releases
are time-based, and will happen even when a bugfix/feature is not merged
or implemented. We can't give any guarantees about the availability of
final patches (or backports).

The best you can do is help test a potential fix, and work with the
developer(s) of that patch to improve it and get it accepted in the master
branch. If the developers do not have time to work on it, or progress is
slow, you can ask them if you can take it over from them, if you are
comfortable with writing the code.

Niels


> 
> Regards,
> Abhishek
> 
> On Wed, May 31, 2017 at 10:05 PM, Niels de Vos  wrote:
> 
> > On Wed, May 31, 2017 at 04:08:06PM +0530, ABHISHEK PALIWAL wrote:
> > > We are using 3.7.6 and on link https://review.gluster.org/#/c/16279
> > status
> > > is "can't merge"
> >
> > Note that 3.7.x will not get any updates anymore. We currently maintain
> > version 3.8.x, 3.10.x and 3.11.x. See the release schedele for more
> > details:
> >   https://www.gluster.org/community/release-schedule/
> >
> > Niels
> >
> >
> > >
> > > On Wed, May 31, 2017 at 4:05 PM, Amar Tumballi 
> > wrote:
> > >
> > > > This is already part of 3.11.0 release?
> > > >
> > > > On Wed, May 31, 2017 at 3:47 PM, ABHISHEK PALIWAL <
> > abhishpali...@gmail.com
> > > > > wrote:
> > > >
> > > >> Hi Atin,
> > > >>
> > > >> Could you please let us know any time plan for deliver of this patch.
> > > >>
> > > >> Regards,
> > > >> Abhishek
> > > >>
> > > >> On Tue, May 9, 2017 at 6:37 PM, ABHISHEK PALIWAL <
> > abhishpali...@gmail.com
> > > >> > wrote:
> > > >>
> > > >>> Actually it is very risky if it will reproduce in production thats is
> > > >>> why I said it is on high priority as want to resolve it before
> > production.
> > > >>>
> > > >>> On Tue, May 9, 2017 at 6:20 PM, Atin Mukherjee 
> > > >>> wrote:
> > > >>>
> > > 
> > > 
> > >  On Tue, May 9, 2017 at 6:10 PM, ABHISHEK PALIWAL <
> > >  abhishpali...@gmail.com> wrote:
> > > 
> > > > Hi Atin,
> > > >
> > > > Thanks for your reply.
> > > >
> > > >
> > > > Its urgent because this error is very rarely reproducible we have
> > seen
> > > > this 2 3 times in our system till now.
> > > >
> > > > We have delivery in near future so that we want it asap. Please
> > try to
> > > > review it internally.
> > > >
> > > 
> > >  I don't think your statements justified the reason of urgency as (a)
> > >  you have mentioned it to be *rarely* reproducible and (b) I am still
> > >  waiting for a real use case where glusterd will go through multiple
> > >  restarts in a loop?
> > > 
> > > 
> > > > Regards,
> > > > Abhishek
> > > >
> > > > On Tue, May 9, 2017 at 5:58 PM, Atin Mukherjee <
> > amukh...@redhat.com>
> > > > wrote:
> > > >
> > > >>
> > > >>
> > > >> On Tue, May 9, 2017 at 3:37 PM, ABHISHEK PALIWAL <
> > > >> abhishpali...@gmail.com> wrote:
> > > >>
> > > >>> + Muthu-vingeshwaran
> > > >>>
> > > >>> On Tue, May 9, 2017 at 11:30 AM, ABHISHEK PALIWAL <
> > > >>> abhishpali...@gmail.com> wrote:
> > > >>>
> > >  Hi Atin/Team,
> > > 
> > >  We are using gluster-3.7.6 with setup of two brick and while
> > >  restart of system I have seen that the glusterd daemon is
> > getting failed
> > >  from start.
> > > 
> > > 
> > >  At the time of analyzing the logs from etc-glusterfs...log
> > file
> > >  I have received the below logs
> > > 
> > > 
> > >  [2017-05-06 03:33:39.798087] I [MSGID: 100030]
> > >  [glusterfsd.c:2348:main] 0-/usr/sbin/glusterd: Started running
> > >  /usr/sbin/glusterd version 3.7.6 (args: /usr/sbin/glusterd -p
> > >  /var/run/glusterd.pid --log-level INFO)
> > >  [2017-05-06 03:33:39.807859] I [MSGID: 106478]
> > >  [glusterd.c:1350:init] 0-management: Maximum allowed open file
> > descriptors
> > >  set to 65536
> > >  [2017-05-06 03:33:39.807974] I [MSGID: 106479]
> > >  [glusterd.c:1399:init] 0-management: Using /system/glusterd as
> > working
> > >  directory
> > >  [2017-05-06 03:33:39.826833] I [MSGID: 106513]
> > >  [glusterd-store.c:2047:glusterd_restore_op_version] 0-glusterd:
> > >  retrieved op-version: 30706
> > >  [2017-05-06 03:33:39.827515] E [MSGID: 106206]
> > >  [glusterd-store.c:2562:glusterd_store_update_volinfo]
> > >  0-management: Failed to get next store iter
> > >  [2017-05-06 03:33:39.827563] E [MSGID: 106207]
> 

Re: [Gluster-users] Floating IPv6 in a cluster (as NFS-Ganesha VIP)

2017-06-01 Thread Jan
Hi all,

thank you very much for the support! I filed the bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1457724

I'll try to test it again to get some errors / warnings from the logs.

Best regards,

Jan

On Wed, May 31, 2017 at 12:25 PM, Kaleb S. KEITHLEY 
wrote:

> On 05/31/2017 07:03 AM, Soumya Koduri wrote:
> > +Andrew and Ken
> >
> > On 05/29/2017 11:48 PM, Jan wrote:
> >> Hi all,
> >>
> >> I love this project, Gluster and Ganesha are amazing. Thank you for this
> >> great work!
> >>
> >> The only thing that I miss is IPv6 support. I know that there are some
> >> challenges and that’s OK. For me it’s not important whether Gluster
> >> servers use IPv4 or IPv6 to speak each other and replicate data.
> >>
> >> The only thing that I’d like to have is a floating IPv6 for clients when
> >> I use Ganesha (just IPv6, dual stack isn’t needed).
> >>
> >> I tested it and I put IPv6 into ganesha-ha.conf instead of IPv4 and it
> >> didn’t work. But I think that it might work since Ganesha supports IPv6:
> >>
> >> netstat -plnt
> >>
> >> tcp6       0      0 :::2049                 :::*                    LISTEN      1856/ganesha.nfsd
> >>
> >> Is there a way how to do that? Maybe build a cluster with IPv4 and then
> >> change “something” in Pacemaker / Corosync and replace IPv4 by IPv6?
> >>
> >
> > At-least from [1] looks like it is supported. Do you see any
> > errors/warnings in the log files? (/var/log/messages,
> > /var/log/pacemaker.log, /var/log/corosync.log)
> >
> >
> > [1] https://www.systutorials.com/docs/linux/man/7-ocf_heartbeat_IPaddr2/
> >
>
> /usr/lib/ocf/resource.d/heartbeat/IPaddr2 does support IPv6:
>
>   ...
>   Manages virtual IPv4 and IPv6 addresses (Linux specific version)
>
>   The IPv4 (dotted quad notation) or IPv6 address (colon hexadecimal notation)
>   example IPv4 "192.168.1.1".
>   example IPv6 "2001:db8:DC28:0:0:FC57:D4C8:1FFF".
>   ...
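>
> As a quick sanity check outside of Ganesha, you could create a standalone
> IPaddr2 resource with an IPv6 address (hypothetical resource name and
> address; assuming the cluster is managed with pcs):
>
>   pcs resource create test-vip6 ocf:heartbeat:IPaddr2 ip=2001:db8:DC28::1FFF cidr_netmask=64
>
>   # remove it again once verified
>   pcs resource delete test-vip6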
>
>
> If it's not working I suspect the ganesha-ha.sh script may not handle
> IPv6 addrs from the ganesha-ha.conf correctly.
>
> Please file a bug at
> https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS component:
> common-ha, version: 3.10.
>
> Patches are nice too. ;-)
>
> Thanks
>
> --
>
> Kaleb
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] Empty info file preventing glusterd from starting

2017-06-01 Thread ABHISHEK PALIWAL
Hi Niels,

No problem, we will try to backport that patch to 3.7.6.

Could you please let me know in which release the Gluster community is going to
provide this patch, and the date of that release?

Regards,
Abhishek

On Wed, May 31, 2017 at 10:05 PM, Niels de Vos  wrote:

> On Wed, May 31, 2017 at 04:08:06PM +0530, ABHISHEK PALIWAL wrote:
> > We are using 3.7.6 and on link https://review.gluster.org/#/c/16279
> status
> > is "can't merge"
>
> Note that 3.7.x will not get any updates anymore. We currently maintain
> version 3.8.x, 3.10.x and 3.11.x. See the release schedele for more
> details:
>   https://www.gluster.org/community/release-schedule/
>
> Niels
>
>
> >
> > On Wed, May 31, 2017 at 4:05 PM, Amar Tumballi 
> wrote:
> >
> > > This is already part of 3.11.0 release?
> > >
> > > On Wed, May 31, 2017 at 3:47 PM, ABHISHEK PALIWAL <
> abhishpali...@gmail.com
> > > > wrote:
> > >
> > >> Hi Atin,
> > >>
> > >> Could you please let us know any time plan for deliver of this patch.
> > >>
> > >> Regards,
> > >> Abhishek
> > >>
> > >> On Tue, May 9, 2017 at 6:37 PM, ABHISHEK PALIWAL <
> abhishpali...@gmail.com
> > >> > wrote:
> > >>
> > >>> Actually it is very risky if it will reproduce in production thats is
> > >>> why I said it is on high priority as want to resolve it before
> production.
> > >>>
> > >>> On Tue, May 9, 2017 at 6:20 PM, Atin Mukherjee 
> > >>> wrote:
> > >>>
> > 
> > 
> >  On Tue, May 9, 2017 at 6:10 PM, ABHISHEK PALIWAL <
> >  abhishpali...@gmail.com> wrote:
> > 
> > > Hi Atin,
> > >
> > > Thanks for your reply.
> > >
> > >
> > > Its urgent because this error is very rarely reproducible we have
> seen
> > > this 2 3 times in our system till now.
> > >
> > > We have delivery in near future so that we want it asap. Please
> try to
> > > review it internally.
> > >
> > 
> >  I don't think your statements justified the reason of urgency as (a)
> >  you have mentioned it to be *rarely* reproducible and (b) I am still
> >  waiting for a real use case where glusterd will go through multiple
> >  restarts in a loop?
> > 
> > 
> > > Regards,
> > > Abhishek
> > >
> > > On Tue, May 9, 2017 at 5:58 PM, Atin Mukherjee <
> amukh...@redhat.com>
> > > wrote:
> > >
> > >>
> > >>
> > >> On Tue, May 9, 2017 at 3:37 PM, ABHISHEK PALIWAL <
> > >> abhishpali...@gmail.com> wrote:
> > >>
> > >>> + Muthu-vingeshwaran
> > >>>
> > >>> On Tue, May 9, 2017 at 11:30 AM, ABHISHEK PALIWAL <
> > >>> abhishpali...@gmail.com> wrote:
> > >>>
> >  Hi Atin/Team,
> > 
> >  We are using gluster-3.7.6 with setup of two brick and while
> >  restart of system I have seen that the glusterd daemon is
> getting failed
> >  from start.
> > 
> > 
> >  At the time of analyzing the logs from etc-glusterfs...log
> file
> >  I have received the below logs
> > 
> > 
> >  [2017-05-06 03:33:39.798087] I [MSGID: 100030]
> >  [glusterfsd.c:2348:main] 0-/usr/sbin/glusterd: Started running
> >  /usr/sbin/glusterd version 3.7.6 (args: /usr/sbin/glusterd -p
> >  /var/run/glusterd.pid --log-level INFO)
> >  [2017-05-06 03:33:39.807859] I [MSGID: 106478]
> >  [glusterd.c:1350:init] 0-management: Maximum allowed open file
> descriptors
> >  set to 65536
> >  [2017-05-06 03:33:39.807974] I [MSGID: 106479]
> >  [glusterd.c:1399:init] 0-management: Using /system/glusterd as
> working
> >  directory
> >  [2017-05-06 03:33:39.826833] I [MSGID: 106513]
> >  [glusterd-store.c:2047:glusterd_restore_op_version] 0-glusterd:
> >  retrieved op-version: 30706
> >  [2017-05-06 03:33:39.827515] E [MSGID: 106206]
> >  [glusterd-store.c:2562:glusterd_store_update_volinfo]
> >  0-management: Failed to get next store iter
> >  [2017-05-06 03:33:39.827563] E [MSGID: 106207]
> >  [glusterd-store.c:2844:glusterd_store_retrieve_volume]
> >  0-management: Failed to update volinfo for c_glusterfs volume
> >  [2017-05-06 03:33:39.827625] E [MSGID: 106201]
> >  [glusterd-store.c:3042:glusterd_store_retrieve_volumes]
> >  0-management: Unable to restore volume: c_glusterfs
> >  [2017-05-06 03:33:39.827722] E [MSGID: 101019]
> >  [xlator.c:428:xlator_init] 0-management: Initialization of
> volume
> >  'management' failed, review your volfile again
> >  [2017-05-06 03:33:39.827762] E [graph.c:322:glusterfs_graph_
> init]
> >  0-management: initializing translator failed
> >  [2017-05-06 03:33:39.827784] E [graph.c:661:glusterfs_graph_
> activate]
> >  0-graph: init failed
> >  [2017-05-06 03:33:39.828396] W [glusterfsd.c:1238:cleanup_
> and_exit]
> 

Re: [Gluster-users] Heal operation detail of EC volumes

2017-06-01 Thread Xavier Hernandez

Hi Serkan,

On 30/05/17 10:22, Serkan Çoban wrote:

OK, I understand that the heal operation takes place on the server side. In
this case I should see X KB of outbound network traffic from each of 16
servers and 16X KB of inbound traffic to the failed brick server, right?
So that process will get 16 chunks, recalculate the missing chunk and
write it to disk.


That should be the normal operation for a single heal.


The problem is I am not seeing that kind of traffic on the servers. In my
configuration (16+4 EC), all 20 servers have 7-8MB of outbound
traffic and none of them has more than 10MB of incoming traffic.
Only the heal operation is happening on the cluster right now, no client/other
traffic. I see a constant 7-8MB write to the healing brick's disk. So where is
the missing traffic?


Not sure about your configuration, but probably you are seeing the 
result of having the SHD of each server doing heals. That would explain 
the network traffic you have.


Suppose that all SHD but the one on the damaged brick are working. In 
this case 19 servers will pick 16 fragments each. This gives 19 * 16 = 
304 fragments to be requested. EC balances the reads among all available 
servers, and there's a chance (1/19) that a fragment is local to the 
server asking it. So we'll need a total of 304 - 304 / 19 = 288 network 
requests, 288 / 19 = 15.2 sent by each server.


If we have a total of 288 requests, it means that each server will 
answer 288 / 19 = 15.2 requests. The net effect of all this is that each 
healthy server is sending 15.2*X bytes of data and each server is 
receiving 15.2*X bytes of data.


Now we need to account for the writes to the damaged brick. We have 19 
simultaneous heals. This means that the damaged brick will receive 19*X 
bytes of data, and each healthy server will send X additional bytes of data.


So:

A healthy server receives 15.2*X bytes of data
A healthy server sends 16.2*X bytes of data
A damaged server receives 19*X bytes of data
A damaged server sends few bytes of data (communication and 
synchronization overhead basically)


As you can see, in this configuration each server has almost the same 
amount of inbound and outbound traffic. The only big difference is the 
damaged brick, which should receive a little more traffic, but it 
should send much less.


Is it possible that this matches your observations ?

There's one more thing to consider here, and it's the apparent low 
throughput of self-heal. One possible thing to check is the small size 
and random behavior of the requests.


Assuming that each request has a size of ~128KB / 16 = 8KB, at a rate of 
~8 MB/s the servers are processing ~1000 IOPS. Since requests are going 
to 19 different files, even if each file is accessed sequentially, the 
real effect will be like random access (some read-ahead on the 
filesystem can improve reads a bit, but writes won't benefit so much).


How many IOPS can your bricks handle?

Do you have a test environment where we could check all this ? if 
possible it would be interesting to have only a single SHD (kill all SHD 
from all servers but one). In this situation, without client accesses, 
we should see the 16/1 ratio of reads vs writes on the network. We 
should also see a similar or even a little better speed because all 
reads and writes will be sequential, optimizing available IOPS.
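
A rough sketch of how to end up with a single SHD (assuming the self-heal 
daemon can be identified by 'glustershd' in its command line; adjust to your 
environment):

    # on every server except the one that should keep healing:
    pkill -f glustershd

    # when the test is done, restart the missing daemons everywhere:
    gluster volume start <volname> force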


There's a feature that allows configuring the self-heal block size to 
optimize these cases. The feature is available in 3.11.


Best regards,

Xavi



On Tue, May 30, 2017 at 10:25 AM, Ashish Pandey  wrote:


When we say client side heal or server side heal, we are basically talking about
the side which "triggers" heal of a file.

1 - server side heal - shd scans indices and triggers heal

2 - client side heal - a fop finds that file needs heal and it triggers heal
for that file.

Now, what happens when heal gets triggered.
In both cases, the following functions take part -

ec_heal => ec_heal_throttle=>ec_launch_heal

Now ec_launch_heal just creates heal tasks (with ec_synctask_heal_wrap, which
calls ec_heal_do) and puts them into a queue.
This happens on the server, and the "syncenv" infrastructure, which is nothing
but a set of workers, picks these tasks and executes them. That is when the
actual reads/writes for heal happen.



From: "Serkan Çoban" 
To: "Ashish Pandey" 
Cc: "Gluster Users" 
Sent: Monday, May 29, 2017 6:44:50 PM
Subject: Re: [Gluster-users] Heal operation detail of EC volumes



Healing could be triggered by the client side (access of a file) or the server
side (shd).
However, in both cases the actual heal starts from the "ec_heal_do" function.

If I do a recursive getfattr operation from the clients, then all heal
operations are done on the clients, right? The client reads the chunks,
calculates and writes the missing chunk.
And if I don't access files from a client, then the SHD daemons will start
the heal and read, calculate, and write the missing chunks, right?

In first case EC 

[Gluster-users] Restore a node in a replicating Gluster setup after data loss

2017-06-01 Thread Niklaus Hofer

Hi

We have a Replica 2 + Arbiter Gluster setup with 3 nodes: Server1, 
Server2 and Server3, where Server3 is the arbiter node. There are several 
Gluster volumes on top of that setup. They all look a bit like this:


gluster volume info gv-tier1-vm-01

[...]
Number of Bricks: 1 x (2 + 1) = 3
[...]
Bricks:
Brick1: Server1:/var/data/lv-vm-01
Brick2: Server2:/var/data/lv-vm-01
Brick3: Server3:/var/data/lv-vm-01/brick (arbiter)
[...]
cluster.data-self-heal-algorithm: full
[...]

We took down Server2 because we needed to do maintenance on this 
server's storage. During maintenance work, we ended up having to 
completely rebuild the storage on Server2. This means that 
"/var/data/lv-vm-01" on Server2 is now empty. However, all the Gluster 
Metadata in "/var/lib/glusterd/" is still in tact. Gluster has not been 
started on Server2.


Here is what our sample gluster volume currently looks like on the still 
active nodes:


gluster volume status gv-tier1-vm-01

Status of volume: gv-tier1-vm-01
Gluster process                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------
Brick Server1:/var/data/lv-vm-01        49204     0          Y       22775
Brick Server3:/var/data/lv-vm-01/brick  49161     0          Y       15334
Self-heal Daemon on localhost           N/A       N/A        Y       19233
Self-heal Daemon on Server3             N/A       N/A        Y       20839



Now we would like to rebuild the data on Server2 from the still intact 
data on Server1. That is to say, we hope to start up Gluster on Server2 
in such a way that it will sync the data from Server1 back. If at all 
possible, the Gluster cluster should stay up during this process and 
access to the Gluster volumes should not be interrupted.


What is the correct / recommended way of doing this?

Greetings
Niklaus Hofer
--
stepping stone GmbH
Neufeldstrasse 9
CH-3012 Bern

Telefon: +41 31 332 53 63
www.stepping-stone.ch
niklaus.ho...@stepping-stone.ch
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] "Another Transaction is in progres..."

2017-06-01 Thread Krist van Besien
Thanks for the suggestion; this solved it for us, and we probably found the
cause as well. We had Performance Co-Pilot running and it was continuously
enabling profiling on volumes...
We found the reference to the node that had the lock, and restarted
glusterd on that node, and all went well from there on.
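
Roughly what we did, in case it helps someone else (the log path is the one 
Vijay mentioned):

    # on each storage node, check which commands were being executed and when
    tail -n 50 /var/log/glusterfs/cmd_history.log

    # then restart glusterd on the node holding the stale lock
    systemctl restart glusterd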

Krist


On 31 May 2017 at 15:56, Vijay Bellur  wrote:

>
>
> On Wed, May 31, 2017 at 9:32 AM, Krist van Besien 
> wrote:
>
>> Hi all,
>>
>> I am trying to do trivial things, like setting quota, or just querying
>> the status and keep getting
>>
>> "Another transaction is in progres for "
>>
>> These messages pop up, then disappear for a while, then pop up again...
>>
>> What do these messages mean? How do I figure out which "transaction" is
>> meant here, and what do I do about it?
>>
>
>
> This message usually means that a different gluster command is being
> executed in the cluster. Most gluster commands are serialized by a cluster
> wide lock. Upon not being able to acquire the cluster lock, this message is
> displayed.
>
> You can check /var/log/glusterfs/cmd_history.log on all storage nodes to
> observe what other commands are in progress at the time of getting this
> error message. Are you per chance using oVirt to manage Gluster? oVirt
> periodically does a "gluster volume status" to determine the volume health
> and that can conflict with other commands being executed.
>
> Regards,
> Vijay
>
>


-- 
Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
--
Krist van Besien | Senior Architect | Red Hat EMEA Cloud Practice | RHCE |
RHCSA Open Stack
@: kr...@redhat.com | M: +41-79-5936260
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Disconnected gluster node thinks it is still connected...

2017-06-01 Thread Krist van Besien
Hi all,

Trying to do some availability testing.

We have three nodes: node1, node2, node3. Volumes are all replica 2, across
all three nodes.

As a test we disconnected node1, by removing the VLAN tag for that host on
the switch it is connected to. As a result node2 and node3 now show node1
in disconnected status, and show the volumes as degraded.
This is expected.

However, logging in to node1 (via the iLO, as there is no network) showed
that this node still thought it was connected to node2 and node3, even
though it could no longer communicate with them.

Also it did keep its bricks up...

This is not as expected. What I expected is that node1 detects it is no
longer part of a quorum, and takes all its bricks down.

So what did we miss?
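
My current guess is that we were expecting server-side quorum enforcement, 
which as far as I understand is not enabled by default. A sketch of the 
settings I believe are involved (volume name is a placeholder):

    # tell glusterd to kill local bricks when this node loses server quorum
    gluster volume set <volname> cluster.server-quorum-type server

    # optional, cluster-wide: percentage of peers that must be up (default is >50%)
    gluster volume set all cluster.server-quorum-ratio 51%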

Krist




-- 
Vriendelijke Groet |  Best Regards | Freundliche Grüße | Cordialement
--
Krist van Besien | Senior Architect | Red Hat EMEA Cloud Practice | RHCE |
RHCSA Open Stack
@: kr...@redhat.com | M: +41-79-5936260
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users