Re: [Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware

2017-12-22 Thread Henrik Juul Pedersen
Hi Karthik,

Thanks for the info. Maybe the documentation should be updated to
explain the different AFR versions; I know I was confused.

Also, when looking at the changelogs from my three bricks before fixing:

Brick 1:
trusted.afr.virt_images-client-1=0x0228
trusted.afr.virt_images-client-3=0x

Brick 2:
trusted.afr.virt_images-client-2=0x03ef
trusted.afr.virt_images-client-3=0x

Brick 3 (arbiter):
trusted.afr.virt_images-client-1=0x0228

I would think that the changelog for client 1 should win by majority
vote? Or how does the self-healing process work?
I assumed this was the correct version, and reset client 2 on brick 2:
# setfattr -n trusted.afr.virt_images-client-2 -v
0x fedora27.qcow2
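
(Side note, in case it helps others reading along: a full trusted.afr.*
changelog value is 12 bytes, i.e. three big-endian 32-bit counters for
pending data, metadata and entry operations; the values above may render
truncated in the archive. A rough sketch for decoding one such value, with
a made-up sample value purely for illustration:)

# decode a trusted.afr.* changelog value into its three counters
decode_afr() {
    local v=${1#0x}
    printf 'data=%d metadata=%d entry=%d\n' \
        $((16#${v:0:8})) $((16#${v:8:8})) $((16#${v:16:8}))
}
decode_afr 0x000002280000000000000000   # prints: data=552 metadata=0 entry=0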

I then did a directory listing, which might have started a heal, but
heal statistics show (I also did a full heal):
Starting time of crawl: Fri Dec 22 11:34:47 2017

Ending time of crawl: Fri Dec 22 11:34:47 2017

Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 1

Starting time of crawl: Fri Dec 22 11:39:29 2017

Ending time of crawl: Fri Dec 22 11:39:29 2017

Type of crawl: FULL
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 1
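
(For completeness, the statistics above came from roughly these commands:)

# trigger an index heal and a full heal, then check the crawl statistics
gluster volume heal virt_images
gluster volume heal virt_images full
gluster volume heal virt_images statistics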

I was immediately able to touch the file, so Gluster was okay with
it; however, heal info still showed the file for a while:
# gluster volume heal virt_images info
Brick virt3:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Brick virt2:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Brick printserver:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1



Now heal info shows 0 entries, and the two data bricks have the same
md5sum, so it's back in sync.
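
(For reference, that comparison is just the backend copies checked directly
on the two data bricks, not through the mount:)

# run on virt2 and virt3, against the brick path
md5sum /data/virt_images/brick/fedora27.qcow2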



I have a few questions after all of this:

1) How can a split brain happen in a replica 3 arbiter 1 setup with
both server- and client quorum enabled?
2) Why was it not able to self-heal, when two bricks seemed in sync
with their changelogs?
3) Why could I not see the file in heal info split-brain?
4) Why could I not fix this through the cli split-brain resolution tool?
5) Is it possible to force a sync in a volume? Or maybe test sync
status? It might be smart to be able to "flush" changes when taking a
brick down for maintenance.
6) How am I supposed to monitor events like this? I have a gluster
volume with ~500,000 files, and I need to be able to guarantee data
integrity and availability to the users.
7) Is glusterfs "production ready"? I find it hard to monitor, and
thus to trust, these setups. Also, performance with small/many files
seems horrible at best - but that's for another discussion.

Thanks for all of your help; I'll continue to try and tweak some
performance out of this. :)

Best regards,
Henrik Juul Pedersen
LIAB ApS

On 22 December 2017 at 07:26, Karthik Subrahmanya  wrote:
> Hi Henrik,
>
> Thanks for providing the required outputs. See my replies inline.
>
> On Thu, Dec 21, 2017 at 10:42 PM, Henrik Juul Pedersen  wrote:
>>
>> Hi Karthik and Ben,
>>
>> I'll try and reply to you inline.
>>
>> On 21 December 2017 at 07:18, Karthik Subrahmanya 
>> wrote:
>> > Hey,
>> >
>> > Can you give us the volume info output for this volume?
>>
>> # gluster volume info virt_images
>>
>> Volume Name: virt_images
>> Type: Replicate
>> Volume ID: 9f3c8273-4d9d-4af2-a4e7-4cb4a51e3594
>> Status: Started
>> Snapshot Count: 2
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: virt3:/data/virt_images/brick
>> Brick2: virt2:/data/virt_images/brick
>> Brick3: printserver:/data/virt_images/brick (arbiter)
>> Options Reconfigured:
>> features.quota-deem-statfs: on
>> features.inode-quota: on
>> features.quota: on
>> features.barrier: disable
>> features.scrub: Active
>> features.bitrot: on
>> nfs.rpc-auth-allow: on
>> server.allow-insecure: on
>> user.cifs: off
>> features.shard: off
>> cluster.shd-wait-qlength: 1
>> cluster.locking-scheme: granular
>> cluster.data-self-heal-algorithm: full
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> cluster.eager-lock: enable
>> network.remote-dio: enable
>> performance.low-prio-threads: 32
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> nfs.disable: on
>> transport.address-family: inet
>> server.outstanding-rpc-limit: 512
>>
>> > Why are you not able to get the xattrs from arbiter brick? It is the
>> > same
>> > way as you do it on data bricks.
>>
>> Yes I must have confused myself yesterday somehow, here it is in full
>> from all three bricks:
>>
>> Brick 1 (virt2): # getfattr -d -m . -e hex fedora27.qcow2
>> # file: fedora27.qcow2
>> trusted.afr.dirty=0x
>> trusted.afr.virt_images-client-1=0x0228

[Gluster-users] Exact purpose of network.ping-timeout

2017-12-22 Thread Omar Kohl
Hi,

I have a question regarding the "ping-timeout" option. I have been researching
its purpose for a few days and it is not completely clear to me. In particular,
the Gluster community apparently strongly discourages changing, or at least
decreasing, this value!

Assuming that I set ping-timeout to 10 seconds (instead of the default 42), this
would mean that if I have a network outage of 11 seconds, Gluster internally
would have to re-allocate some resources that it freed after the 10 seconds,
correct? But apart from that, there are no negative implications, are there? For
instance, if I'm copying files during the network outage, those files will
continue copying after those 11 seconds.

This means that the only purpose of ping-timeout is to avoid that extra
resource re-allocation caused by "short" network outages. Is that correct?

If I am confident that my network will not have many 11-second outages, and I
am willing to incur those extra resource-allocation costs when they do occur,
is there any reason not to set ping-timeout to 10 seconds?
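
(For context, checking and changing the option itself is a one-liner; "myvol"
below is just a placeholder volume name:)

# show the current value (the default is 42 seconds)
gluster volume get myvol network.ping-timeout
# lower it to 10 seconds for this volume
gluster volume set myvol network.ping-timeout 10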

The problem I have with a long ping-timeout is that the Windows Samba client
disconnects after 25 seconds. So if one of the nodes of a Gluster cluster shuts
down ungracefully, the Samba client disconnects and the file that was being
copied is left incomplete on the server. These "costs" seem much higher than
the potential costs of those Gluster resource re-allocations, but that is hard
to estimate because there is no clear documentation of what exactly those
Gluster costs are.

In general I would be very interested in a comprehensive explanation of 
ping-timeout and the up- and downsides of setting high or low values for it.

Kind regards,
Omar
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Announcing Glustered 2018 in Bologna (IT)

2017-12-22 Thread Ivan Rossi
We are happy to announce that Glustered 2018, a Gluster community meeting,
will take place on March 8th, 2018 in Bologna (Italy), back-to-back with
Incontro Devops Italia (http://2018.incontrodevops.it) and in the same venue
as the main event.

http://www.incontrodevops.it/events/glustered-2018/

The tentative schedule will have a (confirmed) keynote by Niels de Vos,
plus technical talks, use-case presentations and/or community space.

Call for Papers is now open: please send proposals to i...@biodec.com.

Bologna is well connected by cheap direct flights to most of the major
European airports, so the event has the potential to grow beyond a local
gathering into a nice European meetup of the community. Please help make
the event a success by submitting proposals; it is also up to you...

Tickets are free, but registration will be required (limited room).

Merry Christmas and a happy new year.

Ivan
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware

2017-12-22 Thread Karthik Subrahmanya
Hey Henrik,

Good to know that the issue got resolved. I will try to answer some of the
questions you have.
- The time taken to heal the file depends on its size. That's why you were
seeing some delay in getting everything back to normal in the heal info
output.
- You did not hit a split-brain situation. In split-brain, all the bricks
will be blaming the other bricks, but in your case the third brick was not
blamed by any other brick.
- It was not able to heal the file because the arbiter cannot be the source
for data heal. The other two data bricks were blaming each other, so heal
was not able to decide on a source.
  This is the "arbiter becoming source for data heal" issue. We are working
on the fix for this, and it will be shipped with the next release.
- Since it was not in split-brain, you could not see this in heal info
split-brain and could not resolve it using the CLI split-brain resolution
tool.
- You can use the heal command to sync data after brick maintenance. In any
case, once the brick comes back up the heal will be triggered automatically.
- You can use the heal info command to monitor the status of heals (see the
commands below).
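
For reference, these are the commands I mean (using your volume name):

# trigger healing of pending entries, or force a full crawl
gluster volume heal virt_images
gluster volume heal virt_images full

# monitor pending entries, split-brain entries and crawl statistics
gluster volume heal virt_images info
gluster volume heal virt_images info split-brain
gluster volume heal virt_images statistics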

Regards,
Karthik

On Fri, Dec 22, 2017 at 6:01 PM, Henrik Juul Pedersen  wrote:

> Hi Karthik,
>
> Thanks for the info. Maybe the documentation should be updated to
> explain the different AFR versions, I know I was confused.
>
> Also, when looking at the changelogs from my three bricks before fixing:
>
> Brick 1:
> trusted.afr.virt_images-client-1=0x0228
> trusted.afr.virt_images-client-3=0x
>
> Brick 2:
> trusted.afr.virt_images-client-2=0x03ef
> trusted.afr.virt_images-client-3=0x
>
> Brick 3 (arbiter):
> trusted.afr.virt_images-client-1=0x0228
>
> I would think that the changelog for client 1 should win by majority
> vote? Or how does the self-healing process work?
> I assumed this as the correct version, and reset client 2 on brick 2:
> # setfattr -n trusted.afr.virt_images-client-2 -v
> 0x fedora27.qcow2
>
> I then did a directory listing, which might have started a heal, but
> heal statistics show (I also did a full heal):
> Starting time of crawl: Fri Dec 22 11:34:47 2017
>
> Ending time of crawl: Fri Dec 22 11:34:47 2017
>
> Type of crawl: INDEX
> No. of entries healed: 0
> No. of entries in split-brain: 0
> No. of heal failed entries: 1
>
> Starting time of crawl: Fri Dec 22 11:39:29 2017
>
> Ending time of crawl: Fri Dec 22 11:39:29 2017
>
> Type of crawl: FULL
> No. of entries healed: 0
> No. of entries in split-brain: 0
> No. of heal failed entries: 1
>
> I was immediately able to touch the file, so gluster was okay about
> it, however heal info still showed the file for a while:
> # gluster volume heal virt_images info
> Brick virt3:/data/virt_images/brick
> /fedora27.qcow2
> Status: Connected
> Number of entries: 1
>
> Brick virt2:/data/virt_images/brick
> /fedora27.qcow2
> Status: Connected
> Number of entries: 1
>
> Brick printserver:/data/virt_images/brick
> /fedora27.qcow2
> Status: Connected
> Number of entries: 1
>
>
>
> Now heal info shows 0 entries, and the two data bricks have the same
> md5sum, so it's back in sync.
>
>
>
> I have a few questions after all of this:
>
> 1) How can a split brain happen in a replica 3 arbiter 1 setup with
> both server- and client quorum enabled?
> 2) Why was it not able to self-heal, when two bricks seemed in sync
> with their changelogs?
> 3) Why could I not see the file in heal info split-brain?
> 4) Why could I not fix this through the cli split-brain resolution tool?
> 5) Is it possible to force a sync in a volume? Or maybe test sync
> status? It might be smart to be able to "flush" changes when taking a
> brick down for maintenance.
> 6) How am I supposed to monitor events like this? I have a gluster
> volume with ~500.000 files, I need to be able to guarantee data
> integrity and availability to the users.
> 7) Is glusterfs "production ready"? Because I find it hard to monitor
> and thus trust in these setups. Also performance with small / many
> files seems horrible at best - but that's for another discussion.
>
> Thanks for all of your help; I'll continue to try and tweak some
> performance out of this. :)
>
> Best regards,
> Henrik Juul Pedersen
> LIAB ApS
>
> On 22 December 2017 at 07:26, Karthik Subrahmanya 
> wrote:
> > Hi Henrik,
> >
> > Thanks for providing the required outputs. See my replies inline.
> >
> > On Thu, Dec 21, 2017 at 10:42 PM, Henrik Juul Pedersen 
> wrote:
> >>
> >> Hi Karthik and Ben,
> >>
> >> I'll try and reply to you inline.
> >>
> >> On 21 December 2017 at 07:18, Karthik Subrahmanya 
> >> wrote:
> >> > Hey,
> >> >
> >> > Can you give us the volume info output for this volume?
> >>
> >> # gluster volume info virt_images
> >>
> >> Volume Name: virt_images
> >> Type: Replicate
> >> Volume ID: