Re: [Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware
Hi Karthik,

Thanks for the info. Maybe the documentation should be updated to explain the different AFR versions, I know I was confused.

Also, when looking at the changelogs from my three bricks before fixing:

Brick 1:
trusted.afr.virt_images-client-1=0x0228
trusted.afr.virt_images-client-3=0x

Brick 2:
trusted.afr.virt_images-client-2=0x03ef
trusted.afr.virt_images-client-3=0x

Brick 3 (arbiter):
trusted.afr.virt_images-client-1=0x0228

I would think that the changelog for client 1 should win by majority vote? Or how does the self-healing process work?

I assumed this was the correct version, and reset client 2 on brick 2:
# setfattr -n trusted.afr.virt_images-client-2 -v 0x fedora27.qcow2

I then did a directory listing, which might have started a heal, but heal statistics show (I also did a full heal):

Starting time of crawl: Fri Dec 22 11:34:47 2017
Ending time of crawl: Fri Dec 22 11:34:47 2017
Type of crawl: INDEX
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 1

Starting time of crawl: Fri Dec 22 11:39:29 2017
Ending time of crawl: Fri Dec 22 11:39:29 2017
Type of crawl: FULL
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 1

I was immediately able to touch the file, so gluster was okay about it; however, heal info still showed the file for a while:

# gluster volume heal virt_images info
Brick virt3:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Brick virt2:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Brick printserver:/data/virt_images/brick
/fedora27.qcow2
Status: Connected
Number of entries: 1

Now heal info shows 0 entries, and the two data bricks have the same md5sum, so it's back in sync.

I have a few questions after all of this:

1) How can a split brain happen in a replica 3 arbiter 1 setup with both server- and client-side quorum enabled?
2) Why was it not able to self-heal, when two bricks seemed in sync with their changelogs?
3) Why could I not see the file in heal info split-brain?
4) Why could I not fix this through the cli split-brain resolution tool?
5) Is it possible to force a sync in a volume? Or maybe test sync status? It might be smart to be able to "flush" changes when taking a brick down for maintenance.
6) How am I supposed to monitor events like this? I have a gluster volume with ~500,000 files, I need to be able to guarantee data integrity and availability to the users.
7) Is glusterfs "production ready"? Because I find it hard to monitor and thus trust in these setups. Also, performance with small / many files seems horrible at best - but that's for another discussion.

Thanks for all of your help, I'll continue to try and tweak some performance out of this. :)

Best regards,
Henrik Juul Pedersen
LIAB ApS

On 22 December 2017 at 07:26, Karthik Subrahmanya wrote:
> Hi Henrik,
>
> Thanks for providing the required outputs. See my replies inline.
>
> On Thu, Dec 21, 2017 at 10:42 PM, Henrik Juul Pedersen wrote:
>>
>> Hi Karthik and Ben,
>>
>> I'll try and reply to you inline.
>>
>> On 21 December 2017 at 07:18, Karthik Subrahmanya
>> wrote:
>> > Hey,
>> >
>> > Can you give us the volume info output for this volume?
>>
>> # gluster volume info virt_images
>>
>> Volume Name: virt_images
>> Type: Replicate
>> Volume ID: 9f3c8273-4d9d-4af2-a4e7-4cb4a51e3594
>> Status: Started
>> Snapshot Count: 2
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: virt3:/data/virt_images/brick
>> Brick2: virt2:/data/virt_images/brick
>> Brick3: printserver:/data/virt_images/brick (arbiter)
>> Options Reconfigured:
>> features.quota-deem-statfs: on
>> features.inode-quota: on
>> features.quota: on
>> features.barrier: disable
>> features.scrub: Active
>> features.bitrot: on
>> nfs.rpc-auth-allow: on
>> server.allow-insecure: on
>> user.cifs: off
>> features.shard: off
>> cluster.shd-wait-qlength: 1
>> cluster.locking-scheme: granular
>> cluster.data-self-heal-algorithm: full
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> cluster.eager-lock: enable
>> network.remote-dio: enable
>> performance.low-prio-threads: 32
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> nfs.disable: on
>> transport.address-family: inet
>> server.outstanding-rpc-limit: 512
>>
>> > Why are you not able to get the xattrs from arbiter brick? It is the
>> > same way as you do it on data bricks.
>>
>> Yes I must have confused myself yesterday somehow, here it is in full
>> from all three bricks:
>>
>> Brick 1 (virt2): # getfattr -d -m . -e hex fedora27.qcow2
>> # file: fedora27.qcow2
>> trusted.afr.dirty=0x
>> trusted.afr.virt_images-client-1=0x0228
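A side note for readers of this thread: the trusted.afr.* values shown above are truncated in the digest. The full value is 0x followed by 24 hex digits, which encode three 32-bit counters: pending data, metadata and entry operations, in that order. A reconstructed, purely illustrative value such as 0x000003ef0000000000000000 would therefore mean 0x3ef pending data operations against the blamed brick, and no pending metadata or entry operations. A minimal sketch of commands one might use to inspect this state, assuming the volume name and brick path used in this thread and a reasonably recent Gluster release:

Dump the xattrs (including the AFR pending counters) for the file directly on a brick:
# getfattr -d -m . -e hex /data/virt_images/brick/fedora27.qcow2

List the files the self-heal daemon considers to be in split-brain:
# gluster volume heal virt_images info split-brain

Show a per-brick count of entries still pending heal:
# gluster volume heal virt_images statistics heal-count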
[Gluster-users] Exact purpose of network.ping-timeout
Hi,

I have a question regarding the "ping-timeout" option. I have been researching its purpose for a few days and it is not completely clear to me, especially since the Gluster community apparently strongly discourages changing, or at least decreasing, this value.

Assuming that I set ping-timeout to 10 seconds (instead of the default 42), this would mean that if I have a network outage of 11 seconds, then Gluster internally would have to re-allocate some resources that it freed after the 10 seconds, correct? But apart from that there are no negative implications, are there? For instance, if I'm copying files during the network outage, then those files will continue copying after those 11 seconds.

This means that the only purpose of ping-timeout is to save those extra resources that are used by "short" network outages. Is that correct? If I am confident that my network will not have many 11-second outages, and if they do occur I am willing to incur those extra costs due to resource allocation, is there any reason not to set ping-timeout to 10 seconds?

The problem I have with a long ping-timeout is that the Windows Samba client disconnects after 25 seconds. So if one of the nodes of a Gluster cluster shuts down ungracefully, then the Samba client disconnects and the file that was being copied is left incomplete on the server. These "costs" seem to be much higher than the potential costs of those Gluster resource re-allocations, but it is hard to estimate because there is no clear documentation of what exactly those Gluster costs are.

In general I would be very interested in a comprehensive explanation of ping-timeout and the up- and downsides of setting high or low values for it.

Kind regards,
Omar
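For reference, a minimal sketch of how this option can be inspected and changed per volume, assuming a hypothetical test volume named testvol (and, per the discussion above, bearing in mind that lowering the value is generally discouraged):

Show the current value:
# gluster volume get testvol network.ping-timeout

Lower it to 10 seconds:
# gluster volume set testvol network.ping-timeout 10

Reset it to the 42-second default:
# gluster volume reset testvol network.ping-timeout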
[Gluster-users] Announcing Glustered 2018 in Bologna (IT)
We are happy to announce that Glustered 2018, a Gluster community meeting, will take place on March 8th 2018 in Bologna (Italy), back-to-back with Incontro Devops Italia (http://2018.incontrodevops.it) and in the same venue as the main event.

http://www.incontrodevops.it/events/glustered-2018/

The tentative schedule will have a (confirmed) keynote by Niels de Vos, plus technical talks, use-case presentations and/or community space. The Call for Papers is now open: please send proposals to i...@biodec.com.

Bologna is well connected, by cheap direct flights, to most of the major European airports, so there is the potential to grow beyond being a local event and to have a nice European meetup of the community. Please help make the event a success by submitting proposals; it is also up to you...

Tickets are free, but registration will be required (limited room).

Merry Christmas and happy new year.

Ivan
Re: [Gluster-users] Gluster replicate 3 arbiter 1 in split brain. gluster cli seems unaware
Hey Henrik,

Good to know that the issue got resolved. I will try to answer some of the questions you have.

- The time taken to heal the file depends on its size. That's why you were seeing some delay in getting everything back to normal in the heal info output.
- You did not hit the split-brain situation. In split-brain all the bricks will be blaming the other bricks, but in your case the third brick was not blamed by any other brick.
- It was not able to heal the file because the arbiter cannot be the source for data heal. The other two data bricks were blaming each other, so heal was not able to decide on the source. This is the "arbiter becoming source for data heal" issue; we are working on the fix for this, and it will be shipped with the next release.
- Since it was not in split-brain, you were not able to see this in heal info split-brain, and not able to resolve this using the cli for split-brain resolution.
- You can use the heal command to perform syncing of data after brick maintenance. Once the brick comes up, the heal will be triggered automatically anyway. (A short command sketch follows after the quoted message below.)
- You can use the heal info command to monitor the status of heal.

Regards,
Karthik

On Fri, Dec 22, 2017 at 6:01 PM, Henrik Juul Pedersen wrote:
> Hi Karthik,
>
> Thanks for the info. Maybe the documentation should be updated to
> explain the different AFR versions, I know I was confused.
>
> Also, when looking at the changelogs from my three bricks before fixing:
>
> Brick 1:
> trusted.afr.virt_images-client-1=0x0228
> trusted.afr.virt_images-client-3=0x
>
> Brick 2:
> trusted.afr.virt_images-client-2=0x03ef
> trusted.afr.virt_images-client-3=0x
>
> Brick 3 (arbiter):
> trusted.afr.virt_images-client-1=0x0228
>
> I would think that the changelog for client 1 should win by majority
> vote? Or how does the self-healing process work?
> I assumed this was the correct version, and reset client 2 on brick 2:
> # setfattr -n trusted.afr.virt_images-client-2 -v
> 0x fedora27.qcow2
>
> I then did a directory listing, which might have started a heal, but
> heal statistics show (I also did a full heal):
> Starting time of crawl: Fri Dec 22 11:34:47 2017
> Ending time of crawl: Fri Dec 22 11:34:47 2017
> Type of crawl: INDEX
> No. of entries healed: 0
> No. of entries in split-brain: 0
> No. of heal failed entries: 1
>
> Starting time of crawl: Fri Dec 22 11:39:29 2017
> Ending time of crawl: Fri Dec 22 11:39:29 2017
> Type of crawl: FULL
> No. of entries healed: 0
> No. of entries in split-brain: 0
> No. of heal failed entries: 1
>
> I was immediately able to touch the file, so gluster was okay about
> it, however heal info still showed the file for a while:
> # gluster volume heal virt_images info
> Brick virt3:/data/virt_images/brick
> /fedora27.qcow2
> Status: Connected
> Number of entries: 1
>
> Brick virt2:/data/virt_images/brick
> /fedora27.qcow2
> Status: Connected
> Number of entries: 1
>
> Brick printserver:/data/virt_images/brick
> /fedora27.qcow2
> Status: Connected
> Number of entries: 1
>
>
> Now heal info shows 0 entries, and the two data bricks have the same
> md5sum, so it's back in sync.
>
>
> I have a few questions after all of this:
>
> 1) How can a split brain happen in a replica 3 arbiter 1 setup with
> both server- and client-side quorum enabled?
> 2) Why was it not able to self-heal, when two bricks seemed in sync
> with their changelogs?
> 3) Why could I not see the file in heal info split-brain?
> 4) Why could I not fix this through the cli split-brain resolution tool?
> 5) Is it possible to force a sync in a volume?
> Or maybe test sync status? It might be smart to be able to "flush"
> changes when taking a brick down for maintenance.
> 6) How am I supposed to monitor events like this? I have a gluster
> volume with ~500,000 files, I need to be able to guarantee data
> integrity and availability to the users.
> 7) Is glusterfs "production ready"? Because I find it hard to monitor
> and thus trust in these setups. Also, performance with small / many
> files seems horrible at best - but that's for another discussion.
>
> Thanks for all of your help, I'll continue to try and tweak some
> performance out of this. :)
>
> Best regards,
> Henrik Juul Pedersen
> LIAB ApS
>
> On 22 December 2017 at 07:26, Karthik Subrahmanya
> wrote:
> > Hi Henrik,
> >
> > Thanks for providing the required outputs. See my replies inline.
> >
> > On Thu, Dec 21, 2017 at 10:42 PM, Henrik Juul Pedersen
> > wrote:
> >>
> >> Hi Karthik and Ben,
> >>
> >> I'll try and reply to you inline.
> >>
> >> On 21 December 2017 at 07:18, Karthik Subrahmanya
> >> wrote:
> >> > Hey,
> >> >
> >> > Can you give us the volume info output for this volume?
> >>
> >> # gluster volume info virt_images
> >>
> >> Volume Name: virt_images
> >> Type: Replicate
> >> Volume ID:
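For reference, a rough sketch of the heal workflow Karthik describes for re-syncing after brick maintenance, assuming the virt_images volume from this thread (an index heal is normally also triggered automatically once the brick comes back up):

Heal only the entries already marked as needing heal (index heal):
# gluster volume heal virt_images

Force a full crawl of the volume:
# gluster volume heal virt_images full

Monitor progress:
# gluster volume heal virt_images info
# gluster volume heal virt_images statistics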