Yes,
this makes a lot of sense.
It's the behavior that I was experiencing that makes no sense.
When one node was shut down, the whole VM cluster locked up.
However, I found that the culprit was the quorum settings.
I have now set the quorum to 2 bricks, and I am not experiencing
the problem anymore.
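For reference, that kind of 2-of-3 write quorum can be expressed roughly like this (the volume name is a placeholder; on a replica 3, cluster.quorum-type auto gives the same 2-brick requirement):
gluster volume set <volname> cluster.quorum-type fixed
gluster volume set <volname> cluster.quorum-count 2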
All my VM boot disks and data disks are now sharded.
We are on 10 Gbit networking; when the node comes back, we do not
really see any latency.
Carl
On 2019-08-29 3:58 p.m., Darrell Budic wrote:
You may be misunderstanding some of the details of how the gluster
system works here, but you’ve got the right idea overall. Since gluster is
maintaining 3 copies of your data, you can lose a drive or a whole
system and things will keep going without interruption (well, mostly,
if a host node was using the system that just died, it may pause
briefly before re-connecting to one that is still running via a
backup-server setting or your dns configs). While the system is still
going with one node down, that node is falling behind on new disk
writes, and the remaining ones are keeping track of what’s changing.
Once you repair/recover/reboot the down node, it will rejoin the
cluster. Now the recovered system has to catch up, and it does this by
having the other two nodes send it the changes. In the meantime,
gluster is serving any reads for that data from one of the up to date
nodes, even if you ask the one you just restarted. In order to do this
healing, it has to lock the files to ensure no changes are made while
it copies a chunk of them over to the recovered node. When it locks them,
your hypervisor notices they have gone read-only, and especially if it
has a pending write for that file, may pause the VM because this looks
like a storage issue to it. Once the file gets unlocked, it can be
written again, and your hypervisor notices and will generally
reactivate your VM. You may see delays too, especially if you only
have 1G networking between your host nodes while everything is getting
copied around. And your files could be locked, updated, unlocked,
then locked again a few seconds or minutes later, and so on.
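If you want to watch that catch-up happening, the pending heals can be checked from any node (volume name is a placeholder):
gluster volume heal <volname> info                    # lists entries still needing heal
gluster volume heal <volname> statistics heal-count   # pending entry counts per brick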
That’s where sharding comes into play: once you have a file broken up
into shards, gluster can get away with only locking the particular
shard it needs to heal, leaving the rest of the disk image unlocked. You
may still catch a brief pause if you try and write the specific
segment of the file gluster is healing at the moment, but it’s also
going to be much faster because it’s only a small chunk of the file and
copies quickly.
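As a rough sketch, sharding is just a couple of volume options; note it only applies to files written after it is enabled, so existing images have to be copied or recreated to become sharded (volume name is a placeholder):
gluster volume set <volname> features.shard on
gluster volume set <volname> features.shard-block-size 64MB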
Also, check out
https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/server-quorum/,
you probably want to set cluster.server-quorum-ratio to 50 for a
replica-3 setup to avoid the possibility of split-brains. Your cluster
will go read-only if it loses two nodes, though, but you can always
change the server-quorum-ratio later if you need to keep it
running temporarily.
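Per that doc, that is roughly (volume name is a placeholder; the ratio is a cluster-wide setting, hence "all"):
gluster volume set <volname> cluster.server-quorum-type server
gluster volume set all cluster.server-quorum-ratio 50%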
Hope that makes sense of what’s going on for you,
-Darrell
On Aug 23, 2019, at 5:06 PM, Carl Sirotic <csiro...@evoqarchitecture.com> wrote:
Okay,
so at least it means I am not getting the expected behavior, and
there is hope.
I applied the quorum settings I was given a couple of emails ago.
After applying the virt group, they are:
cluster.quorum-type auto
cluster.quorum-count (null)
cluster.server-quorum-type server
cluster.server-quorum-ratio 0
cluster.quorum-reads no
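For reference, the effective values can be confirmed per volume with something like this (volume name is a placeholder):
gluster volume get <volname> all | grep quorum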
Also,
I just set the ping timeout to 5 seconds.
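That corresponds to something like the following (volume name is a placeholder):
gluster volume set <volname> network.ping-timeout 5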
Carl
On 2019-08-23 5:45 p.m., Ingo Fischer wrote:
Hi Carl,
In my understanding and experience (I have a replica 3 system
running too), this should not happen. Can you share your client and
server quorum settings?
Ingo
On 2019-08-23 at 15:53, Carl Sirotic <csiro...@evoqarchitecture.com> wrote:
However,
I must have misunderstood the whole concept of gluster.
In a replica 3, for me, it's completely unacceptable, regardless of
the options, that all my VMs go down when I reboot one node.
The whole purpose of having three full copies of my data on the fly is
supposed to be exactly this.
I am in the process of sharding every file.
But even if the healing time were longer, I would still expect
a non-sharded replica 3 volume holding VM boot disks not to go down when
I reboot one of its copies.
I am not very impressed by gluster so far.
Carl
On 2019-08-19 4:15 p.m., Darrell Budic wrote:
/var/lib/glusterd/groups/virt is a good start for ideas, notably
some thread settings and choose-local=off to improve read
performance. If you don’t have at least 10 cores on your servers,
you may want to lower the recommended shd-max-threads=8 to no more
than half your CPU cores to keep healing from swamping out regular
work.
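For example, on a smaller 8-core box that might look roughly like this (volume name is a placeholder, count adjusted to your hardware):
gluster volume set <volname> cluster.shd-max-threads 4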
It’s also starting to depend on what your backing store and
networking setup are, so you’re going to want to test changes and
find what works best for your setup.
In addition to the virt group settings, I use these on most of my
volumes, SSD or HDD backed, with the default 64M shard size:
performance.io-thread-count: 32   # seemed good for my system, particularly a ZFS-backed volume with lots of spindles
client.event-threads: 8
cluster.data-self-heal-algorithm: full   # 10G networking, uses more net/less cpu to heal; probably don’t use this for 1G networking?
performance.stat-prefetch: on
cluster.read-hash-mode: 3   # distribute reads to least loaded server (by read queue depth)
and these two only on my HDD backed volume:
performance.cache-size: 1G
performance.write-behind-window-size: 64MB
but I suspect these two need another round or six of tuning to
tell if they are making a difference.
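As a sketch, each of these is applied per volume with gluster volume set, for example (volume name is a placeholder):
gluster volume set <volname> performance.io-thread-count 32
gluster volume set <volname> cluster.read-hash-mode 3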
I use the throughput-performance tuned profile on my servers, so
you should be in good shape there.
On Aug 19, 2019, at 12:22 PM, Guy Boisvert <guy.boisv...@ingtegration.com> wrote:
On 2019-08-19 12:08 p.m., Darrell Budic wrote:
You also need to make sure your volume is set up properly for
best performance. Did you apply the gluster virt group to your
volumes, or at least features.shard = on on your VM volume?
That's what we did here:
gluster volume set W2K16_Rhenium cluster.quorum-type auto
gluster volume set W2K16_Rhenium network.ping-timeout 10
gluster volume set W2K16_Rhenium auth.allow \*
gluster volume set W2K16_Rhenium group virt
gluster volume set W2K16_Rhenium storage.owner-uid 36
gluster volume set W2K16_Rhenium storage.owner-gid 36
gluster volume set W2K16_Rhenium features.shard on
gluster volume set W2K16_Rhenium features.shard-block-size 256MB
gluster volume set W2K16_Rhenium cluster.data-self-heal-algorithm full
gluster volume set W2K16_Rhenium performance.low-prio-threads 32
tuned-adm profile random-io (a profile I added in CentOS 7)
cat /usr/lib/tuned/random-io/tuned.conf
===========================================
[main]
summary=Optimize for Gluster virtual machine storage
include=throughput-performance
[sysctl]
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2
Any more optimization to add to this?
Guy
--
Guy Boisvert, ing.
IngTegration inc.
http://www.ingtegration.com
https://www.linkedin.com/in/guy-boisvert-8990487
CONFIDENTIALITY NOTICE : Proprietary/Confidential Information
belonging to IngTegration Inc. and its affiliates may be
contained in this message. If you are not a recipient
indicated or intended in this message (or responsible for
delivery of this message to such person), or you think for
any reason that this message may have been addressed to you
in error, you may not use or copy or deliver this message to
anyone else. In such case, you should destroy this message
and are asked to notify the sender by reply email.
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users