Re: [Gluster-users] Brick Reboot => VMs slowdown, client crashes

2019-08-29 Thread Carl Sirotic

Yes,

this makes a lot of sense.

It's the behavior that I was experiencing that made no sense.

When one node was shut down, the whole VM cluster locked up.

However, I managed to find that the culprit was the quorum settings.

I have set the quorum to 2 bricks now, and I am not experiencing 
the problem anymore.
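For the record, that kind of two-brick quorum is normally expressed with the 
two options below; the volume name "myvol" is only an example, not the real 
one from this setup:

gluster volume set myvol cluster.quorum-type fixed
gluster volume set myvol cluster.quorum-count 2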


All my vm boot disks and data disks are now sharded.

We are on 10 Gbit networking; when the node comes back, we do not 
really see any latency.



Carl


On 2019-08-29 3:58 p.m., Darrell Budic wrote:
You may be misunderstanding the way the gluster system works in 
detail here, but you’ve got the right idea overall. Since gluster is 
maintaining 3 copies of your data, you can lose a drive or a whole 
system and things will keep going without interruption (well, mostly: 
if a host node was using the system that just died, it may pause 
briefly before re-connecting to one that is still running via a 
backup-server setting or your dns configs). While the system is still 
going with one node down, that node is falling behind on new disk 
writes, and the remaining ones are keeping track of what’s changing. 
Once you repair/recover/reboot the down node, it will rejoin the 
cluster. Now the recovered system has to catch up, and it does this by 
having the other two nodes send it the changes. In the meantime, 
gluster is serving any reads for that data from one of the up-to-date 
nodes, even if you ask the one you just restarted. In order to do this 
healing, it has to lock the files to ensure no changes are made while 
it copies a chunk of them over to the recovered node. When it locks 
them, your hypervisor notices they have gone read-only, and especially 
if it has a pending write for that file, may pause the VM because this 
looks like a storage issue to it. Once the file gets unlocked, it can 
be written again, and your hypervisor notices and will generally 
reactivate your VM. You may see delays too, especially if you only 
have 1G networking between your host nodes while everything is getting 
copied around. And your files could be locked, updated, unlocked, 
locked again a few seconds or minutes later, etc.


That’s where sharding comes into play: once you have a file broken up 
into shards, gluster can get away with locking only the particular 
shard it needs to heal, leaving the whole disk image unlocked. You may 
still catch a brief pause if you try to write the specific segment of 
the file gluster is healing at the moment, but it’s also going to be 
much faster, because it’s a small chunk of the file and copies 
quickly.


Also, check out 
https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/server-quorum/, 
you probably want to set cluster.server-quorum-ratio to 50 for a 
replica-3 setup to avoid the possibility of split-brains. Your cluster 
will go read-only if it loses two nodes though, but you can always 
change the server-quorum-ratio later if you need to keep it running 
temporarily.


Hope that makes sense of what’s going on for you,

  -Darrell

On Aug 23, 2019, at 5:06 PM, Carl Sirotic wrote:


Okay,

so it means, at least I am not getting the expected behavior and 
there is hope.


I put the quorum settings that I was told a couple of emails ago.

After applying virt group, they are

cluster.quorum-type auto
cluster.quorum-count (null)
cluster.server-quorum-type server
cluster.server-quorum-ratio 0
cluster.quorum-reads no

Also,

I have just set the ping timeout to 5 seconds.
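For reference, that change is just the usual volume options (volume name is 
an example):

gluster volume set myvol network.ping-timeout 5
gluster volume get myvol all | grep -E 'quorum|ping-timeout'    # verify the settings listed above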


Carl

On 2019-08-23 5:45 p.m., Ingo Fischer wrote:

Hi Carl,

In my understanding and experience (I have a replica 3 system 
running too) this should not happen. Can you tell me your client and 
server quorum settings?


Ingo

On 23.08.2019 at 15:53, Carl Sirotic <csiro...@evoqarchitecture.com> wrote:



However,

I must have misunderstood the whole concept of gluster.

In a replica 3, for me, it's completely unacceptable, regardless of 
the options, that all my VMs go down when I reboot one node.


The whole purpose of having three full copies of my data on the fly is 
supposed to be exactly this.


I am in the process of sharding every file.

But even if the healing time were longer, I would still expect 
a non-sharded replica 3 volume with a VM boot disk not to go down if 
I reboot one of its copies.



I am not very impressed by gluster so far.

Carl

On 2019-08-19 4:15 p.m., Darrell Budic wrote:
/var/lib/glusterd/groups/virt is a good start for ideas, notably 
some thread settings and choose-local=off to improve read 
performance. If you don’t have at least 10 cores on your servers, 
you may want to lower the recommended shd-max-threads=8 to no more 
than half your CPU cores to keep healing from swamping out regular 
work.


It’s also starting to depend on what your backing store and 
networking setup are, so you’re going to want to test changes and 
find what works best for your setup.


In addition to the virt group set

Re: [Gluster-users] Brick Reboot => VMs slowdown, client crashes

2019-08-29 Thread Darrell Budic
You may be misunderstanding the way the gluster system works in detail here, 
but you’ve got the right idea overall. Since gluster is maintaining 3 copies of 
your data, you can lose a drive or a whole system and things will keep going 
without interruption (well, mostly: if a host node was using the system that 
just died, it may pause briefly before re-connecting to one that is still 
running via a backup-server setting or your dns configs). While the system is 
still going with one node down, that node is falling behind on new disk 
writes, and the remaining ones are keeping track of what’s changing. Once you 
repair/recover/reboot the down node, it will rejoin the cluster. Now the 
recovered system has to catch up, and it does this by having the other two 
nodes send it the changes. In the meantime, gluster is serving any reads for 
that data from one of the up-to-date nodes, even if you ask the one you just 
restarted. In order to do this healing, it has to lock the files to ensure no 
changes are made while it copies a chunk of them over to the recovered node. 
When it locks them, your hypervisor notices they have gone read-only, and 
especially if it has a pending write for that file, may pause the VM because 
this looks like a storage issue to it. Once the file gets unlocked, it can be 
written again, and your hypervisor notices and will generally reactivate your 
VM. You may see delays too, especially if you only have 1G networking between 
your host nodes while everything is getting copied around. And your files 
could be locked, updated, unlocked, locked again a few seconds or minutes 
later, etc.

That’s where sharding comes into play: once you have a file broken up into 
shards, gluster can get away with locking only the particular shard it needs 
to heal, leaving the whole disk image unlocked. You may still catch a brief 
pause if you try to write the specific segment of the file gluster is healing 
at the moment, but it’s also going to be much faster, because it’s a small 
chunk of the file and copies quickly.
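If you want to see how far along a heal is before you reboot the next node, 
something like this works (volume name is an example):

gluster volume heal myvol info                      # files/shards still pending heal
gluster volume heal myvol statistics heal-count     # just the per-brick counts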

Also, check out 
https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/server-quorum/, 
you probably want to set cluster.server-quorum-ratio to 50 for a replica-3 
setup to avoid the possibility of split-brains. Your cluster will go read-only 
if it loses two nodes though, but you can always change the server-quorum-ratio 
later if you need to keep it running temporarily.
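That ratio is a cluster-wide option, so it gets set against "all" rather than 
a single volume, roughly like this (exact value syntax can vary a bit by 
version):

gluster volume set all cluster.server-quorum-ratio 50%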

Hope that makes sense of what’s going on for you,

  -Darrell

> On Aug 23, 2019, at 5:06 PM, Carl Sirotic wrote:
> 
> Okay,
> 
> so it means, at least I am not getting the expected behavior and there is 
> hope.
> 
> I put the quorum settings that I was told a couple of emails ago.
> 
> After applying virt group, they are
> 
> cluster.quorum-type auto
> cluster.quorum-count (null)
> cluster.server-quorum-type server
> cluster.server-quorum-ratio 0
> cluster.quorum-reads no
> 
> 
> Also,
> 
> I just put the ping timeout to 5 seconds now.
> 
> 
> Carl
> 
> On 2019-08-23 5:45 p.m., Ingo Fischer wrote:
>> Hi Carl,
>> 
>> In my understanding and experience (I have a replica 3 system running too) 
>> this should not happen. Can you tell me your client and server quorum settings?
>> 
>> Ingo
>> 
>> On 23.08.2019 at 15:53, Carl Sirotic wrote:
>> 
>>> However,
>>> 
>>> I must have misunderstood the whole concept of gluster.
>>> 
>>> In a replica 3, for me, it's completely unacceptable, regardless of the 
>>> options, that all my VMs go down when I reboot one node.
>>> 
>>> The whole purpose of having three full copies of my data on the fly is supposed 
>>> to be exactly this.
>>> 
>>> I am in the process of sharding every file.
>>> 
>>> But even if the healing time were longer, I would still expect a 
>>> non-sharded replica 3 volume with a VM boot disk not to go down if I reboot 
>>> one of its copies.
>>> 
>>> 
>>> 
>>> I am not very impressed by gluster so far.
>>> 
>>> Carl
>>> 
>>> On 2019-08-19 4:15 p.m., Darrell Budic wrote:
 /var/lib/glusterd/groups/virt is a good start for ideas, notably some 
 thread settings and choose-local=off to improve read performance. If you 
 don’t have at least 10 cores on your servers, you may want to lower the 
 recommended shd-max-threads=8 to no more than half your CPU cores to keep 
 healing from swamping out regular work.
 
 It’s also starting to depend on what your backing store and networking 
 setup are, so you’re going to want to test changes and find what works 
 best for your setup.
 
 In addition to 

Re: [Gluster-users] Brick Reboot => VMs slowdown, client crashes

2019-08-23 Thread Carl Sirotic
-- Forwarded message --
From: Carl Sirotic
Date: Aug. 23, 2019 7:00 p.m.
Subject: Re: [Gluster-users] Brick Reboot => VMs slowdown, client crashes
To: Joe Julian
Cc:
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Brick Reboot => VMs slowdown, client crashes

2019-08-23 Thread Carl Sirotic

Okay,

so it means, at least I am not getting the expected behavior and there 
is hope.


I put the quorum settings that I was told a couple of emails ago.

After applying virt group, they are

cluster.quorum-type auto
cluster.quorum-count (null)
cluster.server-quorum-type server
cluster.server-quorum-ratio 0
cluster.quorum-reads no

Also,

I have just set the ping timeout to 5 seconds.


Carl

On 2019-08-23 5:45 p.m., Ingo Fischer wrote:

Hi Carl,

In my understanding and experience (I have a replica 3 system running 
too) this should not happen. Can you tell me your client and server 
quorum settings?


Ingo

On 23.08.2019 at 15:53, Carl Sirotic <csiro...@evoqarchitecture.com> wrote:



However,

I must have misunderstood the whole concept of gluster.

In a replica 3, for me, it's completely unacceptable, regardless of 
the options, that all my VMs go down when I reboot one node.


The whole purpose of having three full copies of my data on the fly is 
supposed to be exactly this.


I am in the process of sharding every file.

But even if the healing time were longer, I would still expect a 
non-sharded replica 3 volume with a VM boot disk not to go down if I 
reboot one of its copies.



I am not very impressed by gluster so far.

Carl

On 2019-08-19 4:15 p.m., Darrell Budic wrote:
/var/lib/glusterd/groups/virt is a good start for ideas, notably 
some thread settings and choose-local=off to improve read 
performance. If you don’t have at least 10 cores on your servers, 
you may want to lower the recommended shd-max-threads=8 to no more 
than half your CPU cores to keep healing from swamping out regular 
work.


It’s also starting to depend on what your backing store and 
networking setup are, so you’re going to want to test changes and 
find what works best for your setup.


In addition to the virt group settings, I use these on most of my 
volumes, SSD or HDD backed, with the default 64M shard size:


performance.io-thread-count: 32    # seemed good for my system, particularly a ZFS backed volume with lots of spindles

client.event-threads: 8
cluster.data-self-heal-algorithm: full    # 10G networking, uses more net/less cpu to heal. probably don’t use this for 1G networking?

performance.stat-prefetch: on
cluster.read-hash-mode: 3    # distribute reads to least loaded server (by read queue depth)


and these two only on my HDD backed volume:

performance.cache-size: 1G
performance.write-behind-window-size: 64MB

but I suspect these two need another round or six of tuning to tell 
if they are making a difference.


I use the throughput-performance tuned profile on my servers, so you 
should be in good shape there.


On Aug 19, 2019, at 12:22 PM, Guy Boisvert wrote:


On 2019-08-19 12:08 p.m., Darrell Budic wrote:
You also need to make sure your volume is setup properly for best 
performance. Did you apply the gluster virt group to your volumes, 
or at least features.shard = on on your VM volume?


That's what we did here:


gluster volume set W2K16_Rhenium cluster.quorum-type auto
gluster volume set W2K16_Rhenium network.ping-timeout 10
gluster volume set W2K16_Rhenium auth.allow \*
gluster volume set W2K16_Rhenium group virt
gluster volume set W2K16_Rhenium storage.owner-uid 36
gluster volume set W2K16_Rhenium storage.owner-gid 36
gluster volume set W2K16_Rhenium features.shard on
gluster volume set W2K16_Rhenium features.shard-block-size 256MB
gluster volume set W2K16_Rhenium cluster.data-self-heal-algorithm full
gluster volume set W2K16_Rhenium performance.low-prio-threads 32

tuned-adm profile random-io        (a profile I added in CentOS 7)


cat /usr/lib/tuned/random-io/tuned.conf
===
[main]
summary=Optimize for Gluster virtual machine storage
include=throughput-performance

[sysctl]

vm.dirty_ratio = 5
vm.dirty_background_ratio = 2


Any more optimization to add to this?


Guy

--
Guy Boisvert, ing.
IngTegration inc.
http://www.ingtegration.com
https://www.linkedin.com/in/guy-boisvert-8990487


Re: [Gluster-users] Brick Reboot => VMs slowdown, client crashes

2019-08-23 Thread Carl Sirotic

However,

I must have misunderstood the whole concept of gluster.

In a replica 3, for me, it's completely unacceptable, regardless of the 
options, that all my VMs go down when I reboot one node.


The whole purpose of having three full copies of my data on the fly is 
supposed to be exactly this.


I am in the process of sharding every file.

But even if the healing time were longer, I would still expect a 
non-sharded replica 3 volume with a VM boot disk not to go down if I 
reboot one of its copies.



I am not very impressed by gluster so far.

Carl

On 2019-08-19 4:15 p.m., Darrell Budic wrote:
/var/lib/glusterd/groups/virt is a good start for ideas, notably some 
thread settings and choose-local=off to improve read performance. If 
you don’t have at least 10 cores on your servers, you may want to 
lower the recommended shd-max-threads=8 to no more than half your CPU 
cores to keep healing from swamping out regular work.


It’s also starting to depend on what your backing store and networking 
setup are, so you’re going to want to test changes and find what works 
best for your setup.


In addition to the virt group settings, I use these on most of my 
volumes, SSD or HDD backed, with the default 64M shard size:


performance.io-thread-count: 32    # seemed good for my system, particularly a ZFS backed volume with lots of spindles

client.event-threads: 8
cluster.data-self-heal-algorithm: full    # 10G networking, uses more net/less cpu to heal. probably don’t use this for 1G networking?

performance.stat-prefetch: on
cluster.read-hash-mode: 3    # distribute reads to least loaded server (by read queue depth)


and these two only on my HDD backed volume:

performance.cache-size: 1G
performance.write-behind-window-size: 64MB

but I suspect these two need another round or six of tuning to tell if 
they are making a difference.


I use the throughput-performance tuned profile on my servers, so you 
should be in good shape there.


On Aug 19, 2019, at 12:22 PM, Guy Boisvert wrote:


On 2019-08-19 12:08 p.m., Darrell Budic wrote:
You also need to make sure your volume is setup properly for best 
performance. Did you apply the gluster virt group to your volumes, 
or at least features.shard = on on your VM volume?


That's what we did here:


gluster volume set W2K16_Rhenium cluster.quorum-type auto
gluster volume set W2K16_Rhenium network.ping-timeout 10
gluster volume set W2K16_Rhenium auth.allow \*
gluster volume set W2K16_Rhenium group virt
gluster volume set W2K16_Rhenium storage.owner-uid 36
gluster volume set W2K16_Rhenium storage.owner-gid 36
gluster volume set W2K16_Rhenium features.shard on
gluster volume set W2K16_Rhenium features.shard-block-size 256MB
gluster volume set W2K16_Rhenium cluster.data-self-heal-algorithm full
gluster volume set W2K16_Rhenium performance.low-prio-threads 32

tuned-adm profile random-io        (a profile I added in CentOS 7)


cat /usr/lib/tuned/random-io/tuned.conf
===
[main]
summary=Optimize for Gluster virtual machine storage
include=throughput-performance

[sysctl]

vm.dirty_ratio = 5
vm.dirty_background_ratio = 2


Any more optimization to add to this?


Guy

--
Guy Boisvert, ing.
IngTegration inc.
http://www.ingtegration.com
https://www.linkedin.com/in/guy-boisvert-8990487




___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Brick Reboot => VMs slowdown, client crashes

2019-08-20 Thread Strahil
Yes, you can start it afterwards, BUT DO NOT STOP it once enabled!
Bad things happen :D
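Turning it on afterwards is just the normal volume options, e.g. (volume name 
is an example; only files written after this point get sharded):

gluster volume set myvol features.shard on
gluster volume set myvol features.shard-block-size 64MB    # 64MB is the default size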

Best Regards,
Strahil Nikolov

On Aug 19, 2019 20:01, Carl Sirotic wrote:
>
> No, I didn't. 
>
> I am very interested about these settings. 
>
> Also, is it possible to turn the shard feature AFTER the volume was 
> started to be used ? 
>
>
> Carl 
>
> On 2019-08-19 12:08 p.m., Darrell Budic wrote: 
> > You also need to make sure your volume is setup properly for best 
> > performance. Did you apply the gluster virt group to your volumes, or at 
> > least features.shard = on on your VM volume? 
> > 
> >> On Aug 19, 2019, at 11:05 AM, Carl Sirotic  
> >> wrote: 
> >> 
> >> Yes, I made sure there was no heal. 
> >> This is what I am suspecting that shutting down a host isn't the right way 
> >> to go. 
> >> Hi Carl, Did you check for any pending heals before rebooting the gluster 
> >> server? Also, it was discussed that shutting down the node, does not stop 
> >> the bricks properly and thus the clients will wait for a timeout before 
> >> restoring full functionality.  You can stop your glusterd and actually all 
> >> processes by using a script in /usr/share/gluster/scripts (the path is 
> >> based on memory and could be wrong). Best Regards, Strahil NikllovOn Aug 
> >> 19, 2019 18:34, Carl Sirotic wrote: > > Hi, > > we have a replicate 3 
> >> cluster. > > 2 other servers are clients that run VM that are stored on 
> >> the Gluster > volumes. > > I had to reboot one of the brick for 
> >> maintenance. > > The whole VM setup went super slow and some of the client 
> >> crashed. > > I think there is some timeout setting for KVM/Qemu vs timeout 
> >> of > Glusterd that could fix this. > > Do anyone have an idea ? > > The 
> >> whole point of having gluster for me was to be able to shut down one > of 
> >> the host while the vm stay running. > > > Carl > > 
> >> ___ > Gluster-users mailing 
> >> list > Gluster-users@gluster.org > 
> >> https://lists.gluster.org/mailman/listinfo/gluster-users AVIS DE 
> >> CONFIDENTIALITÉ : Ce courriel peut contenir de l'information privilégiée 
> >> et confidentielle. Nous vous demandons de le détruire immédiatement si 
> >> vous n'êtes pas le destinataire. CONFIDENTIALITY NOTICE: This email may 
> >> contain information that is privileged and confidential. Please delete 
> >> immediately if you are not the intended recipient. 
> >> ___ 
> >> Gluster-users mailing list 
> >> Gluster-users@gluster.org 
> >> https://lists.gluster.org/mailman/listinfo/gluster-users 
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Brick Reboot => VMs slowdown, client crashes

2019-08-19 Thread Darrell Budic
/var/lib/glusterd/groups/virt is a good start for ideas, notably some thread 
settings and choose-local=off to improve read performance. If you don’t have at 
least 10 cores on your servers, you may want to lower the recommended 
shd-max-threads=8 to no more than half your CPU cores to keep healing from 
swamping out regular work.
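Applying the group and then overriding the heal threads looks roughly like 
this (volume name and thread count are examples):

gluster volume set myvol group virt
gluster volume set myvol cluster.shd-max-threads 4    # e.g. half the cores on a smaller box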

It’s also starting to depend on what your backing store and networking setup 
are, so you’re going to want to test changes and find what works best for your 
setup.

In addition to the virt group settings, I use these on most of my volumes, SSD 
or HDD backed, with the default 64M shard size:

performance.io-thread-count: 32 # seemed good for my system, 
particularly a ZFS backed volume with lots of spindles
client.event-threads: 8 
cluster.data-self-heal-algorithm: full  # 10G networking, uses more net/less 
cpu to heal. probably don’t use this for 1G networking?
performance.stat-prefetch: on
cluster.read-hash-mode: 3   # distribute reads to least 
loaded server (by read queue depth)

and these two only on my HDD backed volume:

performance.cache-size: 1G
performance.write-behind-window-size: 64MB

but I suspect these two need another round or six of tuning to tell if they are 
making a difference.

I use the throughput-performance tuned profile on my servers, so you should be 
in good shape there.

> On Aug 19, 2019, at 12:22 PM, Guy Boisvert wrote:
> 
> On 2019-08-19 12:08 p.m., Darrell Budic wrote:
>> You also need to make sure your volume is setup properly for best 
>> performance. Did you apply the gluster virt group to your volumes, or at 
>> least features.shard = on on your VM volume?
> 
> That's what we did here:
> 
> 
> gluster volume set W2K16_Rhenium cluster.quorum-type auto
> gluster volume set W2K16_Rhenium network.ping-timeout 10
> gluster volume set W2K16_Rhenium auth.allow \*
> gluster volume set W2K16_Rhenium group virt
> gluster volume set W2K16_Rhenium storage.owner-uid 36
> gluster volume set W2K16_Rhenium storage.owner-gid 36
> gluster volume set W2K16_Rhenium features.shard on
> gluster volume set W2K16_Rhenium features.shard-block-size 256MB
> gluster volume set W2K16_Rhenium cluster.data-self-heal-algorithm full
> gluster volume set W2K16_Rhenium performance.low-prio-threads 32
> 
> tuned-adm profile random-io    (a profile I added in CentOS 7)
> 
> 
> cat /usr/lib/tuned/random-io/tuned.conf
> ===
> [main]
> summary=Optimize for Gluster virtual machine storage
> include=throughput-performance
> 
> [sysctl]
> 
> vm.dirty_ratio = 5
> vm.dirty_background_ratio = 2
> 
> 
> Any more optimization to add to this?
> 
> 
> Guy
> 
> -- 
> Guy Boisvert, ing.
> IngTegration inc.
> http://www.ingtegration.com
> https://www.linkedin.com/in/guy-boisvert-8990487
> 

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Brick Reboot => VMs slowdown, client crashes

2019-08-19 Thread Darrell Budic
You want sharding for sure; it keeps the entire disk from being locked while it 
heals, so you usually don’t notice it when you reboot a system, say.

It’s fine to enable after the fact, but existing files won’t be sharded. You 
can work around this by stopping the VM and copying the file to a new location, 
then renaming it over the old version. If you’re running something that lets 
you migrate live volumes, you can create a new share with sharding enabled, then 
migrate the volume.
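In shell terms the copy/rename trick is something like this, run on a client 
mount with the VM shut down (paths and file names are examples):

cd /mnt/gluster-vmstore
cp --sparse=always vm01.qcow2 vm01.qcow2.new    # the new copy gets written out as shards
mv vm01.qcow2.new vm01.qcow2                    # rename it over the old, unsharded image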

> On Aug 19, 2019, at 12:01 PM, Carl Sirotic wrote:
> 
> No, I didn't.
> 
> I am very interested about these settings.
> 
> Also, is it possible to turn the shard feature AFTER the volume was started 
> to be used ?
> 
> 
> Carl
> 
> On 2019-08-19 12:08 p.m., Darrell Budic wrote:
>> You also need to make sure your volume is setup properly for best 
>> performance. Did you apply the gluster virt group to your volumes, or at 
>> least features.shard = on on your VM volume?
>> 
>>> On Aug 19, 2019, at 11:05 AM, Carl Sirotic  
>>> wrote:
>>> 
>>> Yes, I made sure there was no heal.
>>> This is what I am suspecting that shutting down a host isn't the right way 
>>> to go.
>>> Hi Carl, Did you check for any pending heals before rebooting the gluster 
>>> server? Also, it was discussed that shutting down the node, does not stop 
>>> the bricks properly and thus the clients will wait for a timeout before 
>>> restoring full functionality.  You can stop your glusterd and actually all 
>>> processes by using a script in /usr/share/gluster/scripts (the path is 
>>> based on memory and could be wrong). Best Regards, Strahil NikllovOn Aug 
>>> 19, 2019 18:34, Carl Sirotic wrote: > > Hi, > > we have a replicate 3 
>>> cluster. > > 2 other servers are clients that run VM that are stored on the 
>>> Gluster > volumes. > > I had to reboot one of the brick for maintenance. > 
>>> > The whole VM setup went super slow and some of the client crashed. > > I 
>>> think there is some timeout setting for KVM/Qemu vs timeout of > Glusterd 
>>> that could fix this. > > Do anyone have an idea ? > > The whole point of 
>>> having gluster for me was to be able to shut down one > of the host while 
>>> the vm stay running. > > > Carl > > 
>>> ___ > Gluster-users mailing 
>>> list > Gluster-users@gluster.org > 
>>> https://lists.gluster.org/mailman/listinfo/gluster-users AVIS DE 
>>> CONFIDENTIALITÉ : Ce courriel peut contenir de l'information privilégiée et 
>>> confidentielle. Nous vous demandons de le détruire immédiatement si vous 
>>> n'êtes pas le destinataire. CONFIDENTIALITY NOTICE: This email may contain 
>>> information that is privileged and confidential. Please delete immediately 
>>> if you are not the intended recipient. 
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
> 

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Brick Reboot => VMs slowdown, client crashes

2019-08-19 Thread Guy Boisvert

On 2019-08-19 12:08 p.m., Darrell Budic wrote:

You also need to make sure your volume is setup properly for best performance. 
Did you apply the gluster virt group to your volumes, or at least 
features.shard = on on your VM volume?


That's what we did here:


gluster volume set W2K16_Rhenium cluster.quorum-type auto
gluster volume set W2K16_Rhenium network.ping-timeout 10
gluster volume set W2K16_Rhenium auth.allow \*
gluster volume set W2K16_Rhenium group virt
gluster volume set W2K16_Rhenium storage.owner-uid 36
gluster volume set W2K16_Rhenium storage.owner-gid 36
gluster volume set W2K16_Rhenium features.shard on
gluster volume set W2K16_Rhenium features.shard-block-size 256MB
gluster volume set W2K16_Rhenium cluster.data-self-heal-algorithm full
gluster volume set W2K16_Rhenium performance.low-prio-threads 32

tuned-adm profile random-io        (a profile I added in CentOS 7)


cat /usr/lib/tuned/random-io/tuned.conf
===
[main]
summary=Optimize for Gluster virtual machine storage
include=throughput-performance

[sysctl]

vm.dirty_ratio = 5
vm.dirty_background_ratio = 2


Any more optimization to add to this?


Guy

--
Guy Boisvert, ing.
IngTegration inc.
http://www.ingtegration.com
https://www.linkedin.com/in/guy-boisvert-8990487

AVIS DE CONFIDENTIALITE : ce message peut contenir des
renseignements confidentiels appartenant exclusivement a
IngTegration Inc. ou a ses filiales. Si vous n'etes pas
le destinataire indique ou prevu dans ce  message (ou
responsable de livrer ce message a la personne indiquee ou
prevue) ou si vous pensez que ce message vous a ete adresse
par erreur, vous ne pouvez pas utiliser ou reproduire ce
message, ni le livrer a quelqu'un d'autre. Dans ce cas, vous
devez le detruire et vous etes prie d'avertir l'expediteur
en repondant au courriel.

CONFIDENTIALITY NOTICE : Proprietary/Confidential Information
belonging to IngTegration Inc. and its affiliates may be
contained in this message. If you are not a recipient
indicated or intended in this message (or responsible for
delivery of this message to such person), or you think for
any reason that this message may have been addressed to you
in error, you may not use or copy or deliver this message to
anyone else. In such case, you should destroy this message
and are asked to notify the sender by reply email.

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Brick Reboot => VMs slowdown, client crashes

2019-08-19 Thread Carl Sirotic

No, I didn't.

I am very interested in these settings.

Also, is it possible to turn on the shard feature AFTER the volume has 
started being used?



Carl

On 2019-08-19 12:08 p.m., Darrell Budic wrote:

You also need to make sure your volume is setup properly for best performance. 
Did you apply the gluster virt group to your volumes, or at least 
features.shard = on on your VM volume?


On Aug 19, 2019, at 11:05 AM, Carl Sirotic  
wrote:

Yes, I made sure there was no heal.
This is what I am suspecting that shutting down a host isn't the right way to 
go.
Hi Carl, Did you check for any pending heals before rebooting the gluster server? Also, it was discussed that shutting down the node, does not stop the bricks 
properly and thus the clients will wait for a timeout before restoring full functionality.  You can stop your glusterd and actually all processes by using a 
script in /usr/share/gluster/scripts (the path is based on memory and could be wrong). Best Regards, Strahil NikllovOn Aug 19, 2019 18:34, Carl Sirotic wrote: 
> > Hi, > > we have a replicate 3 cluster. > > 2 other servers are clients that run VM that are stored on the Gluster > volumes. > > 
I had to reboot one of the brick for maintenance. > > The whole VM setup went super slow and some of the client crashed. > > I think there is some 
timeout setting for KVM/Qemu vs timeout of > Glusterd that could fix this. > > Do anyone have an idea ? > > The whole point of having gluster for 
me was to be able to shut down one > of the host while the vm stay running. > > > Carl > > ___ 
> Gluster-users mailing list > Gluster-users@gluster.org > https://lists.gluster.org/mailman/listinfo/gluster-users AVIS DE CONFIDENTIALITÉ : Ce 
courriel peut contenir de l'information privilégiée et confidentielle. Nous vous demandons de le détruire immédiatement si vous n'êtes pas le destinataire. 
CONFIDENTIALITY NOTICE: This email may contain information that is privileged and confidential. Please delete immediately if you are not the intended recipient. 
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Brick Reboot => VMs slowdown, client crashes

2019-08-19 Thread Darrell Budic
You also need to make sure your volume is set up properly for best performance. 
Did you apply the gluster virt group to your volumes, or at least 
features.shard = on on your VM volume?

> On Aug 19, 2019, at 11:05 AM, Carl Sirotic wrote:
> 
> Yes, I made sure there was no heal.
> This is what I am suspecting that shutting down a host isn't the right way to 
> go.
> Hi Carl, Did you check for any pending heals before rebooting the gluster 
> server? Also, it was discussed that shutting down the node, does not stop the 
> bricks properly and thus the clients will wait for a timeout before restoring 
> full functionality.  You can stop your glusterd and actually all processes by 
> using a script in /usr/share/gluster/scripts (the path is based on memory and 
> could be wrong). Best Regards, Strahil NikllovOn Aug 19, 2019 18:34, Carl 
> Sirotic wrote: > > Hi, > > we have a replicate 3 cluster. > > 2 other servers 
> are clients that run VM that are stored on the Gluster > volumes. > > I had 
> to reboot one of the brick for maintenance. > > The whole VM setup went super 
> slow and some of the client crashed. > > I think there is some timeout 
> setting for KVM/Qemu vs timeout of > Glusterd that could fix this. > > Do 
> anyone have an idea ? > > The whole point of having gluster for me was to be 
> able to shut down one > of the host while the vm stay running. > > > Carl > > 
> ___ > Gluster-users mailing list 
> > Gluster-users@gluster.org > 
> https://lists.gluster.org/mailman/listinfo/gluster-users AVIS DE 
> CONFIDENTIALITÉ : Ce courriel peut contenir de l'information privilégiée et 
> confidentielle. Nous vous demandons de le détruire immédiatement si vous 
> n'êtes pas le destinataire. CONFIDENTIALITY NOTICE: This email may contain 
> information that is privileged and confidential. Please delete immediately if 
> you are not the intended recipient. 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Brick Reboot => VMs slowdown, client crashes

2019-08-19 Thread Carl Sirotic
Yes, I made sure there was no heal.

This is what I am suspecting: that shutting down a host isn't the right way to go.
Hi Carl,
Did you check for any pending heals before rebooting the gluster server?

Also, it was discussed that shutting down the node does not stop the bricks properly, and thus the clients will wait for a timeout before restoring full functionality.

You can stop your glusterd and actually all processes by using a script in /usr/share/gluster/scripts (the path is based on memory and could be wrong).
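On most packaged installs that helper is the stop-all script, usually 
something like this (the path may differ by distro and version):

/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh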

Best Regards,
Strahil Nikolov

On Aug 19, 2019 18:34, Carl Sirotic wrote:
>
> Hi, 
>
> we have a replicate 3 cluster. 
>
> 2 other servers are clients that run VMs that are stored on the Gluster 
> volumes. 
>
> I had to reboot one of the bricks for maintenance. 
>
> The whole VM setup went super slow and some of the clients crashed. 
>
> I think there is some timeout setting for KVM/Qemu vs timeout of 
> Glusterd that could fix this. 
>
> Do anyone have an idea ? 
>
> The whole point of having gluster for me was to be able to shut down one 
> of the hosts while the VMs stay running. 
>
>
> Carl 
>
> ___ 
> Gluster-users mailing list 
> Gluster-users@gluster.org 
> https://lists.gluster.org/mailman/listinfo/gluster-users 

AVIS DE CONFIDENTIALITÉ : Ce courriel peut contenir de l'information privilégiée et confidentielle. Nous vous demandons de le détruire immédiatement si vous n'êtes pas le destinataire.
CONFIDENTIALITY NOTICE: This email may contain information that is privileged and confidential. Please delete immediately if you are not the intended recipient.
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users