[Gluster-users] gluster heal performance (was: Fwd: New GlusterFS deployment, doubts on 1 brick per host vs 1 brick per drive.)

2020-09-10 Thread Martin Bähr
Excerpts from Gionatan Danti's message of 2020-09-11 00:35:52 +0200:
> The main point was the potentially long heal time 

could you (or anyone else) please elaborate on what long heal times are
to be expected?

we have a 3-node replica cluster running version 3.12.9 (we are building
a new cluster now) with 32TiB of space. each node has a single brick on
top of a 7-disk raid5 (linux softraid)

at one point we had one node unavailable for a month (gluster failed
to start up properly on that node and we didn't have monitoring in place
to notice), and the accumulated changes from one month of operation took
4 months to heal. i would have expected this to ideally take 2 weeks or
less, and a month at worst (i.e. faster than, or at least as fast as, it
took to create the data, but not slower, and especially not 4 times
slower).

the initial heal count was about 6 million files for one node and
5.4 million for the other.

the healing speed was not constant. at first the heal count increased,
that is, healing was seemingly slower than the rate at which new files
were added. then it started to speed up: the first million of each node
took about 46 days to heal, while the last million took 4 days.

i logged the output of 
   "gluster volume heal gluster-volume statistics heal-count" 
every hour to monitor the healing process.
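
a minimal sketch of that kind of hourly logging (the volume name, script
path and log path here are just placeholders) is a small script run from
cron:

   #!/bin/sh
   # log the heal backlog with a timestamp so healing speed can be graphed later
   VOL=gluster-volume
   LOG=/var/log/gluster-heal-count.log
   {
     date -u +%FT%TZ
     gluster volume heal "$VOL" statistics heal-count
   } >> "$LOG" 2>&1

   # e.g. in /etc/cron.d/gluster-heal-count (path is hypothetical):
   # 0 * * * *  root  /usr/local/sbin/log-heal-count.sh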


what makes healing so slow?


almost all files are newly added and not changed, so they were missing
on the node that was offline. the files are backup for user devices, so
almost all files are written once and rarely, if ever, read.

we do have a few huge directories with 25, 88000, 6 and 29000
subdirectories each. in total it is 26TiB of small files, but no more
than a few thousand files per directory (it's user data; some users have
more, some have less).

could those huge directories be responsible for the slow healing? 

the filesystem is ext4 on top of a 7 disk raid5.

after this ordeal was over we discovered the readdir-ahead setting which
was on.  we turned that off based on other discussions on performance
that suggested an improvement from this change, but we haven't had the
opportunity to do a large healing test since, so we can't tell if it
makes a difference for us.
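
for completeness, the change itself is just the usual volume-set call
("gluster-volume" is a placeholder for the volume name):

   # check the current value, then turn the option off
   gluster volume get gluster-volume performance.readdir-ahead
   gluster volume set gluster-volume performance.readdir-ahead off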

any insights would be appreciated.

greetings, martin.

--
general manager  realss.com
student mentor   fossasia.org
community mentor blug.sh  beijinglug.club
pike programmer  pike.lysator.liu.se  caudium.net  societyserver.org
Martin Bähr      working in china  http://societyserver.org/mbaehr/




Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] glusterd restarts every 2 minutes

2020-09-10 Thread Computerisms Corporation

Hi List,

In my 2-server gluster setup, one server is consistently restarting the 
glusterd process.  On the first second of every other minute, I get a 
shutdown in my glusterd log:


W [glusterfsd.c:1596:cleanup_and_exit] 
(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7f7410fa5fa3] 
-->/usr/local/sbin/glusterd(glusterfs_sigwaiter+0xed) [0x55e18d840b8d] 
-->/usr/local/sbin/glusterd(cleanup_and_exit+0x54) [0x55e18d8409e4] ) 
0-: received signum (15), shutting down


Gluster then automatically starts back up, everything remounts, and by 
the 8th second the log entries stop; life carries on for another 1 
minute and 53 seconds, and then the shutdown message shows up again.  I 
can provide the full startup log if it will help.


I am experiencing a few hiccups on the server that could possibly be 
traced to this, but I am not sure of that, and generally speaking the 
server doesn't seem to be suffering from it.


I have tried tailing all the other logs in conjunction with the glusterd 
log, thinking something might show up just before that first second to 
give me a clue as to what is issuing the shutdown signal, but nothing 
shows up as the culprit.
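
A temporary audit rule on the kill() syscall (this assumes auditd is 
available on the box) should at least record which process sends the 
signal, something like:

   # log every kill() syscall under a searchable key
   auditctl -a always,exit -F arch=b64 -S kill -k glusterd-sigterm
   # after the next restart, look up who signalled glusterd (a1 is the signal number)
   ausearch -k glusterd-sigterm -i | less
   # remove the rule afterwards
   auditctl -d always,exit -F arch=b64 -S kill -k glusterd-sigterm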


I have tried setting the log level to DEBUG in the glusterd.service 
systemd file.  That didn't seem to work.  As per the Red Hat manual, I 
also tried setting it on the command line, and the startup message did 
say it started with the debug argument, but it didn't add anything to 
indicate what is causing it to shut down.


gstatus indicates everything is up and healthy, though it takes a 
couple of seconds to run if I start it in the first second after an 
even-numbered minute.


I found an old thread on Google about the logind.conf file having 
KillUserProcesses=1; that setting is commented out on both servers.  
Pretty much every other mention I found either has no solution or has 
the same error but doesn't otherwise match my symptoms.


Can anyone suggest how I might go about finding out why the one server 
is doing this?

--
Bob Miller
Cell: 867-334-7117
Office: 867-633-3760
Office: 867-322-0362
www.computerisms.ca






Re: [Gluster-users] performance

2020-09-10 Thread Computerisms Corporation

Hi Danti,

the notes are not very verbose, but it looks like the following lines 
were removed from their virtualization config:


  
  
  

They also enabled hyperthreading, so there are now 12 "cores" instead of 
6.  Guessing that had a lot to do with it...


On 2020-09-04 8:20 a.m., Gionatan Danti wrote:

Il 2020-09-04 01:00 Computerisms Corporation ha scritto:

For the sake of completeness I am reporting back that your suspicions
seem to have been validated.  I talked to the data center and they made
some changes.  We talked again some days later, and they made some more
changes, and for several days now the load average has stayed
consistently below 5 on both servers.  I still have some issues to deal
with, but the performance of the machines is no longer a problem.


Great! Any chance to know *what* they changed?
Thanks.








Re: [Gluster-users] Fwd: New GlusterFS deployment, doubts on 1 brick per host vs 1 brick per drive.

2020-09-10 Thread Gionatan Danti

Il 2020-09-10 23:13 Miguel Mascarenhas Filipe ha scritto:

can you explain better how a single disk failing would bring a whole
node out of service?


Oh, I did a bad cut/paste. A single disk failure will not put the entire 
node out-of-service. The main point was the potentially long heal time 
(and the implicit dependence on Gluster to do the right thing when a 
disk fails).


Sorry for the confusion.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8






Re: [Gluster-users] Fwd: New GlusterFS deployment, doubts on 1 brick per host vs 1 brick per drive.

2020-09-10 Thread Miguel Mascarenhas Filipe
On Thu, 10 Sep 2020 at 21:53, Gionatan Danti  wrote:

> Il 2020-09-09 15:30 Miguel Mascarenhas Filipe ha scritto:
> > I'm setting up GlusterFS on 2 hw w/ same configuration, 8  hdds. This
> > deployment will grow later on.
>
> Hi, I really suggest avoiding a replica 2 cluster unless it is for
> testing only. Be sure to add an arbiter at least (using a replica 2
> arbiter 1 cluster).
>
> > I'm undecided between these different configurations and am seeing
> > comments or advice from more experienced users of GlusterFS.
> >
> > Here is the summary of 3 options:
> > 1. 1 brick per host, Gluster "distributed" volumes, internal
> > redundancy at brick level
>
> I strongly suggest against it: any server reboot will cause trouble for
> mounted clients. You will end up with *lower* uptime than a single server.
>
> > 2. 1 brick per drive, Gluster "distributed replicated" volumes, no
> > internal redundancy
>
> This would increase Gluster performance via multiple bricks; however a
> single failed disk will put the entire node out-of-service. Moreover,
> Gluster heals are much slower processes than a simple RAID1/ZFS mirror
> recovery.


can you explain better how a single disk failing would bring a whole node
out of service?

from your comments this one sounds the best, but having node outages from
single disk failures doesn’t sound acceptable..



>
> > 3. 1 brick per host, Gluster "distributed replicated" volumes, no
> > internal redundancy
>
> Again, I suggest against it: a single failed disk will put the entire
> node out-of-service *and* will cause a massive heal, as all data needs to
> be copied from the surviving node, which is a long and stressful event for
> the other node (and for the sysadmin).
>
> In short, I would not use Gluster without *both* internal (brick-level)
> redundancy and Gluster-level replication. For a simple setup, I suggest
> option #1 but in a replica setup (rather than distributed). You can increase
> the number of bricks (mountpoints) via multiple zfs datasets, if needed.



>
> Regards.
>
> --
> Danti Gionatan
> Supporto Tecnico
> Assyoma S.r.l. - www.assyoma.it
> email: g.da...@assyoma.it - i...@assyoma.it
> GPG public key ID: FF5F32A8
>
-- 
Miguel Mascarenhas Filipe






Re: [Gluster-users] Fwd: New GlusterFS deployment, doubts on 1 brick per host vs 1 brick per drive.

2020-09-10 Thread Gionatan Danti

Il 2020-09-09 15:30 Miguel Mascarenhas Filipe ha scritto:

I'm setting up GlusterFS on 2 hw w/ same configuration, 8  hdds. This
deployment will grow later on.


Hi, I really suggest avoiding a replica 2 cluster unless it is for 
testing only. Be sure to add an arbiter at least (using a replica 2 
arbiter 1 cluster).
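
As an illustration (host names and brick paths are placeholders; on some 
Gluster versions the counts are written as "replica 3 arbiter 1"):

   gluster volume create myvol replica 2 arbiter 1 \
       srv1:/bricks/b1/brick srv2:/bricks/b1/brick arb1:/bricks/b1/brick
   gluster volume start myvol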



I'm undecided between these different configurations and am seeing
comments or advice from more experienced users of GlusterFS.

Here is the summary of 3 options:
1. 1 brick per host, Gluster "distributed" volumes, internal
redundancy at brick level


I strongly suggest against it: any server reboot will cause trouble for 
mounted clients. You will end up with *lower* uptime than a single server.



2. 1 brick per drive, Gluster "distributed replicated" volumes, no
internal redundancy


This would increase Gluster performance via multiple bricks; however, a 
single failed disk will put the entire node out-of-service. Moreover, 
Gluster heals are much slower processes than a simple RAID1/ZFS mirror 
recovery.



3. 1 brick per host, Gluster "distributed replicated" volumes, no
internal redundancy


Again, I suggest against it: a single failed disk will put the entire 
node out-of-service *and* will cause a massive heal, as all data needs to 
be copied from the surviving node, which is a long and stressful event for 
the other node (and for the sysadmin).


In short, I would not use Gluster without *both* internal (brick-level) 
redundancy and Gluster-level replication. For a simple setup, I suggest 
option #1 but in a replica setup (rather than distributed). You can 
increase the number of bricks (mountpoints) via multiple zfs datasets, if 
needed.
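
A sketch of that layout (pool, host and volume names are made up; each 
zfs dataset becomes one brick, and consecutive bricks in the create 
command form a replica pair):

   # on each host: carve bricks out of the internally redundant zfs pool
   zfs create tank/brick1
   zfs create tank/brick2
   # on one host: 2x2 distributed-replicate volume; consecutive bricks are
   # replica pairs, so each pair spans both servers (gluster will still warn
   # about replica 2 -- add an arbiter as discussed above)
   gluster volume create datavol replica 2 \
       srv1:/tank/brick1/brick srv2:/tank/brick1/brick \
       srv1:/tank/brick2/brick srv2:/tank/brick2/brick
   gluster volume start datavol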


Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.da...@assyoma.it - i...@assyoma.it
GPG public key ID: FF5F32A8






Re: [Gluster-users] New GlusterFS deployment, doubts on 1 brick per host vs 1 brick per drive.

2020-09-10 Thread Miguel Mascarenhas Filipe
hi,
thanks both for the replies

On Thu, 10 Sep 2020 at 16:08, Darrell Budic  wrote:

> I run ZFS on my servers (with additional RAM for that reason) in my
> replica-3 production cluster. I chose size and ZFS striping of HDDs, with
> easier compression and ZFS-controlled caching using SSDs for my workload
> (mainly VMs). It performs as expected, but I don’t have the resources to
> head-to-head it against a bunch of individual bricks for testing. One
> drawback to large bricks is that it can take longer to heal, in my
> experience. I also run some smaller volumes on SSDs for VMs with databases
> and other high-IOPS workloads, and for those I use tuned XFS volumes
> because I didn’t want compression and did want faster healing.
>
> With the options you’ve presented, I’d use XFS on single bricks, there’s
> not much need for the overhead unless you REALLY want ZFS compression, and
> ZFS if you wanted higher performing volumes, mirrors, or had some cache to
> take advantage of. Or you knew your workload could take advantage of the
> things ZFS is good at, like setting specific record sizes tuned to your
> work load on sub-volumes. But that depends on how you’re planning to
> consume your storage, as file shares or as disk images. The ultimate way to
> find out, of course, is to test each configuration and see which gives you
> the most of what you want :)


yes, zfs (or btrfs) was for compression but also for the added robustness
provided by checksums. I didn't mention btrfs, but I'm comfortable with
btrfs for simple volumes with compression.. though I imagine there isn't a
large user base of glusterfs + btrfs.

this is a mostly cold dataset with lots of uncompressed training data for
ML.

there is one argument for a big, fat, internally redundant (zfs) brick,
which is: there is more widespread knowledge on how to manage failed
drives on zfs. one of the inputs I was seeking, given my inexperience with
glusterfs, is this management side: I didn't see in the docs how to add
spare drives, what happens when a brick dies, what type of healing exists,
or what to do if, for example, there isn't a replacement drive..
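
from what I could gather so far, the usual flow when a brick's disk dies
seems to be a replace-brick followed by self-heal (volume, host and path
names below are placeholders), roughly:

   # format/mount the replacement drive, then swap the dead brick for the new one
   gluster volume replace-brick datavol \
       srv1:/bricks/dead/brick srv1:/bricks/new/brick commit force
   # self-heal then copies the data back from the surviving replica; watch it with
   gluster volume heal datavol info summary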


>
> And definitely get a 3rd server in there with at least enough storage to
> be an arbiter. At the level you’re talking, I’d try and deck it out
> properly and have 3 active hosts off the bat so you can have a proper
> redundancy scheme. Split brain more than sucks.


agreed, I'm aware of split brain. will add additional nodes asap; it is
already planned.

>
>
>  -Darrell
>
> > On Sep 10, 2020, at 1:33 AM, Diego Zuccato 
> wrote:
> >
> > Il 09/09/20 15:30, Miguel Mascarenhas Filipe ha scritto:
> >
> > I'm a noob, but IIUC this is the option giving the best performance:
> >
> >> 2. 1 brick per drive, Gluster "distributed replicated" volumes, no
> >> internal redundancy
> >
> > Clients can write to both servers in parallel and read scattered (read
> > performance using multiple files ~ 16x vs 2x with a single disk per
> > host). Moreover it's easier to extend.
> > But why ZFS instead of XFS ? In my experience it's heavier.
> >
> > PS: add a third host ASAP, at least for arbiter volumes (replica 3
> > arbiter 1). Split brain can be a real pain to fix!
> >
> > --
> > Diego Zuccato
> > DIFA - Dip. di Fisica e Astronomia
> > Servizi Informatici
> > Alma Mater Studiorum - Università di Bologna
> > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> > tel.: +39 051 20 95786
> > 
> >
> >
> >
>
> --
Miguel Mascarenhas Filipe






Re: [Gluster-users] New GlusterFS deployment, doubts on 1 brick per host vs 1 brick per drive.

2020-09-10 Thread Darrell Budic
I run ZFS on my servers (with additional RAM for that reason) in my replica-3 
production cluster. I chose size and ZFS striping of HDDs, with easier 
compression and ZFS-controlled caching using SSDs for my workload (mainly VMs). 
It performs as expected, but I don’t have the resources to head-to-head it 
against a bunch of individual bricks for testing. One drawback to large bricks 
is that it can take longer to heal, in my experience. I also run some smaller 
volumes on SSDs for VMs with databases and other high-IOPS workloads, and for 
those I use tuned XFS volumes because I didn’t want compression and did want 
faster healing.

With the options you’ve presented, I’d use XFS on single bricks, there’s not 
much need for the overhead unless you REALLY want ZFS compression, and ZFS if 
you wanted higher performing volumes, mirrors, or had some cache to take 
advantage of. Or you knew your workload could take advantage of the things ZFS 
is good at, like setting specific record sizes tuned to your workload on 
sub-volumes. But that depends on how you’re planning to consume your storage, 
as file shares or as disk images. The ultimate way to find out, of course, is 
to test each configuration and see which gives you the most of what you want :)
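
Purely as a sketch (pool and dataset names are made up), a brick dataset 
tuned for big, compressible, mostly-sequential files might look like:

   # larger records and cheap compression for big training-style files
   zfs create -o recordsize=1M -o compression=lz4 -o xattr=sa tank/brick1
   zfs get recordsize,compression,xattr tank/brick1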

And definitely get a 3rd server in there with at least enough storage to be an 
arbiter. At the level you’re talking, I’d try and deck it out properly and have 
3 active hosts off the bat so you can have a proper redundancy scheme. Split 
brain more than sucks.

 -Darrell

> On Sep 10, 2020, at 1:33 AM, Diego Zuccato  wrote:
> 
> Il 09/09/20 15:30, Miguel Mascarenhas Filipe ha scritto:
> 
> I'm a noob, but IIUC this is the option giving the best performance:
> 
>> 2. 1 brick per drive, Gluster "distributed replicated" volumes, no
>> internal redundancy
> 
> Clients can write to both servers in parallel and read scattered (read
> performance using multiple files ~ 16x vs 2x with a single disk per
> host). Moreover it's easier to extend.
> But why ZFS instead of XFS ? In my experience it's heavier.
> 
> PS: add a third host ASAP, at least for arbiter volumes (replica 3
> arbiter 1). Split brain can be a real pain to fix!
> 
> -- 
> Diego Zuccato
> DIFA - Dip. di Fisica e Astronomia
> Servizi Informatici
> Alma Mater Studiorum - Università di Bologna
> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> tel.: +39 051 20 95786
> 
> 
> 
> 







Re: [Gluster-users] new daemon gluster-ta-volume.service needed?

2020-09-10 Thread peter knezel
Hello Aravinda,

thanks for the clarification.
So i guessed it correctly - i have disabled it already.

Thanks and kind regards,
peterk

On Thu, 10 Sep 2020 at 16:25, Aravinda VK  wrote:

> Hi Peter,
>
> On 10-Sep-2020, at 7:50 PM, peter knezel  wrote:
>
> Hello all,
>
> i have updated glusterfs (-client,-common,-server) packages from 5.13-1 to
> 6.10-1
> on 2 servers with debian stretch (9.x).
> Strangely new daemon appeared: gluster-ta-volume.service
> Is it needed or can be safely disabled?
> i am not using any arbiter, only: Type: Replicate
>
>
> Not required. Only required to use Thin Arbiter Volume
>
> https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/
>
>
>
> Thanks and kind regards,
> peterk
> 
>
>
>
>
>
>
> Aravinda Vishwanathapura
> https://kadalu.io
>
>
>
>






Re: [Gluster-users] new daemon gluster-ta-volume.service needed?

2020-09-10 Thread Aravinda VK
Hi Peter,

> On 10-Sep-2020, at 7:50 PM, peter knezel  wrote:
> 
> Hello all,
> 
> i have updated glusterfs (-client,-common,-server) packages from 5.13-1 to 
> 6.10-1
> on 2 servers with debian stretch (9.x).
> Strangely new daemon appeared: gluster-ta-volume.service
> Is it needed or can be safely disabled?
> i am not using any arbiter, only: Type: Replicate

Not required; it is only needed for Thin Arbiter volumes:
https://docs.gluster.org/en/latest/Administrator%20Guide/Thin-Arbiter-Volumes/ 
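
If it is not in use, the unit can simply be stopped and disabled like any
other systemd service, e.g.:

   systemctl disable --now gluster-ta-volume.service
   systemctl status gluster-ta-volume.service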


> 
> 
> Thanks and kind regards,
> peterk
> 
> 
> 
> 


Aravinda Vishwanathapura
https://kadalu.io









[Gluster-users] new daemon gluster-ta-volume.service needed?

2020-09-10 Thread peter knezel
Hello all,

I have updated the glusterfs (-client, -common, -server) packages from
5.13-1 to 6.10-1 on 2 servers running Debian stretch (9.x).
Strangely, a new daemon appeared: gluster-ta-volume.service
Is it needed, or can it be safely disabled?
I am not using any arbiter, only: Type: Replicate


Thanks and kind regards,
peterk






Re: [Gluster-users] Fwd: New GlusterFS deployment, doubts on 1 brick per host vs 1 brick per drive.

2020-09-10 Thread Diego Zuccato
Il 09/09/20 15:30, Miguel Mascarenhas Filipe ha scritto:

I'm a noob, but IIUC this is the option giving the best performance:

> 2. 1 brick per drive, Gluster "distributed replicated" volumes, no
> internal redundancy

Clients can write to both servers in parallel and read scattered (read
performance using multiple files ~ 16x vs 2x with a single disk per
host). Moreover it's easier to extend.
But why ZFS instead of XFS ? In my experience it's heavier.
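
To make that concrete (host names and mount points are invented here):
with 8 drives per host, the brick list just needs to interleave the hosts
so that each replica pair spans both servers, e.g.:

   # consecutive bricks (= replica pairs) must sit on different hosts
   BRICKS=""
   for i in $(seq 1 8); do
       BRICKS="$BRICKS srv1:/bricks/d$i/brick srv2:/bricks/d$i/brick"
   done
   gluster volume create bigvol replica 2 $BRICKS
   gluster volume start bigvol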

PS: add a third host ASAP, at least for arbiter volumes (replica 3
arbiter 1). Split brain can be a real pain to fix!

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786



