Re: [Gluster-users] performance - what can I expect

2019-05-02 Thread Amar Tumballi Suryanarayan
On Thu, May 2, 2019 at 1:21 PM Pascal Suter  wrote:

> Hi Amar
>
> thanks for rolling this back up. Actually I have done some more
> benchmarking and fiddled with the config to finally reach a performance
> figure I could live with. I can now squeeze about 3GB/s out of that server,
> which seems to be close to what I can get out of its network uplink (using
> IP over Omni-Path). The system is now set up and in production, so I can't
> run any benchmarks on it anymore, but I will get back to benchmarking in
> the near future to test some storage-related hardware, and I will try it
> with Gluster on top again.
>
> Embarrassingly, the biggest performance issue was that the default
> installation of the server was running the "performance" profile of tuned.
> Once I switched it to "throughput-performance", performance increased
> dramatically.
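>
> (For reference, a minimal sketch of checking and switching the profile
> with tuned-adm, assuming the stock tuned daemon:)
>
> # show which tuned profile is currently active
> tuned-adm active
> # switch to the throughput-oriented profile
> tuned-adm profile throughput-performance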
>
> The volume info now looks pretty unspectacular:
>
> Volume Name: storage
> Type: Distribute
> Volume ID: c81c7e46-add5-4d88-9945-24cf7947ef8c
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 3
> Transport-type: tcp
> Bricks:
> Brick1: themis01:/data/brick1/brick
> Brick2: themis01:/data/brick2/brick
> Brick3: themis01:/data/brick3/brick
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
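>
> (For anyone reproducing this layout, a plain distribute volume like the
> one above would be created roughly as below; the brick paths are from the
> info output, everything else is an assumption.)
>
> # 3-brick distribute volume on a single server; 'force' may be needed
> # depending on where the bricks live
> gluster volume create storage transport tcp \
>   themis01:/data/brick1/brick \
>   themis01:/data/brick2/brick \
>   themis01:/data/brick3/brick force
> gluster volume start storage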
>
> Thanks for pointing out `gluster volume profile`, I'll have a go with it
> during my next benchmarking session. So far I was using iostat to track
> brick-level IO performance during my benchmarks.
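>
> (A sketch of that kind of iostat invocation; the device names backing
> the bricks are assumptions:)
>
> # extended per-device stats in MB, refreshed every 2 seconds
> iostat -xm 2 sda sdb sdc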
>
> The main question I wanted to ask was whether there is a general rule of
> thumb for how much of the original bare-brick throughput can be expected
> to be left over once Gluster is added on top of it. To give you an
> example: when I use a parallel filesystem like Lustre or BeeGFS, I usually
> expect to get at least about 85% of the raw storage-target throughput as
> aggregated bandwidth over a multi-node test out of my Lustre or BeeGFS
> setup. I consider any numbers below that to be too low, and I will then
> have to dig into performance tuning to find the bottleneck.
>
> I was hoping someone could give me a rule-of-thumb number for a simple
> distributed Gluster setup, like that 85% number I've established for a
> parallel filesystem.
>
> So at the moment my takeaway is: in a simple distributed volume across 3
> bricks with an aggregated bandwidth of 6GB/s, I can expect to get about
> 3GB/s aggregated bandwidth out of the Gluster mount, given there are no
> bottlenecks in the network. The 3GB/s is a number obtained under ideal
> circumstances, meaning I primed the storage to make sure I could run a
> benchmark using three nodes, with each node running a single thread
> writing to a single file and each file located on a different brick. This
> yielded the maximum performance, as it was pure streaming IO without any
> overlapping file writes to the bricks other than the overhead created by
> Gluster's own internal mechanisms.
>
> Interestingly, the performance didn't drop much when I added nodes and
> threads and introduced more random-ish IO by having several processes
> write to the same brick. So I assume that what "eats up" the 50%
> performance in the end is probably Gluster writing all those additional
> hidden files, which I assume are some sort of metadata. This causes
> additional IO on the disk that I'm streaming my one file to, and therefore
> turns my streaming IO into a random IO load for the RAID controller and
> underlying hard disks, which on spinning disks would have roughly the
> performance impact I was seeing in my benchmarks.
>

Thanks for all these details.

> I have yet to try Gluster on a flash-based brick and test its performance
> there. I would expect to see a better "efficiency" than the 50% I've
> measured on this system, as random IO vs. streaming IO should not make
> such a difference (or actually almost no difference at all) on flash-based
> storage. But that's me guessing now.
>
> So for the moment I'm fine, but I would still be interested in hearing
> ball-park "efficiency" figures from others using Gluster in a similar
> setup.
>

We couldn't arrive at a single number on this yet, for multiple reasons:
* Gluster's volume types behave differently, performance-wise.
* The network plays a more significant role than disk performance; mostly
it is the latency involved in the network rather than the throughput.
* Different workloads (create-heavy vs. read/write, sequential read/write
vs. random read/write) need different options, which currently are not
auto-tuned; see the sketch after this list.
* If one has good network and disk speed, even the backend filesystem
configuration (because of the layout we have with gfid etc.) matters a
bit.
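
As an illustration of the mechanism only, not a recommendation: such
workload-specific tuning is done with `gluster volume set`; the option
values below are assumptions.

# example: favor large sequential writes with a bigger write-behind window
gluster volume set storage performance.write-behind-window-size 4MB
# example: raise server event threads for many concurrent clients
gluster volume set storage server.event-threads 4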

The best thing, at present, is to understand the workload first and then
tune for it; `gluster volume profile` is the usual starting point, e.g.:
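
# start collecting per-brick, per-FOP latency and throughput statistics
gluster volume profile storage start
# ... run the benchmark or workload ...
# print the cumulative and interval statistics
gluster volume profile storage info
# stop profiling when done
gluster volume profile storage stop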

> cheers
>
> Pascal
Re: [Gluster-users] performance - what can I expect

2019-05-01 Thread Amar Tumballi Suryanarayan
Hi Pascal,

Sorry for the delay on this one. And thanks for testing it out in
different scenarios. A few questions before others can have a look and
advise you.

1. What is the volume info output?

2. Do you see any concerning logs in the glusterfs log files?

3. Please use `gluster volume profile` while running the tests; it gives a
lot of information.

4. Considering you are using glusterfs-6.0, please take a statedump of the
client process (on any node) before and after the test, so we can analyze
the latency information of each translator (see the sketch below).
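
A minimal sketch of taking such a client statedump, assuming a FUSE mount
(dumps land under /var/run/gluster by default; the CLI variant below is an
assumption about the exact syntax):

# via the CLI, addressing the client by host and PID
gluster volume statedump storage client themis01:$(pidof glusterfs)
# or signal the glusterfs client process directly
# (if several glusterfs processes run, pick the PID of the mount in question)
kill -SIGUSR1 $(pidof glusterfs)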

With this information, I hope we will be in a better position to answer
the questions.


On Wed, Apr 10, 2019 at 3:45 PM Pascal Suter  wrote:

> I continued my testing with 5 clients, all attached over 100Gbit/s
> Omni-Path via IP over IB. When I run the same iozone benchmark across all
> 5 clients, with Gluster mounted using the glusterfs client, I get an
> aggregated write throughput of only about 400MB/s and an aggregated read
> throughput of 1.5GB/s. Each node was writing a single 200GB file in 16MB
> chunks, and the files were distributed across all three bricks on the
> server.
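>
> (A sketch of driving such a multi-client run from a single node with
> iozone's cluster mode; the hostnames, paths and use of ssh are
> assumptions:)
>
> # clients file for -+m: "hostname  workdir  path-to-iozone" per line
> cat > clients.txt <<EOF
> node1 /mnt/gluster/storage /usr/local/bin/iozone
> node2 /mnt/gluster/storage /usr/local/bin/iozone
> EOF
> export RSH=ssh   # have iozone reach the clients via ssh instead of rsh
> ./iozone -+m clients.txt -t 2 -i 0 -+n -c -C -e -I -w -+S 0 -s 200G -r 16384k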
>
> The connection was established over Omni-Path for sure, as there is no
> other link between the nodes and the server.
>
> I have no clue what I'm doing wrong here. I can't believe that this is
> the normal performance people would expect to see from Gluster; I guess
> nobody would be using it if it were this slow.
>
> Again, when writing directly to the xfs filesystem on the bricks, I get
> over 6GB/s read and write throughput using the same benchmark.
>
> Any advice is appreciated.
>
> cheers
>
> Pascal
>
> On 04.04.19 12:03, Pascal Suter wrote:
> > I just noticed I left the most important parameters out :)
> >
> > Here's the write command with filesize and recordsize in it as well :)
> >
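> > # flag notes: -i 0 = write/rewrite test, -t 1 = one thread, -I = use
> > # O_DIRECT, -e and -c = include flush and close in the timing, -s and
> > # -r = file size and record size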
> > ./iozone -i 0 -t 1 -F /mnt/gluster/storage/thread1 -+n -c -C -e -I -w
> > -+S 0 -s 200G -r 16384k
> >
> > Also, I ran the benchmark without direct IO, which resulted in even
> > worse performance.
> >
> > I also tried to mount the Gluster volume via nfs-ganesha, which further
> > reduced throughput, down to about 450MB/s.
> >
> > If I run the iozone benchmark with 3 threads writing to all three
> > bricks directly (on the xfs filesystem), I get throughputs of around
> > 6GB/s. If I run the same benchmark through Gluster, mounted locally
> > using the FUSE client and with enough threads that each brick gets at
> > least one file written to it, I end up seeing throughputs around
> > 1.5GB/s. That's a 4x decrease in performance, and it actually stays
> > the same if I run the benchmark with fewer threads so that files only
> > get written to two out of three bricks.
> >
> > CPU load on the server is around 25% by the way, nicely distributed
> > across all available cores.
> >
> > I can't believe that Gluster should really be this slow while
> > everybody is just happily using it. Any hints on what I'm doing wrong
> > are very welcome.
> >
> > I'm using Gluster 6.0 by the way.
> >
> > regards
> >
> > Pascal
> >
> > On 03.04.19 12:28, Pascal Suter wrote:
> >> Hi all
> >>
> >> I am currently testing Gluster on a single server. I have three
> >> bricks, each a hardware RAID6 volume with thin-provisioned LVM that
> >> was aligned to the RAID and then formatted with xfs.
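> >>
> >> (A sketch of the kind of alignment meant here; the stripe geometry,
> >> disk count and device names are assumptions:)
> >>
> >> # assumed: RAID6 of 10 disks (8 data) with a 256KB stripe unit
> >> pvcreate --dataalignment 2048k /dev/sdb
> >> # su = RAID stripe unit, sw = number of data disks
> >> mkfs.xfs -d su=256k,sw=8 /dev/vg_brick1/lv_brick1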
> >>
> >> I've created a distributed volume so that entire files get
> >> distributed across my three bricks.
> >>
> >> First I ran an iozone benchmark against each brick, testing the read
> >> and write performance of a single large file per brick.
> >>
> >> I then mounted my Gluster volume locally and ran another iozone run
> >> with the same parameters, writing a single file. The file went to
> >> brick 1 which, when used directly, would write at 2.3GB/s and read
> >> at 1.5GB/s. However, through Gluster I got only 800MB/s read and
> >> 750MB/s write throughput.
> >>
> >> Another run with two processes, each writing a file, where one file
> >> went to the first brick and the other to the second brick (which by
> >> itself, when directly accessed, wrote at 2.8GB/s and read at
> >> 2.7GB/s), resulted in 1.2GB/s of aggregated write and likewise
> >> aggregated read throughput.
> >>
> >> Is this the normal performance I can expect out of GlusterFS, or is
> >> it worth tuning in order to get closer to the actual brick
> >> filesystem performance?
> >>
> >> Here are the iozone commands I use for writing and reading. Note
> >> that I am using direct IO in order to make sure I don't get fooled
> >> by caches :)
> >>
> >> ./iozone -i 0 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0
> >> -s $filesize -r $recordsize > iozone-brick${b}-write.txt
> >>
> >> ./iozone -i 1 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0
> >> -s $filesize -r $recordsize > iozone-brick${b}-read.txt
> >>
> >> cheers
> >>
> >> Pascal