On Thu, May 2, 2019 at 1:21 PM Pascal Suter <pascal.su...@dalco.ch> wrote:
> Hi Amar
>
> thanks for rolling this back up. Actually, I have done some more
> benchmarking and fiddled with the config to finally reach a performance
> figure I could live with. I can now squeeze about 3GB/s out of that
> server, which seems to be close to what I can get out of its network
> uplink (using IP over Omni-Path). The system is now set up and in
> production, so I can't run any benchmarks on it anymore, but I will get
> back to benchmarking in the near future to test some storage-related
> hardware, and I will try it with gluster on top again.
>
> Embarrassingly, the biggest performance issue was that the default
> installation of the server was running the "performance" profile of
> tuned. Once I switched it to "throughput-performance", performance
> increased dramatically.
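(For reference, the profile switch described above is a one-liner, assuming
tuned-adm is available on the system:

    # show the currently active tuned profile
    tuned-adm active
    # list the profiles shipped on this machine
    tuned-adm list
    # switch to the throughput-oriented profile
    tuned-adm profile throughput-performance

The selected profile persists across reboots, so this only needs to be done
once per machine.)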
> The volume info now looks pretty unspectacular:
>
> Volume Name: storage
> Type: Distribute
> Volume ID: c81c7e46-add5-4d88-9945-24cf7947ef8c
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 3
> Transport-type: tcp
> Bricks:
> Brick1: themis01:/data/brick1/brick
> Brick2: themis01:/data/brick2/brick
> Brick3: themis01:/data/brick3/brick
> Options Reconfigured:
> transport.address-family: inet
> nfs.disable: on
>
> Thanks for pointing out gluster volume profile, I'll have a go with it
> during my next benchmarking session. So far I was using iostat to track
> brick-level IO performance during my benchmarks.
>
> The main question I wanted to ask was whether there is a general rule of
> thumb for how much of the original bare-brick throughput one can expect
> to be left over once gluster is added on top. To give you an example:
> when I use a parallel filesystem like Lustre or BeeGFS, I usually expect
> to get at least about 85% of the raw storage target throughput as
> aggregated bandwidth over a multi-node test. I consider any numbers below
> that to be too low, and I will then have to dig into performance tuning
> to find the bottleneck.
>
> I was hoping someone could give me a rule-of-thumb number for a simple
> distributed gluster setup, like that 85% number I've established for a
> parallel filesystem.
>
> So at the moment my takeaway is: in a simple distributed volume across 3
> bricks with an aggregated bandwidth of 6GB/s, I can expect to get about
> 3GB/s aggregated bandwidth out of the gluster mount, given there are no
> bottlenecks in the network. The 3GB/s is a number obtained under ideal
> circumstances, meaning I primed the storage so that I could run a
> benchmark using three nodes, with each node running a single thread
> writing to a single file, and each file located on a different brick.
> This yielded the maximum performance, as it was pure streaming IO without
> any overlapping file writes to the bricks other than the overhead created
> by gluster's own internal mechanisms.
>
> Interestingly, the performance didn't drop much when I added nodes and
> threads and introduced more random-ish IO by having several processes
> write to the same brick. So I assume that what "eats up" the 50%
> performance in the end is probably gluster writing all those additional
> hidden files, which I assume are some sort of metadata. This causes
> additional IO on the disk that I'm streaming my one file to, and
> therefore turns my streaming IO into a random IO load for the RAID
> controller and underlying hard disks, which on spinning disks would have
> about the performance impact I was seeing in my benchmarks.

Thanks for all these details.

> I have yet to try gluster on a flash-based brick and test its performance
> there. I would expect to see a better "efficiency" than the 50% I've
> measured on this system here, as random IO vs. streaming IO should not
> make such a difference (or actually almost no difference at all) on
> flash-based storage. But that's me guessing now.
>
> So for the moment I'm fine, but I would still be interested in hearing
> ballpark "efficiency" figures from others using gluster in a similar
> setup.

We couldn't arrive at a single number for this yet, mainly for a few
reasons:

* Gluster's volume types behave differently (performance-wise).
* The network plays a more significant role than disk performance; mostly
it is the network latency that matters, rather than the throughput.
* Different workloads (create-heavy vs. read/write, sequential read/write
vs. random read/write) need different options (currently they are not
auto-tuned).
* Even with a good network and fast disks, the backend filesystem
configuration matters a bit (because of the layout we use, with gfids
etc.).

At present, the best thing is to understand the workload first and then
tune for it.

cheers

> Pascal
> On 01.05.19 14:55, Amar Tumballi Suryanarayan wrote:
>
> Hi Pascal,
>
> Sorry for the long delay on this one. And thanks for testing out
> different scenarios. A few questions before others can have a look and
> advise you:
>
> 1. What is the volume info output?
>
> 2. Do you see any concerning logs in the glusterfs log files?
>
> 3. Please use `gluster volume profile` while running the tests; that
> gives a lot of information.
>
> 4. Considering you are using glusterfs-6.0, please take a statedump of
> the client process (on any node) before and after the test, so we can
> analyze the latency information of each translator.
>
> With this information, I hope we will be in a better state to answer the
> questions.
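(To make points 3 and 4 concrete, a minimal sketch, assuming the volume
name "storage" from above and the default statedump location:

    # start collecting per-brick stats, run the benchmark, then inspect
    gluster volume profile storage start
    gluster volume profile storage info
    gluster volume profile storage stop

    # statedump of a fuse client: send SIGUSR1 to the glusterfs mount
    # process on the client; the dump lands under /var/run/gluster by
    # default (the pgrep pattern here is just an illustration)
    kill -USR1 $(pgrep -f 'glusterfs.*storage')

The profile info output lists per-FOP latency for each brick, which is
usually the quickest way to see where the time is going.)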
> On Wed, Apr 10, 2019 at 3:45 PM Pascal Suter <pascal.su...@dalco.ch>
> wrote:
>
>> i continued my testing with 5 clients, all attached over 100Gbit/s
>> Omni-Path via IP over IB. When I run the same iozone benchmark across
>> all 5 clients, where gluster is mounted using the glusterfs client, I
>> get an aggregated write throughput of only about 400MB/s and an
>> aggregated read throughput of 1.5GB/s. Each node was writing a single
>> 200GB file in 16MB chunks, and the files were distributed across all
>> three bricks on the server.
>>
>> The connection was established over Omni-Path for sure, as there is no
>> other link between the nodes and the server.
>>
>> I have no clue what I'm doing wrong here. I can't believe that this is
>> the normal performance people would expect to see from gluster. I guess
>> nobody would be using it if it was this slow.
>>
>> Again, when writing directly to the xfs filesystem on the bricks, I get
>> over 6GB/s read and write throughput using the same benchmark.
>>
>> Any advice is appreciated.
>>
>> cheers
>>
>> Pascal
>>
>> On 04.04.19 12:03, Pascal Suter wrote:
>> > I just noticed I left the most important parameters out :)
>> >
>> > Here's the write command with filesize and recordsize in it as well:
>> >
>> > ./iozone -i 0 -t 1 -F /mnt/gluster/storage/thread1 -+n -c -C -e -I -w
>> > -+S 0 -s 200G -r 16384k
>> >
>> > Also, I ran the benchmark without direct IO, which resulted in even
>> > worse performance.
>> >
>> > I also tried to mount the gluster volume via nfs-ganesha, which
>> > further reduced throughput down to about 450MB/s.
>> >
>> > If I run the iozone benchmark with 3 threads writing to all three
>> > bricks directly (on the xfs filesystem), I get throughputs of around
>> > 6GB/s. If I run the same benchmark through gluster, mounted locally
>> > using the fuse client and with enough threads so that each brick gets
>> > at least one file written to it, I end up seeing throughputs around
>> > 1.5GB/s. That's a 4x decrease in performance, and it actually is the
>> > same if I run the benchmark with fewer threads so that files only get
>> > written to two out of three bricks.
>> >
>> > CPU load on the server is around 25% by the way, nicely distributed
>> > across all available cores.
>> >
>> > I can't believe that gluster should really be so slow and everybody
>> > is just happily using it. Any hints on what I'm doing wrong are very
>> > welcome.
>> >
>> > I'm using gluster 6.0, by the way.
>> >
>> > regards
>> >
>> > Pascal
>> >
>> > On 03.04.19 12:28, Pascal Suter wrote:
>> >> Hi all
>> >>
>> >> I am currently testing gluster on a single server. I have three
>> >> bricks, each a hardware RAID6 volume with thin-provisioned LVM that
>> >> was aligned to the RAID and then formatted with xfs.
>> >>
>> >> I've created a distributed volume so that entire files get
>> >> distributed across my three bricks.
>> >>
>> >> First I ran an iozone benchmark against each brick, testing the read
>> >> and write performance of a single large file per brick.
>> >>
>> >> I then mounted my gluster volume locally and ran another iozone run
>> >> with the same parameters, writing a single file. The file went to
>> >> brick 1, which, when used directly, would write at 2.3GB/s and read
>> >> at 1.5GB/s. However, through gluster I got only 800MB/s read and
>> >> 750MB/s write throughput.
>> >>
>> >> Another run with two processes each writing a file, where one file
>> >> went to the first brick and the other file to the second brick
>> >> (which by itself, when directly accessed, wrote at 2.8GB/s and read
>> >> at 2.7GB/s), resulted in 1.2GB/s of aggregated write and also
>> >> aggregated read throughput.
>> >>
>> >> Is this a normal performance I can expect out of glusterfs, or is it
>> >> worth tuning in order to really get closer to the actual brick
>> >> filesystem performance?
>> >>
>> >> Here are the iozone commands I use for writing and reading. Note
>> >> that I am using direct IO in order to make sure I don't get fooled
>> >> by cache :)
>> >>
>> >> ./iozone -i 0 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0
>> >> -s $filesize -r $recordsize > iozone-brick${b}-write.txt
>> >>
>> >> ./iozone -i 1 -t 1 -F /mnt/brick${b}/thread1 -+n -c -C -e -I -w -+S 0
>> >> -s $filesize -r $recordsize > iozone-brick${b}-read.txt
>> >>
>> >> cheers
>> >>
>> >> Pascal
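(For the multi-client runs described above, iozone can also drive all
clients from a single invocation in cluster mode, which makes the
aggregated numbers easier to collect. A minimal sketch, assuming
passwordless ssh between the nodes and a hypothetical clients.txt; the
remaining flags are the same as in the single-node runs quoted above:

    # clients.txt: one line per client, giving
    # <hostname> <working directory> <path to iozone binary>, e.g.
    #   node01 /mnt/gluster/storage /usr/local/bin/iozone
    #   node02 /mnt/gluster/storage /usr/local/bin/iozone
    #   ...

    export RSH=ssh   # iozone defaults to rsh for remote execution
    ./iozone -+m clients.txt -t 5 -i 0 -+n -c -C -e -I -w -+S 0 \
        -s 200G -r 16384k

With five entries in clients.txt and -t 5, each client runs a single
writer thread, matching the test described in the 10 April mail; -F is not
needed since the working directory comes from the client file.)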
--
Amar Tumballi (amarts)

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users