Re: [Gluster-users] GlusterFS performance
On Wed, 27 Feb 2013 15:37:53 +0600, Nikita A Kardashin wrote:
> I have a GlusterFS installation with these parameters:
> - 4 servers, connected by a 1Gbit/s network (760-800 Mbit/s by iperf)
> - Distributed-replicated volume with 4 bricks and a 2x4 redundancy formula.
> - Replicated volume with 2 bricks and a 2x2 formula.
> (...)
> What can I do about performance?

What version are you running? I'd suggest 3.3.0, since it gives me the best I/O rates.
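For readers less familiar with the terminology, a distributed-replicated volume of this general shape could be created roughly as follows; this is only a sketch, and the hostnames, brick paths, and volume name are hypothetical:

  # 4 bricks with replica 2 gives two replica pairs; files are distributed across the pairs
  gluster volume create testvol replica 2 transport tcp \
      server1:/export/brick1 server2:/export/brick1 \
      server3:/export/brick1 server4:/export/brick1
  gluster volume start testvol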
[Gluster-users] Isolating Gluster volumes with a firewall (amazon security groups)
Hi All,

I am looking to utilize gluster on Amazon AWS for shared storage among web servers (Elastic Beanstalk). However, since I plan on using the gluster tier for numerous different Beanstalk environments, I'd like to isolate the systems from accessing each other's data. Since Beanstalk uses dynamic IP addresses, I can't use the built-in auth.allow and auth.reject in gluster to isolate volumes. Is it acceptable to firewall the underlying bricks (ports 24009+), using security groups in Amazon, to prevent cross-volume access?

Thanks in advance,
Al
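For illustration only, the same per-brick isolation could be sketched with host-level iptables rules. This assumes the GlusterFS 3.3 convention of brick ports starting at 24009; the port-to-volume mapping and CIDR ranges below are hypothetical (the actual brick ports can be read from gluster volume status):

  # allow environment A's web tier to reach only the brick port of its own volume
  iptables -A INPUT -p tcp --dport 24009 -s 10.0.1.0/24 -j ACCEPT
  # allow environment B's web tier to reach only its brick port
  iptables -A INPUT -p tcp --dport 24010 -s 10.0.2.0/24 -j ACCEPT
  # keep the glusterd management port open to the trusted storage pool only
  iptables -A INPUT -p tcp --dport 24007 -s 10.0.0.0/24 -j ACCEPT
  # drop everything else aimed at the gluster port range
  iptables -A INPUT -p tcp --dport 24007:24050 -j DROP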
Re: [Gluster-users] GlusterFS performance
On 02/27/2013 04:36 AM, Nikita A Kardashin wrote:
> I am using 3.3.0. Now I have removed the volume and re-created it with a replica count of 4 (without distribution), and got 31.9 MB/s :(

What are your volume settings? Have you adjusted the cache sizes?
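For reference, cache-related options on a 3.3 volume are adjusted with gluster volume set; the volume name and the values below are only illustrative, not recommendations:

  gluster volume set myvol performance.cache-size 256MB
  gluster volume set myvol performance.write-behind-window-size 4MB
  gluster volume info myvol    # the changes should now appear under "Options Reconfigured"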
Re: [Gluster-users] GlusterFS performance
On 27.02.2013 09:37, Nikita A Kardashin wrote:
> To the replicated gluster volume: 89MB/s
> To the distributed-replicated gluster volume: 49MB/s
> The test command is:
> sync
> echo 3 > /proc/sys/vm/drop_caches
> dd if=/dev/zero of=gluster.test.bin bs=1G count=1

Hello Nikita,

To me that sounds just about right; it's the kind of speed I get as well. If you think about it, what happens in the background is that the 1GB file is written not only from you to one of the servers, but also from this server to the other servers, so it gets properly replicated/distributed. So depending on your setup, you are not writing the 1GB file once, but 3-4 times, hence the drop in speed. You could squeeze a bit more out of it if you can create the volumes over the servers' secondary NICs, so server-to-server traffic goes through there.

--
Sent from the Delta quadrant using Borg technology!
Nux!
www.nux.ro
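One caveat worth noting about the dd test quoted above: without a flush, part of the write may still be sitting in the page cache when dd reports its rate. A variant that only reports durable throughput might look like this (conv=fdatasync is a standard GNU dd option; the filename is the one from the post):

  sync
  echo 3 > /proc/sys/vm/drop_caches
  # fdatasync forces the data to disk before dd prints its throughput figure
  dd if=/dev/zero of=gluster.test.bin bs=1G count=1 conv=fdatasync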
Re: [Gluster-users] GlusterFS performance
On 02/27/2013 07:34 AM, Michael Cronenworth wrote:
> What are your volume settings? Have you adjusted the cache sizes?

Sorry -- I see your original post and the settings now.
Re: [Gluster-users] GlusterFS performance
I know. But I thought each file would be written four times at the speed of the storage (190MB/s, if the network is not overloaded), and overall performance would remain at a usable level.

On two identical (by hardware) servers in replicated, not distributed, mode - 2 bricks, 2 servers, 1x2 redundancy - I got about 100MB/s write performance without any tuning (default settings). Why do I get only 50MB/s on 4 servers with a 2x2 formula, no matter what tuning settings I use? What speed will I get in the planned implementation - 9 servers in distributed-replicated mode with 3x3 redundancy? 5Mb/s?

Maybe some system and/or gluster tweaks can help me, or am I going about this the wrong way? My use case is simple: distributed, redundant shared storage for an OpenStack cloud. Initially we plan to use 9 servers in a 3x3 scheme, and many more in the future.

2013/2/27 Michael Cronenworth m...@cchtml.com
> On 02/27/2013 07:34 AM, Michael Cronenworth wrote:
>> What are your volume settings? Have you adjusted the cache sizes?
>
> Sorry -- I see your original post and the settings now.

--
With best regards,
differentlocal (www.differentlocal.ru | differentlo...@gmail.com),
System administrator.
Re: [Gluster-users] GlusterFS performance
On 02/27/2013 08:34 AM, Nux! wrote:
> To me that sounds just about right; it's the kind of speed I get as well. If you think about it, what happens in the background is that the 1GB file is written not only from you to one of the servers, but also from this server to the other servers, so it gets properly replicated/distributed.

Actually not quite. The data has to go directly from the client to each of the servers, so the client's outbound bandwidth gets divided by N replicas. The way you describe would actually perform better, because then you get to use that first server's outbound bandwidth as well, so for N=2 you'd be running at full speed (though at a slight cost in latency if the writes are synchronous). At N=4 you'd be at 1/3 speed, because one copy has to go from the client to the first server while three have to go from that first server to the others. Some systems even do chain replication, where each server sends only to one other, giving an even more extreme tradeoff between bandwidth and latency.

In the scenario that started this, the expected I/O rate is wire speed divided by the number of replicas. If the measured network performance is 100MB/s (800Mb/s), then with two replicas one would expect no better than 50MB/s, and with four replicas no better than 25MB/s. Any better results are likely to be the result of caching, so that the real bandwidth isn't actually being measured. Worse results are often the result of contention, such as very large buffers resulting in memory starvation for something else. I don't know of any reason why results would be worse with distribution, and have never seen that effect myself.
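To make that rule of thumb concrete, here is a small shell sketch using the 100MB/s iperf-measured wire speed from earlier in the thread; the replica counts are just examples, and this is only the ceiling imposed by the client's uplink, not a prediction of real throughput:

  # expected client write ceiling = wire speed / number of replicas
  wire_mb_per_s=100
  for replicas in 2 3 4; do
      echo "replicas=$replicas  ceiling=$((wire_mb_per_s / replicas)) MB/s"
  done
  # prints 50, 33 and 25 MB/s respectively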
Re: [Gluster-users] GlusterFS performance
> Mounted with default options by Gluster-FUSE.

The Gluster FUSE client takes care of replication, so the client makes sure the data is sent to all server nodes. With a replica count of 2 the traffic from the client node doubles. With 3 replicas you triple the network load from the client's perspective. So a client maxes out around 30 MB/sec with a 1Gbit uplink.

You could try trunking the network on your client, but 10Gbit Ethernet (I suggest fiber, because of latency issues with copper 10Gbit) or InfiniBand would be the way to go for high throughput.

Before going that way I would first make sure your real-life load is indeed bandwidth-limited and not IOPS-limited. Copying a few files and looking at the speed usually does not say a lot about real-life performance (unless real life is also just one person copying big files around). Most workloads I see hit IOPS limits a long time before throughput.

Cheers,
Robert
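As a rough illustration of checking whether a load is IOPS-bound rather than bandwidth-bound, one could watch the brick disks with iostat from the sysstat package; the device name below is hypothetical:

  # r/s and w/s approximate IOPS, rMB/s and wMB/s the bandwidth;
  # %util near 100 while the MB/s figures stay low usually means the disks are seek-limited
  iostat -x -m sdb 5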
[Gluster-users] Performance in VM guests when hosting VM images on Gluster
I'm seeing less-than-stellar performance on my Gluster deployment when hosting VM images on the FUSE mount. I've seen that this topic has surfaced before, but my googling and perusing of the list archive haven't turned up anything very conclusive. I'm on a 2-node distribute+replicate cluster; the clients use Gluster via the FUSE mount.

torbjorn@storage01:~$ sudo gluster volume info
Volume Name: gluster0
Type: Distributed-Replicate
Volume ID: 81bbf681-ecdb-4866-9b45-41d5d2df7b35
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: storage01.gluster.trollweb.net:/srv/gluster/brick0
Brick2: storage02.gluster.trollweb.net:/srv/gluster/brick0
Brick3: storage01.gluster.trollweb.net:/srv/gluster/brick1
Brick4: storage02.gluster.trollweb.net:/srv/gluster/brick1

The naive dd case from one client, on the dom0, looks like this:

torbjorn@xen01:/srv/ganeti/shared-file-storage/tmp$ sudo dd if=/dev/zero of=bigfile bs=1024k count=2000
2097152000 bytes (2.1 GB) copied, 22.9161 s, 91.5 MB/s

The clients see each node on a separate 1Gbps NIC, so this is pretty close to the expected transfer rate. Writing with the sync flag, from dom0, looks like so:

torbjorn@xen01:/srv/ganeti/shared-file-storage/tmp$ sudo dd if=/dev/zero of=bigfile bs=1024k count=2000 oflag=sync
2097152000 bytes (2.1 GB) copied, 51.1271 s, 41.0 MB/s

If we use a file on the gluster mount as backing for a loop device, and do a sync write:

torbjorn@xen01:/srv/ganeti/shared-file-storage/tmp$ sudo dd if=/dev/zero of=/dev/loop1 bs=1024k count=2000 oflag=sync
2097152000 bytes (2.1 GB) copied, 56.3729 s, 37.2 MB/s

The Xen instances are managed by Ganeti, using a loop device over a file on Gluster. Inside the Xen instance the performance is not quite what I was hoping for:

torbjorn@hennec:~$ sudo dd if=/dev/zero of=bigfile bs=1024k count=2000
2097152000 bytes (2.1 GB) copied, 1267.39 s, 1.7 MB/s

The transfer rate is similar when using the sync or direct flags with dd. Are these expected performance levels? A couple of threads[1] talk about performance and seem to indicate my situation isn't unique. However, I'm under the impression that others are using a similar setup with much better performance.

[1]:
* http://www.gluster.org/pipermail/gluster-users/2012-January/032369.html
* http://www.gluster.org/pipermail/gluster-users/2012-July/033763.html

--
Best regards,
Torbjørn Thorsen
Developer / operations engineer
Trollweb Solutions AS - Professional Magento Partner
www.trollweb.no
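For anyone wanting to reproduce the loop-device case, that setup can be recreated with losetup; the directory below is the one used in the post, but the actual file backing /dev/loop1 is not named there, so treat the path as illustrative:

  # back a loop device with a file on the gluster FUSE mount, print the assigned device
  sudo losetup -f --show /srv/ganeti/shared-file-storage/tmp/bigfile
  # run the dd test against the printed device, then detach it
  sudo losetup -d /dev/loop1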
Re: [Gluster-users] GlusterFS performance
On Wed, Feb 27, 2013 at 02:46:28PM +, Robert van Leeuwen wrote:
> You could try trunking the network on your client, but 10Gbit Ethernet (I suggest fiber, because of latency issues with copper 10Gbit)

Aside: 10G with SFP+ direct-attach cables also works well, even though it's copper. The latency problem is with 10Gbase-T (RJ45 / CAT6(a)).
[Gluster-users] Slow read performance
Help please - I am running 3.3.1 on CentOS using a 10Gb network. I get reasonable write speeds, although I think they could be faster, but my read speeds are REALLY slow.

Executive summary:
On the gluster client -
  Writes average about 700-800MB/s
  Reads average about 70-80MB/s
On the server -
  Writes average about 1-1.5GB/s
  Reads average about 2-3GB/s

Any thoughts?

Here are some additional details. Nothing interesting in any of the log files; everything is very quiet. All servers had no other load, and all clients are performing the same way.

Volume Name: shared
Type: Distribute
Volume ID: de11cc19-0085-41c3-881e-995cca244620
Status: Started
Number of Bricks: 26
Transport-type: tcp
Bricks:
Brick1: fs-disk2:/storage/disk2a
Brick2: fs-disk2:/storage/disk2b
Brick3: fs-disk2:/storage/disk2d
Brick4: fs-disk2:/storage/disk2e
Brick5: fs-disk2:/storage/disk2f
Brick6: fs-disk2:/storage/disk2g
Brick7: fs-disk2:/storage/disk2h
Brick8: fs-disk2:/storage/disk2i
Brick9: fs-disk2:/storage/disk2j
Brick10: fs-disk2:/storage/disk2k
Brick11: fs-disk2:/storage/disk2l
Brick12: fs-disk2:/storage/disk2m
Brick13: fs-disk2:/storage/disk2n
Brick14: fs-disk2:/storage/disk2o
Brick15: fs-disk2:/storage/disk2p
Brick16: fs-disk2:/storage/disk2q
Brick17: fs-disk2:/storage/disk2r
Brick18: fs-disk2:/storage/disk2s
Brick19: fs-disk2:/storage/disk2t
Brick20: fs-disk2:/storage/disk2u
Brick21: fs-disk2:/storage/disk2v
Brick22: fs-disk2:/storage/disk2w
Brick23: fs-disk2:/storage/disk2x
Brick24: fs-disk3:/storage/disk3a
Brick25: fs-disk3:/storage/disk3b
Brick26: fs-disk3:/storage/disk3c
Options Reconfigured:
performance.write-behind: on
performance.read-ahead: on
performance.io-cache: on
performance.stat-prefetch: on
performance.quick-read: on
cluster.min-free-disk: 500GB
nfs.disable: off

sysctl.conf settings for 10GbE:
# increase TCP max buffer size settable using setsockopt()
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
# increase Linux autotuning TCP buffer limit
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
# increase the length of the processor input queue
net.core.netdev_max_backlog = 25
# recommended default congestion control is htcp
net.ipv4.tcp_congestion_control=htcp
# recommended for hosts with jumbo frames enabled
net.ipv4.tcp_mtu_probing=1

Thomas W.
Sr. Systems Administrator, COLA/IGES
tw...@cola.iges.org
Affiliate Computer Scientist, GMU
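For anyone wanting to reproduce the volume options listed under "Options Reconfigured", they correspond to volume-set commands along these lines (the volume name and values are the ones shown above; this is just the equivalent CLI form, not a tuning recommendation):

  gluster volume set shared performance.write-behind on
  gluster volume set shared performance.read-ahead on
  gluster volume set shared performance.io-cache on
  gluster volume set shared performance.stat-prefetch on
  gluster volume set shared performance.quick-read on
  gluster volume set shared cluster.min-free-disk 500GB
  gluster volume set shared nfs.disable off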
Re: [Gluster-users] I/O error repaired only by owner or root access
--
Dan Bretherton
ESSC Computer System Manager
Department of Meteorology
Harry Pitt Building, 3 Earley Gate
University of Reading
Reading, RG6 7BE (or RG6 6AL for postal service deliveries), UK
Tel. +44 118 378 5205, Fax: +44 118 378 6413
## Please sponsor me to run in VSO's 30km Race to the Eye ##
## http://www.justgiving.com/DanBretherton ##

On 02/25/2013 04:44 PM, Dan Bretherton wrote:
> On 02/25/2013 03:49 PM, Shawn Nock wrote:
>> Dan Bretherton d.a.brether...@reading.ac.uk writes:
>>> Hello Rajesh - here are the permissions. The path in question is a directory.
>>>
>>> [sms05dab@jupiter ~]$ ls -ld /users/gcs/WORK/ORCA1/ORCA1-R07-MEAN/Ctl
>>> drwxr-xr-x 60 vq901510 nemo 110592 Feb 23 04:37 /users/gcs/WORK/ORCA1/ORCA1-R07-MEAN/Ctl
>>> [sms05dab@jupiter ~]$ ls -ld /users/gcs/WORK/ORCA1/ORCA1-R07-MEAN
>>> lrwxrwxrwx 1 gcs nemo 49 Feb 1 2012 /users/gcs/WORK/ORCA1/ORCA1-R07-MEAN -> /data/pegasus/users/gcs/WORK/ORCA1/ORCA1-R07-MEAN
>>> [sms05dab@jupiter ~]$ ls -ld /data/pegasus/users/gcs/WORK/ORCA1/ORCA1-R07-MEAN
>>> drwxr-xr-x 27 gcs nemo 99210 Feb 23 03:14 /data/pegasus/users/gcs/WORK/ORCA1/ORCA1-R07-MEAN
>>>
>>> As you can see, the parent directory in this case was a symlink, but that's not significant. I ran the ls -l commands using my account, sms05dab, but the problem was originally reported by user vq901510. Until I did ls -l as root, neither of us could access the directory, because the parent directory was owned by user gcs. Usually the problem is related to ownership of the file or directory itself; this is the first time I have seen the I/O error caused by parent directory permissions.
>>>
>>> This problem seems to have started following an add-brick operation a few weeks ago, after which I started "gluster volume rebalance VOLNAME fix-layout" (which is still running). It occurred to me that the problem could be related to link files, many of which need to be rewritten following add-brick operations. This could explain why the ownership of the parent directory is significant, because users sms05dab and vq901510 don't have permission to write in the parent directory owned by user gcs. Normally this wouldn't be a problem, because only read access to other users' data is required, but it appears as though read access was being denied because the new link file couldn't be written by unprivileged users. Is this a plausible explanation of the I/O error, do you think?
>>
>> This sounds like my recent bug: https://bugzilla.redhat.com/show_bug.cgi?id=913699
>> In the bug, I said that writing on the FUSE mount of one of the brick servers fixed the problem, but those were the only hosts on which I was attempting access as root.
>
> Thanks Shawn. To confirm that we are seeing the same bug I will try accessing affected files from a FUSE mount on a server the next time it happens. -Dan.

I have updated the bug report with another recent example. In this most recent case, attempting to open a file for reading as an unprivileged user resulted in the error "Invalid argument", although ls -l worked without error. This happened on a compute server and via a GlusterFS client mount point on a GlusterFS storage server. Changing the ownership of the parent directory to give the unprivileged user write access to the directory (making it writable by group) allowed the user to open the file for reading.

-Dan.
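As a hedged aside on testing the link-file theory: DHT link files on a brick normally appear as zero-length entries with mode ---------T and carry a trusted.glusterfs.dht.linkto extended attribute, so something along these lines, run as root directly against a hypothetical brick path on one of the servers, would show whether the affected name has one:

  # look at the entry as stored on the brick itself (not through the client mount)
  ls -l /brick/path/users/gcs/WORK/ORCA1/ORCA1-R07-MEAN/Ctl
  # dump DHT-related xattrs; a linkto value names the subvolume holding the real file
  getfattr -d -m 'trusted.glusterfs.dht' -e text /brick/path/users/gcs/WORK/ORCA1/ORCA1-R07-MEAN/Ctl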
Re: [Gluster-users] Slow read performance
How are you doing the read/write tests on the fuse/glusterfs mountpoint? Many small files will be slow, because all the time is spent coordinating locks.

On Wed, Feb 27, 2013 at 9:31 AM, Thomas Wakefield tw...@cola.iges.org wrote:
> Help please - I am running 3.3.1 on CentOS using a 10Gb network. I get reasonable write speeds, although I think they could be faster, but my read speeds are REALLY slow. [...]
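One way to answer that question for the read side, with cache effects minimized, is to re-read a previously written file after dropping the page cache; the mount point and file name below are hypothetical:

  # drop cached pages on the client (ideally on the servers as well), then read the file back
  sync
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/mnt/shared/testdir/bigfile of=/dev/null bs=1M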
Re: [Gluster-users] Slow read performance
Every time you open/close a file or a directory you will have to wait for locks, which takes time. This is totally expected. Why don't you share what you want to do? iozone benchmarks look like crap, but serving qcow2 files to qemu works fantastic for me. What are you doing? Make a benchmark that does that. If you are going to have many files with a wide variety of sizes, glusterfs/fuse might not be what you are looking for.

On Wed, Feb 27, 2013 at 12:56 PM, Thomas Wakefield tw...@cola.iges.org wrote:
> I have tested everything, small and large files. I have used file sizes ranging from 128k up to multiple-GB files. All the reads are bad. Here is a fairly exhaustive iozone auto test:
>
> [iozone -a results table: columns for KB, record length, write, rewrite, read, reread, random read, random write, backward read, record rewrite, stride read, fwrite, frewrite, fread and freread; the numeric values were garbled in the archive and are not reproduced here.]
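For what it's worth, a benchmark shaped more like the qcow2-serving workload mentioned above could be sketched with fio: random mixed I/O against a single large file on the gluster mount. fio is not mentioned anywhere in this thread, and every parameter below is illustrative rather than recommended:

  # 70/30 random read/write against one large file, roughly what a VM image sees
  fio --name=vmimage --directory=/mnt/shared/testdir \
      --rw=randrw --rwmixread=70 --bs=4k --size=4g \
      --ioengine=libaio --direct=1 --iodepth=16 \
      --runtime=60 --time_based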