Re: [Gluster-users] Gluster-users Digest, Vol 86, Issue 1 - Message 5: client load high using FUSE mount
- Original Message -
From: gluster-users-requ...@gluster.org
To: gluster-users@gluster.org
Sent: Monday, June 1, 2015 8:00:01 AM
Subject: Gluster-users Digest, Vol 86, Issue 1

Message: 5
Date: Mon, 01 Jun 2015 13:11:13 +0200
From: Mitja Mihelič mitja.mihe...@arnes.si
To: gluster-users@gluster.org
Subject: [Gluster-users] Client load high (300) using fuse mount

> Hi! I am trying to set up a WordPress cluster using GlusterFS for storage. Web nodes will access the same WordPress install on a volume mounted via FUSE from a 3-peer GlusterFS trusted storage pool (TSP). I started with one web node and WordPress on local storage. The load average was constantly around 5, and iotop showed disk reads of about 300 kB/s or less; the load average stayed below 6. When I mounted the GlusterFS volume on the web node, the 1-minute load average went over 300. Each of the 3 peers is transmitting about 10 MB/s to my web node regardless of the load. The TSP peers are on 10-Gbit NICs and the web node is on a 1-Gbit NIC.

30 MB/s is about 1/3 of line speed for a 1-Gbps NIC port. It sounds like network latency and lack of client-side caching might be your bottleneck; you might want to put a 10-Gbps NIC port on your client. You did disable client-side caching (the md-cache and io-cache translators) below -- was that your intent? Also, the defaults for these translators are very conservative. If you have only 1 client, you may want to increase the time data is cached on the client using the FUSE mount options entry-timeout=30 and attribute-timeout=30. Unlike non-distributed Linux filesystems, Gluster is very conservative about client-side caching, in order to avoid cache coherency issues.

> I'm out of ideas here... Could it be the network? What should I look at for optimizing the network stack on the client?
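A sketch of how those FUSE timeout options could be applied at mount time -- the host, volume, and mountpoint names here are invented for illustration, not taken from the thread:

```shell
# entry-timeout/attribute-timeout are in seconds; only safe to raise
# when a single client accesses the volume, per the caveat above.
mount -t glusterfs \
  -o entry-timeout=30,attribute-timeout=30 \
  peer1:/wpvol /var/www/wordpress
```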
> Options set on TSP:
> Options Reconfigured:
> performance.cache-size: 4GB
> network.ping-timeout: 15
> cluster.quorum-type: auto
> network.remote-dio: on
> cluster.eager-lock: on
> performance.stat-prefetch: off
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> performance.cache-refresh-timeout: 4
> performance.io-thread-count: 32
> nfs.disable: on

Too many tunings -- what are these intended to do? The "gluster volume reset" command allows you to undo them. In Gluster 3.7, the "gluster volume get your-volume all" command lets you see what the defaults are.

> Regards, Mitja
> --
> Mitja Mihelič
> ARNES, Tehnološki park 18, p.p. 7, SI-1001 Ljubljana, Slovenia
> tel: +386 1 479 8877, fax: +386 1 479 88 78

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Gluster-users Digest, Vol 85, Issue 22 - 9. Re: seq read performance comparison between libgfapi and fuse
Paul, I don't check this list every day. I would expect you to get more than half of the minimum of network line speed or storage block device speed using a single libgfapi sequential read thread. I did not see any throughput calculation or file size in your e-mail. HTH, inline below... -ben e

- Original Message -
From: gluster-users-requ...@gluster.org
To: gluster-users@gluster.org
Sent: Friday, May 22, 2015 8:00:02 AM
Subject: Gluster-users Digest, Vol 85, Issue 22

Message: 8
Date: Fri, 22 May 2015 18:50:40 +0800
From: Paul Guo bigpaul...@foxmail.com
To: gluster-users@gluster.org
Subject: [Gluster-users] seq read performance comparison between libgfapi and fuse

> Hello, I wrote two simple single-process sequential read test cases to compare libgfapi and FUSE. The logic looks like this:
>
>     char buf[32768];
>     while (1) {
>         cnt = read(fd, buf, sizeof(buf));
>         if (cnt == 0)
>             break;
>         else if (cnt > 0)
>             total += cnt;
>         /* No cnt < 0 was seen during testing. */
>     }
>
> Following is the time needed to finish reading a large file:
>
>                     fuse    libgfapi
>     direct io:      40s     51s
>     non direct io:  40s     47s
>
> The version is 3.6.3 on CentOS 6.5. The result shows that libgfapi is obviously slower than the FUSE interface, although a lot of CPU cycles were saved during libgfapi testing. Each test was run after cleaning up all kernel pagecache/inode/dentry caches and stopping and then restarting glusterd/gluster processes (to clear Gluster's own caches).

So if you use libgfapi in a single-threaded app, you may need to tune the gluster volume parameter read-ahead-page-count (defaults to 4). The default is intended to trade off single-thread performance for better aggregate performance and response time. Here is an example of how to tune it for a single-thread use case; don't do this all the time.
    # gluster v set your-volume performance.read-ahead-page-count 16

As a debugging tool, you can try disabling the read-ahead translator altogether:

    # gluster v set your-volume read-ahead off

To reset these parameters to their defaults:

    # gluster v reset your-volume read-ahead
    # gluster v reset your-volume read-ahead-page-count

I have a benchmark for libgfapi testing in case this is useful to you: https://github.com/bengland2/parallel-libgfapi -- please e-mail me directly if you have problems with it.

> I tested direct io because I suspected that fuse kernel readahead helped more than the read optimization solutions in gluster. I searched a lot but I did not find much about the comparison between fuse and libgfapi. Has anyone seen this, and does anyone know why?

If you use O_DIRECT you may be bypassing the read-ahead translator in Gluster, and this may account for your problem. Try NOT using O_DIRECT, and try the above tuning. Or if you really need O_DIRECT on the client, try this command, which disables O_DIRECT on the server side but not the client -- it's the equivalent of NFS behavior:

    # gluster v set your-volume network.remote-dio on

Also try turning off the io-cache translator, which will not help you here:

    # gluster v set your-volume io-cache off

Also, O_DIRECT is passed all the way to the server by Gluster, so your disk reads will ALSO use O_DIRECT -- this is terrible for performance. You want block device readahead when doing this test. I suggest setting it to at least 4096 KB for block devices used for Gluster brick mountpoints.
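A sketch of the cache-dropping procedure the poster describes, which makes repeated runs comparable (the mount path and filename are invented):

```shell
sync                                  # flush dirty pages first
echo 3 > /proc/sys/vm/drop_caches     # drop kernel pagecache, dentries, and inodes
dd if=/mnt/glusterfs/bigfile of=/dev/null bs=32k   # same 32 KiB buffer size as the test program
```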
Re: [Gluster-users] [Gluster-devel] High CPU Usage - Glusterfsd
Renchu, I didn't see anything about average file size or read/write mix. One way to observe both of these, as well as latency and throughput -- on the server, run these commands:

    # gluster volume profile your-volume start
    # gluster volume profile your-volume info > /tmp/dontcare
    # sleep 60
    # gluster volume profile your-volume info > profile-for-last-minute.log

There is also a "gluster volume top" command that may be of use to you in understanding what your users are doing with Gluster. Also, you may want to run "top -H" and see whether any threads in either glusterfsd or smbd are at or near 100% CPU -- if so, you really are hitting a CPU bottleneck. Looking at per-process CPU utilization can be deceptive, since a process may contain multiple threads. "sar -n DEV 2" will show you network utilization, and "iostat -mdx /dev/sd? 2" on your server will show block device queue depth (the latter two tools require the sysstat rpm). Together these can help you understand what kind of bottleneck you are seeing.

I don't see how many bricks are in your Gluster volume, but it sounds like you have only one glusterfsd per server. If you have idle cores on your servers, you can harness more CPU power by using multiple bricks per server, which results in multiple glusterfsd processes on each server, allowing greater parallelism. For example, you can do this by presenting individual disk drives as bricks rather than RAID volumes. Let us know if these suggestions helped. -ben england

- Original Message -
From: Renchu Mathew ren...@cracknell.com
To: gluster-users@gluster.org
Cc: gluster-de...@gluster.org
Sent: Sunday, February 22, 2015 7:09:09 AM
Subject: [Gluster-devel] High CPU Usage - Glusterfsd

> Dear all, I have implemented glusterfs storage at my company -- 2 servers with replicate. But glusterfsd shows more than 100% CPU utilization most of the time, so it is very slow to access the gluster volume. My setup is two glusterfs servers with replication.
> The gluster volume (almost 10TB of data) is mounted on another server (glusterfs native client) and shared via Samba for the network users to access those files. Is there any way to reduce the processor usage on these servers? Please give a solution ASAP since the users are complaining about the poor performance. I am using glusterfs version 3.6.
>
> Regards, Renchu Mathew | Sr. IT Administrator
> CRACKNELL DUBAI | P.O. Box 66231 | United Arab Emirates | T +971 4 3445417 | F +971 4 3493675 | M +971 50 7386484
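The observation commands suggested above, gathered into one quick pass (the /dev/sd? glob is a guess; adjust it to your brick devices):

```shell
top -b -H -n 1 | head -n 20   # per-thread CPU; look for glusterfsd/smbd threads near 100%
sar -n DEV 2 3                # three 2-second samples of network utilization
iostat -mdx /dev/sd? 2 3      # block device throughput and queue depth (needs sysstat)
```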
Re: [Gluster-users] Gluster-users Digest, Vol 77, Issue 2
Message: 9
Date: Tue, 2 Sep 2014 17:17:25 +0800
From: Jaden Liang jaden1...@gmail.com
To: gluster-de...@gluster.org, gluster-users@gluster.org
Subject: [Gluster-users] [Gluster-devel] Regarding write performance in a replica 1 volume on 1Gbps Ethernet: about 50MB/s writing a single file

> Hello, gluster-devel and gluster-users teams, We are running a performance test on a replica 1 volume and find that single-file sequential write performance only reaches about 50MB/s on 1Gbps Ethernet. However, if we test sequential writes to multiple files, write performance goes up to 120MB/s, which is the top speed of the network.

Not sure what you mean -- are you writing multiple files concurrently or one at a time? With FUSE, this matters; I typically see best throughput with more than one file being transferred at the same time.

> We also tried to use the stat xlator to find out where the bottleneck of single-file write performance is. Here is the stat data:
>
> Client-side: .. vs_vol_rep1-client-8.latency.WRITE=total:21834371.00us, mean:2665.328491us, count:8192, max:4063475, min:1849 ..
>
> Server-side: .. /data/sdb1/brick1.latency.WRITE=total:6156857.00us, mean:751.569458us, count:8192, max:230864, min:611 ..

What's your write transfer size? With FUSE, this matters a lot, since FUSE does not aggregate writes, so each write has to travel from the application to the glusterfs mountpoint process, resulting in slow performance for small transfer sizes. In general, it's a good idea to supply the details of your workload generator and how it was run, so we can compare with other known workloads and results.

> Note that the test writes a single 1GB file sequentially to a replica 1 volume over a 1Gbps Ethernet network.

So for example try:

    # dd if=/dev/zero of=/mnt/glusterfs/your-file.dd bs=1024k count=1k

and see whether your throughput is still 50 MB/s.
> On the client side, we can see there are 8192 write requests in total. Each request writes 128KB of data. Total elapsed time is 21834371us, about 21 seconds. The mean time per request is 2665us, about 2.6ms, which means it can only serve about 380 requests per second. There are other time consumers such as statfs and lookup, but those are not the major reasons. On the server side, the mean time per request is 751us, including writing data to the HDD, so we think that is not the major reason. We also modified some code to measure system epoll elapsed time; it only took about 20us from enqueueing data to finishing the send. Now we are heading into the rpc mechanism in glusterfs. Still, we think this issue may have been encountered before by the gluster-devel or gluster-users teams, so any suggestions would be appreciated. Does anyone know of such an issue?
>
> Best regards, Jaden Liang 9/2/2014
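The client-side numbers above can be sanity-checked with shell arithmetic (all figures come from the quoted stats; the integer division slightly understates the rates):

```shell
count=8192            # write requests reported by the client xlator
size_kb=128           # KB written per request
total_us=21834371     # total WRITE latency in microseconds
mean_us=$((total_us / count))              # mean latency per request
reqs_per_sec=$((1000000 / mean_us))        # requests servable per second
mb_per_sec=$((reqs_per_sec * size_kb / 1024))
echo "${mean_us} us/req, ${reqs_per_sec} req/s, ~${mb_per_sec} MB/s"
# prints: 2665 us/req, 375 req/s, ~46 MB/s  -- consistent with the observed ~50MB/s
```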
Re: [Gluster-users] Gluster-users Digest, Vol 76, Issue 18 - Re: reading not distributed across bricks
Message: 1
Date: Mon, 11 Aug 2014 09:53:30 -0400 (EDT)
From: Justin Clift jus...@gluster.org
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: gluster-users@gluster.org, Ray Mannings manningsr...@gmail.com
Subject: Re: [Gluster-users] Reading not distributed across bricks

- Original Message -
> hi Ray, Reads are served from the bricks which respond the fastest at the moment. They are not load-balanced.

Maybe a good feature for 3.7? :)

Ray, there already is such a feature. From "gluster volume set help":

    Option: cluster.read-hash-mode
    Description: inode-read fops happen only on one of the bricks in replicate. AFR will prefer the one computed using the method specified using this option.
    0 = first responder, 1 = hash by GFID of file (all clients use same subvolume), 2 = hash by GFID of file and client PID

This is particularly useful for benchmark tests, where the system may not have enough response-time data to load-balance properly; I have seen all the clients select the same replica using the default value of 0. The value 2 is nice because if many clients are reading the same file, the load is distributed across bricks. -ben
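Applied to a volume, the setting described above would look like this (the volume name is a placeholder, not something from the thread):

```shell
# 2 = hash by GFID of file and client PID, so different clients pick different replicas
gluster volume set your-volume cluster.read-hash-mode 2
```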
Re: [Gluster-users] Gluster-users Digest, Vol 75, Issue 25 - striped volume x8, poor sequential read performance
Sergey, comments inline... Is your intended workload really single-client, single-thread? Or is it more MPI-like -- for example, do you have many clients reading from different parts of the same large file? If the latter, perhaps IOR would be a better benchmark for you. Sorry, I'm not familiar with the striping translator.

- Original Message -
From: gluster-users-requ...@gluster.org
To: gluster-users@gluster.org
Sent: Tuesday, July 22, 2014 7:21:56 AM
Subject: Gluster-users Digest, Vol 75, Issue 25

Message: 9
Date: Mon, 21 Jul 2014 21:35:15 +0100 (BST)
From: Sergey Koposov kopo...@ast.cam.ac.uk
To: gluster-users@gluster.org
Subject: [Gluster-users] glusterfs, striped volume x8, poor sequential read performance, good write performance

> Hi, I have an HPC installation with 8 nodes. Each node has a software RAID1 using two NL-SAS disks, and the disks from the 8 nodes are combined into a large shared striped 20TB glusterfs partition, which seems to show abnormally slow sequential read performance, with good write performance. Basically what I see is that write performance is very decent, ~500MB/s (tested using dd):
>
>     [root@ bigstor]# dd if=/dev/zero of=test2 bs=1M count=100000
>     100000+0 records in
>     100000+0 records out
>     104857600000 bytes (105 GB) copied, 186.393 s, 563 MB/s
>
> And all this is not just sitting in the cache of each node, as I see the data being flushed to disk at approximately the right speed. At the same time, read performance (tested using dd after dropping the caches beforehand) is really bad:
>
>     [root@ bigstor]# dd if=/data/bigstor/test of=/dev/null bs=1M count=10000
>     10000+0 records in
>     10000+0 records out
>     10485760000 bytes (10 GB) copied, 309.821 s, 33.8 MB/s
>
> While doing this, the glusterfs processes only take ~10-15% of CPU at most, so it isn't CPU starvation.
> The underlying devices do not seem to be loaded at all:
>
>     Device: rrqm/s wrqm/s   r/s  w/s   rkB/s  wkB/s avgrq-sz avgqu-sz await svctm %util
>     sda       0.00   0.00 73.00 0.00 9344.00   0.00   256.00     0.11  1.48  1.47 10.70
>
> To check that the disks are not the problem, I did a separate test of the read speed of the RAIDed disks on all machines, and they have read speeds of ~180MB/s (uncached). So they aren't the problem.

Gluster has a read-ahead-page-count setting; I'd try setting it up to 16 (as high as it will go) -- the default is 4. Writes are different because a write to a brick can complete before the data hits the disk (in other words, as soon as the data reaches server memory), but with reads, if the data is not cached in memory then your only solution is to get all bricks reading at the same time. Contrast this with a single-brick 12-disk RAID6 volume (with 32-MB readahead) that can hit 800 MB/s on read. Clearly it isn't the rest of Gluster that's holding you back; it's probably the stripe translator's behavior. Does the stripe translator support parallel reads to different subvolumes in the stripe? Can you post a protocol trace that shows the on-the-wire behavior (collect with tcpdump, display with wireshark)? You could try running a re-read test without the stripe translator; I suspect it will perform better, based on my own experience.

> I also tried to increase the readahead on the RAID disks:
>
>     echo 2048 > /sys/block/md126/queue/read_ahead_kb
>
> but that doesn't seem to help at all.

To prove this, try re-reading a file that fits in the Linux buffer cache on the servers -- block device readahead is then irrelevant since there is no disk I/O at all. You are then doing a network test with Gluster. Also, try doing a dd read from the brick (subvolume) directly.

> Does anyone have any advice on what to do here? What knobs to adjust?
> To me it looks like a bug, to be honest, but I would be happy if there is a magic switch I forgot to turn on :)

Second, if you are using IPoIB, try the jumbo frame settings MTU=65520 and MODE=connected (in ifcfg-ib0) to reduce Infiniband interrupts on the client side. Try the FUSE mount option -o gid-timeout=2. What is the stripe width of the Gluster volume in KB? It looks like it's the default -- I forget what that is, but you probably want it to be something like 128 KB x 8. A very large stripe size will prevent Gluster from utilizing more than 1 brick at the same time.

> Here are more details about my system:
>
> OS: CentOS 6.5
> glusterfs: 3.4.4
> Kernel: 2.6.32-431.20.3.el6.x86_64
>
> mount options and df output:
>
>     [root@ bigstor]# cat /etc/mtab
>     /dev/md126p4 /data/glvol/brick1 xfs rw 0 0
>     node1:/glvol /data/bigstor fuse.glusterfs rw,default_permissions,allow_other,max_read=131072 0 0
>     [root@ bigstor]# df
>     Filesystem   1K-blocks       Used Available Use% Mounted on
>     /dev/md126p4 2516284988 2356820844
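As a rough ceiling check using the figures from the post (real striped-read throughput depends on whether the stripe translator issues reads to the bricks in parallel):

```shell
bricks=8        # striped nodes in the volume
per_brick=180   # MB/s measured per RAID1 pair, uncached
echo "$((bricks * per_brick)) MB/s theoretical aggregate"
# prints: 1440 MB/s theoretical aggregate -- versus the observed 33.8 MB/s
```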
Re: [Gluster-users] Gluster-users Digest, Vol 59, Issue 15 - GlusterFS performance
- Original Message -
From: gluster-users-requ...@gluster.org
To: gluster-users@gluster.org
Sent: Friday, March 1, 2013 4:03:13 PM
Subject: Gluster-users Digest, Vol 59, Issue 15

Message: 2
Date: Fri, 01 Mar 2013 10:22:21 -0800
From: Joe Julian j...@julianfamily.org
To: gluster-users@gluster.org
Subject: Re: [Gluster-users] GlusterFS performance

> The kernel developers introduced a bug into ext4 that has yet to be fixed. If you use xfs you won't have those hangs.
>
> On 03/01/2013 01:30 AM, Nikita A Kardashin wrote:
>> Hello again! I have completely rebuilt my storage. As a base: ext4 over mdadm RAID1. Gluster volume in distributed-replicated mode with these settings:
>>
>> Options Reconfigured:
>> performance.cache-size: 1024MB
>> nfs.disable: on
>> performance.write-behind-window-size: 4MB
>> performance.io-thread-count: 64
>> features.quota: off
>> features.quota-timeout: 1800
>> performance.io-cache: on
>> performance.write-behind: on
>> performance.flush-behind: on
>> performance.read-ahead: on
>>
>> As a result, I get write performance of about 80MB/s on: dd if=/dev/zero of=testfile.bin bs=100M count=10

Make sure your network and storage bricks are performing as you expect them to; Gluster is only as good as the underlying hardware. What happens with reads? What happens when you do multiple threads doing writes?

    for n in `seq 1 4` ; do
        dd if=/dev/zero of=testfile$n.bin bs=100M count=10 &
    done
    time wait

>> If I try to execute the above command inside a virtual machine (KVM), the first time everything goes right -- about 900MB/s (cache effect, I think) -- but if I run this test again on the existing file, the dd task hangs and can be stopped only by Ctrl+C.

In future, post the qemu process command line (from ps awux). Are you writing to a local filesystem inside the virtual disk image, or are you mounting Gluster from inside the VM? If you are going through /dev/vda, then are you using KVM qemu cache=writeback?
You could try cache=writethrough or cache=none; see comments below for cache=none. Also, try io=threads, not io=native.

>> Overall virtual system latency is poor too. For example, "apt-get upgrade" upgrades the system very, very slowly, freezing on "Unpacking replacement" and other io-related steps.

If you don't have a fast connection to storage, the Linux VM will buffer write data in the kernel buffer cache until it runs out of memory for that (vm.dirty_ratio), then it will freeze any process that issues writes. If your VM has a lot of memory relative to storage speed, this can result in very long delays. Try reducing vm.dirty_background_ratio to get writes going sooner, and vm.dirty_ratio so that the freezes don't last as long. You can even reduce the VM's block device queue depth. But most of all, make sure that gluster writes are performing near typical local block device speed.

>> Does glusterfs have any tuning options that can help me?

If your workload is strictly large-file, try this volume tuning:

    storage.linux-aio: off (default)
    cluster.eager-lock: enable (default is disabled)
    network.remote-dio: on (default is off)
    performance.write-behind-window-size: 1MB (default)

For a pure single-thread sequential read workload, you can tune the read-ahead translator to be more aggressive. This will help single-thread reads, but don't do it for other workloads, such as virtual machine images in the Gluster volume (which appear to Gluster as more of a random I/O workload):

    performance.read-ahead-page-count: 16 (default is 4 128-KB prefetched buffers)

http://community.gluster.org/a/linux-kernel-tuning-for-glusterfs/

The Red Hat Storage distribution will help tune the Linux block device for better performance on many workloads.
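The dirty-page tuning above might look like this in practice; the values here are illustrative guesses, not recommendations from the thread -- appropriate numbers depend on RAM size and storage speed:

```shell
# Sketch only: start background writeback sooner, cap dirty memory lower.
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=20
# To persist across reboots, add the same keys to /etc/sysctl.conf.
```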
Re: [Gluster-users] Gluster-users Digest, Vol 53, Issue 56 -- GlusterFS performance (Steve Thompson)
Steve, try glusterfs 3.3 and look at: http://community.gluster.org/a/linux-kernel-tuning-for-glusterfs/ -- there will be more optimizations in the next Gluster release. Take advantage of the translators that Gluster supplies, including the read-ahead translator and the quick-read translator. Red Hat does offer support for Red Hat Storage based on Gluster, and it has a pre-packaged tuning profile built into it. We test with 10-GbE networks, and Gluster 3.3 does have reasonably good performance for large-file sequential workloads (and it's scalable).
Re: [Gluster-users] Gluster-users Digest, Vol 51, Issue 49
Message: 4
Date: Fri, 27 Jul 2012 15:29:41 -0700
From: Harry Mangalam hjmanga...@gmail.com
Subject: [Gluster-users] Change NFS parameters post-start
To: gluster-users gluster-users@gluster.org

> In trying to convert clients from the gluster native client to an NFS client, I'm trying to get the gluster volume mounted on a test mount point on the same client where the native client has mounted the volume. The client refuses with this error:
>
>     mount -t nfs bs1:/gl /mnt/glnfs
>     mount: bs1:/gl failed, reason given by server: No such file or directory

Harry, have you tried:

    # mount -t nfs -o nfsvers=3,tcp bs1:/gl /mnt/glnfs

Also, there is an /etc/sysconfig/nfs file that may let you remove RDMA as a mount option for NFS.
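If the nfsvers=3,tcp mount works, a matching /etc/fstab entry would be a sketch like the following (same host and path as above; verify the options against your mount man page):

```
bs1:/gl   /mnt/glnfs   nfs   nfsvers=3,tcp   0 0
```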
Re: [Gluster-users] Gluster-users Digest, Vol 48, Issue 18 - Horrible Gluster Performance
Philip, what parts of your system perform well? Can you give a specific example of your workload (what you are asking the system to do)? If it's a mixture of different workloads, that's important too. What versions of Gluster and Linux are you using? My suggestions would be:

a) reset all your gluster tuning parameters to their default values unless you are sure that they actually improve performance;

b) try to isolate your performance problem to as simple a workload as possible before you try to fix it, and try to determine what workloads DO work well in your configuration -- this will make it easier for others to help;

c) if latency spikes are the issue, this sounds like it could be related to writes being excessively buffered by the Linux kernel and then flushed all at once, which can block reads. Use "iostat -kx /dev/sd? 5" or equivalent to observe this. You can throttle back dirty pages in the kernel and avoid buffering dirty pages for long periods to avoid these spikes. http://community.gluster.org/a/linux-kernel-tuning-for-glusterfs/ provides some suggestions that may be relevant to your problem; my recommendations are in a comment there.

Message: 9
Date: Fri, 13 Apr 2012 11:25:58 +0200
From: Philip flip...@googlemail.com
Subject: [Gluster-users] Horrible Gluster Performance
To: gluster-users@gluster.org

> I have a small GlusterFS cluster providing a replicated volume. Each server has 2 SAS disks for the OS and logs, and 22 SATA disks for the actual data, striped together as a RAID10 using a MegaRAID SAS 9280-4i4e with this configuration: http://pastebin.com/2xj4401J
>
> Connected to this cluster are a few other servers with the native client, running nginx to serve files stored on it on the order of 3-10MB. Right now a storage server has outgoing bandwidth of 300Mbit/s, and the busy rate of the raid array is 30-40%.
> There are also strange side effects: sometimes the io-latency skyrockets and no access is possible on the raid for 10 seconds. This happens at 300Mbit/s or 1000Mbit/s of outgoing bandwidth. The filesystem used is xfs, and it has been tuned to match the raid stripe size. I've tested all sorts of gluster settings, but none seem to have any effect, so I've reset the volume configuration and it is using the default one. Does anyone have an idea what could be the reason for such bad performance? 22 disks in a RAID10 should deliver *way* more throughput.
[Gluster-users] RDMA/Ethernet with RoCEE - failed to modify QP to RTR
Did any RDMA-over-Ethernet users see this Gluster error? If so, do you know what caused it and how to fix it? If you haven't seen it, what RPMs and configuration do you use specific to RDMA/Ethernet?

    [2011-11-10 10:30:20.595801] C [rdma.c:2417:rdma_connect_qp] 0-rpc-transport/rdma: Failed to modify QP to RTR
    [2011-11-10 10:30:20.595930] E [rdma.c:4159:rdma_handshake_pollin] 0-rpc-transport/rdma: rdma.management: failed to connect with remote QP

I see this when I run RDMA over Ethernet using the RoCEE RPMs, but when I run over Infiniband on RHEL 6.2, it runs fine. On the same Ethernet configuration, Gluster/TCP runs fine, NFS/RDMA runs fine, and so does an AMQP app. But the qperf and rping utilities fail in the same way. The firmware on the HCAs is not the latest -- is it worth the risk to upgrade? I went into the debugger and found the line where qperf fails; it's near line 2056 in rdma.c in the qperf sources (qperf-debuginfo; I modified the Makefile):

    (gdb) 2088    } else if (dev->trans == IBV_QPT_RC) {
    (gdb) 2090        flags = IBV_QP_STATE | ...
    (gdb) 2097        if (ibv_modify_qp(dev->qp, &rtr_attr, flags) != 0)
    (gdb) 2098            error(SYS, "failed to modify QP to RTR");
    (gdb)

Gluster fails in rdma_connect_qp() calling the same routine, but perhaps with different parameters.
[Gluster-users] Gluster/RDMA
To Harry Mangalam, about Gluster/RDMA: make sure these modules are loaded:

    # modprobe -v rdma_ucm
    # modprobe -v ib_uverbs
    # modprobe -v ib_ucm

To run the subnet manager:

    # modprobe -v ib_umad

Make sure the libibverbs and (libmlx4 or libmthca) RPMs are installed. I don't understand why the appropriate modules aren't loaded automatically. Could we put something in /etc/modprobe.d/ to make this happen, maybe? Infiniband should not require troubleshooting after 5-10 years of development; it should just work.
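One way to make those modules load at boot on RHEL-family systems -- a sketch, assuming the /etc/sysconfig/modules convention applies to your distro (adjust the module list to your HCA):

```shell
# Hypothetical boot-time loader script, not from the original thread.
cat > /etc/sysconfig/modules/rdma.modules <<'EOF'
#!/bin/sh
modprobe rdma_ucm
modprobe ib_uverbs
modprobe ib_ucm
modprobe ib_umad
EOF
chmod +x /etc/sysconfig/modules/rdma.modules
```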