Re: [Gluster-users] gluster client performance
On Wednesday 10 August 2011 12:11 AM, Jesse Stroik wrote:

> Pavan,
>
> Thank you for your help. We wanted to get back to you with our results and observations. I'm cc'ing gluster-users for posterity.
>
> We did experiment with enable-trickling-writes. That was one of the translator tunables we wanted to know the precise syntax for, so that we could be certain we were disabling it. As hoped, disabling trickling writes improved performance somewhat. We are definitely interested in any other undocumented write-buffer-related tunables.
>
> We've tested the documented tuning parameters. Performance improved significantly when we switched clients to a mainline kernel (2.6.35-13). We also updated to OFED 1.5.3, but it wasn't responsible for the performance improvement.
>
> Our findings with 32KB block size (cp) write performance:
>
> 250-300 MB/sec single stream performance
> 400 MB/sec multiple-stream per client performance

Ok. Let's see if we can improve this further. Please use the following tunables as suggested below:

For write-behind - option cache-size 16MB
For read-ahead  - option page-count 16
For io-cache    - option cache-size 64MB

You will need to place these lines in the client volume file, restart the server and remount the volume on the clients. Your client (fuse) volume file sections will look like below (of course, with the change in volume name):

volume testvol-write-behind
    type performance/write-behind
    option cache-size 16MB
    subvolumes testvol-client-0
end-volume

volume testvol-read-ahead
    type performance/read-ahead
    option page-count 16
    subvolumes testvol-write-behind
end-volume

volume testvol-io-cache
    type performance/io-cache
    option cache-size 64MB
    subvolumes testvol-read-ahead
end-volume

Run your copy command with these tunables. For now, let's have the default setting for trickling writes, which is ENABLED. You can simply remove this tunable from the volume file to get the default behaviour.

Pavan

> This is much higher than we observed with the 2.6.18 kernel series. Using the 2.6.18 line, we also observed virtually no difference between running single-stream tests and multi-stream tests, suggesting a bottleneck with the fabric.
>
> Both 2.6.18 and 2.6.35-13 performed very well (about 600 MB/sec) when writing 128KB blocks. When I disabled write-behind on the 2.6.18 series of kernels as a test, performance plummeted to a few MB/sec when writing block sizes smaller than 128KB. We did not test this extensively.
>
> Disabling enable-trickling-writes gave us approximately a 20% boost, reflected in the numbers above, for single-stream writes. We observed no significant difference with several streams per client due to disabling that tunable.
>
> For reference, we are running another cluster file system on the same underlying hardware/software. With both the old kernel (2.6.18.x) and the new kernel (2.6.35-13) we get approximately:
>
> 450-550 MB/sec single stream performance
> 1200+ MB/sec multiple stream per client performance
>
> We set the test directory to write entire files to a single LUN, which is how we configured gluster, in an effort to mitigate differences. It is treacherous to speculate why we might be more limited with gluster over RDMA than the other cluster file system without spending a significant amount of analysis. That said, I wonder if there may be an issue with the way in which fuse handles write buffers, causing a bottleneck for RDMA.
>
> The bottom line is that our observed performance was poor using the 2.6.18 RHEL 5 kernel line relative to the mainline (2.6.35) kernels.
> Updating to the newer kernels was well worth the testing and downtime. Hopefully this information can help others.
>
> Best,
> Jesse Stroik
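For anyone following along: Pavan's "restart the server and remount the volume on the clients" step might look like the sketch below. The daemon name matches the 3.x packaging, but the server name, volume name and mount point are placeholders, not taken from this thread.

    # on each storage server, restart the gluster management daemon
    service glusterd restart

    # on each client, remount so the edited client volfile takes effect
    umount /mnt/testvol
    mount -t glusterfs data-server:/testvol /mnt/testvol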
Re: [Gluster-users] gluster client performance
Pavan,

Thank you for your help. We wanted to get back to you with our results and observations. I'm cc'ing gluster-users for posterity.

We did experiment with enable-trickling-writes. That was one of the translator tunables we wanted to know the precise syntax for, so that we could be certain we were disabling it. As hoped, disabling trickling writes improved performance somewhat. We are definitely interested in any other undocumented write-buffer-related tunables.

We've tested the documented tuning parameters. Performance improved significantly when we switched clients to a mainline kernel (2.6.35-13). We also updated to OFED 1.5.3, but it wasn't responsible for the performance improvement.

Our findings with 32KB block size (cp) write performance:

250-300 MB/sec single stream performance
400 MB/sec multiple-stream per client performance

This is much higher than we observed with the 2.6.18 kernel series. Using the 2.6.18 line, we also observed virtually no difference between running single-stream tests and multi-stream tests, suggesting a bottleneck with the fabric.

Both 2.6.18 and 2.6.35-13 performed very well (about 600 MB/sec) when writing 128KB blocks. When I disabled write-behind on the 2.6.18 series of kernels as a test, performance plummeted to a few MB/sec when writing block sizes smaller than 128KB. We did not test this extensively.

Disabling enable-trickling-writes gave us approximately a 20% boost, reflected in the numbers above, for single-stream writes. We observed no significant difference with several streams per client due to disabling that tunable.

For reference, we are running another cluster file system on the same underlying hardware/software. With both the old kernel (2.6.18.x) and the new kernel (2.6.35-13) we get approximately:

450-550 MB/sec single stream performance
1200+ MB/sec multiple stream per client performance

We set the test directory to write entire files to a single LUN, which is how we configured gluster, in an effort to mitigate differences. It is treacherous to speculate why we might be more limited with gluster over RDMA than the other cluster file system without spending a significant amount of analysis. That said, I wonder if there may be an issue with the way in which fuse handles write buffers, causing a bottleneck for RDMA.

The bottom line is that our observed performance was poor using the 2.6.18 RHEL 5 kernel line relative to the mainline (2.6.35) kernels. Updating to the newer kernels was well worth the testing and downtime. Hopefully this information can help others.

Best,
Jesse Stroik
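A note for anyone reproducing the "write-behind disabled" test: with volfile-based configs, one way to take a translator out of the stack is to re-point the next translator's subvolume past it. A minimal sketch, with placeholder volume names (not Jesse's actual volfile):

    # client volfile fragment: read-ahead wired directly to the client
    # subvolume, skipping the write-behind translator entirely
    volume testvol-read-ahead
        type performance/read-ahead
        subvolumes testvol-client-0    # was: testvol-write-behind
    end-volume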
Re: [Gluster-users] gluster client performance
On 07/27/2011 12:53 AM, Pavan T C wrote:

>>> 2. What is the disk bandwidth you are getting on the local filesystem on a given storage node? I mean, pick any of the 10 storage servers dedicated for Gluster Storage and perform a dd as below:
>>
>> Seeing an average of 740 MB/s write, 971 MB/s read.
>
> I presume you did this in one of the /data-brick*/export directories? Command output with the command line would have been clearer, but that's fine.

That is correct -- we used /data-brick1/export.

>>> 3. What is the IB bandwidth that you are getting between the compute node and the glusterfs storage node? You can run the tool "rdma_bw" to get the details:
>>
>> 30407: Bandwidth peak (#0 to #976): 2594.58 MB/sec
>> 30407: Bandwidth average: 2593.62 MB/sec
>> 30407: Service Demand peak (#0 to #976): 978 cycles/KB
>> 30407: Service Demand Avg : 978 cycles/KB
>
> This looks like a DDR connection. "ibv_devinfo -v" will tell a better story about the line width and speed of your infiniband connection. QDR should have a much higher bandwidth. But that still does not explain why you should get as low as 50 MB/s for a single stream, single client write when the backend can support direct IO throughput of more than 700 MB/s.

ibv_devinfo shows 4x for active width and 10 Gbps for active speed. Not sure why we're not seeing better bandwidth with rdma_bw -- we'll have to troubleshoot that some more -- but I agree, it shouldn't be the limiting factor as far as the Gluster client speed problems we're seeing.

I'll send you the log files you requested off-list.

John

--
John Lalande
University of Wisconsin-Madison
Space Science & Engineering Center
1225 W. Dayton Street, Room 439, Madison, WI 53706
608-263-2268 / john.lala...@ssec.wisc.edu
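For reference, the link check John describes can be done along these lines; the device name mlx4_0 is an assumption, and the per-lane signalling rates are SDR 2.5 Gbps, DDR 5 Gbps, QDR 10 Gbps:

    # show the negotiated InfiniBand link width and speed
    ibv_devinfo -v -d mlx4_0 | grep -E 'active_width|active_speed'
    #     active_width:  4X
    #     active_speed:  10.0 Gbps   (10 Gbps per lane = QDR)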
Re: [Gluster-users] gluster client performance
> But that still does not explain why you should get as low as 50 MB/s for a single stream, single client write when the backend can support direct IO throughput of more than 700 MB/s.
>
> On the server, can you collect:
>
> # iostat -xcdh 2 > iostat.log.brickXX
>
> for the duration of the dd command? and
>
> # strace -f -o stracelog.server -tt -T -e trace=write,writev -p <pid>
>
> (again for the duration of the dd command)

Hi John,

A small change in the request. I hope you have not already spent time on this. The strace command should be:

strace -f -o stracelog.server -tt -T -e trace=pwrite -p <pid>

Thanks,
Pavan

> With the above, I want to measure the delay between the writes coming in from the client. iostat will describe the IO scenario on the server. Once the exercise is done, please attach the iostat.log.brickXX and stracelog.server.
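If it helps, both collectors can be wrapped around the client's dd run along these lines; the pgrep pattern and log suffix are assumptions, not from Pavan's mail:

    # start iostat and strace against the brick process, run the dd
    # on the client, then stop both collectors
    BRICK_PID=$(pgrep -f glusterfsd | head -n 1)
    iostat -xcdh 2 > iostat.log.brick01 &
    IOSTAT_PID=$!
    strace -f -o stracelog.server -tt -T -e trace=pwrite -p "$BRICK_PID" &
    STRACE_PID=$!
    # ... run the dd on the client now, wait for it to finish ...
    kill $IOSTAT_PID $STRACE_PID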
Re: [Gluster-users] gluster client performance
> [..] I don't know why my writes are so slow compared to reads. Let me know if you're able to get better write speeds with the newer version of gluster and any of the configurations (if they apply) that I've posted. It might compel me to upgrade.

From your documentation of nfsspeedtest, I see that the reads can happen either via dd or via perl's sysread. I'm not sure if one is better over the other.

Secondly - are you doing direct IO on the backend XFS? If not, try it with direct IO so that you are not misled by the memory situation in the system at the time of your test. It will give a clearer picture of what your backend is capable of.

Your test is such that you write a file and immediately read the same file back. It is possible that a good chunk of it is cached on the backend. After the write, do a flush of the filesystem caches by using:

echo 3 > /proc/sys/vm/drop_caches

Sleep for a while. Then do the read. Or, as suggested earlier, resort to direct IO while testing the backend FS.

Pavan
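Concretely, the flush-then-read sequence Pavan describes might look like this sketch (the file path and size are placeholders; 64k x 212992 is roughly the 13 GB used elsewhere in the thread):

    # write, drop the page cache, pause, then read back --
    # so the read measures the disks rather than server RAM
    dd if=/dev/zero of=/export/gluster1/tmp/13g_file bs=64k count=212992
    sync
    echo 3 > /proc/sys/vm/drop_caches
    sleep 10
    dd if=/export/gluster1/tmp/13g_file of=/dev/null bs=64k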
Re: [Gluster-users] gluster client performance
On Tuesday 26 July 2011 09:24 PM, John Lalande wrote:

> Thanks for your help, Pavan!
>
>> Hi John,
>>
>> I would need some more information about your setup to estimate the performance you should get with your gluster setup.
>>
>> 1. Can you provide the details of how disks are connected to the storage boxes? Is it via FC? What raid configuration is it using (if at all any)?
>
> The disks are 2TB near-line SAS direct attached via a PERC H700 controller (the Dell PowerEdge R515 has 12 3.5" drive bays). They are in a RAID6 config, exported as a single volume, that's split into 3 equal-size partitions (due to ext4's (well, e2fsprogs') 16 TB limit).
>
>> 2. What is the disk bandwidth you are getting on the local filesystem on a given storage node? I mean, pick any of the 10 storage servers dedicated for Gluster Storage and perform a dd as below:
>
> Seeing an average of 740 MB/s write, 971 MB/s read.

I presume you did this in one of the /data-brick*/export directories? Command output with the command line would have been clearer, but that's fine.

>> 3. What is the IB bandwidth that you are getting between the compute node and the glusterfs storage node? You can run the tool "rdma_bw" to get the details:
>
> 30407: Bandwidth peak (#0 to #976): 2594.58 MB/sec
> 30407: Bandwidth average: 2593.62 MB/sec
> 30407: Service Demand peak (#0 to #976): 978 cycles/KB
> 30407: Service Demand Avg : 978 cycles/KB

This looks like a DDR connection. "ibv_devinfo -v" will tell a better story about the line width and speed of your infiniband connection. QDR should have a much higher bandwidth. But that still does not explain why you should get as low as 50 MB/s for a single stream, single client write when the backend can support direct IO throughput of more than 700 MB/s.

On the server, can you collect:

# iostat -xcdh 2 > iostat.log.brickXX

for the duration of the dd command? and

# strace -f -o stracelog.server -tt -T -e trace=write,writev -p <pid>

(again for the duration of the dd command)

With the above, I want to measure the delay between the writes coming in from the client. iostat will describe the IO scenario on the server. Once the exercise is done, please attach the iostat.log.brickXX and stracelog.server.
Pavan

> Here's our gluster config:
>
> # gluster volume info data
>
> Volume Name: data
> Type: Distribute
> Status: Started
> Number of Bricks: 30
> Transport-type: rdma
> Bricks:
> Brick1: data-3-1-infiniband.infiniband:/data-brick1/export
> Brick2: data-3-3-infiniband.infiniband:/data-brick1/export
> Brick3: data-3-5-infiniband.infiniband:/data-brick1/export
> Brick4: data-3-7-infiniband.infiniband:/data-brick1/export
> Brick5: data-3-9-infiniband.infiniband:/data-brick1/export
> Brick6: data-3-11-infiniband.infiniband:/data-brick1/export
> Brick7: data-3-13-infiniband.infiniband:/data-brick1/export
> Brick8: data-3-15-infiniband.infiniband:/data-brick1/export
> Brick9: data-3-17-infiniband.infiniband:/data-brick1/export
> Brick10: data-3-19-infiniband.infiniband:/data-brick1/export
> Brick11: data-3-1-infiniband.infiniband:/data-brick2/export
> Brick12: data-3-3-infiniband.infiniband:/data-brick2/export
> Brick13: data-3-5-infiniband.infiniband:/data-brick2/export
> Brick14: data-3-7-infiniband.infiniband:/data-brick2/export
> Brick15: data-3-9-infiniband.infiniband:/data-brick2/export
> Brick16: data-3-11-infiniband.infiniband:/data-brick2/export
> Brick17: data-3-13-infiniband.infiniband:/data-brick2/export
> Brick18: data-3-15-infiniband.infiniband:/data-brick2/export
> Brick19: data-3-17-infiniband.infiniband:/data-brick2/export
> Brick20: data-3-19-infiniband.infiniband:/data-brick2/export
> Brick21: data-3-1-infiniband.infiniband:/data-brick3/export
> Brick22: data-3-3-infiniband.infiniband:/data-brick3/export
> Brick23: data-3-5-infiniband.infiniband:/data-brick3/export
> Brick24: data-3-7-infiniband.infiniband:/data-brick3/export
> Brick25: data-3-9-infiniband.infiniband:/data-brick3/export
> Brick26: data-3-11-infiniband.infiniband:/data-brick3/export
> Brick27: data-3-13-infiniband.infiniband:/data-brick3/export
> Brick28: data-3-15-infiniband.infiniband:/data-brick3/export
> Brick29: data-3-17-infiniband.infiniband:/data-brick3/export
> Brick30: data-3-19-infiniband.infiniband:/data-brick3/export
> Options Reconfigured:
> nfs.disable: on
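As an aside, on a distribute volume like this one you can check which brick a given file hashed to from the fuse mount, via the pathinfo xattr. A sketch; the mount point and file name are placeholders, and the xattr assumes a 3.x-era client that supports it:

    # print the backend brick path(s) for one file
    getfattr -n trusted.glusterfs.pathinfo /mnt/data/some_test_file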
Re: [Gluster-users] gluster client performance
Thanks for your help, Pavan!

> Hi John,
>
> I would need some more information about your setup to estimate the performance you should get with your gluster setup.
>
> 1. Can you provide the details of how disks are connected to the storage boxes? Is it via FC? What raid configuration is it using (if at all any)?

The disks are 2TB near-line SAS direct attached via a PERC H700 controller (the Dell PowerEdge R515 has 12 3.5" drive bays). They are in a RAID6 config, exported as a single volume, that's split into 3 equal-size partitions (due to ext4's (well, e2fsprogs') 16 TB limit).

> 2. What is the disk bandwidth you are getting on the local filesystem on a given storage node? I mean, pick any of the 10 storage servers dedicated for Gluster Storage and perform a dd as below:

Seeing an average of 740 MB/s write, 971 MB/s read.

> 3. What is the IB bandwidth that you are getting between the compute node and the glusterfs storage node? You can run the tool "rdma_bw" to get the details:

30407: Bandwidth peak (#0 to #976): 2594.58 MB/sec
30407: Bandwidth average: 2593.62 MB/sec
30407: Service Demand peak (#0 to #976): 978 cycles/KB
30407: Service Demand Avg : 978 cycles/KB

Here's our gluster config:

# gluster volume info data

Volume Name: data
Type: Distribute
Status: Started
Number of Bricks: 30
Transport-type: rdma
Bricks:
Brick1: data-3-1-infiniband.infiniband:/data-brick1/export
Brick2: data-3-3-infiniband.infiniband:/data-brick1/export
Brick3: data-3-5-infiniband.infiniband:/data-brick1/export
Brick4: data-3-7-infiniband.infiniband:/data-brick1/export
Brick5: data-3-9-infiniband.infiniband:/data-brick1/export
Brick6: data-3-11-infiniband.infiniband:/data-brick1/export
Brick7: data-3-13-infiniband.infiniband:/data-brick1/export
Brick8: data-3-15-infiniband.infiniband:/data-brick1/export
Brick9: data-3-17-infiniband.infiniband:/data-brick1/export
Brick10: data-3-19-infiniband.infiniband:/data-brick1/export
Brick11: data-3-1-infiniband.infiniband:/data-brick2/export
Brick12: data-3-3-infiniband.infiniband:/data-brick2/export
Brick13: data-3-5-infiniband.infiniband:/data-brick2/export
Brick14: data-3-7-infiniband.infiniband:/data-brick2/export
Brick15: data-3-9-infiniband.infiniband:/data-brick2/export
Brick16: data-3-11-infiniband.infiniband:/data-brick2/export
Brick17: data-3-13-infiniband.infiniband:/data-brick2/export
Brick18: data-3-15-infiniband.infiniband:/data-brick2/export
Brick19: data-3-17-infiniband.infiniband:/data-brick2/export
Brick20: data-3-19-infiniband.infiniband:/data-brick2/export
Brick21: data-3-1-infiniband.infiniband:/data-brick3/export
Brick22: data-3-3-infiniband.infiniband:/data-brick3/export
Brick23: data-3-5-infiniband.infiniband:/data-brick3/export
Brick24: data-3-7-infiniband.infiniband:/data-brick3/export
Brick25: data-3-9-infiniband.infiniband:/data-brick3/export
Brick26: data-3-11-infiniband.infiniband:/data-brick3/export
Brick27: data-3-13-infiniband.infiniband:/data-brick3/export
Brick28: data-3-15-infiniband.infiniband:/data-brick3/export
Brick29: data-3-17-infiniband.infiniband:/data-brick3/export
Brick30: data-3-19-infiniband.infiniband:/data-brick3/export
Options Reconfigured:
nfs.disable: on

--
John Lalande
University of Wisconsin-Madison
Space Science & Engineering Center
1225 W. Dayton Street, Room 439, Madison, WI 53706
608-263-2268 / john.lala...@ssec.wisc.edu
Re: [Gluster-users] gluster client performance
> 3. What is the IB bandwidth that you are getting between the compute node and the glusterfs storage node? You can run the tool "rdma_bw" to get the details:

This is what I got on bidirectional:

2638: Bandwidth peak (#0 to #785): 6052.22 MB/sec
2638: Bandwidth average: 6050.02 MB/sec
2638: Service Demand peak (#0 to #785): 364 cycles/KB
2638: Service Demand Avg : 364 cycles/KB
Re: [Gluster-users] gluster client performance
On Tuesday 26 July 2011 03:42 AM, John Lalande wrote:

> Hi-
>
> I'm new to Gluster, but am trying to get it set up on a new compute cluster we're building. We picked Gluster for one of our cluster file systems (we're also using Lustre for fast scratch space), but the Gluster performance has been so bad that I think maybe we have a configuration problem -- perhaps we're missing a tuning parameter that would help, but I can't find anything in the Gluster documentation -- all the tuning info I've found seems geared toward Gluster 2.x.
>
> For some background, our compute cluster has 64 compute nodes. The gluster storage pool has 10 Dell PowerEdge R515 servers, each with 12 x 2 TB disks. We have another 16 Dell PowerEdge R515s used as Lustre storage servers. The compute and storage nodes are all connected via QDR Infiniband. Both Gluster and Lustre are set to use RDMA over Infiniband. We are using OFED version 1.5.2-20101219, Gluster 3.2.2 and CentOS 5.5 on both the compute and storage nodes.

Hi John,

I would need some more information about your setup to estimate the performance you should get with your gluster setup.

1. Can you provide the details of how disks are connected to the storage boxes? Is it via FC? What raid configuration is it using (if at all any)?

2. What is the disk bandwidth you are getting on the local filesystem on a given storage node? I mean, pick any of the 10 storage servers dedicated for Gluster Storage and perform a dd as below:

Write bandwidth measurement:
dd if=/dev/zero of=/export_directory/10g_file bs=128K count=80000 oflag=direct

Read bandwidth measurement:
dd if=/export_directory/10g_file of=/dev/null bs=128K count=80000 iflag=direct

[The above command is doing a direct IO of 10GB via your backend FS - ext4/xfs.]

3. What is the IB bandwidth that you are getting between the compute node and the glusterfs storage node? You can run the tool "rdma_bw" to get the details:

On the server, run:
# rdma_bw -b
[-b measures bi-directional bandwidth]

On the compute node, run:
# rdma_bw -b <server>

[If you have not already installed it, rdma_bw is available via - http://mirror.centos.org/centos/5/os/x86_64/CentOS/perftest-1.2.3-1.el5.x86_64.rpm]

Let's start with this, and I will ask for more if necessary.

Pavan

> Oddly, it seems like there's some sort of bottleneck on the client side -- for example, we're only seeing about 50 MB/s write throughput from a single compute node when writing a 10GB file. But, if we run multiple simultaneous writes from multiple compute nodes to the same Gluster volume, we get 50 MB/s from each compute node. However, running multiple writes from the same compute node does not increase throughput. The compute nodes have 48 cores and 128 GB RAM, so I don't think the issue is with the compute node hardware.
>
> With Lustre, on the same hardware, with the same version of OFED, we're seeing write throughput on that same 10 GB file as follows: 476 MB/s single stream write from a single compute node, and aggregate performance of more like 2.4 GB/s if we run simultaneous writes. That leads me to believe that we don't have a problem with RDMA; otherwise Lustre, which is also using RDMA, should be similarly affected.
>
> We have tried both xfs and ext4 for the backend file system on the Gluster storage nodes (we're currently using ext4).
> We went with distributed (not distributed striped) for the Gluster volume -- the thought was that if there was a catastrophic failure of one of the storage nodes, we'd only lose the data on that node; presumably with distributed striped you'd lose any data striped across that volume, unless I have misinterpreted the documentation.
>
> So ... what's expected/normal throughput for Gluster over QDR IB to a relatively large storage pool (10 servers / 120 disks)? Does anyone have suggested tuning tips for improving performance?
>
> Thanks!
>
> John
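The multi-writer observation above ("running multiple writes from the same compute node does not increase throughput") can be reproduced with a loop like this sketch; the mount point is a placeholder:

    # four parallel 10 GB streams from a single compute node
    for i in 1 2 3 4; do
        dd if=/dev/zero of=/mnt/gluster/stream_$i bs=128K count=80000 &
    done
    wait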
Re: [Gluster-users] gluster client performance
Hi,

Here's our QDR IB gluster setup: http://piranha.structbio.vanderbilt.edu

We're still using gluster 3.0 on all our servers and clients, as well as CentOS 5.6 kernels and OFED 1.4. To simulate a single stream I use this nfsSpeedTest script I wrote:

http://code.google.com/p/nfsspeedtest/

From a single QDR IB connected client to our /pirstripe directory, which is a stripe of the gluster storage servers, this is the performance I get (note: use a file size > amount of RAM on the client and server systems, 13GB in this case):

4k block size:

111 pir4:/pirstripe% /sb/admin/scripts/nfsSpeedTest -s 13g -y
pir4: Write test (dd): 142.281 MB/s 1138.247 mbps 93.561 seconds
pir4: Read test (dd): 274.321 MB/s 2194.570 mbps 48.527 seconds

Testing from 8k - 128k block size on the dd, best performance was achieved at 64k block sizes:

114 pir4:/pirstripe% /sb/admin/scripts/nfsSpeedTest -s 13g -b 64k -y
pir4: Write test (dd): 213.344 MB/s 1706.750 mbps 62.397 seconds
pir4: Read test (dd): 955.328 MB/s 7642.620 mbps 13.934 seconds

This is to the /pirdist directories, which are mounted in distribute mode (the file is written to only one of the gluster servers):

105 pir4:/pirdist% /sb/admin/scripts/nfsSpeedTest -s 13g -y
pir4: Write test (dd): 182.410 MB/s 1459.281 mbps 72.978 seconds
pir4: Read test (dd): 244.379 MB/s 1955.033 mbps 54.473 seconds

106 pir4:/pirdist% /sb/admin/scripts/nfsSpeedTest -s 13g -y -b 64k
pir4: Write test (dd): 204.297 MB/s 1634.375 mbps 65.160 seconds
pir4: Read test (dd): 340.427 MB/s 2723.419 mbps 39.104 seconds

For reference/control, here's the same test writing straight to the XFS filesystem on one of the gluster storage nodes:

[sabujp@gluster1 tmp]$ /sb/admin/scripts/nfsSpeedTest -s 13g -y
gluster1: Write test (dd): 398.971 MB/s 3191.770 mbps 33.366 seconds
gluster1: Read test (dd): 234.563 MB/s 1876.501 mbps 56.752 seconds

[sabujp@gluster1 tmp]$ /sb/admin/scripts/nfsSpeedTest -s 13g -y -b 64k
gluster1: Write test (dd): 442.251 MB/s 3538.008 mbps 30.101 seconds
gluster1: Read test (dd): 219.708 MB/s 1757.660 mbps 60.590 seconds

The read test seems to scale linearly with the # of storage servers (almost 1 GB/s!). Interestingly, the /pirdist read test at 64k block size was 120 MB/s faster than the read test straight from XFS; however, it could have been that gluster1 was busy, and when I read from /pirdist the file was actually being read from one of the other 4 less busy storage nodes.
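The 8k-128k sweep mentioned above can be scripted by reusing the nfsSpeedTest invocation shown; apart from the path and flags taken from the examples, the loop itself is just a sketch:

    # sweep dd block sizes on the stripe mount
    cd /pirstripe
    for bs in 8k 16k 32k 64k 128k; do
        echo "=== block size $bs ==="
        /sb/admin/scripts/nfsSpeedTest -s 13g -b $bs -y
    done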
Here's our storage node setup (many of these settings may not apply to v3.2):

volume posix-stripe
    type storage/posix
    option directory /export/gluster1/stripe
end-volume

volume posix-distribute
    type storage/posix
    option directory /export/gluster1/distribute
end-volume

volume locks
    type features/locks
    subvolumes posix-stripe
end-volume

volume locks-dist
    type features/locks
    subvolumes posix-distribute
end-volume

volume iothreads
    type performance/io-threads
    option thread-count 16
    subvolumes locks
end-volume

volume iothreads-dist
    type performance/io-threads
    option thread-count 16
    subvolumes locks-dist
end-volume

volume server
    type protocol/server
    option transport-type ib-verbs
    option auth.addr.iothreads.allow 10.2.178.*
    option auth.addr.iothreads-dist.allow 10.2.178.*
    option auth.addr.locks.allow 10.2.178.*
    option auth.addr.posix-stripe.allow 10.2.178.*
    subvolumes iothreads iothreads-dist locks posix-stripe
end-volume

Here's our stripe client setup:

volume client-stripe-1
    type protocol/client
    option transport-type ib-verbs
    option remote-host gluster1
    option remote-subvolume iothreads
end-volume

volume client-stripe-2
    type protocol/client
    option transport-type ib-verbs
    option remote-host gluster2
    option remote-subvolume iothreads
end-volume

volume client-stripe-3
    type protocol/client
    option transport-type ib-verbs
    option remote-host gluster3
    option remote-subvolume iothreads
end-volume

volume client-stripe-4
    type protocol/client
    option transport-type ib-verbs
    option remote-host gluster4
    option remote-subvolume iothreads
end-volume

volume client-stripe-5
    type protocol/client
    option transport-type ib-verbs
    option remote-host gluster5
    option remote-subvolume iothreads
end-volume

volume readahead-gluster1
    type performance/read-ahead
    option page-count 4 # 2 is default
    option force-atime-update off # default is off
    subvolumes client-stripe-1
end-volume

volume readahead-gluster2
    type performance/read-ahead
    option page-count 4 # 2 is default
    option force-atime-update off # default is off
    subvolumes client-stripe-2
end-volume

volume readahead-gluster3
    type performance/read-ahead
    option page-count 4 # 2 is default
    option force-atime-update off # default is off
    subvolumes client-stripe-3
end-volume

volume readahead-gluster4
    type performance/read-ahead
    option page-count 4 # 2 is default
    option force-atime-update off # default is off
    subvolumes client-stripe-4
end-volume