Re: [Gluster-users] gluster client performance
On 07/27/2011 12:53 AM, Pavan T C wrote:

>>> 2. What is the disk bandwidth you are getting on the local filesystem
>>> on a given storage node? I mean, pick any of the 10 storage servers
>>> dedicated for Gluster storage and perform a dd as below:
>>
>> Seeing an average of 740 MB/s write, 971 MB/s read.
>
> I presume you did this in one of the /data-brick*/export directories?
> Command output with the command line would have been clearer, but that's
> fine.

That is correct -- we used /data-brick1/export.

>>> 3. What is the IB bandwidth that you are getting between the compute
>>> node and the glusterfs storage node? You can run the tool rdma_bw to
>>> get the details:
>>
>> 30407: Bandwidth peak (#0 to #976): 2594.58 MB/sec
>> 30407: Bandwidth average: 2593.62 MB/sec
>> 30407: Service Demand peak (#0 to #976): 978 cycles/KB
>> 30407: Service Demand Avg: 978 cycles/KB
>
> This looks like a DDR connection. ibv_devinfo -v will tell a better
> story about the line width and speed of your InfiniBand connection. QDR
> should have a much higher bandwidth. But that still does not explain why
> you should get as low as 50 MB/s for a single-stream, single-client
> write when the backend can support direct-I/O throughput of more than
> 700 MB/s.

ibv_devinfo shows 4x for the active width and 10 Gbps for the active
speed. Not sure why we're not seeing better bandwidth with rdma_bw --
we'll have to troubleshoot that some more -- but I agree, it shouldn't be
the limiting factor as far as the Gluster client speed problems we're
seeing are concerned.

I'll send you the log files you requested off-list.

John

--
John Lalande
University of Wisconsin-Madison
Space Science Engineering Center
1225 W. Dayton Street, Room 439, Madison, WI 53706
608-263-2268 / john.lala...@ssec.wisc.edu
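For reference, the link check Pavan suggests can be done as below. This is
a minimal sketch: the device name mlx4_0 is an assumption, so run plain
ibv_devinfo first to list the HCAs actually present on your node.

    # Print the negotiated link parameters for the assumed HCA mlx4_0.
    # QDR negotiates 4X lanes at 10.0 Gbps per lane, while DDR negotiates
    # 4X at 5.0 Gbps, so these two fields settle the DDR-vs-QDR question.
    ibv_devinfo -v -d mlx4_0 | grep -E 'active_width|active_speed'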
Re: [Gluster-users] gluster client performance
Thanks for your help, Pavan!

> Hi John,
>
> I would need some more information about your setup to estimate the
> performance you should get with your gluster setup.
>
> 1. Can you provide the details of how disks are connected to the storage
> boxes? Is it via FC? What RAID configuration is it using (if at all
> any)?

The disks are 2 TB near-line SAS, direct-attached via a PERC H700
controller (the Dell PowerEdge R515 has 12 3.5" drive bays). They are in
a RAID6 config, exported as a single volume that's split into 3
equal-size partitions (due to ext4's (well, e2fsprogs') 16 TB limit).

> 2. What is the disk bandwidth you are getting on the local filesystem on
> a given storage node? I mean, pick any of the 10 storage servers
> dedicated for Gluster storage and perform a dd as below:

Seeing an average of 740 MB/s write, 971 MB/s read.

> 3. What is the IB bandwidth that you are getting between the compute
> node and the glusterfs storage node? You can run the tool rdma_bw to get
> the details:

30407: Bandwidth peak (#0 to #976): 2594.58 MB/sec
30407: Bandwidth average: 2593.62 MB/sec
30407: Service Demand peak (#0 to #976): 978 cycles/KB
30407: Service Demand Avg: 978 cycles/KB

Here's our gluster config:

# gluster volume info data

Volume Name: data
Type: Distribute
Status: Started
Number of Bricks: 30
Transport-type: rdma
Bricks:
Brick1: data-3-1-infiniband.infiniband:/data-brick1/export
Brick2: data-3-3-infiniband.infiniband:/data-brick1/export
Brick3: data-3-5-infiniband.infiniband:/data-brick1/export
Brick4: data-3-7-infiniband.infiniband:/data-brick1/export
Brick5: data-3-9-infiniband.infiniband:/data-brick1/export
Brick6: data-3-11-infiniband.infiniband:/data-brick1/export
Brick7: data-3-13-infiniband.infiniband:/data-brick1/export
Brick8: data-3-15-infiniband.infiniband:/data-brick1/export
Brick9: data-3-17-infiniband.infiniband:/data-brick1/export
Brick10: data-3-19-infiniband.infiniband:/data-brick1/export
Brick11: data-3-1-infiniband.infiniband:/data-brick2/export
Brick12: data-3-3-infiniband.infiniband:/data-brick2/export
Brick13: data-3-5-infiniband.infiniband:/data-brick2/export
Brick14: data-3-7-infiniband.infiniband:/data-brick2/export
Brick15: data-3-9-infiniband.infiniband:/data-brick2/export
Brick16: data-3-11-infiniband.infiniband:/data-brick2/export
Brick17: data-3-13-infiniband.infiniband:/data-brick2/export
Brick18: data-3-15-infiniband.infiniband:/data-brick2/export
Brick19: data-3-17-infiniband.infiniband:/data-brick2/export
Brick20: data-3-19-infiniband.infiniband:/data-brick2/export
Brick21: data-3-1-infiniband.infiniband:/data-brick3/export
Brick22: data-3-3-infiniband.infiniband:/data-brick3/export
Brick23: data-3-5-infiniband.infiniband:/data-brick3/export
Brick24: data-3-7-infiniband.infiniband:/data-brick3/export
Brick25: data-3-9-infiniband.infiniband:/data-brick3/export
Brick26: data-3-11-infiniband.infiniband:/data-brick3/export
Brick27: data-3-13-infiniband.infiniband:/data-brick3/export
Brick28: data-3-15-infiniband.infiniband:/data-brick3/export
Brick29: data-3-17-infiniband.infiniband:/data-brick3/export
Brick30: data-3-19-infiniband.infiniband:/data-brick3/export
Options Reconfigured:
nfs.disable: on

--
John Lalande
University of Wisconsin-Madison
Space Science Engineering Center
1225 W. Dayton Street, Room 439, Madison, WI 53706
608-263-2268 / john.lala...@ssec.wisc.edu
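The dd command itself was trimmed from the quote above, so for reference,
a typical direct-I/O bandwidth test along the lines Pavan describes would
look like the sketch below. The file name and sizes are illustrative, not
Pavan's original command.

    # Write ~10 GB to the brick filesystem; oflag=direct bypasses the
    # page cache so the figure reflects disk bandwidth rather than RAM.
    dd if=/dev/zero of=/data-brick1/export/ddtest.img bs=1M count=10240 oflag=direct

    # Read the same file back, again bypassing the cache.
    dd if=/data-brick1/export/ddtest.img of=/dev/null bs=1M iflag=direct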
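For context, a volume of the shape shown in the output above would have
been created with commands like the following. This is a sketch using
Gluster 3.2-era syntax, with only two of the 30 bricks listed to keep it
short; the real create command would name all 30 in order.

    # Create a plain distribute volume over RDMA (two bricks shown; the
    # production volume lists all 30 in the same form).
    gluster volume create data transport rdma \
        data-3-1-infiniband.infiniband:/data-brick1/export \
        data-3-3-infiniband.infiniband:/data-brick1/export
    gluster volume start data

    # Matches the "nfs.disable: on" entry under Options Reconfigured.
    gluster volume set data nfs.disable on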
[Gluster-users] gluster client performance
Hi-

I'm new to Gluster, but am trying to get it set up on a new compute
cluster we're building. We picked Gluster for one of our cluster file
systems (we're also using Lustre for fast scratch space), but the Gluster
performance has been so bad that I think maybe we have a configuration
problem -- perhaps we're missing a tuning parameter that would help, but
I can't find anything in the Gluster documentation -- all the tuning info
I've found seems geared toward Gluster 2.x.

For some background, our compute cluster has 64 compute nodes. The
Gluster storage pool has 10 Dell PowerEdge R515 servers, each with 12 x
2 TB disks. We have another 16 Dell PowerEdge R515s used as Lustre
storage servers. The compute and storage nodes are all connected via QDR
InfiniBand, and both Gluster and Lustre are set to use RDMA over
InfiniBand. We are using OFED version 1.5.2-20101219, Gluster 3.2.2 and
CentOS 5.5 on both the compute and storage nodes.

Oddly, it seems like there's some sort of bottleneck on the client side
-- for example, we're only seeing about 50 MB/s write throughput from a
single compute node when writing a 10 GB file. But if we run multiple
simultaneous writes from multiple compute nodes to the same Gluster
volume, we get 50 MB/s from each compute node. However, running multiple
writes from the same compute node does not increase throughput. The
compute nodes have 48 cores and 128 GB RAM, so I don't think the issue
is with the compute node hardware.

With Lustre, on the same hardware, with the same version of OFED, we're
seeing write throughput on that same 10 GB file as follows: 476 MB/s for
a single-stream write from a single compute node, and aggregate
performance of more like 2.4 GB/s if we run simultaneous writes. That
leads me to believe that we don't have a problem with RDMA; otherwise
Lustre, which is also using RDMA, should be similarly affected.

We have tried both xfs and ext4 for the backend file system on the
Gluster storage nodes (we're currently using ext4). We went with
distributed (not distributed striped) for the Gluster volume -- the
thought was that if there was a catastrophic failure of one of the
storage nodes, we'd only lose the data on that node; presumably with
distributed striped you'd lose any data striped across that volume,
unless I have misinterpreted the documentation.

So ... what's expected/normal throughput for Gluster over QDR IB to a
relatively large storage pool (10 servers / 120 disks)? Does anyone have
suggested tuning tips for improving performance?

Thanks!

John

--
John Lalande
University of Wisconsin-Madison
Space Science Engineering Center
1225 W. Dayton Street, Room 439, Madison, WI 53706
608-263-2268 / john.lala...@ssec.wisc.edu
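For anyone reproducing the measurements described above, the single-node
multi-stream write case can be scripted as below. The mount point
/mnt/gluster/data and the stream count are assumptions, not details taken
from the thread.

    # Launch four concurrent ~10 GB writes from one client against the
    # Gluster mount, wait for all of them, and report the wall time.
    time (
        for i in 1 2 3 4; do
            dd if=/dev/zero of=/mnt/gluster/data/stream.$i bs=1M count=10240 &
        done
        wait
    )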
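On the tuning question, the 3.2 series does expose client- and
server-side performance translator options through gluster volume set.
The option names below exist in that series as far as I know, but the
values are illustrative starting points, not tested recommendations for
this cluster.

    # Let the client batch more writes per network round trip.
    gluster volume set data performance.write-behind-window-size 4MB
    # More worker threads in the brick-side io-threads translator.
    gluster volume set data performance.io-thread-count 32
    # Larger client-side read cache (io-cache translator).
    gluster volume set data performance.cache-size 256MB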