Re: [Gluster-users] gluster client performance

2011-07-27 Thread John Lalande

On 07/27/2011 12:53 AM, Pavan T C wrote:




2. What is the disk bandwidth you are getting on the local filesystem
on a given storage node ? I mean, pick any of the 10 storage servers
dedicated for Gluster Storage and perform a dd as below:

Seeing an average of 740 MB/s write, 971 MB/s read.


I presume you did this in one of the /data-brick*/export directories ?
Command output along with the command line would have been clearer, but 
that's fine.

That is correct -- we used /data-brick1/export.




3. What is the IB bandwidth that you are getting between the compute
node and the glusterfs storage node? You can run the tool rdma_bw to
get the details:

30407: Bandwidth peak (#0 to #976): 2594.58 MB/sec
30407: Bandwidth average: 2593.62 MB/sec
30407: Service Demand peak (#0 to #976): 978 cycles/KB
30407: Service Demand Avg : 978 cycles/KB



This looks like a DDR connection. ibv_devinfo -v will tell a better 
story about the line width and speed of your infiniband connection.

QDR should have a much higher bandwidth.
But that still does not explain why you should get as low as 50 MB/s 
for a single stream single client write when the backend can support 
direct IO throughput of more than 700 MB/s.

ibv_devinfo shows 4x for active width and 10 Gbps for active speed. Not 
sure why we're not seeing better bandwidth with rdma_bw -- we'll have to 
troubleshoot that some more -- but I agree, it shouldn't be the limiting 
factor as far as the Gluster client speed problems we're seeing.
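
(For anyone checking their own fabric: 4X width at 10.0 Gbps per-lane 
signaling is QDR -- 40 Gbit/s raw, roughly 32 Gbit/s of data after 8b/10b 
encoding -- so the link itself looks right. The relevant fields can be 
pulled out with something like the following; run it on both ends:)

```shell
# Show just the negotiated link parameters on this host.
# Expect "4X" and "10.0 Gbps" for QDR; DDR would show 5.0 Gbps.
ibv_devinfo -v | grep -E 'active_(width|speed)'
```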


I'll send you the log files you requested off-list.

John

--



John Lalande
University of Wisconsin-Madison
Space Science & Engineering Center
1225 W. Dayton Street, Room 439, Madison, WI 53706
608-263-2268 / john.lala...@ssec.wisc.edu





___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] gluster client performance

2011-07-26 Thread John Lalande

Thanks for your help, Pavan!


Hi John,

I would need some more information about your setup to estimate the 
performance you should get with your gluster setup.


1. Can you provide the details of how disks are connected to the 
storage boxes ? Is it via FC ? What raid configuration is it using (if 
at all any) ?

The disks are 2 TB near-line SAS, direct-attached via a PERC H700 
controller (the Dell PowerEdge R515 has 12 3.5" drive bays). They are in 
a RAID6 config, exported as a single volume, that's split into 3 
equal-size partitions (due to ext4's (well, e2fsprogs') 16 TB limit).


2. What is the disk bandwidth you are getting on the local filesystem 
on a given storage node ? I mean, pick any of the 10 storage servers 
dedicated for Gluster Storage and perform a dd as below:

Seeing an average of 740 MB/s write, 971 MB/s read.
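
(The dd command line itself didn't survive the quoting above. For anyone 
reproducing this, a typical direct-I/O test against a brick directory 
looks roughly like the following -- the path is from our layout and the 
sizes are illustrative:)

```shell
# Write 10 GB with O_DIRECT so the page cache doesn't inflate the result.
dd if=/dev/zero of=/data-brick1/export/ddtest bs=1M count=10240 oflag=direct
# Read it back, again bypassing the cache.
dd if=/data-brick1/export/ddtest of=/dev/null bs=1M iflag=direct
rm -f /data-brick1/export/ddtest
```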



3. What is the IB bandwidth that you are getting between the compute 
node and the glusterfs storage node? You can run the tool rdma_bw to 
get the details:

30407: Bandwidth peak (#0 to #976): 2594.58 MB/sec
30407: Bandwidth average: 2593.62 MB/sec
30407: Service Demand peak (#0 to #976): 978 cycles/KB
30407: Service Demand Avg  : 978 cycles/KB
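
(For anyone unfamiliar with the tool: rdma_bw runs as a server/client 
pair, roughly as below. The hostname here is one of our storage nodes; 
yours will differ:)

```shell
# On the storage node, start the server side (it waits for a peer):
rdma_bw
# On the compute node, point the client at the server:
rdma_bw data-3-1-infiniband.infiniband
```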


Here's our gluster config:

# gluster volume info data

Volume Name: data
Type: Distribute
Status: Started
Number of Bricks: 30
Transport-type: rdma
Bricks:
Brick1: data-3-1-infiniband.infiniband:/data-brick1/export
Brick2: data-3-3-infiniband.infiniband:/data-brick1/export
Brick3: data-3-5-infiniband.infiniband:/data-brick1/export
Brick4: data-3-7-infiniband.infiniband:/data-brick1/export
Brick5: data-3-9-infiniband.infiniband:/data-brick1/export
Brick6: data-3-11-infiniband.infiniband:/data-brick1/export
Brick7: data-3-13-infiniband.infiniband:/data-brick1/export
Brick8: data-3-15-infiniband.infiniband:/data-brick1/export
Brick9: data-3-17-infiniband.infiniband:/data-brick1/export
Brick10: data-3-19-infiniband.infiniband:/data-brick1/export
Brick11: data-3-1-infiniband.infiniband:/data-brick2/export
Brick12: data-3-3-infiniband.infiniband:/data-brick2/export
Brick13: data-3-5-infiniband.infiniband:/data-brick2/export
Brick14: data-3-7-infiniband.infiniband:/data-brick2/export
Brick15: data-3-9-infiniband.infiniband:/data-brick2/export
Brick16: data-3-11-infiniband.infiniband:/data-brick2/export
Brick17: data-3-13-infiniband.infiniband:/data-brick2/export
Brick18: data-3-15-infiniband.infiniband:/data-brick2/export
Brick19: data-3-17-infiniband.infiniband:/data-brick2/export
Brick20: data-3-19-infiniband.infiniband:/data-brick2/export
Brick21: data-3-1-infiniband.infiniband:/data-brick3/export
Brick22: data-3-3-infiniband.infiniband:/data-brick3/export
Brick23: data-3-5-infiniband.infiniband:/data-brick3/export
Brick24: data-3-7-infiniband.infiniband:/data-brick3/export
Brick25: data-3-9-infiniband.infiniband:/data-brick3/export
Brick26: data-3-11-infiniband.infiniband:/data-brick3/export
Brick27: data-3-13-infiniband.infiniband:/data-brick3/export
Brick28: data-3-15-infiniband.infiniband:/data-brick3/export
Brick29: data-3-17-infiniband.infiniband:/data-brick3/export
Brick30: data-3-19-infiniband.infiniband:/data-brick3/export
Options Reconfigured:
nfs.disable: on



[Gluster-users] gluster client performance

2011-07-25 Thread John Lalande

Hi-

I'm new to Gluster, but am trying to get it set up on a new compute 
cluster we're building. We picked Gluster for one of our cluster file 
systems (we're also using Lustre for fast scratch space), but the 
Gluster performance has been so bad that I think maybe we have a 
configuration problem -- perhaps we're missing a tuning parameter that 
would help, but I can't find anything in the Gluster documentation -- 
all the tuning info I've found seems geared toward Gluster 2.x.


For some background, our compute cluster has 64 compute nodes. The 
gluster storage pool has 10 Dell PowerEdge R515 servers, each with 12 x 
2 TB disks. We have another 16 Dell PowerEdge R515s used as Lustre 
storage servers. The compute and storage nodes are all connected via QDR 
Infiniband. Both Gluster and Lustre are set to use RDMA over Infiniband. 
We are using OFED version 1.5.2-20101219, Gluster 3.2.2 and CentOS 5.5 
on both the compute and storage nodes.


Oddly, it seems like there's some sort of bottleneck on the client side 
-- for example, we're only seeing about 50 MB/s write throughput from a 
single compute node when writing a 10GB file. But, if we run multiple 
simultaneous writes from multiple compute nodes to the same Gluster 
volume, we get 50 MB/s from each compute node. However, running multiple 
writes from the same compute node does not increase throughput. The 
compute nodes have 48 cores and 128 GB RAM, so I don't think the issue 
is with the compute node hardware.
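
(For concreteness, the same-node multiple-writes behavior can be seen 
with a test along these lines -- mount point hypothetical. Per the 
above, aggregate throughput from one node stays around 50 MB/s rather 
than scaling:)

```shell
# Two concurrent 10 GB streams from a single client to the Gluster mount.
for i in 1 2; do
  dd if=/dev/zero of=/mnt/gluster/stream.$i bs=1M count=10240 &
done
wait
```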


With Lustre, on the same hardware, with the same version of OFED, we're 
seeing write throughput on that same 10 GB file as follows: 476 MB/s 
single stream write from a single compute node and aggregate performance 
of more like 2.4 GB/s if we run simultaneous writes. That leads me to 
believe that we don't have a problem with RDMA, otherwise Lustre, which 
is also using RDMA, should be similarly affected.


We have tried both xfs and ext4 for the backend file system on the 
Gluster storage nodes (we're currently using ext4). We went with 
distributed (not distributed striped) for the Gluster volume -- the 
thought was that if there was a catastrophic failure of one of the 
storage nodes, we'd only lose the data on that node; presumably with 
distributed striped you'd lose any data striped across that volume, 
unless I have misinterpreted the documentation.
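
(For reference, a distribute-only volume is created without the stripe 
keyword -- sketch below, with the brick list cut to two entries. With 
plain distribute each file lives whole on a single brick, so a dead node 
only takes its own files with it:)

```shell
# Plain distribute over RDMA: no 'stripe COUNT', no 'replica COUNT'.
gluster volume create data transport rdma \
    data-3-1-infiniband.infiniband:/data-brick1/export \
    data-3-3-infiniband.infiniband:/data-brick1/export
```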


So ... what's expected/normal throughput for Gluster over QDR IB to a 
relatively large storage pool (10 servers / 120 disks)? Does anyone have 
suggested tuning tips for improving performance?
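
(In case concrete syntax helps anyone answering: tuning in 3.x is done 
per-volume with gluster volume set. The option names below are from the 
3.x performance translators as I understand them -- treat the values as 
illustrative, not recommendations:)

```shell
# Grow the write-behind buffer, read cache, and io-thread pool.
gluster volume set data performance.write-behind-window-size 4MB
gluster volume set data performance.cache-size 256MB
gluster volume set data performance.io-thread-count 16
```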


Thanks!

John
