Re: [Lustre-discuss] How to achieve 20GB/s file system throughput?

2010-07-23 Thread Joe Landman
On 07/23/2010 10:25 PM, henry...@dell.com wrote:
> Hello,
>
> One of my customers wants to set up an HPC cluster with thousands of
> compute nodes. The parallel file system should have 20GB/s of throughput.
> I am not sure whether Lustre can deliver that. How many IO nodes are
> needed to achieve this target?

I hate to say "it depends," but it does in fact depend on many things.
What type of IO is the customer doing: large-block sequential spread
out over many nodes (parallel IO), small-block random, or a mixture?

It is possible to achieve 20GB/s, and quite a bit more, using Lustre.
Whether that 20GB/s is meaningful to their code(s) is a different
question.  It would be 20GB/s in aggregate, over possibly many
compute nodes doing IO.

> My assumption is that 100 or more IO nodes (rack servers) are needed.

Hmmm ... if you can achieve 500+ MB/s per OST, then you would need about
40 OSTs, and each OSS can handle several OSTs.  There are efficiency
losses you should be aware of, but 20GB/s, by some reasonable measure,
should be possible with a realistic number of units.  Don't forget to
account for those efficiency losses in the design.
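
To make that arithmetic concrete, a minimal sizing sketch in Python; the
per-OST rate, efficiency factor, and OSTs-per-OSS count are illustrative
assumptions, not measurements from any particular system:

    import math

    # Illustrative sizing for a 20 GB/s aggregate target. With
    # efficiency = 1.0 this reduces to the ~40 OSTs mentioned above.
    target_gbps  = 20.0   # aggregate throughput target, GB/s
    per_ost_gbps = 0.5    # assumed sustained rate per OST, GB/s
    efficiency   = 0.7    # assumed fraction of peak actually delivered
    osts_per_oss = 3      # assumed number of OSTs handled by each OSS

    osts  = math.ceil(target_gbps / (per_ost_gbps * efficiency))
    osses = math.ceil(osts / osts_per_oss)
    print(f"OSTs needed:  {osts}")    # 58 with these assumptions
    print(f"OSSes needed: {osses}")   # 20 with these assumptions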

100 IO nodes ... I presume you mean OSSes?

If your units are slower, then yes, you will need more of them to 
achieve this performance.

You would also need to make sure you have a well-designed and correctly
functioning InfiniBand infrastructure, in addition to addressing the
other issues.  We've found that Lustre is ... very sensitive ... to the
InfiniBand implementation.

Regards,

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


Re: [Lustre-discuss] How to achieve 20GB/s file system throughput?

2010-07-24 Thread Bernd Schubert
On Saturday, July 24, 2010, henry...@dell.com wrote:
> Hello,
>
> One of my customers wants to set up an HPC cluster with thousands of
> compute nodes. The parallel file system should have 20GB/s of
> throughput. I am not sure whether Lustre can deliver that. How many IO
> nodes are needed to achieve this target?
>
> My assumption is that 100 or more IO nodes (rack servers) are needed.

I'm a bit prejudiced, of course, but with DDN storage that would be quite 
simple. With the older DDN S2A 9990 you can get 5GB/s per controller pair, 
and with the newer SFA1 you can get 6.5 to 7GB/s per controller pair (we 
are still tuning it).
Each controller pair (a "couplet" in DDN terms) usually has 4 servers 
connected and fits into a single rack in a 300-drive configuration.
So you can get 20GB/s with 3 or 4 racks and 12 or 16 OSS servers, which is 
well below your 100 IO nodes ;)
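
As a quick check on that arithmetic, a small sketch using the per-couplet
figures quoted above (the 7GB/s value is the upper end of the SFA range;
four servers and roughly one rack per couplet follow the description):

    import math

    # Couplet count for a 20 GB/s target, using the per-couplet rates
    # quoted above; servers/racks per couplet follow the description.
    target_gbps = 20.0
    for name, per_couplet_gbps in [("S2A 9990", 5.0), ("SFA", 7.0)]:
        couplets = math.ceil(target_gbps / per_couplet_gbps)
        print(f"{name}: {couplets} couplets, "
              f"{couplets * 4} OSS servers, ~{couplets} racks")
    # S2A 9990: 4 couplets, 16 OSS servers, ~4 racks
    # SFA:      3 couplets, 12 OSS servers, ~3 racks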

Cheers,
Bernd

-- 
Bernd Schubert
DataDirect Networks


Re: [Lustre-discuss] How to achieve 20GB/s file system throughput?

2010-07-24 Thread hung-sheng tsao
You may want to check out http://www.terascala.com, which provides a
Lustre appliance. They claim their rts1000 offers 20TB per enclosure
and 2GB/s of throughput.
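
Against the 20GB/s target in this thread, that puts the enclosure count
at roughly the following (assuming the quoted 2GB/s per enclosure adds up
linearly, which is an assumption rather than a vendor claim):

    import math

    # Enclosure count for a 20 GB/s target, assuming the quoted
    # 2 GB/s per enclosure scales linearly.
    target_gbps = 20.0
    per_enclosure_gbps = 2.0
    print(math.ceil(target_gbps / per_enclosure_gbps))  # 10 enclosures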
regards



On Fri, Jul 23, 2010 at 10:25 PM, henry...@dell.com wrote:

> Hello,
>
> One of my customers wants to set up an HPC cluster with thousands of
> compute nodes. The parallel file system should have 20GB/s of
> throughput. I am not sure whether Lustre can deliver that. How many IO
> nodes are needed to achieve this target?
>
> My assumption is that 100 or more IO nodes (rack servers) are needed.
>
> Thanks in advance!
>
> Henry Xu,
> System Consultant,


-- 
Hung-Sheng Tsao, Ph.D. 
laot...@gmail.com
http://laotsao.wordpress.com
9734950840


Re: [Lustre-discuss] How to achieve 20GB/s file system throughput?

2010-07-24 Thread Joe Landman
I hate to reply to myself ... and this is not meant as an advertisement.

On 07/23/2010 10:50 PM, Joe Landman wrote:
> On 07/23/2010 10:25 PM, henry...@dell.com wrote:

[...]

> It is possible to achieve 20GB/s, and quite a bit more, using Lustre.
> Whether that 20GB/s is meaningful to their code(s) is a different
> question.  It would be 20GB/s in aggregate, over possibly many
> compute nodes doing IO.

I should point out that we have customers with 20GB/s maximum 
theoretical configurations (best-case scenarios) with our siCluster 
(http://scalableinformatics.com/sicluster), using 8 IO units.  Their 
write patterns and InfiniBand configurations don't seem to allow them to 
achieve this in practice.  Simple benchmark tests (mixtures of LLNL 
mpi-io, io-bm, iozone, ...) show sustained results north of 12 GB/s for 
them.

Again, to set expectations: most users' codes never utilize storage 
systems very effectively, so you might design a 20GB/s storage system 
and find that the IO being done doesn't hit much above 500 MB/s for 
single threads.
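
To put that in perspective, a minimal sketch of how much concurrency it
takes to fill the pipe (the 500 MB/s per-stream rate is the illustrative
figure above, not a measurement of any particular code):

    import math

    # Concurrent writer streams needed to fill a 20 GB/s aggregate
    # system if each stream sustains only ~500 MB/s.
    aggregate_gbps  = 20.0
    per_stream_gbps = 0.5
    print(math.ceil(aggregate_gbps / per_stream_gbps))  # 40 streams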

>> My assumption is that 100 or more IO nodes (rack servers) are needed.
> Hmmm ... if you can achieve 500+ MB/s per OST, then you would need about
> 40 OSTs, and each OSS can handle several OSTs.  There are efficiency
> losses you should be aware of, but 20GB/s, by some reasonable measure,
> should be possible with a realistic number of units.  Don't forget to
> account for those efficiency losses in the design.

We do this in 8 machines (at theoretical max performance), and could put 
this in a single rack.  We prefer to break it out among more IO nodes, 
say 16-24 smaller nodes, with 2-3 OSTs per OSS (i.e. IO node).

My comments are meant to make sure your customer understands the 
efficiency issues, and that simple Fortran writes from a single thread 
aren't going to run at 20GB/s.  That is, much like a compute cluster, a 
storage cluster has an aggregate bandwidth that no single node or 
reader/writer can achieve on its own.

Regards,

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615