Re: [Lustre-discuss] How to achieve 20GB/s file system throughput?

2010-07-24 Thread Bernd Schubert
On Saturday, July 24, 2010, henry...@dell.com wrote:
 Hello,
 
 One of my customers wants to set up an HPC cluster with thousands of
 compute nodes. The parallel file system should deliver 20GB/s of
 throughput. I am not sure whether Lustre can achieve this. How many IO
 nodes are needed to reach this target?
 
 My assumption is that 100 or more IO nodes (rack servers) are needed.
 

I'm a bit prejudiced, of course, but with DDN storage that would be quite 
simple. With the older DDN S2A 9990 you can get 5GB/s per controller pair, 
and with the newer SFA1 you can get 6.5 to 7GB/s (we are still tuning it) 
per controller pair.
Each controller pair (a couplet in DDN terms) usually has 4 servers 
connected and fits into a single rack in a 300-drive configuration.
So you can get 20GB/s with 3 or 4 racks and 12 or 16 OSS servers, which is 
well below your 100 IO nodes ;)
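For what it's worth, the arithmetic behind that sizing can be sketched as a
small Python back-of-the-envelope calculation; the 5 and 7 GB/s per-couplet
figures and the 4 servers per couplet are just the numbers above, rounded,
not guarantees:

# Couplet/OSS sizing sketch -- per-couplet throughput values are the
# rough figures from the thread above, not measured guarantees.
import math

def couplets_needed(target_gbs, per_couplet_gbs, servers_per_couplet=4):
    couplets = math.ceil(target_gbs / per_couplet_gbs)
    return couplets, couplets * servers_per_couplet

for per_couplet in (5.0, 7.0):
    couplets, oss = couplets_needed(20.0, per_couplet)
    print(f"{per_couplet} GB/s per couplet: {couplets} couplets, {oss} OSS servers")

With 5 GB/s per couplet that works out to 4 couplets and 16 OSS servers;
with 7 GB/s, 3 couplets and 12 OSS servers, i.e. the 3-4 racks mentioned
above.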

Cheers,
Bernd

-- 
Bernd Schubert
DataDirect Networks


Re: [Lustre-discuss] How to achieve 20GB/s file system throughput?

2010-07-24 Thread Joe Landman
I hate to reply to myself ... and this is not an advertisement:

On 07/23/2010 10:50 PM, Joe Landman wrote:
 On 07/23/2010 10:25 PM, henry...@dell.com wrote:

[...]

 It is possible to achieve 20GB/s, and quite a bit more, using Lustre.
 As to whether or not that 20GB/s is meaningful to their code(s), that's
 a different question.  It would be 20GB/s in aggregate, over possibly
 many compute nodes doing IO.

I should point out that we have customers with 20GB/s maximum 
theoretical configs (best-case scenarios) with our siCluster 
(http://scalableinformatics.com/sicluster), using 8 IO units.  Their 
write patterns and InfiniBand configurations don't seem to allow them to 
achieve this in practice.  Simple benchmark tests (mixtures of LLNL 
mpi-io, io-bm, iozone, ...) show sustained results north of 12 GB/s for 
them.

Again, to set expectations: most users' codes never utilize storage 
systems very effectively, hence you might design a 20GB/s storage 
system and find that the IO actually being done doesn't hit much above 
500 MB/s for single threads.

 My assumption is that 100 or more IO nodes (rack servers) are needed.
 Hmmm ... if you can achieve 500+ MB/s per OST, then you would need about
 40 OSTs.  You can have each OSS handle several OSTs.  There are
 efficiency losses you should be aware of, but 20GB/s, by some mechanism
 of measuring it, should be possible with a realistic number of units.
 Don't forget to count efficiency losses in the design.

We do this in 8 machines (theoretical max performance), and could put 
this into a single rack.  We prefer to break it out among more IO nodes, 
say 16-24 smaller nodes, with 2-3 OSTs per OSS (i.e. IO node).
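To make the derating concrete, here is a rough Python sketch: the 500 MB/s
per OST figure is the one discussed above, while the 70% efficiency factor
and 3 OSTs per OSS are hypothetical values chosen only to illustrate the
calculation:

# Illustration only: 500 MB/s per OST comes from the thread above; the
# 70% efficiency factor and 3 OSTs per OSS are hypothetical numbers used
# just to show the derating arithmetic.
import math

def osts_needed(target_gbs, per_ost_mbs=500, efficiency=0.7):
    usable_gbs = (per_ost_mbs / 1000.0) * efficiency
    return math.ceil(target_gbs / usable_gbs)

osts = osts_needed(20.0)
print(osts, "OSTs, or about", math.ceil(osts / 3), "OSS nodes at 3 OSTs each")

With those assumed numbers you end up around 58 OSTs, or roughly 20 OSS
nodes, which is in the same ballpark as the 16-24 node layout mentioned
above.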

My comments are to make sure your customer understands the efficiency 
issues, and that simple Fortran writes from a single thread aren't going 
to run at 20GB/s.  That is, much as with a compute cluster, a storage 
cluster has an aggregate bandwidth that a single node or reader/writer 
cannot achieve on its own.

Regards,

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


Re: [Lustre-discuss] panic on jbd:journal_dirty_metadata

2010-07-24 Thread Michael Sternberg
Wojciech,

Thank you very much for the pointer.  Perhaps the fact that the OSTs are 
nearly full contributes(?); I also see higher usage.

In any case, I'll attempt compilation with the patch applied.


With best regards,
Michael


On Jul 22, 2010, at 9:16 , Wojciech Turek wrote:

 Hi Michael,
 
 This looks like the problem we had some time ago after upgrading to 1.8.3
 
   https://bugzilla.lustre.org/show_bug.cgi?id=22889
 
 Best regards
 Wojciech
 
 On 20 July 2010 00:00, Michael Sternberg sternb...@anl.gov wrote:
 Hello,
 
 I use OSSs with external journal partitions, and since lustre-1.8.1 I have 
 been getting frustrating panics on the OSSs about once or twice a week, as 
 follows:
 
:libcfs:cfs_alloc ...
:lvfs:lprocfs_counter_add ...
...
 
RIP [88031e64] :jbd:journal_dirty_metadata+0x7f/0x1e3
  RSP 8101f99c3670
  0Kernel panic - not syncing: Fatal exception
 
