[lustre-discuss] IO500 @ ISC19

2019-04-18 Thread John Bent
Call for Submission

*Deadline*: 10 June 2019 AoE

The IO500 is now accepting and encouraging submissions for the upcoming 4th
IO500 list to be revealed at ISC-HPC 2019 in Frankfurt, Germany. Once
again, we are also accepting submissions to the 10 node I/O challenge to
encourage submission of small scale results. The new ranked lists will be
announced at our ISC19 BoF [2]. We hope to see you, and your results, there.

The benchmark suite is designed to be easy to run and the community has
multiple active support channels to help with any questions. Please submit
and we look forward to seeing many of you at ISC 2019! Please note that
submissions of all sizes are welcome; the site has customizable sorting, so
it is possible to submit on a small system and still get a very good
per-client score, for example. Additionally, the list is about much more
than just the raw rank; all submissions help the community by collecting
and publishing a wider corpus of data. More details below.

Following the success of the Top500 in collecting and analyzing historical
trends in supercomputer technology and evolution, the IO500 was created in
2017, published its first list at SC17, and has grown exponentially since
then. The need for such an initiative has long been known within
High-Performance Computing; however, defining appropriate benchmarks had
long been challenging. Despite this challenge, the community, after long
and spirited discussion, finally reached consensus on a suite of benchmarks
and a metric for resolving the scores into a single ranking.

The multi-fold goals of the benchmark suite are as follows:

   1. Maximizing simplicity in running the benchmark suite
   2. Encouraging complexity in tuning for performance
   3. Allowing submitters to highlight their “hero run” performance numbers
   4. Forcing submitters to simultaneously report performance for
   challenging IO patterns.

Specifically, the benchmark suite includes a hero-run of both IOR and
mdtest configured however possible to maximize performance and establish an
upper-bound for performance. It also includes an IOR and mdtest run with
highly prescribed parameters in an attempt to determine a lower-bound.
Finally, it includes a namespace search as this has been determined to be a
highly sought-after feature in HPC storage systems that has historically
not been well-measured. Submitters are encouraged to share their tuning
insights for publication.

The goals of the community are also multi-fold:

   1. Gather historical data for the sake of analysis and to aid
   predictions of storage futures
   2. Collect tuning information to share valuable performance
   optimizations across the community
   3. Encourage vendors and designers to optimize for workloads beyond
   “hero runs”
   4. Establish bounded expectations for users, procurers, and
   administrators

10 Node I/O Challenge

At ISC, we will announce our second IO-500 award for the 10 Node Challenge.
This challenge is conducted using the regular IO-500 benchmark, but with
the rule that exactly *10 compute nodes* must be used to run the
benchmark (one exception is find, which may use 1 node). You may use any
shared storage with, e.g., any number of servers. When submitting for the
IO-500 list, you can opt in to “Participate in the 10 compute node
challenge only”; we then won't include the results in the ranked list.
Other 10 compute node submissions will be included in the full list and in
the ranked list. We will announce the result in a separate derived list and
in the full list but not on the ranked IO-500 list at io500.org.
Birds-of-a-feather

Once again, we encourage you to submit [1], to join our community, and to
attend our BoF “The IO-500 and the Virtual Institute of I/O” at ISC 2019
[2] where we will announce the fourth IO500 list and second 10 node
challenge list. The current list includes results from BeeGFS, DataWarp,
IME, Lustre, Spectrum Scale, and WekaIO. We hope that the next list has
even more.

We look forward to answering any questions or concerns you might have.

   - [1] http://io500.org/submission
   - [2] The BoF schedule will be announced soon

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] soft lockup native_queued_spin_lock_slowpath issue

2019-04-18 Thread Bidwell, Matt
In case anyone was looking through this, the recommendation from our vendor was 
to modify lru_max_age instead. Our OSTs are right where they recommended (5 
minutes), but our MDTs were off by 60 minutes. -Matt

-Original Message-
From: lustre-discuss  On Behalf Of 
Bidwell, Matt
Sent: Thursday, April 11, 2019 2:53 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] soft lockup native_queued_spin_lock_slowpath issue

I seem to be hitting a known Lustre client issue, as I found several bug 
reports similar to https://jira.whamcloud.com/browse/LU-11693. I'm running 
lustre-client-2.10.5. Running the proposed temporary solution on test clients, 
lctl set_param ldlm.namespaces.*.lru_size=1, appears to fix this issue. 
I'm not really sure of the significance of this command. Is it safe to run on 
clients while jobs are running? I'd like to run this now, and then install a 
patched client during our next system time. Thanks. -Matt 


Re: [lustre-discuss] inodes not adding up

2019-04-18 Thread Andreas Dilger
On Apr 15, 2019, at 12:56, Mohr Jr, Richard Frank (Rick Mohr) wrote:
> 
>> On Apr 13, 2019, at 4:57 AM, Youssef Eldakar wrote:
>> 
>> For one Lustre filesystem, inode count in the summary is notably less than 
>> what the individual OST inode counts would add up to:
> 
> The first thing to understand is that every Lustre file will consume one 
> inode on the MDT, and this inode uses attributes to store information about 
> which OSTs the file is striped over.  Then for each file stripe, there will 
> also be an inode consumed on the corresponding OSTs.  For example, a file 
> with stripe_count=4 will consume one inode on the MDT and four inodes on OSTs 
> (one inode on each OST the file is striped over).
> 
>> # lfs df -i /lfs01
>> UUID                       Inodes      IUsed      IFree IUse% Mounted on
>> lustrefs-MDT_UUID      2402287616   46560885 2355726731    2% /share/lfs01[MDT:0]
>> lustrefs-OST0001_UUID    24117248   22883788    1233460   95% /share/lfs01[OST:1]
>> lustrefs-OST0003_UUID    24117248   22903308    1213940   95% /share/lfs01[OST:3]
>> lustrefs-OST0004_UUID    24117248   22895442    1221806   95% /share/lfs01[OST:4]
>> lustrefs-OST0006_UUID    24117248   22890201    1227047   95% /share/lfs01[OST:6]
>> 
>> filesystem_summary:      51457138   46560885    4896253   90% /share/lfs01
> 
> On this file system, there are already 46,560,885 files which also consume 
> the same number of inodes on the MDT (so IUsed=46560885).  However, even 
> though the MDT has over 2 billion inodes free, every file created in the 
> future will use at least one inode on an OST.  If you add up all the free 
> inodes on all the OSTs, you get 4896253.  So at best, there is only space for 
> 4,896,253 more files.  That is why IFree=4896253.  Then, Inodes = IUsed + 
> IFree = 46,560,885 + 4,896,253 = 51,457,138.
> 
>> On another filesystem, this is not the case:
>> 
>> # lfs df -i /lfs02
>> UUID                       Inodes      IUsed      IFree IUse% Mounted on
>> lustrefs-MDT_UUID      1288503296   19222318 1269280978    1% /share/lfs02[MDT:0]
>> lustrefs-OST0001_UUID    24117248    5942156   18175092   25% /share/lfs02[OST:1]
>> lustrefs-OST0002_UUID    24117248    5816469   18300779   24% /share/lfs02[OST:2]
>> lustrefs-OST0003_UUID    24117248    5982962   18134286   25% /share/lfs02[OST:3]
>> 
>> filesystem_summary:      73832475   19222318   54610157   26% /share/lfs02
> 
> Again, there are already 19,222,318 files on the file system, so 
> IUsed=19222318.   All the OSTs together only have 18,175,092 + 18,300,779 + 
> 18,134,286 = 54,610,157 inodes available, so IFree=54610157.  And Inodes = 
> IUsed + IFree = 73832475.
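
The accounting described above can be reproduced with a short Python sketch. The numbers are copied from the two lfs df -i listings; the function name summarize_inodes is made up for illustration, and taking the minimum of the MDT free count and the combined OST free count is an assumption that generalizes the explanation (in both listings the OSTs are the limiting side). It is not taken from the lfs source.

```python
# Values copied from the two `lfs df -i` listings quoted above.
LFS01 = {
    "mdt_used": 46560885,
    "mdt_free": 2355726731,
    "ost_free": [1233460, 1213940, 1221806, 1227047],
}
LFS02 = {
    "mdt_used": 19222318,
    "mdt_free": 1269280978,
    "ost_free": [18175092, 18300779, 18134286],
}


def summarize_inodes(mdt_used, mdt_free, ost_free):
    """Return (Inodes, IUsed, IFree) as shown on the filesystem_summary line.

    Assumption: every new file needs one MDT inode and at least one OST
    object, so the usable free count is capped by the scarcer side.
    """
    ifree = min(mdt_free, sum(ost_free))
    iused = mdt_used
    return iused + ifree, iused, ifree


for name, fs in (("lfs01", LFS01), ("lfs02", LFS02)):
    inodes, iused, ifree = summarize_inodes(**fs)
    print(f"{name}: Inodes={inodes} IUsed={iused} IFree={ifree}")
```

For both filesystems the computed triple matches the filesystem_summary line in the listings.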

Thanks to Rick for the good explanation here.  One thing to add is that it 
appears that the /lfs01 filesystem has a default stripe_count=2, since there 
are 46560885 inodes used on the MDT and 91572739 total objects used on the 
four OSTs, and 91572739/46560885 ≈ 1.97 OST objects per MDT inode.
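
The ratio can be checked with a few lines of Python. This is only a sketch: the IUsed values are copied from the lfs df -i /lfs01 listing, and the variable names are illustrative.

```python
# IUsed values copied from the `lfs df -i /lfs01` listing quoted earlier.
mdt_inodes_used = 46560885
ost_objects_used = [22883788, 22903308, 22895442, 22890201]

# Each file stripe consumes one object (inode) on an OST.
total_objects = sum(ost_objects_used)
objects_per_file = total_objects / mdt_inodes_used

print(f"total OST objects: {total_objects}")
print(f"OST objects per MDT inode: {objects_per_file:.2f}")
```

A value just under 2 is consistent with most files being striped over two OSTs.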

If you have a large number of small files, you don't need a high stripe count.

Cheers, Andreas
---
Andreas Dilger
Principal Lustre Architect
Whamcloud






