It looks like you've already increased arc_meta_limit beyond the default, which 
is c_max / 4. That was critical to performance in our testing.
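For reference, on ZFS-on-Linux the limit is exposed as a module parameter. A
minimal sketch of checking and raising it (the 12 GiB value below is purely
illustrative, not a recommendation):

# current metadata limit and usage, in bytes
grep -E 'arc_meta_(limit|used)' /proc/spl/kstat/zfs/arcstats

# raise the limit at runtime (12884901888 bytes = 12 GiB, example value only)
echo 12884901888 > /sys/module/zfs/parameters/zfs_arc_meta_limit

# make it persistent across reboots
echo "options zfs zfs_arc_meta_limit=12884901888" >> /etc/modprobe.d/zfs.conf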

There is also a patch from Brian that should help performance in your case:
http://review.whamcloud.com/10237

Cheers, Andreas

On Jun 11, 2014, at 12:53, "Scott Nolin" <scott.no...@ssec.wisc.edu> wrote:

We tried a few arc tunables as noted here:

https://jira.hpdd.intel.com/browse/LU-2476

However, I didn't find any clear benefit in the long term. We were just trying 
a few things without a lot of insight.

Scott

On 6/9/2014 12:37 PM, Anjana Kar wrote:
Thanks for all the input.

Before we move away from zfs MDT, I was wondering if we can try setting zfs
tunables to test the performance. Basically what's a value we can use for
arc_meta_limit for our system? Are there any other settings that can
be changed?

Generating small files on our current system, things started off at 500
files/sec, then declined to about 1/20th of that rate after 2.45 million files.

-Anjana

On 06/09/2014 10:27 AM, Scott Nolin wrote:
We ran some scrub performance tests, and even without tunables set it
wasn't too bad for our specific configuration. The main thing we did
was verify it made sense to scrub all OSTs simultaneously.
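
For anyone repeating this, a rough sketch of what "scrub all OSTs
simultaneously" looked like in practice; the pool names are made up:

# start scrubs on every OST pool at once (pool names are examples only)
for pool in ost0pool ost1pool ost2pool ost3pool; do
    zpool scrub "$pool"
done

# check progress; the "scan:" line shows rate and estimated completion
zpool status | grep -A 2 scan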

Anyway, indeed, scrub and resilver aren't about defrag.

Further, the MDS performance issues aren't about fragmentation.

As a side note, it's probably ideal to stay below 80% full for ldiskfs too,
or performance degrades due to fragmentation.

Sean, note I am dealing with specific issues for a very create-intensive
workload, and the MDS is the only component we may change. The data
integrity features of ZFS make it very attractive too, and I fully expect
things will improve with ZFS.

If you want a lot of certainty in your choices, you may want to
consult the various vendors of Lustre systems.

Scott




On June 8, 2014 11:42:15 AM CDT, "Dilger, Andreas"
<andreas.dil...@intel.com> wrote:

   Scrub and resilver have nothing to do with defrag.

   Scrub scans all the data blocks in the pool, verifying their checksums and
parity to detect silent data corruption, and rewrites bad blocks if necessary.

   Resilver is reconstructing a failed disk onto a new disk using parity or 
mirror copies of all the blocks on the failed disk. This is similar to scrub.

   Both scrub and resilver can be done online, though resilver of course 
requires a spare disk to rebuild onto, which may not be possible to add to a 
running system if your hardware does not support it.

   Neither of them "improves" the performance or layout of data on disk. They
do impact performance because they cause a lot of random IO to the disks,
though this impact can be limited by tunables on the pool.
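
   As a sketch, on ZFS-on-Linux of that vintage the throttling knobs are
module parameters rather than per-pool properties; the names and default-ish
values below are illustrative and should be checked against your ZFS version
with modinfo zfs:

   # slow scrub/resilver I/O down while the pool is servicing normal load
   echo 4  > /sys/module/zfs/parameters/zfs_scrub_delay     # ticks between scrub I/Os
   echo 2  > /sys/module/zfs/parameters/zfs_resilver_delay  # ticks between resilver I/Os
   echo 50 > /sys/module/zfs/parameters/zfs_scan_idle       # idle window before full-speed scan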

   Cheers, Andreas

   On Jun 8, 2014, at 4:21, "Sean Brisbane" <s.brisba...@physics.ox.ac.uk> wrote:

   Hi Scott,

   We are considering running ZFS-backed Lustre, and the factor-of-10-ish
performance hit you see worries me. I know ZFS can splurge bits of files all
over the place by design. The Oracle docs do recommend scrubbing the volumes
and keeping usage below 80% for maintenance and performance reasons. I'm going
to call it 'defrag', but I'm sure someone who knows better will correct me as
to why it is not the same.
   So are these performance issues present after scrubbing, and is it possible
to scrub online, i.e. is some reasonable level of performance maintained while
the scrub happens?
   Resilvering is also recommended. Not sure if that is for performance reasons.

   http://docs.oracle.com/cd/E23824_01/html/821-1448/zfspools-4.html
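
   (For the capacity check and an online scrub, a minimal sketch; "mdtpool" is
a made-up pool name:)

   zpool list -o name,size,allocated,free,capacity  # keep CAP below ~80%
   zpool scrub mdtpool    # runs online; the pool stays imported and writable
   zpool status mdtpool   # the "scan:" line shows progress and rate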



   Sent from my HTC Desire C on Three

   ----- Reply message -----
   From: "Scott Nolin" <scott.no...@ssec.wisc.edu>
   To: "Anjana Kar" <k...@psc.edu>, "lustre-discuss@lists.lustre.org"
<lustre-discuss@lists.lustre.org>
   Subject: [Lustre-discuss] number of inodes in zfs MDT
   Date: Fri, Jun 6, 2014 3:23 AM



   Looking at some of our existing ZFS filesystems, we have a couple with ZFS
MDTs.

   One has 103M inodes and uses 152G of MDT space; another has 12M inodes and
uses 19G. I'd plan for less than that, I guess, as Mr. Dilger suggests. What
will work depends on your expected average file size and number of files.
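
   (Making the sizing arithmetic explicit, with a hypothetical file-count
target just for illustration:)

   152 GB / 103M inodes ~= 1.5 KB of MDT space per inode
    19 GB /  12M inodes ~= 1.6 KB per inode
   # so a target of, say, 500M files would suggest roughly 750 GB of MDT space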

   We have run into some unpleasant surprises with ZFS for the MDT; I believe
they are mostly documented in bug reports, or at least hinted at.

   A serious issue we have is the performance of the ZFS ARC cache over time.
This is something we didn't see in early testing, but with enough use it grinds
things to a crawl. I believe this may be addressed in the newer version of ZFS,
which we're hopefully awaiting.

   Another thing we've seen, which is mysterious to me: it appears that as the
MDT begins to fill up, file create rates go down. We don't really have a strong
handle on this (not enough for a bug report, I think), but we see the following:


   1. The aforementioned 104M inode / 152GB MDT system has 4 SAS drives in
raid10. In initial testing, file creates ran at about 2500 to 3000 IOPS.
Follow-up testing in its current state (about half full) shows them at about
500 IOPS, and with a few iterations of mdtest those IOPS plummet quickly to
unbearable levels (like 30...). A sketch of the mdtest runs follows below.

   2. We took a snapshot of the filesystem and sent it to the backup MDS, this
time with the MDT built on 4 SAS drives in a raid0 - really not for performance
so much as "extra headroom", if that makes any sense. Testing this, the IOPS
started higher, at maybe 800 or 1000 (this is from memory, I don't have my data
in front of me). That initial faster speed could just be from writing to 4
spindles, I suppose, but surprisingly the performance also degraded at a slower
rate. It took much longer to get painfully slow, though it still got there. Put
another way, the same number of writes on the smaller/slower MDT degraded the
performance more quickly. My guess is that has something to do with the total
space available, but who knows. I believe restarting Lustre (and certainly
rebooting) 'resets the clock' on the file create performance degradation.
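
   (A rough sketch of the kind of mdtest run behind those numbers; the MPI
launcher, task count, file count, and path are placeholders, not what we
actually used:)

   # create-heavy mdtest, repeated so the decay in create rate shows up
   # across iterations
   mpirun -np 16 mdtest -F -C -i 5 -n 100000 -d /mnt/lustre/mdtest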

   For that problem we're just going to try adding 4 SSDs, but it's an ugly
problem. We are also once again hopeful the new ZFS version addresses it.

   And finally, we've got a real concern with snapshot backups of the MDT that
my colleague posted about - the problem we see manifests in essentially a
read-only recovered file system, so it's a concern, though not quite terrifying.
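
   (The snapshot-and-send itself is straightforward; a minimal sketch with
made-up pool, dataset, and host names:)

   zfs snapshot mdt0pool/mdt0@backup-20140606
   zfs send mdt0pool/mdt0@backup-20140606 | ssh backup-mds zfs recv backuppool/mdt0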

   All in all, for the next Lustre file system we bring up (in a couple of
weeks) we are very strongly considering going with ldiskfs for the MDT this
time.

   Scott

   From: Anjana Kar <k...@psc.edu>
   Sent: Tuesday, June 3, 2014 7:38 PM
   To: lustre-discuss@lists.lustre.org

   Is there a way to set the number of inodes for zfs MDT?

   I've tried using --mkfsoptions="-N value" as mentioned in the Lustre 2.0
manual, but it fails to accept it. We are mirroring two 80GB SSDs for the MDT,
but the number of inodes is getting set to 7 million, which is not enough for
a 100TB filesystem.
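
   (A hedged note for context: -N is an ext4/mke2fs option, so it would only
apply to ldiskfs targets; ZFS allocates dnodes dynamically rather than at
format time, so there is no fixed inode count to set. A sketch of the two
forms, with made-up device names and index:)

   # ldiskfs MDT: inode count can be fixed at format time via mke2fs's -N
   mkfs.lustre --mdt --fsname=testfs --index=0 --mkfsoptions="-N 100000000" /dev/sda

   # ZFS MDT: no -N equivalent; dnodes are created on demand as files are created
   mkfs.lustre --mdt --fsname=testfs --index=0 --backfstype=zfs mdt0pool/mdt0 mirror /dev/sdb /dev/sdc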

   Thanks in advance.

   -Anjana Kar
      Pittsburgh Supercomputing Center
      k...@psc.edu




_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
