Looking at some of our existing ZFS file systems, we have a couple with ZFS MDTs:


One has 103M inodes and uses 152G of MDT space; another has 12M inodes in 19G. I'd
plan for less than that, I guess, as Mr. Dilger suggests. What will work really
depends on your expected average file size and total number of files.
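
As a rough back-of-the-envelope from those two systems (the ratio is just what we
happen to see, not a ZFS constant):

   152 GB / 103 M inodes  ~= 1.5 KB of MDT space per inode
    19 GB /  12 M inodes  ~= 1.6 KB of MDT space per inode

So something like 1.5-2 KB of MDT space per expected file is the kind of planning
figure these numbers suggest.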


We have run into some unpleasant surprises with ZFS for the MDT; I believe they are
mostly documented in bug reports, or at least hinted at.


A serious issue for us is the performance of the ZFS ARC cache over time. This is
something we didn't see in early testing, but with enough use it grinds things to
a crawl. I believe this may be addressed in the newer version of ZFS, which we're
hopefully awaiting.
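
Not a fix, but for keeping an eye on it, the standard ZFS-on-Linux knobs look
roughly like this (the 32 GiB cap is only an example value, not a recommendation):

   # watch the ARC size against its target and maximum
   grep -E '^(size|c|c_max) ' /proc/spl/kstat/zfs/arcstats

   # cap the ARC at 32 GiB on the next module load
   echo "options zfs zfs_arc_max=34359738368" >> /etc/modprobe.d/zfs.conf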


Another thing we've seen, which is mysterious to me: it appears that as the MDT
begins to fill up, file create rates go down. We don't really have a strong handle
on this (not enough for a bug report, I think), but we see the following:


The aforementioned 103M-inode / 152GB MDT system has 4 SAS drives in RAID 10. In
initial testing, file creates were about 2,500 to 3,000 IOPS. Follow-up testing in
its current state (about half full) shows them at about 500 IOPS, and with a few
iterations of mdtest those IOPS plummet quickly to unbearable levels (like 30).
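
For context, an mdtest run of the sort behind these numbers looks roughly like this
(the task count, files per task, and target directory are illustrative, not our
exact parameters):

   # 16 MPI tasks, 5000 files each, 3 iterations; reports create/stat/remove rates
   mpirun -np 16 mdtest -F -n 5000 -i 3 -d /mnt/lustre/mdtest-dir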


We took a snapshot of the file system and sent it to the backup MDS, this time with
the MDT built on 4 SAS drives in a RAID 0 - not so much for performance as for
"extra headroom", if that makes any sense. Testing this, the IOPS started higher,
at maybe 800 or 1,000 (this is from memory; I don't have my data in front of me).
That initial faster speed could just be from writing to 4 spindles, I suppose, but
to my surprise the performance also degraded at a slower rate: it took much longer
to get painfully slow, though it still got there. Put another way, the same number
of writes degraded performance more quickly on the smaller/slower MDT. My guess is
that has something to do with the total space available, but who knows. I believe
restarting Lustre (and certainly rebooting) 'resets the clock' on the file create
performance degradation.
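
For reference, snapshotting an MDT dataset and streaming it to another MDS is just
the usual ZFS mechanism, roughly as follows (the pool/dataset names and the backup
host below are made-up placeholders):

   # on the primary MDS: snapshot the MDT dataset and stream it to the backup MDS
   zfs snapshot mdt0pool/mdt0@xfer
   zfs send mdt0pool/mdt0@xfer | ssh backup-mds zfs recv backup0pool/mdt0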



For that problem we're just going to try adding 4 SSDs, but it's an ugly one. We're
also once again hopeful that the new ZFS version addresses it.


And finally, we've got a real concern with snapshot backups of the MDT, which my
colleague posted about: the problem we see manifests as an essentially read-only
recovered file system, so it's a concern, though not quite terrifying.


All in all, for the next Lustre file system we bring up (in a couple of weeks), we
are very strongly considering going with ldiskfs for the MDT this time.
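
One attraction is that with an ldiskfs MDT the inode count is settable at format
time, which is what the original question was after. A sketch only - the device,
fsname, MGS NID, and bytes-per-inode ratio below are example values:

   # one inode per 2048 bytes of MDT space; "-N <count>" sets an absolute number
   mkfs.lustre --mdt --backfstype=ldiskfs --fsname=testfs --index=0 \
       --mgsnode=<mgs-nid> --mkfsoptions="-i 2048" /dev/sdX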


Scott

From: Anjana Kar
Sent: Tuesday, June 3, 2014 7:38 PM
To: lustre-discuss@lists.lustre.org

Is there a way to set the number of inodes for a ZFS MDT?

I've tried using --mkfsoptions="-N value" mentioned in the Lustre 2.0 manual, but
it fails to accept it. We are mirroring two 80GB SSDs for the MDT, but the number
of inodes is getting set to 7 million, which is not enough for a 100TB filesystem.

Thanks in advance.

-Anjana Kar
  Pittsburgh Supercomputing Center
  k...@psc.edu
_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss