Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-18 Thread Isaac Huang
On Wed, Jun 18, 2014 at 06:11:33AM -0400, Anjana Kar wrote:
> ..
> Instead we have moved to ldiskfs MDT and zfs OSTs, with the same lustre/zfs
> versions, and have a lot more inodes available.
> 
> Filesystem            Inodes   IUsed    IFree IUse% Mounted on
> x.x.x.x@o2ib:/iconfs
>  39049920 7455386 31594534   20% /iconfs

Since ZFS doesn't create [iz]nodes statically, in statfs it simply
estimates the free inodes as availbytes >> 9, see zfs_statvfs().

So I'd guess that the difference in reported free inodes was caused by
differences in space efficiency.
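As a rough illustration of how that estimate behaves (a sketch only, not the
actual ZFS code; the byte counts are made-up examples):

def estimated_free_inodes(availbytes):
    # Free-inode estimate as described above: available bytes / 512.
    return availbytes >> 9

# Two hypothetical MDTs with different amounts of available space report
# very different "free inode" counts, even though neither has a hard limit.
for availbytes in (64 * 2**30, 32 * 2**30):  # 64 GiB vs 32 GiB available
    print(availbytes, "bytes free ->", estimated_free_inodes(availbytes), "estimated free inodes")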

I don't know about ldiskfs but two things come to mind about ZFS:
1. By default ZFS stores two copies (a.k.a. ditto blocks) of file
system metadata, in addition to whatever replication the pool
already has. As a result, unless ldiskfs does something similar, ZFS
would be only about 50% as space efficient as ldiskfs, since the
workload here is mostly metadata. An easy way to verify this would be
to compare the reported free space of ZFS and ldiskfs.

2. ZFS might be using a bigger sector size for the disks. Some drives
report 512-byte sectors for compatibility while they are really 4K,
and ZFS has a built-in list of such drives so it can override the
value the drive reports. This may also reduce the space efficiency of
ZFS; a rough sketch of both effects follows.
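A back-of-the-envelope sketch of how both effects could feed into the
free-inode estimate (the starting figure and the sector-size penalty are
made-up assumptions for illustration, not measurements from this system):

GiB = 2**30
avail = 64 * GiB                 # hypothetical space available on the MDT

# 1. Two metadata copies (ditto blocks) roughly halve the usable space
#    for a metadata-only workload.
after_dittos = avail // 2

# 2. A 4K effective sector size can waste space for metadata blocks smaller
#    than 4K; assume a further 25% loss purely for illustration.
after_sectors = int(after_dittos * 0.75)

print("naive estimate:   ", avail >> 9, "inodes")
print("adjusted estimate:", after_sectors >> 9, "inodes")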


-Isaac
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-18 Thread Dilger, Andreas
[quoted text from earlier in the thread omitted]

Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-18 Thread Anjana Kar
[quoted text from earlier in the thread omitted]

Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-17 Thread Isaac Huang
On Thu, Jun 12, 2014 at 04:41:14PM +, Dilger, Andreas wrote:
> It looks like you've already increased arc_meta_limit beyond the default, 
> which is c_max / 4. That was critical to performance in our testing.
> 
> There is also a patch from Brian that should help performance in your case:
> http://review.whamcloud.com/10237

My understanding is that, without the patch above, increasing
arc_meta_limit would only delay the point when things begin to go
south.

-Isaac
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-16 Thread Scott Nolin
We ran some scrub performance tests, and even without tunables set it wasn't
too bad for our specific configuration. The main thing we did was verify it
made sense to scrub all OSTs simultaneously.

Anyway, indeed scrub and resilver aren't about defrag.

Further, the MDS performance issues aren't about fragmentation.

A side note: it's probably ideal to stay below 80% full due to fragmentation
for ldiskfs too, or performance degrades.

Sean, note I am dealing with specific issues for a very create-intensive
workload, and this is only on the MDS, which is the part we may change. The
data integrity features of ZFS make it very attractive too, and I fully expect
things will improve with ZFS.

If you want a lot of certainty in your choices, you may want to consult
various vendors of Lustre systems.

Scott 




On June 8, 2014 11:42:15 AM CDT, "Dilger, Andreas"  
wrote:
[quoted text from earlier in the thread omitted]

Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-12 Thread Scott Nolin
[quoted text from earlier in the thread omitted]


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-12 Thread Dilger, Andreas
It looks like you've already increased arc_meta_limit beyond the default, which 
is c_max / 4. That was critical to performance in our testing.
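As a rough sketch of the sizes involved (the RAM figure and the half-of-RAM
ARC default are assumptions for illustration, not from this system; check
arc_max on your own MDS):

GiB = 2**30
ram = 64 * GiB                     # hypothetical MDS memory
c_max = ram // 2                   # ZFS on Linux commonly caps the ARC around half of RAM
default_meta_limit = c_max // 4    # the c_max / 4 default mentioned above

# The MDS caches almost nothing but metadata, so raising the limit toward
# c_max is the usual direction; 3/4 here is an example, not a recommendation.
candidate_meta_limit = 3 * c_max // 4

print("c_max:              %2d GiB" % (c_max // GiB))
print("default meta limit: %2d GiB" % (default_meta_limit // GiB))
print("candidate:          %2d GiB" % (candidate_meta_limit // GiB))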

There is also a patch from Brian that should help performance in your case:
http://review.whamcloud.com/10237

Cheers, Andreas

On Jun 11, 2014, at 12:53, "Scott Nolin" <scott.no...@ssec.wisc.edu> wrote:

[quoted text from earlier in the thread omitted]

Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-11 Thread Scott Nolin

We tried a few arc tunables as noted here:

https://jira.hpdd.intel.com/browse/LU-2476

However, I didn't find any clear benefit in the long term. We were just 
trying a few things without a lot of insight.


Scott

On 6/9/2014 12:37 PM, Anjana Kar wrote:

[quoted text from earlier in the thread omitted]

Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-09 Thread Anjana Kar

Thanks for all the input.

Before we move away from the zfs MDT, I was wondering if we can try setting zfs
tunables to test the performance. Basically, what value should we use for
arc_meta_limit on our system? Are there any other settings that can
be changed?

Generating small files on our current system, things started off at 500
files/sec, then declined to about 1/20th of that after 2.45 million files.

-Anjana

On 06/09/2014 10:27 AM, Scott Nolin wrote:
[quoted text from earlier in the thread omitted]

Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-08 Thread Dilger, Andreas
Scrub and resilver have nothing to do with defrag.

Scrub is a scan of all the data blocks in the pool to verify their checksums
and parity, detect silent data corruption, and rewrite the bad blocks if
necessary.

Resilver is reconstructing a failed disk onto a new disk using parity or mirror
copies of all the blocks on the failed disk. This is similar to scrub.

Both scrub and resilver can be done online, though resilver of course requires
a spare disk to rebuild onto, which may not be possible to add to a running
system if your hardware does not support it.

Neither of them "improves" the performance or layout of data on disk. They do
impact performance because they cause a lot of random IO to the disks, though
this impact can be limited by tunables on the pool.

Cheers, Andreas

On Jun 8, 2014, at 4:21, "Sean Brisbane" <s.brisba...@physics.ox.ac.uk> wrote:

[quoted text from earlier in the thread omitted]

Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-08 Thread Sean Brisbane
Hi Scott,

We are considering running ZFS-backed Lustre, and the factor-of-10ish
performance hit you see worries me. I know ZFS can splurge bits of files all
over the place by design. The Oracle docs do recommend scrubbing the volumes
and keeping usage below 80% for maintenance and performance reasons; I'm going
to call it 'defrag', but I'm sure someone who knows better will probably correct
me as to why it is not the same.
So are these performance issues present even after scrubbing, and is it possible
to scrub online - i.e. is some reasonable level of performance maintained while
the scrub happens?
Resilvering is also recommended. Not sure if that is for performance reasons.

http://docs.oracle.com/cd/E23824_01/html/821-1448/zfspools-4.html



Sent from my HTC Desire C on Three

- Reply message -
From: "Scott Nolin" 
To: "Anjana Kar" , "lustre-discuss@lists.lustre.org" 

Subject: [Lustre-discuss] number of inodes in zfs MDT
Date: Fri, Jun 6, 2014 3:23 AM



[quoted text from earlier in the thread omitted]
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-06 Thread Dilger, Andreas
Note that the MDT performance degradation over time (and especially under load)
can be addressed by ZFS ARC cache tunables. In particular arc_meta_limit is
critical, since it artificially limits the amount of metadata that can be
cached on the MDS - and the MDS caches essentially nothing but metadata, since
it doesn't store any file data.

Cheers, Andreas

On 2014/06/05, 7:58 PM, "Scott Nolin" <scott.no...@ssec.wisc.edu> wrote:

[quoted text from earlier in the thread omitted]


Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel High Performance Data Division
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-05 Thread Scott Nolin
Looking at some of our existing zfs filesystems, we have a couple with zfs mdts 




One has 103M inodes and uses 152G of MDT space, another 12M and 19G. I’d plan 
for less than that I guess as Mr. Dilger suggests. It all depends on your 
expected average file size and number of files for what will work.
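As a rough back-of-the-envelope from those two filesystems (treating "G"
loosely as 10^9 bytes; the expected file count at the end is just a made-up
example):

observed = [(103e6, 152e9), (12e6, 19e9)]   # (inodes used, MDT bytes used) from above

for inodes, used in observed:
    print("about %.0f bytes of MDT space per inode" % (used / inodes))
# -> roughly 1500-1600 bytes per inode on these two MDTs

# Sizing sketch: assume ~200M expected files and ~2x headroom.
expected_files = 200e6
bytes_per_inode = 2048                      # round the observed ~1.5 KB up
print("suggested MDT size: ~%.0f GB" % (expected_files * bytes_per_inode * 2 / 1e9))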


We have run into some unpleasant surprises with zfs for the MDT, I believe 
mostly documented in bug reports, or at least hinted at.


A serious issue we have is performance of the zfs arc cache over time. This is 
something we didn’t see in early testing, but with enough use it grinds things 
to a crawl. I believe this may be addressed in the newer version of ZFS, which 
we’re hopefully awaiting.


Another thing we’ve seen, which is mysterious to me, is that as the MDT begins
to fill up, file create rates go down. We don’t really have a strong handle on
this (not enough for a bug report I think), but we see this:


The aforementioned 104M inode / 152GB MDT system has 4 SAS drives in raid10. On
initial testing, file creates ran at about 2500 to 3000 per second. Follow-up
testing in its current state (about half full) shows them at about 500 per
second now, and with a few iterations of mdtest those rates plummet quickly to
unbearable levels (like 30…).


We took a snapshot of the filesystem and sent it to the backup MDS, this time
with the MDT built on 4 SAS drives in a raid0 - really not for performance so
much as “extra headroom” if that makes any sense. Testing this, the IOPS started
higher, at maybe 800 or 1000 (this is from memory, I don’t have my data in
front of me). That initial faster speed could just be writing to 4 spindles, I
suppose, but surprisingly to me the performance degraded at a slower rate. It
took much longer to get painfully slow, though it still got there. In other
words, the same number of writes on the smaller/slower MDT degraded the
performance more quickly. My guess is that had something to do with the total
space available. Who knows. I believe restarting Lustre (and certainly
rebooting) ‘resets the clock’ on the file create performance degradation.



For that problem we’re just going to try adding 4 SSDs, but it’s an ugly
problem. We are also once again hopeful that the new ZFS version addresses it.


And finally, we’ve got a real concern with snapshot backups of the MDT that my
colleague posted about - the problem we see manifests in essentially a
read-only recovered file system, so it’s a concern but not quite terrifying.


All in all, the next lustre file system we bring up (in a couple weeks) we are 
very strongly considering going with ldiskfs for the MDT this time.


Scott


From: Anjana Kar
Sent: Tuesday, June 3, 2014 7:38 PM
To: lustre-discuss@lists.lustre.org





[quoted message omitted]

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] number of inodes in zfs MDT

2014-06-04 Thread Dilger, Andreas
On 2014/06/03, 6:37 PM, "Anjana Kar"  wrote:

>Is there a way to set the number of inodes for zfs MDT?

No.  ZFS can create inodes until the filesystem is full.

>I've tried using --mkfsoptions="-N value" mentioned in lustre 2.0
>manual, but it fails to accept it.

This is an ldiskfs-specific option.

> We are mirroring 2 80GB SSDs for the MDT, but the number of inodes is
>getting set to 7 million, which is not enough for a 100TB filesystem.

Since the "free" inode count is just an estimate for ZFS, and is quite a
big guess when the filesystem is new, but gets more accurate as the
filesystem has more inodes created.  For 80GB SSDs with about 2KB/inode
(usable space) you might get at most about 20M inodes for such a
filesystem.
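One way to arrive at a figure in that ballpark (the factor-of-two
metadata-copy overhead here is borrowed from the ditto-block discussion
elsewhere in this thread, and is an assumption, not part of Andreas's
arithmetic):

GB = 10**9
usable = 80 * GB          # a mirror of two 80 GB SSDs leaves roughly 80 GB usable
bytes_per_inode = 2048    # the ~2 KB/inode figure above
metadata_copies = 2       # assumed ditto-block overhead

inodes = usable // (bytes_per_inode * metadata_copies)
print("~%dM inodes" % (inodes // 10**6))   # -> ~19M, close to the ~20M estimate above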

Cheers, Andreas
-- 
Andreas Dilger

Lustre Software Architect
Intel High Performance Data Division


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] number of inodes in zfs MDT

2014-06-03 Thread Anjana Kar

Is there a way to set the number of inodes for zfs MDT?

I've tried using --mkfsoptions="-N value" mentioned in lustre 2.0 
manual, but it
fails to accept it. We are mirroring 2 80GB SSDs for the MDT, but the 
number of
inodes is getting set to 7 million, which is not enough for a 100TB 
filesystem.


Thanks in advance.

-Anjana Kar
 Pittsburgh Supercomputing Center
 k...@psc.edu
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss