Re: Reducing ZFS blocksize to improve Cyrus write performance ?

2010-08-26 Thread Vincent Fox
So to my mind, the downside of disabling ZFS cache flushing is this:

Data on disk may not be as current in the unlikely event
of a power outage.  In point of fact, MOST filesystems do not operate
in journalled data mode anyhow, and most people just don't
realize this.  The default for Linux EXT filesystems with journalling
is just to ensure metadata integrity, and few people enable data journalling
because it costs performance.

However you set it, ZFS is not going to come up with an fsck
prompt resulting in hours of single-user downtime.  Which is
what really matters the most eh?





Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: Reducing ZFS blocksize to improve Cyrus write performance ?

2010-08-26 Thread Eric Luyten
On Tue, August 10, 2010 4:06 pm, Gary Mills wrote:
> On Mon, Aug 09, 2010 at 09:03:44PM +0200, Pascal Gienger wrote:
>
>> On 09.08.10 19:46, Vincent Fox wrote:
>>
>>> * Turn off ZFS cache flushing
>>> set zfs:zfs_nocacheflush = 1
>>
>> For hardware (fiberchannel, iSCSI, SSA, ...) arrays with their own Cache
>> this is a must.
>
> Only if the SAN device handles cache flush requests incorrectly.
> It should consider a write to battery-backed memory as a write to
> permanent storage, and manage its own writes to disk from there.


Folks,


Be aware that the zfs_nocacheflush parameter is global and thus applies
to all ZFS filesystems on your server, local (system) disks included ...

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes

offers extra reading material.


Pascal, Vincent and Gary, I have been ploughing your valuable comments back
into our development team's discussions ! Thank you.


Eric Luyten.




Re: Reducing ZFS blocksize to improve Cyrus write performance ?

2010-08-10 Thread Gary Mills
On Mon, Aug 09, 2010 at 09:03:44PM +0200, Pascal Gienger wrote:
> On 09.08.10 19:46, Vincent Fox wrote:
> > * Turn off ZFS cache flushing
> > set zfs:zfs_nocacheflush = 1
> 
> For hardware (fiberchannel, iSCSI, SSA, ...) arrays with their own Cache 
> this is a must.

Only if the SAN device handles cache flush requests incorrectly.
It should consider a write to battery-backed memory as a write to
permanent storage, and manage its own writes to disk from there.

> > * Increase DNLC (Directory Name Lookup Cache)
> > set ncsize = 50
> 
> vmstat -s | grep 'total name lookups'
> 135562914356 total name lookups (cache hits 96%)
> 
> :-)
> Unless the percent ratio is not below 90% increasing the DNLC is not so 
> useful.

According to:

http://docs.sun.com/app/docs/doc/817-0404/chapter2-35?a=view

the proper statistics to determine if the cache is too small are
provided by `kstat -n dnlcstats'.  Beware also that the cache will
always overflow during backups because they typically read all of
the directories once, running the cache.  It's the cache activity
during normal IMAP access that's important.
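
The `kstat -n dnlcstats' counters mentioned above can be reduced to a rough hit-ratio figure. What follows is only a minimal sketch: it assumes the parseable `kstat -p' output format (module:instance:name:statistic, a tab, then the value) and that the plain `hits' and `misses' counters are the ones of interest. The function name dnlc_hit_ratio is made up for illustration.

```shell
#!/bin/sh
# Read `kstat -p -n dnlcstats` style lines on stdin and print the
# DNLC hit ratio as a percentage with one decimal place.
# Only the plain "hits" and "misses" statistics are used; counters
# such as dir_hits or negative_cache_hits are deliberately ignored.
dnlc_hit_ratio() {
    awk '
        $1 ~ /:hits$/   { hits = $2 }
        $1 ~ /:misses$/ { misses = $2 }
        END {
            total = hits + misses
            if (total == 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * hits / total
        }'
}
```

On a live system this would be fed as `kstat -p -n dnlcstats | dnlc_hit_ratio', sampled during normal IMAP load rather than during a backup window.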

-- 
-Gary Mills--Unix Group--Computer and Network Services-



Re: Reducing ZFS blocksize to improve Cyrus write performance ?

2010-08-09 Thread Vincent Fox
On Mon, 2010-08-09 at 17:22 +0200, Eric Luyten wrote:
> Folks,
> 
>  did you consider, measure and/or carry
> out a change of the default 128 KB blocksize ?

To more directly answer your question than last post...

We did some testing with Bonnie++ prior to deployment,
and changing recordsize didn't reveal any particular
improvement for what we guessed was a representative simulation.

After deployment we ran into performance problems, which
turned out to be related to an fsync corner case in the then-current
release, later fixed in a patch.  We ran a performance
tool from Sun which clearly showed the problem with fsync,
but I can't recall its name right now.  We were in production
at that point, though, and not free to vary recordsizes and
see the effect with that tool.






Re: Reducing ZFS blocksize to improve Cyrus write performance ?

2010-08-09 Thread Pascal Gienger
On 09.08.10 19:46, Vincent Fox wrote:
> * Turn off ZFS cache flushing
> set zfs:zfs_nocacheflush = 1

For hardware (fiberchannel, iSCSI, SSA, ...) arrays with their own Cache 
this is a must.

> * Increase DNLC (Directory Name Lookup Cache)
> set ncsize = 50

vmstat -s | grep 'total name lookups'
135562914356 total name lookups (cache hits 96%)

:-)
Unless the hit ratio is below 90%, increasing the DNLC is not very
useful.

> Turn off atime of course.

Sure.

> Turn on LZJB compression for metapartition but gzip for
> the mail data filesystem. Our compression ratio on the mail
> filesystem is showing 1.68x.

Yes. GZIP for Mail, LZJB for Meta. Identical configuration here.

Pascal



Re: Reducing ZFS blocksize to improve Cyrus write performance ?

2010-08-09 Thread Vincent Fox
For what Cyrus is doing on Solaris with ZFS, the
recordsize seems nearly negligible.  What with all the
caching in the way, and how ZFS orders transactions, it's
about the last tuneable I'd worry about.

Here's what works well for us, add this to /etc/system:

* Turn off ZFS cache flushing
set zfs:zfs_nocacheflush = 1
* Increase DNLC (Directory Name Lookup Cache)
set ncsize = 50

Turn off atime of course.

Turn on LZJB compression for metapartition but gzip for
the mail data filesystem. Our compression ratio on the mail
filesystem is showing 1.68x.

Our I/O channels average only 4-5% busy with ~6,000 users
per backend mailstore.  We run nightly snapshots and then
back up every other night from the most recent snapshot, and
that is factored into the iostat number.
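
For anyone wanting to try the atime and compression settings described above, here is a minimal sketch of the corresponding `zfs set' commands. The dataset name mail/meta is purely hypothetical (mail/imap is borrowed from elsewhere in this thread), and the run/DRY_RUN wrapper is an illustration device, not part of ZFS. By default the commands are only printed.

```shell
#!/bin/sh
# Hypothetical dataset names; substitute your own.
MAIL_FS=${MAIL_FS:-mail/imap}   # message store
META_FS=${META_FS:-mail/meta}   # Cyrus metapartition (assumed name)

# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

apply_mailstore_props() {
    # No access-time updates on either filesystem.
    run zfs set atime=off "$MAIL_FS"
    run zfs set atime=off "$META_FS"
    # gzip for the mail data, the lighter LZJB for the metadata.
    run zfs set compression=gzip "$MAIL_FS"
    run zfs set compression=lzjb "$META_FS"
}
```

Calling apply_mailstore_props prints the four commands for review; setting DRY_RUN=0 first would execute them. Compression only affects data written after the property is set.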








Re: Reducing ZFS blocksize to improve Cyrus write performance ?

2010-08-09 Thread Pascal Gienger
On 09.08.10 17:33, Pascal Gienger wrote:

> A smaller record size is a good option if you notice an i/o bottleneck
> on your fiberchannel/iSCSI/SAS link. It won't bring you a performance
> gain in random i/o. There is a small exception: Database systems writing
> always the same fixed blocksize. For MySQL some people advise 32k.

Just another note:
For us, gzip compression was a performance win: it reduced I/O bandwidth
much more than a smaller recordsize did (gzip compression for the
mailstore, NOT (!) for the meta partition containing the cyrus.* files!).

For reference, we're running happily with this:

-bash-3.00$ zfs get all mail/imap
NAME       PROPERTY         VALUE                  SOURCE
mail/imap  type             filesystem             -
mail/imap  creation         Mon Aug 13 13:19 2007  -
mail/imap  used             1.58T                  -
mail/imap  available        4.96T                  -
mail/imap  referenced       1.51T                  -
mail/imap  compressratio    1.61x                  -
mail/imap  mounted          yes                    -
mail/imap  quota            none                   default
mail/imap  reservation      none                   default
mail/imap  recordsize       128K                   local
mail/imap  mountpoint       /mail/imap             default
mail/imap  sharenfs         off                    default
mail/imap  checksum         on                     default
mail/imap  compression      gzip                   local
mail/imap  atime            off                    local
mail/imap  devices          off                    local
mail/imap  exec             off                    local
mail/imap  setuid           off                    local
mail/imap  readonly         off                    default
mail/imap  zoned            off                    default
mail/imap  snapdir          hidden                 default
mail/imap  aclmode          groupmask              default
mail/imap  aclinherit       restricted             default
mail/imap  canmount         on                     default
mail/imap  shareiscsi       off                    default
mail/imap  xattr            on                     default
mail/imap  copies           1                      default
mail/imap  version          1                      -
mail/imap  utf8only         off                    -
mail/imap  normalization    none                   -
mail/imap  casesensitivity  sensitive              -
mail/imap  vscan            off                    default
mail/imap  nbmand           off                    default
mail/imap  sharesmb         off                    default
mail/imap  refquota         none                   default
mail/imap  refreservation   none                   default
mail/imap  primarycache     all                    default
mail/imap  secondarycache   all                    default
-bash-3.00$ 
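
In a listing like the one above, only the rows whose SOURCE is `local' were changed by hand. A small sketch for pulling just those out, assuming the tab-separated, header-less output of `zfs get -H' (name, property, value, source); the function name local_props is invented for this example:

```shell
#!/bin/sh
# Read `zfs get -H all <dataset>` output on stdin and print the
# properties that were set locally, one property=value pair per line.
local_props() {
    awk -F '\t' '$4 == "local" { print $2 "=" $3 }'
}
```

Typical use would be `zfs get -H all mail/imap | local_props', which makes it easier to compare the deliberate settings on two servers.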




Re: Reducing ZFS blocksize to improve Cyrus write performance ?

2010-08-09 Thread Pascal Gienger
On 09.08.10 17:22, Eric Luyten wrote:
> Folks,
>
> A question for those of you running ZFS as the filesystem architecture
> for your Cyrus message store : did you consider, measure and/or carry
> out a change of the default 128 KB blocksize ?
> If so, what value are you using ?

First:
Changes to the ZFS recordsize do not change the on-disk format of your
zfs/zpool. They apply only to NEWLY created files or file parts/zfs
records (!).

Second: on a ZFS volume the recordsize is NOT the block size.
The record size is the size of a single ZFS record read at once. Due to
the ZIL, changes to files get written nearly sequentially, so the
recordsize is nearly irrelevant.

A smaller record size is a good option if you notice an I/O bottleneck
on your Fibre Channel/iSCSI/SAS link. It won't bring you a performance
gain in random I/O. There is one small exception: database systems that
always write the same fixed block size. For MySQL some people advise 32k.
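
For the fixed-block-size database case, a change of recordsize might be prepared as sketched below. The assumptions are mine: the size is given in bytes, the accepted range is a power of two between 512 and 131072 (128K, the default discussed in this thread), and the dataset name tank/mysql and the function name recordsize_cmd are hypothetical. The function only prints the command so that it can be reviewed first.

```shell
#!/bin/sh
# Print (not run) the command to change recordsize on a dataset.
# Usage: recordsize_cmd <dataset> <size-in-bytes>
# Rejects sizes that are not a power of two between 512 and 131072.
recordsize_cmd() {
    dataset=$1
    bytes=$2
    if [ "$bytes" -lt 512 ] || [ "$bytes" -gt 131072 ]; then
        echo "recordsize out of range: $bytes" >&2
        return 1
    fi
    # Power-of-two check: keep halving until 1; any odd value fails.
    n=$bytes
    while [ "$n" -gt 1 ]; do
        if [ $((n % 2)) -ne 0 ]; then
            echo "recordsize not a power of two: $bytes" >&2
            return 1
        fi
        n=$((n / 2))
    done
    echo "zfs set recordsize=$bytes $dataset"
}
```

For example, `recordsize_cmd tank/mysql 32768' prints the command for a 32k recordsize. As noted above, existing files keep their old record size until they are rewritten.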


ZFS record size is not the same as zfs block size of a zvol (zfs block 
volume). That's another story. But I assume you are not talking about a 
ZFS block volume iSCSI server with a non-zfs-filesystem written on it.

Just my $0.02,

Pascal



Reducing ZFS blocksize to improve Cyrus write performance ?

2010-08-09 Thread Eric Luyten
Folks,

A question for those of you running ZFS as the filesystem architecture
for your Cyrus message store : did you consider, measure and/or carry
out a change of the default 128 KB blocksize ?
If so, what value are you using ?

Regards,
Eric Luyten, Computing Centre VUB/ULB.


