Re: Reducing ZFS blocksize to improve Cyrus write performance ?
So to my mind, the downside of disabling ZFS cache flushes is that data on disk may not be fully current in the unlikely event of a power outage. In point of fact, most filesystems do not operate in journalled-data mode anyhow, and most people just don't realize this. The default for Linux EXT filesystems with journalling is only to ensure metadata integrity; few people enable data journalling because it costs performance. However you set it, ZFS is not going to come up with an fsck prompt resulting in hours of single-user downtime. Which is what really matters the most, eh?

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
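To illustrate the point about EXT defaults, here is a hedged sketch of how one would check and request the journalling mode; the device and mount point are placeholders:

```shell
# Check the current journalling mode of an ext3/ext4 mount.
# "data=ordered" (metadata-only journalling) is the default and may be
# omitted from the options field when it is in effect.
grep ' /var/spool/imap ' /proc/mounts

# Request full data journalling via /etc/fstab (placeholder device/mountpoint);
# the data= mode is chosen at mount time, not changeable by a plain remount:
# /dev/sda1  /var/spool/imap  ext3  data=journal  0 2
```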
Re: Reducing ZFS blocksize to improve Cyrus write performance ?
On Tue, August 10, 2010 4:06 pm, Gary Mills wrote:
> On Mon, Aug 09, 2010 at 09:03:44PM +0200, Pascal Gienger wrote:
>> On 09.08.10 19:46, Vincent Fox wrote:
>>> * Turn off ZFS cache flushing
>>> set zfs:zfs_nocacheflush = 1
>>
>> For hardware (fiberchannel, iSCSI, SSA, ...) arrays with their own cache
>> this is a must.
>
> Only if the SAN device handles cache flush requests incorrectly.
> It should consider a write to battery-backed memory as a write to
> permanent storage, and manage its own writes to disk from there.

Folks,

Be aware that the zfs_nocacheflush parameter is global and thus applies to all ZFS filesystems on your server, local (system) disks included.

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes offers extra reading material.

Pascal, Vincent and Gary, I have been ploughing your valuable comments back into our development team's discussions! Thank you.

Eric Luyten.
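Because zfs_nocacheflush is a global kernel tunable, it can be worth confirming its value on a live system before and after changing /etc/system. A sketch, assuming Solaris 10-era mdb syntax:

```shell
# Persistent setting, applied at boot:
echo 'set zfs:zfs_nocacheflush = 1' >> /etc/system

# Inspect the current value on the running kernel (prints it in decimal):
echo 'zfs_nocacheflush/D' | mdb -k

# Change it on the fly (effective immediately, lost at reboot):
echo 'zfs_nocacheflush/W 1' | mdb -k
```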
Re: Reducing ZFS blocksize to improve Cyrus write performance ?
On Mon, Aug 09, 2010 at 09:03:44PM +0200, Pascal Gienger wrote:
> On 09.08.10 19:46, Vincent Fox wrote:
>> * Turn off ZFS cache flushing
>> set zfs:zfs_nocacheflush = 1
>
> For hardware (fiberchannel, iSCSI, SSA, ...) arrays with their own cache
> this is a must.

Only if the SAN device handles cache flush requests incorrectly. It should consider a write to battery-backed memory as a write to permanent storage, and manage its own writes to disk from there.

>> * Increase DNLC (Directory Name Lookup Cache)
>> set ncsize = 50
>
> vmstat -s | grep 'total name lookups'
> 135562914356 total name lookups (cache hits 96%)
>
> :-)
> Unless the hit ratio is below 90%, increasing the DNLC is not very useful.

According to:
http://docs.sun.com/app/docs/doc/817-0404/chapter2-35?a=view
the proper statistics to determine if the cache is too small are provided by `kstat -n dnlcstats'. Beware also that the cache will always overflow during backups, because they typically read all of the directories once, churning the cache. It's the cache activity during normal IMAP access that's important.

--
-Gary Mills--Unix Group--Computer and Network Services-
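The hit ratio in question can be derived from the `hits` and `misses` counters in the `kstat -n dnlcstats` output. A minimal sketch; the heredoc stands in for real kstat output (the counter values are made up), and on a live Solaris box you would pipe `kstat -n dnlcstats` into the awk script instead:

```shell
# Compute the DNLC hit ratio from kstat-style "name value" lines.
# Real `kstat -n dnlcstats` output has many more counters; we match the
# exact field names "hits" and "misses" to avoid e.g. negative_cache_hits.
awk '
  $1 == "hits"   { h = $2 }
  $1 == "misses" { m = $2 }
  END { printf "DNLC hit ratio: %.1f%%\n", 100 * h / (h + m) }
' <<'EOF'
        hits                            96000
        misses                          4000
EOF
```

With the sample numbers above this prints `DNLC hit ratio: 96.0%`, matching the kind of figure Pascal quoted from vmstat.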
Re: Reducing ZFS blocksize to improve Cyrus write performance ?
On Mon, 2010-08-09 at 17:22 +0200, Eric Luyten wrote:
> Folks,
>
> did you consider, measure and/or carry
> out a change of the default 128 KB blocksize ?

To answer your question more directly than in my last post: we did some testing with Bonnie++ prior to deployment, and changing recordsize didn't reveal any particular improvement for what we guessed was a representative workload. After deployment we ran into performance problems, which turned out to be related to an fsync "corner case" in the then-current release, later fixed in a patch. We ran a performance tool from Sun which clearly showed the problem with fsync, but I can't recall its name right now. We were in production at that point, though, and not free to vary recordsizes and observe the effect with that tool.
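For anyone wanting to repeat that kind of pre-deployment test, a sketch of the approach; the pool/filesystem name, user, and sizes tried are placeholders, not recommendations:

```shell
# Benchmark a scratch ZFS filesystem at several recordsizes.
# recordsize only affects newly written files, so each bonnie++ run
# (which creates its files fresh) exercises the new setting.
zfs create mail/bench
for rs in 8K 32K 128K; do
    zfs set recordsize=$rs mail/bench
    # -d: test directory, -s: file size (should exceed RAM to defeat
    # caching), -u: user to run as
    bonnie++ -d /mail/bench -s 16g -u cyrus
done
zfs destroy mail/bench
```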
Re: Reducing ZFS blocksize to improve Cyrus write performance ?
On 09.08.10 19:46, Vincent Fox wrote:
> * Turn off ZFS cache flushing
> set zfs:zfs_nocacheflush = 1

For hardware (fiberchannel, iSCSI, SSA, ...) arrays with their own cache this is a must.

> * Increase DNLC (Directory Name Lookup Cache)
> set ncsize = 50

vmstat -s | grep 'total name lookups'
135562914356 total name lookups (cache hits 96%)

:-)

Unless the hit ratio is below 90%, increasing the DNLC is not very useful.

> Turn off atime of course.

Sure.

> Turn on LZJB compression for metapartition but gzip for
> the mail data filesystem. Our compression ratio on the mail
> filesystem is showing 1.68x.

Yes. GZIP for mail, LZJB for meta. Identical configuration here.

Pascal
Re: Reducing ZFS blocksize to improve Cyrus write performance ?
For what Cyrus is doing on Solaris with ZFS, the recordsize seems nearly negligible. With all the caching in the way, and the way ZFS orders transactions, it's about the last tuneable I'd worry about. Here's what works well for us; add this to /etc/system:

* Turn off ZFS cache flushing
set zfs:zfs_nocacheflush = 1

* Increase DNLC (Directory Name Lookup Cache)
set ncsize = 50

Turn off atime, of course. Turn on LZJB compression for the metapartition but gzip for the mail data filesystem. Our compression ratio on the mail filesystem is showing 1.68x.

Our I/O channels average only 4-5% busy with ~6,000 users per backend mailstore. We run nightly snapshots, then back up every other night from the most recent snapshot, and that is factored into the iostat number.
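The atime and compression settings above are per-filesystem ZFS properties rather than /etc/system tunables. A sketch, assuming the mail data lives in a filesystem called mail/imap and the Cyrus metadata in mail/meta (both names are placeholders):

```shell
# Disable access-time updates on the mail store:
zfs set atime=off mail/imap

# Heavier gzip compression for message data, cheap LZJB for metadata:
zfs set compression=gzip mail/imap
zfs set compression=lzjb mail/meta

# Check the achieved ratio afterwards:
zfs get compressratio mail/imap
```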
Re: Reducing ZFS blocksize to improve Cyrus write performance ?
On 09.08.10 17:33, Pascal Gienger wrote:
> A smaller record size is a good option if you notice an i/o bottleneck
> on your fiberchannel/iSCSI/SAS link. It won't bring you a performance
> gain in random i/o. There is a small exception: database systems that
> always write the same fixed blocksize. For MySQL some people advise 32k.

Just another note: for us, gzip compression was a performance plus, reducing i/o bandwidth much better than a smaller recordsize would (gzip compression for the mailstore, NOT (!) for the meta partition containing the cyrus.* files!).

Just for your info, as a reference, we're running happily with this:

-bash-3.00$ zfs get all mail/imap
NAME       PROPERTY         VALUE                  SOURCE
mail/imap  type             filesystem             -
mail/imap  creation         Mon Aug 13 13:19 2007  -
mail/imap  used             1.58T                  -
mail/imap  available        4.96T                  -
mail/imap  referenced       1.51T                  -
mail/imap  compressratio    1.61x                  -
mail/imap  mounted          yes                    -
mail/imap  quota            none                   default
mail/imap  reservation      none                   default
mail/imap  recordsize       128K                   local
mail/imap  mountpoint       /mail/imap             default
mail/imap  sharenfs         off                    default
mail/imap  checksum         on                     default
mail/imap  compression      gzip                   local
mail/imap  atime            off                    local
mail/imap  devices          off                    local
mail/imap  exec             off                    local
mail/imap  setuid           off                    local
mail/imap  readonly         off                    default
mail/imap  zoned            off                    default
mail/imap  snapdir          hidden                 default
mail/imap  aclmode          groupmask              default
mail/imap  aclinherit       restricted             default
mail/imap  canmount         on                     default
mail/imap  shareiscsi       off                    default
mail/imap  xattr            on                     default
mail/imap  copies           1                      default
mail/imap  version          1                      -
mail/imap  utf8only         off                    -
mail/imap  normalization    none                   -
mail/imap  casesensitivity  sensitive              -
mail/imap  vscan            off                    default
mail/imap  nbmand           off                    default
mail/imap  sharesmb         off                    default
mail/imap  refquota         none                   default
mail/imap  refreservation   none                   default
mail/imap  primarycache     all                    default
mail/imap  secondarycache   all                    default
-bash-3.00$
Re: Reducing ZFS blocksize to improve Cyrus write performance ?
On 09.08.10 17:22, Eric Luyten wrote:
> Folks,
>
> A question for those of you running ZFS as the filesystem architecture
> for your Cyrus message store : did you consider, measure and/or carry
> out a change of the default 128 KB blocksize ?
> If so, what value are you using ?

First: changes to the ZFS recordsize do not change the on-disk format of your zfs/zpool. They apply only to NEWLY created files or file parts/ZFS records (!).

Second: as said, on a ZFS filesystem the recordsize is NOT the block size. The record size is the size of a single ZFS record read at once. Due to the ZIL, changes to files get written nearly sequentially, so the recordsize is nearly irrelevant.

A smaller record size is a good option if you notice an i/o bottleneck on your fiberchannel/iSCSI/SAS link. It won't bring you a performance gain in random i/o. There is a small exception: database systems that always write the same fixed blocksize. For MySQL some people advise 32k.

The ZFS record size is not the same as the ZFS block size of a zvol (ZFS block volume). That's another story. But I assume you are not talking about a ZFS block volume iSCSI server with a non-ZFS filesystem written on it.

Just my $0.02,

Pascal
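Pascal's first point (recordsize applies only to newly written data) can be sketched as follows; mail/imap and the mailbox path are placeholders:

```shell
# Lower the recordsize for a filesystem; blocks of existing files keep
# their old record size, and only data written afterwards uses the new one.
zfs set recordsize=32K mail/imap
zfs get recordsize mail/imap

# To bring an existing mailbox onto the new recordsize, its files have
# to be rewritten, e.g. by copying them anew:
cp -rp /mail/imap/user/foo /mail/imap/user/foo.new
```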
Reducing ZFS blocksize to improve Cyrus write performance ?
Folks,

A question for those of you running ZFS as the filesystem architecture for your Cyrus message store: did you consider, measure and/or carry out a change of the default 128 KB blocksize? If so, what value are you using?

Regards,
Eric Luyten, Computing Centre VUB/ULB.