Re: [Dovecot] mailbox format w/ separate headers/data
On Fri, Jan 22, 2010 at 09:03:42PM -0500, Charles Sprickman wrote:
> On Fri, 22 Jan 2010, Frank Cusack wrote:
>> On January 22, 2010 11:05:22 PM +0200 Timo Sirainen t...@iki.fi wrote:
>>> Dunno about zfs, but I've heard that at least in one NetApp installation deduplication was way too heavyweight.
>> zfs dedup is pretty resource-intensive -- for writes. For mail I suspect reads overwhelm writes?
> Sorry for the tangent, but I wonder if anyone here is running lots of Maildirs on zfs? I just recently started experimenting with it on our backups server (FBSD 8.0), and I really am liking it. I was also surprised at how my little 4 drive raidz volume performed in benchmarks - quite impressive.

We used to have our Maildirs on ZFS but we've moved to ext3. ZFS worked reasonably well, except for the days when it slowed down to less than 10% of normal throughput. After a reboot or a couple of days of slow running it would perform normally again. This was on Solaris 10, at most a couple of months behind on patches. I had read the ZFS evil tuning guide, and the ZFS best practices guide, but they didn't help. It wasn't just mail that was slow - listing the contents of a small directory could take over a minute.

We're much happier since switching to ext3; I haven't worried about mail performance since.

-- 
John Tobin

No no no. You're supposed to test with -march=... -fomit-frame-pointer -ffancy-math -fuse-lots-of-resources-go-very-fast -fsacrifice-more-goats -fsummon-cthulu-if-that-helps as root at nice -20, preferably in single user mode and jumps should be aligned on pentagrams, not 8 byte boundaries. Definitely not use debugging :-)
-- Nicholas Clark, in perl6-internals
Re: [Dovecot] mailbox format w/ separate headers/data
On Fri, 2010-01-22 at 23:05 +0200, Timo Sirainen wrote:
> It would already be possible to write such a Maildir feature. Someone on this list already wrote header/body separation code, which was pretty easy to do with a plugin.

Someone = Alex Baule
Re: [Dovecot] mailbox format w/ separate headers/data
On January 22, 2010 11:05:22 PM +0200 Timo Sirainen t...@iki.fi wrote:
> Dunno about zfs, but I've heard that at least in one NetApp installation deduplication was way too heavyweight.

zfs dedup is pretty resource-intensive -- for writes. For mail I suspect reads overwhelm writes?

-frank
Re: [Dovecot] mailbox format w/ separate headers/data
On Fri, 2010-01-22 at 16:09 -0500, Frank Cusack wrote:
> On January 22, 2010 11:05:22 PM +0200 Timo Sirainen t...@iki.fi wrote:
>> Dunno about zfs, but I've heard that at least in one NetApp installation deduplication was way too heavyweight.
> zfs dedup is pretty resource-intensive -- for writes. For mail I suspect reads overwhelm writes?

I don't have any evidence, but my logic goes like this: Mail is written to disk once. Most users use a single client, which downloads the message once. Or maybe they're using webmail, and they read the same message approximately once (or maybe max. 1.1 times). In both cases read:write is about 1:1.

Index files are of course a different thing. They're read a lot more often. But dedup doesn't help with them.
Re: [Dovecot] mailbox format w/ separate headers/data
On Fri, 2010-01-22 at 23:12 +0200, Timo Sirainen wrote:
> I don't have any evidence, but my logic goes like this: Mail is written to disk once. Most users use a single client, which downloads the message once. Or maybe they're using webmail, and they read the same message approximately once (or maybe max. 1.1 times). In both cases read:write is about 1:1.

Also, if a message is read soon after it was written, it's already in cache and won't have to be read from disk. In those cases read:write might be close to 0:1.
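
The back-of-the-envelope reasoning above can be written down as simple arithmetic. This is only a sketch: the function name and the `cache_hit_fraction` parameter are assumptions made for illustration, and the input numbers are the rough guesses from the message, not measurements.

```python
def disk_reads_per_write(reads_per_message: float, cache_hit_fraction: float) -> float:
    """Estimate disk reads per disk write for message bodies.

    reads_per_message: how often a delivered message is read (about 1.0 to 1.1
        in the estimate above).
    cache_hit_fraction: share of those reads served from cache because the
        message was read soon after it was written (hypothetical parameter).
    """
    if not 0.0 <= cache_hit_fraction <= 1.0:
        raise ValueError("cache_hit_fraction must be between 0 and 1")
    # Each message is written once, so the result is already "per write".
    return reads_per_message * (1.0 - cache_hit_fraction)


no_cache = disk_reads_per_write(1.0, 0.0)    # every read hits the disk
all_cached = disk_reads_per_write(1.1, 1.0)  # every read served from cache
```

With no cache hits the estimate stays at about 1:1; as the cache hit fraction approaches 1 it approaches 0:1, which is the point made above.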
Re: [Dovecot] mailbox format w/ separate headers/data
On January 22, 2010 11:05:22 PM +0200 Timo Sirainen t...@iki.fi wrote:
> On Fri, 2010-01-22 at 15:53 -0500, Frank Cusack wrote:
>> In the future, it would be cool if there were a mailbox format (dbox2?) where mail headers and each mime part were stored in separate files. This would enable the zfs dedup feature to be used to maximum benefit.
> This is more or less what dbox's single instance storage is going to do. Maybe in half a year or so.. And you don't even need a filesystem deduplication feature. :)

But if the mail system has to handle it, it only knows about mails written at the same time. For example, if postfix delivers mail with a single recipient per mail (the recommended config somewhere, not sure if recommended by postfix or by dovecot), dbox won't get the opportunity to dedup.

And for mails which are re-forwarded (pretty common occurrence), again dbox won't get the chance to dedup. Or will there be a global index?

-frank
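
To make the proposal concrete, here is a minimal sketch of splitting one message into its header block and its individual MIME parts, using Python's standard `email` package. It is not Dovecot or dbox code; `split_message` and `build_example` are hypothetical helpers, and the addresses are placeholders.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def split_message(raw: bytes):
    """Return (header_text, [decoded leaf parts]) for one message."""
    msg = message_from_bytes(raw, policy=default)
    # Top-level headers, one per line.
    headers = "".join(f"{name}: {value}\n" for name, value in msg.items())
    # Leaf MIME parts, with the transfer encoding (e.g. base64) undone.
    parts = [p.get_payload(decode=True) for p in msg.walk() if not p.is_multipart()]
    return headers, parts


def build_example(recipient: str) -> bytes:
    """Hypothetical test message: same text and attachment, different recipient."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = recipient
    msg["Subject"] = "report"
    msg.set_content("See attached.")
    msg.add_attachment(bytes(range(256)) * 64, maintype="application",
                       subtype="octet-stream", filename="report.bin")
    return msg.as_bytes()


headers_a, parts_a = split_message(build_example("alice@example.com"))
headers_b, parts_b = split_message(build_example("bob@example.com"))
```

Here `headers_a` and `headers_b` differ, while `parts_a` and `parts_b` are identical. If each part were written to its own file, those files would be byte-for-byte equal across the two deliveries, which is what a deduplicating store could exploit.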
Re: [Dovecot] mailbox format w/ separate headers/data
On 22.1.2010, at 23.14, Frank Cusack wrote:
>> This is more or less what dbox's single instance storage is going to do. Maybe in half a year or so.. And you don't even need a filesystem deduplication feature. :)
> But if the mail system has to handle it, it only knows about mails written at the same time. For example, if postfix delivers mail with a single recipient per mail (the recommended config somewhere, not sure if recommended by postfix or by dovecot), dbox won't get the opportunity to dedup.

Well, doing the multiple-recipients-at-a-time already works with v1.1+ with Maildir.

> And for mails which are re-forwarded (pretty common occurrence), again dbox won't get the chance to dedup. Or will there be a global index?

Yes. That's what dbox SIS is about. You have a global repository of (large) MIME parts, indexed by their SHA1 sum (or something).
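
The idea of a global repository keyed by a content hash can be sketched in a few lines. This is an illustration of the general technique only, not how dbox SIS is implemented: the `PartStore` class, its methods, and the one-file-per-digest layout are all assumptions.

```python
import hashlib
import os
import tempfile


class PartStore:
    """Hypothetical global store of MIME parts, keyed by the SHA-1 of their content."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, data: bytes) -> str:
        """Store data once and return its digest, which a mail would reference."""
        digest = hashlib.sha1(data).hexdigest()
        path = os.path.join(self.root, digest)
        if not os.path.exists(path):  # already stored: nothing new is written
            with open(path, "wb") as f:
                f.write(data)
        return digest

    def get(self, digest: str) -> bytes:
        with open(os.path.join(self.root, digest), "rb") as f:
            return f.read()


def demo():
    with tempfile.TemporaryDirectory() as root:
        store = PartStore(root)
        attachment = b"x" * 100_000
        first = store.put(attachment)   # original delivery
        second = store.put(attachment)  # same attachment, forwarded days later
        return first == second, len(os.listdir(root)), store.get(first) == attachment
```

Because the key depends only on the content, the second `put` finds the existing file regardless of when or to whom the mail was delivered. That is what answers the "mails written at the same time" concern. A real implementation would also need reference counting and safe concurrent writes, which this sketch leaves out.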
Re: [Dovecot] mailbox format w/ separate headers/data
On 22.1.2010, at 23.39, Frank Cusack wrote:
> On January 22, 2010 11:21:09 PM +0200 Timo Sirainen t...@iki.fi wrote:
>>> Or will there be a global index?
>> Yes. That's what dbox SIS is about. You have a global repository of (large) MIME parts, indexed by their SHA1 sum (or something).
> In the case of zfs then, the filesystem may as well do the dedup'ing.

Or dbox may as well do the deduping? :)

> I guess it comes down to whose algorithm is fastest.

I suppose they're more or less the same, if it's possible to tell zfs to dedup files only in /mail/attachments/ directory (I guess you can create a separate filesystem for that).
Re: [Dovecot] mailbox format w/ separate headers/data
On January 22, 2010 11:44:07 PM +0200 Timo Sirainen t...@iki.fi wrote:
> On 22.1.2010, at 23.39, Frank Cusack wrote:
>> On January 22, 2010 11:21:09 PM +0200 Timo Sirainen t...@iki.fi wrote:
>>>> Or will there be a global index?
>>> Yes. That's what dbox SIS is about. You have a global repository of (large) MIME parts, indexed by their SHA1 sum (or something).
>> In the case of zfs then, the filesystem may as well do the dedup'ing.
> Or dbox may as well do the deduping? :)
>> I guess it comes down to whose algorithm is fastest.

Yeah, I just meant that if dbox has a global hash list then either method should have similar overhead. zfs checksums every single block written anyway (regardless of dedup), so I think it would be faster than dbox.

Of course dbox can be used on systems without zfs. I would suggest that using zfs would give you more portability (mail files appear normal and can be copied or manipulated however you care to); however, normal mail files do not separate the headers and the message parts, so that isn't valid.

-frank
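
The last point, that block-level dedup gains little when headers and body share one file, can be shown with a toy model. The sketch assumes fixed-size blocks compared by hash; `BLOCK_SIZE`, the helper names, and the sample data are illustrative assumptions and do not reflect actual zfs settings.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative value only, not a zfs default


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> set:
    """Hash each fixed-size block, as a block-level deduplicator would."""
    return {hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)}


def shared_blocks(a: bytes, b: bytes) -> int:
    """Count distinct blocks that two files have in common."""
    return len(block_hashes(a) & block_hashes(b))


# 64 KiB of deterministic pseudo-random bytes standing in for an encoded attachment.
body = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(2048))
# Two header blocks of different length, as for two different recipients.
headers_a = b"To: alice@example.com\r\n" + b"X-Pad: a\r\n" * 20 + b"\r\n"
headers_b = b"To: bob@example.com\r\n" + b"X-Pad: b\r\n" * 33 + b"\r\n"

same_file = shared_blocks(headers_a + body, headers_b + body)  # headers + body in one file
split_files = shared_blocks(body, body)                        # body stored in its own file
```

With headers and body in one file, the differing header lengths shift the body to different offsets, so none of the blocks line up (`same_file` is 0). With the body in its own file, all 16 blocks match (`split_files` is 16).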
Re: [Dovecot] mailbox format w/ separate headers/data
On Fri, 22 Jan 2010, Frank Cusack wrote:
> On January 22, 2010 11:05:22 PM +0200 Timo Sirainen t...@iki.fi wrote:
>> Dunno about zfs, but I've heard that at least in one NetApp installation deduplication was way too heavyweight.
> zfs dedup is pretty resource-intensive -- for writes. For mail I suspect reads overwhelm writes?

Sorry for the tangent, but I wonder if anyone here is running lots of Maildirs on zfs? I just recently started experimenting with it on our backups server (FBSD 8.0), and I really am liking it. I was also surprised at how my little 4 drive raidz volume performed in benchmarks - quite impressive.

I'd seen some comments here in the past that zfs+maildirs = bad. Anything to back that up? Any comparisons to UFS2 on FBSD?

For a number of reasons, running zfs on my main mail host would be very handy (backups and easy expansion being the two big ones).

Thanks,

Charles

> -frank