Re: [Dovecot] mailbox format w/ separate headers/data

2010-01-25 Thread John Tobin
On Fri, Jan 22, 2010 at 09:03:42PM -0500, Charles Sprickman wrote:
 On Fri, 22 Jan 2010, Frank Cusack wrote:

 On January 22, 2010 11:05:22 PM +0200 Timo Sirainen t...@iki.fi wrote:
 Dunno about zfs, but I've heard that at least in one NetApp installation
 deduplication was way too heavyweight.

 zfs dedup is pretty resource-intensive -- for writes.  For mail I
 suspect reads overwhelm writes?

 Sorry for the tangent, but I wonder if anyone here is running lots of  
 Maildirs on zfs?  I just recently started experimenting with it on our  
 backups server (FBSD 8.0), and I really am liking it.  I was also  
 surprised at how my little 4 drive raidz volume performed in benchmarks - 
 quite impressive.

We used to have our Maildirs on ZFS but we've moved to ext3.  ZFS worked
reasonably well, except for the days when it slowed down to less than
10% of normal throughput.  After a reboot or a couple of days of slow
running it would perform normally again.  This was on Solaris 10, at
most a couple of months behind on patches.  I had read the ZFS evil
tuning guide, and the ZFS best practices guide, but they didn't help.
It wasn't just mail that was slow - listing the contents of a small
directory could take over a minute.  We're much happier since switching
to ext3; I haven't worried about mail performance since.

-- 
John Tobin
No no no. You're supposed to test with -march=... -fomit-frame-pointer
-ffancy-math -fuse-lots-of-resources-go-very-fast -fsacrifice-more-goats
-fsummon-cthulu-if-that-helps as root at nice -20, preferably in single
user mode and jumps should be aligned on pentagrams, not 8 byte
boundaries.
Definitely not use debugging :-)
  -- Nicholas Clark, in perl6-internals


Re: [Dovecot] mailbox format w/ separate headers/data

2010-01-22 Thread Timo Sirainen
On Fri, 2010-01-22 at 23:05 +0200, Timo Sirainen wrote:
 It would also already be possible to write such a Maildir feature. Someone
 on this list already wrote header/body separation code, which was pretty
 easy to do with a plugin.

Someone = Alex Baule





Re: [Dovecot] mailbox format w/ separate headers/data

2010-01-22 Thread Frank Cusack

On January 22, 2010 11:05:22 PM +0200 Timo Sirainen t...@iki.fi wrote:
 Dunno about zfs, but I've heard that at least in one NetApp installation
 deduplication was way too heavyweight.

zfs dedup is pretty resource-intensive -- for writes.  For mail I
suspect reads overwhelm writes?

-frank


Re: [Dovecot] mailbox format w/ separate headers/data

2010-01-22 Thread Timo Sirainen
On Fri, 2010-01-22 at 16:09 -0500, Frank Cusack wrote:
 On January 22, 2010 11:05:22 PM +0200 Timo Sirainen t...@iki.fi wrote:
  Dunno about zfs, but I've heard that at least in one NetApp installation
  deduplication was way too heavyweight.
 
 zfs dedup is pretty resource-intensive -- for writes.  For mail I
 suspect reads overwhelm writes?

I don't have any evidence, but my logic goes like: Mail is written to
disk once. Most users use a single client, which downloads the message
once. Or maybe they're using webmail, and they read the same message
approximately once (or maybe max. 1.1 times). In both cases read:write
is about 1:1.

Index files are of course a different thing. They're read a lot more
often. But dedup doesn't help with them.





Re: [Dovecot] mailbox format w/ separate headers/data

2010-01-22 Thread Timo Sirainen
On Fri, 2010-01-22 at 23:12 +0200, Timo Sirainen wrote:

 I don't have any evidence, but my logic goes like: Mail is written to
 disk once. Most users use a single client, which downloads the message
 once. Or maybe they're using webmail, and they read the same message
 approximately once (or maybe max. 1.1 times). In both cases read:write
 is about 1:1.

Also, if a message is read soon after it was written, it's already in
cache and won't have to be read from disk. In those cases read:write
might be close to 0:1.
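
The estimate in this exchange can be written down as a small calculation. This is only a sketch of the reasoning, not a measurement: the function name `disk_read_write_ratio` and its two parameters are made up for illustration, and it assumes every message is written to disk exactly once.

```python
def disk_read_write_ratio(reads_per_message: float, cache_hit_rate: float) -> float:
    """Disk reads per disk write, assuming each message is written once.

    reads_per_message: how many times a message is fetched on average
                       (the thread guesses about 1.0 to 1.1).
    cache_hit_rate:    fraction of those fetches served from cache
                       (an assumed input, between 0.0 and 1.0).
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0.0 and 1.0")
    # Only fetches that miss the cache turn into disk reads.
    return reads_per_message * (1.0 - cache_hit_rate)


# One read per message and nothing cached gives 1:1.
print(disk_read_write_ratio(1.0, 0.0))
# Every read served from cache gives 0:1.
print(disk_read_write_ratio(1.1, 1.0))
```

Index files are left out of this, since they are read far more often and are a separate case.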





Re: [Dovecot] mailbox format w/ separate headers/data

2010-01-22 Thread Frank Cusack

On January 22, 2010 11:05:22 PM +0200 Timo Sirainen t...@iki.fi wrote:
 On Fri, 2010-01-22 at 15:53 -0500, Frank Cusack wrote:
  In the future, it would be cool if there were a mailbox format (dbox2?)
  where mail headers and each mime part were stored in separate files.
  This would enable the zfs dedup feature to be used to maximum benefit.

 This is more or less what dbox's single instance storage is going to do.
 Maybe in half a year or so.. And you don't even need filesystem
 deduplication feature. :)


But if the mail system has to handle it, it only knows about mails
written at the same time.  For example, if postfix delivers mail
with a single recipient per mail (the recommended config somewhere,
not sure if recommended by postfix or by dovecot), dbox won't get
the opportunity to dedup.

And for mails which are re-forwarded (pretty common occurrence), again
dbox won't get the chance to dedup.

Or will there be a global index?

-frank


Re: [Dovecot] mailbox format w/ separate headers/data

2010-01-22 Thread Timo Sirainen
On 22.1.2010, at 23.14, Frank Cusack wrote:

 This is more or less what dbox's single instance storage is going to do.
 Maybe in half a year or so.. And you don't even need filesystem
 deduplication feature. :)
 
 But if the mail system has to handle it, it only knows about mails
 written at the same time.  For example, if postfix delivers mail
 with a single recipient per mail (the recommended config somewhere,
 not sure if recommended by postfix or by dovecot), dbox won't get
 the opportunity to dedup.

Well, doing the multiple-recipients-at-a-time already works with v1.1+ with 
Maildir.

 And for mails which are re-forwarded (pretty common occurrence), again
 dbox won't get the chance to dedup.
 
 Or will there be a global index?

Yes. That's what dbox SIS is about. You have a global repository of (large) 
MIME parts, indexed by their SHA1 sum (or something).
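
A minimal sketch of the idea described here, not Dovecot's actual dbox SIS code: a global store that keys each MIME part by its SHA-1 digest, so identical parts are kept once regardless of when or for whom they arrive. The `PartStore` class, its method names, and the in-memory dictionaries are all hypothetical and chosen only for illustration; a real store would live on disk.

```python
import hashlib


class PartStore:
    """Toy in-memory stand-in for a global repository of MIME parts."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}    # hex digest -> part content
        self.refcount: dict[str, int] = {}   # hex digest -> referencing mails

    def put(self, data: bytes) -> str:
        """Store a part if it is new and return its SHA-1 key."""
        key = hashlib.sha1(data).hexdigest()
        if key not in self.blobs:
            self.blobs[key] = data
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key

    def get(self, key: str) -> bytes:
        """Return the part content stored under the given key."""
        return self.blobs[key]

    def release(self, key: str) -> None:
        """Drop one reference; delete the part when nobody uses it."""
        self.refcount[key] -= 1
        if self.refcount[key] == 0:
            del self.refcount[key]
            del self.blobs[key]


store = PartStore()
attachment = b"x" * 1000
first = store.put(attachment)    # original delivery
second = store.put(attachment)   # same attachment forwarded later
print(first == second, len(store.blobs))
```

Because the key depends only on the content, a re-forwarded attachment or a one-recipient-per-delivery setup still lands on the same entry.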



Re: [Dovecot] mailbox format w/ separate headers/data

2010-01-22 Thread Timo Sirainen
On 22.1.2010, at 23.39, Frank Cusack wrote:

 On January 22, 2010 11:21:09 PM +0200 Timo Sirainen t...@iki.fi wrote:
 Or will there be a global index?
 
 Yes. That's what dbox SIS is about. You have a global repository of
 (large) MIME parts, indexed by their SHA1 sum (or something).
 
 In the case of zfs then, the filesystem may as well do the dedup'ing.

Or dbox may as well do the deduping? :) I guess it comes down to whose 
algorithm is fastest. I suppose they're more or less the same, if it's possible 
to tell zfs to dedup files only in /mail/attachments/ directory (I guess you 
can create a separate filesystem for that).



Re: [Dovecot] mailbox format w/ separate headers/data

2010-01-22 Thread Frank Cusack

On January 22, 2010 11:44:07 PM +0200 Timo Sirainen t...@iki.fi wrote:
 On 22.1.2010, at 23.39, Frank Cusack wrote:

  On January 22, 2010 11:21:09 PM +0200 Timo Sirainen t...@iki.fi wrote:
  Or will there be a global index?

  Yes. That's what dbox SIS is about. You have a global repository of
  (large) MIME parts, indexed by their SHA1 sum (or something).

  In the case of zfs then, the filesystem may as well do the dedup'ing.

 Or dbox may as well do the deduping? :) I guess it comes down to whose
 algorithm is fastest.


Yeah, I just meant that if dbox has a global hash list then either
method should have similar overhead.  zfs checksums every single block
written anyway (regardless of dedup) so I think it would be faster
than dbox.

Of course dbox can be used on systems without zfs.

I would suggest that using zfs would give you more portability (mail
files appear normal and can be copied or manipulated however you care
to); however, normal mail files do not separate the headers and the
message parts, so that isn't valid.
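
A rough illustration of why the separation matters, assuming two copies of one mail delivered to different recipients. It hashes whole byte strings rather than filesystem blocks, so it only approximates what zfs does with its per-block checksums; `split_message`, `digest`, and the sample messages are hypothetical.

```python
import hashlib


def split_message(raw: bytes) -> tuple[bytes, bytes]:
    """Split a raw message into (headers, body) at the first blank line."""
    found = []
    for sep in (b"\r\n\r\n", b"\n\n"):
        pos = raw.find(sep)
        if pos != -1:
            found.append((pos, sep))
    if not found:
        return raw, b""          # headers only, no body
    pos, sep = min(found)        # whichever blank line comes first
    return raw[:pos], raw[pos + len(sep):]


def digest(data: bytes) -> str:
    """SHA-1 hex digest, used here only to compare content."""
    return hashlib.sha1(data).hexdigest()


body = b"Quarterly report attached.\n"
copy_a = b"Delivered-To: alice@example.com\nSubject: report\n\n" + body
copy_b = b"Delivered-To: bob@example.com\nSubject: report\n\n" + body

# Stored as single files, the two copies differ because of their headers.
print(digest(copy_a) == digest(copy_b))
# Stored separately, the bodies are identical and can be shared.
print(digest(split_message(copy_a)[1]) == digest(split_message(copy_b)[1]))
```
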

-frank


Re: [Dovecot] mailbox format w/ separate headers/data

2010-01-22 Thread Charles Sprickman

On Fri, 22 Jan 2010, Frank Cusack wrote:
 On January 22, 2010 11:05:22 PM +0200 Timo Sirainen t...@iki.fi wrote:
  Dunno about zfs, but I've heard that at least in one NetApp installation
  deduplication was way too heavyweight.

 zfs dedup is pretty resource-intensive -- for writes.  For mail I
 suspect reads overwhelm writes?


Sorry for the tangent, but I wonder if anyone here is running lots of 
Maildirs on zfs?  I just recently started experimenting with it on our 
backups server (FBSD 8.0), and I really am liking it.  I was also 
surprised at how my little 4 drive raidz volume performed in benchmarks - 
quite impressive.


I'd seen some comments here in the past that zfs+maildirs = bad.  Anything 
to back that up?  Any comparisons to UFS2 on FBSD?


For a number of reasons, running zfs on my main mail host would be very 
handy (backups and easy expansion being the two big ones).


Thanks,

Charles


 -frank