Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2010-03-11 Thread Ed W

On 10/03/2010 21:19, Timo Sirainen wrote:

On 10.8.2009, at 20.01, Timo Sirainen wrote:

   

(3.5. Implement async I/O filesystem backend.)
 

You know what I found out today? Linux doesn't support async IO for regular 
buffered files. I had heard there were issues, but I thought it was mainly 
about some annoying APIs and such. Anyone know if some project has successfully 
figured out some usable way to do async disk IO? The possibilities seem to be:

a) Use Linux's native AIO, which requires direct-io for files. This *might* not 
be horribly bad for mail files. After all, same mail is rarely read multiple 
times. Except when parsing its headers first and then its body. Maybe the 
process could do some internal buffering?..

I guess no one ever tried my posix_fadvise() patch? The idea was that it would 
tell the kernel after closing a mail file that it's no longer needed in memory, 
so kernel could remove it from page cache. I never heard any positive or 
negative comments about how it affected performance.. 
http://dovecot.org/patches/1.1/fadvise.diff

b) Use threads, either via some library or implement yourself. Each thread of 
course uses some extra memory. Also enabling threads causes glibc to start 
using a thread-safe version of malloc() (I think?), which slows things down 
(unless that can be avoided, maybe by using clone() directly instead of 
pthreads?).

c) I read someone's idea about using posix_fadvise() and fincore() functions to somehow 
make it kind of work, usually, maybe. I'm not sure if there's a practical way 
to make them work though. And of course I don't think fincore() has even been accepted by 
Linus yet.

   


Perhaps mail this question to the kernel list, stand back and watch it 
ignite?


Ed


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2010-03-11 Thread Sebastian Färber
b) Use threads, either via some library or implement yourself. Each thread of 
course uses some extra memory. Also enabling threads causes glibc to start 
using a thread-safe version of malloc() (I think?), which slows things down 
(unless that can be avoided, maybe by using clone() directly instead of 
pthreads?).

Perhaps libeio (http://software.schmorp.de/pkg/libeio.html) is a good
starting point?
I don't have any experience with it but it's used by node.js
(http://nodejs.org/) for the async I/O stuff.

-Sebastian


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2010-03-10 Thread Timo Sirainen
On 10.8.2009, at 20.01, Timo Sirainen wrote:

 (3.5. Implement async I/O filesystem backend.)

You know what I found out today? Linux doesn't support async IO for regular 
buffered files. I had heard there were issues, but I thought it was mainly 
about some annoying APIs and such. Anyone know if some project has successfully 
figured out some usable way to do async disk IO? The possibilities seem to be:

a) Use Linux's native AIO, which requires direct-io for files. This *might* not 
be horribly bad for mail files. After all, same mail is rarely read multiple 
times. Except when parsing its headers first and then its body. Maybe the 
process could do some internal buffering?..

I guess no one ever tried my posix_fadvise() patch? The idea was that it would 
tell the kernel after closing a mail file that it's no longer needed in memory, 
so kernel could remove it from page cache. I never heard any positive or 
negative comments about how it affected performance.. 
http://dovecot.org/patches/1.1/fadvise.diff
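The effect the patch aims for can be sketched in a few lines. This is a hypothetical Python illustration of the idea only (the actual patch is C inside Dovecot), guarded because posix_fadvise() is not available on every platform:

```python
import os

def read_mail_and_drop_cache(path):
    """Read a mail file, then hint the kernel that its cached pages
    are no longer needed, so they can be evicted from the page cache."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:
                break
            chunks.append(chunk)
        if hasattr(os, "posix_fadvise"):  # not available everywhere
            # offset=0, length=0 means "the whole file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return b"".join(chunks)
```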

b) Use threads, either via some library or implement yourself. Each thread of 
course uses some extra memory. Also enabling threads causes glibc to start 
using a thread-safe version of malloc() (I think?), which slows things down 
(unless that can be avoided, maybe by using clone() directly instead of 
pthreads?).
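A minimal sketch of option (b), with Python's thread pool standing in for what would be C worker threads in Dovecot; `async_pread` and the pool size are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Each blocking read runs in a worker thread; the caller immediately
# gets back a future it can poll, which is effectively async disk I/O.
_io_pool = ThreadPoolExecutor(max_workers=4)

def async_pread(path, offset, length):
    """Schedule a positioned read without blocking the caller."""
    def blocking_pread():
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(length)
    return _io_pool.submit(blocking_pread)
```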

c) I read someone's idea about using posix_fadvise() and fincore() functions to 
somehow make it kind of work, usually, maybe. I'm not sure if there's a 
practical way to make them work though. And of course I don't think fincore() 
has even been accepted by Linus yet.



Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2010-03-10 Thread Stan Hoeppner
Timo Sirainen put forth on 3/10/2010 3:19 PM:
 On 10.8.2009, at 20.01, Timo Sirainen wrote:
 
 (3.5. Implement async I/O filesystem backend.)
 
 You know what I found out today? Linux doesn't support async IO for regular 
 buffered files. I had heard there were issues, but I thought it was mainly 
 about some annoying APIs and such. Anyone know if some project has 
 successfully figured out some usable way to do async disk IO? The 
 possibilities seem to be:
 
 a) Use Linux's native AIO, which requires direct-io for files. This *might* 
 not be horribly bad for mail files. After all, same mail is rarely read 
 multiple times. Except when parsing its headers first and then its body. 
 Maybe the process could do some internal buffering?..
 
 I guess no one ever tried my posix_fadvise() patch? The idea was that it 
 would tell the kernel after closing a mail file that it's no longer needed in 
 memory, so kernel could remove it from page cache. I never heard any positive 
 or negative comments about how it affected performance.. 
 http://dovecot.org/patches/1.1/fadvise.diff
 
 b) Use threads, either via some library or implement yourself. Each thread of 
 course uses some extra memory. Also enabling threads causes glibc to start 
 using a thread-safe version of malloc() (I think?), which slows things down 
 (unless that can be avoided, maybe by using clone() directly instead of 
 pthreads?).
 
 c) I read someone's idea about using posix_fadvise() and fincore() functions 
 to somehow make it kind of work, usually, maybe. I'm not sure if there's a 
 practical way to make them work though. And of course I don't think fincore() 
 has even been accepted by Linus yet.
 

Considering the extent to which Linus hates O_DIRECT, I would think that if
he were a fan of async I/O at all, he'd have pushed its use via the buffer
cache.  Given that async I/O is implemented via O_DIRECT, I'd say Linus isn't
a fan of async I/O either.  I've not read anything Linus has written on
async I/O, if he even has; I'm merely making an educated guess based on the
current implementation of async I/O in Linux.

-- 
Stan



Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread paulmon




On Mon, 2009-08-10 at 14:33 -0700, Seth Mattinen wrote:

 Nothing forces you to switch from maildir, if you're happy with it :)
 But if you want to support millions of users, it's simpler to distribute
 the storage and disk I/O evenly across hundreds of servers using a
 database that was designed for it. And by databases I mean here some of
 those key/value-like databases, not SQL. (What's a good collective name
 for those dbs anyway? BASE and NoSQL are a couple names I've seen.)


Timo, I've been thinking the exact same thing as you lately.  As mail moves
away from traditional POP3 users toward more online storage in the form of
webmail, the scalability of maildir for large multi-gigabyte mailboxes goes
out the window; loading cur in that type of scenario takes WAY too long.
Gmail on Maildir isn't possible.  I can't speak for anyone else, but my
users are moving into webmail, and POP users are becoming rare.

My current thinking is a key/value store, as you've proposed.  Something like
Hadoop components or Project Voldemort.  Voldemort might be a better fit
from what I've read.  The main issue here is that applications such as local
delivery as well as POP/IMAP access would need to be rewritten to support
this.  Obviously, creating a Hadoop- or Voldemort-aware local delivery agent
means being able to stay away from writing a complete MTA; likewise, if one
treats IMAP as the main way of accessing a mailbox (proxies for POP3, for
example), then a new local delivery agent and IMAPd with key/value smarts
would be all that is needed to create this system.

My current thinking is having the local delivery break messages up into
their component pieces (headers, from address, to address, spam scores, body,
etc.) as various key:value relationships.  Combine this with the replication
support of systems such as Hadoop or Voldemort and you end up with a
massively scalable system based on commodity hardware.  You get rid of RAID
completely, and remove NFS servers and replace them with a cluster of beige
boxes with ~4 drives each.  Redundancy is handled by the native replication
in the key:value application itself (Voldemort, for example, can replicate up
to 3 times) on each machine, so yes, you would store a single message more
than once, but if each of your beige box storage systems has 4*2TB drives,
your cost of storage is far less than the cost of buying from traditional
NFS server manufacturers.
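A minimal sketch of that delivery-time split, using the stdlib email parser; the `username/guid/...` key naming scheme is invented purely for illustration:

```python
from email.parser import Parser

def message_to_kv(username, guid, raw_message):
    """Break one message into flat key:value pairs: one entry per
    header plus one for the body, ready for a key/value store.
    (Repeated headers like Received would need list values in practice.)"""
    msg = Parser().parsestr(raw_message)
    prefix = f"{username}/{guid}"
    pairs = {f"{prefix}/body": msg.get_payload()}
    for name, value in msg.items():
        pairs[f"{prefix}/header/{name.lower()}"] = value
    return pairs
```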

Anyways, this is just something that's currently floating in my head...

Paul





-- 
View this message in context: 
http://www.nabble.com/Scalability-plans%3A-Abstract-out-filesystem-and-make-it-someone-else%27s-problem-tp24903458p25645652.html
Sent from the Dovecot mailing list archive at Nabble.com.



Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Ed W

paulmon wrote:

My current thinking is having the local delivery break messages up into
their component pieces, headers, from address, to address, spam scores, body
etc into various key:value relationships.



Whilst this looks appealing on the surface, I think the details are going 
to need some benchmarking to see if they stack up.  Certainly I hope this 
new abstraction works out, because I suspect we'll see a bunch of 
interesting ideas get implemented, such as you describe!!


Just to knock your theoretical idea around a bit though: my guess would 
be that you need to look at the access patterns for this data to make 
sure you don't over-normalise it.  Eg if it's normal to simply open up 
a mailbox and then ask it for every one of the following X fields for 
each message, then over-normalising the header fields will lead to 
response time being dominated by access times for each field (especially 
if that creates a disk seek, etc).


At present I think Dovecot's architecture kind of assumes that random 
access dominates for individual email messages, and then it optimises for 
a particular case of header accesses by caching those into a local 
database-type structure which caches just a certain set of 
recently requested header fields.  The access times then seem to be 
bounded by the time to scan the inbox for new unseen messages and update 
this index with maildir (not sure what bounds mailbox scanning times in 
general use?).  Ie it's optimising for returning every field X from 
every message in a folder, or else returning bits of a given message?



I should imagine that in general this architecture is near optimal for 
the general case and the main improvement is just in speeding up the 
updates after new emails are added/deleted... (done automatically at 
present if you use deliver, incurs a speed hit if you update yourself)


I should imagine that once you add a requirement to distribute the data 
and handle failover, etc then the problems of any cache coherency 
dominate the design and this could be interesting to play with ideas to 
solve this.


Anyway, I think the point is that for anyone who hasn't tried it yet, to 
first have a look at how your favourite IMAP client implements imap and 
watch the stream of commands being issued... It's usually quite a bit 
different to what you expect and to me it's a lot different to what 
might be optimal if I got to design their algorithm...


The point being that you shouldn't optimise too much for what you hope 
people will do, so much as have a look at your favourite webmail client 
or desktop client and optimise for whatever stream of idiocy they 
request you to keep pumping at them...


I for one look forward to these changes - I desperately hope I get some 
time to then play with some ideas because like you I'm itching to play 
with my next greatest idea!!


My only request to Timo was to kind of consider that a bunch of these 
ideas from the audience will almost certainly involve splitting up the 
mime message into component parts and that the abstracted interface 
should try not to throw away any potential speed benefit that this might 
achieve because the interface can't express what it needs clearly enough?


Good luck

Ed W



Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Timo Sirainen
On Mon, 2009-09-28 at 09:00 -0700, paulmon wrote:
 My current thinking is a key/value store as you've proposed.  Something like
 Hadoop components or Project Voldemort.  Voldemort might be a better fit
 from what I've read. 

My understanding of Hadoop is that it's more about distributed computing
instead of storage.

 My current thinking is having the local delivery break messages up into
 their component pieces, headers, from address, to address, spam scores, body
 etc into various key:value relationships.  

I was planning on basically just storing key=username/message-guid,
value=message pairs instead of splitting it up. Or perhaps split header
and body, but I think piecing it smaller than those just makes the
performance worse. To get different headers quickly there would still be
dovecot.index.cache (which would be in some quick in-memory storage
but also stored in the database).
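That plan can be sketched roughly like this, with plain dicts standing in for the distributed key/value database and for dovecot.index.cache; the class and method names are invented:

```python
import uuid

class MessageStore:
    """Whole message stored under key=username/message-guid, with a
    separate cache of frequently needed header fields."""
    def __init__(self):
        self.kv = {}            # stand-in for the key/value database
        self.header_cache = {}  # stand-in for dovecot.index.cache

    def save(self, username, raw_message, cached_headers):
        key = f"{username}/{uuid.uuid4().hex}"
        self.kv[key] = raw_message
        self.header_cache[key] = cached_headers
        return key

    def fetch_header(self, key, name):
        # Served from the fast cache, without touching the full message.
        return self.header_cache[key].get(name)
```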



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Timo Sirainen
On Mon, 2009-09-28 at 17:57 +0100, Ed W wrote:
 My only request to Timo was to kind of consider that a bunch of these 
 ideas from the audience will almost certainly involve splitting up the 
 mime message into component parts and that the abstracted interface 
 should try not to throw away any potential speed benefit that this might 
 achieve because the interface can't express what it needs clearly enough?

It might become too complex to initially consider how to support split
MIME messages and such. I'm not really sure if it even belongs to this
filesystem abstraction layer. I was hoping that the FS API would be
really really simple and could also be used for other things than just
email.

But I'm also hoping to support things like single-instance storage at
some point. I'm not really sure if that should just be written into dbox
code directly or try to abstract it out..





Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Ed W

Timo Sirainen wrote:

On Mon, 2009-09-28 at 09:00 -0700, paulmon wrote:
  

My current thinking is a key/value store as you've proposed.  Something like
Hadoop components or Project Voldemort.  Voldemort might be a better fit
from what I've read. 



My understanding of Hadoop is that it's more about distributed computing
instead of storage.
  


I believe it's possible to use it to ask lots of machines to parse a bit 
of database and then get the answer back from all of them.  eg some 
people are alleged to be using it to parse huge log files in sensible 
time by splitting up their log files across lots of machines and asking 
each of them to do a bit of filtering...


I'm out of my depth at this point - only read the executive summary...



My current thinking is having the local delivery break messages up into
their component pieces, headers, from address, to address, spam scores, body
etc into various key:value relationships.  



I was planning on basically just storing key=username/message-guid,
value=message pairs instead of splitting it up. Or perhaps split header
and body, but I think piecing it smaller than those just makes the
performance worse. To get different headers quickly there would still be
dovecot.index.cache (which would be in some quick in-memory storage
but also stored in the database).
  


This can presumably be rephrased as:

- access times are say 10ms
- linear read times are say 60MB/sec
- Therefore don't break a message into chunks smaller than 0.010s * 60MB/s = 
600KB (ish), or your seek times dominate over simply doing linear reads 
and throwing away what you don't need...
- Obviously insert whatever timings you like and re-run the numbers, eg 
if you have some fancy-pants flash drive then insert shorter seek times


However, these numbers and some very limited knowledge of how a small 
bunch of email clients seem to behave would suggest that the following 
is also worth optimising to varying degrees (please don't overlook 
someone wanting to implement some backend to try these ideas):


Theory: Attachments larger than K are worth breaking out according to 
the formula above.
Justification: Above an actually fairly small attachment size, it's 
cheaper to do a seek than to linearly scan to the next mail message. For 
some storage designs this might be helpful (mbox-type packing).  
Additionally some users have suggested that they want to try to single-
instance popular attachments, so K might be customisable; or better 
yet, some design might choose to keep a cache of attachment fingerprints 
and de-dup them when a dup is next seen..
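A sketch of that fingerprint cache idea; the threshold K, the class name, and the choice of SHA-256 as the fingerprint are all assumptions made for illustration:

```python
import hashlib

class AttachmentDedup:
    """Single-instance attachments above a size threshold K by keying
    them on a content fingerprint; small ones stay with the message."""
    def __init__(self, threshold=600_000):
        self.threshold = threshold
        self.blobs = {}  # fingerprint -> attachment bytes

    def store(self, data):
        if len(data) < self.threshold:
            return None  # small: keep inline with the message
        fp = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(fp, data)  # de-dup: stored only once
        return fp
```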


Theory: breakout (all) headers from bodies
Justification: scanning headers seems a popular task and dovecot keeps a 
local database to optimise the common case.  Reseeks would be slow 
though and some storage designs might be able to optimise and get fast 
linear seeks across all headers (eg pack them as per mbox and compress 
them?)


Theory: breakout individual headers
Justification: err... not got a good case for this one, but some of 
these fancy key value databases are optimised for creating views on 
certain headers across certain messages.  I imagine this won't fly in 
practice, but it seems a shame not to try it... Definitely anyone 
implementing an SQL database option will want to try it though... (bet 
it's slow though...)


Theory: pack message bodies together as per mbox
Justification: mbox seems faster, compresses better and all round seems 
better than maildir for access speed, except in certain circumstances 
such as deletes.  Dovecot already seems to optimise some corner cases by 
just marking messages dead without deleting them, so clearly there is 
tremendous scope for improvement here (dbox going down this route?).  
Some bright spark might design some backend which uses multiple mbox 
files to overcome the huge hit when defragging and it may well be that 
by incorporating eg splitting out larger attachments, and lightly 
compressing, then some workloads might see some really good performance! 
(Could be really interesting for archive mailboxes, etc?)



Just my 2p...

Ed


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Ed W

Timo Sirainen wrote:

On Mon, 2009-09-28 at 17:57 +0100, Ed W wrote:
  
My only request to Timo was to kind of consider that a bunch of these 
ideas from the audience will almost certainly involve splitting up the 
mime message into component parts and that the abstracted interface 
should try not to throw away any potential speed benefit that this might 
achieve because the interface can't express what it needs clearly enough?



It might become too complex to initially consider how to support split
MIME messages and such. I'm not really sure if it even belongs to this
filesystem abstraction layer. I was hoping that the FS API would be
really really simple and could also be used for other things than just
email.
  


Well, I think if you just implement a wrapper around read(fh, start, 
count) then it's going to be quite hard to implement some kind of 
storage which splits out the message in some way?


I guess the API would need to line up with the IMAP commands to retrieve 
MIME parts.  For the most part these are poorly supported by clients, so 
I guess most mail clients will undo all this cleverness, but I would 
imagine it will have a low impact on performance since it's just extra 
seeks on fetching individual messages?


I am starting to see newer clients finally get this right though.  I'm 
using Profimail on my N97 and whilst I didn't look at its IMAP stream 
it *seems* to be doing everything right from the client point of view.  
I even get to choose to download the whole message if the size is < Y and 
ignore attachments larger than Z, etc. (In theory Thunderbird does this, 
but at least on my machine it just repeatedly downloads the same message 
again and again in various ways - it grinds to a halt every time I click 
on an email with a decent-sized attachment, even if I have already read 
it... grr)




But I'm also hoping to support things like single-instance storage at
some point. I'm not really sure if that should just be written into dbox
code directly or try to abstract it out..
  


I agree it should at least initially go into the dbox, etc code.  I 
guess if enough people do the same implementation (in all the new 
backends which I'm sure will arrive within days of some API coming 
out) it could bubble up, etc?


I would have thought that your API will prefer to request message parts 
where it can (eg header, body, mime part), and just issue a read_bytes 
where that's all the client is asking for.  This would allow 
the storage engine to optimise where it can, and sadly for the dumb 
client we just stream bytes since that's all they asked for...


Perhaps the API should also request specific headers from the storage 
engine where possible and ask for all headers only where it's 
necessary?  This would allow an sql database to be heavily normalised 
(I'm sure performance is iffy, but we have to pre-suppose some reason 
why this design is useful for other reasons)


Does this seem feasible?

Ed W



Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Timo Sirainen
On Mon, 2009-09-28 at 18:35 +0100, Ed W wrote:
 I would have thought that your API will prefer to request message parts 
 where it can (eg header, body, mime part), and just issue a read_bytes, 
 where that's what the client is asking for otherwise.  This would allow 
 the storage engine to optimise where it can and sadly for the dumb 
 client we just stream bytes since that's all they asked for...

In my mind this is more about what lib-storage API was supposed to
abstract out, whereas my filesystem API would be used simply for binary
data storage. The same FS API could be used to store both dbox files and
index files.

 Perhaps the API should also request specific headers from the storage 
 engine where possible and ask for all headers only where it's 
 necessary?  This would allow an sql database to be heavily normalised 
 (I'm sure performance is iffy, but we have to pre-suppose some reason 
 why this design is useful for other reasons)

This is really going towards what lib-storage API is supposed to do
already.. It's not even horribly difficult to write a new backend for
it. For example in v2.0 the fully functional Cydir backend code looks
like:

% wc *[ch]
  152   357  3740 cydir-mail.c
  319   783  8420 cydir-save.c
  402  1087 10806 cydir-storage.c
   35    82  1085 cydir-storage.h
  187   465  4798 cydir-sync.c
   24    54   615 cydir-sync.h
 1119  2828 29464 total

There is still a bit of code duplication between backends that could
reduce the line count by maybe 100-200 lines. Anyway I think the only
good way to implement support for normalized SQL database in Dovecot
would be to implement a new lib-storage backend, and it shouldn't be a
hugely difficult job.




Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Charles Marcus
On 9/28/2009, Ed W (li...@wildgooses.com) wrote:
 In theory Thunderbird does this, but at least on my machine it just
 repeatedly downloads the same message again and again in various ways
 - it grinds to a halt every time I click on an email with a decent
 sized attachment, even if I have already read it... grr

TB3 has finally fixed this absurd behavior (yay!)...

In fact there are lots of IMAP improvements in v3... I can't wait until
all my extensions catch up, and I figure out how to customize the UI the
way I want (e.g., how in the world do I get rid of the stupid Tabs??)

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Timo Sirainen
On Mon, 2009-09-28 at 19:21 +0100, Ed W wrote:
  In my mind this is more about what lib-storage API was supposed to
  abstract out, whereas my filesystem API would be used simply for binary
  data storage. The same FS API could be used to store both dbox files and
  index files.

 
 I guess in this case it would be interesting to hear the kind of use 
 cases you imagine that the storage API will be used for in practice?  I 
 think I might be kind of overthinking the problem?

lib-storage API has existed since Dovecot v1.0 and it's used to abstract
out access to maildir, mbox, dbox, cydir, etc. SQL would fit right there
with those.

Or did you mean FS API? For that my plans are to implement backends for:

 - POSIX (just the way it works now)
 - Async I/O (once Dovecot can do more things in parallel)
 - Some kind of proxying to support shared mailboxes between different
servers (or within same server when users are using different UIDs and
don't have a common group)
 - Massively distributed database storage for mails
 - In-memory cache for index files, which permanently writes them using
another storage. This is useful for any kind of multi-master setup like
distributed database, NFS, clusterfs.

 Seems like it's a very thin shim between a real file system and dovecot 
 and would be mainly useful for supporting filesystems with non-POSIX 
 protocols, eg someone wants to store their mail files on mogile or DAV, 
 but it doesn't address anything lower or higher than blocks of data?

Right, path/filename (or key) - binary byte stream.
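In other words, the contract is a key-to-bytes interface. A minimal sketch (the class and method names are invented; a proxying, in-memory-cache, or distributed-DB backend would implement the same two calls against different storage):

```python
import os

class PosixFSBackend:
    """One invented implementation of the 'key -> binary byte stream'
    FS API, backed by a local directory."""
    def __init__(self, root):
        self.root = root

    def write(self, key, data):
        with open(os.path.join(self.root, key), "wb") as f:
            f.write(data)

    def read(self, key):
        with open(os.path.join(self.root, key), "rb") as f:
            return f.read()
```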

 Seems like it would be useful for:
 
 - implementing very specific optimisations for example for NFS
 - optimisation for filesystems with unusual strengths/weaknesses, eg GFS 
 or Gluster?

In both of these I think the primary problem is that Dovecot tries to do
IPC via filesystem (index files). So accessing the indexes via the
in-memory cache that is guaranteed to be always up-to-date would get rid
of all these ugly NFS cache flushing attempts etc.




Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Ed W

Timo Sirainen wrote:

On Mon, 2009-09-28 at 19:21 +0100, Ed W wrote:
  

In my mind this is more about what lib-storage API was supposed to
abstract out, whereas my filesystem API would be used simply for binary
data storage. The same FS API could be used to store both dbox files and
index files.
  
  
I guess in this case it would be interesting to hear the kind of use 
cases you imagine that the storage API will be used for in practice?  I 
think I might be kind of overthinking the problem?



lib-storage API has existed since Dovecot v1.0 and it's used to abstract
out access to maildir, mbox, dbox, cydir, etc. SQL would fit right there
with those.
  


OK, I thought that was what you were going to be simplifying...

I did have a poke around in there some time back and it did feel quite 
complicated to follow what was going on... I found your sql backend 
code a simpler way to poke around, but even there it pretty 
quickly needed some earnest digging to figure out how it was all 
working...


OK, I guess this can never be an easy middle ground - presumably things 
are as they are for a reason...


Cheers

Ed W


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Timo Sirainen
On Mon, 2009-09-28 at 20:11 +0100, Ed W wrote:
  lib-storage API has existed since Dovecot v1.0 and it's used to abstract
  out access to maildir, mbox, dbox, cydir, etc. SQL would fit right there
  with those.

 
 OK, I thought that was what you were going to be simplifying...

Nope. It can still be simplified a bit, but only a bit. :) But in every
release I am always simplifying it, moving more and more code to common
functions and making the API more powerful and cleaner at the same
time. :)

 I did have a poke around in there some time back and it did feel quite 
 complicated to follow what was going on... I found your sql backend 
 code as a simpler way to poke around, but even there it was pretty 
 quickly going to need some earnest digging to figure out how it was all 
 working...

The SQL code was for v1.0 and the lib-storage API has been simplified
since then, maybe not hugely, but still quite a bit. Maybe some day I'll
see about updating the SQL code for the v2.0 API.

Oh and some documentation about it would probably help a lot too. I
guess I should write some, someday. :)




Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Charles Marcus
On 9/28/2009 4:24 PM, Jeff Grossman wrote:
 In fact there are lots of IMAP improvements in v3... I can't wait until
 all my extensions catch up, and I figure out how to customize the UI the
 way I want (e.g., how in the world do I get rid of the stupid Tabs??)

 You can't get rid of tabs per se, but you can make it so you don't use
 them.  I hate tabs personally also.  Go to Options, Advanced, Reading
 and Display, and select Open Messages In: An Existing Window or A New
 Window.  I use an existing window.

Yeah, already did that, but it *does* still use the Tab bar, everything
is just limited to one tab - the Tab row is still there wasting my
screen real estate.

I'll figure out how to kill it... I know I'm not the only one who
hates/won't use it...

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Ed W

Timo Sirainen wrote:


The SQL code was for v1.0 and the lib-storage API has been simplified
since then, maybe not hugely but still pretty much. Maybe some day I'll
see about updating the SQL code for v2.0 API.

Oh and some documentation about it would probably help a lot too. I
guess I should write some, someday. :)
  



Some overview docs might be somewhat helpful for sure, but I think at 
this level you probably mainly need to get your hands dirty


Having an example storage engine which is also a bit simpler (eg an 
updated sql engine) would actually be quite good for this, I suspect.  I 
quickly dropped looking at the real code in favour of playing with the 
sql code, and found it quite a bit simpler to get an overview


Thanks and interested to see this progress

Ed W


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-15 Thread Timo Sirainen
On Wed, 2009-08-12 at 18:18 -0400, Timo Sirainen wrote:
 On Wed, 2009-08-12 at 14:54 -0700, Daniel L. Miller wrote:
  If every attachment in a given message is individually scanned to 
  generate some unique identifier, and that identifier then used to 
  determine whether or not it exists in the database - this could have 
  HUGE effects.  This now addresses not just the simple broadcast - but 
  some really crazy possibilities.
 
 Oh BTW. I think dbmail 2.3 does that. Then again I haven't yet seen a
 stable dbmail version. But looks like they've released 2.3.6 recently
 that I haven't tested yet.

Tested again. Still crashes in the middle of imaptest runs. And imaptest
now reports more bugs than last time I tried..

Archiveopteryx probably does SIS and works better.





Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-14 Thread Steffen Kaiser


On Wed, 12 Aug 2009, Timo Sirainen wrote:


So yeah, either one is as reliable as the script :)


Well, I think this is not what the OP intended :)

This will work for only a small number of mails, I think, because:

a) forwarded messages differ,
b) re-sent messages differ in headers,
c) many mailing lists send one mail per subscriber to catch user-specific 
bounces (headers differ),
d) some mail relays or MTAs split the recipients list if it is too large 
(headers differ).


Although I would like to have it in Dovecot, it certainly makes some 
administration tasks on the server more difficult, so I'm not sure if I 
would actually use it...


Bye,

-- 
Steffen Kaiser


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-14 Thread Charles Marcus
 But for example, what I'd really like to be able to do is say something
 like:
 
 SiS_mode = binary,64K
 
So only binary attachments over 64KB in size would be checksummed and
single instance stored. I don't care about the headers, or body text, or
tiny (< 64KB) sig attachments, or text attachments (PGP sigs, etc).

Also, I don't care about putting them in an SQL db...

It would be good enough for me to also be able to do:

SiS_dir = /var/virtual/mail/attachments

Have all attachments dumped in there and hardlinked to each message, and
just use a simple index file in the directory with the attachment name
and MD5 checksum (if MD5 is good enough - I'd like to avoid collisions too).

This way the attachments could even be stored in some other filesystem,
to keep the big stuff off the main server.
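
The layout proposed here (a shared attachment directory, hard links into each message, and a checksum index) can be sketched in a few lines. This is purely illustrative Python, not Dovecot code; the function name and directory layout are assumptions:

```python
import hashlib
import os

def store_attachment(sis_dir, msg_dir, name, data):
    """Content-addressed single-instance store: write each attachment once
    under its checksum, then hard-link it next to the message that uses it."""
    digest = hashlib.md5(data).hexdigest()  # MD5 per the proposal; SHA-256 would reduce collision worries
    blob = os.path.join(sis_dir, digest)
    if not os.path.exists(blob):            # only the first copy consumes disk space
        with open(blob, "wb") as f:
            f.write(data)
    link = os.path.join(msg_dir, name)
    if not os.path.exists(link):
        os.link(blob, link)                 # hard link: same inode, no extra data blocks
    return digest
```

One caveat: hard links require the attachment store and the mailboxes to live on the same filesystem, so the "some other filesystem" variant would need symlinks or an indirection in the index instead.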

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-13 Thread Charles Marcus
On 8/12/2009 3:58 PM, Timo Sirainen wrote:
 - and I've never heard of any other mail server supporting such a
 thing.

 Exchange does... and is the single and only reason I have *considered*
 switching to it (shudder) in the past few years...

 I heard that the next Exchange version drops that feature, because it
 typically causes more disk I/O when reading. I don't know if it's still
 possible to enable it optionally though.

Wow... I can hear a lot of sysadmins screaming at the top of their lungs
if/when they discover this the hard way.

I'm also having trouble figuring out how using hard links (or their
equivalent) for messages with large attachments and having only one
instance of the attachment could cause *more* disk I/O than having
dozens/hundred of multiple copies of the message.

Guess its an Exchange 'feature'... ;)

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-13 Thread Charles Marcus
On 8/12/2009 4:02 PM, Timo Sirainen wrote:
 Do you need per-MIME part single instance storage, or would per-email be
 enough? Since the per-email variant can already be done with hard links.

 The only thing I can find about this on the wiki is where it says single
 instance attachment storage (for dbox) is planned. Is how to accomplish
 single instance email storage documented anywhere? And is this reliable
 enough to use on a production system?

 Two possible ways:

Heh... ok, so when you said 'it is possible', you didn't mean dovecot
has native support for it...

Sadly, since ianap, I will have to wait for something that is officially
supported... but thanks for the explanation.

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-13 Thread Charles Marcus
On 8/12/2009, Ed W (li...@wildgooses.com) wrote:
 It also had a bunch of limitations, it was basically only single
 instance for CC recipients on a message (more or less).  Quite a lot
 of things such as certain types of virus scanning would (I think)
 easily disable the single instance storage also?
 
 So I doubt it would help in most of the cases mentioned here, ie each
 time it was re-forwarded internally it would not be single instanced
 
 I still think it would be instructive to do some benchmarks though -
 often these things look good on paper, but are surprisingly less
 effective (given the implementation cost)  when measured.  I'm not
 disagreeing, just would be interested to see some numbers...

Amazing... I mean, since Exchange is already a 'database', how hard
would it be to do it right (checksum each MIME part, and use hardlinks
for subsequent duplicate checksummed MIME parts)? As long as everything
was properly and effectively indexed, it should be easily doable.

Make it do the work at delivery when the load is light enough, and have
a background task that, when load permits, de-dupes any messages not
already flagged as processed at delivery time.

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-13 Thread Charles Marcus
On 8/12/2009, Daniel L. Miller (dmil...@amfes.com) wrote:
 Under the structure I've proposed, net storage consumed by the
 attachments should be one copy of attachment 1, and one copy of
 attachment two, plus headers and any comments in the messages times
 the number of recipients.  Domino would store one copy of attachment
 1, then a copy of attachment 1 + attachment 2, then another copy of
 attachment 1.

Personally, I only care about binary attachments over a certain size.

I have said before, I don't see the value in doing this for every
message and for every mime-part. That said, if it doesn't really cost
anything extra to do the entire message and all mime-parts, then fine, I
don't really have anything against it, as long as it is robust and reliable.

But for example, what I'd really like to be able to do is say something
like:

SiS_mode = binary,64K

So only binary attachments over 64KB in size would be checksummed and
single instance stored. I don't care about the headers, or body text, or
tiny (< 64KB) sig attachments, or text attachments (PGP sigs, etc).

Again - for shops that must deal with large binary attachments, this
would be a god-send.

Our max allowed message size is 50MB, and we typically get anywhere from
2-10 messages a day containing 20, 30, or even 40MB attachments sent to
our distribution lists - so these would go to 50+ people, who then
forward them to others, etc, etc ad nauseum.

Currently, I have mailman set to hold these, then I go in and strip off
the attachment, put it in a shared location, then let the message (minus
the attachment) through. But we still have a *lot* of messages like this
that don't go through our lists, but are sent to 2, 3, or 10 of our reps
individually.

I did a manual approximation on one person's mail store once, and
determined that our total storage requirements, if SiS was implemented
for large attachments, would be reduced by about 90-95%. So, from about
2TB currently, to about 100-200GB. That is HUGE, from both a storage
*and* backup standpoint.

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-13 Thread Timo Sirainen

On Aug 13, 2009, at 8:13 AM, Charles Marcus wrote:


I'm also having trouble figuring out how using hard links (or their
equivalent) for messages with large attachments and having only one
instance of the attachment could cause *more* disk I/O than having
dozens/hundred of multiple copies of the message.


The thinking is that nowadays seeks are what's killing disk I/O, so  
whenever possible just do a single large read. With single instance  
storage there would be one additional seek (if the message wasn't  
already in memory).




Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Steffen Kaiser


On Tue, 11 Aug 2009, Eric Jon Rostetter wrote:


For a massively scaled system, there may be sufficient performance to
put the queues elsewhere.


Which also allows that the queue can easily have multiple machines pushing 
& popping items.



 But on a small system, with 90% of the mail
being spam/virus/malware, performance will usually dictate local/memory
file systems for such queues...


Well, this discussion reads a bit like local filesystems are prone to 
lose data on a crash.

Journaling filesystems, RAID1 / 5 / 10, SANs do their job.

However, I guess that Seth and Timo look at the thing from a different 
point of view, Timo seems to focus on one queue - multiple accessees, 
whereas Seth focuses on temporary working directory.


Bye,

-- 
Steffen Kaiser


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen

On Aug 12, 2009, at 11:26 AM, Ed W wrote:


Hi


* Mail data on the other hand is just written once and usually read
maybe once or a couple of times. Caching mail data in memory probably
doesn't help all that much. Latency isn't such a horrible issue as long
as multiple mails can be fetched at once / in parallel, so there's only
a single latency wait.



This logically seems correct.  Couple of questions then:

1) Since latency requirements are low, why did performance drop so  
much previously when you implemented a very simple mysql storage  
backend?  I glanced at the code a few weeks ago and whilst it's  
surprisingly complicated right now to implement a backend, I was  
also surprised that a database storage engine "sucked", I think you  
phrased it? Possibly the code also placed the indexes on the DB?  
Certainly this could very well kill performance?  (Note I'm not  
arguing sql storage is a good thing, I just want to understand the  
latency to backend requirements)


Yes, it placed indexes also to SQL. That's slow. But even without it,  
Dovecot code needs to be changed to access more mails in parallel  
before the performance can be good for high-latency mail storages.


2) I would be thinking that with some care, even very high latency  
storage would be workable, eg S3/Gluster/MogileFs ?  I would love to  
see a backend using S3 - If nothing else I think it would quickly  
highlight all the bottlenecks in any design...


Yes, S3 should be possible. With dbox it could even be used to store  
the old mails and keep new mails in lower latency storage.



5. Implement filesystem backend for dbox and permanent index storage
using some scalable distributed database, such as maybe Cassandra.


CouchDB?  It is just the Lotus Notes database after all, and  
personally I have built some *amazing* applications using that as  
the backend. (I just love the concept of Notes - the gui is another  
matter...)


Note that CouchDB is interesting in that it is multi-master with  
eventual synchronisation.  This potentially has some interesting  
issues/benefits for offline use


CouchDB seems like it would still be more difficult than necessary to  
scale. I'd really just want something that distributes the load and  
disk usage evenly across all servers and allows easily plugging in  
more servers and it automatically rebalances the load. CouchDB seems  
like much of that would have to be done manually (or building scripts  
to do it).


For the filesystem backend have you looked at the various log  
structured filesystems appearing?  Whenever I watch the debate  
between Maildir vs Mailbox I always think that a hybrid is the best  
solution because you are optimising for a write once, read many  
situation, where you have an increased probability of having good  
cache localisation on any given read.


To me this ends up looking like log structured storage... (which  
feels like a hybrid between maildir/mailbox)


Hmm. I don't really see how it looks like log structured storage.. But  
you do know that multi-dbox is kind of a maildir/mbox hybrid, right?



* Scalability, of course. It'll be as scalable as the distributed
database being used to store mails.



I would be very interested to see a kind of "where the time goes"  
benchmark of dovecot.  Have you measured and found that latency of  
this part accounts for x% of the response time and CPU bound here is  
another y%, etc?  eg if you deliberately introduce X ms of latency  
in the index lookups, what does that do to the response time of the  
system once the cache warms up?  What about if the response time to  
the storage backend changes?  I would have thought this would help  
you determine how to scale this thing?


I haven't really done any explicit benchmarks, but there are a few  
reasons why I think low-latency for indexes is really important:


 * All commands that access mails in any way need to do an index  
lookup first to find the mail.


 * Anything using IMAP UIDs needs to do a binary search on the index  
to find the mail.


 * Anything accessing mail metadata needs to do dovecot.index.cache  
lookups, often many of them. For example FETCH ENVELOPE does something  
like 10 lookups to cache for each mail.


 * After each command Dovecot needs to check if there are new mails  
by checking if dovecot.index.log has changed.
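
The UID lookup in the second point is an ordinary binary search over the index's sorted UID column; a minimal sketch (assuming a plain in-memory list, not Dovecot's actual index format):

```python
from bisect import bisect_left

def seq_for_uid(uids, uid):
    """Map an IMAP UID to a 1-based message sequence number.
    `uids` is the strictly ascending UID column of a mailbox index."""
    i = bisect_left(uids, uid)           # O(log n) binary search
    if i < len(uids) and uids[i] == uid:
        return i + 1                     # IMAP sequence numbers start at 1
    return None                          # UID expunged or never existed
```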


I think it's pretty obvious that if any of those lookups had latency  
the performance would soon become pretty horrible. And the reasons why  
I think the actual mail storage can live with high latency:


 * Whenever processing a command, Dovecot knows beforehand what kind  
of data it needs. It can quickly go through index/cache file to find  
out what message contents it needs to have, and then send requests to  
all of those immediately. (Or if there are hundreds, maybe always have  
something like 20 queued, or whatever is good.) After the first one  
has arrived, the rest should already be available immediately 
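
The "maybe always have something like 20 queued" idea amounts to a bounded prefetch window: issue requests ahead of consumption so only the first mail pays a full round-trip. A sketch using threads (hypothetical helper; an async I/O backend would achieve the same without threads):

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def fetch_all(fetch, mail_ids, window=20):
    """Fetch mails from a high-latency store, keeping up to `window`
    requests in flight while preserving the original order."""
    results = []
    with ThreadPoolExecutor(max_workers=window) as pool:
        pending = deque()
        for mid in mail_ids:
            pending.append(pool.submit(fetch, mid))
            if len(pending) >= window:          # window full: wait for the oldest
                results.append(pending.popleft().result())
        while pending:                          # drain the remaining requests
            results.append(pending.popleft().result())
    return results
```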

Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Eric Rostetter
On Aug 12, 2009, at 2:21 AM, Steffen Kaiser skdove...@smail.inf.fh-brs.de 
 wrote:




On Tue, 11 Aug 2009, Eric Jon Rostetter wrote:


For a massively scaled system, there may be sufficient performance to
put the queues elsewhere.


Which also allows that the queue can easily have multiple machines  
pushing & popping items.


Pushing is easy. Popping can be more problematic, depending on various  
factors.



But on a small system, with 90% of the mail
being spam/virus/malware, performance will usually dictate local/memory
file systems for such queues...


Well, this discussion reads a bit like local filesystems are prone  
to lose data on a crash.

Journaling filesystems, RAID1 / 5 / 10, SANs do their job.


The issue I brought up is OS caching and is not dependent on the  
backend really. The only real solution is redundant storage AND disabling  
OS caching, which is not cheap and won't give the best performance.   
Always a tradeoff.



However, I guess that Seth and Timo look at the thing from a  
different point of view, Timo seems to focus on one queue -  
multiple accessees, whereas Seth focuses on temporary working  
directory.


Well Timo looks at it from dovecot's point of view.

I look at it from a mail server's point of view (MTA also, etc).


Bye,

-- Steffen Kaiser


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Ed W



1) Since latency requirements are low, why did performance drop so 
much previously when you implemented a very simple mysql storage 
backend?  I glanced at the code a few weeks ago and whilst it's 
surprisingly complicated right now to implement a backend, I was also 
surprised that a database storage engine sucked I think you phrased 
it? Possibly the code also placed the indexes on the DB? Certainly 
this could very well kill performance?  (Note I'm not arguing sql 
storage is a good thing, I just want to understand the latency to 
backend requirements)


Yes, it placed indexes also to SQL. That's slow. But even without it, 
Dovecot code needs to be changed to access more mails in parallel 
before the performance can be good for high-latency mail storages.


My expectation then is that with local index and sql message storage 
that the performance should be very reasonable for a large class of 
users... (ok, other problems perhaps arise)



2) I would be thinking that with some care, even very high latency 
storage would be workable, eg S3/Gluster/MogileFs ?  I would love to 
see a backend using S3 - If nothing else I think it would quickly 
highlight all the bottlenecks in any design...


Yes, S3 should be possible. With dbox it could even be used to store 
the old mails and keep new mails in lower latency storage.


Mogile doesn't handle S3, but I always thought it would be terrific to 
be able to have one copy of your data on fast local storage, but to be 
able to use slower (sometimes cheaper) storage for backups or less 
valuable data (eg older messages), ie replicating some data to other 
storage boxes



CouchDB seems like it would still be more difficult than necessary to 
scale. I'd really just want something that distributes the load and 
disk usage evenly across all servers and allows easily plugging in 
more servers and it automatically rebalances the load. CouchDB seems 
like much of that would have to be done manually (or building scripts 
to do it).


Ahh fair enough - I thought it being massively multi-master would allow 
simply querying different machines for different users.  Not a perfect 
scale-out, but good enough for a whole class of requirements...


For the filesystem backend have you looked at the various log 
structured filesystems appearing?  Whenever I watch the debate 
between Maildir vs Mailbox I always think that a hybrid is the best 
solution because you are optimising for a write once, read many 
situation, where you have an increased probability of having good 
cache localisation on any given read.


To me this ends up looking like log structured storage... (which 
feels like a hybrid between maildir/mailbox)


Hmm. I don't really see how it looks like log structured storage.. But 
you do know that multi-dbox is kind of a maildir/mbox hybrid, right?


Well the access is largely append only, with some deletes and noise at 
the writing end, but largely the older storage stays static with much 
longer gaps between deletes (and extremely infrequent edits)


So maildir is optimised really for deletes, but improves random access 
for a subset of operations.  Mailbox is optimised for writes and seems 
like it's generally fast for most operations except deletes (people do 
worry about having a lot of eggs in one basket, but I think this is 
really a symptom of other problems at work).  Mailbox also has improved 
packing for small messages and probably has improved cache locality on 
certain read patterns


So one obvious hybrid would be a mailbox type structure which perhaps 
splits messages up into variable sized sub mailboxes based on various 
criteria, perhaps including message age, type of message or message 
size...?  The rapid write/delete activity would happen at the head, perhaps even 
as a maildir layout and gradually the storage would become larger and 
ever more compressed mailboxes as the age/frequency of access/etc declines.


Perhaps this is exactly dbox?

It would also be interesting to consider separating message headers from 
body content.  Have heavy localisation of message headers, and slower 
higher latency access to the message body.  Would this improve access 
speeds in general?  Also the mime structure could be torn apart to store 
attachments individually - the motivation being single instance storage 
of large attachments with identical content...  Anyway, these seem like 
very speculative directions...




I haven't really done any explicit benchmarks, but there are a few 
reasons why I think low-latency for indexes is really important:


I think low latency for indexes is a given.  You appear to have 
architected the system so that all responses are delivered from the 
index and baring an increase in index efficiency the remaining time is 
spent doing the initial generation and maintenance of those indexes.  I 
would have thought bar downloading an entire INBOX that the access time 
of individual mails was very much secondary?



- If the goal is performance 

Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen
On Wed, 2009-08-12 at 17:46 +0100, Ed W wrote:
 My expectation then is that with local index and sql message storage 
 that the performance should be very reasonable for a large class of 
 users... (ok, other problems perhaps arise)

If messages are stored to SQL in dummy blobs then the performance is
probably comparable to any other database I'm thinking about.

  Yes, S3 should be possible. With dbox it could even be used to store 
  the old mails and keep new mails in lower latency storage.
 
 Mogile doesn't handle S3, but I always thought it would be terrific to 
 be able to have one copy of your data on fast local storage, but to be 
 able to use slower (sometimes cheaper) storage for backups or less 
 valuable data (eg older messages), ie replicating some data to other 
 storage boxes

dsync can do the replication, dbox can have primary/secondary partitions
for message data (if mail is not found from primary, it's looked up from
secondary). All that's needed is lib-storage backend for S3, or using
some filesystem layer to it. :)
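
The primary/secondary arrangement reduces to a two-tier lookup; a minimal sketch with dict-like objects standing in for the fast partition and the slow backend (e.g. S3):

```python
def read_mail(primary, secondary, mail_id):
    """Try the low-latency primary partition first; fall back to the
    slower/cheaper secondary storage where old mails were migrated."""
    try:
        return primary[mail_id]
    except KeyError:
        return secondary[mail_id]
```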

  CouchDB seems like it would still be more difficult than necessary to 
  scale. I'd really just want something that distributes the load and 
  disk usage evenly across all servers and allows easily plugging in 
  more servers and it automatically rebalances the load. CouchDB seems 
  like much of that would have to be done manually (or building scripts 
  to do it).
 
 Ahh fair enough - I thought it being massively multi-master would allow 
 simply querying different machines for different users.  Not a perfect 
 scale-out, but good enough for a whole class of requirements...

If users' all mails are stuck on a particular cluster of servers, it's
possible that suddenly several users in those servers starts increasing
their disk load or disk usage and starts killing the performance /
available space for others. If a user's mails were spread across 100
servers, this would be much less likely.
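
Spreading one user's mails across many servers instead of pinning the user to one cluster amounts to hashing per message rather than per user; a sketch (the function and hashing scheme are assumptions, not an existing Dovecot mechanism):

```python
import hashlib

def server_for(mail_guid, servers):
    """Pick a storage server per mail GUID, so a single user's mailbox
    is spread evenly over all servers rather than one cluster."""
    h = int(hashlib.sha1(mail_guid.encode()).hexdigest(), 16)
    return servers[h % len(servers)]
```

A real deployment would likely want consistent hashing on top of this, so that adding a server only remaps about 1/n of the mails.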

  Hmm. I don't really see how it looks like log structured storage.. But 
  you do know that multi-dbox is kind of a maildir/mbox hybrid, right?
 
 Well the access is largely append only, with some deletes and noise at 
 the writing end, but largely the older storage stays static with much 
 longer gaps between deletes (and extremely infrequent edits)

Ah, right. I guess if you think about it from a single user's mails
point of view.

 So maildir is optimised really for deletes, but improves random access 
 for a subset of operations.  Mailbox is optimised for writes and seems 
 like it's generally fast for most operations except deletes (people do 
 worry about having a lot of eggs in one basket, but I think this is 
 really a symptom of other problems at work).  Mailbox also has improved 
 packing for small messages and probably has improved cache locality on 
 certain read patterns

Yes, this is why I'm also using mbox on dovecot.org for mailing list
archives.

 So one obvious hybrid would be a mailbox type structure which perhaps 
 splits messages up into variable sized sub mailboxes based on various 
 criteria, perhaps including message age, type of message or message 
 size...?  The rapid write/delete activity would happen at the head, perhaps even 
 as a maildir layout and gradually the storage would become larger and 
 ever more compressed mailboxes as the age/frequency of access/etc declines.
 
 Perhaps this is exactly dbox?

Something like that. In dbox you have one storage directory containing
all mailboxes' mails (so that copying can be done by simple index
updates). Then you have a bunch of files, each about n MB (configurable,
2 MB by default). Expunging initially only marks the message as expunged
in index. Then later (or immediately, configurable) you run a cronjob
that goes through all dboxes and actually removes the used space by
recreating those dbox files.
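
The expunge-then-purge cycle described here can be sketched as a rewrite that drops messages marked expunged in the index (an in-memory stand-in; real dbox rewrites its ~2 MB storage files on disk):

```python
def purge(records, expunged):
    """Nightly purge: rebuild a dbox-style storage file keeping only
    messages whose GUIDs are not marked expunged in the index.
    `records` is a list of (guid, body) pairs in file order."""
    # Space is reclaimed here, at purge time, not when the expunge
    # was first recorded in the index.
    return [(guid, body) for guid, body in records if guid not in expunged]
```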

 It would also be interesting to consider separating message headers from 
 body content.  Have heavy localisation of message headers, and slower 
 higher latency access to the message body.  Would this improve access 
 speeds in general?  

Probably not much. Usually I think clients download a specific set of
headers, and those can be looked up from dovecot.index.cache file.
Although if a new header is looked up from all messages that's not in
cache already, it would be faster to go through headers if they were
packed together separately. But then again that would make it maybe a
bit slower to download full message, since it's split to two places.

I don't really know, but my feeling is that it wouldn't benefit all that
much.

 Also the mime structure could be torn apart to store 
 attachments individually - the motivation being single instance storage 
 of large attachments with identical content...  Anyway, these seem like 
 very speculative directions...

Yes, this is also something in dbox's far future plans.

  I haven't really done any explicit benchmarks, but there are a few 
  reasons why I think low-latency for 

Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Ed W


CouchDB seems like it would still be more difficult than necessary to 
scale. I'd really just want something that distributes the load and 
disk usage evenly across all servers and allows easily plugging in 
more servers and it automatically rebalances the load. CouchDB seems 
like much of that would have to be done manually (or building scripts 
to do it).
  
Ahh fair enough - I thought it being massively multi-master would allow 
simply querying different machines for different users.  Not a perfect 
scale-out, but good enough for a whole class of requirements...



If users' all mails are stuck on a particular cluster of servers, it's
possible that suddenly several users in those servers starts increasing
their disk load or disk usage and starts killing the performance /
available space for others. If a user's mails were spread across 100
servers, this would be much less likely.
  


Sure - I'm not a couchdb expert, but I think the point is that we would 
need to check the replication options because you would simply balance 
the requests across all the servers holding those users' data.  I'm kind 
of assuming that data would be replicated across more than one server 
and there would be some way of choosing which server to use for a given user


I only know couchdb to the extent of having glanced at the website some 
time back, but I liked the way it looks and thinks like Lotus Notes (I 
did love building things using that tool about 15 years ago - the 
replication was just years ahead of its time.  The robustness was 
extraordinary and I remember when the IRA blew up a chunk of Manchester 
(including one of our servers) that everyone just went home and started 
using the Edinburgh or London office servers and carried on as though 
nothing happened...)


Actually its materialised views are rather clever also...

  
Hmm. I don't really see how it looks like log structured storage.. But 
you do know that multi-dbox is kind of a maildir/mbox hybrid, right?
  
Well the access is largely append only, with some deletes and noise at 
the writing end, but largely the older storage stays static with much 
longer gaps between deletes (and extremely infrequent edits)



Ah, right. I guess if you think about it from a single user's mails
point of view.
  


Well, single folder really


So maildir is optimised really for deletes, but improves random access 
for a subset of operations.  Mailbox is optimised for writes and seems 
like it's generally fast for most operations except deletes (people do 
worry about having a lot of eggs in one basket, but I think this is 
really a symptom of other problems at work).  Mailbox also has improved 
packing for small messages and probably has improved cache locality on 
certain read patterns



Yes, this is why I'm also using mbox on dovecot.org for mailing list
archives.
  


Actually I use maildir, but apart from delete performance (and deletes 
are usually rare), mailbox seems better for nearly all use patterns


Seems like if it were possible to solve delete performance then 
mailbox becomes the preferred choice for many requirements (also let's 
solve the backup problem where the whole file changes every day)



So one obvious hybrid would be a mailbox type structure which perhaps 
splits messages up into variable sized sub mailboxes based on various 
criteria, perhaps including message age, type of message or message 
size...?  The rapid write/delete activity would happen at the head, perhaps even 
as a maildir layout and gradually the storage would become larger and 
ever more compressed mailboxes as the age/frequency of access/etc declines.


Perhaps this is exactly dbox?



Something like that. In dbox you have one storage directory containing
all mailboxes' mails (so that copying can be done by simple index
updates). Then you have a bunch of files, each about n MB (configurable,
2 MB by default). Expunging initially only marks the message as expunged
in index. Then later (or immediately, configurable) you run a cronjob
that goes through all dboxes and actually removes the used space by
recreating those dbox files.
  


Yeah, sounds good.

You might consider some kind of head optimisation, where we can 
already assume that the latest chunk of mails will be noisy, with a 
mixture of deletes/appends, etc.  Typically mail arrives, gets responded 
to, and gets deleted quickly, but I would *guess* that if a mail survives 
for XX hours in a mailbox then likely it's going to continue to stay 
there for quite a long time, until some kind of purge event happens (user 
goes on a purge, archive task, etc).



Sounds good anyway


Oh, have you considered some optional api calls in the storage API?  
The logic might be to assume that someone wanted to do something clever 
and split the message up in some way, eg store headers separately to 
bodies or bodies carved up into mime parts.  The motivation would be if 
there was a certain access pattern to optimise.  Eg for an SQL database 
it may well be sensible to split headers and the message body in order 
to optimise searching - the current API may not take advantage of that?

Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen
On Wed, 2009-08-12 at 18:42 +0100, Ed W wrote:
  Something like that. In dbox you have one storage directory containing
  all mailboxes' mails (so that copying can be done by simple index
  updates). Then you have a bunch of files, each about n MB (configurable,
  2 MB by default). Expunging initially only marks the message as expunged
  in index. Then later (or immediately, configurable) you run a cronjob
  that goes through all dboxes and actually removes the used space by
  recreating those dbox files.

 
 Yeah, sounds good.
 
 You might consider some kind of head optimisation, where we can
 already assume that the latest chunk of mails will be noisy and have a
 mixture of deletes/appends, etc.  Typically mail arrives, gets
 responded to, gets deleted quickly, but I would *guess* that if a mail
 survives for XX hours in a mailbox then likely it's going to continue
 to stay there for quite a long time until some kind of purge event
 happens (user goes on a purge, archive task, etc)

If disk space usage isn't such a huge problem, I think the nightly
purges solve this issue too. During the day user may get mails and
delete them, and at night the deleted mails are purged. Perhaps it could
help a bit if new mails were all stored in separate file(s) and at night
then appended to some larger existing file, but that optimization can be
left until later. :)
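That purge step (recreating the storage files without the expunged mails) can be sketched with a toy record format: a one-byte expunged flag plus a length-prefixed body. Real dbox files carry far more metadata, and these function names are hypothetical, but the mechanics are the same: expunge flips a flag in place, and a later pass rewrites the file to reclaim the space.

```python
import os
import struct

HDR = struct.Struct("<BI")  # 1-byte expunged flag + 4-byte body length

def append_message(path, body):
    # New mails are simply appended to the end of the storage file.
    with open(path, "ab") as f:
        f.write(HDR.pack(0, len(body)) + body)

def read_records(path):
    # Yield (expunged, body) for every record in the file.
    with open(path, "rb") as f:
        while True:
            hdr = f.read(HDR.size)
            if not hdr:
                break
            expunged, length = HDR.unpack(hdr)
            yield expunged, f.read(length)

def mark_expunged(path, target):
    # Expunging only flips a flag in place -- no data is moved yet.
    with open(path, "r+b") as f:
        index = 0
        while True:
            pos = f.tell()
            hdr = f.read(HDR.size)
            if not hdr:
                return
            _, length = HDR.unpack(hdr)
            if index == target:
                f.seek(pos)
                f.write(b"\x01")
                return
            f.seek(length, 1)  # skip over the body to the next record
            index += 1

def purge(path):
    # The cronjob step: rewrite the file without expunged records,
    # then atomically swap it in, reclaiming the used space.
    tmp = path + ".tmp"
    with open(tmp, "wb") as out:
        for expunged, body in read_records(path):
            if not expunged:
                out.write(HDR.pack(0, len(body)) + body)
    os.replace(tmp, path)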

 Oh, have you considered some optional api calls in the storage API?
 The logic might be to assume that someone wanted to do something
 clever and split the message up in some way, eg store headers
 separately to bodies or bodies carved up into mime parts.  The
 motivation would be if there was a certain access pattern to optimise.
 Eg for an SQL database it may well be sensible to split headers and
 the message body in order to optimise searching - the current API may
 not take advantage of that?  

Well, files have paths. I think the storage backend can determine from
that what type the data is. So if you're writing to mails/foo/bar/123 it
means you're storing a message with ID 123 to mailbox foo/bar. It
could then internally parse the message and store its header/body/mime
separately.
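As a sketch of that path convention (the layout is just the example above, not an actual Dovecot API):

```python
def classify_path(path):
    """Map a storage path like 'mails/foo/bar/123' to (kind, mailbox, msg_id).

    Assumes the layout from the example: everything under 'mails/' is a
    message, the last component is the message ID, and the components in
    between name the mailbox.
    """
    parts = path.split("/")
    if parts[0] == "mails" and len(parts) >= 3:
        return ("message", "/".join(parts[1:-1]), parts[-1])
    return ("other", None, None)
```

A backend could branch on the returned kind, e.g. parsing and splitting header/body/MIME only for "message" writes.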


signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Ed W



Oh, have you considered some optional api calls in the storage API?
The logic might be to assume that someone wanted to do something
clever and split the message up in some way, eg store headers
separately to bodies or bodies carved up into mime parts.  The
motivation would be if there was a certain access pattern to optimise.
Eg for an SQL database it may well be sensible to split headers and
the message body in order to optimise searching - the current API may
not take advantage of that?  



Well, files have paths. I think the storage backend can determine from
that what type the data is. So if you're writing to mails/foo/bar/123 it
means you're storing a message with ID 123 to mailbox foo/bar. It
could then internally parse the message and store its header/body/mime
separately.
  



But would the storage be used optimally if there was a requirement to 
read in all headers from all emails, say in order to build the cache of 
messages on Subject?  And what about a backend which has some sort of 
search capability that we could usefully leverage?  It's worth 
considering anyway, because this looks like a design to move the main 
storage away from the IMAP server side and scale out (massively), so 
network capacity might be worth planning for as a limited resource?


Does it make sense to push some of the understanding of message 
structure down to the storage backend?  Perhaps it could be made optional 
in some way, with a more brute-force option available on the Dovecot side - 
ie like FUSE, implement what you need and no more?


Ed W


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Daniel L. Miller

Timo Sirainen wrote:
Also the mime structure could be torn apart to store 
attachments individually - the motivation being single instance storage 
of large attachments with identical content...  Anyway, these seem like 
very speculative directions...



Yes, this is also something in dbox's far future plans.
  
Speaking as a pathetic little admin of a small site of 20 users, my 
needs for replication & scalability are quite minor.  However, 
single-instance storage would be a miracle of biblical proportions.  Has 
any progress been made on this?  Do you have a roadmap for how you plan 
on implementing it?


I don't know if you've considered this at all - this was my first thought:

If you're able to store a message with the attachments separately, then 
you can come up with an attachment database (not meaning to imply SQL 
backend).  Then after breaking the message up into message + 
attachments, you scan the attachment database to see if it is already 
present prior to saving it.  This could mean that not only could we save 
on the huge space wasted by idiots merrily forwarding large attachments 
to multiple people, but even received mails with embedded graphical 
signatures would benefit.

--
Daniel


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen
On Wed, 2009-08-12 at 19:19 +0100, Ed W wrote:
 I actually thought your idea of having a bunch of cut down IMAP type 
 servers as the backend storage talking to a bunch of beefier frontend 
 servers was quite an interesting idea!
 
 Certainly though a simplification of the on-disk API would encourage new 
 storage engines, so perhaps a three tier infrastructure is worth 
 considering? (Frontend, intelligent backend, storage)

I guess this is something similar to what I wrote in my v3.0
architecture mail. This new FS abstraction solves some of those
problems that v3.0 was supposed to solve, so I'm not that excited about
it anymore. But sure, maybe some day. :) For now I'm anyway more
interested about getting a simple FS abstraction done.



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen
On Wed, 2009-08-12 at 11:35 -0700, Daniel L. Miller wrote:
 Timo Sirainen wrote:
  Also the mime structure could be torn apart to store 
  attachments individually - the motivation being single instance storage 
  of large attachments with identical content...  Anyway, these seem like 
  very speculative directions...
  
 
  Yes, this is also something in dbox's far future plans.

 Speaking as a pathetic little admin of a small site of 20 users, my 
 needs for replication & scalability are quite minor.  However, 
 single-instance storage would be a miracle of biblical proportions.  Has 
 any progress been made on this?  

Do you need per-MIME part single instance storage, or would per-email be
enough? Since the per-email can already be done with hard links.

 Do you have a roadmap for how you plan on implementing it?

I've written about it a couple of times I think, but no specific plans.
Something about using hashes anyway.

 I don't know if you've considered this at all - this was my first thought:
 
 If you're able to store a message with the attachments separately, then 
 you can come up with an attachment database (not meaning to imply SQL 
 backend).  Then after breaking the message up into message + 
 attachments, you scan the attachment database to see if it is already 
 present prior to saving it.  This could mean that not only could we save 
 on the huge space wasted by idiots merrily forwarding large attachments 
 to multiple people, but even received mails with embedded graphical 
 signatures would benefit.

Yes, that's pretty much how I thought about it. It's anyway going to be a
dbox-only feature. Would be way too much trouble with other formats.


signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Ed W

Daniel L. Miller wrote:

Timo Sirainen wrote:
Also the mime structure could be torn apart to store attachments 
individually - the motivation being single instance storage of large 
attachments with identical content...  Anyway, these seem like very 
speculative directions...



Yes, this is also something in dbox's far future plans.
  
Speaking as a pathetic little admin of a small site of 20 users, my 
needs for replication & scalability are quite minor.  However, 
single-instance storage would be a miracle of biblical proportions.  
Has any progress been made on this?  Do you have a roadmap for how you 
plan on implementing it?


I don't know if you've considered this at all - this was my first 
thought:


If you're able to store a message with the attachments separately, 
then you can come up with an attachment database (not meaning to imply 
SQL backend).  Then after breaking the message up into message + 
attachments, you scan the attachment database to see if it is already 
present prior to saving it.  This could mean that not only could we 
save on the huge space wasted by idiots merrily forwarding large 
attachments to multiple people, but even received mails with embedded 
graphical signatures would benefit.


It would be interesting to quickly script something in Perl (see one of 
the MIME parsers) to simply scan every email on your system, do an MD5 
of each MIME part, then stick this in a dictionary (with the size) and 
count the number of hits greater than one (ie duplicate parts).  Count 
the bytes saved and share the script so we can all have a play.
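For anyone who wants to try that without Perl, here is a rough Python equivalent (stdlib only; it treats every regular file under the given root as one RFC 2822 message, which matches a maildir layout but not an mbox):

```python
import hashlib
from collections import defaultdict
from email import message_from_binary_file
from pathlib import Path

def scan_duplicate_parts(root):
    """Hash every leaf MIME part under root; return (dup_groups, bytes_saved)."""
    seen = defaultdict(list)  # digest -> [(file, part size), ...]
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        with open(path, "rb") as f:
            msg = message_from_binary_file(f)
        for part in msg.walk():
            if part.is_multipart():
                continue  # hash only leaf parts, not multipart containers
            payload = part.get_payload(decode=True) or b""
            seen[hashlib.md5(payload).hexdigest()].append((str(path), len(payload)))
    dups = {h: hits for h, hits in seen.items() if len(hits) > 1}
    # Every copy beyond the first could, in principle, be shared.
    saved = sum(hits[0][1] * (len(hits) - 1) for hits in dups.values())
    return dups, saved
```

Running it over a real spool and comparing `saved` against the spool's total size would give exactly the percentage figure being asked for here.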


I do like the idea of single instance storage, but I'm actually willing 
to bet it makes only a few percent difference in storage cost for the 
majority of mail servers (I dare say your mileage will vary, but my 
point was to benchmark it)


I don't mean this as a negative, but more that I nearly scripted this a 
couple of months back for my own needs and then ran out of time.  I 
think it won't be more than 50 lines of perl and would be interesting to 
see how people's numbers vary?


Ed W


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Charles Marcus
On 8/12/2009, Timo Sirainen (t...@iki.fi) wrote:
 Do you need per-MIME part single instance storage, or would per-email be
 enough? Since the per-email can already be done with hard links.

Our users are constantly in-line forwarding the same emails with (20+MB)
attachment(s) to different people, but completely altering the body
content, so we would definitely need per mime-part, since only the large
binary attachments would be identical.

So, I would also regard this as a miracle (dunno about biblical
proportions, but close), as long as it applies server wide - ie, all
domains hosted by one particular dovecot instance.

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Charles Marcus
On 8/12/2009, Timo Sirainen (t...@iki.fi) wrote:
 Do you need per-MIME part single instance storage, or would per-email be
 enough? Since the per-email can already be done with hard links.

The only thing I can find about this on the wiki is where it says single
instance attachment storage (for dbox) is planned. Is how to accomplish
single instance email storage documented anywhere? And is this reliable
enough to use on a production system?

The reason I ask is, this would solve *one* of our problems, namely, my
having to limit attachments on our mail lists. Since these emails would
be identical, I could start allowing large attachments to them and there
would be only one actual message stored with the subsequent deliveries
being hard links?

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Charles Marcus
On 8/12/2009, Daniel L. Miller (dmil...@amfes.com) wrote:
 dbox-only is fine.  I could care less about the storage method chosen
 - filesystem, db, encrypted, whatever - but I believe the impact on
 storage - and possibly indexes & searching - would be huge.

It would be huge for us and anyone else that deals with a lot of large
attachments (we're in the advertising industry).

 On the personal greedy side, if you want to see a mass corporate
 migration to Dovecot, with potential service contracts - that would
 be a feature worth talking about.  I can see IT manager's eyes light
up at hearing about such an item

Mine are shining right now... ;)

 - and I've never heard of any other mail server supporting such a
 thing.

Exchange does... and is the single and only reason I have *considered*
switching to it shudder in the past few years...

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen
On Wed, 2009-08-12 at 15:42 -0400, Charles Marcus wrote:
  - and I've never heard of any other mail server supporting such a
  thing.
 
 Exchange does... and is the single and only reason I have *considered*
 switching to it shudder in the past few years...

I heard that the next Exchange version drops that feature, because it
typically causes more disk I/O when reading. I don't know if it's still
possible to enable it optionally though.



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen
On Wed, 2009-08-12 at 15:33 -0400, Charles Marcus wrote:
 On 8/12/2009, Timo Sirainen (t...@iki.fi) wrote:
  Do you need per-MIME part single instance storage, or would per-email be
  enough? Since the per-email can already be done with hard links.
 
 The only thing I can find about this on the wiki is where it says single
 instance attachment storage (for dbox) is planned. Is how to accomplish
 single instance email storage documented anywhere? And is this reliable
 enough to use on a production system?

Two possible ways:

a) Just write a script to find identical mails and replace them with
hard links to the same file. :)

b) Use deliver -p file for delivering mails. You'll probably need to
write some kind of a script for delivering mails, so that when it gets
called with multiple recipients it can write the mail to a temp file and
call deliver -p for each recipient using the same file.

So yeah, either one is as reliable as the script :)
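Option (a) might look something like the sketch below. It is not production code: it assumes a layout where each mail is its own immutable file (as in maildir), and it skips the byte-for-byte comparison a careful version would add before trusting the hash.

```python
import hashlib
import os
from pathlib import Path

def hardlink_duplicates(root):
    """Replace byte-identical files under root with hard links to one copy."""
    by_digest = {}
    linked = 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        original = by_digest.setdefault(digest, path)
        if original is path:
            continue  # first file with this content becomes the canonical copy
        # Create the new link under a temp name, then atomically swap it in.
        tmp = path.parent / (path.name + ".tmp")
        os.link(original, tmp)
        os.replace(tmp, path)
        linked += 1
    return linked
```

One caveat: if any tool ever modifies a mail file in place, the change bleeds into every linked copy, so this is only safe where mail files are treated as immutable.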


signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Ed W

Timo Sirainen wrote:

On Wed, 2009-08-12 at 15:42 -0400, Charles Marcus wrote:
  

- and I've never heard of any other mail server supporting such a
thing.
  

Exchange does... and is the single and only reason I have *considered*
switching to it shudder in the past few years...



I heard that the next Exchange version drops that feature, because it
typically causes more disk I/O when reading. I don't know if it's still
possible to enable it optionally though.

  


It also had a bunch of limitations; it was basically only single 
instance for CC recipients on a message (more or less).  Quite a lot of 
things, such as certain types of virus scanning, would (I think) easily 
disable the single instance storage also?


So I doubt it would help in most of the cases mentioned here, ie each 
time a mail was re-forwarded internally it would not be single-instanced.


I still think it would be instructive to do some benchmarks though - 
often these things look good on paper, but turn out to be surprisingly 
less effective (given the implementation cost) when measured.  I'm not 
disagreeing, just would be interested to see some numbers...


I think Perl's MIME-tools would make it pretty easy to build something 
which scanned all files and created a hash of all interesting 
attachments.  Quite possibly there is an even more clever way to get the 
same result by misusing some Dovecot feature?


Good luck!

Ed W


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Steve

 Original Message 
 Date: Wed, 12 Aug 2009 12:34:40 -0700
 From: Daniel L. Miller dmil...@amfes.com
 To: Dovecot Mailing List dovecot@dovecot.org
 Subject: Re: [Dovecot] Scalability plans: Abstract out filesystem and make it 
 someone else's problem

 Timo Sirainen wrote:
  On Wed, 2009-08-12 at 11:35 -0700, Daniel L. Miller wrote:

  Timo Sirainen wrote:
  
  Also the mime structure could be torn apart to store 
  attachments individually - the motivation being single instance
 storage 
  of large attachments with identical content...  Anyway, these seem
 like 
  very speculative directions...
  
  
  Yes, this is also something in dbox's far future plans.


  Speaking as a pathetic little admin of a small site of 20 users, my 
  needs for replication  scalability are quite minor.  However, 
  single-instance storage would be a miracle of biblical proportions. 
 Has 
  any progress been made on this?  
  
 
  Do you need per-MIME part single instance storage, or would per-email be
  enough? Since the per-email can already be done with hard links.

 Definitely per MIME part.
  Do you have a roadmap for how you plan on implementing it?
  
 
  I've written about it a couple of times I think, but no specific plans.
  Something about using hashes anyway.
 

  I don't know if you've considered this at all - this was my first
 thought:
 
  If you're able to store a message with the attachments separately, then
  you can come up with an attachment database (not meaning to imply SQL 
  backend).  Then after breaking the message up into message + 
  attachments, you scan the attachment database to see if it is already 
  present prior to saving it.  This could mean that not only could we
 save 
  on the huge space wasted by idiots merrily forwarding large attachments
  to multiple people, but even received mails with embedded graphical 
  signatures would benefit.
  
 
  Yes, that's pretty much how I thought about it. It's anyway going to be a
  dbox-only feature. Would be way too much trouble with other formats.

 dbox-only is fine.  I could care less about the storage method chosen - 
 filesystem, db, encrypted, whatever - but I believe the impact on 
 storage - and possibly indexes & searching - would be huge.
 
 On the personal greedy side, if you want to see a mass corporate 
 migration to Dovecot, with potential service contracts - that would be a 
 feature worth talking about.  I can see IT manager's eyes light up at 
hearing about such an item - and I've never heard of any other mail 
 server supporting such a thing.

IBM Lotus Domino has had that feature for ages (they call it shared mail). And 
they don't have it just for normal mails but for archives as well (called 
single instance store). This feature was first introduced in cc:Mail, then 
got integrated into Domino, where it is still present and has even been 
extended to work with various backends (like the new DB2 backend). Microsoft 
copied that concept from them (from my viewpoint the way MS did it in the past 
was horrible; I think newer versions work better but I am not sure).

From my experience doing messaging for two decades I can tell you that it 
is not worth doing single instance store (or whatever you call it). Storage is 
ultra cheap these days and backup systems are so fast that all the benefits 
which were valid some years ago are gone today.

It might rock your geek heart to implement something like that, but doing the 
math on costs versus benefits will sooner or later show you that today it's not 
worth doing it.


 --
 Daniel

Steve



Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Daniel L. Miller

Steve wrote:


dbox-only is fine.  I could care less about the storage method chosen - 
filesystem, db, encrypted, whatever - but I believe the impact on 
storage - and possibly indexes & searching - would be huge.


On the personal greedy side, if you want to see a mass corporate 
migration to Dovecot, with potential service contracts - that would be a 
feature worth talking about.  I can see IT manager's eyes light up at 
hearing about such an item - and I've never heard of any other mail 
server supporting such a thing.




IBM Lotus Domino has had that feature for ages (they call it shared mail). And 
they don't have it just for normal mails but for archives as well (called 
single instance store). This feature was first introduced in cc:Mail, then 
got integrated into Domino, where it is still present and has even been 
extended to work with various backends (like the new DB2 backend). Microsoft 
copied that concept from them (from my viewpoint the way MS did it in the past 
was horrible; I think newer versions work better but I am not sure).

From my experience doing messaging for two decades I can tell you that it 
is not worth doing single instance store (or whatever you call it). Storage is 
ultra cheap these days and backup systems are so fast that all the benefits 
which were valid some years ago are gone today.

It might rock your geek heart to implement something like that, but doing the 
math on costs versus benefits will sooner or later show you that today it's not 
worth doing it.

I have no experience with Domino, but I just did a Google for lotus 
domino shared mail and read the brief on lotus.com.  Based on what I 
read, it has potential - but it only splits message headers from bodies 
and stores the bodies as complete images, without separating attachments.  
That helps reduce the load when somebody blasts out a flier to everyone 
in the company in a single message - but I'm asking for something more 
ambitious.


If every attachment in a given message is individually scanned to 
generate some unique identifier, and that identifier then used to 
determine whether or not it exists in the database - this could have 
HUGE effects.  This now addresses not just the simple broadcast - but 
some really crazy possibilities.


User A receives a message with an attachment (like a product brochure), 
likes it, and forwards it to Users B-Z.
User F recognizes that product, but has a counter-proposal, so he 
attaches another brochure and replies to A-Z.  Being an idiot, the 
original attachment is still kept in the reply.

User H forwards this message to a buddy at another company for discussion.
[...time passes...]
Three weeks later, User 101 at the other company gets back from 
vacation, has just received a message with the original brochure.  He 
forwards it to User A (who started this mess).
User A, being a total dimwit, doesn't recognize that he already spread 
this junk throughout the company last month - so he broadcasts it again.


Under the structure I've proposed, net storage consumed by the 
attachments should be one copy of attachment 1 and one copy of 
attachment 2, plus headers and any comments in the messages times the 
number of recipients.  Domino would store one copy of attachment 1, then 
a copy of attachment 1 + attachment 2, then another copy of attachment 1.


This is a minor example - but I just wanted to show SOMETHING to justify 
the effort.


As far as cheap storage - I agree costs are a fraction of what they once 
were.  But by reducing the amount stored, consider the tradeoffs of 
reduced caching, smaller differential backups, and reduced archival 
costs (off-site storage costs often calculated per GB), just to name a 
few.  To me the only down side (other than requiring Timo to invest more 
blood, sweat & tears in this project) is how much this costs in message 
READ time.  For me, typical user interaction is reading.  As I believe 
previously mentioned, if the server implements some type of delayed 
delete function, then delete times are not a concern.  And write times 
are also (I think) a minor concern.  But the primary issue is how fast 
can we retrieve a message + attachments and stream it to the client.  It 
seems to me that header lists won't be impacted, so simply pointing the mail 
client at the server to see a list of mail shouldn't change at all.  So 
then the question is the potential latency from when a user selects a 
message to when it appears on their screen.  Will the time spent 
searching the disk, and assembling the message, be significant when 
compared with the network communication between server & client?


--
Daniel


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen
On Wed, 2009-08-12 at 14:54 -0700, Daniel L. Miller wrote:
 If every attachment in a given message is individually scanned to 
 generate some unique identifier, and that identifier then used to 
 determine whether or not it exists in the database - this could have 
 HUGE effects.  This now addresses not just the simple broadcast - but 
 some really crazy possibilities.

Oh BTW. I think dbmail 2.3 does that. Then again I haven't yet seen a
stable dbmail version. But looks like they've released 2.3.6 recently
that I haven't tested yet.



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen
On Wed, 2009-08-12 at 18:18 -0400, Timo Sirainen wrote:
 Oh BTW. I think dbmail 2.3 does that. Then again I haven't yet seen a
 stable dbmail version. But looks like they've released 2.3.6 recently
 that I haven't tested yet.

Looks like it even does single instance header values:

 The header caching tables used since 2.2 have been replaced with a new
 schema, optimized for a much smaller storage footprint, and therefore
 faster access. Headers are now cached using a single-instance storage
 pattern, similar to the one used for the message parts. This change
 also introduces for the first time the appearance of views in the
 database, which is somewhat experimental because of some uncertainties
 with regard to the possible performance impact this may have.

But somehow I think the performance isn't going to be very good for
downloading the full header if it has to piece it together from lots of
fields stored all around the database.



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Daniel L. Miller

Timo Sirainen wrote:

On Wed, 2009-08-12 at 18:18 -0400, Timo Sirainen wrote:
  

Oh BTW. I think dbmail 2.3 does that. Then again I haven't yet seen a
stable dbmail version. But looks like they've released 2.3.6 recently
that I haven't tested yet.



Looks like it even does single instance header values:
  
LOL - I started off hijacking this thread for SIS - and now you just 
invited the next one:  Have you done, or are you aware of, recent 
comparisons between Dovecot & dbmail?  I'd like to think Dovecot is 
faster, more stable, more feature-rich, and less fattening...


I don't WANT dbmail!
  

The header caching tables used since 2.2 have been replaced with a new
schema, optimized for a much smaller storage footprint, and therefore
faster access. Headers are now cached using a single-instance storage
pattern, similar to the one used for the message parts. This change
also introduces for the first time the appearance of views in the
database, which is somewhat experimental because of some uncertainties
with regard to the possible performance impact this may have.



But somehow I think the performance isn't going to be very good for
downloading the full header if it has to piece it together from lots of
fields stored all around the database.
  
Do you have performance concerns for what we've been discussing for SIS 
in Dovecot?


We can spin off some other threads if you'd prefer to return to your 
original question - but I guess the question on everybody's (well, at 
least mine) mind right now is will YOU try to implement SIS in the near 
future?  Regardless of the backend used?


--
Daniel


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Daniel L. Miller
Ha!  Fooled you!  I'm going to reply to the original question instead of 
SIS!


Timo Sirainen wrote:

 * Index files are really more like memory dumps. They're already in an
optimal format for keeping them in memory, so they can be just mmap()ed
and used. Doing some kind of translation to another format would just
make it more complex and slower.

 * Index and mail data is very different. Index data is accessed
constantly and it must be very low latency or performance will be
horrible. It practically should be in memory in local machine and there
shouldn't normally be any network lookups when accessing it.
  

Ok, I lied.  I'm going to start something new.

Do the indexes contain any of the header information?  In particular, 
since I know nothing of the communication between IMAP clients & servers 
in general, is the information that is shown in typical client mail 
lists (subject, sender, date, etc.) stored in the indexes?  I guess I'm 
asking whether any planned changes will have an impact on retrieving 
message lists in any way.


--
Daniel


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen

On Aug 12, 2009, at 6:54 PM, Daniel L. Miller wrote:


Do the indexes contain any of the header information?


Yes.

 In particular, since I know nothing of the communication between  
IMAP clients & servers in general, is the information that is shown  
in typical client mail lists (subject, sender, date, etc.) stored in  
the indexes?


Yes. Dovecot adds to cache file those headers that the client requests.

 I guess I'm asking if any planned changes will have an impact in  
retrieving message lists in any way.


Usually not. Unless client fetches the entire header. Some do I think,  
but usually not.




Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Seth Mattinen
Timo Sirainen wrote:
 On Aug 11, 2009, at 12:41 AM, Seth Mattinen wrote:
 
 Nothing forces you to switch from maildir, if you're happy with it :)
 But if you want to support millions of users, it's simpler to distribute
 the storage and disk I/O evenly across hundreds of servers using a
 database that was designed for it. And by databases I mean here some of
 those key/value-like databases, not SQL. (What's a good collective name
 for those dbs anyway? BASE and NoSQL are a couple names I've seen.)



 Why is a database a better choice than a clustered filesystem?
 
 Show me a clustered filesystem that can guarantee that each file is
 stored in at least 3 different data centers and can scale linearly by
 simply adding more servers (let's say at least up to thousands).

Easy, AFS. It is known to support tens of thousands of clients [1] and
it's not exactly new. Like supporting the quirks of NFS, the quirks of a
clustered filesystem could be found and dealt with, too.

Key/value databases are hardly a magic bullet for redundancy. You don't
get 3 copies in different datacenters by simply switching to a
database-style storage.

[1]
http://www-conf.slac.stanford.edu/AFSBestPractices/Slides/MorganStanley.pdf


 Clustered filesystems are also complex. They're much more complex than
 what Dovecot really requires.
 

I mention it because you stated wanting to outsource the storage
portion. The complexity of whatever database engine you choose or
supporting a clustered filesystem (like NFS) is a wash since you're not
maintaining either one personally.

~Seth


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Timo Sirainen

On Aug 11, 2009, at 2:16 AM, Seth Mattinen wrote:


Show me a clustered filesystem that can guarantee that each file is
stored in at least 3 different data centers and can scale linearly by
simply adding more servers (let's say at least up to thousands).


Easy, AFS. It is known to support tens of thousands of clients [1] and
it's not exactly new. Like supporting the quirks of NFS, the quirks  
of a

clustered filesystem could be found and dealt with, too.


I was more thinking about thousands of servers, not clients. Each  
server should contribute to the amount of storage you have. Buying  
huge storages is more expensive. Also it would be nice if you could  
just keep plugging in more servers to get more storage space, disk I/O  
and CPU and the system would just automatically reconfigure itself to  
take advantage of those. I can't really see any of that happening  
easily with AFS.
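The "keep plugging in servers and the system reconfigures itself" property is what consistent-hashing schemes in key/value stores provide. A minimal sketch using rendezvous (highest-random-weight) hashing, where adding a server only moves the mailboxes that server now wins (illustrative; server/mailbox names are made up):

```python
import hashlib

def _h(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def owner(mailbox, servers):
    # Rendezvous hashing: every server "scores" the mailbox and the
    # highest score wins. Adding a server only reassigns the mailboxes
    # the new server wins; everything else stays where it was.
    return max(servers, key=lambda s: _h(s + "/" + mailbox))

servers = ["s%d" % i for i in range(10)]
mailboxes = ["user%d" % i for i in range(1000)]
before = {m: owner(m, servers) for m in mailboxes}
after = {m: owner(m, servers + ["s10"]) for m in mailboxes}
moved = sum(1 for m in mailboxes if before[m] != after[m])
print(moved)  # roughly 1/11 of the mailboxes move; the rest stay put
```

No central reconfiguration step is needed: every node can compute the same ownership from the server list alone, which is what makes "just add more servers" cheap in these systems.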


Key/value databases are hardly a magic bullet for redundancy. You  
don't

get 3 copies in different datacenters by simply switching to a
database-style storage.


Some (several?) of them can be somewhat easily configured to support  
that. (That's what their web pages say, anyway.)
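For the "at least 3 data centers" guarantee specifically, Dynamo-style stores typically walk a key-dependent preference list of nodes and skip nodes until the replicas land in distinct failure domains. A toy sketch of that placement rule (an assumption about how such stores work in general, not any particular product's algorithm):

```python
import hashlib

def replicas(key, nodes, copies=3):
    # nodes: list of (node_name, datacenter) pairs. Rank nodes in a
    # key-dependent order, then pick the first `copies` nodes that are
    # all in different datacenters.
    ranked = sorted(nodes,
                    key=lambda n: hashlib.md5((n[0] + key).encode()).hexdigest())
    chosen, used_dcs = [], set()
    for name, dc in ranked:
        if dc not in used_dcs:
            chosen.append(name)
            used_dcs.add(dc)
        if len(chosen) == copies:
            break
    return chosen

nodes = [("node%d" % i, "dc%d" % (i % 4)) for i in range(12)]
r = replicas("user42/INBOX", nodes)
print(r)  # three nodes, each in a different datacenter
```

A clustered filesystem can of course mirror data too, but expressing "3 copies, 3 sites, per object" is a one-line policy in this model, which is Timo's point.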


Clustered filesystems are also complex. They're much more complex  
than

what Dovecot really requires.


I mention it because you stated wanting to outsource the storage
portion. The complexity of whatever database engine you choose or
supporting a clustered filesystem (like NFS) is a wash since you're  
not

maintaining either one personally.


I also want something that's cheap and easy to scale. Sure, people who  
already have NFS/AFS/etc. systems can keep using Dovecot with the  
filesystem backends, but I don't think it's the cheapest or easiest  
choice. There's a reason why e.g. Amazon S3 isn't running on top of  
them.


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Robert Schetterer
Timo Sirainen schrieb:
 On Aug 11, 2009, at 12:41 AM, Seth Mattinen wrote:
 
 Nothing forces you to switch from maildir, if you're happy with it :)
 But if you want to support millions of users, it's simpler to distribute
 the storage and disk I/O evenly across hundreds of servers using a
 database that was designed for it. And by databases I mean here some of
 those key/value-like databases, not SQL. (What's a good collective name
 for those dbs anyway? BASE and NoSQL are a couple names I've seen.)



 Why is a database a better choice than a clustered filesystem?
 
 Show me a clustered filesystem that can guarantee that each file is
 stored in at least 3 different data centers and can scale linearly by
 simply adding more servers (let's say at least up to thousands).
 
 Clustered filesystems are also complex. They're much more complex than
 what Dovecot really requires.
 

I like the idea of SQL-based mail services.
Whatever your choice is, cluster filesystems will always have their place,
but with database-backed setups it should be much easier to
have redundant mail stores. I already have all the related stuff (quota, ACLs,
etc.) in a database, incl. SpamAssassin, greylisting and webmail; the only thing
left is the mail store. It would be great to have
that too, provided there are no big disadvantages
like poor performance with it.

There is http://www.dbmail.org/.
Has somebody ever used it,
so it can be compared?
-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Seth Mattinen
Robert Schetterer wrote:
 Timo Sirainen schrieb:
 On Aug 11, 2009, at 12:41 AM, Seth Mattinen wrote:

 Nothing forces you to switch from maildir, if you're happy with it :)
 But if you want to support millions of users, it's simpler to distribute
 the storage and disk I/O evenly across hundreds of servers using a
 database that was designed for it. And by databases I mean here some of
 those key/value-like databases, not SQL. (What's a good collective name
 for those dbs anyway? BASE and NoSQL are a couple names I've seen.)


 Why is a database a better choice than a clustered filesystem?
 Show me a clustered filesystem that can guarantee that each file is
 stored in at least 3 different data centers and can scale linearly by
 simply adding more servers (let's say at least up to thousands).

 Clustered filesystems are also complex. They're much more complex than
 what Dovecot really requires.

 
 I like the idea of SQL-based mail services.
 Whatever your choice is, cluster filesystems will always have their place,
 but with database-backed setups it should be much easier to
 have redundant mail stores. I already have all the related stuff (quota, ACLs,
 etc.) in a database, incl. SpamAssassin, greylisting and webmail; the only thing
 left is the mail store. It would be great to have
 that too, provided there are no big disadvantages
 like poor performance with it.
 
 There is http://www.dbmail.org/.
 Has somebody ever used it,
 so it can be compared?


It wouldn't be an SQL database - it's not really suitable for this kind
of thing at the scale Timo is proposing.

~Seth


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread ja nein
I was more thinking about thousands of servers, not clients. Each server should 
contribute to the amount of storage you have. Buying huge storages is more 
expensive. Also it would be nice if you could just keep plugging in more 
servers to get more storage space, disk I/O and CPU and the system would just 
automatically reconfigure itself to take advantage of those. I can't really see 
any of that happening easily with AFS.

Well, me too. But there are interesting (and working) solutions like e.g. 
GlusterFS...

 I mention it because you stated wanting to outsource the storage
 portion. The complexity of whatever database engine you choose or
 supporting a clustered filesystem (like NFS) is a wash since you're not
 maintaining either one personally.

I also want something that's cheap and easy to scale. Sure, people who already 
have NFS/AFS/etc. systems can keep using Dovecot with the filesystem backends, 
but I don't think it's the cheapest or easiest choice. There's a reason why 
e.g. Amazon S3 isn't running on top of them.


I think the basic idea behind the initial proposal, which I like very much, is to have a 
choice between redundancy/scalability and ease of running a platform.

In my opinion there isn't one perfect solution which addresses all of the above in 
the best way. I think that's why there are so many different solutions out 
there. Anyway, having the indexes centralized in some form of database would 
be a nice solution (and, very important, easy to run in the SQL case!) for not 
all, but many installations. If the speed penalty and the coding effort 
aren't too large, it would be worth implementing solutions like SQL-based index 
storage, too. And everyone is/would be free to decide which one is the 
best for his platform/environment.

Huge installations with > 50 servers will always be a kind of special 
solution and won't be built out of the box. Dovecot can just help by offering 
good alternatives for storing all kinds of lock-dependent stuff in different ways 
(files/memory/databases).

Regards,
Sebastian


  

Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Steffen Kaiser


On Mon, 10 Aug 2009, Timo Sirainen wrote:


4. Implement a multi-master filesystem backend for index files. The idea
would be that all servers accessing the same mailbox must be talking to
each others via network and every time something is changed, push the
change to other servers. This is actually very similar to my previous
multi-master plan. One of the servers accessing the mailbox would still
act as a master and handle conflict resolution and writing indexes to
disk more or less often.


What I don't understand here is:

_One_ server is the master, which owns the indexes locally?
Oh, 5. means that this particular server is initiating the write, right?

You spoke about thousands of servers; if one of them opens a mailbox, it 
needs to query all (thousands - 1) servers to find which of them is probably the 
master of this mailbox. I suppose you need a home location server, which 
other servers connect to in order to find the server currently locking (aka 
acting as master for) this mailbox.


GSM has a home location register pointing to the base station currently 
managing the user's info, because the GSM device is within its reach.
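The "home location register" idea Steffen describes could be as simple as a shared map from mailbox to current master, claimed atomically on open. A sketch using an in-process dict as a stand-in for a store all servers can reach (class and server names are hypothetical, not part of Dovecot):

```python
import threading

class HomeLocationRegistry:
    # Stand-in for a shared store reachable by every server.
    # Maps mailbox -> server currently acting as its master.
    def __init__(self):
        self._lock = threading.Lock()
        self._master = {}

    def acquire(self, mailbox, server):
        # Return the current master, claiming the role for `server`
        # if the mailbox has no master yet.
        with self._lock:
            return self._master.setdefault(mailbox, server)

    def release(self, mailbox, server):
        # Hand the master role back when done with the mailbox.
        with self._lock:
            if self._master.get(mailbox) == server:
                del self._master[mailbox]

reg = HomeLocationRegistry()
print(reg.acquire("bob/INBOX", "imap1"))  # imap1 becomes master
print(reg.acquire("bob/INBOX", "imap2"))  # imap2 learns imap1 is master
reg.release("bob/INBOX", "imap1")
print(reg.acquire("bob/INBOX", "imap2"))  # now imap2 takes over
```

One lookup against this registry replaces the (thousands - 1) queries: servers ask the registry who the master is instead of asking each other.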


There is also another point I'm wondering about:
index files are really more like memory dumps, you've written. So if you 
cluster thousands of servers together you'll most probably have different 
server architectures, say 32-bit vs. 64-bit, CISC vs. RISC, big vs. little 
endian, ASCII vs. EBCDIC :). Sharing these memory dumps without another 
abstraction layer wouldn't work.
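The usual fix for this is to serialize index fields in one fixed byte order instead of dumping native structs. In Python's struct notation (an illustration of the general technique, not Dovecot's on-disk format):

```python
import struct

value = 0x01020304

# Native byte order: what a raw memory dump of a C struct would contain.
# This differs between big- and little-endian hosts.
native = struct.pack("=I", value)

# Fixed little-endian on-disk format: identical bytes on every architecture.
on_disk = struct.pack("<I", value)

# Reading the fixed format back works regardless of the host's endianness.
(restored,) = struct.unpack("<I", on_disk)
print(hex(restored))  # 0x1020304
```

As Timo notes below, Dovecot's indexes are already bitness-independent; a fixed byte order would be the remaining step for mixed-endian clusters, at some cost on the minority architecture.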



5. Implement filesystem backend for dbox and permanent index storage
using some scalable distributed database, such as maybe Cassandra. This


Although I like the eventually consistent part, I wonder about the 
Java-based stuff of Cassandra.



is the part I've thought the least about, but it's also the part I hope
to (mostly) outsource to someone else. I'm not going to write a
distributed database from scratch..


I wonder if the index-backend in 4. and 5. shouldn't be the same.

===

How much work is it to handle the data in the index files?
What if every server forwards changes to the master and receives changes 
from the master to sync its local read-only cache? Then you needn't handle 
conflicts (except when the network was down), and writes are consistently 
originated from this single master server. The actual mail data is 
accessed via another API.


When the current master no longer needs to access the mailbox, it 
could hand the master role over to another server currently accessing 
the mailbox.


Bye,

-- 
Steffen Kaiser



Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Eric Jon Rostetter

Quoting Seth Mattinen se...@rollernet.us:


Queue directories and clusters don't
mix well, but a read-heavy maildir/dbox environment shouldn't suffer the
same problem.


Why don't queue directories and clusters mix well?  Is this a performance
issue only, or something worse?




--
Eric Rostetter
The Department of Physics
The University of Texas at Austin

This message is provided AS IS without warranty of any kind,
either expressed or implied.  Use this message at your own risk.


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Timo Sirainen

On Aug 11, 2009, at 10:32 AM, Steffen Kaiser wrote:



On Mon, 10 Aug 2009, Timo Sirainen wrote:

4. Implement a multi-master filesystem backend for index files. The  
idea
would be that all servers accessing the same mailbox must be  
talking to

each others via network and every time something is changed, push the
change to other servers. This is actually very similar to my previous
multi-master plan. One of the servers accessing the mailbox would  
still

act as a master and handle conflict resolution and writing indexes to
disk more or less often.


What I don't understand here is:

_One_ server is the master, which owns the indexes locally?
Oh, 5. means that this particular server is initiating the write,  
right?


Yes, only one would be writing to the shared storage.

You spoke about thousands of servers; if one of them opens a  
mailbox, it needs to query all (thousands - 1) servers to find which of  
them is probably the master of this mailbox. I suppose you need a  
home location server, which other servers connect to in order to  
find the server currently locking (aka acting as master for) this mailbox.


Yeah, keeping track of this information is probably the most difficult  
part. But surely it can be done faster than with (thousands-1)  
queries :)



There is also another point I'm wondering about:
index files are really more like memory dumps, you've written. So if  
you cluster thousands of servers together you'll most probably have  
different server architectures, say 32-bit vs. 64-bit, CISC vs. RISC,  
big vs. little endian, ASCII vs. EBCDIC :). Sharing these memory  
dumps without another abstraction layer wouldn't work.


Nah, x86 is all there is ;) Dovecot has been fine so far with this  
same design. I think only once I've heard that someone wanted to run  
both little and big endian machines with shared NFS storage. 32 vs. 64  
bit doesn't matter though, indexes have been bitness-independent since  
v1.0.rc9.


I once tried to make the code use the same endianness everywhere, but  
the code quickly became so ugly that I decided to just drop it. But  
who knows, maybe some day. :)



5. Implement filesystem backend for dbox and permanent index storage
using some scalable distributed database, such as maybe Cassandra.  
This


Although I like the eventually consistent part, I wonder about the  
Java-based stuff of Cassandra.


I'm not yet sure what database exactly to use. I'm not really familiar  
with any of them, except the Amazon Dynamo whitepaper that I read, and  
that seemed perfect to me. Cassandra still seems to lack some features  
that I think are needed.


is the part I've thought the least about, but it's also the part I  
hope

to (mostly) outsource to someone else. I'm not going to write a
distributed database from scratch..


I wonder if the index-backend in 4. and 5. shouldn't be the same.


You mean the permanent index storage? Yes, it probably should be the  
same in 4 and 5. 4 just has that in-memory layer in the middle.



How much work is it to handle the data in the index files?
What if every server forwards changes to the master and receives  
changes from the master to sync its local read-only cache? Then you  
needn't handle conflicts (except when the network was down), and writes  
are consistently originated from this single master server. The actual  
mail data is accessed via another API.


When the current master no longer needs to access the mailbox,  
it could hand the master role over to another server currently  
accessing the mailbox.


http://dovecot.org/tmp/replication-plan.txt explains how I previously  
thought about the index replication to work, and I think it'll still  
work pretty nicely with the index FS backend too. I guess it could  
mostly work like sending everything to master, although for some  
changes it wouldn't really be necessary. I'll need to rethink the plan  
for this I guess.


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Timo Sirainen
On Tue, 2009-08-11 at 09:38 -0700, Seth Mattinen wrote:
  Why don't queue directories and clusters mix well?  Is this a performance
  issue only, or something worse?
  
 
 It depends on the locking scheme used by the filesystem. Working queue
 directories (the ones where stuff comes and goes rapidly) are best suited
 for a local FS anyway.

And when a server and its disk dies, the emails get lost :(



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Eric Jon Rostetter

Quoting Timo Sirainen t...@iki.fi:


It depends on the locking scheme used by the filesystem. Working queue
directories (the ones where stuff comes and goes rapidly) are best suited
for a local FS anyway.


And when a server and its disk dies, the emails get lost :(


It would appear he is not talking about a /var/spool/mail type queue/spool,
but the queues where the MTA/AV/Anti-Spam/etc process the mail.

For the most part, a machine crash here will result in the mail being lost
or resent (resent if the server hasn't yet confirmed acceptance of the
message). With battery backup the risk is smaller, but since most
filesystems (local or remote) cache writes in memory, the chance of losing
mail that is still cached in memory is high in any case.
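The window Eric describes can be narrowed per message by forcing the queue file to stable storage before acknowledging acceptance, at the cost of latency. A sketch of the standard write-fsync-rename pattern (a generic technique, not any specific MTA's code; names are illustrative):

```python
import os
import tempfile

def durable_enqueue(queue_dir, data):
    # Write to a temp file, flush it to disk, then rename into place.
    # Only after fsync() returns is the message data safe from a crash
    # of the process or (modulo disk write caches) the machine.
    fd, tmp = tempfile.mkstemp(dir=queue_dir)
    try:
        os.write(fd, data)
        os.fsync(fd)          # force file data to stable storage
    finally:
        os.close(fd)
    final = os.path.join(queue_dir, "msg-" + os.path.basename(tmp))
    os.rename(tmp, final)     # atomic within the same filesystem
    return final

qdir = tempfile.mkdtemp()
path = durable_enqueue(qdir, b"Subject: test\r\n\r\nbody\r\n")
print(os.path.exists(path))
```

For full crash safety a production MTA would also fsync the queue directory after the rename, so the new directory entry itself reaches disk.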

I agree that for smaller mail systems, the processing queues
are best on local fs or in memory (memory for AV/Anti-Spam, local disk
for MTA processing).  The delivery queues (where the message awaits delivery
or is delivered) are best on some other file system (mirrored, distributed,
etc).

For a massively scaled system, there may be sufficient performance to
put the queues elsewhere.  But on a small system, with 90% of the mail
being spam/virus/malware, performance will usually dictate local/memory
file systems for such queues...

--
Eric Rostetter
The Department of Physics
The University of Texas at Austin



Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-10 Thread Seth Mattinen

Timo Sirainen wrote:

This is something I figured out a few months ago, mainly because this
one guy at work (hi, Stu) kept telling me my multi-master replication
plan sucked and we should use some existing scalable database. (I guess
it didn't go exactly like that, but that's the result anyway.)



Ick, some people (myself included) hate the idea of storing mail in a 
database versus simple and almost impossible to screw up plain text 
files of maildir. Cyrus already does the whole mail-in-database thing.


~Seth


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-10 Thread Timo Sirainen
On Mon, 2009-08-10 at 14:33 -0700, Seth Mattinen wrote:
 Timo Sirainen wrote:
  This is something I figured out a few months ago, mainly because this
  one guy at work (hi, Stu) kept telling me my multi-master replication
  plan sucked and we should use some existing scalable database. (I guess
  it didn't go exactly like that, but that's the result anyway.)
  
 
 Ick, some people (myself included) hate the idea of storing mail in a 
 database versus simple and almost impossible to screw up plain text 
 files of maildir. 

Nothing forces you to switch from maildir, if you're happy with it :)
But if you want to support millions of users, it's simpler to distribute
the storage and disk I/O evenly across hundreds of servers using a
database that was designed for it. And by databases I mean here some of
those key/value-like databases, not SQL. (What's a good collective name
for those dbs anyway? BASE and NoSQL are a couple names I've seen.)

 Cyrus already does the whole mail-in-database thing.

No, Cyrus's mail database is very similar to how Dovecot works. Both
have somewhat similar index files, both store one mail/file (with
dbox/maildir). But Cyrus then also has some additional databases that
screw things up.




Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-10 Thread Seth Mattinen
Timo Sirainen wrote:
 On Mon, 2009-08-10 at 14:33 -0700, Seth Mattinen wrote:
 Timo Sirainen wrote:
 This is something I figured out a few months ago, mainly because this
 one guy at work (hi, Stu) kept telling me my multi-master replication
 plan sucked and we should use some existing scalable database. (I guess
 it didn't go exactly like that, but that's the result anyway.)

 Ick, some people (myself included) hate the idea of storing mail in a 
 database versus simple and almost impossible to screw up plain text 
 files of maildir. 
 
 Nothing forces you to switch from maildir, if you're happy with it :)
 But if you want to support millions of users, it's simpler to distribute
 the storage and disk I/O evenly across hundreds of servers using a
 database that was designed for it. And by databases I mean here some of
 those key/value-like databases, not SQL. (What's a good collective name
 for those dbs anyway? BASE and NoSQL are a couple names I've seen.)
 


Why is a database a better choice than a clustered filesystem? It seems
that you're adding a huge layer of complexity (a database) for something
that's already solved (clusters). Queue directories and clusters don't
mix well, but a read-heavy maildir/dbox environment shouldn't suffer the
same problem.

~Seth


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-10 Thread Curtis Maloney

Seth Mattinen wrote:
Ick, some people (myself included) hate the idea of storing mail in a 
database versus simple and almost impossible to screw up plain text 
files of maildir. Cyrus already does the whole mail-in-database thing.


Why do you think 'maildir' isn't a database?

Or to you does 'database' only mean SQL database?

"A database is a collection of information that is organized so that 
it can easily be accessed, managed, and updated."


--
Curtis Maloney


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-10 Thread Seth Mattinen
Curtis Maloney wrote:
 Seth Mattinen wrote:
 Ick, some people (myself included) hate the idea of storing mail in a
 database versus simple and almost impossible to screw up plain text
 files of maildir. Cyrus already does the whole mail-in-database thing.
 
 Why do you think 'maildir' isn't a database?
 
 Or to you does 'database' only mean SQL database?
 

Please, don't put words in my mouth. I'm not stupid.

~Seth


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-10 Thread Timo Sirainen

On Aug 11, 2009, at 12:41 AM, Seth Mattinen wrote:


Nothing forces you to switch from maildir, if you're happy with it :)
But if you want to support millions of users, it's simpler to  
distribute

the storage and disk I/O evenly across hundreds of servers using a
database that was designed for it. And by databases I mean here  
some of
those key/value-like databases, not SQL. (What's a good collective  
name

for those dbs anyway? BASE and NoSQL are a couple names I've seen.)




Why is a database a better choice than a clustered filesystem?


Show me a clustered filesystem that can guarantee that each file is  
stored in at least 3 different data centers and can scale linearly by  
simply adding more servers (let's say at least up to thousands).


Clustered filesystems are also complex. They're much more complex than  
what Dovecot really requires.