Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2010-03-11 Thread Ed W

On 10/03/2010 21:19, Timo Sirainen wrote:

On 10.8.2009, at 20.01, Timo Sirainen wrote:

   

(3.5. Implement async I/O filesystem backend.)
 

You know what I found out today? Linux doesn't support async IO for regular 
buffered files. I had heard there were issues, but I thought it was mainly 
about some annoying APIs and such. Anyone know if some project has successfully 
figured out some usable way to do async disk IO? The possibilities seem to be:

a) Use Linux's native AIO, which requires direct-io for files. This *might* not 
be horribly bad for mail files. After all, same mail is rarely read multiple 
times. Except when parsing its headers first and then its body. Maybe the 
process could do some internal buffering?..

I guess no one ever tried my posix_fadvise() patch? The idea was that it would 
tell the kernel after closing a mail file that it's no longer needed in memory, 
so kernel could remove it from page cache. I never heard any positive or 
negative comments about how it affected performance.. 
http://dovecot.org/patches/1.1/fadvise.diff

b) Use threads, either via some library or implement yourself. Each thread of 
course uses some extra memory. Also enabling threads causes glibc to start 
using a thread-safe version of malloc() (I think?), which slows things down 
(unless that can be avoided, maybe by using clone() directly instead of 
pthreads?).

c) I read someone's idea about using posix_fadvise() and fincore() functions to somehow 
make it kind of work, usually, maybe. I'm not sure if there's a practical way 
to make them work though. And of course I don't think fincore() has even been accepted by 
Linus yet.

   


Perhaps mail this question to the kernel list, stand back and watch it 
ignite?


Ed


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2010-03-11 Thread Sebastian Färber
b) Use threads, either via some library or implement yourself. Each thread of 
course uses some extra memory. Also enabling threads causes glibc to start 
using a thread-safe version of malloc() (I think?), which slows things down 
(unless that can be avoided, maybe by using clone() directly instead of 
pthreads?).

Perhaps libeio (http://software.schmorp.de/pkg/libeio.html) is a good
starting point?
I don't have any experience with it but it's used by node.js
(http://nodejs.org/) for the async I/O stuff.

-Sebastian


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2010-03-10 Thread Timo Sirainen
On 10.8.2009, at 20.01, Timo Sirainen wrote:

 (3.5. Implement async I/O filesystem backend.)

You know what I found out today? Linux doesn't support async IO for regular 
buffered files. I had heard there were issues, but I thought it was mainly 
about some annoying APIs and such. Anyone know if some project has successfully 
figured out some usable way to do async disk IO? The possibilities seem to be:

a) Use Linux's native AIO, which requires direct-io for files. This *might* not 
be horribly bad for mail files. After all, same mail is rarely read multiple 
times. Except when parsing its headers first and then its body. Maybe the 
process could do some internal buffering?..

I guess no one ever tried my posix_fadvise() patch? The idea was that it would 
tell the kernel after closing a mail file that it's no longer needed in memory, 
so kernel could remove it from page cache. I never heard any positive or 
negative comments about how it affected performance.. 
http://dovecot.org/patches/1.1/fadvise.diff
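The effect the patch aims for can be sketched in a few lines. This is a hypothetical Python illustration of the idea only (the actual patch is C inside Dovecot), guarded because posix_fadvise() is not available on every platform:

```python
import os

def read_mail_and_drop_cache(path):
    """Read a mail file, then hint the kernel that its cached pages
    are no longer needed, so they can be evicted from the page cache."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:
                break
            chunks.append(chunk)
        if hasattr(os, "posix_fadvise"):  # not available everywhere
            # offset=0, length=0 means "the whole file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return b"".join(chunks)
```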

b) Use threads, either via some library or implement yourself. Each thread of 
course uses some extra memory. Also enabling threads causes glibc to start 
using a thread-safe version of malloc() (I think?), which slows things down 
(unless that can be avoided, maybe by using clone() directly instead of 
pthreads?).
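A minimal sketch of option (b), with Python's thread pool standing in for what would be C worker threads in Dovecot; `async_pread` and the pool size are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Each blocking read runs in a worker thread; the caller immediately
# gets back a future it can poll, which is effectively async disk I/O.
_io_pool = ThreadPoolExecutor(max_workers=4)

def async_pread(path, offset, length):
    """Schedule a positioned read without blocking the caller."""
    def blocking_pread():
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(length)
    return _io_pool.submit(blocking_pread)
```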

c) I read someone's idea about using posix_fadvise() and fincore() functions to 
somehow make it kind of work, usually, maybe. I'm not sure if there's a 
practical way to make them work though. And of course I don't think fincore() 
has even been accepted by Linus yet.



Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2010-03-10 Thread Stan Hoeppner
Timo Sirainen put forth on 3/10/2010 3:19 PM:
 On 10.8.2009, at 20.01, Timo Sirainen wrote:
 
 (3.5. Implement async I/O filesystem backend.)
 
 You know what I found out today? Linux doesn't support async IO for regular 
 buffered files. I had heard there were issues, but I thought it was mainly 
 about some annoying APIs and such. Anyone know if some project has 
 successfully figured out some usable way to do async disk IO? The 
 possibilities seem to be:
 
 a) Use Linux's native AIO, which requires direct-io for files. This *might* 
 not be horribly bad for mail files. After all, same mail is rarely read 
 multiple times. Except when parsing its headers first and then its body. 
 Maybe the process could do some internal buffering?..
 
 I guess no one ever tried my posix_fadvise() patch? The idea was that it 
 would tell the kernel after closing a mail file that it's no longer needed in 
 memory, so kernel could remove it from page cache. I never heard any positive 
 or negative comments about how it affected performance.. 
 http://dovecot.org/patches/1.1/fadvise.diff
 
 b) Use threads, either via some library or implement yourself. Each thread of 
 course uses some extra memory. Also enabling threads causes glibc to start 
 using a thread-safe version of malloc() (I think?), which slows things down 
 (unless that can be avoided, maybe by using clone() directly instead of 
 pthreads?).
 
 c) I read someone's idea about using posix_fadvise() and fincore() functions 
 to somehow make it kind of work, usually, maybe. I'm not sure if there's a 
 practical way to make them work though. And of course I don't think fincore() 
 has even been accepted by Linus yet.
 

Considering the extent to which Linus hates O_DIRECT, I would think that if
he were a fan of async I/O at all, he'd have pushed its use via the buffer
cache.  Given that async I/O is implemented via O_DIRECT, I'd say Linus isn't
a fan of async I/O either.  I've not read anything Linus has written on
async I/O, if he even has; I'm merely making an educated guess based on the
current implementation of async I/O in Linux.

-- 
Stan



Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread paulmon




On Mon, 2009-08-10 at 14:33 -0700, Seth Mattinen wrote:

 Nothing forces you to switch from maildir, if you're happy with it :)
 But if you want to support millions of users, it's simpler to distribute
 the storage and disk I/O evenly across hundreds of servers using a
 database that was designed for it. And by databases I mean here some of
 those key/value-like databases, not SQL. (What's a good collective name
 for those dbs anyway? BASE and NoSQL are a couple names I've seen.)


Timo, I've been thinking the exact same thing as you lately.  As mail moves
away from traditional POP3 users toward more online storage in the form of
webmail, the scalability of maildir for large multi-gigabyte mailboxes goes
out the window; loading cur in that type of scenario takes WAY too long.
Gmail on Maildir isn't possible.  I can't speak for anyone else, but my
users are moving into webmail, and POP users are becoming rare.

My current thinking is a key/value store, as you've proposed.  Something like
Hadoop components or Project Voldemort.  Voldemort might be a better fit
from what I've read.  The main issue here is that applications such as local
delivery as well as POP/IMAP access would need to be rewritten to support
this.  Obviously, creating a Hadoop- or Voldemort-aware local delivery agent
means being able to stay away from writing a complete MTA; likewise, if one
treats IMAP as the main way of accessing a mailbox (proxies for POP3, for
example), then a new local delivery agent and IMAPd with key/value smarts
would be all that is needed to create this system.

My current thinking is having the local delivery break messages up into
their component pieces (headers, from address, to address, spam scores, body,
etc.) as various key:value relationships.  Combine this with the replication
support of systems such as Hadoop or Voldemort and you end up with a
massively scalable system based on commodity hardware.  You get rid of RAID
completely, and remove NFS servers and replace them with a cluster of beige
boxes with ~4 drives each.  Redundancy is handled by the native replication
in the key:value application itself (Voldemort, for example, can replicate up
to 3 times) on each machine, so yes, you would store a single message more
than once, but if each of your beige box storage systems has 4*2TB drives,
your cost of storage is far less than the cost of buying from traditional
NFS server manufacturers.
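A minimal sketch of that delivery-time split, using the stdlib email parser; the `username/guid/...` key naming scheme is invented purely for illustration:

```python
from email.parser import Parser

def message_to_kv(username, guid, raw_message):
    """Break one message into flat key:value pairs: one entry per
    header plus one for the body, ready for a key/value store.
    (Repeated headers like Received would need list values in practice.)"""
    msg = Parser().parsestr(raw_message)
    prefix = f"{username}/{guid}"
    pairs = {f"{prefix}/body": msg.get_payload()}
    for name, value in msg.items():
        pairs[f"{prefix}/header/{name.lower()}"] = value
    return pairs
```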

Anyways, this is just something that's currently floating in my head...

Paul





-- 
View this message in context: 
http://www.nabble.com/Scalability-plans%3A-Abstract-out-filesystem-and-make-it-someone-else%27s-problem-tp24903458p25645652.html
Sent from the Dovecot mailing list archive at Nabble.com.



Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Ed W

paulmon wrote:

My current thinking is having the local delivery break messages up into
their component pieces, headers, from address, to address, spam scores, body
etc into various key:value relationships.



Whilst this looks appealing on the surface, I think the details are going 
to need some benchmarking to see if they stack up.  Certainly I hope this 
new abstraction works out, because I suspect we'll see a bunch of 
interesting ideas get implemented, such as you describe!!


Just to knock your theoretical idea around a bit though: my guess would 
be that you need to look at the access patterns for this data to make 
sure you don't over-normalise it.  Eg if it's normal to simply open up 
a mailbox and then ask it for every one of the following X fields for 
each message, then over-normalising the header fields will lead to 
response time being dominated by access times for each field (especially 
if that creates a disk seek, etc).


At present I think Dovecot's architecture kind of assumes that random 
access dominates for individual email messages, and then it optimises for 
a particular case of header accesses by caching those into a local 
database-type structure which caches just a certain set of 
recently requested header fields.  The access times then seem to be 
bounded by the time to scan the inbox for new unseen messages and update 
this index with maildir (not sure what bounds mailbox scanning times in 
general use?).  Ie it's optimising for returning every field X from 
every message in a folder, or else returning bits of a given message?



I should imagine that in general this architecture is near optimal for 
the general case and the main improvement is just in speeding up the 
updates after new emails are added/deleted... (done automatically at 
present if you use deliver, incurs a speed hit if you update yourself)


I should imagine that once you add a requirement to distribute the data 
and handle failover, etc then the problems of any cache coherency 
dominate the design and this could be interesting to play with ideas to 
solve this.


Anyway, I think the point is that for anyone who hasn't tried it yet, to 
first have a look at how your favourite IMAP client implements imap and 
watch the stream of commands being issued... It's usually quite a bit 
different to what you expect and to me it's a lot different to what 
might be optimal if I got to design their algorithm...


The point being that you shouldn't optimise too much for what you hope 
people will do, so much as have a look at your favourite webmail client 
or desktop client and optimise for whatever stream of idiocy they 
request you to keep pumping at them...


I for one look forward to these changes - I desperately hope I get some 
time to then play with some ideas because like you I'm itching to play 
with my next greatest idea!!


My only request to Timo was to kind of consider that a bunch of these 
ideas from the audience will almost certainly involve splitting up the 
mime message into component parts and that the abstracted interface 
should try not to throw away any potential speed benefit that this might 
achieve because the interface can't express what it needs clearly enough?


Good luck

Ed W



Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Timo Sirainen
On Mon, 2009-09-28 at 09:00 -0700, paulmon wrote:
 My current thinking is a key/value store as you've proposed.  Something like
 Hadoop components or Project Voldemort.  Voldemort might be a better fit
 from what I've read. 

My understanding of Hadoop is that it's more about distributed computing
instead of storage.

 My current thinking is having the local delivery break messages up into
 their component pieces, headers, from address, to address, spam scores, body
 etc into various key:value relationships.  

I was planning on basically just storing key=username/message-guid,
value=message pairs instead of splitting it up. Or perhaps split header
and body, but I think piecing it smaller than those just makes the
performance worse. To get different headers quickly there would still be
dovecot.index.cache (which would be in some quick in-memory storage
but also stored in the database).
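That plan can be sketched roughly like this, with plain dicts standing in for the distributed key/value database and for dovecot.index.cache; the class and method names are invented:

```python
import uuid

class MessageStore:
    """Whole message stored under key=username/message-guid, with a
    separate cache of frequently needed header fields."""
    def __init__(self):
        self.kv = {}            # stand-in for the key/value database
        self.header_cache = {}  # stand-in for dovecot.index.cache

    def save(self, username, raw_message, cached_headers):
        key = f"{username}/{uuid.uuid4().hex}"
        self.kv[key] = raw_message
        self.header_cache[key] = cached_headers
        return key

    def fetch_header(self, key, name):
        # Served from the fast cache, without touching the full message.
        return self.header_cache[key].get(name)
```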



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Timo Sirainen
On Mon, 2009-09-28 at 17:57 +0100, Ed W wrote:
 My only request to Timo was to kind of consider that a bunch of these 
 ideas from the audience will almost certainly involve splitting up the 
 mime message into component parts and that the abstracted interface 
 should try not to throw away any potential speed benefit that this might 
 achieve because the interface can't express what it needs clearly enough?

It might become too complex to initially consider how to support split
MIME messages and such. I'm not really sure if it even belongs to this
filesystem abstraction layer. I was hoping that the FS API would be
really really simple and could also be used for other things than just
email.

But I'm also hoping to support things like single-instance storage at
some point. I'm not really sure if that should just be written into dbox
code directly or try to abstract it out..





Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Ed W

Timo Sirainen wrote:

On Mon, 2009-09-28 at 09:00 -0700, paulmon wrote:
  

My current thinking is a key/value store as you've proposed.  Something like
Hadoop components or Project Voldemort.  Voldemort might be a better fit
from what I've read. 



My understanding of Hadoop is that it's more about distributed computing
instead of storage.
  


I believe it's possible to use it to ask lots of machines to parse a bit 
of database and then get the answer back from all of them.  eg some 
people are alleged to be using it to parse huge log files in sensible 
time by splitting up their log files across lots of machines and asking 
each of them to do a bit of filtering...


I'm out of my depth at this point - only read the executive summary...



My current thinking is having the local delivery break messages up into
their component pieces, headers, from address, to address, spam scores, body
etc into various key:value relationships.  



I was planning on basically just storing key=username/message-guid,
value=message pairs instead of splitting it up. Or perhaps split header
and body, but I think piecing it smaller than those just makes the
performance worse. To get different headers quickly there would still be
dovecot.index.cache (which would be in some quick in-memory storage
but also stored in the database).
  


This can presumably be rephrased as:

- access times are say 10ms
- linear read times are say 60MB/sec
- Therefore don't break a message into chunks smaller than 0.010s * 60MB/s = 
600KB (ish), or your seek times dominate over simply doing linear reads 
and throwing away what you don't need...
- Obviously insert whatever timings you like and re-run the numbers, eg 
if you have some fancy-pants flash drive then insert shorter seek times


However, these numbers and some very limited knowledge of how a small 
bunch of email clients seem to behave would suggest that the following 
is also worth optimising to varying degrees (please don't overlook 
someone wanting to implement some backend to try these ideas):


Theory: Attachments larger than K are worth breaking out according to 
the formula above.
Justification: Above an actually fairly small attachment size, it's 
cheaper to do a seek than to linearly scan to the next mail message. For 
some storage designs this might be helpful (mbox-type packing).  
Additionally some users have suggested that they want to try to single-
instance popular attachments, so K might be customisable; or better 
yet, some design might choose to keep a cache of attachment fingerprints 
and de-dup them when a dup is next seen..
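A sketch of that fingerprint cache idea; the threshold K, the class name, and the choice of SHA-256 as the fingerprint are all assumptions made for illustration:

```python
import hashlib

class AttachmentDedup:
    """Single-instance attachments above a size threshold K by keying
    them on a content fingerprint; small ones stay with the message."""
    def __init__(self, threshold=600_000):
        self.threshold = threshold
        self.blobs = {}  # fingerprint -> attachment bytes

    def store(self, data):
        if len(data) < self.threshold:
            return None  # small: keep inline with the message
        fp = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(fp, data)  # de-dup: stored only once
        return fp
```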


Theory: breakout (all) headers from bodies
Justification: scanning headers seems a popular task and dovecot keeps a 
local database to optimise the common case.  Reseeks would be slow 
though and some storage designs might be able to optimise and get fast 
linear seeks across all headers (eg pack them as per mbox and compress 
them?)


Theory: breakout individual headers
Justification: err... not got a good case for this one, but some of 
these fancy key value databases are optimised for creating views on 
certain headers across certain messages.  I imagine this won't fly in 
practice, but it seems a shame not to try it... Definitely anyone 
implementing an SQL database option will want to try it though... (bet 
it's slow though...)


Theory: pack message bodies together as per mbox
Justification: mbox seems faster, compresses better and all round seems 
better than maildir for access speed, except in certain circumstances 
such as deletes.  Dovecot already seems to optimise some corner cases by 
just marking messages dead without deleting them, so clearly there is 
tremendous scope for improvement here (dbox going down this route?).  
Some bright spark might design some backend which uses multiple mbox 
files to overcome the huge hit when defragging and it may well be that 
by incorporating eg splitting out larger attachments, and lightly 
compressing, then some workloads might see some really good performance! 
(Could be really interesting for archive mailboxes, etc?)



Just my 2p...

Ed


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Ed W

Timo Sirainen wrote:

On Mon, 2009-09-28 at 17:57 +0100, Ed W wrote:
  
My only request to Timo was to kind of consider that a bunch of these 
ideas from the audience will almost certainly involve splitting up the 
mime message into component parts and that the abstracted interface 
should try not to throw away any potential speed benefit that this might 
achieve because the interface can't express what it needs clearly enough?



It might become too complex to initially consider how to support split
MIME messages and such. I'm not really sure if it even belongs to this
filesystem abstraction layer. I was hoping that the FS API would be
really really simple and could also be used for other things than just
email.
  


Well, I think if you just implement a wrapper around read(fh, start, 
count) then it's going to be quite hard to implement some kind of 
storage which splits out the message in some way?


I guess the API would need to line up with the IMAP commands to retrieve 
MIME parts.  For the most part these are poorly supported by clients, so 
I guess most mail clients will undo all this cleverness, but I would 
imagine it will have a low impact on performance since it's just extra 
seeks on fetching individual messages?


I am starting to see newer clients finally get this right though.  I'm 
using Profimail on my N97 and whilst I didn't look at its IMAP stream 
it *seems* to be doing everything right from the client point of view.  
I even get to choose to download the whole message if the size is < Y and 
ignore attachments larger than Z, etc. (In theory Thunderbird does this, 
but at least on my machine it just repeatedly downloads the same message 
again and again in various ways - it grinds to a halt every time I click 
on an email with a decent-sized attachment, even if I have already read 
it... grr)




But I'm also hoping to support things like single-instance storage at
some point. I'm not really sure if that should just be written into dbox
code directly or try to abstract it out..
  


I agree it should at least initially go into the dbox, etc code.  I 
guess if enough people do the same implementation (in all the new 
backends which I'm sure will arrive within days of some API coming 
out) it could bubble up, etc?


I would have thought that your API will prefer to request message parts 
where it can (eg header, body, mime part), and just issue a read_bytes 
where that's all the client is asking for.  This would allow 
the storage engine to optimise where it can, and sadly for the dumb 
client we just stream bytes since that's all they asked for...


Perhaps the API should also request specific headers from the storage 
engine where possible and ask for all headers only where it's 
necessary?  This would allow an sql database to be heavily normalised 
(I'm sure performance is iffy, but we have to pre-suppose some reason 
why this design is useful for other reasons)


Does this seem feasible?

Ed W



Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Timo Sirainen
On Mon, 2009-09-28 at 18:35 +0100, Ed W wrote:
 I would have thought that your API will prefer to request message parts 
 where it can (eg header, body, mime part), and just issue a read_bytes, 
 where that's what the client is asking for otherwise.  This would allow 
 the storage engine to optimise where it can and sadly for the dumb 
 client we just stream bytes since that's all they asked for...

In my mind this is more about what lib-storage API was supposed to
abstract out, whereas my filesystem API would be used simply for binary
data storage. The same FS API could be used to store both dbox files and
index files.

 Perhaps the API should also request specific headers from the storage 
 engine where possible and ask for all headers only where it's 
 necessary?  This would allow an sql database to be heavily normalised 
 (I'm sure performance is iffy, but we have to pre-suppose some reason 
 why this design is useful for other reasons)

This is really going towards what lib-storage API is supposed to do
already.. It's not even horribly difficult to write a new backend for
it. For example in v2.0 the fully functional Cydir backend code looks
like:

% wc *[ch]
  152   357  3740 cydir-mail.c
  319   783  8420 cydir-save.c
  402  1087 10806 cydir-storage.c
   35    82  1085 cydir-storage.h
  187   465  4798 cydir-sync.c
   24    54   615 cydir-sync.h
 1119  2828 29464 total

There is still a bit of code duplication between backends that could
reduce the line count by maybe 100-200 lines. Anyway I think the only
good way to implement support for normalized SQL database in Dovecot
would be to implement a new lib-storage backend, and it shouldn't be a
hugely difficult job.




Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Charles Marcus
On 9/28/2009, Ed W (li...@wildgooses.com) wrote:
 In theory Thunderbird does this, but at least on my machine it just
 repeatedly downloads the same message again and again in various ways
 - it grinds to a halt every time I click on an email with a decent
 sized attachment, even if I have already read it... grr

TB3 has finally fixed this absurd behavior (yay!)...

In fact there are lots of IMAP improvements in v3... I can't wait until
all my extensions catch up, and I figure out how to customize the UI the
way I want (e.g., how in the world do I get rid of the stupid Tabs??)

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Timo Sirainen
On Mon, 2009-09-28 at 19:21 +0100, Ed W wrote:
  In my mind this is more about what lib-storage API was supposed to
  abstract out, whereas my filesystem API would be used simply for binary
  data storage. The same FS API could be used to store both dbox files and
  index files.

 
 I guess in this case it would be interesting to hear the kind of use 
 cases you imagine that the storage API will be used for in practice?  I 
 think I might be kind of overthinking the problem?

lib-storage API has existed since Dovecot v1.0 and it's used to abstract
out access to maildir, mbox, dbox, cydir, etc. SQL would fit right there
with those.

Or did you mean FS API? For that my plans are to implement backends for:

 - POSIX (just the way it works now)
 - Async I/O (once Dovecot can do more things in parallel)
 - Some kind of proxying to support shared mailboxes between different
servers (or within same server when users are using different UIDs and
don't have a common group)
 - Massively distributed database storage for mails
 - In-memory cache for index files, which permanently writes them using
another storage. This is useful for any kind of multi-master setup like
distributed database, NFS, clusterfs.

 Seems like it's a very thin shim between a real file system and dovecot 
 and would be mainly useful for supporting filesystems with non-POSIX 
 protocols, eg someone wants to store their mail files on mogile or DAV, 
 but it doesn't address anything lower or higher than blocks of data?

Right, path/filename (or key) - binary byte stream.
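In other words, the contract is a key-to-bytes interface. A minimal sketch (the class and method names are invented; a proxying, in-memory-cache, or distributed-DB backend would implement the same two calls against different storage):

```python
import os

class PosixFSBackend:
    """One invented implementation of the 'key -> binary byte stream'
    FS API, backed by a local directory."""
    def __init__(self, root):
        self.root = root

    def write(self, key, data):
        with open(os.path.join(self.root, key), "wb") as f:
            f.write(data)

    def read(self, key):
        with open(os.path.join(self.root, key), "rb") as f:
            return f.read()
```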

 Seems like it would be useful for:
 
 - implementing very specific optimisations for example for NFS
 - optimisation for filesystems with unusual strengths/weaknesses, eg GFS 
 or Gluster?

In both of these I think the primary problem is that Dovecot tries to do
IPC via filesystem (index files). So accessing the indexes via the
in-memory cache that is guaranteed to be always up-to-date would get rid
of all these ugly NFS cache flushing attempts etc.




Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Ed W

Timo Sirainen wrote:

On Mon, 2009-09-28 at 19:21 +0100, Ed W wrote:
  

In my mind this is more about what lib-storage API was supposed to
abstract out, whereas my filesystem API would be used simply for binary
data storage. The same FS API could be used to store both dbox files and
index files.
  
  
I guess in this case it would be interesting to hear the kind of use 
cases you imagine that the storage API will be used for in practice?  I 
think I might be kind of overthinking the problem?



lib-storage API has existed since Dovecot v1.0 and it's used to abstract
out access to maildir, mbox, dbox, cydir, etc. SQL would fit right there
with those.
  


OK, I thought that was what you were going to be simplifying...

I did have a poke around in there some time back and it did feel quite 
complicated to follow what was going on... I found your sql backend 
code a simpler way to poke around, but even there it pretty 
quickly needed some earnest digging to figure out how it was all 
working...


OK, I guess this can never be an easy middle ground - presumably things 
are as they are for a reason...


Cheers

Ed W


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Timo Sirainen
On Mon, 2009-09-28 at 20:11 +0100, Ed W wrote:
  lib-storage API has existed since Dovecot v1.0 and it's used to abstract
  out access to maildir, mbox, dbox, cydir, etc. SQL would fit right there
  with those.

 
 OK, I thought that was what you were going to be simplifying...

Nope. It can still be simplified a bit, but only a bit. :) But in every
release I am always simplifying it, moving more and more code to common
functions and making the API more powerful and cleaner at the same
time. :)

 I did have a poke around in there some time back and it did feel quite 
 complicated to follow what was going on... I found your sql backend 
 code as a simpler way to poke around, but even there it was pretty 
 quickly going to need some earnest digging to figure out how it was all 
 working...

The SQL code was for v1.0 and the lib-storage API has been simplified
since then, maybe not hugely, but still quite a bit. Maybe some day I'll
see about updating the SQL code for the v2.0 API.

Oh and some documentation about it would probably help a lot too. I
guess I should write some, someday. :)




Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Charles Marcus
On 9/28/2009 4:24 PM, Jeff Grossman wrote:
 In fact there are lots of IMAP improvements in v3... I can't wait until
 all my extensions catch up, and I figure out how to customize the UI the
 way I want (e.g., how in the world do I get rid of the stupid Tabs??)

 You can't get rid of tabs per se, but you can make it so you don't use
 them.  I hate tabs personally also.  Go to Options, Advanced, Reading
 and Display, and select Open Messages In: An Existing Window or A New
 Window.  I use an existing window.

Yeah, already did that, but it *does* still use the Tab bar, everything
is just limited to one tab - the Tab row is still there wasting my
screen real estate.

I'll figure out how to kill it... I know I'm not the only one who
hates/won't use it...

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-09-28 Thread Ed W

Timo Sirainen wrote:


The SQL code was for v1.0 and the lib-storage API has been simplified
since then, maybe not hugely but still pretty much. Maybe some day I'll
see about updating the SQL code for v2.0 API.

Oh and some documentation about it would probably help a lot too. I
guess I should write some, someday. :)
  



Some overview docs might be somewhat helpful for sure, but I think at 
this level you probably mainly need to get your hands dirty


Having an example storage engine which is also a bit simpler (eg an 
updated sql engine) would actually be quite good for this, I suspect.  I 
quickly dropped looking at the real code in favour of playing with the 
sql code, and found it quite a bit simpler to get an overview


Thanks and interested to see this progress

Ed W


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-15 Thread Timo Sirainen
On Wed, 2009-08-12 at 18:18 -0400, Timo Sirainen wrote:
 On Wed, 2009-08-12 at 14:54 -0700, Daniel L. Miller wrote:
  If every attachment in a given message is individually scanned to 
  generate some unique identifier, and that identifier then used to 
  determine whether or not it exists in the database - this could have 
  HUGE effects.  This now addresses not just the simple broadcast - but 
  some really crazy possibilities.
 
 Oh BTW. I think dbmail 2.3 does that. Then again I haven't yet seen a
 stable dbmail version. But looks like they've released 2.3.6 recently
 that I haven't tested yet.

Tested again. Still crashes in the middle of imaptest runs. And imaptest
now reports more bugs than last time I tried..

Archiveopteryx probably does SIS and works better.





Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-14 Thread Steffen Kaiser


On Wed, 12 Aug 2009, Timo Sirainen wrote:


So yeah, either one is as reliable as the script :)


Well, I think this is not what the OP intended :)

This will work for only a small number of mails, I think, because:

a) forwarded messages differ,
b) re-sent messages differ in headers,
c) many mailing lists send one mail per subscriber to catch user-specific 
bounces (headers differ),
d) some mail relays or MTAs split the recipients list if it is too large 
(headers differ).


Although I would like to have it in Dovecot, it certainly makes some 
administration tasks on the server more difficult, so I'm not sure if I 
would actually use it...


Bye,

-- 
Steffen Kaiser


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-14 Thread Charles Marcus
 But for example, what I'd really like to be able to do is say something
 like:
 
 SiS_mode = binary,64K
 
So only binary attachments over 64KB in size would be checksummed and
single instance stored. I don't care about the headers, or body text, or
tiny (< 64KB) sig attachments, or text attachments (PGP sigs, etc).

Also, I don't care about putting them in an SQL db...

It would be good enough for me to also be able to do:

SiS_dir = /var/virtual/mail/attachments

Have all attachments dumped in there and hardlinked to each message, and
just use a simple index file in the directory with the attachment name
and MD5 checksum (if MD5 is good enough - I'd like to avoid collisions too).

This way the attachments could even be stored in some other filesystem,
to keep the big stuff off the main server.
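
The layout proposed here (a shared attachment directory, hard links into each message, and a checksum index) can be sketched in a few lines. This is purely illustrative Python, not Dovecot code; the function name and directory layout are assumptions:

```python
import hashlib
import os

def store_attachment(sis_dir, msg_dir, name, data):
    """Content-addressed single-instance store: write each attachment once
    under its checksum, then hard-link it next to the message that uses it."""
    digest = hashlib.md5(data).hexdigest()  # MD5 per the proposal; SHA-256 would reduce collision worries
    blob = os.path.join(sis_dir, digest)
    if not os.path.exists(blob):            # only the first copy consumes disk space
        with open(blob, "wb") as f:
            f.write(data)
    link = os.path.join(msg_dir, name)
    if not os.path.exists(link):
        os.link(blob, link)                 # hard link: same inode, no extra data blocks
    return digest
```

One caveat: hard links require the attachment store and the mailboxes to live on the same filesystem, so the "some other filesystem" variant would need symlinks or an indirection in the index instead.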

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-13 Thread Charles Marcus
On 8/12/2009 3:58 PM, Timo Sirainen wrote:
 - and I've never heard of any other mail server supporting such a
 thing.

 Exchange does... and is the single and only reason I have *considered*
 switching to it (shudder) in the past few years...

 I heard that the next Exchange version drops that feature, because it
 typically causes more disk I/O when reading. I don't know if it's still
 possible to enable it optionally though.

Wow... I can hear a lot of sysadmins screaming at the top of their lungs
if/when they discover this the hard way.

I'm also having trouble figuring out how using hard links (or their
equivalent) for messages with large attachments and having only one
instance of the attachment could cause *more* disk I/O than having
dozens/hundred of multiple copies of the message.

Guess its an Exchange 'feature'... ;)

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-13 Thread Charles Marcus
On 8/12/2009 4:02 PM, Timo Sirainen wrote:
 Do you need per-MIME part single instance storage, or would per-email be
 enough? Since the per-email variant can already be done with hard links.

 The only thing I can find about this on the wiki is where it says single
 instance attachment storage (for dbox) is planned. Is how to accomplish
 single instance email storage documented anywhere? And is this reliable
 enough to use on a production system?

 Two possible ways:

Heh... ok, so when you said 'it is possible', you didn't mean dovecot
has native support for it...

Sadly, since ianap, I will have to wait for something that is officially
supported... but thanks for the explanation.

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-13 Thread Charles Marcus
On 8/12/2009, Ed W (li...@wildgooses.com) wrote:
 It also had a bunch of limitations, it was basically only single
 instance for CC recipients on a message (more or less).  Quite a lot
 of things such as certain types of virus scanning would (I think)
 easily disable the single instance storage also?
 
 So I doubt it would help in most of the cases mentioned here, ie each
 time it was re-forwarded internally it would not be single instanced
 
 I still think it would be instructive to do some benchmarks though -
 often these things look good on paper, but are surprisingly less
 effective (given the implementation cost)  when measured.  I'm not
 disagreeing, just would be interested to see some numbers...

Amazing... I mean, since Exchange is already a 'database', how hard
would it be to do it right (checksum each MIME part, and use hardlinks
for subsequent duplicate checksummed MIME parts)? As long as everything
was properly and effectively indexed, it should be easily doable.

Make it do the work at delivery when the load is light enough, and have
a background task that, when load permits, de-dupes any messages not
already flagged as processed at delivery time.

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-13 Thread Charles Marcus
On 8/12/2009, Daniel L. Miller (dmil...@amfes.com) wrote:
 Under the structure I've proposed, net storage consumed by the
 attachments should be one copy of attachment 1, and one copy of
 attachment two, plus headers and any comments in the messages times
 the number of recipients.  Domino would store one copy of attachment
 1, then a copy of attachment 1 + attachment 2, then another copy of
 attachment 1.

Personally, I only care about binary attachments over a certain size.

I have said before, I don't see the value in doing this for every
message and for every mime-part. That said, if it doesn't really cost
anything extra to do the entire message and all mime-parts, then fine, I
don't really have anything against it, as long as it is robust and reliable.

But for example, what I'd really like to be able to do is say something
like:

SiS_mode = binary,64K

So only binary attachments over 64KB in size would be checksummed and
single instance stored. I don't care about the headers, or body text, or
tiny (< 64KB) sig attachments, or text attachments (PGP sigs, etc).

Again - for shops that must deal with large binary attachments, this
would be a god-send.

Our max allowed message size is 50MB, and we typically get anywhere from
2-10 messages a day containing 20, 30, or even 40MB attachments sent to
our distribution lists - so these would go to 50+ people, who then
forward them to others, etc, etc ad nauseum.

Currently, I have mailman set to hold these, then I go in and strip off
the attachment, put it in a shared location, then let the message (minus
the attachment) through. But we still have a *lot* of messages like this
that don't go through our lists, but are sent to 2, 3, or 10 of our reps
individually.

I did a manual approximation on one person's mail store once, and
determined that our total storage requirements, if SiS was implemented
for large attachments, would be reduced by about 90-95%. So, from about
2TB currently, to about 100-200GB. That is HUGE, from both a storage
*and* backup standpoint.

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-13 Thread Timo Sirainen

On Aug 13, 2009, at 8:13 AM, Charles Marcus wrote:


I'm also having trouble figuring out how using hard links (or their
equivalent) for messages with large attachments and having only one
instance of the attachment could cause *more* disk I/O than having
dozens/hundred of multiple copies of the message.


The thinking is that nowadays seeks are what's killing disk I/O, so  
whenever possible just do a single large read. With single instance  
storage there would be one additional seek (if the message wasn't  
already in memory).




Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Steffen Kaiser


On Tue, 11 Aug 2009, Eric Jon Rostetter wrote:


For a massively scaled system, there may be sufficient performance to
put the queues elsewhere.


Which also allows that the queue can easily have multiple machines pushing 
& popping items.



 But on a small system, with 90% of the mail
being spam/virus/malware, performance will usually dictate local/memory
file systems for such queues...


Well, this discussion reads a bit like local filesystems are prone to 
lose data on a crash.

Journaling filesystems, RAID1 / 5 / 10, SANs do their job.

However, I guess that Seth and Timo look at the thing from a different 
point of view, Timo seems to focus on one queue - multiple accessees, 
whereas Seth focuses on temporary working directory.


Bye,

-- 
Steffen Kaiser


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen

On Aug 12, 2009, at 11:26 AM, Ed W wrote:


Hi


* Mail data on the other hand is just written once and usually read
maybe once or a couple of times. Caching mail data in memory probably
doesn't help all that much. Latency isn't such a horrible issue as long
as multiple mails can be fetched at once / in parallel, so there's only
a single latency wait.



This logically seems correct.  Couple of questions then:

1) Since latency requirements are low, why did performance drop so  
much previously when you implemented a very simple mysql storage  
backend?  I glanced at the code a few weeks ago and whilst it's  
surprisingly complicated right now to implement a backend, I was  
also surprised that a database storage engine "sucked", I think you  
phrased it? Possibly the code also placed the indexes on the DB?  
Certainly this could very well kill performance?  (Note I'm not  
arguing sql storage is a good thing, I just want to understand the  
latency to backend requirements)


Yes, it placed indexes also to SQL. That's slow. But even without it,  
Dovecot code needs to be changed to access more mails in parallel  
before the performance can be good for high-latency mail storages.


2) I would be thinking that with some care, even very high latency  
storage would be workable, eg S3/Gluster/MogileFs ?  I would love to  
see a backend using S3 - If nothing else I think it would quickly  
highlight all the bottlenecks in any design...


Yes, S3 should be possible. With dbox it could even be used to store  
the old mails and keep new mails in lower latency storage.



5. Implement filesystem backend for dbox and permanent index storage
using some scalable distributed database, such as maybe Cassandra.


CouchDB?  It is just the Lotus Notes database after all, and  
personally I have built some *amazing* applications using that as  
the backend. (I just love the concept of Notes - the gui is another  
matter...)


Note that CouchDB is interesting in that it is multi-master with  
eventual synchronisation.  This potentially has some interesting  
issues/benefits for offline use


CouchDB seems like it would still be more difficult than necessary to  
scale. I'd really just want something that distributes the load and  
disk usage evenly across all servers and allows easily plugging in  
more servers and it automatically rebalances the load. CouchDB seems  
like much of that would have to be done manually (or building scripts  
to do it).


For the filesystem backend have you looked at the various log  
structured filesystems appearing?  Whenever I watch the debate  
between Maildir vs Mailbox I always think that a hybrid is the best  
solution because you are optimising for a write once, read many  
situation, where you have an increased probability of having good  
cache localisation on any given read.


To me this ends up looking like log structured storage... (which  
feels like a hybrid between maildir/mailbox)


Hmm. I don't really see how it looks like log structured storage.. But  
you do know that multi-dbox is kind of a maildir/mbox hybrid, right?



* Scalability, of course. It'll be as scalable as the distributed
database being used to store mails.



I would be very interested to see a kind of "where the time goes"  
benchmark of dovecot.  Have you measured and found that latency of  
this part accounts for x% of the response time and CPU bound here is  
another y%, etc?  eg if you deliberately introduce X ms of latency  
in the index lookups, what does that do to the response time of the  
system once the cache warms up?  What about if the response time to  
the storage backend changes?  I would have thought this would help  
you determine how to scale this thing?


I haven't really done any explicit benchmarks, but there are a few  
reasons why I think low-latency for indexes is really important:


 * All commands that access mails in any way need to do an index  
lookup first to find the mail.


 * Anything using IMAP UIDs needs to do a binary search on the index  
to find the mail.


 * Anything accessing mail metadata needs to do dovecot.index.cache  
lookups, often many of them. For example FETCH ENVELOPE does something  
like 10 lookups to cache for each mail.


 * After each command Dovecot needs to check if there are new mails  
by checking if dovecot.index.log has changed.
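
The UID lookup in the second point is an ordinary binary search over the index's sorted UID column; a minimal sketch (assuming a plain in-memory list, not Dovecot's actual index format):

```python
from bisect import bisect_left

def seq_for_uid(uids, uid):
    """Map an IMAP UID to a 1-based message sequence number.
    `uids` is the strictly ascending UID column of a mailbox index."""
    i = bisect_left(uids, uid)           # O(log n) binary search
    if i < len(uids) and uids[i] == uid:
        return i + 1                     # IMAP sequence numbers start at 1
    return None                          # UID expunged or never existed
```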


I think it's pretty obvious that if any of those lookups had latency  
the performance would soon become pretty horrible. And the reasons why  
I think the actual mail storage can live with high latency:


 * Whenever processing a command, Dovecot knows beforehand what kind  
of data it needs. It can quickly go through index/cache file to find  
out what message contents it needs to have, and then send requests to  
all of those immediately. (Or if there are hundreds, maybe always have  
something like 20 queued, or whatever is good.) After the first one  
has arrived, the rest should already be available immediately 
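
The "maybe always have something like 20 queued" idea amounts to a bounded prefetch window: issue requests ahead of consumption so only the first mail pays a full round-trip. A sketch using threads (hypothetical helper; an async I/O backend would achieve the same without threads):

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def fetch_all(fetch, mail_ids, window=20):
    """Fetch mails from a high-latency store, keeping up to `window`
    requests in flight while preserving the original order."""
    results = []
    with ThreadPoolExecutor(max_workers=window) as pool:
        pending = deque()
        for mid in mail_ids:
            pending.append(pool.submit(fetch, mid))
            if len(pending) >= window:          # window full: wait for the oldest
                results.append(pending.popleft().result())
        while pending:                          # drain the remaining requests
            results.append(pending.popleft().result())
    return results
```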

Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Eric Rostetter
On Aug 12, 2009, at 2:21 AM, Steffen Kaiser skdove...@smail.inf.fh-brs.de 
 wrote:




On Tue, 11 Aug 2009, Eric Jon Rostetter wrote:


For a massively scaled system, there may be sufficient performance to
put the queues elsewhere.


Which also allows that the queue can easily have multiple machines  
pushing & popping items.


Pushing is easy. Popping can be more problematic, depending on various  
factors.



But on a small system, with 90% of the mail
being spam/virus/malware, performance will usually dictate local/memory
file systems for such queues...


Well, this discussion reads a bit like local filesystems are prone  
to lose data on a crash.

Journaling filesystems, RAID1 / 5 / 10, SANs do their job.


The issue I brought up is OS caching and is not dependent on the  
backend really. The only real solution is redundant storage AND disabling  
OS caching, which is not cheap and won't give the best performance.   
Always a tradeoff.



However, I guess that Seth and Timo look at the thing from a  
different point of view, Timo seems to focus on one queue -  
multiple accessees, whereas Seth focuses on temporary working  
directory.


Well Timo looks at it from dovecot's point of view.

I look at it from a mail server's point of view (MTA also, etc).


Bye,

-- Steffen Kaiser


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Ed W



1) Since latency requirements are low, why did performance drop so 
much previously when you implemented a very simple mysql storage 
backend?  I glanced at the code a few weeks ago and whilst it's 
surprisingly complicated right now to implement a backend, I was also 
surprised that a database storage engine sucked I think you phrased 
it? Possibly the code also placed the indexes on the DB? Certainly 
this could very well kill performance?  (Note I'm not arguing sql 
storage is a good thing, I just want to understand the latency to 
backend requirements)


Yes, it placed indexes also to SQL. That's slow. But even without it, 
Dovecot code needs to be changed to access more mails in parallel 
before the performance can be good for high-latency mail storages.


My expectation then is that with local index and sql message storage 
that the performance should be very reasonable for a large class of 
users... (ok, other problems perhaps arise)



2) I would be thinking that with some care, even very high latency 
storage would be workable, eg S3/Gluster/MogileFs ?  I would love to 
see a backend using S3 - If nothing else I think it would quickly 
highlight all the bottlenecks in any design...


Yes, S3 should be possible. With dbox it could even be used to store 
the old mails and keep new mails in lower latency storage.


Mogile doesn't handle S3, but I always thought it would be terrific to 
be able to have one copy of your data on fast local storage, but to be 
able to use slower (sometimes cheaper) storage for backups or less 
valuable data (eg older messages), ie replicating some data to other 
storage boxes



CouchDB seems like it would still be more difficult than necessary to 
scale. I'd really just want something that distributes the load and 
disk usage evenly across all servers and allows easily plugging in 
more servers and it automatically rebalances the load. CouchDB seems 
like much of that would have to be done manually (or building scripts 
to do it).


Ahh fair enough - I thought it being massively multi-master would allow 
simply querying different machines for different users.  Not a perfect 
scale-out, but good enough for a whole class of requirements...


For the filesystem backend have you looked at the various log 
structured filesystems appearing?  Whenever I watch the debate 
between Maildir vs Mailbox I always think that a hybrid is the best 
solution because you are optimising for a write once, read many 
situation, where you have an increased probability of having good 
cache localisation on any given read.


To me this ends up looking like log structured storage... (which 
feels like a hybrid between maildir/mailbox)


Hmm. I don't really see how it looks like log structured storage.. But 
you do know that multi-dbox is kind of a maildir/mbox hybrid, right?


Well the access is largely append only, with some deletes and noise at 
the writing end, but largely the older storage stays static with much 
longer gaps between deletes (and extremely infrequent edits)


So maildir is optimised really for deletes, but improves random access 
for a subset of operations.  Mailbox is optimised for writes and seems 
like it's generally fast for most operations except deletes (people do 
worry about having a lot of eggs in one basket, but I think this is 
really a symptom of other problems at work).  Mailbox also has improved 
packing for small messages and probably has improved cache locality on 
certain read patterns


So one obvious hybrid would be a mailbox type structure which perhaps 
splits messages up into variable sized sub mailboxes based on various 
criteria, perhaps including message age, type of message or message 
size...?  The rapid write/delete activity would happen at the head, perhaps even 
as a maildir layout and gradually the storage would become larger and 
ever more compressed mailboxes as the age/frequency of access/etc declines.


Perhaps this is exactly dbox?

It would also be interesting to consider separating message headers from 
body content.  Have heavy localisation of message headers, and slower 
higher latency access to the message body.  Would this improve access 
speeds in general?  Also the mime structure could be torn apart to store 
attachments individually - the motivation being single instance storage 
of large attachments with identical content...  Anyway, these seem like 
very speculative directions...




I haven't really done any explicit benchmarks, but there are a few 
reasons why I think low-latency for indexes is really important:


I think low latency for indexes is a given.  You appear to have 
architected the system so that all responses are delivered from the 
index and baring an increase in index efficiency the remaining time is 
spent doing the initial generation and maintenance of those indexes.  I 
would have thought bar downloading an entire INBOX that the access time 
of individual mails was very much secondary?



- If the goal is performance 

Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen
On Wed, 2009-08-12 at 17:46 +0100, Ed W wrote:
 My expectation then is that with local index and sql message storage 
 that the performance should be very reasonable for a large class of 
 users... (ok, other problems perhaps arise)

If messages are stored to SQL in dummy blobs then the performance is
probably comparable to any other database I'm thinking about.

  Yes, S3 should be possible. With dbox it could even be used to store 
  the old mails and keep new mails in lower latency storage.
 
 Mogile doesn't handle S3, but I always thought it would be terrific to 
 be able to have one copy of your data on fast local storage, but to be 
 able to use slower (sometimes cheaper) storage for backups or less 
 valuable data (eg older messages), ie replicating some data to other 
 storage boxes

dsync can do the replication, dbox can have primary/secondary partitions
for message data (if mail is not found from primary, it's looked up from
secondary). All that's needed is lib-storage backend for S3, or using
some filesystem layer to it. :)
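
The primary/secondary arrangement reduces to a two-tier lookup; a minimal sketch with dict-like objects standing in for the fast partition and the slow backend (e.g. S3):

```python
def read_mail(primary, secondary, mail_id):
    """Try the low-latency primary partition first; fall back to the
    slower/cheaper secondary storage where old mails were migrated."""
    try:
        return primary[mail_id]
    except KeyError:
        return secondary[mail_id]
```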

  CouchDB seems like it would still be more difficult than necessary to 
  scale. I'd really just want something that distributes the load and 
  disk usage evenly across all servers and allows easily plugging in 
  more servers and it automatically rebalances the load. CouchDB seems 
  like much of that would have to be done manually (or building scripts 
  to do it).
 
 Ahh fair enough - I thought it being massively multi-master would allow 
 simply querying different machines for different users.  Not a perfect 
 scale-out, but good enough for a whole class of requirements...

If users' all mails are stuck on a particular cluster of servers, it's
possible that suddenly several users in those servers starts increasing
their disk load or disk usage and starts killing the performance /
available space for others. If a user's mails were spread across 100
servers, this would be much less likely.
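
Spreading one user's mails across many servers instead of pinning the user to one cluster amounts to hashing per message rather than per user; a sketch (the function and hashing scheme are assumptions, not an existing Dovecot mechanism):

```python
import hashlib

def server_for(mail_guid, servers):
    """Pick a storage server per mail GUID, so a single user's mailbox
    is spread evenly over all servers rather than one cluster."""
    h = int(hashlib.sha1(mail_guid.encode()).hexdigest(), 16)
    return servers[h % len(servers)]
```

A real deployment would likely want consistent hashing on top of this, so that adding a server only remaps about 1/n of the mails.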

  Hmm. I don't really see how it looks like log structured storage.. But 
  you do know that multi-dbox is kind of a maildir/mbox hybrid, right?
 
 Well the access is largely append only, with some deletes and noise at 
 the writing end, but largely the older storage stays static with much 
 longer gaps between deletes (and extremely infrequent edits)

Ah, right. I guess if you think about it from a single user's mails
point of view.

 So maildir is optimised really for deletes, but improves random access 
 for a subset of operations.  Mailbox is optimised for writes and seems 
 like it's generally fast for most operations except deletes (people do 
 worry about having a lot of eggs in one basket, but I think this is 
 really a symptom of other problems at work).  Mailbox also has improved 
 packing for small messages and probably has improved cache locality on 
 certain read patterns

Yes, this is why I'm also using mbox on dovecot.org for mailing list
archives.

 So one obvious hybrid would be a mailbox type structure which perhaps 
 splits messages up into variable sized sub mailboxes based on various 
 criteria, perhaps including message age, type of message or message 
 size...?  The rapid write/delete activity would happen at the head, perhaps even 
 as a maildir layout and gradually the storage would become larger and 
 ever more compressed mailboxes as the age/frequency of access/etc declines.
 
 Perhaps this is exactly dbox?

Something like that. In dbox you have one storage directory containing
all mailboxes' mails (so that copying can be done by simple index
updates). Then you have a bunch of files, each about n MB (configurable,
2 MB by default). Expunging initially only marks the message as expunged
in index. Then later (or immediately, configurable) you run a cronjob
that goes through all dboxes and actually removes the used space by
recreating those dbox files.
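
The expunge-then-purge cycle described here can be sketched as a rewrite that drops messages marked expunged in the index (an in-memory stand-in; real dbox rewrites its ~2 MB storage files on disk):

```python
def purge(records, expunged):
    """Nightly purge: rebuild a dbox-style storage file keeping only
    messages whose GUIDs are not marked expunged in the index.
    `records` is a list of (guid, body) pairs in file order."""
    # Space is reclaimed here, at purge time, not when the expunge
    # was first recorded in the index.
    return [(guid, body) for guid, body in records if guid not in expunged]
```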

 It would also be interesting to consider separating message headers from 
 body content.  Have heavy localisation of message headers, and slower 
 higher latency access to the message body.  Would this improve access 
 speeds in general?  

Probably not much. Usually I think clients download a specific set of
headers, and those can be looked up from dovecot.index.cache file.
Although if a new header is looked up from all messages that's not in
cache already, it would be faster to go through headers if they were
packed together separately. But then again that would make it maybe a
bit slower to download full message, since it's split to two places.

I don't really know, but my feeling is that it wouldn't benefit all that
much.

 Also the mime structure could be torn apart to store 
 attachments individually - the motivation being single instance storage 
 of large attachments with identical content...  Anyway, these seem like 
 very speculative directions...

Yes, this is also something in dbox's far future plans.

  I haven't really done any explicit benchmarks, but there are a few 
  reasons why I think low-latency for 

Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Ed W


CouchDB seems like it would still be more difficult than necessary to 
scale. I'd really just want something that distributes the load and 
disk usage evenly across all servers and allows easily plugging in 
more servers and it automatically rebalances the load. CouchDB seems 
like much of that would have to be done manually (or building scripts 
to do it).
  
Ahh fair enough - I thought it being massively multi-master would allow 
simply querying different machines for different users.  Not a perfect 
scale-out, but good enough for a whole class of requirements...



If users' all mails are stuck on a particular cluster of servers, it's
possible that suddenly several users in those servers starts increasing
their disk load or disk usage and starts killing the performance /
available space for others. If a user's mails were spread across 100
servers, this would be much less likely.
  


Sure - I'm not a couchdb expert, but I think the point is that we would 
need to check the replication options because you would simply balance 
the requests across all the servers holding those users' data.  I'm kind 
of assuming that data would be replicated across more than one server 
and there would be some way of choosing which server to use for a given user


I only know couchdb to the extent of having glanced at the website some 
time back, but I liked the way it looks and thinks like Lotus Notes (I 
did love building things using that tool about 15 years ago - the 
replication was just years ahead of its time.  The robustness was 
extraordinary and I remember when the IRA blew up a chunk of Manchester 
(including one of our servers) that everyone just went home and started 
using the Edinburgh or London office servers and carried on as though 
nothing happened...)


Actually its materialised views are rather clever also...

  
Hmm. I don't really see how it looks like log structured storage.. But 
you do know that multi-dbox is kind of a maildir/mbox hybrid, right?
  
Well the access is largely append only, with some deletes and noise at 
the writing end, but largely the older storage stays static with much 
longer gaps between deletes (and extremely infrequent edits)



Ah, right. I guess if you think about it from a single user's mails
point of view.
  


Well, single folder really


So maildir is optimised really for deletes, but improves random access 
for a subset of operations.  Mailbox is optimised for writes and seems 
like it's generally fast for most operations except deletes (people do 
worry about having a lot of eggs in one basket, but I think this is 
really a symptom of other problems at work).  Mailbox also has improved 
packing for small messages and probably has improved cache locality on 
certain read patterns



Yes, this is why I'm also using mbox on dovecot.org for mailing list
archives.
  


Actually I use maildir, but apart from delete performance (and deletes 
are usually rare), mailbox seems better for nearly all use patterns


Seems like if it were possible to solve delete performance then 
mailbox becomes the preferred choice for many requirements (also let's 
solve the backup problem where the whole file changes every day)



So one obvious hybrid would be a mailbox type structure which perhaps 
splits messages up into variable sized sub mailboxes based on various 
criteria, perhaps including message age, type of message or message 
size...?  The rapid write/delete activity would happen at the head, perhaps even 
as a maildir layout and gradually the storage would become larger and 
ever more compressed mailboxes as the age/frequency of access/etc declines.


Perhaps this is exactly dbox?



Something like that. In dbox you have one storage directory containing
all mailboxes' mails (so that copying can be done by simple index
updates). Then you have a bunch of files, each about n MB (configurable,
2 MB by default). Expunging initially only marks the message as expunged
in index. Then later (or immediately, configurable) you run a cronjob
that goes through all dboxes and actually removes the used space by
recreating those dbox files.
  


Yeah, sounds good.

You might consider some kind of head optimisation, where we can 
already assume that the latest chunk of mails will be noisy, with a 
mixture of deletes/appends, etc.  Typically mail arrives, gets responded 
to, and gets deleted quickly, but I would *guess* that if a mail survives 
for XX hours in a mailbox then likely it's going to continue to stay 
there for quite a long time, until some kind of purge event happens (user 
goes on a purge, archive task, etc).



Sounds good anyway


Oh, have you considered some optional api calls in the storage API?  
The logic might be to assume that someone wanted to do something clever 
and split the message up in some way, eg store headers separately to 
bodies or bodies carved up into mime parts.  The motivation would be if 
there was a certain access pattern to optimise.  Eg for an SQL database 
it may well be sensible to split headers and the message body in order 
to optimise searching - the current API may not take advantage of that?

Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen
On Wed, 2009-08-12 at 18:42 +0100, Ed W wrote:
  Something like that. In dbox you have one storage directory containing
  all mailboxes' mails (so that copying can be done by simple index
  updates). Then you have a bunch of files, each about n MB (configurable,
  2 MB by default). Expunging initially only marks the message as expunged
  in index. Then later (or immediately, configurable) you run a cronjob
  that goes through all dboxes and actually removes the used space by
  recreating those dbox files.

 
 Yeah, sounds good.
 
 You might consider some kind of head optimisation, where we can
 already assume that the latest chunk of mails will be noisy and have a
 mixture of deletes/appends, etc.  Typically mail arrives, gets
 responded to, gets deleted quickly, but I would *guess* that if a mail
 survives for XX hours in a mailbox then likely it's going to continue
 to stay there for quite a long time until some kind of purge event
 happens (user goes on a purge, archive task, etc)

If disk space usage isn't such a huge problem, I think the nightly
purges solve this issue too. During the day user may get mails and
delete them, and at night the deleted mails are purged. Perhaps it could
help a bit if new mails were all stored in separate file(s) and at night
then appended to some larger existing file, but that optimization can be
left until later. :)
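That purge step (recreating the storage files without the expunged mails) can be sketched with a toy record format: a one-byte expunged flag plus a length-prefixed body. Real dbox files carry far more metadata, and these function names are hypothetical, but the mechanics are the same: expunge flips a flag in place, and a later pass rewrites the file to reclaim the space.

```python
import os
import struct

HDR = struct.Struct("<BI")  # 1-byte expunged flag + 4-byte body length

def append_message(path, body):
    # New mails are simply appended to the end of the storage file.
    with open(path, "ab") as f:
        f.write(HDR.pack(0, len(body)) + body)

def read_records(path):
    # Yield (expunged, body) for every record in the file.
    with open(path, "rb") as f:
        while True:
            hdr = f.read(HDR.size)
            if not hdr:
                break
            expunged, length = HDR.unpack(hdr)
            yield expunged, f.read(length)

def mark_expunged(path, target):
    # Expunging only flips a flag in place -- no data is moved yet.
    with open(path, "r+b") as f:
        index = 0
        while True:
            pos = f.tell()
            hdr = f.read(HDR.size)
            if not hdr:
                return
            _, length = HDR.unpack(hdr)
            if index == target:
                f.seek(pos)
                f.write(b"\x01")
                return
            f.seek(length, 1)  # skip over the body to the next record
            index += 1

def purge(path):
    # The cronjob step: rewrite the file without expunged records,
    # then atomically swap it in, reclaiming the used space.
    tmp = path + ".tmp"
    with open(tmp, "wb") as out:
        for expunged, body in read_records(path):
            if not expunged:
                out.write(HDR.pack(0, len(body)) + body)
    os.replace(tmp, path)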

 Oh, have you considered some optional api calls in the storage API?
 The logic might be to assume that someone wanted to do something
 clever and split the message up in some way, eg store headers
 separately to bodies or bodies carved up into mime parts.  The
 motivation would be if there was a certain access pattern to optimise.
 Eg for an SQL database it may well be sensible to split headers and
 the message body in order to optimise searching - the current API may
 not take advantage of that?  

Well, files have paths. I think the storage backend can determine from
that what type the data is. So if you're writing to mails/foo/bar/123 it
means you're storing a message with ID 123 to mailbox foo/bar. It
could then internally parse the message and store its header/body/mime
separately.
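As a sketch of that path convention (the layout is just the example above, not an actual Dovecot API):

```python
def classify_path(path):
    """Map a storage path like 'mails/foo/bar/123' to (kind, mailbox, msg_id).

    Assumes the layout from the example: everything under 'mails/' is a
    message, the last component is the message ID, and the components in
    between name the mailbox.
    """
    parts = path.split("/")
    if parts[0] == "mails" and len(parts) >= 3:
        return ("message", "/".join(parts[1:-1]), parts[-1])
    return ("other", None, None)
```

A backend could branch on the returned kind, e.g. parsing and splitting header/body/MIME only for "message" writes.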


signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Ed W



Oh, have you considered some optional api calls in the storage API?
The logic might be to assume that someone wanted to do something
clever and split the message up in some way, eg store headers
separately to bodies or bodies carved up into mime parts.  The
motivation would be if there was a certain access pattern to optimise.
Eg for an SQL database it may well be sensible to split headers and
the message body in order to optimise searching - the current API may
not take advantage of that?  



Well, files have paths. I think the storage backend can determine from
that what type the data is. So if you're writing to mails/foo/bar/123 it
means you're storing a message with ID 123 to mailbox foo/bar. It
could then internally parse the message and store its header/body/mime
separately.
  



But would the storage be used optimally if there was a requirement to 
read in all headers from all emails, say in order to build the cache of 
messages on Subject?  And what about a backend which has some sort of 
search capability that we could usefully leverage?  It's worth 
considering anyway, because this looks like a design to move the main 
storage away from the IMAP server side and scale out (massively), so 
network capacity might be worth planning for as a limited resource?


Does it make sense to push some of the understanding of message 
structure down to the storage backend?  Perhaps it could be made optional 
in some way, with a more brute-force option available on the Dovecot side - 
ie like FUSE, implement what you need and no more?


Ed W


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Daniel L. Miller

Timo Sirainen wrote:
Also the mime structure could be torn apart to store 
attachments individually - the motivation being single instance storage 
of large attachments with identical content...  Anyway, these seem like 
very speculative directions...



Yes, this is also something in dbox's far future plans.
  
Speaking as a pathetic little admin of a small site of 20 users, my 
needs for replication & scalability are quite minor.  However, 
single-instance storage would be a miracle of biblical proportions.  Has 
any progress been made on this?  Do you have a roadmap for how you plan 
on implementing it?


I don't know if you've considered this at all - this was my first thought:

If you're able to store a message with the attachments separately, then 
you can come up with an attachment database (not meaning to imply SQL 
backend).  Then after breaking the message up into message + 
attachments, you scan the attachment database to see if it is already 
present prior to saving it.  This could mean that not only could we save 
on the huge space wasted by idiots merrily forwarding large attachments 
to multiple people, but even received mails with embedded graphical 
signatures would benefit.

--
Daniel


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen
On Wed, 2009-08-12 at 19:19 +0100, Ed W wrote:
 I actually thought your idea of having a bunch of cut down IMAP type 
 servers as the backend storage talking to a bunch of beefier frontend 
 servers was quite an interesting idea!
 
 Certainly though a simplification of the on-disk API would encourage new 
 storage engines, so perhaps a three tier infrastructure is worth 
 considering? (Frontend, intelligent backend, storage)

I guess this is something similar to what I wrote in my v3.0
architecture mail. This new FS abstraction solves some of those
problems that v3.0 was supposed to solve, so I'm not that excited about
it anymore. But sure, maybe some day. :) For now I'm anyway more
interested about getting a simple FS abstraction done.



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen
On Wed, 2009-08-12 at 11:35 -0700, Daniel L. Miller wrote:
 Timo Sirainen wrote:
  Also the mime structure could be torn apart to store 
  attachments individually - the motivation being single instance storage 
  of large attachments with identical content...  Anyway, these seem like 
  very speculative directions...
  
 
  Yes, this is also something in dbox's far future plans.

 Speaking as a pathetic little admin of a small site of 20 users, my 
 needs for replication & scalability are quite minor.  However, 
 single-instance storage would be a miracle of biblical proportions.  Has 
 any progress been made on this?  

Do you need per-MIME part single instance storage, or would per-email be
enough? Since the per-email can already be done with hard links.

 Do you have a roadmap for how you plan on implementing it?

I've written about it a couple of times I think, but no specific plans.
Something about using hashes anyway.

 I don't know if you've considered this at all - this was my first thought:
 
 If you're able to store a message with the attachments separately, then 
 you can come up with an attachment database (not meaning to imply SQL 
 backend).  Then after breaking the message up into message + 
 attachments, you scan the attachment database to see if it is already 
 present prior to saving it.  This could mean that not only could we save 
 on the huge space wasted by idiots merrily forwarding large attachments 
 to multiple people, but even received mails with embedded graphical 
 signatures would benefit.

Yes, that's pretty much how I thought about it. It's anyway going to be a
dbox-only feature. Would be way too much trouble with other formats.


signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Ed W

Daniel L. Miller wrote:

Timo Sirainen wrote:
Also the mime structure could be torn apart to store attachments 
individually - the motivation being single instance storage of large 
attachments with identical content...  Anyway, these seem like very 
speculative directions...



Yes, this is also something in dbox's far future plans.
  
Speaking as a pathetic little admin of a small site of 20 users, my 
needs for replication & scalability are quite minor.  However, 
single-instance storage would be a miracle of biblical proportions.  
Has any progress been made on this?  Do you have a roadmap for how you 
plan on implementing it?


I don't know if you've considered this at all - this was my first 
thought:


If you're able to store a message with the attachments separately, 
then you can come up with an attachment database (not meaning to imply 
SQL backend).  Then after breaking the message up into message + 
attachments, you scan the attachment database to see if it is already 
present prior to saving it.  This could mean that not only could we 
save on the huge space wasted by idiots merrily forwarding large 
attachments to multiple people, but even received mails with embedded 
graphical signatures would benefit.


It would be interesting to quickly script something in Perl (see one of 
the MIME parsers) to simply scan every email on your system, do an MD5 
of each MIME part, then stick this in a dictionary (with the size) and 
count the number of hits greater than one (ie duplicate parts).  Count 
the bytes saved and share the script so we can all have a play.
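For anyone who wants to try that without Perl, here is a rough Python equivalent (stdlib only; it treats every regular file under the given root as one RFC 2822 message, which matches a maildir layout but not an mbox):

```python
import hashlib
from collections import defaultdict
from email import message_from_binary_file
from pathlib import Path

def scan_duplicate_parts(root):
    """Hash every leaf MIME part under root; return (dup_groups, bytes_saved)."""
    seen = defaultdict(list)  # digest -> [(file, part size), ...]
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        with open(path, "rb") as f:
            msg = message_from_binary_file(f)
        for part in msg.walk():
            if part.is_multipart():
                continue  # hash only leaf parts, not multipart containers
            payload = part.get_payload(decode=True) or b""
            seen[hashlib.md5(payload).hexdigest()].append((str(path), len(payload)))
    dups = {h: hits for h, hits in seen.items() if len(hits) > 1}
    # Every copy beyond the first could, in principle, be shared.
    saved = sum(hits[0][1] * (len(hits) - 1) for hits in dups.values())
    return dups, saved
```

Running it over a real spool and comparing `saved` against the spool's total size would give exactly the percentage figure being asked for here.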


I do like the idea of single instance storage, but I'm actually willing 
to bet it makes only a few percent difference in storage cost for the 
majority of mail servers (I dare say your mileage will vary, but my 
point was to benchmark it)


I don't mean this as a negative, but more that I nearly scripted this a 
couple of months back for my own needs and then ran out of time.  I 
think it won't be more than 50 lines of perl and would be interesting to 
see how people's numbers vary?


Ed W


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Charles Marcus
On 8/12/2009, Timo Sirainen (t...@iki.fi) wrote:
 Do you need per-MIME part single instance storage, or would per-email be
 enough? Since the per-email can already be done with hard links.

Our users are constantly in-line forwarding the same emails with (20+MB)
attachment(s) to different people, but completely altering the body
content, so we would definitely need per mime-part, since only the large
binary attachments would be identical.

So, I would also regard this as a miracle (dunno about biblical
proportions, but close), as long as it applies server wide - ie, all
domains hosted by one particular dovecot instance.

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Charles Marcus
On 8/12/2009, Timo Sirainen (t...@iki.fi) wrote:
 Do you need per-MIME part single instance storage, or would per-email be
 enough? Since the per-email can already be done with hard links.

The only thing I can find about this on the wiki is where it says single
instance attachment storage (for dbox) is planned. Is how to accomplish
single instance email storage documented anywhere? And is this reliable
enough to use on a production system?

The reason I ask is, this would solve *one* of our problems, namely, my
having to limit attachments on our mail lists. Since these emails would
be identical, I could start allowing large attachments to them and there
would be only one actual message stored with the subsequent deliveries
being hard links?

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Charles Marcus
On 8/12/2009, Daniel L. Miller (dmil...@amfes.com) wrote:
 dbox-only is fine.  I could care less about the storage method chosen
 - filesystem, db, encrypted, whatever - but I believe the impact on
 storage - and possibly indexes & searching - would be huge.

It would be huge for us and anyone else that deals with a lot of large
attachments (we're in the advertising industry).

 On the personal greedy side, if you want to see a mass corporate
 migration to Dovecot, with potential service contracts - that would
 be a feature worth talking about.  I can see IT manager's eyes light
up at hearing about such an item

Mine are shining right now... ;)

 - and I've never heard of any other mail server supporting such a
 thing.

Exchange does... and is the single and only reason I have *considered*
switching to it shudder in the past few years...

-- 

Best regards,

Charles


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen
On Wed, 2009-08-12 at 15:42 -0400, Charles Marcus wrote:
  - and I've never heard of any other mail server supporting such a
  thing.
 
 Exchange does... and is the single and only reason I have *considered*
 switching to it shudder in the past few years...

I heard that the next Exchange version drops that feature, because it
typically causes more disk I/O when reading. I don't know if it's still
possible to enable it optionally though.



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen
On Wed, 2009-08-12 at 15:33 -0400, Charles Marcus wrote:
 On 8/12/2009, Timo Sirainen (t...@iki.fi) wrote:
  Do you need per-MIME part single instance storage, or would per-email be
  enough? Since the per-email can already be done with hard links.
 
 The only thing I can find about this on the wiki is where it says single
 instance attachment storage (for dbox) is planned. Is how to accomplish
 single instance email storage documented anywhere? And is this reliable
 enough to use on a production system?

Two possible ways:

a) Just write a script to find identical mails and replace them with
hard links to the same file. :)

b) Use deliver -p file for delivering mails. You'll probably need to
write some kind of a script for delivering mails, so that when it gets
called with multiple recipients it can write the mail to a temp file and
call deliver -p for each recipient using the same file.

So yeah, either one is as reliable as the script :)
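Option (a) might look something like the sketch below. It is not production code: it assumes a layout where each mail is its own immutable file (as in maildir), and it skips the byte-for-byte comparison a careful version would add before trusting the hash.

```python
import hashlib
import os
from pathlib import Path

def hardlink_duplicates(root):
    """Replace byte-identical files under root with hard links to one copy."""
    by_digest = {}
    linked = 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        original = by_digest.setdefault(digest, path)
        if original is path:
            continue  # first file with this content becomes the canonical copy
        # Create the new link under a temp name, then atomically swap it in.
        tmp = path.parent / (path.name + ".tmp")
        os.link(original, tmp)
        os.replace(tmp, path)
        linked += 1
    return linked
```

One caveat: if any tool ever modifies a mail file in place, the change bleeds into every linked copy, so this is only safe where mail files are treated as immutable.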


signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Ed W

Timo Sirainen wrote:

On Wed, 2009-08-12 at 15:42 -0400, Charles Marcus wrote:
  

- and I've never heard of any other mail server supporting such a
thing.
  

Exchange does... and is the single and only reason I have *considered*
switching to it shudder in the past few years...



I heard that the next Exchange version drops that feature, because it
typically causes more disk I/O when reading. I don't know if it's still
possible to enable it optionally though.

  


It also had a bunch of limitations; it was basically only single 
instance for CC recipients on a message (more or less).  Quite a lot of 
things, such as certain types of virus scanning, would (I think) easily 
disable the single instance storage also?


So I doubt it would help in most of the cases mentioned here, ie each 
time a mail was re-forwarded internally it would not be single-instanced.


I still think it would be instructive to do some benchmarks though - 
often these things look good on paper, but turn out to be surprisingly 
less effective (given the implementation cost) when measured.  I'm not 
disagreeing, just would be interested to see some numbers...


I think Perl's MIME-tools would make it pretty easy to build something 
which scanned all files and created a hash of all interesting 
attachments.  Quite possibly there is an even more clever way to get the 
same result by misusing some Dovecot feature?


Good luck!

Ed W


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Steve

 Original Message 
 Date: Wed, 12 Aug 2009 12:34:40 -0700
 From: Daniel L. Miller dmil...@amfes.com
 To: Dovecot Mailing List dovecot@dovecot.org
 Subject: Re: [Dovecot] Scalability plans: Abstract out filesystem and make it 
 someone else's problem

 Timo Sirainen wrote:
  On Wed, 2009-08-12 at 11:35 -0700, Daniel L. Miller wrote:

  Timo Sirainen wrote:
  
  Also the mime structure could be torn apart to store 
  attachments individually - the motivation being single instance
 storage 
  of large attachments with identical content...  Anyway, these seem
 like 
  very speculative directions...
  
  
  Yes, this is also something in dbox's far future plans.


  Speaking as a pathetic little admin of a small site of 20 users, my 
  needs for replication  scalability are quite minor.  However, 
  single-instance storage would be a miracle of biblical proportions. 
 Has 
  any progress been made on this?  
  
 
  Do you need per-MIME part single instance storage, or would per-email be
  enough? Since the per-email can already be done with hard links.

 Definitely per MIME part.
  Do you have a roadmap for how you plan on implementing it?
  
 
  I've written about it a couple of times I think, but no specific plans.
  Something about using hashes anyway.
 

  I don't know if you've considered this at all - this was my first
 thought:
 
  If you're able to store a message with the attachments separately, then
  you can come up with an attachment database (not meaning to imply SQL 
  backend).  Then after breaking the message up into message + 
  attachments, you scan the attachment database to see if it is already 
  present prior to saving it.  This could mean that not only could we
 save 
  on the huge space wasted by idiots merrily forwarding large attachments
  to multiple people, but even received mails with embedded graphical 
  signatures would benefit.
  
 
  Yes, that's pretty much how I thought about it. It's anyway going to be a
  dbox-only feature. Would be way too much trouble with other formats.

 dbox-only is fine.  I could care less about the storage method chosen - 
 filesystem, db, encrypted, whatever - but I believe the impact on 
 storage - and possibly indexes & searching - would be huge.
 
 On the personal greedy side, if you want to see a mass corporate 
 migration to Dovecot, with potential service contracts - that would be a 
 feature worth talking about.  I can see IT manager's eyes light up at 
hearing about such an item - and I've never heard of any other mail 
 server supporting such a thing.

IBM Lotus Domino has had that feature for ages (they call it shared mail). And 
they don't have it just for normal mails but for archives as well (called 
single instance store). This feature was first introduced in cc:Mail, then 
got integrated into Domino, where it is still present and has even been 
extended to work with various backends (like the new DB2 backend). Microsoft 
copied that concept from them (from my viewpoint the way MS did it in the past 
was horrible; I think newer versions work better but I am not sure).

From my experience doing messaging for two decades I can tell you that it 
is not worth doing single instance store (or whatever you call it). Storage is 
ultra cheap these days and backup systems are so fast that all the benefits 
which were valid some years ago are gone today.

It might rock your geek heart to implement something like that, but doing the 
math on costs versus benefits will sooner or later show you that today it's not 
worth doing it.


 --
 Daniel

Steve



Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Daniel L. Miller

Steve wrote:


dbox-only is fine.  I could care less about the storage method chosen - 
filesystem, db, encrypted, whatever - but I believe the impact on 
storage - and possibly indexes & searching - would be huge.


On the personal greedy side, if you want to see a mass corporate 
migration to Dovecot, with potential service contracts - that would be a 
feature worth talking about.  I can see IT manager's eyes light up at 
hearing about such an item - and I've never heard of any other mail 
server supporting such a thing.




IBM Lotus Domino has had that feature for ages (they call it shared mail). And 
they don't have it just for normal mails but for archives as well (called 
single instance store). This feature was first introduced in cc:Mail, then 
got integrated into Domino, where it is still present and has even been 
extended to work with various backends (like the new DB2 backend). Microsoft 
copied that concept from them (from my viewpoint the way MS did it in the past 
was horrible; I think newer versions work better but I am not sure).

From my experience doing messaging for two decades I can tell you that it 
is not worth doing single instance store (or whatever you call it). Storage is 
ultra cheap these days and backup systems are so fast that all the benefits 
which were valid some years ago are gone today.

It might rock your geek heart to implement something like that, but doing the 
math on costs versus benefits will sooner or later show you that today it's not 
worth doing it.

I have no experience with Domino, but I just did a Google for lotus 
domino shared mail and read the brief on lotus.com.  Based on what I 
read, it has potential - but it only splits message headers from bodies 
and stores the bodies as complete images, without separating attachments.  
That helps reduce the load when somebody blasts out a flier to everyone 
in the company in a single message - but I'm asking for something more 
ambitious.


If every attachment in a given message is individually scanned to 
generate some unique identifier, and that identifier then used to 
determine whether or not it exists in the database - this could have 
HUGE effects.  This now addresses not just the simple broadcast - but 
some really crazy possibilities.


User A receives a message with an attachment (like a product brochure), 
likes it, and forwards it to Users B-Z.
User F recognizes that product, but has a counter-proposal, so he 
attaches another brochure and replies to A-Z.  Being an idiot, the 
original attachment is still kept in the reply.

User H forwards this message to a buddy at another company for discussion.
[...time passes...]
Three weeks later, User 101 at the other company gets back from 
vacation, has just received a message with the original brochure.  He 
forwards it to User A (who started this mess).
User A, being a total dimwit, doesn't recognize that he already spread 
this junk throughout the company last month - so he broadcasts it again.


Under the structure I've proposed, net storage consumed by the 
attachments should be one copy of attachment 1 and one copy of 
attachment 2, plus headers and any comments in the messages times the 
number of recipients.  Domino would store one copy of attachment 1, then 
a copy of attachment 1 + attachment 2, then another copy of attachment 1.


This is a minor example - but I just wanted to show SOMETHING to justify 
the effort.


As far as cheap storage - I agree costs are a fraction of what they once 
were.  But by reducing the amount stored, consider the tradeoffs of 
reduced caching, smaller differential backups, and reduced archival 
costs (off-site storage costs often calculated per GB), just to name a 
few.  To me the only down side (other than requiring Timo to invest more 
blood, sweat & tears in this project) is how much this costs in message 
READ time.  For me, typical user interaction is reading.  As I believe 
previously mentioned, if the server implements some type of delayed 
delete function, then delete times are not a concern.  And write times 
are also (I think) a minor concern.  But the primary issue is how fast 
can we retrieve a message + attachments and stream it to the client.  It 
seems to me that header lists won't be impacted, so simply pointing the mail 
client at the server to see a list of mail shouldn't change at all.  So 
then the question is the potential latency from when a user selects a 
message to when it appears on their screen.  Will the time spent 
searching the disk, and assembling the message, be significant when 
compared with the network communication between server & client?


--
Daniel


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen
On Wed, 2009-08-12 at 14:54 -0700, Daniel L. Miller wrote:
 If every attachment in a given message is individually scanned to 
 generate some unique identifier, and that identifier then used to 
 determine whether or not it exists in the database - this could have 
 HUGE effects.  This now addresses not just the simple broadcast - but 
 some really crazy possibilities.

Oh BTW. I think dbmail 2.3 does that. Then again I haven't yet seen a
stable dbmail version. But looks like they've released 2.3.6 recently
that I haven't tested yet.



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen
On Wed, 2009-08-12 at 18:18 -0400, Timo Sirainen wrote:
 Oh BTW. I think dbmail 2.3 does that. Then again I haven't yet seen a
 stable dbmail version. But looks like they've released 2.3.6 recently
 that I haven't tested yet.

Looks like it even does single instance header values:

 The header caching tables used since 2.2 have been replaced with a new
 schema, optimized for a much smaller storage footprint, and therefore
 faster access. Headers are now cached using a single-instance storage
 pattern, similar to the one used for the message parts. This change
 also introduces for the first time the appearance of views in the
 database, which is somewhat experimental because of some uncertainties
 with regard to the possible performance impact this may have.

But somehow I think the performance isn't going to be very good for
downloading the full header if it has to piece it together from lots of
fields stored all around the database.



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Daniel L. Miller

Timo Sirainen wrote:

On Wed, 2009-08-12 at 18:18 -0400, Timo Sirainen wrote:
  

Oh BTW. I think dbmail 2.3 does that. Then again I haven't yet seen a
stable dbmail version. But looks like they've released 2.3.6 recently
that I haven't tested yet.



Looks like it even does single instance header values:
  
LOL - I started off hijacking this thread for SIS - and now you just 
invited the next one:  Have you done, or are you aware of, recent 
comparisons between Dovecot & dbmail?  I'd like to think Dovecot is 
faster, more stable, more feature-rich, and less fattening...


I don't WANT dbmail!
  

The header caching tables used since 2.2 have been replaced with a new
schema, optimized for a much smaller storage footprint, and therefore
faster access. Headers are now cached using a single-instance storage
pattern, similar to the one used for the message parts. This change
also introduces for the first time the appearance of views in the
database, which is somewhat experimental because of some uncertainties
with regard to the possible performance impact this may have.



But somehow I think the performance isn't going to be very good for
downloading the full header if it has to piece it together from lots of
fields stored all around the database.
  
Do you have performance concerns for what we've been discussing for SIS 
in Dovecot?


We can spin off some other threads if you'd prefer to return to your 
original question - but I guess the question on everybody's (well, at 
least mine) mind right now is will YOU try to implement SIS in the near 
future?  Regardless of the backend used?


--
Daniel


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Daniel L. Miller
Ha!  Fooled you!  I'm going to reply to the original question instead of 
SIS!


Timo Sirainen wrote:

 * Index files are really more like memory dumps. They're already in an
optimal format for keeping them in memory, so they can be just mmap()ed
and used. Doing some kind of translation to another format would just
make it more complex and slower.

 * Index and mail data is very different. Index data is accessed
constantly and it must be very low latency or performance will be
horrible. It practically should be in memory in local machine and there
shouldn't normally be any network lookups when accessing it.
  

Ok, I lied.  I'm going to start something new.

Do the indexes contain any of the header information?  In particular, 
since I know nothing of the communication between IMAP clients & servers 
in general, is the information that is shown in typical client mail 
lists (subject, sender, date, etc.) stored in the indexes?  I guess I'm 
asking whether any planned changes will have an impact on retrieving 
message lists in any way.


--
Daniel


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-12 Thread Timo Sirainen

On Aug 12, 2009, at 6:54 PM, Daniel L. Miller wrote:


Do the indexes contain any of the header information?


Yes.

 In particular, since I know nothing of the communication between  
IMAP clients & servers in general, is the information that is shown  
in typical client mail lists (subject, sender, date, etc.) stored in  
the indexes?


Yes. Dovecot adds to cache file those headers that the client requests.

 I guess I'm asking if any planned changes will have an impact in  
retrieving message lists in any way.


Usually not. Unless client fetches the entire header. Some do I think,  
but usually not.




Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Seth Mattinen
Timo Sirainen wrote:
 On Aug 11, 2009, at 12:41 AM, Seth Mattinen wrote:
 
 Nothing forces you to switch from maildir, if you're happy with it :)
 But if you want to support millions of users, it's simpler to distribute
 the storage and disk I/O evenly across hundreds of servers using a
 database that was designed for it. And by databases I mean here some of
 those key/value-like databases, not SQL. (What's a good collective name
 for those dbs anyway? BASE and NoSQL are a couple names I've seen.)



 Why is a database a better choice than a clustered filesystem?
 
 Show me a clustered filesystem that can guarantee that each file is
 stored in at least 3 different data centers and can scale linearly by
 simply adding more servers (let's say at least up to thousands).

Easy, AFS. It is known to support tens of thousands of clients [1] and
it's not exactly new. Like supporting the quirks of NFS, the quirks of a
clustered filesystem could be found and dealt with, too.

Key/value databases are hardly a magic bullet for redundancy. You don't
get 3 copies in different datacenters by simply switching to a
database-style storage.

[1]
http://www-conf.slac.stanford.edu/AFSBestPractices/Slides/MorganStanley.pdf


 Clustered filesystems are also complex. They're much more complex than
 what Dovecot really requires.
 

I mention it because you stated wanting to outsource the storage
portion. The complexity of whatever database engine you choose or
supporting a clustered filesystem (like NFS) is a wash since you're not
maintaining either one personally.

~Seth


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Timo Sirainen

On Aug 11, 2009, at 2:16 AM, Seth Mattinen wrote:


Show me a clustered filesystem that can guarantee that each file is
stored in at least 3 different data centers and can scale linearly by
simply adding more servers (let's say at least up to thousands).


Easy, AFS. It is known to support tens of thousands of clients [1] and
it's not exactly new. Like supporting the quirks of NFS, the quirks  
of a

clustered filesystem could be found and dealt with, too.


I was more thinking about thousands of servers, not clients. Each  
server should contribute to the amount of storage you have. Buying  
huge storages is more expensive. Also it would be nice if you could  
just keep plugging in more servers to get more storage space, disk I/O  
and CPU and the system would just automatically reconfigure itself to  
take advantage of those. I can't really see any of that happening  
easily with AFS.
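The "keep plugging in servers and the system reconfigures itself" property is what consistent-hashing schemes in key/value stores provide. A minimal sketch using rendezvous (highest-random-weight) hashing, where adding a server only moves the mailboxes that server now wins (illustrative; server/mailbox names are made up):

```python
import hashlib

def _h(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def owner(mailbox, servers):
    # Rendezvous hashing: every server "scores" the mailbox and the
    # highest score wins. Adding a server only reassigns the mailboxes
    # the new server wins; everything else stays where it was.
    return max(servers, key=lambda s: _h(s + "/" + mailbox))

servers = ["s%d" % i for i in range(10)]
mailboxes = ["user%d" % i for i in range(1000)]
before = {m: owner(m, servers) for m in mailboxes}
after = {m: owner(m, servers + ["s10"]) for m in mailboxes}
moved = sum(1 for m in mailboxes if before[m] != after[m])
print(moved)  # roughly 1/11 of the mailboxes move; the rest stay put
```

No central reconfiguration step is needed: every node can compute the same ownership from the server list alone, which is what makes "just add more servers" cheap in these systems.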


Key/value databases are hardly a magic bullet for redundancy. You  
don't

get 3 copies in different datacenters by simply switching to a
database-style storage.


Some (several?) of them can be somewhat easily configured to support  
that. (That's what their web pages say, anyway.)
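For the "at least 3 data centers" guarantee specifically, Dynamo-style stores typically walk a key-dependent preference list of nodes and skip nodes until the replicas land in distinct failure domains. A toy sketch of that placement rule (an assumption about how such stores work in general, not any particular product's algorithm):

```python
import hashlib

def replicas(key, nodes, copies=3):
    # nodes: list of (node_name, datacenter) pairs. Rank nodes in a
    # key-dependent order, then pick the first `copies` nodes that are
    # all in different datacenters.
    ranked = sorted(nodes,
                    key=lambda n: hashlib.md5((n[0] + key).encode()).hexdigest())
    chosen, used_dcs = [], set()
    for name, dc in ranked:
        if dc not in used_dcs:
            chosen.append(name)
            used_dcs.add(dc)
        if len(chosen) == copies:
            break
    return chosen

nodes = [("node%d" % i, "dc%d" % (i % 4)) for i in range(12)]
r = replicas("user42/INBOX", nodes)
print(r)  # three nodes, each in a different datacenter
```

A clustered filesystem can of course mirror data too, but expressing "3 copies, 3 sites, per object" is a one-line policy in this model, which is Timo's point.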


Clustered filesystems are also complex. They're much more complex  
than

what Dovecot really requires.


I mention it because you stated wanting to outsource the storage
portion. The complexity of whatever database engine you choose or
supporting a clustered filesystem (like NFS) is a wash since you're  
not

maintaining either one personally.


I also want something that's cheap and easy to scale. Sure, people who  
already have NFS/AFS/etc. systems can keep using Dovecot with the  
filesystem backends, but I don't think it's the cheapest or easiest  
choice. There's a reason why e.g. Amazon S3 isn't running on top of  
them.


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Robert Schetterer
Timo Sirainen schrieb:
 On Aug 11, 2009, at 12:41 AM, Seth Mattinen wrote:
 
 Nothing forces you to switch from maildir, if you're happy with it :)
 But if you want to support millions of users, it's simpler to distribute
 the storage and disk I/O evenly across hundreds of servers using a
 database that was designed for it. And by databases I mean here some of
 those key/value-like databases, not SQL. (What's a good collective name
 for those dbs anyway? BASE and NoSQL are a couple names I've seen.)



 Why is a database a better choice than a clustered filesystem?
 
 Show me a clustered filesystem that can guarantee that each file is
 stored in at least 3 different data centers and can scale linearly by
 simply adding more servers (let's say at least up to thousands).
 
 Clustered filesystems are also complex. They're much more complex than
 what Dovecot really requires.
 

I like the idea of SQL-based mail services.
Whatever your choice is, cluster filesystems will always have their place,
but with database-backed setups it should be much easier to
have redundant mail stores. I already have all the related stuff (quota, ACLs,
etc.) in a database, incl. SpamAssassin, greylisting and webmail; the only thing
left is the mail store. It would be great to have
that too, provided there are no big disadvantages
like poor performance with it.

There is http://www.dbmail.org/.
Has somebody ever used it,
so it can be compared?
-- 
Best Regards

MfG Robert Schetterer

Germany/Munich/Bavaria


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Seth Mattinen
Robert Schetterer wrote:
 Timo Sirainen schrieb:
 On Aug 11, 2009, at 12:41 AM, Seth Mattinen wrote:

 Nothing forces you to switch from maildir, if you're happy with it :)
 But if you want to support millions of users, it's simpler to distribute
 the storage and disk I/O evenly across hundreds of servers using a
 database that was designed for it. And by databases I mean here some of
 those key/value-like databases, not SQL. (What's a good collective name
 for those dbs anyway? BASE and NoSQL are a couple names I've seen.)


 Why is a database a better choice than a clustered filesystem?
 Show me a clustered filesystem that can guarantee that each file is
 stored in at least 3 different data centers and can scale linearly by
 simply adding more servers (let's say at least up to thousands).

 Clustered filesystems are also complex. They're much more complex than
 what Dovecot really requires.

 
 I like the idea of SQL-based mail services.
 Whatever your choice is, cluster filesystems will always have their place,
 but with database-backed setups it should be much easier to
 have redundant mail stores. I already have all the related stuff (quota, ACLs,
 etc.) in a database, incl. SpamAssassin, greylisting and webmail; the only thing
 left is the mail store. It would be great to have
 that too, provided there are no big disadvantages
 like poor performance with it.
 
 There is http://www.dbmail.org/.
 Has somebody ever used it,
 so it can be compared?


It wouldn't be an SQL database - it's not really suitable for this kind
of thing at the scale Timo is proposing.

~Seth


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread ja nein
I was more thinking about thousands of servers, not clients. Each server should 
contribute to the amount of storage you have. Buying huge storages is more 
expensive. Also it would be nice if you could just keep plugging in more 
servers to get more storage space, disk I/O and CPU and the system would just 
automatically reconfigure itself to take advantage of those. I can't really see 
any of that happening easily with AFS.

Well, me too. But there are interesting (and working) solutions like e.g. 
GlusterFS...

 I mention it because you stated wanting to outsource the storage
 portion. The complexity of whatever database engine you choose or
 supporting a clustered filesystem (like NFS) is a wash since you're not
 maintaining either one personally.

I also want something that's cheap and easy to scale. Sure, people who already 
have NFS/AFS/etc. systems can keep using Dovecot with the filesystem backends, 
but I don't think it's the cheapest or easiest choice. There's a reason why 
e.g. Amazon S3 isn't running on top of them.


I think the basic idea behind the initial proposal, which I like very much, is to have a 
choice between redundancy/scalability and ease of running a platform.

In my opinion there isn't one perfect solution which addresses all of the above in 
the best way. I think that's why there are so many different solutions out 
there. Anyway, having the indexes centralized in some form of database would 
be a nice solution (and, very important, easy to run in the SQL case!) for not 
all, but many installations. If the speed penalty and the coding effort 
aren't too large, it would be worth implementing solutions like SQL-based index 
storage, too. And everyone is/would be free to decide which one is the 
best for his platform/environment.

Huge installations with > 50 servers will always be a kind of special 
solution and won't be built out of the box. Dovecot can just help by offering 
good alternatives for storing all kinds of lock-dependent stuff in different ways 
(files/memory/databases).

Regards,
Sebastian


  

Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Steffen Kaiser


On Mon, 10 Aug 2009, Timo Sirainen wrote:


4. Implement a multi-master filesystem backend for index files. The idea
would be that all servers accessing the same mailbox must be talking to
each others via network and every time something is changed, push the
change to other servers. This is actually very similar to my previous
multi-master plan. One of the servers accessing the mailbox would still
act as a master and handle conflict resolution and writing indexes to
disk more or less often.


What I don't understand here is:

_One_ server is the master, which owns the indexes locally?
Oh, 5. means that this particular server is initiating the write, right?

You spoke about thousands of servers; if one of them opens a mailbox, it 
needs to query all (thousands - 1) servers to find which of them is probably the 
master of this mailbox. I suppose you need a home location server, which 
other servers connect to in order to find the server currently locking (aka 
acting as master for) this mailbox.


GSM has a home location register pointing to the base station currently 
managing the user's info, because the GSM device is within its reach.
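The "home location register" idea Steffen describes could be as simple as a shared map from mailbox to current master, claimed atomically on open. A sketch using an in-process dict as a stand-in for a store all servers can reach (class and server names are hypothetical, not part of Dovecot):

```python
import threading

class HomeLocationRegistry:
    # Stand-in for a shared store reachable by every server.
    # Maps mailbox -> server currently acting as its master.
    def __init__(self):
        self._lock = threading.Lock()
        self._master = {}

    def acquire(self, mailbox, server):
        # Return the current master, claiming the role for `server`
        # if the mailbox has no master yet.
        with self._lock:
            return self._master.setdefault(mailbox, server)

    def release(self, mailbox, server):
        # Hand the master role back when done with the mailbox.
        with self._lock:
            if self._master.get(mailbox) == server:
                del self._master[mailbox]

reg = HomeLocationRegistry()
print(reg.acquire("bob/INBOX", "imap1"))  # imap1 becomes master
print(reg.acquire("bob/INBOX", "imap2"))  # imap2 learns imap1 is master
reg.release("bob/INBOX", "imap1")
print(reg.acquire("bob/INBOX", "imap2"))  # now imap2 takes over
```

One lookup against this registry replaces the (thousands - 1) queries: servers ask the registry who the master is instead of asking each other.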


There is also another point I'm wondering about:
index files are really more like memory dumps, you've written. So if you 
cluster thousands of servers together you'll most probably have different 
server architectures, say 32-bit vs. 64-bit, CISC vs. RISC, big vs. little 
endian, ASCII vs. EBCDIC :). Sharing these memory dumps without another 
abstraction layer wouldn't work.
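The usual fix for this is to serialize index fields in one fixed byte order instead of dumping native structs. In Python's struct notation (an illustration of the general technique, not Dovecot's on-disk format):

```python
import struct

value = 0x01020304

# Native byte order: what a raw memory dump of a C struct would contain.
# This differs between big- and little-endian hosts.
native = struct.pack("=I", value)

# Fixed little-endian on-disk format: identical bytes on every architecture.
on_disk = struct.pack("<I", value)

# Reading the fixed format back works regardless of the host's endianness.
(restored,) = struct.unpack("<I", on_disk)
print(hex(restored))  # 0x1020304
```

As Timo notes below, Dovecot's indexes are already bitness-independent; a fixed byte order would be the remaining step for mixed-endian clusters, at some cost on the minority architecture.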



5. Implement filesystem backend for dbox and permanent index storage
using some scalable distributed database, such as maybe Cassandra. This


Although I like the eventually consistent part, I wonder about the 
Java-based stuff of Cassandra.



is the part I've thought the least about, but it's also the part I hope
to (mostly) outsource to someone else. I'm not going to write a
distributed database from scratch..


I wonder if the index-backend in 4. and 5. shouldn't be the same.

===

How much work is it to handle the data in the index files?
What if every server forwards changes to the master and receives changes 
from the master to sync its local read-only cache? Then you needn't handle 
conflicts (except when the network was down), and writes are consistently 
originated from this single master server. The actual mail data is 
accessed via another API.


When the current master no longer needs to access the mailbox, it 
could hand the master role over to another server currently accessing 
the mailbox.


Bye,

-- 
Steffen Kaiser



Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Eric Jon Rostetter

Quoting Seth Mattinen se...@rollernet.us:


Queue directories and clusters don't
mix well, but a read-heavy maildir/dbox environment shouldn't suffer the
same problem.


Why don't queue directories and clusters mix well?  Is this a performance
issue only, or something worse?




--
Eric Rostetter
The Department of Physics
The University of Texas at Austin

This message is provided AS IS without warranty of any kind,
either expressed or implied.  Use this message at your own risk.


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Timo Sirainen

On Aug 11, 2009, at 10:32 AM, Steffen Kaiser wrote:



On Mon, 10 Aug 2009, Timo Sirainen wrote:

4. Implement a multi-master filesystem backend for index files. The  
idea
would be that all servers accessing the same mailbox must be  
talking to

each others via network and every time something is changed, push the
change to other servers. This is actually very similar to my previous
multi-master plan. One of the servers accessing the mailbox would  
still

act as a master and handle conflict resolution and writing indexes to
disk more or less often.


What I don't understand here is:

_One_ server is the master, which owns the indexes locally?
Oh, 5. means that this particular server is initiating the write,  
right?


Yes, only one would be writing to the shared storage.

You spoke about thousands of servers; if one of them opens a  
mailbox, it needs to query all (thousands - 1) servers to find which of  
them is probably the master of this mailbox. I suppose you need a  
home location server, which other servers connect to in order to  
find the server currently locking (aka acting as master for) this mailbox.


Yeah, keeping track of this information is probably the most difficult  
part. But surely it can be done faster than with (thousands-1)  
queries :)



There is also another point I'm wondering about:
index files are really more like memory dumps, you've written. So if  
you cluster thousands of servers together you'll most probably have  
different server architectures, say 32-bit vs. 64-bit, CISC vs. RISC,  
big vs. little endian, ASCII vs. EBCDIC :). Sharing these memory  
dumps without another abstraction layer wouldn't work.


Nah, x86 is all there is ;) Dovecot has been fine so far with this  
same design. I think only once I've heard that someone wanted to run  
both little and big endian machines with shared NFS storage. 32 vs. 64  
bit doesn't matter though, indexes have been bitness-independent since  
v1.0.rc9.


I once tried to make the code use the same endianness everywhere, but  
the code quickly became so ugly that I decided to just drop it. But  
who knows, maybe some day. :)



5. Implement filesystem backend for dbox and permanent index storage
using some scalable distributed database, such as maybe Cassandra.  
This


Although I like the eventually consistent part, I wonder about the  
Java-based stuff of Cassandra.


I'm not yet sure what database exactly to use. I'm not really familiar  
with any of them, except the Amazon Dynamo whitepaper that I read, and  
that seemed perfect to me. Cassandra still seems to lack some features  
that I think are needed.


is the part I've thought the least about, but it's also the part I  
hope

to (mostly) outsource to someone else. I'm not going to write a
distributed database from scratch..


I wonder if the index-backend in 4. and 5. shouldn't be the same.


You mean the permanent index storage? Yes, it probably should be the  
same in 4 and 5. 4 just has that in-memory layer in the middle.



How much work is it to handle the data in the index files?
What if every server forwards changes to the master and receives  
changes from the master to sync its local read-only cache? Then you  
needn't handle conflicts (except when the network was down), and writes  
are consistently originated from this single master server. The actual  
mail data is accessed via another API.


When the current master no longer needs to access the mailbox,  
it could hand the master role over to another server currently  
accessing the mailbox.


http://dovecot.org/tmp/replication-plan.txt explains how I previously  
thought about the index replication to work, and I think it'll still  
work pretty nicely with the index FS backend too. I guess it could  
mostly work like sending everything to master, although for some  
changes it wouldn't really be necessary. I'll need to rethink the plan  
for this I guess.


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Timo Sirainen
On Tue, 2009-08-11 at 09:38 -0700, Seth Mattinen wrote:
  Why don't queue directories and clusters mix well?  Is this a performance
  issue only, or something worse?
  
 
 It depends on the locking scheme used by the filesystem. Working queue
 directories (the ones where stuff comes and goes rapidly) are best suited
 for a local FS anyway.

And when a server and its disk dies, the emails get lost :(



signature.asc
Description: This is a digitally signed message part


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-11 Thread Eric Jon Rostetter

Quoting Timo Sirainen t...@iki.fi:


It depends on the locking scheme used by the filesystem. Working queue
directories (the ones where stuff comes and goes rapidly) are best suited
for a local FS anyway.


And when a server and its disk dies, the emails get lost :(


It would appear he is not talking about a /var/spool/mail type queue/spool,
but the queues where the MTA/AV/Anti-Spam/etc process the mail.

For the most part, a machine crash here will result in the mail being lost
or resent (resent if the server hasn't yet confirmed acceptance of the
message). With battery backup the risk is smaller, but since most
filesystems (local or remote) cache writes in memory, the chance of losing
mail that is still cached in memory is high in any case.
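The window Eric describes can be narrowed per message by forcing the queue file to stable storage before acknowledging acceptance, at the cost of latency. A sketch of the standard write-fsync-rename pattern (a generic technique, not any specific MTA's code; names are illustrative):

```python
import os
import tempfile

def durable_enqueue(queue_dir, data):
    # Write to a temp file, flush it to disk, then rename into place.
    # Only after fsync() returns is the message data safe from a crash
    # of the process or (modulo disk write caches) the machine.
    fd, tmp = tempfile.mkstemp(dir=queue_dir)
    try:
        os.write(fd, data)
        os.fsync(fd)          # force file data to stable storage
    finally:
        os.close(fd)
    final = os.path.join(queue_dir, "msg-" + os.path.basename(tmp))
    os.rename(tmp, final)     # atomic within the same filesystem
    return final

qdir = tempfile.mkdtemp()
path = durable_enqueue(qdir, b"Subject: test\r\n\r\nbody\r\n")
print(os.path.exists(path))
```

For full crash safety a production MTA would also fsync the queue directory after the rename, so the new directory entry itself reaches disk.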

I agree that for smaller mail systems, the processing queues
are best on local fs or in memory (memory for AV/Anti-Spam, local disk
for MTA processing).  The delivery queues (where the message awaits delivery
or is delivered) are best on some other file system (mirrored, distributed,
etc).

For a massively scaled system, there may be sufficient performance to
put the queues elsewhere.  But on a small system, with 90% of the mail
being spam/virus/malware, performance will usually dictate local/memory
file systems for such queues...

--
Eric Rostetter
The Department of Physics
The University of Texas at Austin



Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-10 Thread Seth Mattinen

Timo Sirainen wrote:

This is something I figured out a few months ago, mainly because this
one guy at work (hi, Stu) kept telling me my multi-master replication
plan sucked and we should use some existing scalable database. (I guess
it didn't go exactly like that, but that's the result anyway.)



Ick, some people (myself included) hate the idea of storing mail in a 
database versus simple and almost impossible to screw up plain text 
files of maildir. Cyrus already does the whole mail-in-database thing.


~Seth


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-10 Thread Timo Sirainen
On Mon, 2009-08-10 at 14:33 -0700, Seth Mattinen wrote:
 Timo Sirainen wrote:
  This is something I figured out a few months ago, mainly because this
  one guy at work (hi, Stu) kept telling me my multi-master replication
  plan sucked and we should use some existing scalable database. (I guess
  it didn't go exactly like that, but that's the result anyway.)
  
 
 Ick, some people (myself included) hate the idea of storing mail in a 
 database versus simple and almost impossible to screw up plain text 
 files of maildir. 

Nothing forces you to switch from maildir, if you're happy with it :)
But if you want to support millions of users, it's simpler to distribute
the storage and disk I/O evenly across hundreds of servers using a
database that was designed for it. And by databases I mean here some of
those key/value-like databases, not SQL. (What's a good collective name
for those dbs anyway? BASE and NoSQL are a couple names I've seen.)

 Cyrus already does the whole mail-in-database thing.

No, Cyrus's mail database is very similar to how Dovecot works. Both
have somewhat similar index files, both store one mail/file (with
dbox/maildir). But Cyrus then also has some additional databases that
screw things up.




Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-10 Thread Seth Mattinen
Timo Sirainen wrote:
 On Mon, 2009-08-10 at 14:33 -0700, Seth Mattinen wrote:
 Timo Sirainen wrote:
 This is something I figured out a few months ago, mainly because this
 one guy at work (hi, Stu) kept telling me my multi-master replication
 plan sucked and we should use some existing scalable database. (I guess
 it didn't go exactly like that, but that's the result anyway.)

 Ick, some people (myself included) hate the idea of storing mail in a 
 database versus simple and almost impossible to screw up plain text 
 files of maildir. 
 
 Nothing forces you to switch from maildir, if you're happy with it :)
 But if you want to support millions of users, it's simpler to distribute
 the storage and disk I/O evenly across hundreds of servers using a
 database that was designed for it. And by databases I mean here some of
 those key/value-like databases, not SQL. (What's a good collective name
 for those dbs anyway? BASE and NoSQL are a couple names I've seen.)
 


Why is a database a better choice than a clustered filesystem? It seems
that you're adding a huge layer of complexity (a database) for something
that's already solved (clusters). Queue directories and clusters don't
mix well, but a read-heavy maildir/dbox environment shouldn't suffer the
same problem.

~Seth


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-10 Thread Curtis Maloney

Seth Mattinen wrote:
Ick, some people (myself included) hate the idea of storing mail in a 
database versus simple and almost impossible to screw up plain text 
files of maildir. Cyrus already does the whole mail-in-database thing.


Why do you think 'maildir' isn't a database?

Or to you does 'database' only mean SQL database?

"A database is a collection of information that is organized so that 
it can easily be accessed, managed, and updated."


--
Curtis Maloney


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-10 Thread Seth Mattinen
Curtis Maloney wrote:
 Seth Mattinen wrote:
 Ick, some people (myself included) hate the idea of storing mail in a
 database versus simple and almost impossible to screw up plain text
 files of maildir. Cyrus already does the whole mail-in-database thing.
 
 Why do you think 'maildir' isn't a database?
 
 Or to you does 'database' only mean SQL database?
 

Please, don't put words in my mouth. I'm not stupid.

~Seth


Re: [Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

2009-08-10 Thread Timo Sirainen

On Aug 11, 2009, at 12:41 AM, Seth Mattinen wrote:


Nothing forces you to switch from maildir, if you're happy with it :)
But if you want to support millions of users, it's simpler to  
distribute

the storage and disk I/O evenly across hundreds of servers using a
database that was designed for it. And by databases I mean here  
some of
those key/value-like databases, not SQL. (What's a good collective  
name

for those dbs anyway? BASE and NoSQL are a couple names I've seen.)




Why is a database a better choice than a clustered filesystem?


Show me a clustered filesystem that can guarantee that each file is  
stored in at least 3 different data centers and can scale linearly by  
simply adding more servers (let's say at least up to thousands).


Clustered filesystems are also complex. They're much more complex than  
what Dovecot really requires.