Paul J Stevens wrote:
I'm taking this to the dev list....

Matthew T. O'Connor wrote:
This sounds scary to me.  Obviously it's not my project and I'm not the
one coding; I'm just worried, as an admin who depends on DBMail, that we
are going to open up a large can of worms for dubious gain.

Please be assured, I'm not yet convinced we should go for a threaded
model either.

Good to hear. Again, I'm not against threading; I'm just making the point that it's not a clear win, and not something to be entered into lightly.

Aaron and I started looking for a lib for doing connection pooling to
the database. As far as I can tell, libzdb
(http://www.tildeslash.com/libzdb) is the only viable lib that does
this. Using such a setup, however, would require threading. And as
everyone keeps telling us, that is no trivial matter.

Interesting. I'm not familiar with libzdb, but from a quick glance at the site it does look like it might work for us.

Another option to scale up the database connections would be a worker
model where one master process handles all incoming connections using
select or libevent or something similar, and hands over clients to one
of a limited number of forked workers which maintain the database
connections.

The worker processes would be very much like the current forked
children. The master process would have to do multiplexed IO, and need a
callback infrastructure.

This is somewhat like what I had in mind; whether you use libevent or write it yourself, that is how a multi-process system should work, IMHO.
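To make the hand-off concrete, here is a toy sketch of that master/worker split (all names hypothetical; dbmail would do this in C with libevent, and the hand-off would use SCM_RIGHTS fd passing, which Python exposes as `socket.send_fds`/`recv_fds` on 3.9+):

```python
# Master multiplexes on the listening socket and hands accepted clients
# to pre-forked workers over Unix socketpairs; workers own the client
# connection (and would hold the DB connection) from then on.
import os
import select
import socket

NUM_WORKERS = 2


def worker_loop(chan):
    """Worker: receive client fds from the master, serve them, close."""
    while True:
        msg, fds, _, _ = socket.recv_fds(chan, 1024, 1)
        if not fds:          # master closed the channel: shut down
            break
        client = socket.socket(fileno=fds[0])
        data = client.recv(1024)
        client.sendall(b"echo:" + data)   # stand-in for real protocol work
        client.close()


def run_master(listener, rounds):
    """Master: fork workers, then multiplex and hand off `rounds` clients."""
    chans = []
    for _ in range(NUM_WORKERS):
        parent, child = socket.socketpair()
        if os.fork() == 0:
            parent.close()
            worker_loop(child)
            os._exit(0)
        child.close()
        chans.append(parent)

    next_worker = 0
    for _ in range(rounds):
        select.select([listener], [], [])          # multiplexed IO
        client, _ = listener.accept()
        # Round-robin hand-off; SCM_RIGHTS dups the fd into the worker.
        socket.send_fds(chans[next_worker], [b"fd"], [client.fileno()])
        client.close()                             # worker holds its own copy
        next_worker = (next_worker + 1) % NUM_WORKERS

    for c in chans:
        c.close()
```

The attraction, as noted below, is that only the small master loop needs the event-driven machinery; the workers stay essentially identical to the current forked children.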

This approach is actually my favorite, since we don't need to make the
whole codebase thread-safe to do it. It also feels more like an
incremental change than a complete redesign of the processing model,
which is good IMO.

Agreed, evolution not revolution.

So I guess this will come down to one thing only: the one doing the
coding gets to be the one who decides this issue.

True enough.

In either case (threading versus workers) the issue of IPC between
running dbmail frontends sharing the same backend remains if we want to
do IDLE/NOTIFY/etc. Since the database is where they all come together,
it'll *have* to be the database where information about changes in
mailboxes is shared and exchanged. Caching such information using
memcached is just one way to address the performance impact that
accessing that information will have.


It seems to me there are two main issues: IPC for things like IDLE/NOTIFY, and scaling up better.

IPC example for discussion: a new email comes in via dbmail-smtp or lmtp and is delivered to mailbox 50, and we need to notify all processes that are monitoring mailbox 50 that a new message has arrived. I suppose a poor man's implementation of this would be to have each process that has been asked to IDLE check the mailbox for new messages every 60 seconds or so, and if anything has changed, notify the IMAP client. This sounds like a lot of extra load, but since everyone would have to go to the database anyway when the IMAP client requests the new message, perhaps it's not really that much worse. Using this implementation, we don't really need IPC; each process watches the DB by itself.

In this type of setup, memcached might do a lot to reduce DB load. Another thought is that, at least in PostgreSQL, there is a NOTIFY mechanism that I believe can be used to notify a client of a DB event.
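The poor man's polling loop above can be sketched as follows (hypothetical names throughout: `highest_modseq` stands in for whatever cheap "did anything change?" query is used, e.g. the max message id in the mailbox, and `notify_client` for sending the untagged IMAP response):

```python
# Each IDLE-ing process polls its own mailbox on a timer and tells the
# IMAP client when the mailbox has advanced. No IPC needed.
import time

POLL_INTERVAL = 60  # seconds, per the suggestion above


def idle_poll(highest_modseq, notify_client, last_seen, ticks,
              interval=POLL_INTERVAL):
    """Poll `ticks` times; report each change; return the last state seen."""
    for _ in range(ticks):
        current = highest_modseq()
        if current != last_seen:
            notify_client(current)   # e.g. untagged EXISTS to the client
            last_seen = current
        time.sleep(interval)
    return last_seen
```

With PostgreSQL's NOTIFY, the delivery path could instead issue a `NOTIFY` on a per-mailbox channel and the polling (and its latency) would disappear, at the cost of tying the mechanism to one backend.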

Also, we could have DBMail be more frugal with DB connections, that is, close them when not actively doing something with the DB; this would reduce the overall number of concurrent DB connections. If the startup time for new connections proves to be a problem for you, you can set up something like pgpool to maintain a pool of active connections. This also dovetails nicely with memcached, since you would always look in the cache first and only connect to the database when you don't find what you are looking for there.
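The cache-first pattern described here is simple enough to sketch (a plain dict stands in for memcached, and `db_fetch` is a hypothetical query function, not a dbmail API):

```python
# Look in the cache first; touch the database (and hence a connection,
# possibly borrowed from pgpool) only on a miss, then populate the cache.
def cached_fetch(cache, key, db_fetch):
    value = cache.get(key)
    if value is not None:
        return value            # cache hit: no DB connection needed
    value = db_fetch(key)       # miss: this is the only DB round-trip
    cache[key] = value
    return value
```

The frugal-connections idea falls out naturally: on a warm cache most requests never need a DB connection at all, so holding one open per client buys little. (Invalidation on delivery/expunge is the part this sketch glosses over.)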
_______________________________________________
Dbmail-dev mailing list
[email protected]
http://twister.fastxs.net/mailman/listinfo/dbmail-dev
