Paul J Stevens wrote:
I'm taking this to the dev list....
Matthew T. O'Connor wrote:
This sounds scary to me. Obviously it's not my project and I'm not the
one coding, I'm just worried as an admin that depends on DBMail that we
are going to open up a large can of worms for dubious gain.
Please be assured, I'm not yet convinced we should go for a threaded
model either.
Good to hear, and again, I'm not against threading, I'm just making the
point that it's not a clear win, and not something to be entered into
lightly.
Aaron and I started looking for a lib for doing connection pooling to
the database. As far as I can tell, libzdb
(http://www.tildeslash.com/libzdb) is the only viable lib which does
this. Using such a setup however would
require threading. And as everyone keeps telling us, that is no trivial
matter.
Interesting, I'm not familiar with libzdb, but from a quick glance at
the site it does look like it might work for us.
Another option to scale up the database connections would be a worker
model where one master process handles all incoming connections using
select or libevent or something similar, and hands over clients to one
of a limited number of forked workers which maintain the database
connections.
The worker processes would be very much like the current forked
children. The master process would have to do multiplexed IO, and need a
callback infrastructure.
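One way the master could hand an accepted client over to an already
forked worker is to pass the open socket across a UNIX-domain socketpair
with sendmsg() and SCM_RIGHTS ancillary data. A minimal sketch of that
mechanism follows -- the function names are illustrative, not existing
DBMail code:

```c
/* Sketch: hand an open fd from the master to a worker over a
 * UNIX-domain socketpair using SCM_RIGHTS ancillary data.
 * hand_over_fd()/receive_fd() are illustrative names, not DBMail code. */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Master side: send client_fd to the worker listening on channel_fd. */
int hand_over_fd(int channel_fd, int client_fd)
{
    char byte = 'F';                /* sendmsg needs at least one data byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } ctrl;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;   /* "pass these descriptors" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &client_fd, sizeof(int));

    return sendmsg(channel_fd, &msg, 0) == 1 ? 0 : -1;
}

/* Worker side: receive the handed-over fd; returns it, or -1 on error. */
int receive_fd(int channel_fd)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } ctrl;
    struct msghdr msg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(channel_fd, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The worker would then serve the client on the received descriptor
exactly as the current forked children do on their accepted socket.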
This is somewhat like what I had in mind, whether you use libevent or
write it yourself, that is how a multi-process system should work IMHO.
This approach is actually my favorite since we don't need to make the
whole codebase thread safe to do it. It also feels more like an
incremental change, rather than a complete redesign of the processing
model, which is good imo.
Agreed, evolution not revolution.
So I guess this will come down to one thing only: the one doing the
coding gets to be the one who decides this issue.
True enough.
In either case (threading versus workers) the issue of IPC between
running dbmail frontends sharing the same backend remains if we want to
do IDLE/NOTIFY/etc. Since the database is where they all come together,
it'll *have* to be the database where information about changes in
mailboxes is shared and exchanged. Caching such information using
memcached is just one way to address the performance impact that
accessing that information will have.
It seems to me there are two main issues, IPC for things like
IDLE/NOTIFY and scaling up better.
IPC Example for discussion: A new email comes in via dbmail-smtp or lmtp
and is delivered to mailbox 50, we need to notify all processes that are
monitoring mailbox 50 that a new message has arrived.
I suppose a poor man's implementation of this would be to have each
process that has been asked to IDLE check the mailbox for new
messages every 60 seconds or so; if anything has changed, it then
notifies the IMAP client. This sounds like a lot of extra load, but
since everyone has to go to the database anyway when the IMAP
client requests the new message, perhaps it's not really that much
worse. Using this implementation we don't really need IPC; each
process watches the DB by itself. In this type of setup, memcached
might do a lot to reduce DB load. Another thought is that, at least in
PostgreSQL, there is a NOTIFY mechanism that I believe can be used to
notify a client of a DB event.
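The change check in such a polling loop boils down to comparing the
mailbox state seen at the last poll against the current one. A tiny
sketch of that test, assuming some per-mailbox modification counter can
be fetched from the database (or from memcached) -- the counter itself
is an assumption, any monotonically changing per-mailbox value would do:

```c
/* Sketch of the poor man's IDLE check: every ~60 seconds the process
 * fetches the mailbox's modification counter (from the DB, or memcached
 * first) and compares it against the value seen at the last poll. */
typedef unsigned long seq_t;

/* Returns 1 (and records the new value) if the mailbox changed since
 * the last poll, so the caller can push "* n EXISTS" to the client. */
int idle_poll_changed(seq_t *last_seen, seq_t current)
{
    if (current == *last_seen)
        return 0;
    *last_seen = current;
    return 1;
}
```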
Also, we could have DBMail be more frugal with DB connections, that is,
close them when not actively doing something with the DB; this will
reduce the overall number of concurrent DB connections. If the startup
time for new connections proves to be a problem for you, you can set up
something like pgpool to create a pool of active connections. This also
dovetails nicely with memcached, since you would always look first in
the cache for data and only connect to the database when you don't find
what you are looking for there.
_______________________________________________
Dbmail-dev mailing list
[email protected]
http://twister.fastxs.net/mailman/listinfo/dbmail-dev