Just a quick note: you can ignore my previous mail. Apparently it was
my fault.
My dbmail-imapd no longer seems to "crash" with that commit. I left my
phone's wireless on overnight again and let MailDroid sync, which so
far had crashed it pretty reliably by around 03:00 in the morning.
I don't know about your change, but 0.2 sec seems like a long time
with lots of concurrent clients.
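
To put a rough number on that, here is a small Python sketch of the
extra delay a queued message would see. It assumes messages are queued
at times unrelated to the heartbeat and that the main thread drains the
queue on every tick; the function name and the evenly spread arrival
times are made up for illustration.

```python
HEARTBEAT_MS = 200  # the 0.2 sec interval, in milliseconds


def wait_until_next_tick(arrival_ms, interval_ms=HEARTBEAT_MS):
    """Milliseconds a message queued at arrival_ms waits for the next tick."""
    # Ticks fire at 0, 200, 400, ... ms; a message arriving exactly on a
    # tick is counted as picked up immediately.
    return (-arrival_ms) % interval_ms


# Hypothetical arrivals: one message per millisecond over one second.
arrivals = range(1, 1001)
waits = [wait_until_next_tick(a) for a in arrivals]

worst_ms = max(waits)
average_ms = sum(waits) / len(waits)
print(f"worst case: {worst_ms} ms, average: {average_ms} ms")
```

Under these assumptions the added delay is just under 0.2 sec in the
worst case and about 0.1 sec on average, per queued message rather than
per client.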
I just looked at ZeroMQ. It sounds interesting (and part of it somehow
reminds me a bit of Python's Twisted framework ^^).
Are you planning to use that anyway or is that just some thought you
were playing with?
I am curious how many code changes this would require, or whether the
code is currently "modular" enough to replace the recv/send easily.
On 2013-09-11 16:18, Paul J Stevens wrote:
On 09/11/2013 12:21 PM, Harald Leithner wrote:
Hi,
I'm running 4c23432cc270554557f9e130331214d81164131b since release.
In the last 24 hours my monitoring service had to restart imapd 6
times and pop3d 2 times because both services were unreachable. I'm not
sure why it got so much worse since e84cfd46a08a7c1fa8; with that
commit I had "only" one or two forced restarts per 25 hours, and none
for pop3d.
I've reverted that last change, and have replaced the self-pipe
mechanism with a heartbeat event:
00fc5c62eeccb87459beecfe76247de4dc961a4c
Some background:
Libevent doesn't really support IO in multiple threads. Basically only
one thread is allowed to do any IO connected with events. DBMail uses
an async queue to send messages from worker threads to the main thread,
which then pushes them to clients to avoid doing network IO in the
worker threads. That works great, but the main thread also needs a
mechanism to be notified of any messages waiting in the queue.
Until now dbmail-3 used a self-pipe where the events on the pipe were
used to notify the main thread of waiting messages. But this implied IO
in the worker threads: writing a single byte on the pipe after pushing
a message to the queue.
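
As a rough illustration of the self-pipe pattern described above, here
is a minimal Python sketch. It is not DBMail's C/libevent code: the
names `run_self_pipe_demo`, `outbox` and the "reply N" messages are
invented for the example, and Python's `selectors` module stands in for
the event loop.

```python
import os
import queue
import selectors
import threading


def run_self_pipe_demo(n_messages=3):
    """Pass n_messages from a worker thread to the main loop via a self-pipe."""
    outbox = queue.Queue()           # worker -> main thread messages
    read_fd, write_fd = os.pipe()    # the self-pipe
    os.set_blocking(read_fd, False)

    def worker():
        for i in range(n_messages):
            outbox.put(f"reply {i}")
            # The problematic part: one byte of IO done in the worker
            # thread, just to wake up the main thread.
            os.write(write_fd, b"x")

    sel = selectors.DefaultSelector()
    sel.register(read_fd, selectors.EVENT_READ)
    thread = threading.Thread(target=worker)
    thread.start()

    delivered = []
    while len(delivered) < n_messages:
        sel.select()                 # sleep until the pipe becomes readable
        try:
            os.read(read_fd, 4096)   # discard the wake-up bytes
        except BlockingIOError:
            pass
        while True:                  # drain everything queued so far
            try:
                delivered.append(outbox.get_nowait())
            except queue.Empty:
                break

    thread.join()
    sel.close()
    os.close(read_fd)
    os.close(write_fd)
    return delivered


if __name__ == "__main__":
    print(run_self_pipe_demo())
```

The main thread only ever reads from the pipe, but the `os.write` call
in `worker` is exactly the kind of worker-thread IO at issue here.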
Given your problems, I've come to the conclusion that this is an
invalid approach, considering libevent's limitations.
Normal synchronisation mechanisms, like pthread_cond_t, don't apply
here, because we cannot suspend the main thread waiting for messages.
So instead I've now pushed a different approach: generate a timeout
event every 0.2 seconds in the main thread, which interrupts the main
thread to check for pending messages. This is what I call the
heartbeat, for want of a better term.
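
The heartbeat idea can be sketched in a few lines of Python. This is
only an illustration under assumptions, not the code in the commit: the
names `run_heartbeat_demo`, `outbox` and `HEARTBEAT` are invented, a
socket pair stands in for a client connection, and a `select` timeout
plays the role of the libevent timeout event.

```python
import queue
import selectors
import socket
import threading

HEARTBEAT = 0.2  # seconds between checks of the message queue


def run_heartbeat_demo(n_messages=3):
    """Pass n_messages from a worker to the main loop using a timed check."""
    outbox = queue.Queue()               # worker -> main thread messages
    client, peer = socket.socketpair()   # stand-in for a client connection
    sel = selectors.DefaultSelector()
    sel.register(client, selectors.EVENT_READ)

    def worker():
        for i in range(n_messages):
            outbox.put(f"reply {i}")     # queue push only, no IO

    thread = threading.Thread(target=worker)
    thread.start()

    delivered = []
    ticks = 0
    while len(delivered) < n_messages:
        # Returns on client activity or after at most HEARTBEAT seconds.
        sel.select(timeout=HEARTBEAT)
        ticks += 1
        while True:                      # check for pending messages
            try:
                delivered.append(outbox.get_nowait())
            except queue.Empty:
                break

    thread.join()
    sel.close()
    client.close()
    peer.close()
    return delivered, ticks


if __name__ == "__main__":
    print(run_heartbeat_demo())
```

The worker never touches a file descriptor; the cost is that a queued
message may sit for up to one heartbeat interval before the main thread
notices it.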
I have a feeling this will affect throughput very slightly. It doesn't
feel like a very elegant solution, but it works, and we no longer
violate libevent principles.
Until I integrate ZeroMQ, or I come up with a different solution, this
will have to do.
All this doesn't really explain the lock-ups of pop3d. So if it happens
to pop3d again: please try to generate a strace log, so I can at least
get an idea of what and when it happens.
thanks