On 09/11/2013 12:21 PM, Harald Leithner wrote:
> Hi,
>
> I'm running 4c23432cc270554557f9e130331214d81164131b since release.
>
> In the last 24 hours my monitoring service had to restart imapd 6
> times and pop3d 2 times because both services were unreachable. I'm not
> sure why it got so much worse since e84cfd46a08a7c1fa8. With this commit
> I had "only" one or 2 forced restarts per 25 hours, and none for pop3.

I've reverted that last change, and have replaced the self-pipe
mechanism with a heartbeat event:

00fc5c62eeccb87459beecfe76247de4dc961a4c

Some background:

Libevent doesn't really support IO in multiple threads. Basically only
one thread is allowed to do any IO connected with events. DBMail uses an
async queue to send messages from worker threads to the main thread,
which then pushes them to clients, so that no network IO is done in the
worker threads. That works great, but the main thread also needs a
mechanism to be notified of any messages waiting in the queue.

Until now dbmail-3 used a self-pipe, where events on the pipe were
used to notify the main thread of waiting messages. But this implied IO
in the worker threads: writing a single byte on the pipe after pushing a
message to the queue.
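
For readers who haven't met the self-pipe pattern, here is a minimal
sketch of it in plain POSIX C. This is not the dbmail code: the counter
standing in for the async queue, and the names wake_pipe, sp_worker and
self_pipe_demo, are made up for illustration, and poll() stands in for
the libevent read event on the pipe.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

#define SP_MAX_WORKERS 16

/* Hypothetical stand-in for the async queue: a mutex-protected counter. */
static pthread_mutex_t sp_lock = PTHREAD_MUTEX_INITIALIZER;
static int sp_queued = 0;
static int wake_pipe[2]; /* [0] = read end (main), [1] = write end (workers) */

static void *sp_worker(void *arg)
{
    char byte = 1;
    (void)arg;

    pthread_mutex_lock(&sp_lock);
    sp_queued++;                 /* "push a message to the queue" */
    pthread_mutex_unlock(&sp_lock);

    /* The questionable step: IO from a worker thread, one byte per message. */
    if (write(wake_pipe[1], &byte, 1) != 1)
        return (void *)1;
    return NULL;
}

/* Starts nworkers workers; returns the number of messages the main thread
 * picked up after being woken through the pipe, or -1 on bad input. */
int self_pipe_demo(int nworkers)
{
    pthread_t tids[SP_MAX_WORKERS];
    int started = 0, drained = 0, i;

    if (nworkers < 1 || nworkers > SP_MAX_WORKERS || pipe(wake_pipe) != 0)
        return -1;
    sp_queued = 0;

    for (i = 0; i < nworkers; i++) {
        if (pthread_create(&tids[started], NULL, sp_worker, NULL) != 0)
            break;
        started++;
    }

    while (drained < started) {
        struct pollfd pfd = { .fd = wake_pipe[0], .events = POLLIN };
        char buf[SP_MAX_WORKERS];

        if (poll(&pfd, 1, 5000) <= 0)       /* timeout or error: give up */
            break;
        if (read(wake_pipe[0], buf, sizeof buf) <= 0)
            break;

        pthread_mutex_lock(&sp_lock);       /* woken up: drain the queue */
        drained += sp_queued;
        sp_queued = 0;
        pthread_mutex_unlock(&sp_lock);
    }

    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    assert(drained <= started);
    close(wake_pipe[0]);
    close(wake_pipe[1]);
    return drained;
}
```

Calling self_pipe_demo(4) should report 4 drained messages. The point
to notice is the write() inside sp_worker: that is the IO in a worker
thread described above.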

Given your problems, I've come to the conclusion that this is an invalid
approach, considering libevent's limitations.

Normal synchronisation mechanisms, like pthread_cond_t, don't apply here,
because we cannot suspend the main thread while it waits for messages.

So instead I've now pushed a different approach: generate a timeout
event every 0.2 seconds in the main thread, which interrupts the main
thread to check for pending messages. This is what I call the heartbeat,
for want of a better term.
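
Roughly, the heartbeat idea looks like the sketch below. Again this is
not the actual commit: to keep it free of dependencies, a poll() call
with a 200 ms timeout stands in for the libevent timeout event, a
mutex-protected counter stands in for the async queue, and the names
hb_worker, hb_drain and heartbeat_demo are invented for the example.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>

#define HEARTBEAT_MS   200   /* 0.2 seconds, as described above */
#define HB_MAX_WORKERS 16

/* Hypothetical stand-in for the async queue: a mutex-protected counter. */
static pthread_mutex_t hb_lock = PTHREAD_MUTEX_INITIALIZER;
static int hb_pending = 0;

/* Worker: only touches the queue, no pipe and no other IO. */
static void *hb_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&hb_lock);
    hb_pending++;
    pthread_mutex_unlock(&hb_lock);
    return NULL;
}

/* Main thread, on every heartbeat: take whatever is waiting. */
static int hb_drain(void)
{
    int n;

    pthread_mutex_lock(&hb_lock);
    n = hb_pending;
    hb_pending = 0;
    pthread_mutex_unlock(&hb_lock);
    return n;
}

/* Starts nworkers workers and polls the queue once per heartbeat, for at
 * most max_beats beats. Returns the number of messages picked up, or -1
 * on bad input. */
int heartbeat_demo(int nworkers, int max_beats)
{
    pthread_t tids[HB_MAX_WORKERS];
    int started = 0, delivered = 0, beat, i;

    if (nworkers < 0 || nworkers > HB_MAX_WORKERS)
        return -1;
    hb_pending = 0;

    for (i = 0; i < nworkers; i++) {
        if (pthread_create(&tids[started], NULL, hb_worker, NULL) != 0)
            break;
        started++;
    }

    for (beat = 0; beat < max_beats && delivered < started; beat++) {
        poll(NULL, 0, HEARTBEAT_MS);   /* wait one heartbeat interval */
        delivered += hb_drain();       /* then check for pending messages */
    }

    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    assert(delivered <= started);
    return delivered;
}
```

In this sketch a message can sit in the queue for up to one heartbeat
interval before the main thread sees it, which is the price of keeping
all IO out of the workers.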

I have a feeling this will affect throughput very slightly. It doesn't
feel like a very elegant solution, but it works, and we no longer
violate libevent principles.

Until I integrate ZeroMQ, or I come up with a different solution, this
will have to do.

All this doesn't really explain the lock-ups of pop3d. So if it happens
to pop3d again: please try to generate a strace log, so I can at least
get an idea of what happens and when.

thanks

--
________________________________________________________________
Paul J Stevens pjstevns @ gmail, twitter, skype, linkedin
* Premium Hosting Services and Web Application Consultancy *
www.nfg.nl/[email protected]/+31.85.877.99.97
________________________________________________________________