[Citadel Development] (no subject)

dothebart Tue, 05 Jun 2012 11:23:08 -0700

Tue Jun 05 2012 11:22:54 EDT from IGnatius T Foobar @ Uncensored

Oh yeah, why don't we just add Microsoft Exchange as a dependency while we're at it?

More threads sounds like a good idea. I was doing some reading about the relative merits of event driven code vs. just having a lot of threads. It's been well argued that by simply giving every activity its own thread, you let the operating system do on its own what you were trying to do manually, and the computer can do it better than you can.

Remember how much cleaner everything got when I ripped out davew's overly complex thread code? He basically had an entire scheduler. We're not writing an operating system here.

OK, from reading through the syslog: Jun 2 07:27:01 is the last point at which any activity from a housekeeping thread was logged.

Jun 2 07:37:05 was the last timeout from all jobs scheduled from there, so I really don't think that anything was blocked inside the eventqueue thread.

Searching the log for 'processing outbound queue', which is output by the smtp-queue, reveals that.

Since my attempts to attach to citserver probably came after the restart, we can't say whether a thread acting as housekeeper was blocked, or whether do_housekeeping() was simply no longer being called.

Whatever the reason was, it is related to either one of the housekeeping sub-jobs being blocked, or the housekeeper facility itself not working.
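To tell those two cases apart next time, something like the following heartbeat could be logged or inspected from a debugger. This is only a rough sketch, not existing Citadel code: struct hk_heartbeat, hk_mark_enter(), hk_mark_leave() and hk_check() are hypothetical names, and the assumption is that do_housekeeping() would call hk_mark_enter() on entry and hk_mark_leave() on exit. The current time is passed in explicitly so the logic can be exercised without waiting.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Possible findings when the heartbeat is inspected. */
enum hk_state { HK_OK, HK_NOT_CALLED, HK_STUCK_INSIDE };

/* Hypothetical bookkeeping for one housekeeper; not a Citadel structure. */
struct hk_heartbeat {
    pthread_mutex_t lock;
    int inside;          /* 1 while a housekeeping run is in progress */
    time_t last_enter;   /* when the current/last run started */
    time_t last_leave;   /* when the last run finished */
};

void hk_heartbeat_init(struct hk_heartbeat *hb, time_t now)
{
    assert(hb != NULL);
    pthread_mutex_init(&hb->lock, NULL);
    hb->inside = 0;
    hb->last_enter = now;
    hb->last_leave = now;
}

/* Assumed to be called at the top of the housekeeping function. */
void hk_mark_enter(struct hk_heartbeat *hb, time_t now)
{
    pthread_mutex_lock(&hb->lock);
    hb->inside = 1;
    hb->last_enter = now;
    pthread_mutex_unlock(&hb->lock);
}

/* Assumed to be called just before the housekeeping function returns. */
void hk_mark_leave(struct hk_heartbeat *hb, time_t now)
{
    pthread_mutex_lock(&hb->lock);
    hb->inside = 0;
    hb->last_leave = now;
    pthread_mutex_unlock(&hb->lock);
}

/* Distinguish "a run started and never finished" from "no run was started". */
enum hk_state hk_check(struct hk_heartbeat *hb, time_t now, time_t max_idle)
{
    enum hk_state state = HK_OK;

    pthread_mutex_lock(&hb->lock);
    if (hb->inside && now - hb->last_enter > max_idle)
        state = HK_STUCK_INSIDE;   /* a sub-job is probably blocked */
    else if (!hb->inside && now - hb->last_leave > max_idle)
        state = HK_NOT_CALLED;     /* nobody is calling the housekeeper */
    pthread_mutex_unlock(&hb->lock);
    return state;
}
```

With that in place, HK_STUCK_INSIDE would point at a blocked sub-job, while HK_NOT_CALLED would point at the scheduling side.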

So it comes down to either the 'everything much cleaner' rewrite having introduced some race condition, or some part of the housekeeper itself having one.

Feeding jobs into the event-queue is done by signalling through a pipe with the libev function ev_async_send(), which is non-blocking.

It is, however, protected by mutexes, which could have a race condition.
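To make that hand-off concrete, here is a rough sketch of a mutex-protected job list with a non-blocking pipe wake-up, in plain POSIX C. It is not Citadel's or libev's actual code: struct job_queue, job_queue_push(), job_queue_run() and count_job() are hypothetical names made up for illustration, and the pipe merely stands in for the wake-up that ev_async_send() provides.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

#define QUEUE_MAX 64                 /* hypothetical capacity */

typedef void (*job_fn)(void *arg);
struct job { job_fn fn; void *arg; };

struct job_queue {
    pthread_mutex_t lock;            /* protects jobs[] and count */
    struct job jobs[QUEUE_MAX];
    size_t count;
    int wake_fd[2];                  /* [0] read end, [1] write end */
};

static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

int job_queue_init(struct job_queue *q)
{
    assert(q != NULL);
    q->count = 0;
    if (pthread_mutex_init(&q->lock, NULL) != 0 || pipe(q->wake_fd) != 0)
        return -1;
    if (set_nonblocking(q->wake_fd[0]) != 0 ||
        set_nonblocking(q->wake_fd[1]) != 0)
        return -1;
    return 0;
}

/* Producer side: queue the job under the lock, then wake the consumer. */
int job_queue_push(struct job_queue *q, job_fn fn, void *arg)
{
    int rc = -1;
    char byte = 1;

    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_MAX) {
        q->jobs[q->count].fn = fn;
        q->jobs[q->count].arg = arg;
        q->count++;
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);

    /* A full pipe (EAGAIN) just means a wake-up is already pending. */
    if (rc == 0 && write(q->wake_fd[1], &byte, 1) < 0 &&
        errno != EAGAIN && errno != EWOULDBLOCK)
        rc = -1;
    return rc;
}

/* Consumer side: drain wake-ups, copy jobs out under the lock, and run
 * them with the lock released so a slow job cannot block producers. */
size_t job_queue_run(struct job_queue *q)
{
    char buf[64];
    struct job batch[QUEUE_MAX];
    size_t n, i;

    while (read(q->wake_fd[0], buf, sizeof buf) > 0)
        ;                            /* stops once the pipe is empty */

    pthread_mutex_lock(&q->lock);
    n = q->count;
    for (i = 0; i < n; i++)
        batch[i] = q->jobs[i];
    q->count = 0;
    pthread_mutex_unlock(&q->lock);

    for (i = 0; i < n; i++)
        batch[i].fn(batch[i].arg);
    return n;
}

/* Example job: increments the int that arg points to. */
void count_job(void *arg)
{
    (*(int *)arg)++;
}
```

The point of the sketch is that the wake-up itself never blocks; anything that hangs has to hang on the mutex, or inside a job that holds it.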

Alternatively, some other part of the housekeeping / queue / indexer code could be blocking.

For example, the Citadel networker has a mutex on the list of servers that are actively internetworking, or for access to the netconfig.

Since my last commits fixed a bug in read_network_map() (which was basically always returning a NULL pointer early), I think we are seeing a follow-up of that here.

The first follow-up was a crash caused by the pointer still being in use after it had been freed.

Maybe the second is a possible deadlock.
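One way a housekeeping sub-job could avoid hanging forever on such a lock is to try it and skip the round if it is busy. The following is a minimal sketch of that idea only, not what the networker currently does: netconfig_lock, network_runs and network_job_try_run() are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the lock guarding the netconfig. */
static pthread_mutex_t netconfig_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts how often the (placeholder) network job actually ran. */
static int network_runs = 0;

/* Returns 1 if the job ran, 0 if it was skipped because the lock was
 * busy, and -1 on any other locking error. */
int network_job_try_run(void)
{
    int rc = pthread_mutex_trylock(&netconfig_lock);

    if (rc == EBUSY)
        return 0;       /* someone else holds it; retry next round */
    if (rc != 0)
        return -1;

    network_runs++;     /* placeholder for the real work */
    pthread_mutex_unlock(&netconfig_lock);
    return 1;
}
```

A skipped round can be logged, which would also leave a trace in the syslog if the lock is never released.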

The networking code was written in a non-thread-safe manner, with global variables. This sometimes caused crashes on Uncensored when citadel client sessions and the networker/housekeeping thread accessed or freed these variables concurrently. I changed that in 7.8x, and in doing so introduced the NULL-pointer bug mentioned above.
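The general pattern for getting away from a shared global that one thread may free while another still reads it is to hand every caller a private copy taken under a lock. This is a sketch of that pattern under assumptions, not the actual 7.8x change: shared_map, map_lock, network_map_replace() and network_map_snapshot() are hypothetical names, and the map is treated as a plain string.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared state: the lock guards the shared_map pointer. */
static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
static char *shared_map = NULL;

/* Install a new map. The old buffer is freed only after the pointer
 * has been swapped, so no reader can pick it up any more. */
int network_map_replace(const char *text)
{
    size_t len;
    char *fresh, *old;

    assert(text != NULL);
    len = strlen(text) + 1;
    fresh = malloc(len);
    if (fresh == NULL)
        return -1;
    memcpy(fresh, text, len);

    pthread_mutex_lock(&map_lock);
    old = shared_map;
    shared_map = fresh;
    pthread_mutex_unlock(&map_lock);

    free(old);
    return 0;
}

/* Return a private copy that the caller owns and must free().
 * Returns NULL if no map is installed or memory runs out, so callers
 * still have to check for NULL. */
char *network_map_snapshot(void)
{
    char *copy = NULL;

    pthread_mutex_lock(&map_lock);
    if (shared_map != NULL) {
        size_t len = strlen(shared_map) + 1;
        copy = malloc(len);
        if (copy != NULL)
            memcpy(copy, shared_map, len);
    }
    pthread_mutex_unlock(&map_lock);
    return copy;
}
```

Because readers never hold a pointer into the shared buffer, freeing it cannot pull the data out from under them; the price is one copy per reader.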
