Juha,

I have finally been able to review the material that came with this bug
report. Thanks for all the good info, but it looks like everything was
related to the $AllowedSender bug, not to the race condition (which I,
too, think exists).

... more inline below...

On Sat, 2009-01-10 at 20:12 +0200, Juha Koho wrote:
> For this issue (number 2.) I believe it could be a thread
> synchronization issue. The client that has had these problems is a
> quad core system and I installed other single core system with exactly
> the same configuration not running the recompiled version and it has
> been working perfectly since I installed it for at least a week ago.

Definitely. I have been trying to track down a nasty race condition (I
think it is one) for a while now. It seems to occur only on machines
with at least four cores, and not always. Unfortunately, I cannot
reproduce it myself. This is partly due to insufficient hardware: when I
had a suitable machine for a while, I saw the issue only once or twice,
at seemingly random times, and I could not draw any conclusion before I
had to return the machine. There are a few other reports, but for none
of them have I been able to obtain any information that points to the
culprit. I hope we will have better success in your case.

>  I
> don't think these issues are related either because my client used to
> crash at random times and not during reload.

Right, this one is different.

> 
> By the way. I'm actually using TCP to forward messages and I haven't
> tried UDP yet.

This doesn't seem to make a difference. I think I have tracked it down
to the code that either creates or destructs the message object, but
since I cannot reliably reproduce the problem, this is just an educated
guess. So the input may make a difference (but I don't think so).

The primary question I have at this time is whether you can reproduce
the bug without the $AllowedSender directive (or with the patch I
created for the cloned bug). If so, that would be a very good thing.
From there, we would need to change the config to see whether the bug
disappears when some settings are changed (I am a bit sceptical about
the async queue). That could then lead us to the right path, even
without being able to apply any debug settings. Oh, did I mention that
the bug almost instantly disappears if rsyslog is compiled for
debugging? I initially thought that was an artifact of limited
concurrency due to debug calls, but now I tend to believe that it
actually is due to reduced speed. So on an 8-core system we may have the
issue even in debug mode (someone with an 8-way system out there? ;)).
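
As a rough sketch of the kind of config change meant here: the fragment
below takes the asynchronous queues out of the picture by switching them
to direct (synchronous) mode. It assumes the legacy queue directives as
described in the rsyslog queue documentation and is not taken from the
reporter's actual config, which may set these differently or not at all.

```
# Assumption: legacy-style rsyslog.conf. Place these lines before the
# rules and actions they should affect.

# Run the main message queue synchronously instead of asynchronously.
$MainMsgQueueType Direct

# If a queue was configured for the forwarding action, put it back
# into direct mode as well (applies to the next action defined).
$ActionQueueType Direct
```

If the crashes stop with these settings, that would point towards the
queue code; if they continue, the queue is probably not the culprit.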

I guess the bug is quite basic, but it is very hard to find without
being able to reproduce it at will, or at least once a day, and in debug
mode...

Feedback appreciated,
Rainer




