I got a core file with 4.2.0

I did git checkout -f v4.2.0, ran configure --enable-imfile, and installed the
result.

I will go through the core file either later tonight or in the morning.

In this case it did take a while for it to die (over an hour).

David Lang

On Mon, 31 Aug 2009, Rainer Gerhards wrote:

> Date: Mon, 31 Aug 2009 12:50:49 +0200
> From: Rainer Gerhards <[email protected]>
> Reply-To: rsyslog-users <[email protected]>
> To: rsyslog-users <[email protected]>
> Subject: Re: [rsyslog] abort in 4.2.1
> 
> On Fri, 2009-08-28 at 14:55 -0700, [email protected] wrote:
>> On Fri, 28 Aug 2009, Rainer Gerhards wrote:
>>> Also, it would be good if you could build with --enable-rtinst
>>> --enable-debug and try out that version on your machine. I am a bit
>>> concerned about the speed of the resulting executable; it may be too slow.
>>> You do not need to run it in debug mode itself. These options (especially
>>> --enable-debug) will activate in-depth runtime checks (assert, will abort
>>> when something goes wrong) and my hope is that they will catch the bug
>>> closer to the root cause. If so, I would need the gdb abort info (actually
>>> enabling debug output would be an option some time later).
>>>
>>> Please let me know what would be OK with you.
>>
>> I will give this a try.
>>
>> I was going to suggest that since we have the message getting corrupted it
>> may make sense to make a temporary branch that has multiple message
>> buffers and at various times through the message processing it makes a
>> copy of the message to the buffer. When the system crashes I will be able
>> to look at the core and see where the message is getting corrupted.
>
> David, I fear it is even more complicated than that. It looks like not
> only the message got corrupted but the message object itself. There are
> already two copies of some of the message elements, and they also look
> inconsistent - except, if we really had a null message, that is one with
> no content at all (and generating a message object from a null message,
> I think, would be a bug in itself - but I am sure there are no such
> messages in your actual traffic). If you think there could be a real
> null message, I'd follow that path (will probably do so in any case...).
>
> I think that what really happens is that some part of the code runs
> wild, thus invalidating some random part of the main memory. At some
> times, it hits queue structures (or the message object that is held by
> them) and if so, we will see the abort you experience. With that
> scenario, duplicating the message buffer does not really help, because
> looking at the corrupted message object would not provide any additional
> information.
>
> However, if that's easy enough to reproduce, it would probably be good
> if you could send me the core analysis (the backtrace and the print
> statements) from a few (five maybe?) independent aborts. Maybe they show
> a pattern. It would probably be best to send them via private mail, as I am
> not sure if they disclose more than they should.
>
>>
>> I will see about doing a tcpdump at the time that I do this and send it to
>> you (I'll need to check with management, but since we have a contract in
>> place for other reasons I think we can do this)
>>
>
> That would probably be a good thing. I've made some progress with my
> testing tool, and I have created a basic version right now. Probably not
> good enough to mimic your traffic pattern, but closer. I have been doing a
> test run for quite some time now, unfortunately so far without an abort.
>
> Note that I ran into trouble with UDP - even though I've put some
> one-ms sleeps into the code, I lose a lot of messages, as it looks even
> before they hit the wire. It's always really troublesome to test with
> UDP...
>
> Rainer
>> I can't do this late on a Friday, but I should be able to do this Monday
>> afternoon.
>>
>> David Lang
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com
>
