On Mon, 31 Aug 2009, Rainer Gerhards wrote:

> quick question: do you have name resolution enabled on the system in
> question? I am asking because I just got a valgrind violation in my lab
> (but not an abort yet) that points into the name resolution area.

no, I run this with -x

David Lang

> Rainer
>
>> -----Original Message-----
>> From: [email protected] [mailto:rsyslog-
>> [email protected]] On Behalf Of Rainer Gerhards
>> Sent: Monday, August 31, 2009 12:51 PM
>> To: rsyslog-users
>> Subject: Re: [rsyslog] abort in 4.2.1
>>
>> On Fri, 2009-08-28 at 14:55 -0700, [email protected] wrote:
>>> On Fri, 28 Aug 2009, Rainer Gerhards wrote:
>>>> Also, it would be good if you could build with --enable-rtinst
>>>> --enable-debug and try out that version on your machine. I am a bit
>>>> concerned about the speed of the resulting executable, it may be too
>>>> slow. You do not need to run it in debug mode itself. These options
>>>> (especially --enable-debug) will activate in-depth runtime checks
>>>> (assert, will abort when something wrong happens) and my hope is
>>>> that they will catch the bug closer to the root cause. If so, I
>>>> would need the gdb abort info (actually enabling debug output would
>>>> be an option some time later).
>>>>
>>>> Please let me know what would be OK with you.
>>>
>>> I will give this a try.
>>>
>>> I was going to suggest that since we have the message getting
>>> corrupted it may make sense to make a temporary branch that has
>>> multiple message buffers and at various times through the message
>>> processing it makes a copy of the message to the buffer. when the
>>> system crashes I will be able to look at the core and see where the
>>> message is getting corrupted.
>>
>> David, I fear it is even more complicated than that. It looks like not
>> only the message got corrupted but the message object itself. There are
>> already two copies of some of the message elements, and they also look
>> inconsistent - except, if we really had a null message, that is one
>> with no content at all (and generating a message object from a null
>> message, I think, would be a bug in itself - but I am sure there are no
>> such messages in your actual traffic).
>> If you think there could be a real null message, I'd follow that path
>> (will probably do so in any case...).
>>
>> I think that what really happens is that some part of the code runs
>> wild, thus invalidating some random part of the main memory. At some
>> times, it hits queue structures (or the message object that is held by
>> them) and if so, we will see the abort you experience. With that
>> scenario, duplicating the message buffer does not really help, because
>> looking at the corrupted message object would not provide any
>> additional information.
>>
>> However, if that's easy enough to reproduce, it would probably be good
>> if you could send me the core analysis (the backtrace and the print
>> statements) from a few (five maybe?) independent aborts. Maybe they
>> show a pattern. It would probably be best to send them via private
>> mail, as I am not sure if they disclose more than they should.
>>
>>>
>>> I will see about doing a tcpdump at the time that I do this and send
>>> it to you (I'll need to check with management, but since we have a
>>> contract in place for other reasons I think we can do this)
>>>
>>
>> That would probably be a good thing. I've made some progress with my
>> testing tool, and I have created a basic version right now. Probably
>> not good enough to mimic your traffic pattern, but closer. I am doing a
>> test run for quite some time now, unfortunately so far without abort.
>>
>> Note that I run into the trouble with UDP - even though I've put some
>> one-ms sleeps into the code, I lose a lot of messages, as it looks even
>> before they hit the wire. It's always really troublesome to test with
>> UDP...
>>
>> Rainer
>>> I can't do this late on a friday, but I should be able to do this
>>> monday afternoon.
>>>
>>> David Lang
>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com
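
The UDP testing difficulty mentioned in the thread (messages lost even with one-ms sleeps between sends) can be illustrated with a small paced sender. This is only a sketch in Python, not the test tool discussed above, which the thread does not show; the function names, the message text, and the loopback setup are all assumptions made for illustration.

```python
import socket
import time


def send_paced_udp(host, port, count, delay_s=0.001):
    """Send `count` syslog-style UDP datagrams, sleeping `delay_s` between sends.

    Returns the number of datagrams handed to the kernel. That number says
    nothing about how many arrive, since UDP gives no delivery guarantee.
    """
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            # <13> is facility "user", severity "notice"; the host name,
            # tag, and text are made-up test values (hypothetical).
            payload = f"<13>testhost testtool: msg {seq:08d}".encode("ascii")
            sock.sendto(payload, (host, port))
            sent += 1
            time.sleep(delay_s)
    return sent


def count_received(sock, idle_timeout_s=0.5):
    """Drain a bound UDP socket and return the set of sequence numbers seen.

    Stops once no datagram has arrived for `idle_timeout_s` seconds.
    """
    seen = set()
    sock.settimeout(idle_timeout_s)
    while True:
        try:
            data, _addr = sock.recvfrom(65535)
        except socket.timeout:
            return seen
        # The sequence number is the last space-separated field.
        seen.add(int(data.rsplit(b" ", 1)[1]))
```

Binding a receiver socket on 127.0.0.1, calling `send_paced_udp` against its port, and comparing the returned count with the size of the set from `count_received` shows how many datagrams went missing; gaps in the sequence numbers show which ones.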

