Applied to four hosts where coredump had been spotted; rpm was built with the following instructions:
git clone git://git.adiscon.com/git/rsyslog.git cd rsyslog git checkout v5-stable ./autogen.sh cd .. mv rsyslog rsyslog-5.8.7 tar czf rsyslog-5.8.7-beta.tar.gz rsyslog-5.8.7 --exclude '.git*' rpmbuild -bs rsyslog.spec mock /usr/src/redhat/SRPMS/rsyslog-5.8.7-beta.el5.kc.src.rpm I will report back in two days. Thanks, Kaiwang 2012/1/9 Rainer Gerhards <[email protected]>: > Thanks for reporting and tracking down this bug. I was on vacation and have > just returned. I applied your patch, it is now merged: > > http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=791b16ce06d75944e338a6e5 > fa14c0394bde6f1d > > I'd appreciate if you could double-check that the git version is correct (you > never know...). I will release this version during the next days. > > Once again for your help! > > Rainer > >> -----Original Message----- >> From: [email protected] [mailto:rsyslog- >> [email protected]] On Behalf Of Kaiwang Chen >> Sent: Monday, January 09, 2012 9:29 AM >> To: rsyslog-users >> Subject: Re: [rsyslog] rsyslog-5.8.6 log entry corruption >> >> To report that coredump and message corruption disappeared after the >> application of my patch; I guess it's fixed. >> >> Thanks, >> Kaiwang >> >> 2012/1/5 Kaiwang Chen <[email protected]>: >> > Hello, >> > >> > I check ChangeLog and found 5.8.4 fixed some bug related to race >> > condition as discussed in >> > http://bugzilla.adiscon.com/show_bug.cgi?id=275 >> > >> > with patch protecting modification in the getMSGID(): >> > http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=d66f7b95e081ecb35 >> > 6cc230685c5b7c90f8590b4 >> > >> > >> > I guess the situation is almost identical to what in my case: >> > corruption is always at the APPNAME field and there is, I believe, a >> > typo in line 1955 from getAPPNAME() effectively ruins the mutex >> > proctection: >> > 1947 /* rgerhards, 2005-11-24 >> > 1948 */ >> > 1949 char *getAPPNAME(msg_t *pM, sbool bLockMutex) >> > 1950 { >> > 1951 uchar *pszRet; >> > 1952 >> > 1953 assert(pM != NULL); >> > 1954 if(bLockMutex == LOCK_MUTEX) >> > 1955 MsgUnlock(pM); /* should be MsgLock(pM) */ >> > 1956 prepareAPPNAME(pM, MUTEX_ALREADY_LOCKED); >> > 1957 if(pM->pCSAPPNAME == NULL) >> > 1958 pszRet = UCHAR_CONSTANT(""); >> > 1959 else >> > 1960 pszRet = rsCStrGetSzStrNoNULL(pM->pCSAPPNAME); >> > 1961 if(bLockMutex == LOCK_MUTEX) >> > 1962 MsgUnlock(pM); >> > 1963 return (char*)pszRet; >> > 1964 } >> > >> > Similar cases were spotted in getPROCID(), getStructuredData() and >> > getProgramName(). Hopefully the following patch against 5.8.6 tarball >> > should solve the problem, I will report back later. >> > --- rsyslog-5.8.6/runtime/msg.c.orig 2011-10-21 17:53:02.000000000 >> > +0800 >> > +++ rsyslog-5.8.6/runtime/msg.c 2012-01-05 21:54:26.594666510 +0800 >> > @@ -1609,7 +1609,7 @@ >> > >> > ISOBJ_TYPE_assert(pM, msg); >> > if(bLockMutex == LOCK_MUTEX) >> > - MsgUnlock(pM); >> > + MsgLock(pM); >> > preparePROCID(pM, MUTEX_ALREADY_LOCKED); >> > if(pM->pCSPROCID == NULL) >> > pszRet = UCHAR_CONSTANT(""); @@ -1846,7 +1846,7 @@ >> > { >> > uchar *pszRet; >> > >> > - MsgUnlock(pM); >> > + MsgLock(pM); >> > if(pM->pCSStrucData == NULL) >> > pszRet = UCHAR_CONSTANT("-"); >> > else >> > @@ -1894,7 +1894,7 @@ >> > uchar *pszRet; >> > >> > if(bLockMutex == LOCK_MUTEX) >> > - MsgUnlock(pM); >> > + MsgLock(pM); >> > prepareProgramName(pM, MUTEX_ALREADY_LOCKED); >> > if(pM->pCSProgName == NULL) >> > pszRet = UCHAR_CONSTANT(""); @@ -1952,7 +1952,7 @@ >> > >> > assert(pM != NULL); >> > if(bLockMutex == LOCK_MUTEX) >> > - MsgUnlock(pM); >> > + MsgLock(pM); >> > prepareAPPNAME(pM, MUTEX_ALREADY_LOCKED); >> > if(pM->pCSAPPNAME == NULL) >> > pszRet = UCHAR_CONSTANT(""); >> > >> > >> > By the way, many crashes led to the convergence, for example //// db1 >> > core.26581 signal 11, Segmentation fault. >> > (gdb) p *pThis >> > $1 = { >> > pBuf = 0x7f7f5c000cd0 "sshd\177\177", >> > pszBuf = 0x7f7f5c000dd0 "sshd", >> > iBufSize = 4, >> > iStrLen = 4 >> > } >> > (gdb) where >> > #0 0x000000000041e6a9 in rsCStrGetSzStr (pThis=0x131cb10) at >> > stringbuf.c:334 >> > #1 0x000000000041b1bf in getAPPNAME (pM=0x131c140, bLockMutex=1 >> > '\001') at msg.c:1960 >> > #2 0x000000000041c0f8 in MsgGetProp (pMsg=0x131c140, >> pTpe=0x12ef350, >> > propID=115 's', pPropLen=0x43fb8f00, >> > pbMustBeFreed=0xfefefefefefefeff) at msg.c:2429 >> > #3 0x000000000043be88 in tplToString (pTpl=0x12eeca0, pMsg=0x131c140, >> > ppBuf=0x12f7910, pLenBuf=0x12f7930) at ../template.c:119 >> > #4 0x0000000000438498 in prepareBatch (pAction=0x12f72d0, >> > pBatch=0x12f7898, pbShutdownImmediate=<value optimized out>) >> > at ../action.c:738 >> > #5 processBatchMain (pAction=0x12f72d0, pBatch=0x12f7898, >> > pbShutdownImmediate=<value optimized out>) at ../action.c:1163 >> > #6 0x000000000043190d in ConsumerReg (pThis=0x12f73f0, >> > pWti=0x12f7870) at queue.c:1705 >> > #7 0x000000000042b0bd in wtiWorker (pThis=0x12f7870) at wti.c:315 >> > #8 0x000000000042abd8 in wtpWorker (arg=0x12f7870) at wtp.c:387 >> > #9 0x00000031d5a06367 in __nptl_setxid () from /lib64/libpthread.so.0 >> > #10 0x0000000000000000 in ?? () >> > >> > >> > >> > Thanks, >> > Kaiwang >> > >> > 2011/12/31 Kaiwang Chen <[email protected]>: >> >> Even worse case >> >> <6>1 2011-12-31T12:04:05.036812+08:00 squid6 kernel - - imklog >> >> 5.8.6, log source = /proc/kmsg started. >> >> ...several SYN flooding entries in the same second... >> >> *** glibc detected *** /sbin/rsyslogd: free(): invalid next size >> >> (fast): 0x00007fafb8000e00 *** >> >> *** glibc detected *** /sbin/rsyslogd: malloc(): memory corruption: >> >> 0x00007fafb8000e20 *** >> >> >> >> And possibly hours later the host was not responsive to ssh >> >> connection requests, and even TTY login attempts failed... >> >> Interestingly the debug log in another file continues a little beyond: >> >> >> >> Debug line with all properties: >> >> FROMHOST: 'squid6', fromhost-ip: '127.0.0.1', HOSTNAME: 'squid6', >> >> PRI: 47, syslogtag 'rsyslogd-pstats:', programname: >> >> 'rsyslogd-pstats', >> >> APP-NAME: 'rsyslogd-pstats', PROCID: '', MSGID: '-', >> >> TIMESTAMP: 'Dec 31 12:24:05', STRUCTURED-DATA: '-', >> >> msg: 'main Q[DA]: size=0 enqueued=0 full=0 maxqsize=0 ' >> >> escaped msg: 'main Q[DA]: size=0 enqueued=0 full=0 maxqsize=0 ' >> >> inputname: impstats rawmsg: 'main Q[DA]: size=0 enqueued=0 full=0 >> maxqsize=0 ' >> >> >> >> Debug line with all properties: >> >> FROMHOST: 'squid6', fromhost-ip: '127.0.0.1', HOSTNAME: 'squid6', >> >> PRI: 47, syslogtag 'rsyslogd-pstats:', programname: >> >> 'rsyslogd-pstats', >> >> APP-NAME: 'rsyslogd-pstats', PROCID: '', MSGID: '-', >> >> TIMESTAMP: 'Dec 31 12:24:05', STRUCTURED-DATA: '-', >> >> msg: 'main Q: size=4 enqueued=30 full=0 maxqsize=5 ' >> >> escaped msg: 'main Q: size=4 enqueued=30 full=0 maxqsize=5 ' >> >> inputname: impstats rawmsg: 'main Q: size=4 enqueued=30 full=0 >> maxqsize=5 ' >> >> >> >> Fortunately the squid service on that host was not badly affected. So >> >> I did graceful failover and reboot the host, everything became OK. >> >> Although there was no direct timestamp associated, I guess the >> >> problem occured soon after rsyslogd started. I also spotted several >> >> corrupted messages on other hosts when rsyslogd started. They are same >> version: >> >> >> >> # rsyslogd -v >> >> rsyslogd 5.8.6, compiled with: >> >> FEATURE_REGEXP: Yes >> >> FEATURE_LARGEFILE: No >> >> GSSAPI Kerberos 5 support: Yes >> >> FEATURE_DEBUG (debug build, slow code): No >> >> 32bit Atomic operations supported: Yes >> >> 64bit Atomic operations supported: Yes >> >> Runtime Instrumentation (slow code): No >> >> >> >> >> >> Thanks, >> >> Kaiwang >> >> >> >> 2011/12/31 Kaiwang Chen <[email protected]>: >> >>> Also spotted corrupted messages from imuxsock: >> >>> >> >>> Debug line with all properties: >> >>> FROMHOST: 'squid9', fromhost-ip: '127.0.0.1', HOSTNAME: 'squid9', >> >>> PRI: 47, syslogtag 'rsyslogd-pstats:', programname: '<88>+?<99>2 >> >>> <88>+?<99>rssyslogd-pstats', APP-NAME: '', PROCID: '', MSGID: '-', >> >>> TIMESTAMP: 'Dec 30 21:46:14', STRUCTURED-DATA: '-', >> >>> msg: 'imuxsock: submitted=429 ratelimit.discarded=0 >> >>> ratelimit.numratelimiters=213 ' >> >>> escaped msg: 'imuxsock: submitted=429 ratelimit.discarded=0 >> >>> ratelimit.numratelimiters=213 ' >> >>> inputname: impstats rawmsg: 'imuxsock: submitted=429 >> >>> ratelimit.discarded=0 ratelimit.numratelimiters=213 ' >> >>> >> >>> >> >>> Thanks, >> >>> Kaiwang >> >>> >> >>> 2011/12/27 Kaiwang Chen <[email protected]>: >> >>>> 2011/12/25 <[email protected]>: >> >>>>> On Fri, 23 Dec 2011, Kaiwang Chen wrote: >> >>>>> >> >>>>>> Hi all, >> >>>>>> >> >>>>>> I found rsyslogd occasionally produced corrupted log entries like >> >>>>>> >> >>>>>> <6>1 2011-12-23T23:03:18.089938+08:00 db1 <D0>#001 D^kernel - - >> >>>>>> device eth0 entered promiscuous mode >> >>>>>> >> >>>>>> I believe that problem appeared in earlier versions including 5.8.2. >> >>>>>> Looks like corruption never occur before hostname field. Is it a >> >>>>>> reported bug? Any clue? >> >>>>> >> >>>>> >> >>>>> the question is if this problem is in the raw message being sent >> >>>>> to you, or is it something added by the rsyslog processing. >> >>>>> >> >>>>> I would suggest adding something along the following line. >> >>>>> >> >>>>> :rawmesg, :contains, "#001" /var/log/badmessages:RSYSLOG_DEBUG >> >>>>> >> >>>>> this will look for any messages with the hex 01 character in them >> >>>>> and spit out all the info that you can use about the message in to >> >>>>> the file /var/log/badmessages >> >>>>> >> >>>>> run this for a little while and look to see what the raw message >> >>>>> that is being received over the wire looks like. If it's bad, then >> >>>>> you need to look at the sender. If the raw message looks sane, but >> >>>>> rsyslog isn't handling it right, then we can troubleshoot from there. >> >>>> >> >>>> Looks like it's related to tcp transmission. On the original server >> >>>> that get input from unix socket, it is OK; while on the centeral >> >>>> log server, the raw message is already corrupted. >> >>>> >> >>>> ===== on central log server >> >>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - - Connection >> >>>> from >> >>>> UDP: [172.25.0.230]:53547 >> >>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - - Received >> SNMP >> >>>> packet(s) from UDP: [172.25.0.230]:53547 >> >>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 2#032#025<D5>1 - - >> >>>> Connection from UDP: [172.25.0.230]:53547 >> >>>> >> >>>> Debug line with all properties: >> >>>> FROMHOST: '172.25.0.71', fromhost-ip: '172.25.0.71', HOSTNAME: >> >>>> 'gw71', PRI: 30, syslogtag 'snmpd[24254]', programname: 'snmpd', >> >>>> APP-NAME: 'snmpd', >> >>>> PROCID: '24254', MSGID: '-', >> >>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-', >> >>>> msg: ' Connection from UDP: [172.25.0.230]:53547' >> >>>> escaped msg: ' Connection from UDP: [172.25.0.230]:53547' >> >>>> inputname: imptcp rawmsg: '<30>1 2011-12-27T14:06:56+08:00 gw71 >> >>>> snmpd >> >>>> 24254 - - Connection from UDP: [172.25.0.230]:53547' >> >>>> >> >>>> Debug line with all properties: >> >>>> FROMHOST: '172.25.0.71', fromhost-ip: '172.25.0.71', HOSTNAME: >> >>>> 'gw71', PRI: 30, syslogtag 'snmpd[24254]', programname: 'snmpd', >> >>>> APP-NAME: 'snmpd', >> >>>> PROCID: '24254', MSGID: '-', >> >>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-', >> >>>> msg: ' Received SNMP packet(s) from UDP: [172.25.0.230]:53547' >> >>>> escaped msg: ' Received SNMP packet(s) from UDP: >> [172.25.0.230]:53547' >> >>>> inputname: imptcp rawmsg: '<30>1 2011-12-27T14:06:56+08:00 gw71 >> >>>> snmpd >> >>>> 24254 - - Received SNMP packet(s) from UDP: [172.25.0.230]:53547' >> >>>> >> >>>> Debug line with all properties: >> >>>> FROMHOST: '172.25.0.71', fromhost-ip: '172.25.0.71', HOSTNAME: >> >>>> 'gw71', PRI: 30, syslogtag 'snmpd[2#032#025<D5>1]', programname: >> 'snmpd', APP-NAME: >> >>>> 'snmpd', PROCID: '2#032#025<D5>1', MSGID: '-', >> >>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-', >> >>>> msg: ' Connection from UDP: [172.25.0.230]:53547' >> >>>> escaped msg: ' Connection from UDP: [172.25.0.230]:53547' >> >>>> inputname: imptcp rawmsg: '<30>1 2011-12-27T14:06:56+08:00 gw71 >> >>>> snmpd >> >>>> 2#032#025<D5>1 - - Connection from UDP: [172.25.0.230]:53547' >> >>>> >> >>>> ===== on the orginal host (*.* @@(o)172.25.0.230:514) >> >>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - - Connection >> >>>> from >> >>>> UDP: [172.25.0.230]:53547 >> >>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - - Received >> SNMP >> >>>> packet(s) from UDP: [172.25.0.230]:53547 >> >>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - - Connection >> >>>> from >> >>>> UDP: [172.25.0.230]:53547 >> >>>> >> >>>> Debug line with all properties: >> >>>> FROMHOST: 'gw71', fromhost-ip: '127.0.0.1', HOSTNAME: 'gw71', PRI: >> >>>> 30, syslogtag 'snmpd[24254]:', programname: 'snmpd', APP-NAME: >> >>>> 'snmpd', >> >>>> PROCID: '24254', MSGID: '-', >> >>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-', >> >>>> msg: ' Connection from UDP: [172.25.0.230]:53547' >> >>>> escaped msg: ' Connection from UDP: [172.25.0.230]:53547' >> >>>> inputname: imuxsock rawmsg: '<30>Dec 27 14:06:56 snmpd[24254]: >> >>>> Connection from UDP: [172.25.0.230]:53547' >> >>>> >> >>>> Debug line with all properties: >> >>>> FROMHOST: 'gw71', fromhost-ip: '127.0.0.1', HOSTNAME: 'gw71', PRI: >> >>>> 30, syslogtag 'snmpd[24254]:', programname: 'snmpd', APP-NAME: >> >>>> 'snmpd', >> >>>> PROCID: '24254', MSGID: '-', >> >>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-', >> >>>> msg: ' Received SNMP packet(s) from UDP: [172.25.0.230]:53547' >> >>>> escaped msg: ' Received SNMP packet(s) from UDP: >> [172.25.0.230]:53547' >> >>>> inputname: imuxsock rawmsg: '<30>Dec 27 14:06:56 snmpd[24254]: >> >>>> Received SNMP packet(s) from UDP: [172.25.0.230]:53547' >> >>>> >> >>>> Debug line with all properties: >> >>>> FROMHOST: 'gw71', fromhost-ip: '127.0.0.1', HOSTNAME: 'gw71', PRI: >> >>>> 30, syslogtag 'snmpd[24254]:', programname: 'snmpd', APP-NAME: >> >>>> 'snmpd', >> >>>> PROCID: '24254', MSGID: '-', >> >>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-', >> >>>> msg: ' Connection from UDP: [172.25.0.230]:53547' >> >>>> escaped msg: ' Connection from UDP: [172.25.0.230]:53547' >> >>>> inputname: imuxsock rawmsg: '<30>Dec 27 14:06:56 snmpd[24254]: >> >>>> Connection from UDP: [172.25.0.230]:53547' >> >>>> >> >>>> >> >>>> Thanks, >> >>>> Kaiwang >> >>>>> >> >>>>> David Lang >> >>>>> >> >>>>> > rsyslogd 5.8.6, compiled with: >> >>>>>> >> >>>>>> FEATURE_REGEXP: Yes >> >>>>>> FEATURE_LARGEFILE: No >> >>>>>> GSSAPI Kerberos 5 support: Yes >> >>>>>> FEATURE_DEBUG (debug build, slow code): No >> >>>>>> 32bit Atomic operations supported: Yes >> >>>>>> 64bit Atomic operations supported: Yes >> >>>>>> Runtime Instrumentation (slow code): No >> >>>>>> >> >>>>>> /etc/rsyslog.conf >> >>>>>> $ActionFileDefaultTemplate RSYSLOG_SyslogProtocol23Format >> >>>>>> $ActionForwardDefaultTemplate RSYSLOG_SyslogProtocol23Format >> >>>>>> >> >>>>>> $ModLoad imklog >> >>>>>> $ModLoad imuxsock >> >>>>>> $ModLoad impstats >> >>>>>> >> >>>>>> $SystemLogSocketIgnoreMsgTimestamp off >> >>>>>> $SystemLogUsePIDFromSystem on >> >>>>>> >> >>>>>> $PStatInterval 600 >> >>>>>> $PStatSeverity 7 >> >>>>>> >> >>>>>> $WorkDirectory /var/spool/rsyslog >> >>>>>> >> >>>>>> $MainMsgQueueSaveOnShutdown on >> >>>>>> $MainMsgQueueFileName mq >> >>>>>> $MainMsgQueueMaxFileSize 5m >> >>>>>> >> >>>>>> $ActionQueueType LinkedList >> >>>>>> $ActionQueueSaveOnShutdown on >> >>>>>> $ActionQueueFileName dbq >> >>>>>> $ActionQueueMaxFileSize 10m >> >>>>>> $ActionResumeRetryCount -1 >> >>>>>> *.* @@(o)10.2.3.4 >> >>>>>> >> >>>>>> >> >>>>>> # Log all kernel messages to the console. >> >>>>>> # Logging much else clutters up the screen. >> >>>>>> #kern.* >> >>>>>> /dev/console >> >>>>>> >> >>>>>> # Log anything (except mail) of level info or higher. >> >>>>>> # Don't log private authentication messages! >> >>>>>> *.info;mail.none;authpriv.none;cron.none >> >>>>>> /var/log/messages >> >>>>>> >> >>>>>> # The authpriv file has restricted access. >> >>>>>> authpriv.* >> >>>>>> /var/log/secure >> >>>>>> >> >>>>>> # Log all the mail messages in one place. >> >>>>>> mail.* >> >>>>>> -/var/log/maillog >> >>>>>> >> >>>>>> >> >>>>>> # Log cron stuff >> >>>>>> cron.* >> >>>>>> /var/log/cron >> >>>>>> >> >>>>>> # Everybody gets emergency messages >> *.emerg >> >>>>>> * >> >>>>>> >> >>>>>> # Save news errors of level crit and higher in a special file. >> >>>>>> uucp,news.crit >> >>>>>> /var/log/spooler >> >>>>>> >> >>>>>> # Save boot messages also to boot.log >> >>>>>> local7.* >> >>>>>> /var/log/boot.log >> >>>>>> >> >>>>>> >> >>>>>> Thanks, >> >>>>>> Kaiwang >> >>>>>> _______________________________________________ >> >>>>>> rsyslog mailing list >> >>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >> >>>>>> http://www.rsyslog.com/professional-services/ >> >>>>>> >> >>>>> _______________________________________________ >> >>>>> rsyslog mailing list >> >>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >> >>>>> http://www.rsyslog.com/professional-services/ >> _______________________________________________ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com/professional-services/ > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com/professional-services/ _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com/professional-services/

