Hello,

I check ChangeLog and found 5.8.4 fixed some bug related to race
condition as discussed in
http://bugzilla.adiscon.com/show_bug.cgi?id=275

with patch protecting modification in the getMSGID():
http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=d66f7b95e081ecb356cc230685c5b7c90f8590b4


I guess the situation is almost identical to what in my case:
corruption is always at the APPNAME field and there is, I believe, a
typo in line 1955 from getAPPNAME() effectively ruins the mutex
proctection:
1947 /* rgerhards, 2005-11-24
1948  */
1949 char *getAPPNAME(msg_t *pM, sbool bLockMutex)
1950 {
1951         uchar *pszRet;
1952
1953         assert(pM != NULL);
1954         if(bLockMutex == LOCK_MUTEX)
1955                 MsgUnlock(pM);  /* should be MsgLock(pM) */
1956         prepareAPPNAME(pM, MUTEX_ALREADY_LOCKED);
1957         if(pM->pCSAPPNAME == NULL)
1958                 pszRet = UCHAR_CONSTANT("");
1959         else
1960                 pszRet = rsCStrGetSzStrNoNULL(pM->pCSAPPNAME);
1961         if(bLockMutex == LOCK_MUTEX)
1962                 MsgUnlock(pM);
1963         return (char*)pszRet;
1964 }

Similar cases were spotted in getPROCID(), getStructuredData() and
getProgramName().  Hopefully the following patch against 5.8.6 tarball
should solve the problem, I will report back later.
--- rsyslog-5.8.6/runtime/msg.c.orig    2011-10-21 17:53:02.000000000 +0800
+++ rsyslog-5.8.6/runtime/msg.c 2012-01-05 21:54:26.594666510 +0800
@@ -1609,7 +1609,7 @@

        ISOBJ_TYPE_assert(pM, msg);
        if(bLockMutex == LOCK_MUTEX)
-               MsgUnlock(pM);
+               MsgLock(pM);
        preparePROCID(pM, MUTEX_ALREADY_LOCKED);
        if(pM->pCSPROCID == NULL)
                pszRet = UCHAR_CONSTANT("");
@@ -1846,7 +1846,7 @@
 {
        uchar *pszRet;

-       MsgUnlock(pM);
+       MsgLock(pM);
        if(pM->pCSStrucData == NULL)
                pszRet = UCHAR_CONSTANT("-");
        else
@@ -1894,7 +1894,7 @@
        uchar *pszRet;

        if(bLockMutex == LOCK_MUTEX)
-               MsgUnlock(pM);
+               MsgLock(pM);
        prepareProgramName(pM, MUTEX_ALREADY_LOCKED);
        if(pM->pCSProgName == NULL)
                pszRet = UCHAR_CONSTANT("");
@@ -1952,7 +1952,7 @@

        assert(pM != NULL);
        if(bLockMutex == LOCK_MUTEX)
-               MsgUnlock(pM);
+               MsgLock(pM);
        prepareAPPNAME(pM, MUTEX_ALREADY_LOCKED);
        if(pM->pCSAPPNAME == NULL)
                pszRet = UCHAR_CONSTANT("");


By the way, many crashes led to the convergence, for example
//// db1 core.26581 signal 11, Segmentation fault.
(gdb) p *pThis
$1 = {
  pBuf = 0x7f7f5c000cd0 "sshd\177\177",
  pszBuf = 0x7f7f5c000dd0 "sshd",
  iBufSize = 4,
  iStrLen = 4
}
(gdb) where
#0  0x000000000041e6a9 in rsCStrGetSzStr (pThis=0x131cb10) at stringbuf.c:334
#1  0x000000000041b1bf in getAPPNAME (pM=0x131c140, bLockMutex=1
'\001') at msg.c:1960
#2  0x000000000041c0f8 in MsgGetProp (pMsg=0x131c140, pTpe=0x12ef350,
propID=115 's', pPropLen=0x43fb8f00,
    pbMustBeFreed=0xfefefefefefefeff) at msg.c:2429
#3  0x000000000043be88 in tplToString (pTpl=0x12eeca0, pMsg=0x131c140,
ppBuf=0x12f7910, pLenBuf=0x12f7930) at ../template.c:119
#4  0x0000000000438498 in prepareBatch (pAction=0x12f72d0,
pBatch=0x12f7898, pbShutdownImmediate=<value optimized out>)
    at ../action.c:738
#5  processBatchMain (pAction=0x12f72d0, pBatch=0x12f7898,
pbShutdownImmediate=<value optimized out>) at ../action.c:1163
#6  0x000000000043190d in ConsumerReg (pThis=0x12f73f0,
pWti=0x12f7870) at queue.c:1705
#7  0x000000000042b0bd in wtiWorker (pThis=0x12f7870) at wti.c:315
#8  0x000000000042abd8 in wtpWorker (arg=0x12f7870) at wtp.c:387
#9  0x00000031d5a06367 in __nptl_setxid () from /lib64/libpthread.so.0
#10 0x0000000000000000 in ?? ()



Thanks,
Kaiwang

2011/12/31 Kaiwang Chen <[email protected]>:
> Even worse case
> <6>1 2011-12-31T12:04:05.036812+08:00 squid6 kernel  - - imklog 5.8.6,
> log source = /proc/kmsg started.
> ...several SYN flooding entries in the same second...
> *** glibc detected *** /sbin/rsyslogd: free(): invalid next size
> (fast): 0x00007fafb8000e00 ***
> *** glibc detected *** /sbin/rsyslogd: malloc(): memory corruption:
> 0x00007fafb8000e20 ***
>
>  And possibly hours later the host was not responsive to ssh
> connection requests, and even TTY login attempts failed...
> Interestingly the debug log in another file continues a little beyond:
>
> Debug line with all properties:
> FROMHOST: 'squid6', fromhost-ip: '127.0.0.1', HOSTNAME: 'squid6', PRI: 47,
> syslogtag 'rsyslogd-pstats:', programname: 'rsyslogd-pstats',
> APP-NAME: 'rsyslogd-pstats', PROCID: '', MSGID: '-',
> TIMESTAMP: 'Dec 31 12:24:05', STRUCTURED-DATA: '-',
> msg: 'main Q[DA]: size=0 enqueued=0 full=0 maxqsize=0 '
> escaped msg: 'main Q[DA]: size=0 enqueued=0 full=0 maxqsize=0 '
> inputname: impstats rawmsg: 'main Q[DA]: size=0 enqueued=0 full=0 maxqsize=0 '
>
> Debug line with all properties:
> FROMHOST: 'squid6', fromhost-ip: '127.0.0.1', HOSTNAME: 'squid6', PRI: 47,
> syslogtag 'rsyslogd-pstats:', programname: 'rsyslogd-pstats',
> APP-NAME: 'rsyslogd-pstats', PROCID: '', MSGID: '-',
> TIMESTAMP: 'Dec 31 12:24:05', STRUCTURED-DATA: '-',
> msg: 'main Q: size=4 enqueued=30 full=0 maxqsize=5 '
> escaped msg: 'main Q: size=4 enqueued=30 full=0 maxqsize=5 '
> inputname: impstats rawmsg: 'main Q: size=4 enqueued=30 full=0 maxqsize=5 '
>
> Fortunately the squid service on that host was not badly affected. So
> I did graceful failover and reboot the host, everything became OK.
> Although there was no direct timestamp associated, I guess the problem
> occured soon after rsyslogd started. I also spotted several corrupted
> messages on other hosts when rsyslogd started. They are same version:
>
> # rsyslogd -v
> rsyslogd 5.8.6, compiled with:
>        FEATURE_REGEXP:                         Yes
>        FEATURE_LARGEFILE:                      No
>        GSSAPI Kerberos 5 support:              Yes
>        FEATURE_DEBUG (debug build, slow code): No
>        32bit Atomic operations supported:      Yes
>        64bit Atomic operations supported:      Yes
>        Runtime Instrumentation (slow code):    No
>
>
> Thanks,
> Kaiwang
>
> 2011/12/31 Kaiwang Chen <[email protected]>:
>> Also spotted corrupted messages from imuxsock:
>>
>> Debug line with all properties:
>> FROMHOST: 'squid9', fromhost-ip: '127.0.0.1', HOSTNAME: 'squid9', PRI: 47,
>> syslogtag 'rsyslogd-pstats:', programname: '<88>+?<99>2
>> <88>+?<99>rssyslogd-pstats', APP-NAME: '', PROCID: '', MSGID: '-',
>> TIMESTAMP: 'Dec 30 21:46:14', STRUCTURED-DATA: '-',
>> msg: 'imuxsock: submitted=429 ratelimit.discarded=0
>> ratelimit.numratelimiters=213 '
>> escaped msg: 'imuxsock: submitted=429 ratelimit.discarded=0
>> ratelimit.numratelimiters=213 '
>> inputname: impstats rawmsg: 'imuxsock: submitted=429
>> ratelimit.discarded=0 ratelimit.numratelimiters=213 '
>>
>>
>> Thanks,
>> Kaiwang
>>
>> 2011/12/27 Kaiwang Chen <[email protected]>:
>>> 2011/12/25  <[email protected]>:
>>>> On Fri, 23 Dec 2011, Kaiwang Chen wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I found rsyslogd occasionally produced corrupted log entries like
>>>>>
>>>>> <6>1 2011-12-23T23:03:18.089938+08:00 db1 <D0>#001 D^kernel  - -
>>>>> device eth0 entered promiscuous mode
>>>>>
>>>>> I believe that problem appeared in earlier versions including 5.8.2.
>>>>> Looks like corruption never occur before hostname field. Is it a
>>>>> reported bug? Any clue?
>>>>
>>>>
>>>> the question is if this problem is in the raw message being sent to you, or
>>>> is it something added by the rsyslog processing.
>>>>
>>>> I would suggest adding something along the following line.
>>>>
>>>> :rawmesg, :contains, "#001"  /var/log/badmessages:RSYSLOG_DEBUG
>>>>
>>>> this will look for any messages with the hex 01 character in them and spit
>>>> out all the info that you can use about the message in to the file
>>>> /var/log/badmessages
>>>>
>>>> run this for a little while and look to see what the raw message that is
>>>> being received over the wire looks like. If it's bad, then you need to look
>>>> at the sender. If the raw message looks sane, but rsyslog isn't handling it
>>>> right, then we can troubleshoot from there.
>>>
>>> Looks like it's related to tcp transmission. On the original server
>>> that get input from unix socket, it is OK; while on the centeral log
>>> server, the raw message is already corrupted.
>>>
>>> ===== on central log server
>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - -  Connection from
>>> UDP: [172.25.0.230]:53547
>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - -  Received SNMP
>>> packet(s) from UDP: [172.25.0.230]:53547
>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 2#032#025<D5>1 - -
>>> Connection from UDP: [172.25.0.230]:53547
>>>
>>> Debug line with all properties:
>>> FROMHOST: '172.25.0.71', fromhost-ip: '172.25.0.71', HOSTNAME: 'gw71', PRI: 
>>> 30,
>>> syslogtag 'snmpd[24254]', programname: 'snmpd', APP-NAME: 'snmpd',
>>> PROCID: '24254', MSGID: '-',
>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-',
>>> msg: ' Connection from UDP: [172.25.0.230]:53547'
>>> escaped msg: ' Connection from UDP: [172.25.0.230]:53547'
>>> inputname: imptcp rawmsg: '<30>1 2011-12-27T14:06:56+08:00 gw71 snmpd
>>> 24254 - -  Connection from UDP: [172.25.0.230]:53547'
>>>
>>> Debug line with all properties:
>>> FROMHOST: '172.25.0.71', fromhost-ip: '172.25.0.71', HOSTNAME: 'gw71', PRI: 
>>> 30,
>>> syslogtag 'snmpd[24254]', programname: 'snmpd', APP-NAME: 'snmpd',
>>> PROCID: '24254', MSGID: '-',
>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-',
>>> msg: ' Received SNMP packet(s) from UDP: [172.25.0.230]:53547'
>>> escaped msg: ' Received SNMP packet(s) from UDP: [172.25.0.230]:53547'
>>> inputname: imptcp rawmsg: '<30>1 2011-12-27T14:06:56+08:00 gw71 snmpd
>>> 24254 - -  Received SNMP packet(s) from UDP: [172.25.0.230]:53547'
>>>
>>> Debug line with all properties:
>>> FROMHOST: '172.25.0.71', fromhost-ip: '172.25.0.71', HOSTNAME: 'gw71', PRI: 
>>> 30,
>>> syslogtag 'snmpd[2#032#025<D5>1]', programname: 'snmpd', APP-NAME:
>>> 'snmpd', PROCID: '2#032#025<D5>1', MSGID: '-',
>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-',
>>> msg: ' Connection from UDP: [172.25.0.230]:53547'
>>> escaped msg: ' Connection from UDP: [172.25.0.230]:53547'
>>> inputname: imptcp rawmsg: '<30>1 2011-12-27T14:06:56+08:00 gw71 snmpd
>>> 2#032#025<D5>1 - -  Connection from UDP: [172.25.0.230]:53547'
>>>
>>> ===== on the orginal host  (*.*       @@(o)172.25.0.230:514)
>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - -  Connection from
>>> UDP: [172.25.0.230]:53547
>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - -  Received SNMP
>>> packet(s) from UDP: [172.25.0.230]:53547
>>> <30>1 2011-12-27T14:06:56+08:00 gw71 snmpd 24254 - -  Connection from
>>> UDP: [172.25.0.230]:53547
>>>
>>> Debug line with all properties:
>>> FROMHOST: 'gw71', fromhost-ip: '127.0.0.1', HOSTNAME: 'gw71', PRI: 30,
>>> syslogtag 'snmpd[24254]:', programname: 'snmpd', APP-NAME: 'snmpd',
>>> PROCID: '24254', MSGID: '-',
>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-',
>>> msg: ' Connection from UDP: [172.25.0.230]:53547'
>>> escaped msg: ' Connection from UDP: [172.25.0.230]:53547'
>>> inputname: imuxsock rawmsg: '<30>Dec 27 14:06:56 snmpd[24254]:
>>> Connection from UDP: [172.25.0.230]:53547'
>>>
>>> Debug line with all properties:
>>> FROMHOST: 'gw71', fromhost-ip: '127.0.0.1', HOSTNAME: 'gw71', PRI: 30,
>>> syslogtag 'snmpd[24254]:', programname: 'snmpd', APP-NAME: 'snmpd',
>>> PROCID: '24254', MSGID: '-',
>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-',
>>> msg: ' Received SNMP packet(s) from UDP: [172.25.0.230]:53547'
>>> escaped msg: ' Received SNMP packet(s) from UDP: [172.25.0.230]:53547'
>>> inputname: imuxsock rawmsg: '<30>Dec 27 14:06:56 snmpd[24254]:
>>> Received SNMP packet(s) from UDP: [172.25.0.230]:53547'
>>>
>>> Debug line with all properties:
>>> FROMHOST: 'gw71', fromhost-ip: '127.0.0.1', HOSTNAME: 'gw71', PRI: 30,
>>> syslogtag 'snmpd[24254]:', programname: 'snmpd', APP-NAME: 'snmpd',
>>> PROCID: '24254', MSGID: '-',
>>> TIMESTAMP: 'Dec 27 14:06:56', STRUCTURED-DATA: '-',
>>> msg: ' Connection from UDP: [172.25.0.230]:53547'
>>> escaped msg: ' Connection from UDP: [172.25.0.230]:53547'
>>> inputname: imuxsock rawmsg: '<30>Dec 27 14:06:56 snmpd[24254]:
>>> Connection from UDP: [172.25.0.230]:53547'
>>>
>>>
>>> Thanks,
>>> Kaiwang
>>>>
>>>> David Lang
>>>>
>>>>  > rsyslogd 5.8.6, compiled with:
>>>>>
>>>>>       FEATURE_REGEXP:                         Yes
>>>>>       FEATURE_LARGEFILE:                      No
>>>>>       GSSAPI Kerberos 5 support:              Yes
>>>>>       FEATURE_DEBUG (debug build, slow code): No
>>>>>       32bit Atomic operations supported:      Yes
>>>>>       64bit Atomic operations supported:      Yes
>>>>>       Runtime Instrumentation (slow code):    No
>>>>>
>>>>> /etc/rsyslog.conf
>>>>> $ActionFileDefaultTemplate RSYSLOG_SyslogProtocol23Format
>>>>> $ActionForwardDefaultTemplate RSYSLOG_SyslogProtocol23Format
>>>>>
>>>>> $ModLoad imklog
>>>>> $ModLoad imuxsock
>>>>> $ModLoad impstats
>>>>>
>>>>> $SystemLogSocketIgnoreMsgTimestamp  off
>>>>> $SystemLogUsePIDFromSystem on
>>>>>
>>>>> $PStatInterval 600
>>>>> $PStatSeverity 7
>>>>>
>>>>> $WorkDirectory /var/spool/rsyslog
>>>>>
>>>>> $MainMsgQueueSaveOnShutdown on
>>>>> $MainMsgQueueFileName mq
>>>>> $MainMsgQueueMaxFileSize 5m
>>>>>
>>>>> $ActionQueueType LinkedList
>>>>> $ActionQueueSaveOnShutdown on
>>>>> $ActionQueueFileName dbq
>>>>> $ActionQueueMaxFileSize 10m
>>>>> $ActionResumeRetryCount -1
>>>>> *.*       @@(o)10.2.3.4
>>>>>
>>>>>
>>>>> # Log all kernel messages to the console.
>>>>> # Logging much else clutters up the screen.
>>>>> #kern.*                                                 /dev/console
>>>>>
>>>>> # Log anything (except mail) of level info or higher.
>>>>> # Don't log private authentication messages!
>>>>> *.info;mail.none;authpriv.none;cron.none                /var/log/messages
>>>>>
>>>>> # The authpriv file has restricted access.
>>>>> authpriv.*                                              /var/log/secure
>>>>>
>>>>> # Log all the mail messages in one place.
>>>>> mail.*                                                  -/var/log/maillog
>>>>>
>>>>>
>>>>> # Log cron stuff
>>>>> cron.*                                                  /var/log/cron
>>>>>
>>>>> # Everybody gets emergency messages
>>>>> *.emerg                                                 *
>>>>>
>>>>> # Save news errors of level crit and higher in a special file.
>>>>> uucp,news.crit                                          /var/log/spooler
>>>>>
>>>>> # Save boot messages also to boot.log
>>>>> local7.*                                                /var/log/boot.log
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Kaiwang
>>>>> _______________________________________________
>>>>> rsyslog mailing list
>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>>> http://www.rsyslog.com/professional-services/
>>>>>
>>>> _______________________________________________
>>>> rsyslog mailing list
>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>> http://www.rsyslog.com/professional-services/
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/

Reply via email to