Please give v4 a try, the bug is potentially fixed there. If it actually is, this gives me a clue of where to look at.
Rainer > -----Original Message----- > From: [email protected] > [mailto:[email protected]] On Behalf Of Nathan March > Sent: Saturday, June 06, 2009 12:09 AM > To: rsyslog-users > Subject: Re: [rsyslog] Possible bug? 100% cpu on a server > thread and the client stops sending data > > Any hope of this bug getting some attention soon? =) > > Thanks, > Nathan > > > Patrick Shen wrote: > > Hi Rainer, > > > > Thanks for the quick reply. > > > > My environment is like below: > > > > CLIENT --- > > | > > | (TCP) > > --------> SERVER ---- > > | > > | (RELP Relay) > > --------------> SERVER Standby > > > > I've tested, even if the standby server is under high load, > one of our > > clients won't accept any new network connections. > > > > Just for your information. > > > > Best regards, > > Patrick > > > > Rainer Gerhards wrote: > > > >> Hi all, > >> > >> thanks for the detailed info. Will look at it asap, but I > am currently tied > >> up with some other work... > >> > >> Please let me know anything else that may be relevant. > >> > >> Thanks, > >> Rainer > >> > >> > >>> -----Original Message----- > >>> From: [email protected] [mailto:rsyslog- > >>> [email protected]] On Behalf Of Patrick Shen > >>> Sent: Monday, May 25, 2009 8:17 AM > >>> To: rsyslog-users > >>> Subject: Re: [rsyslog] Possible bug? 100% cpu on a server > thread and the > >>> client stops sending data > >>> > >>> Hi Nathan, > >>> > >>> In my company, we have the similar symptom. We have 50+ > servers which > >>> send logs to the central logging servers. Both clients > and server use > >>> rsyslog v3.20.5 right now. In the past when we use > v3.20.0, the symptom > >>> is more common when server is under high load average. > >>> > >>> It's also weird for us, not all of clients were lost of > connections. We > >>> have 2-3 clients which is running the same application > will looks like > >>> hanging up when server is high-load. New ssh connection > is very slow at > >>> that time. But the connected ssh session works normal (We > have some > >>> prepared screen session on other clients which connect to > them via ssh). > >>> > >>> When the clients are lost of connections. So far our > solution is to > >>> restart rsyslog both on affected clients and server. Then > the clients > >>> get back and look normal. > >>> > >>> We've suffered it for a while, but I didn't take chance > to analyze it > >>> like you. > >>> > >>> Best regards, > >>> Patrick > >>> > >>> Nathan March wrote: > >>> > >>>> Having a weird issue here..... Testing out a new rsyslog > deployment and > >>>> I've got around 30 servers logging to one machine. On > one of the clients > >>>> we had an issue where ssh was mysteriously very slow to login and > >>>> tracked it down to rsyslog causing issues. > Simultaneously, one of the > >>>> threads on the logging server spiked up to 100%. New log > data from the > >>>> client would stop showing up on the server at this poitn. > >>>> > >>>> On the client after attempting to gracefully stop rsyslog: > >>>> > >>>> vanguard etc # ps aux | grep -i rsyslog > >>>> root 3764 0.0 0.1 5092 2564 ? S 15:06 0:00 > >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f > /etc/rsyslog.conf > >>>> root 3766 0.0 0.1 5092 2564 ? S 15:06 0:00 > >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f > /etc/rsyslog.conf > >>>> root 6203 0.0 0.0 1512 524 pts/3 S 15:09 > 0:00 grep -i > >>>> rsyslog > >>>> vanguard etc # strace -p 3764 > >>>> Process 3764 attached - interrupt to quit > >>>> send(11, > "\25\3\2\1\0uE\254\304\n\311\377\204}p\237O\211\322\211"..., > >>>> 261, 0 <unfinished ...> > >>>> Process 3764 detached > >>>> > >>>> FD 11 being the socket to the logging server, it never > does anything > >>>> asides from that send. > >>>> > >>>> On the server: > >>>> > >>>> ldap ~ # ps aux | grep rsyslog > >>>> root 7370 0.0 0.0 83936 3380 ? S 14:54 0:00 > >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f > >>>> > >> /etc/rsyslog/rsyslog.conf > >> > >>>> root 7379 0.0 0.0 83936 3380 ? S 14:54 0:00 > >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f > >>>> > >> /etc/rsyslog/rsyslog.conf > >> > >>>> root 7380 0.0 0.0 83936 3380 ? S 14:54 0:00 > >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f > >>>> > >> /etc/rsyslog/rsyslog.conf > >> > >>>> root 7381 0.0 0.0 83936 3380 ? S 14:54 0:00 > >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f > >>>> > >> /etc/rsyslog/rsyslog.conf > >> > >>>> root 7382 0.0 0.0 83936 3380 ? S 14:54 0:00 > >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f > >>>> > >> /etc/rsyslog/rsyslog.conf > >> > >>>> root 7383 0.0 0.0 83936 3380 ? S 14:54 0:00 > >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f > >>>> > >> /etc/rsyslog/rsyslog.conf > >> > >>>> root 7384 0.0 0.0 83936 3380 ? S 14:54 0:00 > >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f > >>>> > >> /etc/rsyslog/rsyslog.conf > >> > >>>> root 7385 28.3 0.0 83936 3380 ? R 14:54 5:07 > >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f > >>>> > >> /etc/rsyslog/rsyslog.conf > >> > >>>> Stracing the 7385 pid just shows this repeating over and > over (FD 4 > >>>> being the network socket again): > >>>> accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1 > EAGAIN (Resource > >>>> temporarily unavailable) > >>>> accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1 > EAGAIN (Resource > >>>> temporarily unavailable) > >>>> > >>>> It happened 3 times within a relatively short period of > time, sometimes > >>>> within minutes of me restarting rsyslog on both > machines. I enabled > >>>> debugging and it took around 8 hours to pop up again. > >>>> > >>>> With debugging, I pulled this from the rsyslog server: > >>>> > >>>> 6640.408108000:imtcp.c: New connect on NSD 0x59ddc0. > >>>> 6640.408129000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550: > pszRcvBuf (nil), > >>>> lenRcvBuf 0 > >>>> 6640.408144000:imtcp.c: hasRcvInBuffer on nsd 0x59dcd0: pszRcvBuf > >>>> 0x5c11c0, lenRcvBuf -1 > >>>> 6640.408158000:imtcp.c: hasRcvInBuffer on nsd 0x59dc10: pszRcvBuf > >>>> 0x5cb510, lenRcvBuf -1 > >>>> 6640.408171000:imtcp.c: hasRcvInBuffer on nsd 0x5c3d70: pszRcvBuf > >>>> 0x5d45c0, lenRcvBuf -1 > >>>> 6640.408185000:imtcp.c: hasRcvInBuffer on nsd 0x5cf460: pszRcvBuf > >>>> 0x5dd010, lenRcvBuf -1 > >>>> 6640.408199000:imtcp.c: hasRcvInBuffer on nsd 0x5d8600: pszRcvBuf > >>>> 0x5e58b0, lenRcvBuf -1 > >>>> 6640.408213000:imtcp.c: hasRcvInBuffer on nsd 0x59d690: pszRcvBuf > >>>> 0x5ee150, lenRcvBuf -1 > >>>> 6640.408227000:imtcp.c: hasRcvInBuffer on nsd 0x59d2f0: pszRcvBuf > >>>> 0x5f7320, lenRcvBuf -1 > >>>> 6640.408241000:imtcp.c: hasRcvInBuffer on nsd 0x5e8840: pszRcvBuf > >>>> 0x5fe0a0, lenRcvBuf -1 > >>>> 6640.408255000:imtcp.c: hasRcvInBuffer on nsd 0x6027b0: pszRcvBuf > >>>> 0x608350, lenRcvBuf -1 > >>>> 6640.408269000:imtcp.c: hasRcvInBuffer on nsd 0x5c3660: pszRcvBuf > >>>> 0x612110, lenRcvBuf -1 > >>>> 6640.408283000:imtcp.c: hasRcvInBuffer on nsd 0x602690: pszRcvBuf > >>>> 0x615040, lenRcvBuf -1 > >>>> 6640.408296000:imtcp.c: hasRcvInBuffer on nsd 0x5c3b60: pszRcvBuf > >>>> 0x6218f0, lenRcvBuf -1 > >>>> 6640.408310000:imtcp.c: hasRcvInBuffer on nsd 0x60c150: pszRcvBuf > >>>> 0x62a190, lenRcvBuf -1 > >>>> 6640.408324000:imtcp.c: hasRcvInBuffer on nsd 0x625420: pszRcvBuf > >>>> 0x634fe0, lenRcvBuf -1 > >>>> 6640.408338000:imtcp.c: hasRcvInBuffer on nsd 0x6117b0: pszRcvBuf > >>>> 0x63b000, lenRcvBuf -1 > >>>> 6640.408352000:imtcp.c: hasRcvInBuffer on nsd 0x603990: pszRcvBuf > >>>> 0x643c70, lenRcvBuf -1 > >>>> 6640.408365000:imtcp.c: hasRcvInBuffer on nsd 0x633c00: pszRcvBuf > >>>> 0x64dbb0, lenRcvBuf -1 > >>>> 6640.408379000:imtcp.c: hasRcvInBuffer on nsd 0x62cfa0: pszRcvBuf > >>>> 0x650960, lenRcvBuf -1 > >>>> 6640.408393000:imtcp.c: hasRcvInBuffer on nsd 0x633d10: pszRcvBuf > >>>> 0x65dbc0, lenRcvBuf -1 > >>>> 6640.408407000:imtcp.c: hasRcvInBuffer on nsd 0x5f2860: pszRcvBuf > >>>> 0x666020, lenRcvBuf -1 > >>>> 6640.408421000:imtcp.c: hasRcvInBuffer on nsd 0x64d250: pszRcvBuf > >>>> 0x66e480, lenRcvBuf -1 > >>>> 6640.408435000:imtcp.c: hasRcvInBuffer on nsd 0x659b30: pszRcvBuf > >>>> 0x676d20, lenRcvBuf 78 > >>>> 6640.408448000:imtcp.c: hasRcvInBuffer on nsd 0x669fb0: pszRcvBuf > >>>> 0x67fdd0, lenRcvBuf -1 > >>>> 6640.408462000:imtcp.c: hasRcvInBuffer on nsd 0x59a630: pszRcvBuf > >>>> 0x687f10, lenRcvBuf -1 > >>>> 6640.408476000:imtcp.c: hasRcvInBuffer on nsd 0x6614c0: pszRcvBuf > >>>> 0x6907b0, lenRcvBuf -1 > >>>> 6640.408490000:imtcp.c: hasRcvInBuffer on nsd 0x60c2d0: pszRcvBuf > >>>> 0x699050, lenRcvBuf -1 > >>>> 6640.408504000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550: > pszRcvBuf (nil), > >>>> lenRcvBuf 0 > >>>> > >>>> This repeats enough to generate around 29 million lines > in the debug log > >>>> file. > >>>> > >>>> In the client log, there's plenty of these which seems > somewhat normal: > >>>> > >>>> 6509.743164000:imuxsock.c: --------imuxsock calling > select, active file > >>>> descriptors (max 11): 11 > >>>> 6509.743231000:imuxsock.c: Message from UNIX socket: #11 > >>>> 6509.743262000:imuxsock.c: logmsg: flags 4, from > 'vanguard', msg May 20 > >>>> 23:15:09 bin/qmail-local[23198]: using .qmail .qmail-cfsc-forum > >>>> 6509.743281000:imuxsock.c: Message has legacy syslog format. > >>>> 6509.743302000:imuxsock.c: main queue: entry added, size > now 579 entries > >>>> 6509.743325000:imuxsock.c: wtpAdviseMaxWorkers signals busy > >>>> 6509.743343000:imuxsock.c: main queue: EnqueueMsg > advised worker start > >>>> > >>>> However comparing it to an earlier point in the log > there's none of the > >>>> associated tcp queue entries like this: > >>>> > >>>> 3587.234178000:main queue:Reg/w0: TCP sent 107 bytes, > requested 107 > >>>> 3587.234204000:main queue:Reg/w0: main queue: entering > rate limiter > >>>> 3587.234229000:main queue:Reg/w0: main queue: entry > deleted, state 0, > >>>> size now 0 entries > >>>> 3587.234255000:main queue:Reg/w0: Called action, logging > to builtin-file > >>>> 3587.234285000:main queue:Reg/w0: (/var/log/maillog) > >>>> 3587.234314000:main queue:Reg/w0: Called action, logging > to builtin-fwd > >>>> 3587.234340000:main queue:Reg/w0: ldap.nmsrv.com > >>>> 3587.234359000:main queue:Reg/w0: ldap.nmsrv.com:10514/tcp > >>>> 3587.234425000:main queue:Reg/w0: TCP sent 78 bytes, requested 78 > >>>> 3587.234446000:main queue:Reg/w0: main queue: entering > rate limiter > >>>> 3587.234471000:main queue:Reg/w0: main queue:Reg/w0: worker IDLE, > >>>> waiting for work. > >>>> > >>>> Anyone able to shed some light on this situation? I've > got the full > >>>> debug logs if it's useful to anyone to take a closer look. > >>>> > >>>> Thanks! > >>>> > >>>> - Nathan > >>>> > >>>> > >>>> > >>>> > >> > -------------------------------------------------------------- > -------------- > >> > >>> -------- > >>> > >>>> Server config file > >>>> > >>>> > >> > -------------------------------------------------------------- > -------------- > >> > >>> -------- > >>> > >>>> $ModLoad immark.so # provides --MARK-- message capability > >>>> $ModLoad imuxsock.so # provides support for local system > logging (e.g. > >>>> via logger command) > >>>> $ModLoad imklog.so # kernel logging (formerly provided by rklogd) > >>>> > >>>> $umask 0137 > >>>> $DirCreateMode 0640 > >>>> $FileCreateMode 0640 > >>>> $FileOwner root > >>>> $FileGroup admin > >>>> > >>>> $template DynFile,"/var/log/rsyslog/%HOSTNAME%/messages" > >>>> $template DynFileCron,"/var/log/rsyslog/%HOSTNAME%/cron" > >>>> $template DynFileSecure,"/var/log/rsyslog/%HOSTNAME%/secure" > >>>> $template DynFileMail,"/var/log/rsyslog/%HOSTNAME%/maillog" > >>>> $template DynFileSpool,"/var/log/rsyslog/%HOSTNAME%/spooler" > >>>> $template DynFileBoot,"/var/log/rsyslog/%HOSTNAME%/boot" > >>>> $template DynFileSyslog,"/var/log/rsyslog/%HOSTNAME%/syslog" > >>>> > >>>> :msg, contains, "no keys found for" ~ > >>>> :msg, contains, "session opened for user" ~ > >>>> > >>>> # Log all kernel messages to the console. > >>>> # Logging much else clutters up the screen. > >>>> kern.* > /dev/console > >>>> > >>>> # Log anything (except mail) of level info or higher. > >>>> # Don't log private authentication messages! > >>>> *.info;mail.none;cron.none ?DynFile > >>>> > >>>> *.warn;\ > >>>> authpriv.none;cron.none;mail.none;news.none > ?DynFileSyslog > >>>> > >>>> # Log all the mail messages in one place. > >>>> mail.* > ?DynFileMail > >>>> > >>>> # Log cron stuff > >>>> cron.* > ?DynFileCron > >>>> > >>>> # Everybody gets emergency messages > >>>> *.emerg * > >>>> > >>>> # Save news errors of level crit and higher in a special file > >>>> uucp,news.crit > ?DynFileSpool > >>>> > >>>> # Save boot messages also to boot.log > >>>> local7.* > ?DynFileBoot > >>>> > >>>> # Relp config for PCI > >>>> $ModLoad imrelp > >>>> $InputRELPServerRun 2515 > >>>> > >>>> # make gtls driver the default > >>>> $DefaultNetstreamDriver gtls > >>>> > >>>> # certificate files > >>>> $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem > >>>> $DefaultNetstreamDriverCertFile /etc/ssl/servercrt.pem > >>>> $DefaultNetstreamDriverKeyFile /etc/ssl/serverkey.pem > >>>> > >>>> $ModLoad imtcp.so > >>>> > >>>> $InputTCPServerStreamDriverMode 1 # run driver in TLS-only mode > >>>> $InputTCPServerStreamDriverAuthMode anon # client is NOT > authenticated > >>>> $InputTCPServerRun 10514 # start up listener at port 10514 > >>>> $InputTCPMaxSessions 500 > >>>> > >>>> > >>>> > >> > -------------------------------------------------------------- > -------------- > >> > >>> -------- > >>> > >>>> Client config file > >>>> > >>>> > >> > -------------------------------------------------------------- > -------------- > >> > >>> -------- > >>> > >>>> $ModLoad imuxsock.so # provides support for local system > logging (e.g. > >>>> via logger command) > >>>> $ModLoad imklog.so # kernel logging (formerly provided by rklogd) > >>>> > >>>> $umask 0137 > >>>> $DirCreateMode 0640 > >>>> $FileCreateMode 0640 > >>>> $FileOwner root > >>>> $FileGroup admin > >>>> > >>>> > >>>> # Log anything 'info' or higher, but lower than 'warn'. > >>>> # Exclude authpriv, cron, mail, and news. These are > logged elsewhere. > >>>> *.info;*.!warn;\ > >>>> authpriv.none;cron.none;mail.none;news.none > /var/log/messages > >>>> > >>>> # Log anything 'warn' or higher. > >>>> # Exclude authpriv, cron, mail, and news. These are > logged elsewhere. > >>>> *.warn;\ > >>>> authpriv.none;cron.none;mail.none;news.none > /var/log/syslog > >>>> > >>>> # Debugging information is logged here. > >>>> *.=debug > /var/log/debug > >>>> > >>>> # Private authentication message logging: > >>>> authpriv.* > /var/log/secure > >>>> > >>>> # Cron related logs: > >>>> cron.* > /var/log/cron > >>>> > >>>> # Mail related logs: > >>>> mail.* > /var/log/maillog > >>>> > >>>> # Emergency level messages go to all users: > >>>> *.emerg * > >>>> > >>>> > >>>> ## Uncomment these lines to use RELP instead for PCI > compliance (stunnel > >>>> required) > >>>> #$ModLoad omrelp > >>>> #*.*;mail.none :omrelp:localhost:2515;RSYSLOG_ForwardFormat > >>>> > >>>> # certificate files - just CA for a client > >>>> $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem > >>>> > >>>> # set up the action > >>>> $DefaultNetstreamDriver gtls # use gtls netstream driver > >>>> $ActionSendStreamDriverMode 1 # require TLS for the connection > >>>> $ActionSendStreamDriverAuthMode anon # server is NOT > authenticated > >>>> *.* @@(o)ldap.nmsrv.com:10514 # send (all) messages > >>>> > >>>> _______________________________________________ > >>>> rsyslog mailing list > >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog > >>>> http://www.rsyslog.comoften > >>>> > >>> -- > >>> Patrick Shen > >>> Operations Engineer > >>> > >>> net mobile AG Shanghai office > >>> > >>> T: +86 21 61350900 - 222 > >>> F: +86 21 61350906 > >>> M: +86 13122245730 > >>> > >>> FE55 6A7F 3192 F359 C1DF 12F6 9078 58B4 57AF 0BE3 > >>> > >>> _______________________________________________ > >>> rsyslog mailing list > >>> http://lists.adiscon.net/mailman/listinfo/rsyslog > >>> http://www.rsyslog.com > >>> > >> _______________________________________________ > >> rsyslog mailing list > >> http://lists.adiscon.net/mailman/listinfo/rsyslog > >> http://www.rsyslog.com > >> > > > > > > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com > _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com

