Confirmed as still happening on 4.1.7 - Nathan
Rainer Gerhards wrote: > Please give v4 a try, the bug is potentially fixed there. If it actually is, > this gives me a clue of where to look at. > > Rainer > > >> -----Original Message----- >> From: [email protected] >> [mailto:[email protected]] On Behalf Of Nathan March >> Sent: Saturday, June 06, 2009 12:09 AM >> To: rsyslog-users >> Subject: Re: [rsyslog] Possible bug? 100% cpu on a server >> thread and the client stops sending data >> >> Any hope of this bug getting some attention soon? =) >> >> Thanks, >> Nathan >> >> >> Patrick Shen wrote: >> >>> Hi Rainer, >>> >>> Thanks for the quick reply. >>> >>> My environment is like below: >>> >>> CLIENT --- >>> | >>> | (TCP) >>> --------> SERVER ---- >>> | >>> | (RELP Relay) >>> --------------> SERVER Standby >>> >>> I've tested, even if the standby server is under high load, >>> >> one of our >> >>> clients won't accept any new network connections. >>> >>> Just for your information. >>> >>> Best regards, >>> Patrick >>> >>> Rainer Gerhards wrote: >>> >>> >>>> Hi all, >>>> >>>> thanks for the detailed info. Will look at it asap, but I >>>> >> am currently tied >> >>>> up with some other work... >>>> >>>> Please let me know anything else that may be relevant. >>>> >>>> Thanks, >>>> Rainer >>>> >>>> >>>> >>>>> -----Original Message----- >>>>> From: [email protected] [mailto:rsyslog- >>>>> [email protected]] On Behalf Of Patrick Shen >>>>> Sent: Monday, May 25, 2009 8:17 AM >>>>> To: rsyslog-users >>>>> Subject: Re: [rsyslog] Possible bug? 100% cpu on a server >>>>> >> thread and the >> >>>>> client stops sending data >>>>> >>>>> Hi Nathan, >>>>> >>>>> In my company, we have the similar symptom. We have 50+ >>>>> >> servers which >> >>>>> send logs to the central logging servers. Both clients >>>>> >> and server use >> >>>>> rsyslog v3.20.5 right now. In the past when we use >>>>> >> v3.20.0, the symptom >> >>>>> is more common when server is under high load average. >>>>> >>>>> It's also weird for us, not all of clients were lost of >>>>> >> connections. We >> >>>>> have 2-3 clients which is running the same application >>>>> >> will looks like >> >>>>> hanging up when server is high-load. New ssh connection >>>>> >> is very slow at >> >>>>> that time. But the connected ssh session works normal (We >>>>> >> have some >> >>>>> prepared screen session on other clients which connect to >>>>> >> them via ssh). >> >>>>> When the clients are lost of connections. So far our >>>>> >> solution is to >> >>>>> restart rsyslog both on affected clients and server. Then >>>>> >> the clients >> >>>>> get back and look normal. >>>>> >>>>> We've suffered it for a while, but I didn't take chance >>>>> >> to analyze it >> >>>>> like you. >>>>> >>>>> Best regards, >>>>> Patrick >>>>> >>>>> Nathan March wrote: >>>>> >>>>> >>>>>> Having a weird issue here..... Testing out a new rsyslog >>>>>> >> deployment and >> >>>>>> I've got around 30 servers logging to one machine. On >>>>>> >> one of the clients >> >>>>>> we had an issue where ssh was mysteriously very slow to login and >>>>>> tracked it down to rsyslog causing issues. >>>>>> >> Simultaneously, one of the >> >>>>>> threads on the logging server spiked up to 100%. New log >>>>>> >> data from the >> >>>>>> client would stop showing up on the server at this poitn. >>>>>> >>>>>> On the client after attempting to gracefully stop rsyslog: >>>>>> >>>>>> vanguard etc # ps aux | grep -i rsyslog >>>>>> root 3764 0.0 0.1 5092 2564 ? S 15:06 0:00 >>>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>>>> >> /etc/rsyslog.conf >> >>>>>> root 3766 0.0 0.1 5092 2564 ? S 15:06 0:00 >>>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>>>> >> /etc/rsyslog.conf >> >>>>>> root 6203 0.0 0.0 1512 524 pts/3 S 15:09 >>>>>> >> 0:00 grep -i >> >>>>>> rsyslog >>>>>> vanguard etc # strace -p 3764 >>>>>> Process 3764 attached - interrupt to quit >>>>>> send(11, >>>>>> >> "\25\3\2\1\0uE\254\304\n\311\377\204}p\237O\211\322\211"..., >> >>>>>> 261, 0 <unfinished ...> >>>>>> Process 3764 detached >>>>>> >>>>>> FD 11 being the socket to the logging server, it never >>>>>> >> does anything >> >>>>>> asides from that send. >>>>>> >>>>>> On the server: >>>>>> >>>>>> ldap ~ # ps aux | grep rsyslog >>>>>> root 7370 0.0 0.0 83936 3380 ? S 14:54 0:00 >>>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>>>> >>>>>> >>>> /etc/rsyslog/rsyslog.conf >>>> >>>> >>>>>> root 7379 0.0 0.0 83936 3380 ? S 14:54 0:00 >>>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>>>> >>>>>> >>>> /etc/rsyslog/rsyslog.conf >>>> >>>> >>>>>> root 7380 0.0 0.0 83936 3380 ? S 14:54 0:00 >>>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>>>> >>>>>> >>>> /etc/rsyslog/rsyslog.conf >>>> >>>> >>>>>> root 7381 0.0 0.0 83936 3380 ? S 14:54 0:00 >>>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>>>> >>>>>> >>>> /etc/rsyslog/rsyslog.conf >>>> >>>> >>>>>> root 7382 0.0 0.0 83936 3380 ? S 14:54 0:00 >>>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>>>> >>>>>> >>>> /etc/rsyslog/rsyslog.conf >>>> >>>> >>>>>> root 7383 0.0 0.0 83936 3380 ? S 14:54 0:00 >>>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>>>> >>>>>> >>>> /etc/rsyslog/rsyslog.conf >>>> >>>> >>>>>> root 7384 0.0 0.0 83936 3380 ? S 14:54 0:00 >>>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>>>> >>>>>> >>>> /etc/rsyslog/rsyslog.conf >>>> >>>> >>>>>> root 7385 28.3 0.0 83936 3380 ? R 14:54 5:07 >>>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>>>> >>>>>> >>>> /etc/rsyslog/rsyslog.conf >>>> >>>> >>>>>> Stracing the 7385 pid just shows this repeating over and >>>>>> >> over (FD 4 >> >>>>>> being the network socket again): >>>>>> accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1 >>>>>> >> EAGAIN (Resource >> >>>>>> temporarily unavailable) >>>>>> accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1 >>>>>> >> EAGAIN (Resource >> >>>>>> temporarily unavailable) >>>>>> >>>>>> It happened 3 times within a relatively short period of >>>>>> >> time, sometimes >> >>>>>> within minutes of me restarting rsyslog on both >>>>>> >> machines. I enabled >> >>>>>> debugging and it took around 8 hours to pop up again. >>>>>> >>>>>> With debugging, I pulled this from the rsyslog server: >>>>>> >>>>>> 6640.408108000:imtcp.c: New connect on NSD 0x59ddc0. >>>>>> 6640.408129000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550: >>>>>> >> pszRcvBuf (nil), >> >>>>>> lenRcvBuf 0 >>>>>> 6640.408144000:imtcp.c: hasRcvInBuffer on nsd 0x59dcd0: pszRcvBuf >>>>>> 0x5c11c0, lenRcvBuf -1 >>>>>> 6640.408158000:imtcp.c: hasRcvInBuffer on nsd 0x59dc10: pszRcvBuf >>>>>> 0x5cb510, lenRcvBuf -1 >>>>>> 6640.408171000:imtcp.c: hasRcvInBuffer on nsd 0x5c3d70: pszRcvBuf >>>>>> 0x5d45c0, lenRcvBuf -1 >>>>>> 6640.408185000:imtcp.c: hasRcvInBuffer on nsd 0x5cf460: pszRcvBuf >>>>>> 0x5dd010, lenRcvBuf -1 >>>>>> 6640.408199000:imtcp.c: hasRcvInBuffer on nsd 0x5d8600: pszRcvBuf >>>>>> 0x5e58b0, lenRcvBuf -1 >>>>>> 6640.408213000:imtcp.c: hasRcvInBuffer on nsd 0x59d690: pszRcvBuf >>>>>> 0x5ee150, lenRcvBuf -1 >>>>>> 6640.408227000:imtcp.c: hasRcvInBuffer on nsd 0x59d2f0: pszRcvBuf >>>>>> 0x5f7320, lenRcvBuf -1 >>>>>> 6640.408241000:imtcp.c: hasRcvInBuffer on nsd 0x5e8840: pszRcvBuf >>>>>> 0x5fe0a0, lenRcvBuf -1 >>>>>> 6640.408255000:imtcp.c: hasRcvInBuffer on nsd 0x6027b0: pszRcvBuf >>>>>> 0x608350, lenRcvBuf -1 >>>>>> 6640.408269000:imtcp.c: hasRcvInBuffer on nsd 0x5c3660: pszRcvBuf >>>>>> 0x612110, lenRcvBuf -1 >>>>>> 6640.408283000:imtcp.c: hasRcvInBuffer on nsd 0x602690: pszRcvBuf >>>>>> 0x615040, lenRcvBuf -1 >>>>>> 6640.408296000:imtcp.c: hasRcvInBuffer on nsd 0x5c3b60: pszRcvBuf >>>>>> 0x6218f0, lenRcvBuf -1 >>>>>> 6640.408310000:imtcp.c: hasRcvInBuffer on nsd 0x60c150: pszRcvBuf >>>>>> 0x62a190, lenRcvBuf -1 >>>>>> 6640.408324000:imtcp.c: hasRcvInBuffer on nsd 0x625420: pszRcvBuf >>>>>> 0x634fe0, lenRcvBuf -1 >>>>>> 6640.408338000:imtcp.c: hasRcvInBuffer on nsd 0x6117b0: pszRcvBuf >>>>>> 0x63b000, lenRcvBuf -1 >>>>>> 6640.408352000:imtcp.c: hasRcvInBuffer on nsd 0x603990: pszRcvBuf >>>>>> 0x643c70, lenRcvBuf -1 >>>>>> 6640.408365000:imtcp.c: hasRcvInBuffer on nsd 0x633c00: pszRcvBuf >>>>>> 0x64dbb0, lenRcvBuf -1 >>>>>> 6640.408379000:imtcp.c: hasRcvInBuffer on nsd 0x62cfa0: pszRcvBuf >>>>>> 0x650960, lenRcvBuf -1 >>>>>> 6640.408393000:imtcp.c: hasRcvInBuffer on nsd 0x633d10: pszRcvBuf >>>>>> 0x65dbc0, lenRcvBuf -1 >>>>>> 6640.408407000:imtcp.c: hasRcvInBuffer on nsd 0x5f2860: pszRcvBuf >>>>>> 0x666020, lenRcvBuf -1 >>>>>> 6640.408421000:imtcp.c: hasRcvInBuffer on nsd 0x64d250: pszRcvBuf >>>>>> 0x66e480, lenRcvBuf -1 >>>>>> 6640.408435000:imtcp.c: hasRcvInBuffer on nsd 0x659b30: pszRcvBuf >>>>>> 0x676d20, lenRcvBuf 78 >>>>>> 6640.408448000:imtcp.c: hasRcvInBuffer on nsd 0x669fb0: pszRcvBuf >>>>>> 0x67fdd0, lenRcvBuf -1 >>>>>> 6640.408462000:imtcp.c: hasRcvInBuffer on nsd 0x59a630: pszRcvBuf >>>>>> 0x687f10, lenRcvBuf -1 >>>>>> 6640.408476000:imtcp.c: hasRcvInBuffer on nsd 0x6614c0: pszRcvBuf >>>>>> 0x6907b0, lenRcvBuf -1 >>>>>> 6640.408490000:imtcp.c: hasRcvInBuffer on nsd 0x60c2d0: pszRcvBuf >>>>>> 0x699050, lenRcvBuf -1 >>>>>> 6640.408504000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550: >>>>>> >> pszRcvBuf (nil), >> >>>>>> lenRcvBuf 0 >>>>>> >>>>>> This repeats enough to generate around 29 million lines >>>>>> >> in the debug log >> >>>>>> file. >>>>>> >>>>>> In the client log, there's plenty of these which seems >>>>>> >> somewhat normal: >> >>>>>> 6509.743164000:imuxsock.c: --------imuxsock calling >>>>>> >> select, active file >> >>>>>> descriptors (max 11): 11 >>>>>> 6509.743231000:imuxsock.c: Message from UNIX socket: #11 >>>>>> 6509.743262000:imuxsock.c: logmsg: flags 4, from >>>>>> >> 'vanguard', msg May 20 >> >>>>>> 23:15:09 bin/qmail-local[23198]: using .qmail .qmail-cfsc-forum >>>>>> 6509.743281000:imuxsock.c: Message has legacy syslog format. >>>>>> 6509.743302000:imuxsock.c: main queue: entry added, size >>>>>> >> now 579 entries >> >>>>>> 6509.743325000:imuxsock.c: wtpAdviseMaxWorkers signals busy >>>>>> 6509.743343000:imuxsock.c: main queue: EnqueueMsg >>>>>> >> advised worker start >> >>>>>> However comparing it to an earlier point in the log >>>>>> >> there's none of the >> >>>>>> associated tcp queue entries like this: >>>>>> >>>>>> 3587.234178000:main queue:Reg/w0: TCP sent 107 bytes, >>>>>> >> requested 107 >> >>>>>> 3587.234204000:main queue:Reg/w0: main queue: entering >>>>>> >> rate limiter >> >>>>>> 3587.234229000:main queue:Reg/w0: main queue: entry >>>>>> >> deleted, state 0, >> >>>>>> size now 0 entries >>>>>> 3587.234255000:main queue:Reg/w0: Called action, logging >>>>>> >> to builtin-file >> >>>>>> 3587.234285000:main queue:Reg/w0: (/var/log/maillog) >>>>>> 3587.234314000:main queue:Reg/w0: Called action, logging >>>>>> >> to builtin-fwd >> >>>>>> 3587.234340000:main queue:Reg/w0: ldap.nmsrv.com >>>>>> 3587.234359000:main queue:Reg/w0: ldap.nmsrv.com:10514/tcp >>>>>> 3587.234425000:main queue:Reg/w0: TCP sent 78 bytes, requested 78 >>>>>> 3587.234446000:main queue:Reg/w0: main queue: entering >>>>>> >> rate limiter >> >>>>>> 3587.234471000:main queue:Reg/w0: main queue:Reg/w0: worker IDLE, >>>>>> waiting for work. >>>>>> >>>>>> Anyone able to shed some light on this situation? I've >>>>>> >> got the full >> >>>>>> debug logs if it's useful to anyone to take a closer look. >>>>>> >>>>>> Thanks! >>>>>> >>>>>> - Nathan >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >> -------------------------------------------------------------- >> -------------- >> >>>> >>>> >>>>> -------- >>>>> >>>>> >>>>>> Server config file >>>>>> >>>>>> >>>>>> >> -------------------------------------------------------------- >> -------------- >> >>>> >>>> >>>>> -------- >>>>> >>>>> >>>>>> $ModLoad immark.so # provides --MARK-- message capability >>>>>> $ModLoad imuxsock.so # provides support for local system >>>>>> >> logging (e.g. >> >>>>>> via logger command) >>>>>> $ModLoad imklog.so # kernel logging (formerly provided by rklogd) >>>>>> >>>>>> $umask 0137 >>>>>> $DirCreateMode 0640 >>>>>> $FileCreateMode 0640 >>>>>> $FileOwner root >>>>>> $FileGroup admin >>>>>> >>>>>> $template DynFile,"/var/log/rsyslog/%HOSTNAME%/messages" >>>>>> $template DynFileCron,"/var/log/rsyslog/%HOSTNAME%/cron" >>>>>> $template DynFileSecure,"/var/log/rsyslog/%HOSTNAME%/secure" >>>>>> $template DynFileMail,"/var/log/rsyslog/%HOSTNAME%/maillog" >>>>>> $template DynFileSpool,"/var/log/rsyslog/%HOSTNAME%/spooler" >>>>>> $template DynFileBoot,"/var/log/rsyslog/%HOSTNAME%/boot" >>>>>> $template DynFileSyslog,"/var/log/rsyslog/%HOSTNAME%/syslog" >>>>>> >>>>>> :msg, contains, "no keys found for" ~ >>>>>> :msg, contains, "session opened for user" ~ >>>>>> >>>>>> # Log all kernel messages to the console. >>>>>> # Logging much else clutters up the screen. >>>>>> kern.* >>>>>> >> /dev/console >> >>>>>> # Log anything (except mail) of level info or higher. >>>>>> # Don't log private authentication messages! >>>>>> *.info;mail.none;cron.none ?DynFile >>>>>> >>>>>> *.warn;\ >>>>>> authpriv.none;cron.none;mail.none;news.none >>>>>> >> ?DynFileSyslog >> >>>>>> # Log all the mail messages in one place. >>>>>> mail.* >>>>>> >> ?DynFileMail >> >>>>>> # Log cron stuff >>>>>> cron.* >>>>>> >> ?DynFileCron >> >>>>>> # Everybody gets emergency messages >>>>>> *.emerg * >>>>>> >>>>>> # Save news errors of level crit and higher in a special file >>>>>> uucp,news.crit >>>>>> >> ?DynFileSpool >> >>>>>> # Save boot messages also to boot.log >>>>>> local7.* >>>>>> >> ?DynFileBoot >> >>>>>> # Relp config for PCI >>>>>> $ModLoad imrelp >>>>>> $InputRELPServerRun 2515 >>>>>> >>>>>> # make gtls driver the default >>>>>> $DefaultNetstreamDriver gtls >>>>>> >>>>>> # certificate files >>>>>> $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem >>>>>> $DefaultNetstreamDriverCertFile /etc/ssl/servercrt.pem >>>>>> $DefaultNetstreamDriverKeyFile /etc/ssl/serverkey.pem >>>>>> >>>>>> $ModLoad imtcp.so >>>>>> >>>>>> $InputTCPServerStreamDriverMode 1 # run driver in TLS-only mode >>>>>> $InputTCPServerStreamDriverAuthMode anon # client is NOT >>>>>> >> authenticated >> >>>>>> $InputTCPServerRun 10514 # start up listener at port 10514 >>>>>> $InputTCPMaxSessions 500 >>>>>> >>>>>> >>>>>> >>>>>> >> -------------------------------------------------------------- >> -------------- >> >>>> >>>> >>>>> -------- >>>>> >>>>> >>>>>> Client config file >>>>>> >>>>>> >>>>>> >> -------------------------------------------------------------- >> -------------- >> >>>> >>>> >>>>> -------- >>>>> >>>>> >>>>>> $ModLoad imuxsock.so # provides support for local system >>>>>> >> logging (e.g. >> >>>>>> via logger command) >>>>>> $ModLoad imklog.so # kernel logging (formerly provided by rklogd) >>>>>> >>>>>> $umask 0137 >>>>>> $DirCreateMode 0640 >>>>>> $FileCreateMode 0640 >>>>>> $FileOwner root >>>>>> $FileGroup admin >>>>>> >>>>>> >>>>>> # Log anything 'info' or higher, but lower than 'warn'. >>>>>> # Exclude authpriv, cron, mail, and news. These are >>>>>> >> logged elsewhere. >> >>>>>> *.info;*.!warn;\ >>>>>> authpriv.none;cron.none;mail.none;news.none >>>>>> >> /var/log/messages >> >>>>>> # Log anything 'warn' or higher. >>>>>> # Exclude authpriv, cron, mail, and news. These are >>>>>> >> logged elsewhere. >> >>>>>> *.warn;\ >>>>>> authpriv.none;cron.none;mail.none;news.none >>>>>> >> /var/log/syslog >> >>>>>> # Debugging information is logged here. >>>>>> *.=debug >>>>>> >> /var/log/debug >> >>>>>> # Private authentication message logging: >>>>>> authpriv.* >>>>>> >> /var/log/secure >> >>>>>> # Cron related logs: >>>>>> cron.* >>>>>> >> /var/log/cron >> >>>>>> # Mail related logs: >>>>>> mail.* >>>>>> >> /var/log/maillog >> >>>>>> # Emergency level messages go to all users: >>>>>> *.emerg * >>>>>> >>>>>> >>>>>> ## Uncomment these lines to use RELP instead for PCI >>>>>> >> compliance (stunnel >> >>>>>> required) >>>>>> #$ModLoad omrelp >>>>>> #*.*;mail.none :omrelp:localhost:2515;RSYSLOG_ForwardFormat >>>>>> >>>>>> # certificate files - just CA for a client >>>>>> $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem >>>>>> >>>>>> # set up the action >>>>>> $DefaultNetstreamDriver gtls # use gtls netstream driver >>>>>> $ActionSendStreamDriverMode 1 # require TLS for the connection >>>>>> $ActionSendStreamDriverAuthMode anon # server is NOT >>>>>> >> authenticated >> >>>>>> *.* @@(o)ldap.nmsrv.com:10514 # send (all) messages >>>>>> >>>>>> _______________________________________________ >>>>>> rsyslog mailing list >>>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>>> http://www.rsyslog.comoften >>>>>> >>>>>> >>>>> -- >>>>> Patrick Shen >>>>> Operations Engineer >>>>> >>>>> net mobile AG Shanghai office >>>>> >>>>> T: +86 21 61350900 - 222 >>>>> F: +86 21 61350906 >>>>> M: +86 13122245730 >>>>> >>>>> FE55 6A7F 3192 F359 C1DF 12F6 9078 58B4 57AF 0BE3 >>>>> >>>>> _______________________________________________ >>>>> rsyslog mailing list >>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>>> http://www.rsyslog.com >>>>> >>>>> >>>> _______________________________________________ >>>> rsyslog mailing list >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>> http://www.rsyslog.com >>>> >>>> >>> >>> >> _______________________________________________ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com >> >> > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com > _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com

