Any hope of this bug getting some attention soon? =) Thanks, Nathan
Patrick Shen wrote: > Hi Rainer, > > Thanks for the quick reply. > > My environment is like below: > > CLIENT --- > | > | (TCP) > --------> SERVER ---- > | > | (RELP Relay) > --------------> SERVER Standby > > I've tested, even if the standby server is under high load, one of our > clients won't accept any new network connections. > > Just for your information. > > Best regards, > Patrick > > Rainer Gerhards wrote: > >> Hi all, >> >> thanks for the detailed info. Will look at it asap, but I am currently tied >> up with some other work... >> >> Please let me know anything else that may be relevant. >> >> Thanks, >> Rainer >> >> >>> -----Original Message----- >>> From: [email protected] [mailto:rsyslog- >>> [email protected]] On Behalf Of Patrick Shen >>> Sent: Monday, May 25, 2009 8:17 AM >>> To: rsyslog-users >>> Subject: Re: [rsyslog] Possible bug? 100% cpu on a server thread and the >>> client stops sending data >>> >>> Hi Nathan, >>> >>> In my company, we have the similar symptom. We have 50+ servers which >>> send logs to the central logging servers. Both clients and server use >>> rsyslog v3.20.5 right now. In the past when we use v3.20.0, the symptom >>> is more common when server is under high load average. >>> >>> It's also weird for us, not all of clients were lost of connections. We >>> have 2-3 clients which is running the same application will looks like >>> hanging up when server is high-load. New ssh connection is very slow at >>> that time. But the connected ssh session works normal (We have some >>> prepared screen session on other clients which connect to them via ssh). >>> >>> When the clients are lost of connections. So far our solution is to >>> restart rsyslog both on affected clients and server. Then the clients >>> get back and look normal. >>> >>> We've suffered it for a while, but I didn't take chance to analyze it >>> like you. >>> >>> Best regards, >>> Patrick >>> >>> Nathan March wrote: >>> >>>> Having a weird issue here..... Testing out a new rsyslog deployment and >>>> I've got around 30 servers logging to one machine. On one of the clients >>>> we had an issue where ssh was mysteriously very slow to login and >>>> tracked it down to rsyslog causing issues. Simultaneously, one of the >>>> threads on the logging server spiked up to 100%. New log data from the >>>> client would stop showing up on the server at this poitn. >>>> >>>> On the client after attempting to gracefully stop rsyslog: >>>> >>>> vanguard etc # ps aux | grep -i rsyslog >>>> root 3764 0.0 0.1 5092 2564 ? S 15:06 0:00 >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog.conf >>>> root 3766 0.0 0.1 5092 2564 ? S 15:06 0:00 >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog.conf >>>> root 6203 0.0 0.0 1512 524 pts/3 S 15:09 0:00 grep -i >>>> rsyslog >>>> vanguard etc # strace -p 3764 >>>> Process 3764 attached - interrupt to quit >>>> send(11, "\25\3\2\1\0uE\254\304\n\311\377\204}p\237O\211\322\211"..., >>>> 261, 0 <unfinished ...> >>>> Process 3764 detached >>>> >>>> FD 11 being the socket to the logging server, it never does anything >>>> asides from that send. >>>> >>>> On the server: >>>> >>>> ldap ~ # ps aux | grep rsyslog >>>> root 7370 0.0 0.0 83936 3380 ? S 14:54 0:00 >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>> >> /etc/rsyslog/rsyslog.conf >> >>>> root 7379 0.0 0.0 83936 3380 ? S 14:54 0:00 >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>> >> /etc/rsyslog/rsyslog.conf >> >>>> root 7380 0.0 0.0 83936 3380 ? S 14:54 0:00 >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>> >> /etc/rsyslog/rsyslog.conf >> >>>> root 7381 0.0 0.0 83936 3380 ? S 14:54 0:00 >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>> >> /etc/rsyslog/rsyslog.conf >> >>>> root 7382 0.0 0.0 83936 3380 ? S 14:54 0:00 >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>> >> /etc/rsyslog/rsyslog.conf >> >>>> root 7383 0.0 0.0 83936 3380 ? S 14:54 0:00 >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>> >> /etc/rsyslog/rsyslog.conf >> >>>> root 7384 0.0 0.0 83936 3380 ? S 14:54 0:00 >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>> >> /etc/rsyslog/rsyslog.conf >> >>>> root 7385 28.3 0.0 83936 3380 ? R 14:54 5:07 >>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f >>>> >> /etc/rsyslog/rsyslog.conf >> >>>> Stracing the 7385 pid just shows this repeating over and over (FD 4 >>>> being the network socket again): >>>> accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1 EAGAIN (Resource >>>> temporarily unavailable) >>>> accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1 EAGAIN (Resource >>>> temporarily unavailable) >>>> >>>> It happened 3 times within a relatively short period of time, sometimes >>>> within minutes of me restarting rsyslog on both machines. I enabled >>>> debugging and it took around 8 hours to pop up again. >>>> >>>> With debugging, I pulled this from the rsyslog server: >>>> >>>> 6640.408108000:imtcp.c: New connect on NSD 0x59ddc0. >>>> 6640.408129000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550: pszRcvBuf (nil), >>>> lenRcvBuf 0 >>>> 6640.408144000:imtcp.c: hasRcvInBuffer on nsd 0x59dcd0: pszRcvBuf >>>> 0x5c11c0, lenRcvBuf -1 >>>> 6640.408158000:imtcp.c: hasRcvInBuffer on nsd 0x59dc10: pszRcvBuf >>>> 0x5cb510, lenRcvBuf -1 >>>> 6640.408171000:imtcp.c: hasRcvInBuffer on nsd 0x5c3d70: pszRcvBuf >>>> 0x5d45c0, lenRcvBuf -1 >>>> 6640.408185000:imtcp.c: hasRcvInBuffer on nsd 0x5cf460: pszRcvBuf >>>> 0x5dd010, lenRcvBuf -1 >>>> 6640.408199000:imtcp.c: hasRcvInBuffer on nsd 0x5d8600: pszRcvBuf >>>> 0x5e58b0, lenRcvBuf -1 >>>> 6640.408213000:imtcp.c: hasRcvInBuffer on nsd 0x59d690: pszRcvBuf >>>> 0x5ee150, lenRcvBuf -1 >>>> 6640.408227000:imtcp.c: hasRcvInBuffer on nsd 0x59d2f0: pszRcvBuf >>>> 0x5f7320, lenRcvBuf -1 >>>> 6640.408241000:imtcp.c: hasRcvInBuffer on nsd 0x5e8840: pszRcvBuf >>>> 0x5fe0a0, lenRcvBuf -1 >>>> 6640.408255000:imtcp.c: hasRcvInBuffer on nsd 0x6027b0: pszRcvBuf >>>> 0x608350, lenRcvBuf -1 >>>> 6640.408269000:imtcp.c: hasRcvInBuffer on nsd 0x5c3660: pszRcvBuf >>>> 0x612110, lenRcvBuf -1 >>>> 6640.408283000:imtcp.c: hasRcvInBuffer on nsd 0x602690: pszRcvBuf >>>> 0x615040, lenRcvBuf -1 >>>> 6640.408296000:imtcp.c: hasRcvInBuffer on nsd 0x5c3b60: pszRcvBuf >>>> 0x6218f0, lenRcvBuf -1 >>>> 6640.408310000:imtcp.c: hasRcvInBuffer on nsd 0x60c150: pszRcvBuf >>>> 0x62a190, lenRcvBuf -1 >>>> 6640.408324000:imtcp.c: hasRcvInBuffer on nsd 0x625420: pszRcvBuf >>>> 0x634fe0, lenRcvBuf -1 >>>> 6640.408338000:imtcp.c: hasRcvInBuffer on nsd 0x6117b0: pszRcvBuf >>>> 0x63b000, lenRcvBuf -1 >>>> 6640.408352000:imtcp.c: hasRcvInBuffer on nsd 0x603990: pszRcvBuf >>>> 0x643c70, lenRcvBuf -1 >>>> 6640.408365000:imtcp.c: hasRcvInBuffer on nsd 0x633c00: pszRcvBuf >>>> 0x64dbb0, lenRcvBuf -1 >>>> 6640.408379000:imtcp.c: hasRcvInBuffer on nsd 0x62cfa0: pszRcvBuf >>>> 0x650960, lenRcvBuf -1 >>>> 6640.408393000:imtcp.c: hasRcvInBuffer on nsd 0x633d10: pszRcvBuf >>>> 0x65dbc0, lenRcvBuf -1 >>>> 6640.408407000:imtcp.c: hasRcvInBuffer on nsd 0x5f2860: pszRcvBuf >>>> 0x666020, lenRcvBuf -1 >>>> 6640.408421000:imtcp.c: hasRcvInBuffer on nsd 0x64d250: pszRcvBuf >>>> 0x66e480, lenRcvBuf -1 >>>> 6640.408435000:imtcp.c: hasRcvInBuffer on nsd 0x659b30: pszRcvBuf >>>> 0x676d20, lenRcvBuf 78 >>>> 6640.408448000:imtcp.c: hasRcvInBuffer on nsd 0x669fb0: pszRcvBuf >>>> 0x67fdd0, lenRcvBuf -1 >>>> 6640.408462000:imtcp.c: hasRcvInBuffer on nsd 0x59a630: pszRcvBuf >>>> 0x687f10, lenRcvBuf -1 >>>> 6640.408476000:imtcp.c: hasRcvInBuffer on nsd 0x6614c0: pszRcvBuf >>>> 0x6907b0, lenRcvBuf -1 >>>> 6640.408490000:imtcp.c: hasRcvInBuffer on nsd 0x60c2d0: pszRcvBuf >>>> 0x699050, lenRcvBuf -1 >>>> 6640.408504000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550: pszRcvBuf (nil), >>>> lenRcvBuf 0 >>>> >>>> This repeats enough to generate around 29 million lines in the debug log >>>> file. >>>> >>>> In the client log, there's plenty of these which seems somewhat normal: >>>> >>>> 6509.743164000:imuxsock.c: --------imuxsock calling select, active file >>>> descriptors (max 11): 11 >>>> 6509.743231000:imuxsock.c: Message from UNIX socket: #11 >>>> 6509.743262000:imuxsock.c: logmsg: flags 4, from 'vanguard', msg May 20 >>>> 23:15:09 bin/qmail-local[23198]: using .qmail .qmail-cfsc-forum >>>> 6509.743281000:imuxsock.c: Message has legacy syslog format. >>>> 6509.743302000:imuxsock.c: main queue: entry added, size now 579 entries >>>> 6509.743325000:imuxsock.c: wtpAdviseMaxWorkers signals busy >>>> 6509.743343000:imuxsock.c: main queue: EnqueueMsg advised worker start >>>> >>>> However comparing it to an earlier point in the log there's none of the >>>> associated tcp queue entries like this: >>>> >>>> 3587.234178000:main queue:Reg/w0: TCP sent 107 bytes, requested 107 >>>> 3587.234204000:main queue:Reg/w0: main queue: entering rate limiter >>>> 3587.234229000:main queue:Reg/w0: main queue: entry deleted, state 0, >>>> size now 0 entries >>>> 3587.234255000:main queue:Reg/w0: Called action, logging to builtin-file >>>> 3587.234285000:main queue:Reg/w0: (/var/log/maillog) >>>> 3587.234314000:main queue:Reg/w0: Called action, logging to builtin-fwd >>>> 3587.234340000:main queue:Reg/w0: ldap.nmsrv.com >>>> 3587.234359000:main queue:Reg/w0: ldap.nmsrv.com:10514/tcp >>>> 3587.234425000:main queue:Reg/w0: TCP sent 78 bytes, requested 78 >>>> 3587.234446000:main queue:Reg/w0: main queue: entering rate limiter >>>> 3587.234471000:main queue:Reg/w0: main queue:Reg/w0: worker IDLE, >>>> waiting for work. >>>> >>>> Anyone able to shed some light on this situation? I've got the full >>>> debug logs if it's useful to anyone to take a closer look. >>>> >>>> Thanks! >>>> >>>> - Nathan >>>> >>>> >>>> >>>> >> ---------------------------------------------------------------------------- >> >>> -------- >>> >>>> Server config file >>>> >>>> >> ---------------------------------------------------------------------------- >> >>> -------- >>> >>>> $ModLoad immark.so # provides --MARK-- message capability >>>> $ModLoad imuxsock.so # provides support for local system logging (e.g. >>>> via logger command) >>>> $ModLoad imklog.so # kernel logging (formerly provided by rklogd) >>>> >>>> $umask 0137 >>>> $DirCreateMode 0640 >>>> $FileCreateMode 0640 >>>> $FileOwner root >>>> $FileGroup admin >>>> >>>> $template DynFile,"/var/log/rsyslog/%HOSTNAME%/messages" >>>> $template DynFileCron,"/var/log/rsyslog/%HOSTNAME%/cron" >>>> $template DynFileSecure,"/var/log/rsyslog/%HOSTNAME%/secure" >>>> $template DynFileMail,"/var/log/rsyslog/%HOSTNAME%/maillog" >>>> $template DynFileSpool,"/var/log/rsyslog/%HOSTNAME%/spooler" >>>> $template DynFileBoot,"/var/log/rsyslog/%HOSTNAME%/boot" >>>> $template DynFileSyslog,"/var/log/rsyslog/%HOSTNAME%/syslog" >>>> >>>> :msg, contains, "no keys found for" ~ >>>> :msg, contains, "session opened for user" ~ >>>> >>>> # Log all kernel messages to the console. >>>> # Logging much else clutters up the screen. >>>> kern.* /dev/console >>>> >>>> # Log anything (except mail) of level info or higher. >>>> # Don't log private authentication messages! >>>> *.info;mail.none;cron.none ?DynFile >>>> >>>> *.warn;\ >>>> authpriv.none;cron.none;mail.none;news.none ?DynFileSyslog >>>> >>>> # Log all the mail messages in one place. >>>> mail.* ?DynFileMail >>>> >>>> # Log cron stuff >>>> cron.* ?DynFileCron >>>> >>>> # Everybody gets emergency messages >>>> *.emerg * >>>> >>>> # Save news errors of level crit and higher in a special file >>>> uucp,news.crit ?DynFileSpool >>>> >>>> # Save boot messages also to boot.log >>>> local7.* ?DynFileBoot >>>> >>>> # Relp config for PCI >>>> $ModLoad imrelp >>>> $InputRELPServerRun 2515 >>>> >>>> # make gtls driver the default >>>> $DefaultNetstreamDriver gtls >>>> >>>> # certificate files >>>> $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem >>>> $DefaultNetstreamDriverCertFile /etc/ssl/servercrt.pem >>>> $DefaultNetstreamDriverKeyFile /etc/ssl/serverkey.pem >>>> >>>> $ModLoad imtcp.so >>>> >>>> $InputTCPServerStreamDriverMode 1 # run driver in TLS-only mode >>>> $InputTCPServerStreamDriverAuthMode anon # client is NOT authenticated >>>> $InputTCPServerRun 10514 # start up listener at port 10514 >>>> $InputTCPMaxSessions 500 >>>> >>>> >>>> >> ---------------------------------------------------------------------------- >> >>> -------- >>> >>>> Client config file >>>> >>>> >> ---------------------------------------------------------------------------- >> >>> -------- >>> >>>> $ModLoad imuxsock.so # provides support for local system logging (e.g. >>>> via logger command) >>>> $ModLoad imklog.so # kernel logging (formerly provided by rklogd) >>>> >>>> $umask 0137 >>>> $DirCreateMode 0640 >>>> $FileCreateMode 0640 >>>> $FileOwner root >>>> $FileGroup admin >>>> >>>> >>>> # Log anything 'info' or higher, but lower than 'warn'. >>>> # Exclude authpriv, cron, mail, and news. These are logged elsewhere. >>>> *.info;*.!warn;\ >>>> authpriv.none;cron.none;mail.none;news.none /var/log/messages >>>> >>>> # Log anything 'warn' or higher. >>>> # Exclude authpriv, cron, mail, and news. These are logged elsewhere. >>>> *.warn;\ >>>> authpriv.none;cron.none;mail.none;news.none /var/log/syslog >>>> >>>> # Debugging information is logged here. >>>> *.=debug /var/log/debug >>>> >>>> # Private authentication message logging: >>>> authpriv.* /var/log/secure >>>> >>>> # Cron related logs: >>>> cron.* /var/log/cron >>>> >>>> # Mail related logs: >>>> mail.* /var/log/maillog >>>> >>>> # Emergency level messages go to all users: >>>> *.emerg * >>>> >>>> >>>> ## Uncomment these lines to use RELP instead for PCI compliance (stunnel >>>> required) >>>> #$ModLoad omrelp >>>> #*.*;mail.none :omrelp:localhost:2515;RSYSLOG_ForwardFormat >>>> >>>> # certificate files - just CA for a client >>>> $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem >>>> >>>> # set up the action >>>> $DefaultNetstreamDriver gtls # use gtls netstream driver >>>> $ActionSendStreamDriverMode 1 # require TLS for the connection >>>> $ActionSendStreamDriverAuthMode anon # server is NOT authenticated >>>> *.* @@(o)ldap.nmsrv.com:10514 # send (all) messages >>>> >>>> _______________________________________________ >>>> rsyslog mailing list >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>>> http://www.rsyslog.comoften >>>> >>> -- >>> Patrick Shen >>> Operations Engineer >>> >>> net mobile AG Shanghai office >>> >>> T: +86 21 61350900 - 222 >>> F: +86 21 61350906 >>> M: +86 13122245730 >>> >>> FE55 6A7F 3192 F359 C1DF 12F6 9078 58B4 57AF 0BE3 >>> >>> _______________________________________________ >>> rsyslog mailing list >>> http://lists.adiscon.net/mailman/listinfo/rsyslog >>> http://www.rsyslog.com >>> >> _______________________________________________ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com >> > > > _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com

