one thing that looks like it's missing from this report is what version of rsyslog you are running.
David Lang On Thu, 21 May 2009, Nathan March wrote: > Date: Thu, 21 May 2009 11:12:45 -0700 > From: Nathan March <[email protected]> > Reply-To: rsyslog-users <[email protected]> > To: [email protected] > Subject: [rsyslog] Possible bug? 100% cpu on a server thread and the client > stops sending data > > Having a weird issue here..... Testing out a new rsyslog deployment and > I've got around 30 servers logging to one machine. On one of the clients > we had an issue where ssh was mysteriously very slow to login and > tracked it down to rsyslog causing issues. Simultaneously, one of the > threads on the logging server spiked up to 100%. New log data from the > client would stop showing up on the server at this poitn. > > On the client after attempting to gracefully stop rsyslog: > > vanguard etc # ps aux | grep -i rsyslog > root 3764 0.0 0.1 5092 2564 ? S 15:06 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog.conf > root 3766 0.0 0.1 5092 2564 ? S 15:06 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog.conf > root 6203 0.0 0.0 1512 524 pts/3 S 15:09 0:00 grep -i > rsyslog > vanguard etc # strace -p 3764 > Process 3764 attached - interrupt to quit > send(11, "\25\3\2\1\0uE\254\304\n\311\377\204}p\237O\211\322\211"..., > 261, 0 <unfinished ...> > Process 3764 detached > > FD 11 being the socket to the logging server, it never does anything > asides from that send. > > On the server: > > ldap ~ # ps aux | grep rsyslog > root 7370 0.0 0.0 83936 3380 ? S 14:54 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf > root 7379 0.0 0.0 83936 3380 ? S 14:54 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf > root 7380 0.0 0.0 83936 3380 ? S 14:54 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf > root 7381 0.0 0.0 83936 3380 ? S 14:54 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf > root 7382 0.0 0.0 83936 3380 ? S 14:54 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf > root 7383 0.0 0.0 83936 3380 ? S 14:54 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf > root 7384 0.0 0.0 83936 3380 ? S 14:54 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf > root 7385 28.3 0.0 83936 3380 ? R 14:54 5:07 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf > > Stracing the 7385 pid just shows this repeating over and over (FD 4 > being the network socket again): > accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1 EAGAIN (Resource > temporarily unavailable) > accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1 EAGAIN (Resource > temporarily unavailable) > > It happened 3 times within a relatively short period of time, sometimes > within minutes of me restarting rsyslog on both machines. I enabled > debugging and it took around 8 hours to pop up again. > > With debugging, I pulled this from the rsyslog server: > > 6640.408108000:imtcp.c: New connect on NSD 0x59ddc0. > 6640.408129000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550: pszRcvBuf (nil), > lenRcvBuf 0 > 6640.408144000:imtcp.c: hasRcvInBuffer on nsd 0x59dcd0: pszRcvBuf > 0x5c11c0, lenRcvBuf -1 > 6640.408158000:imtcp.c: hasRcvInBuffer on nsd 0x59dc10: pszRcvBuf > 0x5cb510, lenRcvBuf -1 > 6640.408171000:imtcp.c: hasRcvInBuffer on nsd 0x5c3d70: pszRcvBuf > 0x5d45c0, lenRcvBuf -1 > 6640.408185000:imtcp.c: hasRcvInBuffer on nsd 0x5cf460: pszRcvBuf > 0x5dd010, lenRcvBuf -1 > 6640.408199000:imtcp.c: hasRcvInBuffer on nsd 0x5d8600: pszRcvBuf > 0x5e58b0, lenRcvBuf -1 > 6640.408213000:imtcp.c: hasRcvInBuffer on nsd 0x59d690: pszRcvBuf > 0x5ee150, lenRcvBuf -1 > 6640.408227000:imtcp.c: hasRcvInBuffer on nsd 0x59d2f0: pszRcvBuf > 0x5f7320, lenRcvBuf -1 > 6640.408241000:imtcp.c: hasRcvInBuffer on nsd 0x5e8840: pszRcvBuf > 0x5fe0a0, lenRcvBuf -1 > 6640.408255000:imtcp.c: hasRcvInBuffer on nsd 0x6027b0: pszRcvBuf > 0x608350, lenRcvBuf -1 > 6640.408269000:imtcp.c: hasRcvInBuffer on nsd 0x5c3660: pszRcvBuf > 0x612110, lenRcvBuf -1 > 6640.408283000:imtcp.c: hasRcvInBuffer on nsd 0x602690: pszRcvBuf > 0x615040, lenRcvBuf -1 > 6640.408296000:imtcp.c: hasRcvInBuffer on nsd 0x5c3b60: pszRcvBuf > 0x6218f0, lenRcvBuf -1 > 6640.408310000:imtcp.c: hasRcvInBuffer on nsd 0x60c150: pszRcvBuf > 0x62a190, lenRcvBuf -1 > 6640.408324000:imtcp.c: hasRcvInBuffer on nsd 0x625420: pszRcvBuf > 0x634fe0, lenRcvBuf -1 > 6640.408338000:imtcp.c: hasRcvInBuffer on nsd 0x6117b0: pszRcvBuf > 0x63b000, lenRcvBuf -1 > 6640.408352000:imtcp.c: hasRcvInBuffer on nsd 0x603990: pszRcvBuf > 0x643c70, lenRcvBuf -1 > 6640.408365000:imtcp.c: hasRcvInBuffer on nsd 0x633c00: pszRcvBuf > 0x64dbb0, lenRcvBuf -1 > 6640.408379000:imtcp.c: hasRcvInBuffer on nsd 0x62cfa0: pszRcvBuf > 0x650960, lenRcvBuf -1 > 6640.408393000:imtcp.c: hasRcvInBuffer on nsd 0x633d10: pszRcvBuf > 0x65dbc0, lenRcvBuf -1 > 6640.408407000:imtcp.c: hasRcvInBuffer on nsd 0x5f2860: pszRcvBuf > 0x666020, lenRcvBuf -1 > 6640.408421000:imtcp.c: hasRcvInBuffer on nsd 0x64d250: pszRcvBuf > 0x66e480, lenRcvBuf -1 > 6640.408435000:imtcp.c: hasRcvInBuffer on nsd 0x659b30: pszRcvBuf > 0x676d20, lenRcvBuf 78 > 6640.408448000:imtcp.c: hasRcvInBuffer on nsd 0x669fb0: pszRcvBuf > 0x67fdd0, lenRcvBuf -1 > 6640.408462000:imtcp.c: hasRcvInBuffer on nsd 0x59a630: pszRcvBuf > 0x687f10, lenRcvBuf -1 > 6640.408476000:imtcp.c: hasRcvInBuffer on nsd 0x6614c0: pszRcvBuf > 0x6907b0, lenRcvBuf -1 > 6640.408490000:imtcp.c: hasRcvInBuffer on nsd 0x60c2d0: pszRcvBuf > 0x699050, lenRcvBuf -1 > 6640.408504000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550: pszRcvBuf (nil), > lenRcvBuf 0 > > This repeats enough to generate around 29 million lines in the debug log > file. > > In the client log, there's plenty of these which seems somewhat normal: > > 6509.743164000:imuxsock.c: --------imuxsock calling select, active file > descriptors (max 11): 11 > 6509.743231000:imuxsock.c: Message from UNIX socket: #11 > 6509.743262000:imuxsock.c: logmsg: flags 4, from 'vanguard', msg May 20 > 23:15:09 bin/qmail-local[23198]: using .qmail .qmail-cfsc-forum > 6509.743281000:imuxsock.c: Message has legacy syslog format. > 6509.743302000:imuxsock.c: main queue: entry added, size now 579 entries > 6509.743325000:imuxsock.c: wtpAdviseMaxWorkers signals busy > 6509.743343000:imuxsock.c: main queue: EnqueueMsg advised worker start > > However comparing it to an earlier point in the log there's none of the > associated tcp queue entries like this: > > 3587.234178000:main queue:Reg/w0: TCP sent 107 bytes, requested 107 > 3587.234204000:main queue:Reg/w0: main queue: entering rate limiter > 3587.234229000:main queue:Reg/w0: main queue: entry deleted, state 0, > size now 0 entries > 3587.234255000:main queue:Reg/w0: Called action, logging to builtin-file > 3587.234285000:main queue:Reg/w0: (/var/log/maillog) > 3587.234314000:main queue:Reg/w0: Called action, logging to builtin-fwd > 3587.234340000:main queue:Reg/w0: ldap.nmsrv.com > 3587.234359000:main queue:Reg/w0: ldap.nmsrv.com:10514/tcp > 3587.234425000:main queue:Reg/w0: TCP sent 78 bytes, requested 78 > 3587.234446000:main queue:Reg/w0: main queue: entering rate limiter > 3587.234471000:main queue:Reg/w0: main queue:Reg/w0: worker IDLE, > waiting for work. > > Anyone able to shed some light on this situation? I've got the full > debug logs if it's useful to anyone to take a closer look. > > Thanks! > > - Nathan > > > ------------------------------------------------------------------------------------ > Server config file > ------------------------------------------------------------------------------------ > > $ModLoad immark.so # provides --MARK-- message capability > $ModLoad imuxsock.so # provides support for local system logging (e.g. > via logger command) > $ModLoad imklog.so # kernel logging (formerly provided by rklogd) > > $umask 0137 > $DirCreateMode 0640 > $FileCreateMode 0640 > $FileOwner root > $FileGroup admin > > $template DynFile,"/var/log/rsyslog/%HOSTNAME%/messages" > $template DynFileCron,"/var/log/rsyslog/%HOSTNAME%/cron" > $template DynFileSecure,"/var/log/rsyslog/%HOSTNAME%/secure" > $template DynFileMail,"/var/log/rsyslog/%HOSTNAME%/maillog" > $template DynFileSpool,"/var/log/rsyslog/%HOSTNAME%/spooler" > $template DynFileBoot,"/var/log/rsyslog/%HOSTNAME%/boot" > $template DynFileSyslog,"/var/log/rsyslog/%HOSTNAME%/syslog" > > :msg, contains, "no keys found for" ~ > :msg, contains, "session opened for user" ~ > > # Log all kernel messages to the console. > # Logging much else clutters up the screen. > kern.* /dev/console > > # Log anything (except mail) of level info or higher. > # Don't log private authentication messages! > *.info;mail.none;cron.none ?DynFile > > *.warn;\ > authpriv.none;cron.none;mail.none;news.none ?DynFileSyslog > > # Log all the mail messages in one place. > mail.* ?DynFileMail > > # Log cron stuff > cron.* ?DynFileCron > > # Everybody gets emergency messages > *.emerg * > > # Save news errors of level crit and higher in a special file > uucp,news.crit ?DynFileSpool > > # Save boot messages also to boot.log > local7.* ?DynFileBoot > > # Relp config for PCI > $ModLoad imrelp > $InputRELPServerRun 2515 > > # make gtls driver the default > $DefaultNetstreamDriver gtls > > # certificate files > $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem > $DefaultNetstreamDriverCertFile /etc/ssl/servercrt.pem > $DefaultNetstreamDriverKeyFile /etc/ssl/serverkey.pem > > $ModLoad imtcp.so > > $InputTCPServerStreamDriverMode 1 # run driver in TLS-only mode > $InputTCPServerStreamDriverAuthMode anon # client is NOT authenticated > $InputTCPServerRun 10514 # start up listener at port 10514 > $InputTCPMaxSessions 500 > > ------------------------------------------------------------------------------------ > Client config file > ------------------------------------------------------------------------------------ > $ModLoad imuxsock.so # provides support for local system logging (e.g. > via logger command) > $ModLoad imklog.so # kernel logging (formerly provided by rklogd) > > $umask 0137 > $DirCreateMode 0640 > $FileCreateMode 0640 > $FileOwner root > $FileGroup admin > > > # Log anything 'info' or higher, but lower than 'warn'. > # Exclude authpriv, cron, mail, and news. These are logged elsewhere. > *.info;*.!warn;\ > authpriv.none;cron.none;mail.none;news.none /var/log/messages > > # Log anything 'warn' or higher. > # Exclude authpriv, cron, mail, and news. These are logged elsewhere. > *.warn;\ > authpriv.none;cron.none;mail.none;news.none /var/log/syslog > > # Debugging information is logged here. > *.=debug /var/log/debug > > # Private authentication message logging: > authpriv.* /var/log/secure > > # Cron related logs: > cron.* /var/log/cron > > # Mail related logs: > mail.* /var/log/maillog > > # Emergency level messages go to all users: > *.emerg * > > > ## Uncomment these lines to use RELP instead for PCI compliance (stunnel > required) > #$ModLoad omrelp > #*.*;mail.none :omrelp:localhost:2515;RSYSLOG_ForwardFormat > > # certificate files - just CA for a client > $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem > > # set up the action > $DefaultNetstreamDriver gtls # use gtls netstream driver > $ActionSendStreamDriverMode 1 # require TLS for the connection > $ActionSendStreamDriverAuthMode anon # server is NOT authenticated > *.* @@(o)ldap.nmsrv.com:10514 # send (all) messages > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com > _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com

