Hi Nathan, In my company, we have the similar symptom. We have 50+ servers which send logs to the central logging servers. Both clients and server use rsyslog v3.20.5 right now. In the past when we use v3.20.0, the symptom is more common when server is under high load average.
It's also weird for us, not all of clients were lost of connections. We have 2-3 clients which is running the same application will looks like hanging up when server is high-load. New ssh connection is very slow at that time. But the connected ssh session works normal (We have some prepared screen session on other clients which connect to them via ssh). When the clients are lost of connections. So far our solution is to restart rsyslog both on affected clients and server. Then the clients get back and look normal. We've suffered it for a while, but I didn't take chance to analyze it like you. Best regards, Patrick Nathan March wrote: > Having a weird issue here..... Testing out a new rsyslog deployment and > I've got around 30 servers logging to one machine. On one of the clients > we had an issue where ssh was mysteriously very slow to login and > tracked it down to rsyslog causing issues. Simultaneously, one of the > threads on the logging server spiked up to 100%. New log data from the > client would stop showing up on the server at this poitn. > > On the client after attempting to gracefully stop rsyslog: > > vanguard etc # ps aux | grep -i rsyslog > root 3764 0.0 0.1 5092 2564 ? S 15:06 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog.conf > root 3766 0.0 0.1 5092 2564 ? S 15:06 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog.conf > root 6203 0.0 0.0 1512 524 pts/3 S 15:09 0:00 grep -i > rsyslog > vanguard etc # strace -p 3764 > Process 3764 attached - interrupt to quit > send(11, "\25\3\2\1\0uE\254\304\n\311\377\204}p\237O\211\322\211"..., > 261, 0 <unfinished ...> > Process 3764 detached > > FD 11 being the socket to the logging server, it never does anything > asides from that send. > > On the server: > > ldap ~ # ps aux | grep rsyslog > root 7370 0.0 0.0 83936 3380 ? S 14:54 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf > root 7379 0.0 0.0 83936 3380 ? S 14:54 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf > root 7380 0.0 0.0 83936 3380 ? S 14:54 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf > root 7381 0.0 0.0 83936 3380 ? S 14:54 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf > root 7382 0.0 0.0 83936 3380 ? S 14:54 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf > root 7383 0.0 0.0 83936 3380 ? S 14:54 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf > root 7384 0.0 0.0 83936 3380 ? S 14:54 0:00 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf > root 7385 28.3 0.0 83936 3380 ? R 14:54 5:07 > /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf > > Stracing the 7385 pid just shows this repeating over and over (FD 4 > being the network socket again): > accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1 EAGAIN (Resource > temporarily unavailable) > accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1 EAGAIN (Resource > temporarily unavailable) > > It happened 3 times within a relatively short period of time, sometimes > within minutes of me restarting rsyslog on both machines. I enabled > debugging and it took around 8 hours to pop up again. > > With debugging, I pulled this from the rsyslog server: > > 6640.408108000:imtcp.c: New connect on NSD 0x59ddc0. > 6640.408129000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550: pszRcvBuf (nil), > lenRcvBuf 0 > 6640.408144000:imtcp.c: hasRcvInBuffer on nsd 0x59dcd0: pszRcvBuf > 0x5c11c0, lenRcvBuf -1 > 6640.408158000:imtcp.c: hasRcvInBuffer on nsd 0x59dc10: pszRcvBuf > 0x5cb510, lenRcvBuf -1 > 6640.408171000:imtcp.c: hasRcvInBuffer on nsd 0x5c3d70: pszRcvBuf > 0x5d45c0, lenRcvBuf -1 > 6640.408185000:imtcp.c: hasRcvInBuffer on nsd 0x5cf460: pszRcvBuf > 0x5dd010, lenRcvBuf -1 > 6640.408199000:imtcp.c: hasRcvInBuffer on nsd 0x5d8600: pszRcvBuf > 0x5e58b0, lenRcvBuf -1 > 6640.408213000:imtcp.c: hasRcvInBuffer on nsd 0x59d690: pszRcvBuf > 0x5ee150, lenRcvBuf -1 > 6640.408227000:imtcp.c: hasRcvInBuffer on nsd 0x59d2f0: pszRcvBuf > 0x5f7320, lenRcvBuf -1 > 6640.408241000:imtcp.c: hasRcvInBuffer on nsd 0x5e8840: pszRcvBuf > 0x5fe0a0, lenRcvBuf -1 > 6640.408255000:imtcp.c: hasRcvInBuffer on nsd 0x6027b0: pszRcvBuf > 0x608350, lenRcvBuf -1 > 6640.408269000:imtcp.c: hasRcvInBuffer on nsd 0x5c3660: pszRcvBuf > 0x612110, lenRcvBuf -1 > 6640.408283000:imtcp.c: hasRcvInBuffer on nsd 0x602690: pszRcvBuf > 0x615040, lenRcvBuf -1 > 6640.408296000:imtcp.c: hasRcvInBuffer on nsd 0x5c3b60: pszRcvBuf > 0x6218f0, lenRcvBuf -1 > 6640.408310000:imtcp.c: hasRcvInBuffer on nsd 0x60c150: pszRcvBuf > 0x62a190, lenRcvBuf -1 > 6640.408324000:imtcp.c: hasRcvInBuffer on nsd 0x625420: pszRcvBuf > 0x634fe0, lenRcvBuf -1 > 6640.408338000:imtcp.c: hasRcvInBuffer on nsd 0x6117b0: pszRcvBuf > 0x63b000, lenRcvBuf -1 > 6640.408352000:imtcp.c: hasRcvInBuffer on nsd 0x603990: pszRcvBuf > 0x643c70, lenRcvBuf -1 > 6640.408365000:imtcp.c: hasRcvInBuffer on nsd 0x633c00: pszRcvBuf > 0x64dbb0, lenRcvBuf -1 > 6640.408379000:imtcp.c: hasRcvInBuffer on nsd 0x62cfa0: pszRcvBuf > 0x650960, lenRcvBuf -1 > 6640.408393000:imtcp.c: hasRcvInBuffer on nsd 0x633d10: pszRcvBuf > 0x65dbc0, lenRcvBuf -1 > 6640.408407000:imtcp.c: hasRcvInBuffer on nsd 0x5f2860: pszRcvBuf > 0x666020, lenRcvBuf -1 > 6640.408421000:imtcp.c: hasRcvInBuffer on nsd 0x64d250: pszRcvBuf > 0x66e480, lenRcvBuf -1 > 6640.408435000:imtcp.c: hasRcvInBuffer on nsd 0x659b30: pszRcvBuf > 0x676d20, lenRcvBuf 78 > 6640.408448000:imtcp.c: hasRcvInBuffer on nsd 0x669fb0: pszRcvBuf > 0x67fdd0, lenRcvBuf -1 > 6640.408462000:imtcp.c: hasRcvInBuffer on nsd 0x59a630: pszRcvBuf > 0x687f10, lenRcvBuf -1 > 6640.408476000:imtcp.c: hasRcvInBuffer on nsd 0x6614c0: pszRcvBuf > 0x6907b0, lenRcvBuf -1 > 6640.408490000:imtcp.c: hasRcvInBuffer on nsd 0x60c2d0: pszRcvBuf > 0x699050, lenRcvBuf -1 > 6640.408504000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550: pszRcvBuf (nil), > lenRcvBuf 0 > > This repeats enough to generate around 29 million lines in the debug log > file. > > In the client log, there's plenty of these which seems somewhat normal: > > 6509.743164000:imuxsock.c: --------imuxsock calling select, active file > descriptors (max 11): 11 > 6509.743231000:imuxsock.c: Message from UNIX socket: #11 > 6509.743262000:imuxsock.c: logmsg: flags 4, from 'vanguard', msg May 20 > 23:15:09 bin/qmail-local[23198]: using .qmail .qmail-cfsc-forum > 6509.743281000:imuxsock.c: Message has legacy syslog format. > 6509.743302000:imuxsock.c: main queue: entry added, size now 579 entries > 6509.743325000:imuxsock.c: wtpAdviseMaxWorkers signals busy > 6509.743343000:imuxsock.c: main queue: EnqueueMsg advised worker start > > However comparing it to an earlier point in the log there's none of the > associated tcp queue entries like this: > > 3587.234178000:main queue:Reg/w0: TCP sent 107 bytes, requested 107 > 3587.234204000:main queue:Reg/w0: main queue: entering rate limiter > 3587.234229000:main queue:Reg/w0: main queue: entry deleted, state 0, > size now 0 entries > 3587.234255000:main queue:Reg/w0: Called action, logging to builtin-file > 3587.234285000:main queue:Reg/w0: (/var/log/maillog) > 3587.234314000:main queue:Reg/w0: Called action, logging to builtin-fwd > 3587.234340000:main queue:Reg/w0: ldap.nmsrv.com > 3587.234359000:main queue:Reg/w0: ldap.nmsrv.com:10514/tcp > 3587.234425000:main queue:Reg/w0: TCP sent 78 bytes, requested 78 > 3587.234446000:main queue:Reg/w0: main queue: entering rate limiter > 3587.234471000:main queue:Reg/w0: main queue:Reg/w0: worker IDLE, > waiting for work. > > Anyone able to shed some light on this situation? I've got the full > debug logs if it's useful to anyone to take a closer look. > > Thanks! > > - Nathan > > > ------------------------------------------------------------------------------------ > Server config file > ------------------------------------------------------------------------------------ > > $ModLoad immark.so # provides --MARK-- message capability > $ModLoad imuxsock.so # provides support for local system logging (e.g. > via logger command) > $ModLoad imklog.so # kernel logging (formerly provided by rklogd) > > $umask 0137 > $DirCreateMode 0640 > $FileCreateMode 0640 > $FileOwner root > $FileGroup admin > > $template DynFile,"/var/log/rsyslog/%HOSTNAME%/messages" > $template DynFileCron,"/var/log/rsyslog/%HOSTNAME%/cron" > $template DynFileSecure,"/var/log/rsyslog/%HOSTNAME%/secure" > $template DynFileMail,"/var/log/rsyslog/%HOSTNAME%/maillog" > $template DynFileSpool,"/var/log/rsyslog/%HOSTNAME%/spooler" > $template DynFileBoot,"/var/log/rsyslog/%HOSTNAME%/boot" > $template DynFileSyslog,"/var/log/rsyslog/%HOSTNAME%/syslog" > > :msg, contains, "no keys found for" ~ > :msg, contains, "session opened for user" ~ > > # Log all kernel messages to the console. > # Logging much else clutters up the screen. > kern.* /dev/console > > # Log anything (except mail) of level info or higher. > # Don't log private authentication messages! > *.info;mail.none;cron.none ?DynFile > > *.warn;\ > authpriv.none;cron.none;mail.none;news.none ?DynFileSyslog > > # Log all the mail messages in one place. > mail.* ?DynFileMail > > # Log cron stuff > cron.* ?DynFileCron > > # Everybody gets emergency messages > *.emerg * > > # Save news errors of level crit and higher in a special file > uucp,news.crit ?DynFileSpool > > # Save boot messages also to boot.log > local7.* ?DynFileBoot > > # Relp config for PCI > $ModLoad imrelp > $InputRELPServerRun 2515 > > # make gtls driver the default > $DefaultNetstreamDriver gtls > > # certificate files > $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem > $DefaultNetstreamDriverCertFile /etc/ssl/servercrt.pem > $DefaultNetstreamDriverKeyFile /etc/ssl/serverkey.pem > > $ModLoad imtcp.so > > $InputTCPServerStreamDriverMode 1 # run driver in TLS-only mode > $InputTCPServerStreamDriverAuthMode anon # client is NOT authenticated > $InputTCPServerRun 10514 # start up listener at port 10514 > $InputTCPMaxSessions 500 > > ------------------------------------------------------------------------------------ > Client config file > ------------------------------------------------------------------------------------ > $ModLoad imuxsock.so # provides support for local system logging (e.g. > via logger command) > $ModLoad imklog.so # kernel logging (formerly provided by rklogd) > > $umask 0137 > $DirCreateMode 0640 > $FileCreateMode 0640 > $FileOwner root > $FileGroup admin > > > # Log anything 'info' or higher, but lower than 'warn'. > # Exclude authpriv, cron, mail, and news. These are logged elsewhere. > *.info;*.!warn;\ > authpriv.none;cron.none;mail.none;news.none /var/log/messages > > # Log anything 'warn' or higher. > # Exclude authpriv, cron, mail, and news. These are logged elsewhere. > *.warn;\ > authpriv.none;cron.none;mail.none;news.none /var/log/syslog > > # Debugging information is logged here. > *.=debug /var/log/debug > > # Private authentication message logging: > authpriv.* /var/log/secure > > # Cron related logs: > cron.* /var/log/cron > > # Mail related logs: > mail.* /var/log/maillog > > # Emergency level messages go to all users: > *.emerg * > > > ## Uncomment these lines to use RELP instead for PCI compliance (stunnel > required) > #$ModLoad omrelp > #*.*;mail.none :omrelp:localhost:2515;RSYSLOG_ForwardFormat > > # certificate files - just CA for a client > $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem > > # set up the action > $DefaultNetstreamDriver gtls # use gtls netstream driver > $ActionSendStreamDriverMode 1 # require TLS for the connection > $ActionSendStreamDriverAuthMode anon # server is NOT authenticated > *.* @@(o)ldap.nmsrv.com:10514 # send (all) messages > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.comoften -- Patrick Shen Operations Engineer net mobile AG Shanghai office T: +86 21 61350900 - 222 F: +86 21 61350906 M: +86 13122245730 FE55 6A7F 3192 F359 C1DF 12F6 9078 58B4 57AF 0BE3 _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com

