Hi Nathan,

In my company, we have the similar symptom. We have 50+ servers which
send logs to the central logging servers. Both clients and server use
rsyslog v3.20.5 right now. In the past when we use v3.20.0, the symptom
is more common when server is under high load average.

It's also weird for us, not all of clients were lost of connections. We
have 2-3 clients which is running the same application will looks like
hanging up when server is high-load. New ssh connection is very slow at
that time. But the connected ssh session works normal (We have some
prepared screen session on other clients which connect to them via ssh).

When the clients are lost of connections. So far our solution is to
restart rsyslog both on affected clients and server. Then the clients
get back and look normal.

We've suffered it for a while, but I didn't take chance to analyze it
like you.

Best regards,
Patrick

Nathan March wrote:
> Having a weird issue here..... Testing out a new rsyslog deployment and 
> I've got around 30 servers logging to one machine. On one of the clients 
> we had an issue where ssh was mysteriously very slow to login and 
> tracked it down to rsyslog causing issues. Simultaneously, one of the 
> threads on the logging server spiked up to 100%. New log data from the 
> client would stop showing up on the server at this poitn.
> 
> On the client after attempting to gracefully stop rsyslog:
> 
> vanguard etc # ps aux | grep -i rsyslog
> root      3764  0.0  0.1  5092 2564 ?        S    15:06   0:00 
> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog.conf
> root      3766  0.0  0.1  5092 2564 ?        S    15:06   0:00 
> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog.conf
> root      6203  0.0  0.0  1512  524 pts/3    S    15:09   0:00 grep -i 
> rsyslog
> vanguard etc # strace -p 3764
> Process 3764 attached - interrupt to quit
> send(11, "\25\3\2\1\0uE\254\304\n\311\377\204}p\237O\211\322\211"..., 
> 261, 0 <unfinished ...>
> Process 3764 detached
> 
> FD 11 being the socket to the logging server, it never does anything 
> asides from that send.
> 
> On the server:
> 
> ldap ~ # ps aux | grep rsyslog
> root      7370  0.0  0.0  83936  3380 ?        S    14:54   0:00 
> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf
> root      7379  0.0  0.0  83936  3380 ?        S    14:54   0:00 
> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf
> root      7380  0.0  0.0  83936  3380 ?        S    14:54   0:00 
> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf
> root      7381  0.0  0.0  83936  3380 ?        S    14:54   0:00 
> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf
> root      7382  0.0  0.0  83936  3380 ?        S    14:54   0:00 
> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf
> root      7383  0.0  0.0  83936  3380 ?        S    14:54   0:00 
> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf
> root      7384  0.0  0.0  83936  3380 ?        S    14:54   0:00 
> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf
> root      7385 28.3  0.0  83936  3380 ?        R    14:54   5:07 
> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog/rsyslog.conf
> 
> Stracing the 7385 pid just shows this repeating over and over (FD 4 
> being the network socket again):
> accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1 EAGAIN (Resource 
> temporarily unavailable)
> accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1 EAGAIN (Resource 
> temporarily unavailable)
> 
> It happened 3 times within a relatively short period of time, sometimes 
> within minutes of me restarting rsyslog on both machines. I enabled 
> debugging and it took around 8 hours to pop up again.
> 
> With debugging, I pulled this from the rsyslog server:
> 
> 6640.408108000:imtcp.c: New connect on NSD 0x59ddc0.
> 6640.408129000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550: pszRcvBuf (nil), 
> lenRcvBuf 0
> 6640.408144000:imtcp.c: hasRcvInBuffer on nsd 0x59dcd0: pszRcvBuf 
> 0x5c11c0, lenRcvBuf -1
> 6640.408158000:imtcp.c: hasRcvInBuffer on nsd 0x59dc10: pszRcvBuf 
> 0x5cb510, lenRcvBuf -1
> 6640.408171000:imtcp.c: hasRcvInBuffer on nsd 0x5c3d70: pszRcvBuf 
> 0x5d45c0, lenRcvBuf -1
> 6640.408185000:imtcp.c: hasRcvInBuffer on nsd 0x5cf460: pszRcvBuf 
> 0x5dd010, lenRcvBuf -1
> 6640.408199000:imtcp.c: hasRcvInBuffer on nsd 0x5d8600: pszRcvBuf 
> 0x5e58b0, lenRcvBuf -1
> 6640.408213000:imtcp.c: hasRcvInBuffer on nsd 0x59d690: pszRcvBuf 
> 0x5ee150, lenRcvBuf -1
> 6640.408227000:imtcp.c: hasRcvInBuffer on nsd 0x59d2f0: pszRcvBuf 
> 0x5f7320, lenRcvBuf -1
> 6640.408241000:imtcp.c: hasRcvInBuffer on nsd 0x5e8840: pszRcvBuf 
> 0x5fe0a0, lenRcvBuf -1
> 6640.408255000:imtcp.c: hasRcvInBuffer on nsd 0x6027b0: pszRcvBuf 
> 0x608350, lenRcvBuf -1
> 6640.408269000:imtcp.c: hasRcvInBuffer on nsd 0x5c3660: pszRcvBuf 
> 0x612110, lenRcvBuf -1
> 6640.408283000:imtcp.c: hasRcvInBuffer on nsd 0x602690: pszRcvBuf 
> 0x615040, lenRcvBuf -1
> 6640.408296000:imtcp.c: hasRcvInBuffer on nsd 0x5c3b60: pszRcvBuf 
> 0x6218f0, lenRcvBuf -1
> 6640.408310000:imtcp.c: hasRcvInBuffer on nsd 0x60c150: pszRcvBuf 
> 0x62a190, lenRcvBuf -1
> 6640.408324000:imtcp.c: hasRcvInBuffer on nsd 0x625420: pszRcvBuf 
> 0x634fe0, lenRcvBuf -1
> 6640.408338000:imtcp.c: hasRcvInBuffer on nsd 0x6117b0: pszRcvBuf 
> 0x63b000, lenRcvBuf -1
> 6640.408352000:imtcp.c: hasRcvInBuffer on nsd 0x603990: pszRcvBuf 
> 0x643c70, lenRcvBuf -1
> 6640.408365000:imtcp.c: hasRcvInBuffer on nsd 0x633c00: pszRcvBuf 
> 0x64dbb0, lenRcvBuf -1
> 6640.408379000:imtcp.c: hasRcvInBuffer on nsd 0x62cfa0: pszRcvBuf 
> 0x650960, lenRcvBuf -1
> 6640.408393000:imtcp.c: hasRcvInBuffer on nsd 0x633d10: pszRcvBuf 
> 0x65dbc0, lenRcvBuf -1
> 6640.408407000:imtcp.c: hasRcvInBuffer on nsd 0x5f2860: pszRcvBuf 
> 0x666020, lenRcvBuf -1
> 6640.408421000:imtcp.c: hasRcvInBuffer on nsd 0x64d250: pszRcvBuf 
> 0x66e480, lenRcvBuf -1
> 6640.408435000:imtcp.c: hasRcvInBuffer on nsd 0x659b30: pszRcvBuf 
> 0x676d20, lenRcvBuf 78
> 6640.408448000:imtcp.c: hasRcvInBuffer on nsd 0x669fb0: pszRcvBuf 
> 0x67fdd0, lenRcvBuf -1
> 6640.408462000:imtcp.c: hasRcvInBuffer on nsd 0x59a630: pszRcvBuf 
> 0x687f10, lenRcvBuf -1
> 6640.408476000:imtcp.c: hasRcvInBuffer on nsd 0x6614c0: pszRcvBuf 
> 0x6907b0, lenRcvBuf -1
> 6640.408490000:imtcp.c: hasRcvInBuffer on nsd 0x60c2d0: pszRcvBuf 
> 0x699050, lenRcvBuf -1
> 6640.408504000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550: pszRcvBuf (nil), 
> lenRcvBuf 0
> 
> This repeats enough to generate around 29 million lines in the debug log 
> file.
> 
> In the client log, there's plenty of these which seems somewhat normal:
> 
> 6509.743164000:imuxsock.c: --------imuxsock calling select, active file 
> descriptors (max 11): 11
> 6509.743231000:imuxsock.c: Message from UNIX socket: #11
> 6509.743262000:imuxsock.c: logmsg: flags 4, from 'vanguard', msg May 20 
> 23:15:09 bin/qmail-local[23198]: using .qmail .qmail-cfsc-forum
> 6509.743281000:imuxsock.c: Message has legacy syslog format.
> 6509.743302000:imuxsock.c: main queue: entry added, size now 579 entries
> 6509.743325000:imuxsock.c: wtpAdviseMaxWorkers signals busy
> 6509.743343000:imuxsock.c: main queue: EnqueueMsg advised worker start
> 
> However comparing it to an earlier point in the log  there's none of the 
> associated tcp queue entries like this:
> 
> 3587.234178000:main queue:Reg/w0: TCP sent 107 bytes, requested 107
> 3587.234204000:main queue:Reg/w0: main queue: entering rate limiter
> 3587.234229000:main queue:Reg/w0: main queue: entry deleted, state 0, 
> size now 0 entries
> 3587.234255000:main queue:Reg/w0: Called action, logging to builtin-file
> 3587.234285000:main queue:Reg/w0:  (/var/log/maillog)
> 3587.234314000:main queue:Reg/w0: Called action, logging to builtin-fwd
> 3587.234340000:main queue:Reg/w0:  ldap.nmsrv.com
> 3587.234359000:main queue:Reg/w0:  ldap.nmsrv.com:10514/tcp
> 3587.234425000:main queue:Reg/w0: TCP sent 78 bytes, requested 78
> 3587.234446000:main queue:Reg/w0: main queue: entering rate limiter
> 3587.234471000:main queue:Reg/w0: main queue:Reg/w0: worker IDLE, 
> waiting for work.
> 
> Anyone able to shed some light on this situation? I've got the full 
> debug logs if it's useful to anyone to take a closer look.
> 
> Thanks!
> 
> - Nathan
> 
> 
> ------------------------------------------------------------------------------------
> Server config file
> ------------------------------------------------------------------------------------
> 
> $ModLoad immark.so # provides --MARK-- message capability
> $ModLoad imuxsock.so # provides support for local system logging (e.g. 
> via logger command)
> $ModLoad imklog.so # kernel logging (formerly provided by rklogd)
> 
> $umask 0137
> $DirCreateMode 0640
> $FileCreateMode 0640
> $FileOwner root
> $FileGroup admin
> 
> $template DynFile,"/var/log/rsyslog/%HOSTNAME%/messages"
> $template DynFileCron,"/var/log/rsyslog/%HOSTNAME%/cron"
> $template DynFileSecure,"/var/log/rsyslog/%HOSTNAME%/secure"
> $template DynFileMail,"/var/log/rsyslog/%HOSTNAME%/maillog"
> $template DynFileSpool,"/var/log/rsyslog/%HOSTNAME%/spooler"
> $template DynFileBoot,"/var/log/rsyslog/%HOSTNAME%/boot"
> $template DynFileSyslog,"/var/log/rsyslog/%HOSTNAME%/syslog"
> 
> :msg, contains, "no keys found for"            ~
> :msg, contains, "session opened for user"        ~
> 
> # Log all kernel messages to the console.
> # Logging much else clutters up the screen.
> kern.*                                                 /dev/console
> 
> # Log anything (except mail) of level info or higher.
> # Don't log private authentication messages!
> *.info;mail.none;cron.none                     ?DynFile
> 
> *.warn;\
>         authpriv.none;cron.none;mail.none;news.none     ?DynFileSyslog
> 
> # Log all the mail messages in one place.
> mail.*                                                  ?DynFileMail
> 
> # Log cron stuff
> cron.*                                                  ?DynFileCron
> 
> # Everybody gets emergency messages
> *.emerg                                                 *
> 
> # Save news errors of level crit and higher in a special file
> uucp,news.crit                                          ?DynFileSpool
> 
> # Save boot messages also to boot.log
> local7.*                                                ?DynFileBoot
> 
> # Relp config for PCI
> $ModLoad imrelp
> $InputRELPServerRun 2515
> 
> # make gtls driver the default
> $DefaultNetstreamDriver gtls
> 
> # certificate files
> $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem
> $DefaultNetstreamDriverCertFile /etc/ssl/servercrt.pem
> $DefaultNetstreamDriverKeyFile /etc/ssl/serverkey.pem
> 
> $ModLoad imtcp.so
> 
> $InputTCPServerStreamDriverMode 1 # run driver in TLS-only mode
> $InputTCPServerStreamDriverAuthMode anon # client is NOT authenticated
> $InputTCPServerRun 10514 # start up listener at port 10514
> $InputTCPMaxSessions 500
> 
> ------------------------------------------------------------------------------------
> Client config file
> ------------------------------------------------------------------------------------
> $ModLoad imuxsock.so # provides support for local system logging (e.g. 
> via logger command)
> $ModLoad imklog.so # kernel logging (formerly provided by rklogd)
> 
> $umask 0137
> $DirCreateMode 0640
> $FileCreateMode 0640
> $FileOwner root
> $FileGroup admin
> 
> 
> # Log anything 'info' or higher, but lower than 'warn'.
> # Exclude authpriv, cron, mail, and news.  These are logged elsewhere.
> *.info;*.!warn;\
>         authpriv.none;cron.none;mail.none;news.none     /var/log/messages
> 
> # Log anything 'warn' or higher.
> # Exclude authpriv, cron, mail, and news.  These are logged elsewhere.
> *.warn;\
>         authpriv.none;cron.none;mail.none;news.none     /var/log/syslog
> 
> # Debugging information is logged here.
> *.=debug                                                /var/log/debug
> 
> # Private authentication message logging:
> authpriv.*                                              /var/log/secure
> 
> # Cron related logs:
> cron.*                                                  /var/log/cron
> 
> # Mail related logs:
> mail.*                                                  /var/log/maillog
> 
> # Emergency level messages go to all users:
> *.emerg                                                 *
> 
> 
> ## Uncomment these lines to use RELP instead for PCI compliance (stunnel 
> required)
> #$ModLoad omrelp
> #*.*;mail.none :omrelp:localhost:2515;RSYSLOG_ForwardFormat
> 
> # certificate files - just CA for a client
> $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem
> 
> # set up the action
> $DefaultNetstreamDriver gtls # use gtls netstream driver
> $ActionSendStreamDriverMode 1 # require TLS for the connection
> $ActionSendStreamDriverAuthMode anon # server is NOT authenticated
> *.* @@(o)ldap.nmsrv.com:10514 # send (all) messages
> 
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.comoften


-- 
Patrick Shen
Operations Engineer

net mobile AG Shanghai office

T: +86  21 61350900 - 222
F: +86  21 61350906
M: +86  13122245730

FE55 6A7F 3192 F359 C1DF  12F6 9078 58B4 57AF 0BE3

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

Reply via email to