Any hope of this bug getting some attention soon? =)

Thanks,
Nathan


Patrick Shen wrote:
> Hi Rainer,
>
> Thanks for the quick reply.
>
> My environment is like below:
>
> CLIENT ---
>          |
>          | (TCP)
>          --------> SERVER ----
>                              |
>                              | (RELP Relay)
>                              --------------> SERVER Standby
>
> I've tested, even if the standby server is under high load, one of our
> clients won't accept any new network connections.
>
> Just for your information.
>
> Best regards,
> Patrick
>
> Rainer Gerhards wrote:
>   
>> Hi all,
>>
>> thanks for the detailed info. Will look at it asap, but I am currently tied
>> up with some other work...
>>
>> Please let me know anything else that may be relevant.
>>
>> Thanks,
>> Rainer
>>
>>     
>>> -----Original Message-----
>>> From: [email protected] [mailto:rsyslog-
>>> [email protected]] On Behalf Of Patrick Shen
>>> Sent: Monday, May 25, 2009 8:17 AM
>>> To: rsyslog-users
>>> Subject: Re: [rsyslog] Possible bug? 100% cpu on a server thread and the
>>> client stops sending data
>>>
>>> Hi Nathan,
>>>
>>> In my company, we have the similar symptom. We have 50+ servers which
>>> send logs to the central logging servers. Both clients and server use
>>> rsyslog v3.20.5 right now. In the past when we use v3.20.0, the symptom
>>> is more common when server is under high load average.
>>>
>>> It's also weird for us, not all of clients were lost of connections. We
>>> have 2-3 clients which is running the same application will looks like
>>> hanging up when server is high-load. New ssh connection is very slow at
>>> that time. But the connected ssh session works normal (We have some
>>> prepared screen session on other clients which connect to them via ssh).
>>>
>>> When the clients are lost of connections. So far our solution is to
>>> restart rsyslog both on affected clients and server. Then the clients
>>> get back and look normal.
>>>
>>> We've suffered it for a while, but I didn't take chance to analyze it
>>> like you.
>>>
>>> Best regards,
>>> Patrick
>>>
>>> Nathan March wrote:
>>>       
>>>> Having a weird issue here..... Testing out a new rsyslog deployment and
>>>> I've got around 30 servers logging to one machine. On one of the clients
>>>> we had an issue where ssh was mysteriously very slow to login and
>>>> tracked it down to rsyslog causing issues. Simultaneously, one of the
>>>> threads on the logging server spiked up to 100%. New log data from the
>>>> client would stop showing up on the server at this poitn.
>>>>
>>>> On the client after attempting to gracefully stop rsyslog:
>>>>
>>>> vanguard etc # ps aux | grep -i rsyslog
>>>> root      3764  0.0  0.1  5092 2564 ?        S    15:06   0:00
>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog.conf
>>>> root      3766  0.0  0.1  5092 2564 ?        S    15:06   0:00
>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f /etc/rsyslog.conf
>>>> root      6203  0.0  0.0  1512  524 pts/3    S    15:09   0:00 grep -i
>>>> rsyslog
>>>> vanguard etc # strace -p 3764
>>>> Process 3764 attached - interrupt to quit
>>>> send(11, "\25\3\2\1\0uE\254\304\n\311\377\204}p\237O\211\322\211"...,
>>>> 261, 0 <unfinished ...>
>>>> Process 3764 detached
>>>>
>>>> FD 11 being the socket to the logging server, it never does anything
>>>> asides from that send.
>>>>
>>>> On the server:
>>>>
>>>> ldap ~ # ps aux | grep rsyslog
>>>> root      7370  0.0  0.0  83936  3380 ?        S    14:54   0:00
>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
>>>>         
>> /etc/rsyslog/rsyslog.conf
>>     
>>>> root      7379  0.0  0.0  83936  3380 ?        S    14:54   0:00
>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
>>>>         
>> /etc/rsyslog/rsyslog.conf
>>     
>>>> root      7380  0.0  0.0  83936  3380 ?        S    14:54   0:00
>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
>>>>         
>> /etc/rsyslog/rsyslog.conf
>>     
>>>> root      7381  0.0  0.0  83936  3380 ?        S    14:54   0:00
>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
>>>>         
>> /etc/rsyslog/rsyslog.conf
>>     
>>>> root      7382  0.0  0.0  83936  3380 ?        S    14:54   0:00
>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
>>>>         
>> /etc/rsyslog/rsyslog.conf
>>     
>>>> root      7383  0.0  0.0  83936  3380 ?        S    14:54   0:00
>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
>>>>         
>> /etc/rsyslog/rsyslog.conf
>>     
>>>> root      7384  0.0  0.0  83936  3380 ?        S    14:54   0:00
>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
>>>>         
>> /etc/rsyslog/rsyslog.conf
>>     
>>>> root      7385 28.3  0.0  83936  3380 ?        R    14:54   5:07
>>>> /usr/sbin/rsyslogd -c3 -i /var/run/rsyslogd.pid -f
>>>>         
>> /etc/rsyslog/rsyslog.conf
>>     
>>>> Stracing the 7385 pid just shows this repeating over and over (FD 4
>>>> being the network socket again):
>>>> accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1 EAGAIN (Resource
>>>> temporarily unavailable)
>>>> accept(4, 0x2b0f67658ca0, [7450515777377009792]) = -1 EAGAIN (Resource
>>>> temporarily unavailable)
>>>>
>>>> It happened 3 times within a relatively short period of time, sometimes
>>>> within minutes of me restarting rsyslog on both machines. I enabled
>>>> debugging and it took around 8 hours to pop up again.
>>>>
>>>> With debugging, I pulled this from the rsyslog server:
>>>>
>>>> 6640.408108000:imtcp.c: New connect on NSD 0x59ddc0.
>>>> 6640.408129000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550: pszRcvBuf (nil),
>>>> lenRcvBuf 0
>>>> 6640.408144000:imtcp.c: hasRcvInBuffer on nsd 0x59dcd0: pszRcvBuf
>>>> 0x5c11c0, lenRcvBuf -1
>>>> 6640.408158000:imtcp.c: hasRcvInBuffer on nsd 0x59dc10: pszRcvBuf
>>>> 0x5cb510, lenRcvBuf -1
>>>> 6640.408171000:imtcp.c: hasRcvInBuffer on nsd 0x5c3d70: pszRcvBuf
>>>> 0x5d45c0, lenRcvBuf -1
>>>> 6640.408185000:imtcp.c: hasRcvInBuffer on nsd 0x5cf460: pszRcvBuf
>>>> 0x5dd010, lenRcvBuf -1
>>>> 6640.408199000:imtcp.c: hasRcvInBuffer on nsd 0x5d8600: pszRcvBuf
>>>> 0x5e58b0, lenRcvBuf -1
>>>> 6640.408213000:imtcp.c: hasRcvInBuffer on nsd 0x59d690: pszRcvBuf
>>>> 0x5ee150, lenRcvBuf -1
>>>> 6640.408227000:imtcp.c: hasRcvInBuffer on nsd 0x59d2f0: pszRcvBuf
>>>> 0x5f7320, lenRcvBuf -1
>>>> 6640.408241000:imtcp.c: hasRcvInBuffer on nsd 0x5e8840: pszRcvBuf
>>>> 0x5fe0a0, lenRcvBuf -1
>>>> 6640.408255000:imtcp.c: hasRcvInBuffer on nsd 0x6027b0: pszRcvBuf
>>>> 0x608350, lenRcvBuf -1
>>>> 6640.408269000:imtcp.c: hasRcvInBuffer on nsd 0x5c3660: pszRcvBuf
>>>> 0x612110, lenRcvBuf -1
>>>> 6640.408283000:imtcp.c: hasRcvInBuffer on nsd 0x602690: pszRcvBuf
>>>> 0x615040, lenRcvBuf -1
>>>> 6640.408296000:imtcp.c: hasRcvInBuffer on nsd 0x5c3b60: pszRcvBuf
>>>> 0x6218f0, lenRcvBuf -1
>>>> 6640.408310000:imtcp.c: hasRcvInBuffer on nsd 0x60c150: pszRcvBuf
>>>> 0x62a190, lenRcvBuf -1
>>>> 6640.408324000:imtcp.c: hasRcvInBuffer on nsd 0x625420: pszRcvBuf
>>>> 0x634fe0, lenRcvBuf -1
>>>> 6640.408338000:imtcp.c: hasRcvInBuffer on nsd 0x6117b0: pszRcvBuf
>>>> 0x63b000, lenRcvBuf -1
>>>> 6640.408352000:imtcp.c: hasRcvInBuffer on nsd 0x603990: pszRcvBuf
>>>> 0x643c70, lenRcvBuf -1
>>>> 6640.408365000:imtcp.c: hasRcvInBuffer on nsd 0x633c00: pszRcvBuf
>>>> 0x64dbb0, lenRcvBuf -1
>>>> 6640.408379000:imtcp.c: hasRcvInBuffer on nsd 0x62cfa0: pszRcvBuf
>>>> 0x650960, lenRcvBuf -1
>>>> 6640.408393000:imtcp.c: hasRcvInBuffer on nsd 0x633d10: pszRcvBuf
>>>> 0x65dbc0, lenRcvBuf -1
>>>> 6640.408407000:imtcp.c: hasRcvInBuffer on nsd 0x5f2860: pszRcvBuf
>>>> 0x666020, lenRcvBuf -1
>>>> 6640.408421000:imtcp.c: hasRcvInBuffer on nsd 0x64d250: pszRcvBuf
>>>> 0x66e480, lenRcvBuf -1
>>>> 6640.408435000:imtcp.c: hasRcvInBuffer on nsd 0x659b30: pszRcvBuf
>>>> 0x676d20, lenRcvBuf 78
>>>> 6640.408448000:imtcp.c: hasRcvInBuffer on nsd 0x669fb0: pszRcvBuf
>>>> 0x67fdd0, lenRcvBuf -1
>>>> 6640.408462000:imtcp.c: hasRcvInBuffer on nsd 0x59a630: pszRcvBuf
>>>> 0x687f10, lenRcvBuf -1
>>>> 6640.408476000:imtcp.c: hasRcvInBuffer on nsd 0x6614c0: pszRcvBuf
>>>> 0x6907b0, lenRcvBuf -1
>>>> 6640.408490000:imtcp.c: hasRcvInBuffer on nsd 0x60c2d0: pszRcvBuf
>>>> 0x699050, lenRcvBuf -1
>>>> 6640.408504000:imtcp.c: hasRcvInBuffer on nsd 0x5a0550: pszRcvBuf (nil),
>>>> lenRcvBuf 0
>>>>
>>>> This repeats enough to generate around 29 million lines in the debug log
>>>> file.
>>>>
>>>> In the client log, there's plenty of these which seems somewhat normal:
>>>>
>>>> 6509.743164000:imuxsock.c: --------imuxsock calling select, active file
>>>> descriptors (max 11): 11
>>>> 6509.743231000:imuxsock.c: Message from UNIX socket: #11
>>>> 6509.743262000:imuxsock.c: logmsg: flags 4, from 'vanguard', msg May 20
>>>> 23:15:09 bin/qmail-local[23198]: using .qmail .qmail-cfsc-forum
>>>> 6509.743281000:imuxsock.c: Message has legacy syslog format.
>>>> 6509.743302000:imuxsock.c: main queue: entry added, size now 579 entries
>>>> 6509.743325000:imuxsock.c: wtpAdviseMaxWorkers signals busy
>>>> 6509.743343000:imuxsock.c: main queue: EnqueueMsg advised worker start
>>>>
>>>> However comparing it to an earlier point in the log  there's none of the
>>>> associated tcp queue entries like this:
>>>>
>>>> 3587.234178000:main queue:Reg/w0: TCP sent 107 bytes, requested 107
>>>> 3587.234204000:main queue:Reg/w0: main queue: entering rate limiter
>>>> 3587.234229000:main queue:Reg/w0: main queue: entry deleted, state 0,
>>>> size now 0 entries
>>>> 3587.234255000:main queue:Reg/w0: Called action, logging to builtin-file
>>>> 3587.234285000:main queue:Reg/w0:  (/var/log/maillog)
>>>> 3587.234314000:main queue:Reg/w0: Called action, logging to builtin-fwd
>>>> 3587.234340000:main queue:Reg/w0:  ldap.nmsrv.com
>>>> 3587.234359000:main queue:Reg/w0:  ldap.nmsrv.com:10514/tcp
>>>> 3587.234425000:main queue:Reg/w0: TCP sent 78 bytes, requested 78
>>>> 3587.234446000:main queue:Reg/w0: main queue: entering rate limiter
>>>> 3587.234471000:main queue:Reg/w0: main queue:Reg/w0: worker IDLE,
>>>> waiting for work.
>>>>
>>>> Anyone able to shed some light on this situation? I've got the full
>>>> debug logs if it's useful to anyone to take a closer look.
>>>>
>>>> Thanks!
>>>>
>>>> - Nathan
>>>>
>>>>
>>>>
>>>>         
>> ----------------------------------------------------------------------------
>>     
>>> --------
>>>       
>>>> Server config file
>>>>
>>>>         
>> ----------------------------------------------------------------------------
>>     
>>> --------
>>>       
>>>> $ModLoad immark.so # provides --MARK-- message capability
>>>> $ModLoad imuxsock.so # provides support for local system logging (e.g.
>>>> via logger command)
>>>> $ModLoad imklog.so # kernel logging (formerly provided by rklogd)
>>>>
>>>> $umask 0137
>>>> $DirCreateMode 0640
>>>> $FileCreateMode 0640
>>>> $FileOwner root
>>>> $FileGroup admin
>>>>
>>>> $template DynFile,"/var/log/rsyslog/%HOSTNAME%/messages"
>>>> $template DynFileCron,"/var/log/rsyslog/%HOSTNAME%/cron"
>>>> $template DynFileSecure,"/var/log/rsyslog/%HOSTNAME%/secure"
>>>> $template DynFileMail,"/var/log/rsyslog/%HOSTNAME%/maillog"
>>>> $template DynFileSpool,"/var/log/rsyslog/%HOSTNAME%/spooler"
>>>> $template DynFileBoot,"/var/log/rsyslog/%HOSTNAME%/boot"
>>>> $template DynFileSyslog,"/var/log/rsyslog/%HOSTNAME%/syslog"
>>>>
>>>> :msg, contains, "no keys found for"            ~
>>>> :msg, contains, "session opened for user"        ~
>>>>
>>>> # Log all kernel messages to the console.
>>>> # Logging much else clutters up the screen.
>>>> kern.*                                                 /dev/console
>>>>
>>>> # Log anything (except mail) of level info or higher.
>>>> # Don't log private authentication messages!
>>>> *.info;mail.none;cron.none                     ?DynFile
>>>>
>>>> *.warn;\
>>>>         authpriv.none;cron.none;mail.none;news.none     ?DynFileSyslog
>>>>
>>>> # Log all the mail messages in one place.
>>>> mail.*                                                  ?DynFileMail
>>>>
>>>> # Log cron stuff
>>>> cron.*                                                  ?DynFileCron
>>>>
>>>> # Everybody gets emergency messages
>>>> *.emerg                                                 *
>>>>
>>>> # Save news errors of level crit and higher in a special file
>>>> uucp,news.crit                                          ?DynFileSpool
>>>>
>>>> # Save boot messages also to boot.log
>>>> local7.*                                                ?DynFileBoot
>>>>
>>>> # Relp config for PCI
>>>> $ModLoad imrelp
>>>> $InputRELPServerRun 2515
>>>>
>>>> # make gtls driver the default
>>>> $DefaultNetstreamDriver gtls
>>>>
>>>> # certificate files
>>>> $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem
>>>> $DefaultNetstreamDriverCertFile /etc/ssl/servercrt.pem
>>>> $DefaultNetstreamDriverKeyFile /etc/ssl/serverkey.pem
>>>>
>>>> $ModLoad imtcp.so
>>>>
>>>> $InputTCPServerStreamDriverMode 1 # run driver in TLS-only mode
>>>> $InputTCPServerStreamDriverAuthMode anon # client is NOT authenticated
>>>> $InputTCPServerRun 10514 # start up listener at port 10514
>>>> $InputTCPMaxSessions 500
>>>>
>>>>
>>>>         
>> ----------------------------------------------------------------------------
>>     
>>> --------
>>>       
>>>> Client config file
>>>>
>>>>         
>> ----------------------------------------------------------------------------
>>     
>>> --------
>>>       
>>>> $ModLoad imuxsock.so # provides support for local system logging (e.g.
>>>> via logger command)
>>>> $ModLoad imklog.so # kernel logging (formerly provided by rklogd)
>>>>
>>>> $umask 0137
>>>> $DirCreateMode 0640
>>>> $FileCreateMode 0640
>>>> $FileOwner root
>>>> $FileGroup admin
>>>>
>>>>
>>>> # Log anything 'info' or higher, but lower than 'warn'.
>>>> # Exclude authpriv, cron, mail, and news.  These are logged elsewhere.
>>>> *.info;*.!warn;\
>>>>         authpriv.none;cron.none;mail.none;news.none     /var/log/messages
>>>>
>>>> # Log anything 'warn' or higher.
>>>> # Exclude authpriv, cron, mail, and news.  These are logged elsewhere.
>>>> *.warn;\
>>>>         authpriv.none;cron.none;mail.none;news.none     /var/log/syslog
>>>>
>>>> # Debugging information is logged here.
>>>> *.=debug                                                /var/log/debug
>>>>
>>>> # Private authentication message logging:
>>>> authpriv.*                                              /var/log/secure
>>>>
>>>> # Cron related logs:
>>>> cron.*                                                  /var/log/cron
>>>>
>>>> # Mail related logs:
>>>> mail.*                                                  /var/log/maillog
>>>>
>>>> # Emergency level messages go to all users:
>>>> *.emerg                                                 *
>>>>
>>>>
>>>> ## Uncomment these lines to use RELP instead for PCI compliance (stunnel
>>>> required)
>>>> #$ModLoad omrelp
>>>> #*.*;mail.none :omrelp:localhost:2515;RSYSLOG_ForwardFormat
>>>>
>>>> # certificate files - just CA for a client
>>>> $DefaultNetstreamDriverCAFile /etc/ssl/ca.pem
>>>>
>>>> # set up the action
>>>> $DefaultNetstreamDriver gtls # use gtls netstream driver
>>>> $ActionSendStreamDriverMode 1 # require TLS for the connection
>>>> $ActionSendStreamDriverAuthMode anon # server is NOT authenticated
>>>> *.* @@(o)ldap.nmsrv.com:10514 # send (all) messages
>>>>
>>>> _______________________________________________
>>>> rsyslog mailing list
>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>> http://www.rsyslog.comoften
>>>>         
>>> --
>>> Patrick Shen
>>> Operations Engineer
>>>
>>> net mobile AG Shanghai office
>>>
>>> T: +86  21 61350900 - 222
>>> F: +86  21 61350906
>>> M: +86  13122245730
>>>
>>> FE55 6A7F 3192 F359 C1DF  12F6 9078 58B4 57AF 0BE3
>>>
>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com
>>>       
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com
>>     
>
>
>   
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

Reply via email to