On 7/8/17 9:23 PM, David Lang wrote:
On Sat, 8 Jul 2017, deoren wrote:

Looking around I learned of these two directives:

$DebugLevel 2
$DebugFile /var/log/rsyslog-debug.log

I added those, rebooted the VM and quickly had lots of debug info to work with. In the file I found these entries:

5676.682567045:sendToLogserver queue:Reg/w0: error 111 in getaddrinfo
5676.682583062:sendToLogserver queue:Reg/w0: end relpSessConnect, iRet 10014
5676.682588226:sendToLogserver queue:Reg/w0: Action 1 transitioned to state: rtry
5676.682594836:sendToLogserver queue:Reg/w0: action 'sendToLogserver': is transactional - executing in commit phase
5676.682597400:sendToLogserver queue:Reg/w0: actionDoRetry: sendToLogserver enter loop, iRetries=0, ResumeInRow 1
5676.682684496:sendToLogserver queue:Reg/w0: error 111 in getaddrinfo
5676.682688779:sendToLogserver queue:Reg/w0: end relpSessConnect, iRet 10014
5676.682691652:sendToLogserver queue:Reg/w0: actionDoRetry: sendToLogserver action->tryResume returned -2007
5676.682693866:sendToLogserver queue:Reg/w0: actionDoRetry: sendToLogserver check for max retries, iResumeRetryCount -1, iRetries 0


I believe that getaddrinfo is attempting to look up the IP for the given FQDN, but it's failing with whatever error 111 is. Going by the counters from the impstats module, the queue only grows: even though the system appears to be fully functional and other daemons are accessing the network without issue, rsyslog still refuses to send messages to the remote system.
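
One way to find out what "error 111" stands for, assuming the number in the debug line is a plain errno value (the log does not say so, and getaddrinfo normally reports failures through its own EAI_* return codes rather than errno), is to ask the C library for its name. A minimal Python sketch; describe_errno is a hypothetical helper name:

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 111 is the value from the debug log; its meaning is platform specific.
    print(describe_errno(111))
```

On x86 Linux, errno 111 is ECONNREFUSED ("Connection refused"). If that reading is right, it could mean the resolver's own connection to a name server was refused, or it could simply be a stale value left over from an earlier call.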

If rsyslog is unable to resolve the name, it cannot send the message. I always put the log destinations in /etc/hosts or configure it to send to an IP address.

I can see the advantage of that and have been considering both options (the second is the more appealing).


rsyslog will suspend sending logs when the attempt to connect fails, and will only retry periodically (with a back-off to keep from probing too frequently, as the probes themselves can be a problem).

I see lots of retries in the log file. The time rsyslog waits between them is not what I see as the problem (at least in this initial configuration). The problem is that once rsyslog has tried to resolve the name, it doesn't seem (looking at this from the outside, just by the "feel" of things) to ever make another attempt at resolving the name to an IP once the network is fully established. It feels like rsyslog keeps using a cached result (whatever that may be) for future retry attempts. I left the test system running overnight and it was still hung up the next day.
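
The behaviour described as expected here, a fresh name lookup on every retry, can be sketched as follows. This is a minimal Python illustration, not rsyslog's or librelp's actual code; resolve_with_retry, its parameters, and the injectable resolve/sleep arguments are hypothetical names.

```python
import socket
import time


def resolve_with_retry(host, port, retries=5, delay=1.0,
                       resolve=socket.getaddrinfo, sleep=time.sleep):
    """Look the name up again on every attempt; return the address list."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            # A fresh lookup each time, so a name that starts resolving
            # once the network is up is picked up on the next retry.
            return resolve(host, port, type=socket.SOCK_STREAM)
        except socket.gaierror as exc:
            last_error = exc
            if attempt < retries - 1:
                sleep(delay)
    # Every attempt failed: re-raise the most recent lookup error.
    raise last_error
```

Nothing is cached between attempts, so a lookup that fails at boot does not affect later ones.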


We've had other people report that the backoff gets unreasonably long; we should put in a limit on how long it will wait.
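
The back-off and the proposed limit can be put into numbers. The sketch below assumes the formula given in the rsyslog documentation for action.resumeInterval, (numRetries / 10 + 1) * interval with integer division and a default interval of 30 seconds; whether a given rsyslog version follows it exactly is an assumption. The max_interval cap is hypothetical and only illustrates the kind of limit suggested above.

```python
def resume_interval(num_retries, base=30, max_interval=None):
    """Seconds to wait before the next retry after num_retries failures."""
    # Documented formula (assumed): grows by one `base` every 10 retries.
    interval = (num_retries // 10 + 1) * base
    # Hypothetical cap, not an existing setting in the version discussed.
    if max_interval is not None:
        interval = min(interval, max_interval)
    return interval


if __name__ == "__main__":
    for n in (0, 10, 100, 1000):
        print(n, resume_interval(n), resume_interval(n, max_interval=1800))
```

Under these assumptions the wait is 30 seconds at first, 60 after 10 retries, 330 after 100, and 3030 after 1000 unless a cap such as 1800 seconds is applied.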

I'm not one to argue with adding more knobs/buttons to fine tune behavior. :)

In your debug logs, look for the initial suspend message; it should say when it will try again. (You can also configure rsyslog to log suspends and resumes.)

Thank you for those tips. I'm fairly new to anything close to 'advanced' with rsyslog, but I believe I have logging of suspends and resumes enabled globally and via the specific queue for omrelp. My hope was that it would assist with determining whether anything at all was happening.

For what it is worth, I'm booting this test Ubuntu system from an SSD. Once I move it onto slower storage, rsyslog's startup time is roughly 3x longer:

root@vmclone:/var/log# systemd-analyze critical-chain rsyslog.service | grep rsyslog.service
rsyslog.service +616ms

Running the same command on the SSD copy of that VM I see about 220ms startup time. I'm also new to systemd, so I might be misinterpreting the values, but it appears that the slower load time for rsyslog is giving the system sufficient time to load all required networking support so that the remote server's name resolves to an IP properly.

As indicated before, regardless of the boot speed, I get positive results if I do any one of three things: add the IP mapping to /etc/hosts, use the bare IP address as the target of the omrelp action, or add 'After=network.target' to the [Unit] section of /lib/systemd/system/rsyslog.service. It is only when I do none of those and boot the VM from the SSD that I see these failures.

Any thoughts/tips/tricks on the repeated getaddrinfo failures, which continue (according to the debug log file) even after the system has been up for a while? Other applications, such as remote_rsyslog2, are able to send messages to the remote syslog server (same version of rsyslog as the client) using the FQDN without issue.

Thanks again for your help with this.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
