On 7/8/17 9:23 PM, David Lang wrote:
On Sat, 8 Jul 2017, deoren wrote:
Looking around I learned of these two directives:
$DebugLevel 2
$DebugFile /var/log/rsyslog-debug.log
I added those, rebooted the VM and quickly had lots of debug info to
work with. In the file I found these entries:
5676.682567045:sendToLogserver queue:Reg/w0: error 111 in getaddrinfo
5676.682583062:sendToLogserver queue:Reg/w0: end relpSessConnect, iRet 10014
5676.682588226:sendToLogserver queue:Reg/w0: Action 1 transitioned to state: rtry
5676.682594836:sendToLogserver queue:Reg/w0: action 'sendToLogserver': is transactional - executing in commit phase
5676.682597400:sendToLogserver queue:Reg/w0: actionDoRetry: sendToLogserver enter loop, iRetries=0, ResumeInRow 1
5676.682684496:sendToLogserver queue:Reg/w0: error 111 in getaddrinfo
5676.682688779:sendToLogserver queue:Reg/w0: end relpSessConnect, iRet 10014
5676.682691652:sendToLogserver queue:Reg/w0: actionDoRetry: sendToLogserver action->tryResume returned -2007
5676.682693866:sendToLogserver queue:Reg/w0: actionDoRetry: sendToLogserver check for max retries, iResumeRetryCount -1, iRetries 0
I believe that getaddrinfo is attempting to look up the IP for the
given FQDN, but it's failing with whatever error 111 is. Judging by
the counts reported by the impstats module, the queue only grows.
Even though the system appears to be fully functional and other
daemons are accessing the network without issue, rsyslog still
refuses to send messages to the remote system.
If rsyslog is unable to resolve the name, it cannot send the message. I
always put the log destinations in /etc/hosts or configure it to send to
an IP address.
I can see the advantage to that approach and have been considering both
options (the second being the more appealing).
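
For reference, pointing the action at an IP address might look like the
sketch below. The action name sendToLogserver is taken from the debug
log above; the address 192.0.2.10 (a documentation-only address) and
port 2514 are placeholders, not values from this thread.

```
module(load="omrelp")

# Assumption: 192.0.2.10 and 2514 are placeholders. Substitute the log
# server's real IP address and RELP port.
action(
    type="omrelp"
    name="sendToLogserver"
    target="192.0.2.10"
    port="2514"
)
```

With a literal address as the target there is no name for the action to
resolve when it connects, so the boot-time resolver failure should not
come into play.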
rsyslog will suspend sending logs when the attempt to connect fails, and
will only retry periodically (with a back-off to keep from probing too
frequently as the probes themselves can be a problem)
I see lots of retries within the log file. The wait between attempts
isn't what I see as the problem (at least in this initial
configuration). The problem is that once rsyslog has tried to resolve
the name, it doesn't seem (looking at this from the outside, just by
the "feel" of things) to ever make another attempt at resolving the
name to an IP once the network is fully established. It feels like
rsyslog keeps using a cached result (whatever that may be) for future
retry attempts. I left the test system running overnight and it was
still hung up the next day.
We've had other people report that the backoff gets unreasonably long;
we should put in a limit on how long it will wait.
I'm not one to argue with adding more knobs/buttons to fine tune
behavior. :)
In your debug logs, look for the initial suspend message; it should say
when it will try again (you can also configure rsyslog to log suspends
and resumes).
Thank you for those tips. I'm fairly new to anything close to 'advanced'
with rsyslog, but I believe I have logging of suspends and resumes
enabled globally and via the specific queue for omrelp. My hope was that
it would assist with determining whether anything at all was happening.
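
A minimal sketch of per-action suspension reporting, assuming an rsyslog
v8 release that supports the action.reportSuspension parameters. The
target name logserver.example.com, the port, and the 30-second interval
are placeholders; the retry count of -1 matches the iResumeRetryCount
value shown in the debug output.

```
module(load="omrelp")

action(
    type="omrelp"
    name="sendToLogserver"
    # Assumption: placeholder host name and port.
    target="logserver.example.com"
    port="2514"
    # Retry forever instead of discarding messages.
    action.resumeRetryCount="-1"
    # Seconds between retry attempts (placeholder value).
    action.resumeInterval="30"
    # Log a message when the action is suspended or resumed.
    action.reportSuspension="on"
    # Also log when a suspension is extended.
    action.reportSuspensionContinuation="on"
)
```

With these set, suspend and resume events should show up in the regular
log stream, which makes it easier to tell whether the action is retrying
at all.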
For what it is worth, I'm booting this test Ubuntu system from an SSD.
Once I move it onto slower storage I see a roughly 3x slower startup time:
root@vmclone:/var/log# systemd-analyze critical-chain rsyslog.service | grep rsyslog.service
rsyslog.service +616ms
Running the same command on the SSD copy of that VM I see about 220ms
startup time. I'm also new to systemd, so I might be misinterpreting the
values, but it appears that the slower load time for rsyslog is giving
the system sufficient time to load all required networking support so
that the remote server's name resolves to an IP properly.
As indicated before, regardless of the boot speed, I get positive
results if I do any one of the following: add the IP mapping to
/etc/hosts, use the bare IP address as the omrelp action's target, or
add 'After=network.target' to the [Unit] section of
/lib/systemd/system/rsyslog.service. It is only when I do none of those
things and boot the VM from the SSD that I see these results.
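
The ordering change can also be kept in a systemd drop-in rather than in
the packaged unit file, so that it survives package upgrades. This is a
sketch only: the file name network.conf is an arbitrary choice, and only
After=network.target was tested in this thread.

```
# /etc/systemd/system/rsyslog.service.d/network.conf
# Assumption: the drop-in file name is arbitrary; any *.conf file in
# this directory is merged into rsyslog.service.
[Unit]
After=network.target
```

After creating the file, run 'systemctl daemon-reload' so systemd picks
up the change. Whether network.target is sufficient, or whether
network-online.target (with a matching Wants=) is needed, depends on the
system.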
Any thoughts/tips/tricks re the repeat getaddrinfo failures, continuing
(according to the debug log file) even after the system has been up for
a while? Other applications, such as remote_rsyslog2, are able to send
messages to the remote syslog server (same version of rsyslog as the
client) using the FQDN without issue.
Thanks again for your help with this.