Hi Radu, thanks for the response.
I ran rsyslog in debug mode on one of the machines, as you specified below.
After that I redeployed my logging server and changed the IP. There was no
change in the behavior of rsyslog - I get the same logs before and after
the change. The logs are a little verbose, so here's a snippet of what I see
regarding my omfwd action (this is from after the change, but it looked
identical before, except for the timestamps):
5545.609123832:7f1331cdf700: actionCommitAll: action 4, state 1, nbr to commit 0 isTransactional 1
5545.609128757:7f1331cdf700: doTransaction: have commitTransaction IF, using that, pWrkrInfo 0x196d090
5545.609133137:7f1331cdf700: entering actionCallCommitTransaction(), state: itx, actionNbr 4, nMsgs 1
5545.609137473:7f1331cdf700: logging.server
5545.609142140:7f1331cdf700: logging.server:5545/tcp
5545.609147889:7f1331cdf700: omfwd: add 297 bytes to send buffer (curr offs 0)
5545.609152831:7f1331cdf700: omfwd: endTransaction, offsSndBuf 297, iRet -2121
5545.609183045:7f1331cdf700: omfwd: TCP sent 297 bytes, requested 297
5545.609188736:7f1331cdf700: Action 4 transitioned to state: rdy
5545.609193196:7f1331cdf700: omfwd: beginTransaction
5545.609197183:7f1331cdf700: logging.server
5545.609201276:7f1331cdf700: Action 4 transitioned to state: itx
5545.609205543:7f1331cdf700: Action 4 transitioned to state: rdy
5545.609209770:7f1331cdf700: actionCommit, in retry loop, iRet 0
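Since the full debug output is verbose, a small filter can reduce it to the
lines that matter here. The following is only a sketch: it assumes every
debug line has the shape seen in the snippet above (timestamp, thread id,
message), and the function name omfwd_lines and the default action number 4
are my own choices for illustration, not anything rsyslog provides.

```python
import re

# Assumed line shape, taken from the snippet above:
#   <seconds.fraction>:<thread id in hex>: <message>
LINE_RE = re.compile(r"^(\d+\.\d+):([0-9a-f]+): (.*)$")


def omfwd_lines(lines, action_nbr=4):
    """Yield (timestamp, message) for omfwd lines and lines naming the action."""
    # Matches "action 4", "Action 4" and "actionNbr 4", but not "Action 41".
    action_pat = re.compile(rf"\b[Aa]ction(?:Nbr)? {action_nbr}\b")
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # wrapped continuation line or unrelated output
        ts, _thread, msg = m.groups()
        if msg.startswith("omfwd:") or action_pat.search(msg):
            yield ts, msg
```

Feeding it the file written by "rsyslogd -dn > /var/log/rsyslog.log", for
example with "for ts, msg in omfwd_lines(open(path)): print(ts, msg)", would
print only the forwarding-related lines.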
Another thing I figured out - which makes me more troubled about the
apparent change of behavior between rsyslog 5.x and 8.x - is that when I
deploy a new logging server and assign the elastic IP to it, the IP that
rsyslog sees does change: all machines are running on EC2 and so address
each other using EC2's "private IPs" (a class A non-routable network),
while the elastic IP is a public IP mapped to the actual internal IP of the
logging server.
So the behavior can be explained by rsyslog not repeating the DNS lookup
for its remote targets: once it starts and resolves the logging.server name
to the internal IP, it keeps sending there, even after the elastic IP has
moved. While the old machine is still running (moving the elastic IP
neither changes its private IP nor terminates it), the logs keep showing up
on the old machine. Once the old machine is turned off, I can see rsyslog
reopening the omfwd connection and everything starts working again.
A solution for my setup might be a setting in rsyslog to either always
re-resolve the DNS record before submitting a new message, or at least
occasionally refresh the cached DNS result (every 60 seconds or so).
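To make the second option concrete, here is a rough sketch of what I mean by
refreshing the cached result. It is plain Python and not rsyslog code or
configuration; the class name RefreshingResolver, the 60 second default and
the injectable lookup and clock parameters are all made up for illustration.

```python
import socket
import time


class RefreshingResolver:
    """Cache a hostname's address and look it up again after ttl seconds."""

    def __init__(self, host, ttl=60.0,
                 lookup=socket.gethostbyname, clock=time.monotonic):
        self.host = host
        self.ttl = ttl
        self._lookup = lookup   # replaceable so the logic can be tested offline
        self._clock = clock
        self._addr = None
        self._expires = 0.0

    def address(self):
        """Return (addr, changed).

        changed is True only when a refresh returned a different address
        than the one cached before.
        """
        now = self._clock()
        if self._addr is not None and now < self._expires:
            return self._addr, False          # cache still fresh
        new_addr = self._lookup(self.host)    # first lookup or refresh
        changed = self._addr is not None and new_addr != self._addr
        self._addr = new_addr
        self._expires = now + self.ttl
        return new_addr, changed
```

A sender built this way would call address() before each send and close and
reopen its TCP connection whenever changed comes back True, so a moved
elastic IP would be picked up within one refresh interval.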
From: Radu Gheorghe <[email protected]>
Hi Oded,
I've never seen this issue, but maybe you can see something in the debug
log?
To get the debug log, I usually start rsyslog with something like:
# rsyslogd -dn > /var/log/rsyslog.log
Best regards,
Radu
On Mon, Aug 11, 2014 at 4:08 PM, Oded Arbel <[email protected]> wrote:
Hi list,
I've been using rsyslog version 5.8, as delivered with Ubuntu 12.04
(which my servers currently run), to forward messages to a
central logstash server over TCP. The forwarding part worked fine, but
I had problems with multiline messages and other formatting issues, so
I upgraded to rsyslog 8 and started to use the omfwd module with a
complex template. Those problems are now solved, but I have a new one.
The central logging server is on EC2 with an elastic IP and I update
its configuration from time to time by basically building a new server
and moving the elastic IP. With the old 5.8 installation, this used to
work fine - as soon as the elastic IP moved, all the rsyslog daemons
would lose the connection to the server and reconnect. With version
8's omfwd, this doesn't happen - rsyslog doesn't notice the connection
dropping and doesn't reconnect. I have to log in to each server and
restart the service to get it to deliver messages again.
There are messages sent to the rsyslogds during that time, and these
are logged to the local files correctly, but not delivered to the
central server (neither the old nor the new one). Also, nothing about
this is logged to the local syslog file.
Please advise?