Hi Radu, thanks for the response.
I ran rsyslog on one of the machines, as you specified below. After that
I redeployed my logging server and changed the IP. There was no change in
the behavior of syslog - I get the same logs before and after the change.
The logs are a little verbose, so here's a snippet of what I see regarding
my omfwd action (this is from after the change, but it looked identical
before, except for the timestamps):
5545.609123832:7f1331cdf700: actionCommitAll: action 4, state 1, nbr to commit 0 isTransactional 1
5545.609128757:7f1331cdf700: doTransaction: have commitTransaction IF, using that, pWrkrInfo 0x196d090
5545.609133137:7f1331cdf700: entering actionCallCommitTransaction(), state: itx, actionNbr 4, nMsgs 1
5545.609137473:7f1331cdf700: logging.server
5545.609142140:7f1331cdf700: logging.server:5545/tcp
5545.609147889:7f1331cdf700: omfwd: add 297 bytes to send buffer (curr offs 0)
5545.609152831:7f1331cdf700: omfwd: endTransaction, offsSndBuf 297, iRet -2121
5545.609183045:7f1331cdf700: omfwd: TCP sent 297 bytes, requested 297
5545.609188736:7f1331cdf700: Action 4 transitioned to state: rdy
5545.609193196:7f1331cdf700: omfwd: beginTransaction
5545.609197183:7f1331cdf700: logging.server
5545.609201276:7f1331cdf700: Action 4 transitioned to state: itx
5545.609205543:7f1331cdf700: Action 4 transitioned to state: rdy
5545.609209770:7f1331cdf700: actionCommit, in retry loop, iRet 0
Another thing I figured out - which makes me more troubled about the
apparent change of behavior between rsyslog 5.x and 8.x - is that when I
deploy a new logging server and assign the elastic IP to it, the IP that
rsyslog sees does change: all machines are running on EC2 and so address
each other using EC2's "private IPs" (a class A non-routable network),
while the elastic IP is a public IP mapped to the actual internal IP of
the logging server.
So the behavior can be explained by rsyslog not repeating DNS lookups for
its remote targets: once it starts and resolves the logging.server name to
the internal IP, it keeps sending there, even after the elastic IP has
moved. While the old machine is still running (moving the elastic IP
neither changes its private IP nor terminates it), the logs keep showing
up on the old machine. Once the old machine is turned off, I can see
rsyslog reopening the omfwd connection and everything starts working
again.
A solution for my setup might be a setting in rsyslog to either always
re-resolve the DNS record before submitting a new message, or at least
occasionally refresh the cached DNS result (every 60 seconds or so).
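To make the second option concrete, here is a rough Python sketch of the
behavior I have in mind. This is not rsyslog code and says nothing about
how omfwd is implemented; the class name, the 60 second default and the
injectable resolver/clock arguments are all made up for illustration.

```python
import socket
import time


class ReResolvingTarget:
    """Caches a resolved address and looks it up again once the TTL expires.

    Hypothetical helper for illustration only; not part of rsyslog.
    """

    def __init__(self, host, port, ttl=60.0, resolver=None, clock=time.monotonic):
        self.host = host
        self.port = port
        self.ttl = ttl
        # resolver and clock are injectable so the logic can be exercised
        # without touching real DNS or waiting for real time to pass.
        self._resolver = resolver or self._default_resolver
        self._clock = clock
        self._addr = None
        self._resolved_at = None

    @staticmethod
    def _default_resolver(host, port):
        # Take the first address the system resolver returns for a TCP socket.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        return infos[0][4][0]

    def current_address(self):
        """Return (address, changed).

        changed is True when a refresh produced a different address than the
        cached one, i.e. when the sender should drop its connection and
        reconnect to the new address.
        """
        now = self._clock()
        if self._addr is None or now - self._resolved_at >= self.ttl:
            new_addr = self._resolver(self.host, self.port)
            changed = self._addr is not None and new_addr != self._addr
            self._addr = new_addr
            self._resolved_at = now
            return new_addr, changed
        return self._addr, False
```

A sender would call current_address() before each batch and reconnect
whenever the second value is True, instead of waiting for the old
connection to fail.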
From: Radu Gheorghe <[email protected]>
Hi Oded,
I've never seen this issue, but maybe you can see something in the debug
log?
To get the debug log, I usually start rsyslog with something like:
# rsyslogd -dn > /var/log/rsyslog.log
Best regards,
Radu
On Mon, Aug 11, 2014 at 4:08 PM, Oded Arbel <[email protected]> wrote:
Hi list,
I've been using the rsyslog version 5.8 delivered with Ubuntu 12.04
(which my servers are currently running) to forward messages to a central
logstash server over TCP, and the forwarding part worked fine. I had
problems with multiline messages and other formatting issues, though, so
I upgraded to rsyslog 8 and started to use the omfwd module with a
complex template. Those problems are now solved, but I have a new one.
The central logging server is on EC2 with an elastic IP, and I update its
configuration from time to time by basically building a new server and
moving the elastic IP. With the old 5.8 installation, this used to work
fine - as soon as the elastic IP moved, all the rsyslog daemons would
lose the connection to the server and reconnect. With version 8's omfwd,
this doesn't happen - rsyslog doesn't notice the connection dropping and
doesn't reconnect. I have to log in to each server and restart the
service to get it to deliver messages again.
Messages are sent to the rsyslog daemons during that time, and they are
logged to the local files correctly, but they are not delivered to the
central server (neither the old nor the new one). Also, nothing about
this is logged to the local syslog file.
Please advise?