I suspect that what's happening is that a batch of logs is being sent to elasticsearch and part of the batch is being processed, but then some log isn't being accepted.

If rsyslog gets an error delivering a batch of messages, it will retry with the first half of the batch (recursively) so that all messages that can be delivered will be. But the effect of this is if the recipient of the batch processes some messages of the batch before erroring out, the early messages will show up multiple times.

look to see if you have any errors from the elasticsearch input or in rsyslog from the omelasticsearch module.

David Lang

On Mon, 14 Dec 2015, Alec Swan wrote:

Since I started running imfile in polling mode I noticed that occasionally
some log messages show up multiple times in elasticsearch. However, they
are only written once by omfile. The following shows the last two messages
written by omfile. However, I see the last message "No gossip backlog;
proceeding" three times in elasticsearch. It didn't used to happen when I
was tailing cassandra.log in inotify mode. Very strange.

{ "@timestamp": "2015-12-14T17:33:20.648376+00:00", "message": "Compacting
large row
Store_Users:3b2f551bc93215fe875a1297e62446e6fb341134e18fc2184c2b27302f9eca24
(149184786 bytes) incrementally"}
{ "@timestamp": "2015-12-14T17:33:27.813282+00:00", "message": "No gossip
backlog; proceeding" }

On Mon, Dec 14, 2015 at 10:38 AM, Alec Swan <[email protected]> wrote:

Thanks, Ciprian. Switching to polling mode fixed the problem.

This is very concerning because documentation
<http://www.rsyslog.com/doc/v8-stable/configuration/modules/imfile.html>
highly encourages the use of inotify mode and goes as far as to say "In
theory, there should never be a reason to enable “polling” mode and later
versions will most probably remove it."
I hope polling mode doesn't get removed until the problem with reading log
files that get recreated by programs, such as Cassandra, is fixed in
inotify mode.
How is your system resource usage with polling mode?

Thanks,

Alec

On Mon, Dec 14, 2015 at 3:34 AM, Ciprian Hacman <
[email protected]> wrote:

Hi Alec,

To get around some of the issues with imfile and the default inotify mode,
we changed to module(load="imfile" mode="polling").
We don't see any issues with rotated logs.

Ciprian

--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Sun, Dec 13, 2015 at 7:49 PM, Alec Swan <[email protected]> wrote:

> David, cassandra.log file exists when I start rsyslog. However, I
noticed
> that Cassandra recreates the log file every time it restarts. So, that
> explains why I need to restart rsyslog (so that it re-reads the new file
> recreated by Cassandra) after Cassandra restarts.
> Is there a way to configure rsyslog to read files which get periodically
> recreated?
>
> Thanks,
>
> Alec
>
> On Sun, Dec 13, 2015 at 1:07 AM, David Lang <[email protected]> wrote:
>
> > I suspect that no logs are being read from the file before you
restart.
> >
> > does the logfile exist when rsyslog starts?
> >
> > if you do a HUP of rsyslog instead of a restart, does it start reading
> > logs from the file?
> >
> > impstats would show how many logs have been processed from that input.
> >
> > David Lang
> >
> > On Sun, 13 Dec 2015, Alec Swan wrote:
> >
> > Date: Sun, 13 Dec 2015 00:26:40 -0700
> >> From: Alec Swan <[email protected]>
> >> Reply-To: rsyslog-users <[email protected]>
> >> To: rsyslog-users <[email protected]>
> >> Subject: Re: [rsyslog] Delayed batch processing
> >>
> >>
> >> Ciprian, that's correct, I am using imfile to tail cassandra.log.
Sorry
> I
> >> wasn't clear about that. I am not using startmsg.regex setting. My
> >> configuration for imfile, template and liblognorm rule are shown
below.
> >> I'd
> >> like to make clear that it's dozens of log lines, not just the last
one,
> >> that are not being sent. Note that Cassandra log lines start with a
> space,
> >> which I handle in liblognorm rule (see below).
> >>
> >> input(
> >>     type = "imfile"
> >>     File = "/var/log/cassandra/cassandra.log"
> >>     Tag = "cassandra"
> >>     readMode = "0"
> >>     Facility = "local2"
> >>     Ruleset = "cassandra.log"
> >> )
> >>
> >> ruleset(name = "cassandra.log") {
> >>     # Properties prefixed with $! will be included in json by
> mmnormalize
> >>     #set $!@timestamp = $timereported;
> >>     set $!@timestamp = exec_template("timereportedrfc3339");
> >>     set $!host = $hostname;
> >>     set $!severity = $syslogseverity-text;
> >>     set $!facility = $syslogfacility-text;
> >>     set $!syslogtag = $syslogtag;
> >>     set $!filename = "cassandra.log";
> >>
> >>     # Parse logfile using liblognorm
> >>
> >>           action(type = "mmnormalize" rulebase =
> >> "/etc/rsyslog.d/rules/cassandra.log.rb")
> >>
> >>
> >>     # Output parsed logs
> >>     action(type = "omfile" template="es-payload"
> >> file="/var/log/rsyslog/rsyslog-cassandra.log" FileCreateMode="0644")
> >>     action(
> >>         type = "omelasticsearch"
> >>         template = "es-payload"
> >>         dynSearchIndex = "on"
> >>         searchIndex = "logstash-index"
> >>         searchType = "cassandra"
> >>         server = "myhost"
> >>         serverport = "9200"
> >>         errorFile = "/var/log/rsyslog/ES-error.log"
> >>         bulkmode = "on"
> >>         action.resumeretrycount="5"  # retry if ES is unreachable (-1
> for
> >> infinite retries)
> >>         action.resumeInterval="60"
> >>         queue.dequeuebatchsize="1000"   # ES bulk size
> >>         queue.type="FixedArray"
> >>         queue.size="100000"
> >>         queue.workerthreads="5"
> >>         queue.spoolDirectory="/var/spool/rsyslog"
> >>         queue.filename="plain"
> >>         queue.maxfilesize="100m"
> >>         queue.maxdiskspace="1g"
> >>         # queue.highwatermark="50000"
> >>         # queue.lowwatermark="20000"
> >>         queue.saveonshutdown="on"
> >>     )
> >>     stop
> >> }
> >>
> >>
> >> Liblognorm configuration in /etc/rsyslog.d/rules/cassandra.log.rb:
> >>
> >>
> >> # The rule to parse each line in the log file. IMPORTANT: The leading
> >> space
> >> is important (see
> >> http://www.rsyslog.com/log-normalization-and-the-leading-space/)
> >> rule=: %log_level:word% %log_time:word% %message:rest%
> >>
> >> On Sun, Dec 13, 2015 at 12:01 AM, Ciprian Hacman <
> >> [email protected]> wrote:
> >>
> >> I see Alec is using imfile and omelasticsearch so I suppose he is
> tailing
> >>> a
> >>> log file.
> >>> By any chance, are you using Cassandra to capture multiline logs
(using
> >>> the
> >>> startmsg.regex setting)? That would explain why the last log line is
> not
> >>> sent until restart.
> >>>
> >>> Ciprian
> >>>
> >>> --
> >>> Performance Monitoring * Log Analytics * Search Analytics
> >>> Solr & Elasticsearch Support * http://sematext.com/
> >>>
> >>> On Sun, Dec 13, 2015 at 6:45 AM, David Lang <[email protected]> wrote:
> >>>
> >>> On Sat, 12 Dec 2015, Alec Swan wrote:
> >>>>
> >>>> David, I don't have name= attribute in the action statement.
However,
> >>>>
> >>> there
> >>>
> >>>> are log messages from omelasticsearch module after rsyslog restart
> (see
> >>>>>
> >>>> my
> >>>
> >>>> previous email).
> >>>>>
> >>>>> Cassandra does not generate a log of logs and I noticed that the
few
> >>>>>
> >>>> that
> >>>
> >>>> get generated are not sent to elasticsearch until rsyslog is
> restarted.
> >>>>>
> >>>> It
> >>>
> >>>> feels like logs are being buffered and then sent to ES on
subsequent
> >>>>> startup.
> >>>>>
> >>>>> So, here is what I see happening:
> >>>>> 1. I start Rsyslog
> >>>>> 2. I start Cassandra and it generates several dozens of log lines
> >>>>> 3. No logs show up in ES (even if I restart Cassandra several
times)
> >>>>> 4. I restart Rsyslog
> >>>>> 5. The logs generated in step 2 show up in ES
> >>>>>
> >>>>>
> >>>> This does not sound like rsyslog is buffering the logs, but that
there
> >>>> is
> >>>> some other problem.
> >>>>
> >>>>
> >>>> how are the logs getting from Cassandra to rsyslog?
> >>>>
> >>>> can you enable impstats so that you can see how many logs are
> received,
> >>>> and how many are processed by the various actions?
> >>>>
> >>>> it would help to name new style actions so that it's easier to
track
> >>>>
> >>> them.
> >>>
> >>>>
> >>>> in your debug log (the output of -dn), track a known message that
> >>>> Cassandra generates and see exactly what happens to it.
> >>>>
> >>>>
> >>>> David Lang
> >>>> _______________________________________________
> >>>> rsyslog mailing list
> >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >>>> http://www.rsyslog.com/professional-services/
> >>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
> >>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
> myriad
> >>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
you
> >>>> DON'T LIKE THAT.
> >>>>
> >>>> _______________________________________________
> >>> rsyslog mailing list
> >>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >>> http://www.rsyslog.com/professional-services/
> >>> What's up with rsyslog? Follow https://twitter.com/rgerhards
> >>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
> myriad
> >>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
you
> >>> DON'T LIKE THAT.
> >>>
> >>> _______________________________________________
> >> rsyslog mailing list
> >> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >> http://www.rsyslog.com/professional-services/
> >> What's up with rsyslog? Follow https://twitter.com/rgerhards
> >> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
> >> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
you
> >> DON'T LIKE THAT.
> >>
> >> _______________________________________________
> > rsyslog mailing list
> > http://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com/professional-services/
> > What's up with rsyslog? Follow https://twitter.com/rgerhards
> > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
> > of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> > DON'T LIKE THAT.
> >
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
> What's up with rsyslog? Follow https://twitter.com/rgerhards
> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> DON'T LIKE THAT.
>
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
DON'T LIKE THAT.



_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to