Hello again,

On Mon, May 18, 2015 at 10:44:15 -0700, Rob Miller wrote:
> On 05/18/2015 02:10 AM, Kai Storbeck wrote:
> > Hello Heka,
> >
> > I'm currently streaming a logfile containing large XML messages. They
> > are separated by a long line of dashes, so I'm using a RegexSplitter
> > that matches those dashes.
> >
> > This works; the messages get shipped to Elasticsearch for indexing.
> >
> > Restarting Heka, however, now gives me an error:
> >
> > > 2015/05/18 10:29:56 Decoder 'b2bsoap-b2bdecoder-1' error: No match: ..
> > > .....
> > > .....
> > > ...
> > > </closing xml tag>
> >
> > I perceive this as a problem in the bookkeeping of the seek position,
> > as it points to the middle of a multiline record.
>
> Yes, I think that's correct.
>
> > Can I assist in curing this? Is it curable? Is it a good starting
> > point for helping to improve Heka? Or are there smaller outstanding
> > issues to assist with...
>
> Sure, your help resolving this would be welcome. I took a quick peek
> and I think the issue is related to the following code:
>
> https://github.com/mozilla-services/heka/blob/dev/plugins/logstreamer/logstreamer_input.go#L359
>
> That's the LogstreamInput (a pool of which is managed by each
> LogstreamerInput) telling the underlying stream to update the ring
> buffer with the latest read position. You'll notice that it happens
> whenever n > 0, i.e. whenever any data is successfully read from the
> input stream. What you're asking for is to instead update the read
> position only if len(record) > 0, which implies that a full record was
> retrieved.
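For my own understanding, the change you describe boils down to something like the following. This is a toy sketch, not the actual logstreamer code; the function and parameter names are made up:

```go
package main

import "fmt"

// cursorAfterRead models the seek-position bookkeeping discussed above.
// pos is the last saved seek position, n is the number of bytes just
// consumed from the stream, and recordLen is the length of the complete
// record the splitter returned (0 if no full record was found yet).
func cursorAfterRead(pos int64, n, recordLen int, onRecordBoundary bool) int64 {
	if onRecordBoundary {
		// Proposed behavior: only advance the cursor when a full record
		// came back, so a restart never resumes mid-record.
		if recordLen > 0 {
			pos += int64(n)
		}
	} else if n > 0 {
		// Current behavior: advance on every successful read, even if
		// the bytes are only a fragment of a multiline record.
		pos += int64(n)
	}
	return pos
}

func main() {
	// 100 bytes consumed, but no complete record yet:
	fmt.Println(cursorAfterRead(0, 100, 0, false)) // current: advances to 100
	fmt.Println(cursorAfterRead(0, 100, 0, true))  // proposed: stays at 0
}
```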
I sort of found my problem: our log delimiter is at the start of the
log message, not at the end. I know that's not so smart, for several
reasons; legacy stuff. I'll ask the devs to add a trailing delimiter
instead :)

On the other hand, this "problem" only starts occurring after reading
the second file, never in the first, and it gets worse in the files
after that. So I ended up writing a test case:

https://github.com/giganteous/heka/tree/starting-delimiter-test

I don't know if it's related, though. But really: if this kind of
logging doesn't suit the project's goals, ignore the issue. I can offer
to document the current weakness of this kind of delimiter.

> You'll want to test this out, though, rather than take my word for it.
> There's a lot of code in there; it might be that even if you change
> that code, the location will still get flushed to disk at shutdown.
> Hopefully this is a good starting point.

Well, I certainly learned my way around the logstreamer input plugin.
There are a few things I noticed, though. The hash is made from the
last 500 bytes, which is a tad small for the tail end of our XML. Of
course, I'm now adding a unique UUID at the bottom of my XML. Shall I
document that?

> If you do tackle this, I think it would be nice to retain backwards
> compatibility by turning the new behavior on with a config flag, say
> if the user sets `update_cursor_on_record_boundary` to true, or
> something.

I don't think this setting is needed. I don't see a problem when my
delimiter is at the bottom: the saved position nicely matches up with
the last character of the delimiter.

Regards,
Kai

--
An above the .signature production
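P.S. For anyone else hitting this: my reading of the RegexSplitter docs is that a leading delimiter can be kept with the record that *follows* it via the delimiter_eol option. Something along these lines; the section name and regex are only illustrative, so please verify against the RegexSplitter documentation:

```toml
[b2bsoap_splitter]
type = "RegexSplitter"
# Match the long dashed separator line; the capture group is the part
# that gets attached to a record.
delimiter = '\n(-{20,})\n'
# false = prepend the captured delimiter to the beginning of the
# subsequent record, matching a delimiter that appears at the start
# of each message (the default, true, appends it to the previous one).
delimiter_eol = false
```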
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka

