Hello again,

On Mon, May 18, 2015 at 10:44:15 -0700, Rob Miller wrote:
> On 05/18/2015 02:10 AM, Kai Storbeck wrote:
> > Hello Heka,
> >
> > I'm currently streaming a logfile containing large XML messages. They
> > are separated by a long line of dashes, so I'm using a RegexSplitter
> > that matches those dashes.
> >
> > This works; the messages get shipped to Elasticsearch for indexing.
> >
> > Restarting Heka, however, now gives me an error:
> >
> > > 2015/05/18 10:29:56 Decoder 'b2bsoap-b2bdecoder-1' error: No match: ..
> > > .....
> > > .....
> > > ...
> > > </closing xml tag>
> >
> > I perceive this as a problem in the bookkeeping of the seek position,
> > as it points to the middle of a multiline record.
>
> Yes, I think that's correct.
>
> > Can I assist in curing this? Is it curable? Is it a good starting
> > point for helping to improve Heka? Or are there smaller outstanding
> > issues to assist with...
>
> Sure, your help resolving this would be welcome. I took a quick peek
> and I think the issue is related to the following code:
>
> https://github.com/mozilla-services/heka/blob/dev/plugins/logstreamer/logstreamer_input.go#L359
>
> That's the LogstreamInput (a pool of which is managed by each
> LogstreamerInput) telling the underlying stream to update the ring
> buffer with the latest read position. You'll notice that it happens
> whenever n > 0, i.e. whenever any data is successfully read from the
> input stream. What you're asking for is to instead update the read
> position only if len(record) > 0, which implies that a full record was
> retrieved.
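For my own understanding, the change you describe boils down to something like the following. This is a toy sketch, not the actual logstreamer code; the function and parameter names are made up:

```go
package main

import "fmt"

// cursorAfterRead models the seek-position bookkeeping discussed above.
// pos is the last saved seek position, n is the number of bytes just
// consumed from the stream, and recordLen is the length of the complete
// record the splitter returned (0 if no full record was found yet).
func cursorAfterRead(pos int64, n, recordLen int, onRecordBoundary bool) int64 {
	if onRecordBoundary {
		// Proposed behavior: only advance the cursor when a full record
		// came back, so a restart never resumes mid-record.
		if recordLen > 0 {
			pos += int64(n)
		}
	} else if n > 0 {
		// Current behavior: advance on every successful read, even if
		// the bytes are only a fragment of a multiline record.
		pos += int64(n)
	}
	return pos
}

func main() {
	// 100 bytes consumed, but no complete record yet:
	fmt.Println(cursorAfterRead(0, 100, 0, false)) // current: advances to 100
	fmt.Println(cursorAfterRead(0, 100, 0, true))  // proposed: stays at 0
}
```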
I sort of found my problem: our log delimiter is at the start of the
log message, not at the end. I know that's not so smart, for several
reasons; legacy stuff. I'll ask the devs to add a trailing delimiter
instead :)

On the other hand, this "problem" only starts occurring after reading
the second file, never in the first, and it gets worse in the files
after that. So I ended up writing a test case:

https://github.com/giganteous/heka/tree/starting-delimiter-test

I don't know if it's related, though. But really: if this kind of
logging doesn't suit the project's goals, ignore the issue. I can offer
to document the current weakness of this kind of delimiter.

> You'll want to test this out, though, rather than take my word for it.
> There's a lot of code in there; it might be that even if you change
> that code, the location will still get flushed to disk at shutdown.
> Hopefully this is a good starting point.

Well, I certainly learned my way around the logstreamer input plugin.
There are a few things I noticed, though. The hash is made from the
last 500 bytes, which is a tad small for the tail end of our XML. Of
course, I'm now adding a unique UUID at the bottom of my XML. Shall I
document that?

> If you do tackle this, I think it would be nice to retain backwards
> compatibility by turning the new behavior on with a config flag, say
> if the user sets `update_cursor_on_record_boundary` to true, or
> something.

I don't think this setting is needed. I don't see a problem when my
delimiter is at the bottom: the saved position nicely matches up with
the last character of the delimiter.

Regards,
Kai

--
An above the .signature production
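P.S. For anyone else hitting this: my reading of the RegexSplitter docs is that a leading delimiter can be kept with the record that *follows* it via the delimiter_eol option. Something along these lines; the section name and regex are only illustrative, so please verify against the RegexSplitter documentation:

```toml
[b2bsoap_splitter]
type = "RegexSplitter"
# Match the long dashed separator line; the capture group is the part
# that gets attached to a record.
delimiter = '\n(-{20,})\n'
# false = prepend the captured delimiter to the beginning of the
# subsequent record, matching a delimiter that appears at the start
# of each message (the default, true, appends it to the previous one).
delimiter_eol = false
```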
_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka

