Parser chaining uses the original_string populated by the origin routing
parser unless you explicitly change it.
https://github.com/apache/metron/blob/master/metron-platform/metron-parsing/metron-parsers-common/ParserChaining.md#example

For example, the logs at
http://www.monitorware.com/en/logsamples/cisco-pix-61(2).php
would result in a sample enveloped message like:
{
  "original_string" : "Mar 29 2004 09:54:18: %PIX-6-302005: Built UDP connection for faddr 198.207.223.240/53337 gaddr 10.0.0.187/53 laddr 192.168.0.2/53",
  "payload" : "Built UDP connection for faddr 198.207.223.240/53337 gaddr 10.0.0.187/53 laddr 192.168.0.2/53",
  etc.
}
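
As a rough illustration of that behavior, here is a minimal Java sketch of a two-stage chain. It is not Metron code: the class EnvelopeChainSketch, its two methods, and the "protocol" field are hypothetical names made up for this example, and the prefix-stripping rule is only an assumption that happens to fit the PIX sample above. The point it tries to show is that the second stage parses only the payload and copies original_string through unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; these are not Metron's classes or API.
class EnvelopeChainSketch {

    // Stage 1 (the routing parser): keep the raw line as original_string and
    // put everything after the "%PIX-...: " tag into payload.
    static Map<String, Object> envelope(String raw) {
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("original_string", raw);
        int tag = raw.indexOf("%PIX-");
        int split = tag < 0 ? -1 : raw.indexOf(": ", tag);
        // If no tag is found, fall back to using the whole line as payload.
        message.put("payload", split < 0 ? raw : raw.substring(split + 2));
        return message;
    }

    // Stage 2 (a chained parser): parse only the payload, and pass the
    // upstream original_string through without touching it.
    static Map<String, Object> parsePayload(Map<String, Object> enveloped) {
        Map<String, Object> parsed = new LinkedHashMap<>();
        parsed.put("original_string", enveloped.get("original_string"));
        String payload = (String) enveloped.get("payload");
        parsed.put("protocol", payload.contains("UDP") ? "UDP" : "unknown");
        return parsed;
    }

    public static void main(String[] args) {
        String raw = "Mar 29 2004 09:54:18: %PIX-6-302005: Built UDP connection for "
                + "faddr 198.207.223.240/53337 gaddr 10.0.0.187/53 laddr 192.168.0.2/53";
        Map<String, Object> result = parsePayload(envelope(raw));
        // The chained result still carries the full line that entered the chain.
        System.out.println(result.get("original_string"));
    }
}
```

If a chained parser did want a different original_string, it would have to overwrite the field explicitly in its own stage, which is the "unless you explicitly change it" case.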


On Fri, May 10, 2019 at 6:11 PM Otto Fowler <ottobackwa...@gmail.com> wrote:

> The original string would be the string specified as the message body, thus
> each message in the chain produced would just be the bytes passed in, from
> a specific field in the incoming message.
>
>
>
> On May 10, 2019 at 19:55:28, Simon Elliston Ball (
> si...@simonellistonball.com) wrote:
>
> My understanding is that chaining preserves (correctly to my mind) the
> original original string.
>
> In other words: unless the message strategy is raw message, the original
> string is just passed through. Original string therefore comes from outside
> Metron, and is preserved throughout Metron processes, allowing for
> recreation of original form for forensics and evidentiary purposes.
>
> Simon
>
> > On 11 May 2019, at 00:10, Otto Fowler <ottobackwa...@gmail.com> wrote:
> >
> > What about parser chaining? Should the original string be from kafka, or
> > the last parsed?
> >
> >
> > On May 10, 2019 at 19:03:39, Simon Elliston Ball (
> > si...@simonellistonball.com) wrote:
> >
> > The only scenario I can think of where a parser might treat original string
> > differently, or even need to know about it, would be different encoding
> > locales. For example, if the string were encoded in a locale specific
> > to the device and the encoding had to be chosen based on metadata or parsed
> > content, then that could merit pushing it down. The other edge might be when
> > you have binary data that does not go down to an original string well (e.g. a
> > netflow parser).
> >
> > That said, that’s a highly unlikely edge case that could be handled by
> > workarounds.
> >
> > I’m definitely +1 on Nick’s idea of pulling original string up to the
> > runner. Right now we’re pretty inconsistent in how it’s done, so that would
> > help.
> >
> > Simon
> >
> > Sent from my iPhone
> >
> > On 10 May 2019, at 23:10, Nick Allen <n...@nickallen.org> wrote:
> >
> >>> I suppose we could always allow this to be overridden, also.
> >>
> >> I like an on/off switch for the "original string" functionality. If on,
> >> you get the original string in pristine condition. If off, no original
> >> string is appended for those who care more about storage space.
> >>
> >> I can't think of a reason where one kind of parser would have a different
> >> original string mechanism than the others. If something like that does
> >> come up, the parser can create its own original string by just naming it
> >> something different and then turning "off" the switch that you described.
> >>
> >>
> >>
> >> On Fri, May 10, 2019 at 5:53 PM Michael Miklavcic <
> >> michael.miklav...@gmail.com> wrote:
> >>
> >>> I think that's an excellent idea. Can anyone think of a situation where we
> >>> wouldn't want to add this the same way for all parsers? I suppose we could
> >>> always allow this to be overridden, also.
> >>>
> >>>> On Fri, May 10, 2019 at 3:43 PM Nick Allen <n...@nickallen.org> wrote:
> >>>>
> >>>> I think maintaining the integrity of the original data makes a lot of sense
> >>>> for any parser. And ideally the original string should be what came out of
> >>>> Kafka with only the minimally necessary processing.
> >>>>
> >>>> With that in mind, we could solve this one level up. Instead of relying on
> >>>> each parser to do this right, we could have the ParserRunner, and
> >>>> specifically the ParserRunnerImpl, handle this roughly here [1].
> >>>> It has the raw message data and can append the original string to each
> >>>> message it gets back from the parsers.
> >>>>
> >>>> Just another approach to consider.
> >>>>
> >>>> --
> >>>> [1]
> >>>> https://github.com/apache/metron/blob/1b6ef88c79d60022542cda7e9abbea7e720773cc/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/ParserRunnerImpl.java#L149-L158
> >>>>
> >>>> On Fri, May 10, 2019 at 4:11 PM Otto Fowler <ottobackwa...@gmail.com> wrote:
> >>>>
> >>>>> +1
> >>>>>
> >>>>>
> >>>>> On May 10, 2019 at 13:57:55, Michael Miklavcic (
> >>>>> michael.miklav...@gmail.com)
> >>>>> wrote:
> >>>>>
> >>>>> When adding the capability for parsing messages in the JSONMapParser using
> >>>>> JSON Path expressions, the original behavior for managing original strings
> >>>>> was changed.
> >>>>>
> >>>>> https://github.com/apache/metron/blob/master/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/json/JSONMapParser.java#L192
> >>>>>
> >>>>> A couple issues have been reported recently regarding this change:
> >>>>>
> >>>>> 1. We're losing the actual original string, which is a legal issue for
> >>>>> data lineage for some customers.
> >>>>> 2. Even for the degenerate case with no sub-messages created, the
> >>>>> original sub-message string is modified because of the
> >>>>> serialization/deserialization process with Jackson/JsonSimple. The fields
> >>>>> are reordered because the content is normalized.
> >>>>>
> >>>>> I looked at options for preserving formatting, but am unable to find a
> >>>>> method that allows you to parse and then query the original message while
> >>>>> also obtaining the raw string matches without the normalization that comes
> >>>>> from serialization/deserialization.
> >>>>>
> >>>>> I'd like to propose that we add a configuration option for this parser that
> >>>>> allows the user to toggle which approach they'd like to use. My personal
> >>>>> preference based on feedback I've gotten from multiple customers is that
> >>>>> the default should be the older approach which takes the raw original
> >>>>> string. It's arguable that this change in contract is a regression, so the
> >>>>> default should be the earlier behavior. Any sub-messages would then have a
> >>>>> copy of that raw original string, not just the sub-message original string.
> >>>>> Enabling the flag would enable the current sub-message original string
> >>>>> functionality.
> >>>>>
> >>>>> Mike
> >>>>>
> >>>>
> >>>
>
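
To make the runner-level idea from earlier in the thread a bit more concrete, here is a rough Java sketch of a runner that holds the raw bytes, calls a parser, and stamps the same original string onto every message it gets back, with an on/off switch. Everything in it is an assumption for illustration: RunnerOriginalStringSketch, the run method, and the appendOriginalString flag are hypothetical names, not the ParserRunnerImpl API or an existing Metron config option, and decoding as UTF-8 is just a placeholder for whatever encoding handling would really be needed.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not the real ParserRunnerImpl.
class RunnerOriginalStringSketch {

    // Runs the parser on the raw bytes. When appendOriginalString is true,
    // every returned message gets the same raw original string, replacing
    // anything the parser may have set itself. When false, messages are
    // returned as the parser produced them.
    static List<Map<String, Object>> run(
            byte[] rawMessage,
            Function<byte[], List<Map<String, Object>>> parser,
            boolean appendOriginalString) {
        // Assumption: the raw bytes are UTF-8 text.
        String original = new String(rawMessage, StandardCharsets.UTF_8);
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> parsed : parser.apply(rawMessage)) {
            Map<String, Object> message = new LinkedHashMap<>(parsed);
            if (appendOriginalString) {
                message.put("original_string", original);
            }
            out.add(message);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = "{\"b\":2,\"a\":1}".getBytes(StandardCharsets.UTF_8);
        // A stand-in parser that returns one message with a normalized string.
        Function<byte[], List<Map<String, Object>>> parser = bytes -> {
            Map<String, Object> m = new LinkedHashMap<>();
            m.put("original_string", "{\"a\":1,\"b\":2}");
            return List.of(m);
        };
        System.out.println(run(raw, parser, true).get(0).get("original_string"));
        System.out.println(run(raw, parser, false).get(0).get("original_string"));
    }
}
```

With the switch on, sub-messages all share the untouched string as it came in, regardless of any reordering the parser's own JSON handling introduces. With it off, a parser is free to keep whatever it set itself.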
