Parser chaining uses the original_string populated by the origin routing parser unless you explicitly change it. https://github.com/apache/metron/blob/master/metron-platform/metron-parsing/metron-parsers-common/ParserChaining.md#example
For example, the logs here -
http://www.monitorware.com/en/logsamples/cisco-pix-61(2).php
Would result in a sample enveloped message with:

{
  "original_string" : "Mar 29 2004 09:54:18: %PIX-6-302005: Built UDP connection for faddr 198.207.223.240/53337 gaddr 10.0.0.187/53 laddr 192.168.0.2/53",
  "payload" : "Built UDP connection for faddr 198.207.223.240/53337 gaddr 10.0.0.187/53 laddr 192.168.0.2/53",
  etc.
}

On Fri, May 10, 2019 at 6:11 PM Otto Fowler <ottobackwa...@gmail.com> wrote:

> The original string would be the string specified as the message body, thus
> each message in the chain produced would just be the bytes passed in, from
> a specific field in the incoming message.
>
> On May 10, 2019 at 19:55:28, Simon Elliston Ball (si...@simonellistonball.com) wrote:
>
> My understanding is that chaining preserves (correctly to my mind) the
> original original string.
>
> In other words: unless the message strategy is raw message, the original
> string is just passed through. Original string therefore comes from outside
> Metron, and is preserved throughout Metron processes, allowing for
> recreation of original form for forensics and evidentiary purposes.
>
> Simon
>
> On 11 May 2019, at 00:10, Otto Fowler <ottobackwa...@gmail.com> wrote:
>
> > What about parser chaining? Should the original string be from kafka, or
> > the last parsed?
> >
> > On May 10, 2019 at 19:03:39, Simon Elliston Ball (si...@simonellistonball.com) wrote:
> >
> > The only scenario I can think of where a parser might treat original string
> > differently, or even need to know about it would be different encoding
> > locales. For example, if the string were to be encoded in a locale specific
> > to the device and choose the encoding based on metadata or parsed content,
> > then that could merit pushing it down. The other edge might be when you
> > have binary data that does not go down to an original string well (eg a
> > netflow parser).
> >
> > That said, that’s a highly unlikely edge case that could be handled by
> > workarounds.
> >
> > I’m definitely +1 on Nick’s idea of pulling original string up to the
> > runner. Right now we’re pretty inconsistent in how it’s done, so that would
> > help.
> >
> > Simon
> >
> > On 10 May 2019, at 23:10, Nick Allen <n...@nickallen.org> wrote:
> >
> >>> I suppose we could always allow this to be overridden, also.
> >>
> >> I like an on/off switch for the "original string" functionality. If on,
> >> you get the original string in pristine condition. If off, no original
> >> string is appended for those who care more about storage space.
> >>
> >> I can't think of a reason where one kind of parser would have a different
> >> original string mechanism than the others. If something like that does
> >> come up, the parser can create its own original string by just naming it
> >> something different and then turning "off" the switch that you described.
> >>
> >> On Fri, May 10, 2019 at 5:53 PM Michael Miklavcic <michael.miklav...@gmail.com> wrote:
> >>
> >>> I think that's an excellent idea. Can anyone think of a situation where we
> >>> wouldn't want to add this the same way for all parsers? I suppose we could
> >>> always allow this to be overridden, also.
> >>>
> >>>> On Fri, May 10, 2019 at 3:43 PM Nick Allen <n...@nickallen.org> wrote:
> >>>>
> >>>> I think maintaining the integrity of the original data makes a lot of sense
> >>>> for any parser. And ideally the original string should be what came out of
> >>>> Kafka with only the minimally necessary processing.
> >>>>
> >>>> With that in mind, we could solve this one level up.
> >>>> Instead of relying on
> >>>> each parser to do this right, we could have the ParserRunner and
> >>>> specifically the ParserRunnerImpl [1] handle this round-abouts here
> >>>> <https://github.com/apache/metron/blob/1b6ef88c79d60022542cda7e9abbea7e720773cc/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/ParserRunnerImpl.java#L149-L158>
> >>>> [1].
> >>>> It has the raw message data and can append the original string to each
> >>>> message it gets back from the parsers.
> >>>>
> >>>> Just another approach to consider.
> >>>>
> >>>> --
> >>>> [1]
> >>>> https://github.com/apache/metron/blob/1b6ef88c79d60022542cda7e9abbea7e720773cc/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/ParserRunnerImpl.java#L149-L158
> >>>>
> >>>> On Fri, May 10, 2019 at 4:11 PM Otto Fowler <ottobackwa...@gmail.com> wrote:
> >>>>
> >>>>> +1
> >>>>>
> >>>>> On May 10, 2019 at 13:57:55, Michael Miklavcic (michael.miklav...@gmail.com) wrote:
> >>>>>
> >>>>> When adding the capability for parsing messages in the JsonMapParser using
> >>>>> JSON Path expressions the original behavior for managing original strings
> >>>>> was changed.
> >>>>>
> >>>>> https://github.com/apache/metron/blob/master/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/json/JSONMapParser.java#L192
> >>>>>
> >>>>> A couple issues have been reported recently regarding this change:
> >>>>>
> >>>>> 1. We're losing the actual original string, which is a legal issue for
> >>>>> data lineage for some customers
> >>>>> 2. Even for the degenerate case with no sub-messages created, the
> >>>>> original sub-message string is modified because of the
> >>>>> serialization/deserialization process with Jackson/JsonSimple.
> >>>>> The fields are reordered bc the content is normalized.
> >>>>>
> >>>>> I looked at options for preserving formatting, but am unable to find a
> >>>>> method that allows you to both parse, then query the original message and
> >>>>> then also obtain the raw string matches without the normalizing from
> >>>>> ser/deserialization.
> >>>>>
> >>>>> I'd like to propose that we add a configuration option for this parser that
> >>>>> allows the user to toggle which approach they'd like to use. My personal
> >>>>> preference based on feedback I've gotten from multiple customers is that
> >>>>> the default should be the older approach which takes the raw original
> >>>>> string. It's arguable that this change in contract is a regression, so the
> >>>>> default should be the earlier behavior. Any sub-messages would then have a
> >>>>> copy of that raw original string, not just the sub-message original string.
> >>>>> Enabling the flag would enable the current sub-message original string
> >>>>> functionality.
> >>>>>
> >>>>> Mike
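
To make the two ideas in this thread concrete, here is a rough sketch in Python (not Metron's Java, and not Metron's actual API). It assumes a hypothetical parser function `parse_json_map` that splits a document into sub-messages, and a hypothetical runner function `run_parser` with a `keep_original` switch. It shows that a parse/re-serialize round trip does not reproduce the raw text, and how a runner that still holds the raw bytes could stamp the untouched original onto every message it gets back. The names and the `events` field are illustrative assumptions only; `original_string` is the field name used in the thread.

```python
import json
from typing import Callable, Dict, List

ORIGINAL_STRING = "original_string"  # field name as used in the thread


def parse_json_map(raw: bytes) -> List[Dict]:
    """Hypothetical parser: one message per entry of 'events', else the whole doc."""
    doc = json.loads(raw.decode("utf-8"))
    return [dict(event) for event in doc.get("events", [doc])]


def run_parser(raw: bytes,
               parser: Callable[[bytes], List[Dict]],
               keep_original: bool = True) -> List[Dict]:
    """Hypothetical runner: attaches the raw input to each parsed message.

    The parser never has to deal with the original string; the runner
    holds the raw bytes and adds them afterwards, unless switched off.
    """
    messages = parser(raw)
    if keep_original:
        original = raw.decode("utf-8")
        for message in messages:
            message[ORIGINAL_STRING] = original
    return messages


# Raw input with irregular spacing, as it might arrive from Kafka.
raw = b'{"events": [{"b": 1,   "a": 2}, {"c": 3}]}'

# Parsing and re-serializing normalizes the text, so it no longer
# matches the bytes that were received.
round_tripped = json.dumps(json.loads(raw))
print(round_tripped == raw.decode("utf-8"))

# Every sub-message carries the same pristine original string.
for message in run_parser(raw, parse_json_map):
    print(message)
```

Python's `json` module only normalizes whitespace here, whereas the thread reports that Jackson/JsonSimple also reorder fields; either way the serialized form differs from what was received, which is the argument for taking the original string from the raw input rather than from the parsed result.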