Completely agreed on the acking. I posed the question in the first place because, while I believe dropping+acking is the correct functionality, I could see a few alternative patterns for handling this:
1. Require filtering to be handled by the message filter infrastructure and publish an error to the error queue if field transformations such as REGEX_SELECT violate this by dropping messages.
2. Default records to be written to enrichments, or handle per my comments in #1
3. Default records to be written to the topic defined by outputTopic (non-default version of #2)

At any rate, we should fix the acking problem and then the dropped messages pattern makes sense to me. I've created a Jira to track it - https://issues.apache.org/jira/browse/METRON-1948.

On Wed, Dec 19, 2018 at 12:43 PM Casey Stella <ceste...@gmail.com> wrote:

> We absolutely should be acking the dropped messages otherwise they'll be in
> a replay loop. Not acking is a flat-out bug IMO.
>
> On Wed, Dec 19, 2018 at 2:37 PM Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > When a message is filtered by the message filtering mechanism, we
> > explicitly drop the message (and presumably ack it in Storm), as explained
> > here -
> > https://github.com/apache/metron/tree/master/metron-platform/metron-parsing#filtered
> > .
> > When using the REGEX_SELECT field transformation (see here -
> > https://github.com/apache/metron/tree/master/metron-platform/metron-parsing#fieldtransformation-configuration
> > )
> > with the kafka.topicField option for parser-chaining, it's unclear to me
> > whether we expect the same behavior (drop message, ack it). The
> > interpretation I get from this example in the parser-chaining doc
> > https://github.com/apache/metron/tree/master/use-cases/parser_chaining#the-pix_syslog_router-parser
> > suggests to me that the approach we take for messages with message
> > filtering is the correct one, however in testing an example with dropped
> > messages, we appear not to ack those dropped messages.
> >
> > Before I go creating a fix I thought it best to summarize and confirm my
> > expectations on this functionality.
> > Messages from a REGEX_SELECT that don't
> > match a pattern, and therefore don't get a value assigned to their output
> > topic value, should be dropped and acked.
> >
> > *Example:*
> > {
> >   "parserClassName": "org.apache.metron.parsers.GrokParser",
> >   "sensorTopic": "myInTopic",
> >   ...
> >   "parserConfig": {
> >     ...,
> >     "kafka.topicField": "output_topic"
> >   },
> >   "fieldTransformations": [
> >     {
> >       "input": [
> >         "message"
> >       ],
> >       "output": [
> >         "output_topic"
> >       ],
> >       "transformation": "REGEX_SELECT",
> >       "config": {
> >         "world": "^Hello "
> >       }
> >     },
> >     ...
> >   ]
> > }
> >
> > *Input Records:*
> > "...sshd[32469]: Hello..."
> > "...sshd[30432]: Bye..."
> >
> > *Output:*
> > Kafka topic = "world" (as determined by the REGEX_SELECT pattern match that
> > sets the "output_topic" property used by kafka.topicField)
> > 1 record present
> > contents of that record = our record with "Hello" in it
> > 1 record is dropped ("Bye" record) and will not be forwarded any further
> > through the pipeline.
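
For reference, the drop-and-ack expectation summarized in the quoted message can be sketched in a few lines of Python. This is only an illustration of the intended behavior, not Metron's or Storm's actual code: the function names (`select_topic`, `process`), the sample messages, and the use of `re.search` for matching are all assumptions made for the sketch, and the `acked` list merely stands in for acknowledging a tuple.

```python
import re

# Hypothetical stand-in for the REGEX_SELECT "config" in the example:
# candidate output topic -> pattern.
REGEX_SELECT_CONFIG = {"world": r"^Hello "}


def select_topic(message, config=REGEX_SELECT_CONFIG):
    """Return the first topic whose pattern matches, or None if none match."""
    for topic, pattern in config.items():
        if re.search(pattern, message):
            return topic
    return None


def process(records):
    """Route each record; ack every record whether it is written or dropped."""
    written = {}  # topic -> list of messages forwarded to that topic
    acked = []    # every message that has been acknowledged
    for message in records:
        topic = select_topic(message)
        if topic is not None:
            written.setdefault(topic, []).append(message)
        # Ack unconditionally so that a dropped record is not replayed.
        acked.append(message)
    return written, acked


# Hypothetical sample messages, one matching and one not.
written, acked = process(["Hello from sshd", "Bye from sshd"])
print(written)
print(len(acked))
```

The point of the sketch is that the ack is not conditional on a topic having been selected: the "Bye" record is never written anywhere, but it is still acknowledged, which is what keeps it out of a replay loop.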