I think this is a good idea.  As I mentioned in the other thread, I've been
doing a lot of work on NiFi recently.
I think the important thing is that whatever we do should be done the NiFi
way, rather than bolting Metron's composition model onto NiFi.  Think of it
like the Tao of Unix: the parsers and components should be single-purpose
and simple, allowing exceptional flexibility in composition.

Comments inline.

On August 7, 2018 at 09:27:01, Justin Leet (justinjl...@gmail.com) wrote:

Hi all,

There's interest in being able to run Metron parsers in NiFi, rather than
inside Storm. I dug into this a bit, and have some thoughts on how we could
go about it. I'd love feedback on this, along with anything we'd consider
must-haves as well as future enhancements.

1. Separate metron-parsers into metron-parsers-common and metron-storm
and create metron-parsers-nifi. For this code to be reusable across
platforms (NiFi, Storm, and anything else in the future), we'll need to
decouple our parsers from Storm.

+1.  The “parsing code” should be a library that implements an interface
(defined in another library).

The NiFi Processors and the Storm topologies can then share them.
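
A minimal sketch of what that shared contract might look like (names are
illustrative, not the final Metron API):

    import java.util.List;
    import java.util.Map;

    // Sketch only: a platform-neutral parser contract that both the Storm
    // topology and a NiFi Processor could share.  Names are illustrative.
    public interface MessageParser<T> {
      // Pick up sensor-specific settings from the parser config.
      void configure(Map<String, Object> parserConfig);
      // Raw bytes in, zero or more parsed messages out.
      List<T> parse(byte[] rawMessage);
    }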


- There are also some nice fringe benefits around refactoring our code to
be substantially clearer and more understandable; something that came up
while allowing for parser aggregation.
2. Create a MetronProcessor that can run our parsers.
- I took a look at how RecordReader could be leveraged (e.g.
CSVRecordReader), but it's pretty tightly tied to schemas and is meant to
be used by ControllerServices, which are then used by Processors. There's
friction involved there in terms of schemas, but also in terms of access
to ZK configs and things like parser chaining. We might be able to
leverage it, but it seems like it'd be fairly shoehorned in without
getting the schema and other benefits.

We won’t have to provide our ‘no schema’ processors (Grok, CSV, JSON).

All the remaining processors DO have schemas that we know about.  We can
just provide the Avro schemas the same way we provide the ES schemas.

The “parsing” should not be conflated with transformation/Stellar in NiFi;
we should keep those separate.  Running Stellar over Records would be the
best approach.
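
For the record-oriented Stellar piece, I'd picture something per-record
along these lines (a sketch assuming Metron's existing StellarProcessor
API; the wrapper class itself is hypothetical):

    import java.util.Map;
    import org.apache.metron.stellar.common.StellarProcessor;
    import org.apache.metron.stellar.dsl.Context;
    import org.apache.metron.stellar.dsl.MapVariableResolver;
    import org.apache.metron.stellar.dsl.StellarFunctions;

    // Hypothetical wrapper: a record-aware NiFi processor would call this
    // once per Record, with the record's fields flattened into a map.
    public class StellarRecordEvaluator {
      private final StellarProcessor processor = new StellarProcessor();

      public Object evaluate(String expression, Map<String, Object> fields) {
        return processor.parse(expression,
            new MapVariableResolver(fields),
            StellarFunctions.FUNCTION_RESOLVER(),
            Context.EMPTY_CONTEXT());
      }
    }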



- This Processor would work similarly to Storm: byte[] in -> JSON out.
- There is a Processor, JoltTransformJSON
<https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java>,
that handles loading other JARs; we can model a MetronParserProcessor on
it to handle classpath/classloader issues (it basically sets up a
classloader specific to what's being loaded and swaps out the Thread's
context loader when it calls out to external resources).
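
To make the byte[]-in/JSON-out contract concrete, a minimal sketch of such
a Processor (assuming a shared MessageParser interface like the one
sketched above; configuration loading and error handling are elided):

    import java.io.ByteArrayOutputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.List;
    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.Relationship;
    import org.apache.nifi.processor.exception.ProcessException;
    import org.json.simple.JSONObject;

    // Sketch only: one incoming FlowFile's bytes are handed to the parser,
    // and each parsed message becomes an outgoing JSON FlowFile.
    public class MetronParserProcessor extends AbstractProcessor {

      public static final Relationship SUCCESS =
          new Relationship.Builder().name("success").build();

      private volatile MessageParser<JSONObject> parser;  // set up in @OnScheduled

      @Override
      public void onTrigger(ProcessContext context, ProcessSession session)
          throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
          return;
        }
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        session.exportTo(flowFile, raw);                       // byte[] in
        List<JSONObject> messages = parser.parse(raw.toByteArray());
        for (JSONObject message : messages) {
          FlowFile out = session.create(flowFile);
          out = session.write(out, stream -> stream.write(
              message.toJSONString().getBytes(StandardCharsets.UTF_8)));
          session.transfer(out, SUCCESS);                      // JSON out
        }
        session.remove(flowFile);
      }
    }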

There should be no reason to load modules outside the NAR.  Why do you
expect to?  If each Metron Processor equivalent of a Metron Storm parser is
just parsing to JSON, it shouldn't need much, and we could package them in
the NAR.  I would suggest we have a Processor per Parser to allow for
specialization.  It should all be in the NAR.

The Stellar Processor, if you want it to support the whole works, would
possibly need this.
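
For reference, the classloader dance described above is roughly this (a
sketch, inside a processor like the one above; parser and parserClassLoader
are assumed to come from loading the user's jar):

    // Sketch of the context-classloader swap: the external JAR gets its
    // own classloader, and the thread's loader is swapped around any call
    // into the dynamically loaded code, then restored.
    private List<JSONObject> parseWithSwappedLoader(MessageParser<JSONObject> parser,
                                                    ClassLoader parserClassLoader,
                                                    byte[] rawBytes) {
      ClassLoader original = Thread.currentThread().getContextClassLoader();
      try {
        Thread.currentThread().setContextClassLoader(parserClassLoader);
        return parser.parse(rawBytes);
      } finally {
        Thread.currentThread().setContextClassLoader(original);
      }
    }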


3. Create a MetronZkControllerService to supply our configs to our
processors.
- This is a pretty established NiFi pattern for providing access to other
services needed by a Processor (e.g. databases or large configuration
files).
- The same controller service can be used by all Processors to manage
configs in a consistent manner.

I think controller services would make sense where needed; I'm just not
sure what you imagine them being needed for.

If the user has NiFi, a Registry, etc., are you saying you imagine them
using Metron + ZK to manage configurations?  Or using BOTH Storm parsers
and NiFi Processors?
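
If we did go the controller-service route, its API could stay tiny; a
hypothetical sketch (SensorParserConfig is Metron's existing config class):

    import org.apache.metron.common.configuration.SensorParserConfig;
    import org.apache.nifi.controller.ControllerService;

    // Hypothetical service API: processors look parser configs up through
    // the service rather than each talking to ZooKeeper directly.
    public interface MetronZkConfigService extends ControllerService {
      SensorParserConfig getParserConfig(String sensorType);
    }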



At that point, we can just NAR up our controller service and parser
processor as needed, deploy them to NiFi, and let the user provide a config
for where their custom parsers can be found (i.e. their parser jar). This
would be 3 NARs (processor, controller-service, and controller-service-api
to bind the other two together).

Once deployed, our ability to use parsers should fit well into the standard
NiFi workflow:

1. Create a MetronZkControllerService.
2. Configure the service to point at ZooKeeper.
3. Create a MetronParser.
4. Configure it to use the controller service + parser jar location +
any other needed configs.
5. Use the outputs as needed downstream (either writing out to Kafka or
feeding into more MetronParsers, etc.)
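
Concretely, step 4 would be the usual NiFi property wiring; a sketch,
reusing the hypothetical MetronZkConfigService from above:

    import org.apache.nifi.components.PropertyDescriptor;

    // Sketch: the processor declares its dependency on the controller
    // service as a property; NiFi handles the lookup and lifecycle.
    public static final PropertyDescriptor ZK_CONFIG_SERVICE =
        new PropertyDescriptor.Builder()
            .name("metron-zk-controller-service")
            .displayName("Metron ZK Controller Service")
            .description("Supplies parser configs from ZooKeeper")
            .identifiesControllerService(MetronZkConfigService.class)
            .required(true)
            .build();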

Chaining parsers should ideally become a matter of chaining MetronParsers
(and making sure the enveloping configs carry through properly). For parser
aggregation, I'd just avoid it entirely until we know it's needed in NiFi.

Justin
