I think this is a good idea. As I mentioned in the other thread, I've been doing a lot of work on NiFi recently. The important thing is that whatever we do should be done the NiFi way, rather than bolting the Metron composition model onto NiFi. Think of it like the Tao of Unix: the parsers and components should be single-purpose and simple, allowing exceptional flexibility in composition.
Comments inline.

On August 7, 2018 at 09:27:01, Justin Leet (justinjl...@gmail.com) wrote:

> Hi all,
>
> There's interest in being able to run Metron parsers in NiFi, rather than
> inside Storm. I dug into this a bit, and have some thoughts on how we could
> go about this. I'd love feedback on this, along with anything we'd consider
> must-haves as well as future enhancements.
>
> 1. Separate metron-parsers into metron-parsers-common and metron-storm, and
> create metron-parsers-nifi. For this code to be reusable across platforms
> (NiFi, Storm, and anything else in the future), we'll need to decouple our
> parsers from Storm.

+1. The "parsing code" should be a library that implements an interface
(another library). The Processors and the Storm components can share them.

> - There are also some nice fringe benefits around refactoring our code to
> be substantially clearer and more understandable; something which came up
> while allowing for parser aggregation.
>
> 2. Create a MetronProcessor that can run our parsers.
> - I took a look at how RecordReader could be leveraged (e.g.
> CSVRecordReader), but this is pretty tightly tied to schemas and is meant
> to be used by ControllerServices, which are then used by Processors.
> There's friction involved there in terms of schemas, but also in terms of
> access to ZooKeeper configs and things like parser chaining. We might be
> able to leverage it, but it seems like it'd be fairly shoehorned in without
> getting the schema and other benefits.

We won't have to provide our "no schema" processors (Grok, CSV, JSON). All
the remaining processors DO have schemas that we know about; we can just
provide the Avro schemas the same way we provide the ES schemas. The parsing
should not be conflated with the transform/Stellar work in NiFi. We should
keep those separate. Running Stellar over Records would be the best approach.

> - This Processor would work similarly to Storm: byte[] in -> JSON out.
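To make the "library that implements an interface" idea concrete, here is a minimal sketch of what a platform-neutral parser contract (byte[] in -> JSON out) could look like. The names (`RawMessageParser`, `KeyValueParser`) are illustrative, not Metron's actual API:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical platform-neutral parser contract: raw bytes in, JSON strings out.
// A NiFi Processor and a Storm bolt could both delegate to implementations of this.
interface RawMessageParser {
    List<String> parse(byte[] rawMessage);
}

// A trivial example parser: splits "key=value" pairs into a flat JSON object.
class KeyValueParser implements RawMessageParser {
    @Override
    public List<String> parse(byte[] rawMessage) {
        String line = new String(rawMessage, StandardCharsets.UTF_8).trim();
        String[] pairs = line.split("\\s+");
        StringBuilder json = new StringBuilder("{");
        for (int i = 0; i < pairs.length; i++) {
            String[] kv = pairs[i].split("=", 2);
            json.append('"').append(kv[0]).append("\":\"")
                .append(kv.length > 1 ? kv[1] : "").append('"');
            if (i < pairs.length - 1) json.append(',');
        }
        json.append('}');
        return List.of(json.toString());
    }
}

public class ParserDemo {
    public static void main(String[] args) {
        RawMessageParser parser = new KeyValueParser();
        List<String> out = parser.parse(
                "src=10.0.0.1 action=deny".getBytes(StandardCharsets.UTF_8));
        System.out.println(out.get(0));  // {"src":"10.0.0.1","action":"deny"}
    }
}
```

Because the interface knows nothing about Storm or NiFi, the same jar can be dropped into either runtime; only the thin adapter (bolt or Processor) differs.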
> - There is a Processor
> <https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/JoltTransformJSON.java>
> that handles loading other JARs, which we can model a MetronParserProcessor
> off of to handle classpath/classloader issues (basically it just sets up a
> classloader specific to what's being loaded and swaps out the Thread's
> loader when it calls out to external resources).

There should be no reason to load modules from outside the NAR. Why do you
expect to? If each Metron Processor equivalent of a Metron Storm Parser is
just parsing to JSON, it shouldn't need much, and we could package them in
the NAR. I would suggest we have a Processor per Parser to allow for
specialization; it should all be in the NAR. The Stellar Processor, if you
want to support the whole works, would possibly need this.

> 3. Create a MetronZkControllerService to supply our configs to our
> processors.
> - This is a pretty established NiFi pattern for providing access to other
> services needed by a Processor (e.g. databases or large configuration
> files).
> - The same controller service can be used by all Processors to manage
> configs in a consistent manner.

I think controller services would make sense where needed; I'm just not sure
what you imagine them being needed for. If the user has NiFi, a Registry,
etc., are you saying you imagine them using Metron + ZooKeeper to manage
configurations? Or using BOTH Storm parsers and NiFi Processors?

> At that point, we can just NAR our controller service and parser processor
> up as needed, deploy them to NiFi, and let the user provide a config for
> where their custom parsers can be found (i.e. their parser jar). This would
> be 3 NARs (processor, controller-service, and controller-service-api, in
> order to bind the other two together). Once deployed, our ability to use
> parsers should fit well into the standard NiFi workflow:
> 1. Create a MetronZkControllerService.
> 2.
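For reference, the classloader-swap pattern described above can be sketched in plain Java with no NiFi dependency. `callWithIsolation` is a hypothetical helper, and the empty URL array stands in for the user-supplied parser jar(s); the point is the try/finally swap of the thread's context ClassLoader:

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

// Sketch of the pattern JoltTransformJSON uses: load extension code through
// its own ClassLoader and temporarily make that loader the thread's context
// ClassLoader while calling into it, restoring the original afterwards.
public class ClassLoaderSwapDemo {

    // 'parserJarUrls' would point at the user's parser jar(s); empty here.
    static Object callWithIsolation(URL[] parserJarUrls, Supplier<Object> work) {
        ClassLoader original = Thread.currentThread().getContextClassLoader();
        try (URLClassLoader isolated = new URLClassLoader(parserJarUrls, original)) {
            Thread.currentThread().setContextClassLoader(isolated);
            return work.get();  // code in here resolves classes via 'isolated' first
        } catch (IOException e) {
            throw new RuntimeException(e);
        } finally {
            // Always restore, so other work on this thread is unaffected.
            Thread.currentThread().setContextClassLoader(original);
        }
    }

    public static void main(String[] args) {
        Object result = callWithIsolation(new URL[0],
                () -> Thread.currentThread().getContextClassLoader()
                            .getClass().getSimpleName());
        System.out.println(result);  // the work ran under a URLClassLoader
    }
}
```

If everything ships inside the NAR, as suggested above, NiFi's per-NAR classloader isolation already does this and no explicit swap is needed; the pattern only matters when loading jars from outside the NAR.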
> Configure the service to point at ZooKeeper.
> 3. Create a MetronParser.
> 4. Configure it to use the controller service + parser jar location + any
> other needed configs.
> 5. Use the outputs as needed downstream (either writing out to Kafka or
> feeding into more MetronParsers, etc.).
>
> Chaining parsers should ideally become a matter of chaining MetronParsers
> (and making sure the enveloping configs carry through properly). For parser
> aggregation, I'd just avoid it entirely until we know it's needed in NiFi.
>
> Justin
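The parser-chaining idea above reduces to plain function composition: each stage consumes what the previous one emitted, and in NiFi the flow itself (MetronParser -> MetronParser -> Kafka) is the composition, with the enveloping config deciding which field of the outer message the inner parser reads. A rough sketch, with two hypothetical stages:

```java
import java.util.function.Function;

// Sketch of parser chaining as function composition. Stage 1 unwraps an
// envelope (here, a syslog-like prefix); stage 2 parses the inner payload
// (here, just wrapping it as a JSON field). Both stages are illustrative.
public class ChainingDemo {
    public static void main(String[] args) {
        // Stage 1 (hypothetical): strip the "host app: " prefix, keep the payload.
        Function<String, String> envelopeParser =
                raw -> raw.substring(raw.indexOf(": ") + 2);
        // Stage 2 (hypothetical): emit the payload as a JSON field.
        Function<String, String> payloadParser =
                payload -> "{\"message\":\"" + payload + "\"}";

        // Chaining is just composition; in NiFi, the flow graph expresses this.
        Function<String, String> chain = envelopeParser.andThen(payloadParser);
        System.out.println(chain.apply("host1 app: user login failed"));
        // {"message":"user login failed"}
    }
}
```

This is also why avoiding parser aggregation at first costs little: composing single-purpose Processors in the flow covers the chaining case without any aggregation machinery.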