Hello, I have not been very active on the NiFi mailing lists, but I have been working with NiFi for several years across dozens of companies. I have a great appreciation for NiFi’s value in real-world scenarios. Its growth over the last few years has been very impressive, and I would like to see a further expansion of NiFi’s capabilities.
Over the last few months, I have been working on a new NiFi run-time to address some of the limitations that I have seen in the field. Its intent is not to replace the existing NiFi engine, but rather to extend the possible applications. Similar to MiNiFi extending NiFi to the edge, NiFi-Fn is an alternate run-time that expands NiFi’s reach to cloud scale. Given the similarities, MagNiFi might have been a better name, but it was already trademarked.

Here are some of the limitations that I have seen in the field. In many cases, there are entirely valid reasons for this behavior, but this behavior also prevents NiFi from being used for certain use cases.

1. NiFi flows do not succeed or fail as a unit. Part of a flow can succeed while the other part fails.
   - For example, ConsumeKafka acks before downstream processing even starts.
   - Given this behavior, data delivery guarantees require writing all incoming data to local disk in order to handle node failures.
   - While this helps to accommodate non-resilient sources (e.g. TCP), it has downsides:
     - Increases cost significantly as throughput requirements rise (especially in the cloud)
     - Increases HA complexity, because the state on each node must be durable
       - e.g. content repository replication similar to Kafka is a common ask to improve this
     - Reduces flexibility, because data has to be migrated off of nodes to scale down
       - NiFi environments must be sized for the peak expected volumes given the complexity of scaling up and down.
       - Resources are wasted when use cases have periods of lower volume (such as overnight or on weekends)
       - This improved in 1.8, but it is nowhere near as fluid as DistCp or Sqoop (i.e. MapReduce)
   - Flow-specific error handling is required (such as this processor group)
     - NiFi’s content repository is now the source of truth and the flow cannot be restarted easily.
     - This is useful for multi-destination flows, because errors can be handled individually, but unnecessary in other cases (e.g. Kafka to Solr).
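To make the acking problem concrete, here is a minimal, self-contained Java sketch of the "ack only after the destination confirms the write" idea. This is not NiFi or NiFi-Fn code; the `AckableSource` and `FlakyDestination` classes are hypothetical stand-ins for something like a Kafka topic and a downstream store, just to show why deferring the ack removes the need for durable local state:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative sketch: the source is not acked until the destination
// confirms the write, so a mid-flow failure leaves the record pending
// for redelivery instead of being lost -- no local disk required.
public class AckAfterWrite {

    // Hypothetical source (think: a Kafka topic) that redelivers
    // the same record until it is explicitly acked.
    static class AckableSource {
        private final Queue<String> pending = new ArrayDeque<>();
        AckableSource(List<String> records) { pending.addAll(records); }
        String poll()     { return pending.peek(); } // redelivers until acked
        void ack()        { pending.poll(); }        // confirm receipt
        boolean isEmpty() { return pending.isEmpty(); }
    }

    // Hypothetical destination (think: Solr) that fails transiently once.
    static class FlakyDestination {
        final List<String> written = new ArrayList<>();
        private boolean failedOnce = false;
        void write(String record) {
            if (!failedOnce) {
                failedOnce = true;
                throw new RuntimeException("transient destination failure");
            }
            written.add(record);
        }
    }

    // The flow: write to the destination first, ack the source second.
    static void run(AckableSource source, FlakyDestination dest) {
        while (!source.isEmpty()) {
            String record = source.poll();
            try {
                dest.write(record); // deliver downstream first...
                source.ack();       // ...only then ack the source
            } catch (RuntimeException e) {
                // No ack: the record stays pending at the source and is
                // simply retried on the next loop iteration.
            }
        }
    }

    public static void main(String[] args) {
        AckableSource source = new AckableSource(List.of("a", "b", "c"));
        FlakyDestination dest = new FlakyDestination();
        run(source, dest);
        System.out.println(dest.written);
    }
}
```

Contrast this with ack-before-processing (the ConsumeKafka behavior above): if the ack happened before `dest.write`, the transient failure would lose record "a" unless it had first been persisted to a local repository.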
2. Job/task-oriented data movement use cases do not fit well with NiFi.
   - For example: triggering data movement as part of a scheduler job
     - Every hour, run a MySQL extract, load it into HDFS using NiFi, run a Spark ETL job to load it into Hive, then run a report and send it to users.
   - In every other way, NiFi fits this use case. It just needs a job-oriented interface/runtime that returns success or fail and allows for timeouts.
   - I have seen this “macgyvered” using ListenHTTP and the NiFi REST APIs, but it should be a first-class runtime option.
3. NiFi does not provide resource controls for multi-tenancy, requiring organizations to have multiple clusters.
   - Granular authorization policies are possible, but there are no resource usage policies such as what YARN and other container engines provide.
   - The items listed in #1 make this even more challenging to accommodate than it would be otherwise.

NiFi-Fn is a library for running NiFi flows as stateless functions. It provides similar delivery guarantees as NiFi without the need for on-disk repositories by waiting to confirm receipt of incoming data until it has been written to the destination. This is similar to Storm’s acking mechanism and Spark’s interface for committing Kafka offsets, except that in NiFi-Fn, this is completely handled by the framework while still supporting all NiFi processors and controller services natively without change. This results in the ability to run NiFi flows as ephemeral, stateless functions and should be able to rival MirrorMaker, DistCp, and Sqoop for performance, efficiency, and scalability while leveraging the vast library of NiFi processors and the NiFi UI for building custom flows.

By leveraging container engines (e.g. YARN, Kubernetes), long-running NiFi-Fn flows can be deployed that take full advantage of the platform’s scale and multi-tenancy features. By leveraging Function-as-a-Service (FaaS) engines (e.g.
AWS Lambda, Apache OpenWhisk), NiFi-Fn flows can be attached to event sources (or just cron) for event-driven data movement where flows only run when triggered and pricing is measured at 100ms granularity. By combining the two, large-scale batch processing could also be performed.

An additional opportunity is to integrate NiFi-Fn back into NiFi. This could provide a clean solution for a NiFi jobs interface. A user could select a run-time on a per-process-group basis to take advantage of NiFi-Fn’s efficiency and job-like execution when appropriate, without requiring a container engine or FaaS platform. A new monitoring interface could then be provided in the NiFi UI for these job-oriented workloads.

Potential NiFi-Fn run-times include:
- Java (done)
- Docker (done)
- OpenWhisk
  - Java (done)
  - Custom (done)
- YARN (done)
- Kubernetes (TODO)
- AWS Lambda (TODO)
- Azure Functions (TODO)
- Google Cloud Functions (TODO)
- Oracle Fn (TODO)
- CloudFoundry (TODO)
- NiFi custom processor (TODO)
- NiFi jobs runtime (TODO)

The core of NiFi-Fn is complete, but it could use some improved testing, more run-times, and better reporting for logs, metrics, and provenance.

Sam Hjelmfelt
Principal Software Engineer
Hortonworks