Hello,

I have not been very active on the NiFi mailing lists, but I have been working 
with NiFi for several years across dozens of companies. I have a great 
appreciation for NiFi's value in real-world scenarios. Its growth over the last 
few years has been very impressive, and I would like to see a further expansion 
of NiFi's capabilities.

 

Over the last few months, I have been working on a new NiFi run-time to address 
some of the limitations that I have seen in the field. Its intent is not to 
replace the existing NiFi engine, but rather to extend the possible 
applications. Similar to MiNiFi extending NiFi to the edge, NiFi-Fn is an 
alternate run-time that expands NiFi's reach to cloud scale. Given the 
similarities, MagNiFi might have been a better name, but it was already 
trademarked.

 

Here are some of the limitations that I have seen in the field. In many cases, 
there are entirely valid reasons for this behavior, but this behavior also 
prevents NiFi from being used for certain use cases.

   1. NiFi flows do not succeed or fail as a unit; part of a flow can succeed 
      while the other part fails.
      - For example, ConsumeKafka acks before downstream processing even starts.
      - Given this behavior, data delivery guarantees require writing all 
        incoming data to local disk in order to handle node failures. While 
        this helps to accommodate non-resilient sources (e.g. TCP), it has 
        downsides:
        - Increases cost significantly as throughput requirements rise 
          (especially in the cloud)
        - Increases HA complexity, because the state on each node must be 
          durable (e.g. content repository replication similar to Kafka is a 
          common ask to improve this)
        - Reduces flexibility, because data has to be migrated off of nodes to 
          scale down
   2. NiFi environments must be sized for the peak expected volumes given the 
      complexity of scaling up and down.
      - Resources are wasted when use cases have periods of lower volume (such 
        as overnight or on weekends).
      - This improved in 1.8, but it is nowhere near as fluid as DistCp or 
        Sqoop (i.e. MapReduce).
   3. Flow-specific error handling is required (such as this processor group).
      - NiFi's content repository is now the source of truth and the flow 
        cannot be restarted easily.
      - This is useful for multi-destination flows, because errors can be 
        handled individually, but unnecessary in other cases (e.g. Kafka to 
        Solr).
   4. Job/task-oriented data movement use cases do not fit well with NiFi.
      - For example: triggering data movement as part of a scheduler job. 
        Every hour, run a MySQL extract, load it into HDFS using NiFi, run a 
        Spark ETL job to load it into Hive, then run a report and send it to 
        users.
      - In every other way, NiFi fits this use case. It just needs a 
        job-oriented interface/runtime that returns success or fail and allows 
        for timeouts.
      - I have seen this "macgyvered" using ListenHTTP and the NiFi REST APIs, 
        but it should be a first-class runtime option.
   5. NiFi does not provide resource controls for multi-tenancy, requiring 
      organizations to have multiple clusters.
      - Granular authorization policies are possible, but there are no 
        resource usage policies such as what YARN and other container engines 
        provide.
      - The items listed in #1 make this even more challenging to accommodate 
        than it would be otherwise.
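To make the job-oriented idea concrete, here is a minimal sketch of what such 
an interface could look like: run a flow once, enforce a timeout, and return a 
success/failure exit code that a scheduler (cron, Oozie, Airflow, etc.) can 
chain with other jobs. The class and method names here are purely 
illustrative assumptions, not an actual NiFi or NiFi-Fn API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of a job-oriented flow runner: executes the flow
// once, bounded by a timeout, and maps the outcome to an exit code a
// scheduler can act on (0 = success, 1 = failure or timeout).
class FlowJob {
    static int run(Callable<Boolean> flow, long timeoutSeconds) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> result = pool.submit(flow);
            // Block until the flow finishes or the timeout elapses.
            return result.get(timeoutSeconds, TimeUnit.SECONDS) ? 0 : 1;
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            return 1; // timeout or flow error -> non-zero exit for the scheduler
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A scheduler would invoke this once per trigger and treat the exit code like 
any other batch job, which is exactly the contract that DistCp and Sqoop 
already provide.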


NiFi-Fn is a library for running NiFi flows as stateless functions. It provides 
delivery guarantees similar to NiFi's without the need for on-disk repositories 
by waiting to confirm receipt of incoming data until it has been written to the 
destination. This is similar to Storm's acking mechanism and Spark's interface 
for committing Kafka offsets, except that in NiFi-Fn this is handled completely 
by the framework while still supporting all NiFi processors and controller 
services natively without change. This results in the ability to run NiFi flows 
as ephemeral, stateless functions and should be able to rival MirrorMaker, 
DistCp, and Sqoop for performance, efficiency, and scalability while leveraging 
the vast library of NiFi processors and the NiFi UI for building custom flows.
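The deferred-ack idea above can be sketched in a few lines: the source's 
acknowledgment callback runs only after every downstream step has succeeded, 
so the flow succeeds or fails as a unit and no on-disk repository is needed. 
This is an illustrative assumption of the mechanism, not NiFi-Fn's actual 
internals; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of framework-managed deferred acknowledgment: the
// ack callback (e.g. a Kafka offset commit) fires only once all steps in
// the flow have completed successfully.
class StatelessFlow {
    private final List<Function<String, String>> steps = new ArrayList<>();

    StatelessFlow addStep(Function<String, String> step) {
        steps.add(step);
        return this;
    }

    /** Runs the record through every step; acks the source only on full success. */
    boolean trigger(String record, Runnable ackCallback) {
        try {
            String data = record;
            for (Function<String, String> step : steps) {
                data = step.apply(data); // any failure aborts before the ack
            }
            ackCallback.run(); // source is acked only now, after delivery
            return true;
        } catch (RuntimeException e) {
            return false; // never acked -> source will redeliver the record
        }
    }
}
```

Because the source is never acknowledged on failure, a restarted function 
simply re-consumes the same data, which is what makes the run-time safe to 
treat as ephemeral.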




By leveraging container engines (e.g. YARN, Kubernetes), long-running NiFi-Fn 
flows can be deployed that take full advantage of the platform's scale and 
multi-tenancy features. By leveraging Function-as-a-Service (FaaS) engines 
(e.g. AWS Lambda, Apache OpenWhisk), NiFi-Fn flows can be attached to event 
sources (or just cron) for event-driven data movement where flows only run 
when triggered and pricing is measured at 100ms granularity. By combining the 
two, large-scale batch processing could also be performed.




An additional opportunity is to integrate NiFi-Fn back into NiFi. This could 
provide a clean solution for a NiFi jobs interface. A user could select a 
run-time on a per-process-group basis to take advantage of NiFi-Fn's 
efficiency and job-like execution when appropriate, without requiring a 
container engine or FaaS platform. A new monitoring interface could then be 
provided in the NiFi UI for these job-oriented workloads.




Potential NiFi-Fn run-times include:

   - Java (done)
   - Docker (done)
   - OpenWhisk
     - Java (done)
     - Custom (done)
   - YARN (done)
   - Kubernetes (TODO)
   - AWS Lambda (TODO)
   - Azure Functions (TODO)
   - Google Cloud Functions (TODO)
   - Oracle Fn (TODO)
   - CloudFoundry (TODO)
   - NiFi custom processor (TODO)
   - NiFi jobs runtime (TODO)

 

The core of NiFi-Fn is complete, but it could use some improved testing, more 
run-times, and better reporting for logs, metrics, and provenance.

 

 

Sam Hjelmfelt

Principal Software Engineer

Hortonworks
