Re: NiFi architecture

Joe Witt Tue, 16 Dec 2014 18:09:07 -0800

Natty

There are very few existing resources as of yet, but we fully recognize
that this is a problem.

https://issues.apache.org/jira/browse/NIFI-162

If there are specific examples of architectural descriptions that you think
are well done I'd love to see them.

The very brief version of how execution and scale work:

Execution:
NiFi runs within the JVM.  As data flows through a given NiFi instance
there are two primary repositories that we keep which hold key information
about the data.  One is the FlowFile repository, and its job is to keep
information about the data in the flow.  The other is the content
repository, and it keeps the actual data.  In NiFi you're composing
directed graphs of processors.  Each processor is scheduled to run
according to its configured scheduling style and is given time to run by a
flow controller/thread pool.  When a given processor runs it is given
access to the FlowFile repository and content repository as necessary to
be able to access and modify the data in a safe and efficient manner.
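
To make that a bit more concrete, here is a rough toy model of the idea in
Python.  It is only a sketch and not NiFi code (NiFi itself is Java): every
name in it (FlowFile, ContentRepository, FlowFileRepository, Processor,
run_flow) is made up for illustration, both repositories are plain in-memory
dicts, and the "scheduling" is just a loop that hands each processor to a
thread pool once per round.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from queue import Empty, Queue


@dataclass
class FlowFile:
    """Information about one piece of data; the bytes live elsewhere."""
    id: str
    content_claim: str
    attributes: dict = field(default_factory=dict)


class ContentRepository:
    """Hypothetical in-memory store for the actual data."""
    def __init__(self):
        self._blobs = {}
        self._lock = threading.Lock()

    def write(self, data: bytes) -> str:
        claim = uuid.uuid4().hex
        with self._lock:
            self._blobs[claim] = data
        return claim

    def read(self, claim: str) -> bytes:
        with self._lock:
            return self._blobs[claim]


class FlowFileRepository:
    """Hypothetical in-memory store for information about the data."""
    def __init__(self):
        self._records = {}
        self._lock = threading.Lock()

    def save(self, flowfile: FlowFile) -> None:
        with self._lock:
            self._records[flowfile.id] = flowfile


class Processor:
    """One node in the graph: takes from an inbox queue, puts to an outbox."""
    def __init__(self, name, transform, inbox, outbox):
        self.name, self.transform = name, transform
        self.inbox, self.outbox = inbox, outbox

    def run_once(self, flowfile_repo, content_repo) -> bool:
        try:
            flowfile = self.inbox.get_nowait()
        except Empty:
            return False  # nothing to do this round
        data = content_repo.read(flowfile.content_claim)
        # New content gets a new claim; the old bytes are never modified.
        flowfile.content_claim = content_repo.write(self.transform(data))
        flowfile.attributes["last_processor"] = self.name
        flowfile_repo.save(flowfile)
        self.outbox.put(flowfile)
        return True


def run_flow(payloads):
    content_repo, flowfile_repo = ContentRepository(), FlowFileRepository()
    q_in, q_mid, q_out = Queue(), Queue(), Queue()
    # A two-step directed graph: upper -> exclaim
    processors = [
        Processor("upper", bytes.upper, q_in, q_mid),
        Processor("exclaim", lambda b: b + b"!", q_mid, q_out),
    ]
    for payload in payloads:
        flowfile = FlowFile(uuid.uuid4().hex, content_repo.write(payload))
        flowfile_repo.save(flowfile)
        q_in.put(flowfile)
    with ThreadPoolExecutor(max_workers=2) as pool:
        while q_out.qsize() < len(payloads):
            # Each round, every processor gets one turn on the pool.
            turns = [pool.submit(p.run_once, flowfile_repo, content_repo)
                     for p in processors]
            for turn in turns:
                turn.result()
    results = []
    while not q_out.empty():
        flowfile = q_out.get()
        results.append((content_repo.read(flowfile.content_claim),
                        flowfile.attributes["last_processor"]))
    return results
```

Calling run_flow([b"a", b"b"]) pushes two pieces of data through both
processors.  The point of the split is that processors pass around the small
FlowFile record while the bytes stay put in the content store.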

Out of the box the FlowFile repo can be all in-memory or run off a
write-ahead log based implementation with high reliability and throughput.
The content repo likewise can be all in-memory or use one or more disks in
parallel, again yielding very high throughput with excellent durability.
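
For anyone unfamiliar with the write-ahead log idea, here is a minimal
generic sketch in Python.  It is not NiFi's implementation; the class name
WriteAheadLogRepo, the JSON-lines record format, and the put/delete
operations are all assumptions chosen to keep the example small.  The
technique is simply: append each change to a log and force it to disk before
applying it in memory, then rebuild the in-memory state on startup by
replaying the log.

```python
import json
import os


class WriteAheadLogRepo:
    """Hypothetical key/attribute store backed by an append-only log."""

    def __init__(self, path):
        self.path = path
        self.state = {}
        # Recovery: replay every complete record already in the log.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as log:
                for line in log:
                    try:
                        record = json.loads(line)
                    except json.JSONDecodeError:
                        break  # torn final record from a crash; stop here
                    self._apply(record)
        self._log = open(path, "a", encoding="utf-8")

    def _apply(self, record):
        if record["op"] == "put":
            self.state[record["id"]] = record["attrs"]
        elif record["op"] == "delete":
            self.state.pop(record["id"], None)

    def _append(self, record):
        # Log first and force it to disk, only then change memory.
        self._log.write(json.dumps(record) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self._apply(record)

    def put(self, item_id, attrs):
        self._append({"op": "put", "id": item_id, "attrs": attrs})

    def delete(self, item_id):
        self._append({"op": "delete", "id": item_id})

    def close(self):
        self._log.close()
```

A real implementation would also checkpoint and truncate the log so it does
not grow forever; that is left out here.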

Scale:
Vertical: Supports highly concurrent processing and can utilize multiple
physical disks in parallel.
Horizontal: Supports clustering whereby a cluster manager relays commands
to nodes in the cluster and coordinates all their responses.  Nodes then
operate as they would if they were standalone.
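
As a rough picture of the horizontal case, the relay pattern can be sketched
like this in Python.  Again this is a toy under stated assumptions and not
NiFi's clustering code: Node, ClusterManager, relay, and the "start"/"stop"
commands are invented for the example, and the "network" is just a method
call.

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical node; it behaves the same as a standalone instance."""
    def __init__(self, name):
        self.name = name
        self.running = False

    def handle(self, command):
        if command == "start":
            self.running = True
        elif command == "stop":
            self.running = False
        else:
            return (self.name, "unknown command")
        return (self.name, "ok")


class ClusterManager:
    """Relays one command to every node and gathers all the responses."""
    def __init__(self, nodes):
        self.nodes = list(nodes)

    def relay(self, command):
        workers = max(1, len(self.nodes))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            replies = list(pool.map(lambda n: n.handle(command), self.nodes))
        return dict(replies)
```

The manager only fans out commands and collects replies; the data itself is
processed on the nodes.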

Lots more coming here of course but if you have specific questions now
please feel free to fire away.

Thanks
Joe

On Tue, Dec 16, 2014 at 7:16 PM, Jonathan Natkins <[email protected]>
wrote:
>
> Hi there,
>
> I was curious if there exist any resources that would be helpful in
> understanding the NiFi architecture. I'm trying to understand how dataflows
> are executed, or how I would scale the system. Are there any architectural
> docs, or blog posts, or academic papers out there that would be helpful?
>
> Alternatively, some pointers into the code base as to where the execution
> layer code lives could be helpful.
>
> Thanks!
> Natty
>
> Jonathan "Natty" Natkins
> StreamSets | Customer Engagement Engineer
> mobile: 609.577.1600 | linkedin <http://www.linkedin.com/in/nattyice>
>
