Jonathan, NAR stands for "NiFi Archive"; it is simply our approach to classloader isolation. In Maven you can build these NARs, and they become bundles of processors, services, or whatever NiFi extension you want, which you can then include in a build à la carte. The framework itself is even a NAR. We did all this first to reduce the amount of stuff (and thus possible conflicts) on the system classloader, and second so that different bundles of capabilities could have different, sometimes conflicting, dependencies without any problem.
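To make the idea concrete, here is a minimal sketch of per-bundle classloader isolation in plain Java. This is not NiFi's actual NarClassLoader, just the general mechanism it builds on: give each bundle its own URLClassLoader so two bundles can ship conflicting versions of the same dependency without clashing.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Illustrative sketch only (not NiFi's real NarClassLoader): each bundle
// gets its own loader, so its jars are invisible to every other bundle,
// while shared framework classes still resolve through the common parent.
public class BundleIsolationSketch {
    static ClassLoader loaderFor(URL[] bundleJars, ClassLoader parent) {
        // Parent-first delegation for shared classes; bundle-local jars
        // stay off the system classpath entirely.
        return new URLClassLoader(bundleJars, parent);
    }

    public static void main(String[] args) throws Exception {
        ClassLoader parent = BundleIsolationSketch.class.getClassLoader();
        ClassLoader a = loaderFor(new URL[0], parent);
        ClassLoader b = loaderFor(new URL[0], parent);
        // Distinct loaders, but shared parent classes are the same Class object.
        System.out.println(a != b
                && a.loadClass("java.lang.String") == b.loadClass("java.lang.String"));
        // prints "true"
    }
}
```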
I am with you on the distributed side being where things get more interesting. Processors execute on nodes, and all nodes execute all processors; they only fire if they have data, though. So you either interface with protocols which inherently offer load balancing, or you ensure that NiFi is pulling the data, in which case it will by its nature do load balancing. We then include things like back-pressure and congestion-avoidance functions, which means that even if load balancing isn't working we can still have nodes 'back off' so that other nodes naturally pick up the slack. This helps to address the natural hot-spotting that tends to occur in data flow and data processing.

Fault tolerance: if a node dies, the data on the node at that time is 'as dead as the node'. Any new data will be routed around that node in the case of pushes from producers to NiFi, or other nodes will take over the load of pulling in the case of pull by NiFi from those producers. The cluster of NiFi nodes is managed by a thing called the NiFi Cluster Manager (NCM). As long as it is up you can command and control all the nodes in the cluster. If the NCM is dead, the nodes all keep rolling based on what they knew last.

You then also have to be concerned about network partitioning. Each node is always heartbeating to the NCM. If the NCM doesn't get a heartbeat within a reasonable period of time, it will mark that node as disconnected. When any node in the cluster is disconnected, data flow changes cannot be made. This prevents the flow from getting into a bad state as a result of the partitioning event. If you know that a node is not just isolated but is actually 'dead', 'gone for a while', or whatever, then you can delete it from the cluster, and then you'll be able to make changes to the flow again.

As new nodes join the cluster, the first thing that happens is the typical 'am I authorized to be here' check.
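The heartbeat/disconnect rule above can be sketched in a few lines. The class and method names here are hypothetical, not NiFi's real API; it just shows the policy: a node that misses its heartbeat window is marked disconnected, flow changes are blocked while any known node is disconnected, and deleting the dead node re-enables changes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the NCM's heartbeat bookkeeping described above.
public class HeartbeatSketch {
    // Assumed timeout: e.g. eight missed 5-second heartbeats.
    static final long TIMEOUT_MS = 8 * 5_000L;

    final Map<String, Long> lastHeartbeat = new HashMap<>();

    void onHeartbeat(String nodeId, long nowMs) {
        lastHeartbeat.put(nodeId, nowMs);
    }

    boolean isDisconnected(String nodeId, long nowMs) {
        Long last = lastHeartbeat.get(nodeId);
        return last == null || nowMs - last > TIMEOUT_MS;
    }

    // Flow modifications are only allowed while every known node is connected.
    boolean flowChangesAllowed(long nowMs) {
        return lastHeartbeat.keySet().stream()
                .noneMatch(id -> isDisconnected(id, nowMs));
    }

    // Deleting a node the operator knows is dead removes it from
    // consideration, so changes to the flow become possible again.
    void deleteNode(String nodeId) {
        lastHeartbeat.remove(nodeId);
    }
}
```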
Once through that, the NCM sends a copy of the latest flow configuration to the node, at which point the node loads the flow and begins its journey. I'm glossing over a lot of the fun details of course, but we will obviously dig further here, and if there are any tracks you want to go deeper on sooner then please advise.

Thanks
Joe

On Tue, Dec 16, 2014 at 11:04 PM, Jonathan Natkins <[email protected]> wrote:
> Hey Joe,
>
> This is really helpful. In terms of examples of good architectural
> descriptions, I think the Kafka overview is pretty great (I think a lot of
> it came from the original academic paper). It's very helpful for
> understanding the key concepts and design trade-offs. My personal feeling
> is that diagrams are very helpful: my guess is that the single-node
> processing layer is not all that complex, but where architecture gets
> interesting (and where a lot of my curiosity lies) is once you get into the
> distributed modes. How is fault-tolerance handled, how do I specify which
> processors operate on which nodes, how is cluster membership handled, etc.
>
> Also: what does NAR stand for?
>
> Thanks!
> Natty
>
> Jonathan "Natty" Natkins
> StreamSets | Customer Engagement Engineer
> mobile: 609.577.1600 | linkedin <http://www.linkedin.com/in/nattyice>
>
>
> On Tue, Dec 16, 2014 at 6:08 PM, Joe Witt <[email protected]> wrote:
> >
> > Natty
> >
> > There are very little existing resources as of yet but fully recognize
> > that this is a problem.
> >
> > https://issues.apache.org/jira/browse/NIFI-162
> >
> > If there are specific examples of architectural descriptions that you
> > think are well done I'd love to see them.
> >
> > The very brief version of how execution and scale work:
> >
> > Execution:
> > NiFi runs within the JVM. As data flows through a given NiFi instance
> > there are two primary repositories that we keep which hold key
> > information about the data.
> > One repository is known as the Flowfile repository and its
> > job is to keep information about the data in the flow. The other
> > repository is the content repository and it keeps the actual data. In nifi
> > you're composing directed graphs of processors. Each processor is
> > scheduled to run according to its configured scheduling style and is given
> > time to run by a flow controller/thread-pool. When a given process runs it
> > is given access to the Flowfile Repository and content repository as
> > necessary to be able to access and modify the data in a safe and efficient
> > manner.
> >
> > Out of the box the flow file repo can be all in-memory or run off a
> > write-ahead log based implementation with high reliability and throughput.
> > For the content repo it too supports all in-memory or using one or more
> > disks in parallel yielding again very high throughput with excellent
> > durability.
> >
> > Scale:
> > Vertical: Supports highly concurrent processing and can utilize multiple
> > physical disks in parallel.
> > Horizontal: Supports clustering whereby a cluster manager relays commands
> > to nodes in the cluster and coordinates all their responses. Nodes then
> > operate as they would if they were standalone.
> >
> > Lots more coming here of course but if you have specific questions now
> > please feel free to fire away.
> >
> > Thanks
> > Joe
> >
> > On Tue, Dec 16, 2014 at 7:16 PM, Jonathan Natkins <[email protected]>
> > wrote:
> > >
> > > Hi there,
> > >
> > > I was curious if there exist any resources that would be helpful in
> > > understanding the NiFi architecture. I'm trying to understand how
> > > dataflows are executed, or how I would scale the system. Are there any
> > > architectural docs, or blog posts, or academic papers out there that
> > > would be helpful?
> > >
> > > Alternatively, some pointers into the code base as to where the
> > > execution layer code lives could be helpful.
> > >
> > > Thanks!
> > > Natty
> > >
> > > Jonathan "Natty" Natkins
> > > StreamSets | Customer Engagement Engineer
> > > mobile: 609.577.1600 | linkedin <http://www.linkedin.com/in/nattyice>
