Hi there,

In the beginning of our NiFi adoption, we faced similar issues. For us, we 
clustered NiFi, limited the number of concurrent tasks for each processor, and 
added more logical partitions for the content and provenance repositories. Now 
we easily process millions of flow files per minute on a 5-node cluster with 
hundreds of processors in the data flow pipeline. When we need to ingest more 
data or process it faster, we simply add more nodes.

First and foremost, clustering NiFi allows horizontal scaling: a must. It seems 
counterintuitive, but limiting the number of concurrent tasks was a major 
performance improvement. Doing so keeps the flow "balanced", preventing 
hotspots within the flow pipeline.
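As a rough sketch, splitting the content and provenance repositories across 
multiple directories looks like the following in nifi.properties (the suffix 
names and paths here are just examples; point each one at your own mount 
points):

```properties
# Example only -- directory suffixes and paths are illustrative.
# NiFi stripes content claims across all configured content directories:
nifi.content.repository.directory.default=/repos/disk1/content_repository
nifi.content.repository.directory.content2=/repos/disk2/content_repository

# The provenance repository can likewise span several directories:
nifi.provenance.repository.directory.default=/repos/disk1/provenance_repository
nifi.provenance.repository.directory.provenance2=/repos/disk2/provenance_repository
```

Ideally each directory sits on a separate physical disk (or at least a 
separate volume), so the repositories are not contending for the same IO.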

I hope this helps.

Rick.

--
Richard St. John, PhD
Asymmetrik
141 National Business Pkwy, Suite 110
Annapolis Junction, MD 20701

On Jul 3, 2017, 12:53 PM -0400, Karthik Kothareddy (karthikk) [CONT - Type 2] 
<karth...@micron.com>, wrote:
> All,
>
> I am currently using NiFi 1.2.0 on a Linux (RHEL) machine. I am using a 
> single instance without any clustering. My machine has ~800GB of RAM and 2.5 
> TB of disk space (SSD’s with RAID5). I have set my Java heap space values to 
> below in “bootstrap.conf” file
>
> # JVM memory settings
> java.arg.2=-Xms40960m
> java.arg.3=-Xmx81920m
>
> # Some custom Configurations
> java.arg.7=-XX:ReservedCodeCacheSize=1024m
> java.arg.8=-XX:CodeCacheMinimumFreeSpace=10m
> java.arg.9=-XX:+UseCodeCacheFlushing
>
> Now, the problem that I am facing when I am stress testing this instance is 
> that whenever the Read/Write of the data feeds reaches the limit of 5GB (at 
> least that’s what I observed), the whole instance runs super slow, meaning 
> the flowfiles are moving very slowly in the queues. It is heavily affecting 
> the other Processor Groups as well, which are very simple flows. I tried to 
> read the system diagnostics at that point and saw that all the usage is 
> below 20%, including heap usage, flowfile and content repository usage. I 
> tried to capture the status history of the Process Group at that particular 
> point and below are some results.
>
> [status history graphs omitted]
>
> From the above images it is obvious that the process group is doing a lot 
> of IO at that point. Is there a way to increase the throughput of the 
> instance, given my requirement, which involves tons of reads/writes every 
> hour? Also, to add, all my repositories (flowfile, content and provenance) 
> are on the same disk. I tried to increase all the memory settings I 
> possibly can in both bootstrap.conf and nifi.properties, but to no avail; 
> the whole instance is running very slow and is processing a minimal number 
> of flowfiles. Just to make sure, I created a GenerateFlowFile processor 
> while the system was slow, and to my surprise the rate of flow files 
> generated was less than one per minute (it should fill the queue in less 
> than 5 secs under normal circumstances). Any help on this would be much 
> appreciated.
>
>
> Thanks
> Karthik
>
