Phil,

For the content repository, you can configure the directory by changing the
value of the "nifi.content.repository.directory.default" property in
nifi.properties. The suffix here, "default", is the name of this "container".
You can have multiple containers by adding extra properties. So, for example,
you could set:

nifi.content.repository.directory.content1=/nifi/repos/content-1
nifi.content.repository.directory.content2=/nifi/repos/content-2
nifi.content.repository.directory.content3=/nifi/repos/content-3
nifi.content.repository.directory.content4=/nifi/repos/content-4

Similarly, the Provenance Repository property is named
"nifi.provenance.repository.directory.default" and can have any number of
"containers":

nifi.provenance.repository.directory.prov1=/nifi/repos/prov-1
nifi.provenance.repository.directory.prov2=/nifi/repos/prov-2
nifi.provenance.repository.directory.prov3=/nifi/repos/prov-3
nifi.provenance.repository.directory.prov4=/nifi/repos/prov-4

When NiFi writes to these, it does a round robin, so if you're writing the
content of 4 FlowFiles simultaneously with different threads, you're able to
get the full throughput of each disk. (So if you have 4 disks for your
content repo, each capable of writing 100 MB/sec, then your effective write
rate to the content repo is 400 MB/sec.) The same applies to the Provenance
Repository.
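
For that math to hold, each container needs to live on its own physical
device; putting all four directories on one disk gains you nothing. A
hypothetical layout (the device names here are purely illustrative):

/dev/nvme0n1 mounted at /nifi/repos/content-1
/dev/nvme1n1 mounted at /nifi/repos/content-2
/dev/nvme2n1 mounted at /nifi/repos/content-3
/dev/nvme3n1 mounted at /nifi/repos/content-4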

Doing this will also allow you to hold a larger 'archive' of content and
provenance data, because the archive is spanned across all of the listed
directories as well.
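
If you want to bound how much archived content is kept, the archive-related
properties in nifi.properties can be tuned as well. A quick sketch (the
values here are just examples, not recommendations):

nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%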

Thanks
-Mark



> On Sep 11, 2018, at 3:35 PM, Phil H <gippyp...@gmail.com> wrote:
> 
> Thanks Mark, this is great advice.
> 
> Disk access is certainly an issue with the current setup. I will certainly
> shoot for NVMe disks in the build. How does NiFi get configured to span
> its repositories across multiple physical disks?
> 
> Thanks,
> Phil
> 
> On Wed, 12 Sep 2018 at 01:32, Mark Payne <marka...@hotmail.com> wrote:
> 
>> Phil,
>> 
>> As Sivaprasanna mentioned, your bottleneck will certainly depend on your
>> flow. There's nothing inherent about NiFi or the JVM, AFAIK, that would
>> limit you. I've seen NiFi run on VMs containing 4-8 cores, and I've seen
>> it run on bare metal on servers containing 96+ cores. Most often, I see
>> people with a lot of CPU cores but insufficient disk, so if you're running
>> many cores, ensure that you're using SSDs / NVMe drives or enough spinning
>> disks to accommodate the flow. NiFi does a good job of spanning the
>> content and FlowFile repositories across multiple disks to take full
>> advantage of the hardware, and scales the CPU vertically by way of
>> multiple Processors and multiple concurrent tasks (threads) on a given
>> Processor.
>> 
>> It really comes down to what you're doing in your flow, though. If you've
>> got 96 cores and you're trying to perform 5 dozen transformations against
>> a large number of FlowFiles but have only a single spinning disk, then
>> those 96 cores will likely go to waste, because your disk will bottleneck
>> you.
>> 
>> Likewise, if you have 10 SSDs and only 8 cores, you're likely going to
>> waste a lot of disk because you won't have the CPU needed to reach the
>> disks' full potential. So you'll need to strike the correct balance for
>> your use case. Since you have the flow running right now, I would
>> recommend looking at things like `top` and `iostat` in order to understand
>> if you're reaching your limit on CPU, disk, etc.
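>> 
>> For example, something like this (a rough sketch; exact flags vary by
>> platform and sysstat version):
>> 
>> iostat -x 5    # per-device utilization and wait times, every 5 seconds
>> top -H         # per-thread CPU usage, including the NiFi JVM's threads
>> 
>> If a repo disk sits near 100% utilization while the CPUs are mostly idle,
>> the disk is your bottleneck; the reverse suggests you're CPU-bound.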
>> 
>> As far as RAM is concerned, NiFi typically only needs 4-8 GB of RAM for
>> the heap. However, more RAM means that your operating system can make
>> better use of disk caching, which can certainly speed things up,
>> especially if you're reading the content several times for each FlowFile.
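>> 
>> For reference, the heap bounds live in conf/bootstrap.conf; a sketch with
>> illustrative sizes (the java.arg indices may differ in your version):
>> 
>> java.arg.2=-Xms4g
>> java.arg.3=-Xmx8g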
>> 
>> Does this help at all?
>> 
>> Thanks
>> -Mark
>> 
>> 
>>> On Sep 10, 2018, at 6:05 AM, Phil H <gippyp...@gmail.com> wrote:
>>> 
>>> Thanks for that. Sorry, I should have been more specific - we have a flow
>>> running already on non-dedicated hardware. I'm looking to identify any
>>> limitations in NiFi/the JVM that would restrict how much parallelism it
>>> can take advantage of.
>>> 
>>> On Mon, 10 Sep 2018 at 14:32, Sivaprasanna <sivaprasanna...@gmail.com>
>>> wrote:
>>> 
>>>> Phil,
>>>> 
>>>> The hardware requirements are driven by the nature of the dataflow you
>>>> are developing. If you're looking to play around with NiFi and gain
>>>> some hands-on experience, go for 4 cores and 8 GB RAM, i.e. any modern
>>>> laptop/computer would do the job. In my case, where I have 100s of
>>>> dataflows, I have it clustered with 3 nodes, each having 16 GB RAM and
>>>> 4(8) cores. I went with SSDs of smaller size because my flows are
>>>> involved in writing to object stores like Google Cloud Storage, Azure
>>>> Blob, and Amazon S3, and NoSQL DBs. Hope this helps.
>>>> 
>>>> -
>>>> Sivaprasanna
>>>> 
>>>> On Mon, Sep 10, 2018 at 4:09 AM Phil H <gippyp...@gmail.com> wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I've been asked to spec some hardware for a NiFi installation. Does
>>>>> anyone have any advice? My gut feel is lots of processor cores and
>>>>> RAM, with less emphasis on storage (small fast disks). Are there any
>>>>> limitations on how many cores the JRE/NiFi can actually make use of,
>>>>> or any other considerations like that I should be aware of?
>>>>> 
>>>>> Most likely it will be pairs of servers in a cluster, but again any
>>>>> advice to the contrary would be appreciated.
>>>>> 
>>>>> Cheers,
>>>>> Phil
>>>>> 
>>>> 
>> 
>> 
