Hello

So that is about 300 MB/s, assuming that really was 150 Gb (gigabits)
per minute and not 150 GB. If you expect each node to handle, say,
75 MB/s of throughput (which is low, but I'm being conservative), then
you'd need 4 or so boxes to hit that rate. Factor in surges in arrival
and lulls in processing, and you're looking at 7-8 nodes.
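
Rough math, using only the numbers above (150 gigabits per minute in,
roughly 75 MB/s per node):

    150 Gbit/min / 8 bits per byte  = 18.75 GB/min
    18.75 GB/min / 60 s             = ~312 MB/s
    312 MB/s / 75 MB/s per node     = ~4.2 nodes at steady state
    ~2x headroom for surges/lulls   -> 7-8 nodes
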
The other thing to consider is that none of those protocols (FTP, S3,
local filesystem) offers a scalable, queue-based way for multiple
nodes to share the pull work, so getting the whole cluster to operate
efficiently may be non-trivial. In that case you may be better off
having a few nodes that each gather from a predetermined set of
sources, compress the data, and then fire it off to a smaller central
cluster, as in the sketch below.
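
A sketch of what that two-tier layout might look like in NiFi terms;
the processor names here are just one plausible arrangement, not a
prescription:

    Edge/gathering nodes (a few, close to the sources):
      GetFile / GetFTP / ListS3 -> FetchS3Object -> CompressContent
        -> Remote Process Group (Site-to-Site to the central cluster)

    Central cluster (smaller):
      Input Port -> light processing -> PutHDFS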

Anyway, there are lots of ways to tackle this, depending on the
resources you have available and the sorts of failure modes you can
accept versus those you cannot.

Thanks
Joe

On Fri, Nov 20, 2015 at 4:25 AM, Venkatesh Sellappa
<vs186...@outlook.com> wrote:
> Are there any guidelines on how to scale NiFi up/down?
>
> (I know we don't do autoscaling at present and nodes are independent of each
> other.)
>
> The use-case is :
>
> 16,000 text files (CSV, XML, JSON) per minute, totalling 150Gb, are getting
> delivered onto a combination of FTP, S3, local filesystem, etc. sources.
>
> These files are then ingested, with some light processing, onto an HDFS
> cluster.
>
> My question is: are there any best practices, guidelines, or ideas on setting
> up a NiFi cluster for this kind of volume and throughput?
>
