[
https://issues.apache.org/jira/browse/NIFI-15669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Handermann resolved NIFI-15669.
-------------------------------------
Fix Version/s: 2.9.0
Resolution: Fixed
> Refactor ConsumeKinesis to avoid dependency on KCL
> --------------------------------------------------
>
> Key: NIFI-15669
> URL: https://issues.apache.org/jira/browse/NIFI-15669
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Major
> Fix For: 2.9.0
>
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> The existing ConsumeKinesis Processor depends on the Kinesis Client Library,
> which is a reasonable choice, in general. However, the Kinesis Client Library
> is very heavy-weight, brings in a lot of transitive dependencies and was
> designed with some assumptions in mind that do not align well with use in
> NiFi.
> Specifically, startup time is very long - typically 10-15 minutes for the
> first iteration as it waits for all consumers to join before allowing any
> consumer to pull data. While this makes sense for a big, long-running job
> that you might run from Spark or Flink, it is too long to be reasonable for
> NiFi and often leads to users thinking it's not working.
> More importantly, the KCL library, when using Enhanced Fan-Out Consumer,
> buffers up to 11 Kinesis Events per shard. This limit of 11 events is hard
> coded and cannot be configured. An Event is effectively a batch of records,
> which typically amounts to 1-2 MB. For a few shards and/or where NiFi is
> keeping up without issue, this works well. However, if backpressure gets
> applied we end up buffering 2 MB * 11 = 22 MB or so per shard. Many users
> have Kinesis Streams with hundreds of shards, some even with 1,000+ shards.
> With 1,000 shards that's 22 GB of data buffered in heap. And this is
> considering only the 'payload'. There is additional overhead from metadata
> and Java objects.
> We need a faster, more stable solution. The new solution can still make use
> of DynamoDB for checkpointing and support more or less the same options and
> configuration in a backward compatible manner.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)