Dariusz Seweryn created NIFI-14359:
--------------------------------------
Summary: ConsumeKinesisStream multi-stream tracking
Key: NIFI-14359
URL: https://issues.apache.org/jira/browse/NIFI-14359
Project: Apache NiFi
Issue Type: Improvement
Components: Extensions
Reporter: Dariusz Seweryn
Currently ConsumeKinesisStream (CKS) can track only a single stream.
Consuming multiple streams therefore requires a separate processor per stream.
Multi-stream tracking would remove this limitation.
The "Amazon Kinesis Stream Name" property would be migrated to a plural form that accepts multiple stream names.
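A minimal sketch of how a plural stream-name property value might be parsed. The comma-separated format, the class name {{StreamNamesProperty}} and the validation rules are all hypothetical; the issue only states that the property becomes plural. One design point it shows is that an existing single-name value still parses cleanly, to a one-element list.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical parser for a plural stream-name property.
 * Assumption: names are comma-separated; the issue does not define a format.
 */
public final class StreamNamesProperty {

    private StreamNamesProperty() {}

    static List<String> parse(String value) {
        // LinkedHashSet drops duplicates while keeping the configured order.
        Set<String> names = new LinkedHashSet<>();
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        if (names.isEmpty()) {
            throw new IllegalArgumentException("At least one stream name is required");
        }
        return List.copyOf(names);
    }

    public static void main(String[] args) {
        // An existing single-name value still parses, to a one-element list.
        System.out.println(parse("orders"));
        System.out.println(parse(" orders, clicks ,,orders"));
    }
}
```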
CKS would also need to add new attributes to emitted FlowFiles, so that downstream decisions can be based on the stream, the shard, or both:
* the stream name
* a combined "stream+shardId" value, for partitioning in connections
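A minimal sketch of the two proposed attributes. The attribute names {{kinesis.stream}} and {{kinesis.stream.shard}} and the {{:}} separator are hypothetical, not existing CKS attributes. The combined value matters because Kinesis shard IDs (e.g. {{shardId-000000000001}}) repeat across streams, so the shard ID alone would collide when used as a partitioning key, for example with NiFi's "Partition by attribute" load-balancing strategy on a connection.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper building the extra FlowFile attributes proposed for multi-stream CKS. */
public final class StreamAttributes {

    // Hypothetical attribute names; the issue does not fix them.
    static final String STREAM_ATTRIBUTE = "kinesis.stream";
    static final String STREAM_SHARD_ATTRIBUTE = "kinesis.stream.shard";

    private StreamAttributes() {}

    static Map<String, String> forRecord(String streamName, String shardId) {
        Map<String, String> attributes = new LinkedHashMap<>();
        // Allows routing decisions on the stream alone.
        attributes.put(STREAM_ATTRIBUTE, streamName);
        // Unique per shard across all tracked streams, so it is safe as a partitioning key.
        attributes.put(STREAM_SHARD_ATTRIBUTE, streamName + ":" + shardId);
        return attributes;
    }

    public static void main(String[] args) {
        System.out.println(forRecord("orders", "shardId-000000000001"));
    }
}
```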
The downside of this change is that it breaks backward compatibility; see the
[official documentation|https://docs.aws.amazon.com/streams/latest/dev/kcl-multi-stream.html]:
{quote}
h6. Important
When your existing KCL consumer application is configured to process only one
data stream, the {{leaseKey}} (which is the partition key for the lease table)
is the shard ID. If you reconfigure an existing KCL consumer application to
process multiple data streams, it breaks your lease table, because the
{{leaseKey}} structure must be as follows:
{{account-id:StreamName:StreamCreationTimestamp:ShardId}} to support
multi-stream.
{quote}
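A minimal sketch of the multi-stream lease key layout quoted above, {{account-id:StreamName:StreamCreationTimestamp:ShardId}}, compared with the single-stream key, which is the bare shard ID. The record name is hypothetical, and storing the timestamp as epoch seconds is an assumption. Parsing with a plain split relies on none of the parts containing {{:}} (Kinesis stream names are limited to letters, digits, {{_}}, {{.}} and {{-}}).

```java
/**
 * Hypothetical model of a multi-stream KCL lease key:
 * account-id:StreamName:StreamCreationTimestamp:ShardId.
 * Assumption: the creation timestamp is serialized as epoch seconds.
 */
public record MultiStreamLeaseKey(String accountId, String streamName,
                                  long creationEpochSeconds, String shardId) {

    String serialize() {
        return String.join(":", accountId, streamName,
                Long.toString(creationEpochSeconds), shardId);
    }

    static MultiStreamLeaseKey parse(String leaseKey) {
        // None of the four parts may contain ':', so a plain split is sufficient.
        String[] parts = leaseKey.split(":", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not a multi-stream lease key: " + leaseKey);
        }
        return new MultiStreamLeaseKey(parts[0], parts[1], Long.parseLong(parts[2]), parts[3]);
    }

    /** A single-stream lease key is the bare shard ID, which contains no ':'. */
    static boolean isMultiStream(String leaseKey) {
        return leaseKey.indexOf(':') >= 0;
    }

    public static void main(String[] args) {
        MultiStreamLeaseKey key = new MultiStreamLeaseKey(
                "123456789012", "orders", 1700000000L, "shardId-000000000001");
        String serialized = key.serialize();
        System.out.println(serialized);
        System.out.println(parse(serialized).equals(key));
        System.out.println(isMultiStream("shardId-000000000001"));
    }
}
```

This also illustrates why the change is breaking: a lease table written in single-stream mode holds keys for which {{isMultiStream}} is false, and a multi-stream consumer would not recognize them.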
Single-stream tracking could be kept while only one stream name is configured,
but adding a second stream name would then break the lease table anyway. It is
less surprising to introduce the breaking change together with a processor
version change.
Automatic migration may be possible, provided CKS is still configured with a
single stream and can fetch the account ID and the StreamCreationTimestamp.
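A minimal sketch of the key-mapping step of such a migration, assuming the account ID and creation timestamp have already been fetched (for instance from the stream ARN and {{StreamCreationTimestamp}} returned by the Kinesis {{DescribeStreamSummary}} API; how CKS would obtain them is an assumption). The class and method names are hypothetical. Keys that already contain {{:}} are left unchanged, so rerunning the migration is harmless.

```java
import java.util.List;

/**
 * Hypothetical migration helper: rewrites single-stream lease keys (bare shard IDs)
 * into the multi-stream layout account-id:StreamName:StreamCreationTimestamp:ShardId.
 */
public final class LeaseKeyMigration {

    private LeaseKeyMigration() {}

    static List<String> migrate(List<String> leaseKeys, String accountId,
                                String streamName, long creationEpochSeconds) {
        String prefix = accountId + ":" + streamName + ":" + creationEpochSeconds + ":";
        return leaseKeys.stream()
                // Keys already containing ':' are treated as migrated, so a rerun is a no-op.
                .map(key -> key.contains(":") ? key : prefix + key)
                .toList();
    }

    public static void main(String[] args) {
        migrate(List.of("shardId-000000000000", "shardId-000000000001"),
                "123456789012", "orders", 1700000000L)
                .forEach(System.out::println);
    }
}
```

Since {{leaseKey}} is the partition key of the DynamoDB lease table, it cannot be updated in place: each lease item would have to be written under its new key (carrying over the checkpoint and other lease attributes) and the old item deleted. The sketch covers only the key mapping.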
Things to consider:
* Setting the initial stream position per stream (may be a separate work item)