Dariusz Seweryn created NIFI-14359:
--------------------------------------

             Summary: ConsumeKinesisStream multi-stream tracking
                 Key: NIFI-14359
                 URL: https://issues.apache.org/jira/browse/NIFI-14359
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Extensions
            Reporter: Dariusz Seweryn


Currently, ConsumeKinesisStream (CKS) can track only a single stream. 
Scaling to multiple streams requires adding a separate processor per stream. 
Multi-stream tracking would solve this problem.

We would migrate the "Amazon Kinesis Stream Name" property to its plural form 
so that it accepts multiple stream names. CKS would also need to add new 
attributes to emitted FlowFiles to allow granular decision making based on the 
stream, the shard, or both:
 * the "stream" name
 * a "stream+shardId" for partitioning in connections
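A minimal sketch of how the plural property and the new attributes could fit together, written in Java since NiFi is a Java project. It assumes the plural property holds a comma-separated list of stream names; the attribute names (aws.kinesis.stream.name, aws.kinesis.stream.shard.id) and the ':' separator are hypothetical placeholders, not names defined by NiFi.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class KinesisAttributesSketch {

    // Hypothetical attribute names; the real names would be chosen in the implementation.
    static final String STREAM_NAME_ATTR = "aws.kinesis.stream.name";
    static final String STREAM_SHARD_ATTR = "aws.kinesis.stream.shard.id";

    // Splits an assumed comma-separated "stream names" property value into
    // trimmed, non-empty names, dropping duplicates while keeping their order.
    static List<String> parseStreamNames(String propertyValue) {
        return Arrays.stream(propertyValue.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .distinct()
                .collect(Collectors.toList());
    }

    // Builds the per-record attributes: the stream name on its own, and a
    // combined stream + shard ID value that is unique across all tracked streams.
    static Map<String, String> recordAttributes(String streamName, String shardId) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put(STREAM_NAME_ATTR, streamName);
        attributes.put(STREAM_SHARD_ATTR, streamName + ":" + shardId);
        return attributes;
    }

    public static void main(String[] args) {
        for (String stream : parseStreamNames("orders, payments,,orders")) {
            System.out.println(recordAttributes(stream, "shardId-000000000000"));
        }
    }
}
```

The combined value is what a connection would partition on (for example with the "Partition by attribute" load-balancing strategy), because a bare shard ID such as shardId-000000000000 can appear in more than one stream.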

The downside of this change is that it breaks backward compatibility; see the 
[official documentation|https://docs.aws.amazon.com/streams/latest/dev/kcl-multi-stream.html]:
{quote}
h6. Important
When your existing KCL consumer application is configured to process only one 
data stream, the {{leaseKey}} (which is the partition key for the lease table) 
is the shard ID. If you reconfigure an existing KCL consumer application to 
process multiple data streams, it breaks your lease table, because the 
{{leaseKey}} structure must be as follows: 
{{account-id:StreamName:StreamCreationTimestamp:ShardId}} to support 
multi-stream.
{quote}
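To illustrate why the existing lease table breaks, a sketch with hypothetical class names that formats and parses the multi-stream lease key described in the quote. It assumes StreamCreationTimestamp is stored as epoch seconds. A single-stream lease key is a bare shard ID, so it does not parse as a multi-stream key.

```java
public class LeaseKeySketch {

    // Components of a multi-stream lease key, following the documented layout
    // account-id:StreamName:StreamCreationTimestamp:ShardId.
    // The creation timestamp is assumed to be in epoch seconds.
    record MultiStreamLeaseKey(String accountId, String streamName, long creationEpochSeconds, String shardId) {

        String format() {
            return accountId + ":" + streamName + ":" + creationEpochSeconds + ":" + shardId;
        }

        static MultiStreamLeaseKey parse(String leaseKey) {
            String[] parts = leaseKey.split(":", 4);
            if (parts.length != 4) {
                // A single-stream lease key is just the shard ID and has no ':' separators.
                throw new IllegalArgumentException("Not a multi-stream lease key: " + leaseKey);
            }
            return new MultiStreamLeaseKey(parts[0], parts[1], Long.parseLong(parts[2]), parts[3]);
        }
    }

    public static void main(String[] args) {
        String singleStreamKey = "shardId-000000000000";
        MultiStreamLeaseKey key =
                new MultiStreamLeaseKey("123456789012", "orders", 1700000000L, singleStreamKey);
        System.out.println(key.format());
        System.out.println(MultiStreamLeaseKey.parse(key.format()).equals(key));
        try {
            MultiStreamLeaseKey.parse(singleStreamKey);
        } catch (IllegalArgumentException e) {
            // The old key cannot be interpreted in multi-stream mode.
            System.out.println(e.getMessage());
        }
    }
}
```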
While it would be possible to keep using single-stream tracking when only one 
stream name is defined, adding a second stream name would still break the lease 
table. It is less surprising to introduce the breaking change together with a 
processor version change.
Automatic migration may be possible, provided CKS is still configured with a 
single stream and can fetch the account-id and StreamCreationTimestamp for that 
stream.
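A sketch of the key-mapping step such a migration might perform, assuming the account ID is taken from the stream ARN and the creation timestamp (in epoch seconds) is already known; both values are returned by the Kinesis DescribeStreamSummary API, which the sketch does not call. The class and method names are hypothetical, and actually copying lease items and checkpoints to the new keys in the lease table is out of scope here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LeaseMigrationSketch {

    // Extracts the account ID from a Kinesis stream ARN of the form
    // arn:aws:kinesis:<region>:<account-id>:stream/<stream-name>
    static String accountIdFromStreamArn(String streamArn) {
        String[] parts = streamArn.split(":", 6);
        if (parts.length != 6 || !parts[2].equals("kinesis")) {
            throw new IllegalArgumentException("Unexpected stream ARN: " + streamArn);
        }
        return parts[4];
    }

    // Maps each single-stream lease key (a bare shard ID) to its multi-stream
    // equivalent account-id:StreamName:StreamCreationTimestamp:ShardId.
    static Map<String, String> migrateLeaseKeys(List<String> shardIdLeaseKeys, String streamArn,
                                                String streamName, long creationEpochSeconds) {
        String prefix = accountIdFromStreamArn(streamArn) + ":" + streamName + ":" + creationEpochSeconds + ":";
        Map<String, String> oldToNew = new LinkedHashMap<>();
        for (String shardId : shardIdLeaseKeys) {
            oldToNew.put(shardId, prefix + shardId);
        }
        return oldToNew;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = migrateLeaseKeys(
                List.of("shardId-000000000000", "shardId-000000000001"),
                "arn:aws:kinesis:us-east-1:123456789012:stream/orders",
                "orders",
                1700000000L);
        mapping.forEach((oldKey, newKey) -> System.out.println(oldKey + " -> " + newKey));
    }
}
```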

Things to consider:
 * Setting the initial stream position per stream (may be a separate work item)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
