[ 
https://issues.apache.org/jira/browse/FLINK-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665527#comment-16665527
 ] 

Ying Xu commented on FLINK-4582:
--------------------------------

Hi [~tinder-dthomson] thanks for raising this issue up.  And sorry for the 
delay in responding to the original request. 

We actually implemented a version of the flink-dynamodbstreams connector on top 
of the existing flink-kinesis connector. The work is currently in production 
and was presented in a meetup event back in Sep.  I wasn't able to get a chance 
to contribute back because of other work priorities – my bad!  

I looked at your PR.  The use of _DynamodbProxy.getShardList()_ is interesting. 
We took a slightly different approach, which plugs in a dynamodbstreams-kinesis 
adapter object into KinesisProxy and makes it an equivalent _DynamodbProxy_ 
(approach mentioned in another thread titled *Consuming data from dynamoDB 
streams to flink*).  We rely on the assumption that during re-sharing, one can 
retrieve all the new child shard Ids based on passing the last seen shardId. 
Although Dynamodbstreams do not officially claim this, we consistently observed 
similar behavior in production during resharding. 

Other benefits of directly embedding a dynamodbstreams-kinesis adapter is to 
allow *ONE* source (consumer) to consume from multiple data streams (which is 
important for our use cases), plus other error handling in the existing 
KinesisProxy. I agree that if the _DynamodbProxy_ provides _efficient 
multi-stream_ implementation, it is an interesting direction to look into. 

If you can wait a few days, I can adapt my PR on top of the OSS flink and post 
it by early next week.  We can have more discussions at then. What do you think?

Thank you very much!

 

> Allow FlinkKinesisConsumer to adapt for AWS DynamoDB Streams
> ------------------------------------------------------------
>
>                 Key: FLINK-4582
>                 URL: https://issues.apache.org/jira/browse/FLINK-4582
>             Project: Flink
>          Issue Type: New Feature
>          Components: Kinesis Connector, Streaming Connectors
>            Reporter: Tzu-Li (Gordon) Tai
>            Assignee: Ying Xu
>            Priority: Major
>
> AWS DynamoDB is a NoSQL database service that has a CDC-like (change data 
> capture) feature called DynamoDB Streams 
> (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html),
>  which is a stream feed of item-level table activities.
> The DynamoDB Streams shard abstraction follows that of Kinesis Streams with 
> only a slight difference in resharding behaviours, so it is possible to build 
> on the internals of our Flink Kinesis Consumer for an exactly-once DynamoDB 
> Streams source.
> I propose an API something like this:
> {code}
> DataStream dynamoItemsCdc = 
>   FlinkKinesisConsumer.asDynamoDBStream(tableNames, schema, config)
> {code}
> The feature adds more connectivity to popular AWS services for Flink, and 
> combining what Flink has for exactly-once semantics, out-of-core state 
> backends, and queryable state with CDC can have very strong use cases. For 
> this feature there should only be an extra dependency to the AWS Java SDK for 
> DynamoDB, which has Apache License 2.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to