Hi, I am developing a Beam job that sinks mutable data to DynamoDB. I found that DynamoDBIO throws an error when multiple write requests to the same key are made in a short time.
DynamoDBIO.Write uses the batchWriteItem method from the AWS SDK to sink items, and the AWS SDK requires that a single batchWriteItem call contain no duplicate keys. Currently DynamoDBIO.Write performs no key deduplication before flushing a batch, so if consecutive updates to a single key fall within one batch (the batch size is currently hardcoded to 25), the write fails with "ValidationException: Provided list of item keys contains duplicates". I have created an issue on JIRA at https://issues.apache.org/jira/browse/BEAM-10706?jql=text%20~%20%22dynamodbio%22. The AWS support team confirmed to me that the Java SDK for DynamoDB does not currently handle deduplication.

Taking a cue from the Python SDK (boto3), which does support deduplication, I modified the DynamoDBIO code, and that solved the problem for my application. I applied the change to 2.23.0, where I also updated the test and ran it successfully. Shall I apply the change to master and then create a PR? Note that I have only changed the ver1 AWS module, not the ver2 one, and I haven't submitted a PR before, so I may need some guidance (I have read the contribution guide, though). Thanks!
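To make the idea concrete, here is a minimal sketch (not my actual patch) of the kind of last-write-wins deduplication boto3 performs with its overwrite_by_pkeys option, written against the AWS SDK v1 model classes that the ver1 module uses. The class name BatchDeduplicator, the method dedupByKey, and the keyNames parameter are hypothetical; in particular, the key attribute names would have to be supplied by the pipeline author, since DynamoDBIO does not know the table's key schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.WriteRequest;

/**
 * Hypothetical helper mirroring boto3's overwrite_by_pkeys behaviour:
 * within one batch, a later write request for a key replaces any
 * earlier one, so the flushed batch contains each key at most once.
 */
public final class BatchDeduplicator {

  private BatchDeduplicator() {}

  /**
   * @param batch    the pending write requests, in arrival order
   * @param keyNames the table's key attribute names (partition key,
   *                 plus sort key if any), supplied by the user
   */
  public static List<WriteRequest> dedupByKey(
      List<WriteRequest> batch, List<String> keyNames) {
    // LinkedHashMap preserves first-seen order while letting a later
    // request for the same key overwrite the earlier entry.
    Map<Map<String, AttributeValue>, WriteRequest> latest = new LinkedHashMap<>();
    for (WriteRequest request : batch) {
      latest.put(extractKey(request, keyNames), request);
    }
    return new ArrayList<>(latest.values());
  }

  private static Map<String, AttributeValue> extractKey(
      WriteRequest request, List<String> keyNames) {
    // A DeleteRequest already carries only the key; a PutRequest
    // carries the full item, from which we project the key attributes.
    Map<String, AttributeValue> item =
        request.getDeleteRequest() != null
            ? request.getDeleteRequest().getKey()
            : request.getPutRequest().getItem();
    Map<String, AttributeValue> key = new LinkedHashMap<>();
    for (String name : keyNames) {
      key.put(name, item.get(name));
    }
    return key;
  }
}
```

The design choice here is last-write-wins: dropping the earlier duplicates means intermediate updates within a batch are never sent, which is the behaviour I want for mutable data, since only the final state per key matters.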