Hello all! So I’ve been using the ElasticSearchIO sink for a project (unfortunately it’s Elasticsearch 5.x, and so I’ve been messing around with the latest RC) and I’m finding that it doesn’t allow for changing the document ID, but only lets you pass in a record, which means that the document ID is auto-generated. See this line for what specifically is happening:
https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L838 <https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L838> Essentially the data part of the document is being placed but it doesn’t allow for other properties, such as the document ID, to be set. This leads to two problems: 1. Beam doesn’t necessarily guarantee exactly-once execution for a given item in a PCollection, as I understand it. This means that you may get more than one record in Elastic for a given item in a PCollection that you pass in. 2. You can’t do partial updates to an index. If you run a batch job once, and then run the batch job again on the same index without clearing it, you just double everything in there. Is there any good way around this? I’d be happy to try writing up a PR for this in theory, but not sure how to best approach it. Also would like to figure out a way to get around this in the meantime, if anyone has any ideas. Best, Chet P.S. CCed echauc...@gmail.com <mailto:echauc...@gmail.com> because it seems like he’s been doing work related to the elastic sink.