[ https://issues.apache.org/jira/browse/BEAM-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487245#comment-16487245 ]
Tim Robertson edited comment on BEAM-4389 at 5/23/18 1:47 PM:
--------------------------------------------------------------

I was just pondering that [~echauchot].

[edited response follows]

When the document ID is explicitly controlled, the default behaviour is already a full-document upsert (create or replace the doc). This change will add partial updates only. Elasticsearch also has the notion of scripted updates (useful for e.g. incrementing counters), which I don't propose we support.

Thanks for the input.

was (Author: timrobertson100):
I was just pondering that [~echauchot].

[edited response follows]

When the document ID is explicitly controlled, the default behaviour is already a full-document upsert (create or replace the doc). This change will add partial updates only. Elasticsearch also has the notion of scripted updates (useful for e.g. incrementing counters), which I don't propose we support.

> Enable updates and upserts for Elasticsearch
> --------------------------------------------
>
>                 Key: BEAM-4389
>                 URL: https://issues.apache.org/jira/browse/BEAM-4389
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-elasticsearch
>   Affects Versions: 2.4.0
>            Reporter: Tim Robertson
>            Assignee: Tim Robertson
>            Priority: Major
>
> Expose a configuration option on the {{ElasticsearchIO}} to enable partial
> updates rather than full document inserts.
> Rationale: We have the case where different pipelines process different
> categories of information of the target entity (e.g. one for taxonomic
> processing, another for geospatial processing). A read and merge is not
> possible inside the batch call, meaning the only way to do it is through a
> join. The join approach is slow, and also prevents running a single
> process in isolation (e.g. reprocessing the geospatial component of all docs).
> This configuration parameter only makes sense when used in conjunction with
> controlling the document ID (possible since BEAM-3201).
> The client API would include a {{withUseUpdate(...)}} option, such as:
> {code}
> source.apply(
>   ElasticsearchIO.write()
>     .withConnectionConfiguration(connectionConfiguration)
>     .withIdFn(new ExtractValueFn("id"))
>     .withUseUpdate(true));
> {code}
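
For readers unfamiliar with the distinction discussed in the comment, the following is a minimal sketch of how a full-document write and a partial update differ at the level of the Elasticsearch bulk API. It is not the ElasticsearchIO implementation: the class and method names ({{BulkActionSketch}}, {{indexAction}}, {{updateAction}}) are hypothetical, the index and type are assumed to come from the request URL, and document IDs are assumed to need no JSON escaping. The "index" and "update" actions and the "doc_as_upsert" flag are the standard bulk API constructs.

```java
// Hypothetical helper for illustration only; not part of Beam or the Elasticsearch client.
final class BulkActionSketch {

  // Full-document write: an "index" action creates the document or replaces it entirely.
  static String indexAction(String id, String docJson) {
    return "{\"index\":{\"_id\":\"" + id + "\"}}\n" + docJson + "\n";
  }

  // Partial update: an "update" action merges the supplied fields into the existing
  // document. "doc_as_upsert" asks Elasticsearch to create the document from these
  // fields if it does not exist yet.
  static String updateAction(String id, String partialDocJson) {
    return "{\"update\":{\"_id\":\"" + id + "\"}}\n"
        + "{\"doc\":" + partialDocJson + ",\"doc_as_upsert\":true}\n";
  }

  public static void main(String[] args) {
    // Two independent pipelines each contribute their own fields to document "42".
    String taxonomy = updateAction("42", "{\"kingdom\":\"Animalia\"}");
    String geospatial = updateAction("42", "{\"country\":\"DK\"}");
    System.out.print(taxonomy);
    System.out.print(geospatial);
  }
}
```

With "index" actions the second write would replace the fields written by the first; with "update" actions each pipeline only touches the fields it supplies, which is why the option only makes sense together with an explicit document ID.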