[jira] [Comment Edited] (BEAM-4389) Enable updates and upserts for Elasticsearch
[ https://issues.apache.org/jira/browse/BEAM-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487245#comment-16487245 ]

Tim Robertson edited comment on BEAM-4389 at 5/23/18 1:47 PM:
-------------------------------------------------------------

I was just pondering that [~echauchot].

[edited response follows]

Default behaviour when explicitly controlling the document ID is a full document upsert already (create or replace doc). This will add partial updates only.

Elasticsearch also has the notion of scripted updates (useful for e.g. incrementing counters) which I don't propose we support.

was (Author: timrobertson100):
I was just pondering that [~echauchot] - what about something like (pseudo code):
{code}
// Mode being "partial update" or "document upsert", if not set default is "insert"
public Write withUpdateMode(Mode mode)
{code}

It looks like v2 and v5 both support this.

I'd favour that we only support insert (default), partial update, and document upsert (i.e. not support scripted upserts)

Thanks for the input

> Enable updates and upserts for Elasticsearch
> --------------------------------------------
>
> Key: BEAM-4389
> URL: https://issues.apache.org/jira/browse/BEAM-4389
> Project: Beam
> Issue Type: New Feature
> Components: io-java-elasticsearch
> Affects Versions: 2.4.0
> Reporter: Tim Robertson
> Assignee: Tim Robertson
> Priority: Major
>
> Expose a configuration option on the {{ElasticsearchIO}} to enable partial
> updates rather than full document inserts.
> Rationale: We have the case where different pipelines process different
> categories of information of the target entity (e.g. one for taxonomic
> processing, another for geospatial processing). A read and merge is not
> possible inside the batch call, meaning the only way to do it is through a
> join. The join approach is slow, and it also prevents running a single
> process in isolation (e.g. reprocessing the geospatial component of all docs).
> This configuration parameter has to be used in conjunction with
> controlling the document ID (possible since BEAM-3201) to make sense.
> The client API would include a {{withUseUpdate(...)}} option, such as:
> {code}
> source.apply(
>   ElasticsearchIO.write()
>     .withConnectionConfiguration(connectionConfiguration)
>     .withIdFn(new ExtractValueFn("id"))
>     .withUseUpdate(true));
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Comment Edited] (BEAM-4389) Enable updates and upserts for Elasticsearch
[ https://issues.apache.org/jira/browse/BEAM-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487245#comment-16487245 ]

Tim Robertson edited comment on BEAM-4389 at 5/23/18 1:47 PM:
-------------------------------------------------------------

I was just pondering that [~echauchot].

[edited response follows]

Default behaviour when explicitly controlling the document ID is a full document upsert already (create or replace doc). This will add partial updates only.

Elasticsearch also has the notion of scripted updates (useful for e.g. incrementing counters) which I don't propose we support.

Thanks for the input

was (Author: timrobertson100):
I was just pondering that [~echauchot].

[edited response follows]

Default behaviour when explicitly controlling the document ID is a full document upsert already (create or replace doc). This will add partial updates only.

Elasticsearch also has the notion of scripted updates (useful for e.g. incrementing counters) which I don't propose we support.
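
The distinction drawn in the comment above, between a full document upsert and a partial update, can be made concrete at the level of the Elasticsearch bulk API. The following is only a sketch, not the Beam implementation: {{BulkActionSketch}} and its two methods are hypothetical names introduced here. It assumes the bulk request is sent to an index- and type-specific endpoint, so the action metadata only needs {{_id}}, and that the ID contains no characters requiring JSON escaping. An "index" action replaces the whole document, while an "update" action with "doc_as_upsert" merges the given fields into an existing document or creates it if it is missing.

```java
// Hypothetical helper, NOT part of Beam's ElasticsearchIO. It only illustrates
// the two Elasticsearch bulk API actions discussed in this thread.
final class BulkActionSketch {

  /**
   * Full document upsert: the "index" action creates the document, or replaces
   * it wholesale if a document with the same ID already exists.
   * Assumption: the ID needs no JSON escaping.
   */
  static String indexAction(String id, String documentJson) {
    return "{\"index\":{\"_id\":\"" + id + "\"}}\n" + documentJson + "\n";
  }

  /**
   * Partial update: the "update" action merges the given fields into the
   * existing document. With "doc_as_upsert" set to true, the partial document
   * is inserted as-is when no document with that ID exists yet.
   */
  static String partialUpdateAction(String id, String partialJson) {
    return "{\"update\":{\"_id\":\"" + id + "\"}}\n"
        + "{\"doc\":" + partialJson + ",\"doc_as_upsert\":true}\n";
  }

  public static void main(String[] args) {
    // One pipeline writes the taxonomic fields as a full document...
    System.out.print(indexAction("42", "{\"taxonomy\":\"Puma concolor\"}"));
    // ...and another later adds geospatial fields without a read and merge.
    System.out.print(partialUpdateAction("42", "{\"country\":\"CL\"}"));
  }
}
```

With payloads of the second kind, the geospatial pipeline from the rationale could be rerun in isolation, since it would only touch its own fields on each document.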