[jira] [Comment Edited] (BEAM-4389) Enable updates and upserts for Elasticsearch

2018-05-23 Thread Tim Robertson (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487245#comment-16487245
 ] 

Tim Robertson edited comment on BEAM-4389 at 5/23/18 1:47 PM:
--

I was just pondering that [~echauchot].
[edited response follows]

Default behaviour when explicitly controlling the document ID is a full 
document upsert already (create or replace doc). This will add partial updates 
only.

Elasticsearch also has the notion of scripted updates (useful for e.g. 
incrementing counters) which I don't propose we support.
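
To make the distinction concrete, here is a minimal sketch of how a full document upsert and a partial update differ as entries in an Elasticsearch bulk request body. The class and method names are hypothetical, and the sketch assumes the index and type come from the bulk URL and that the ID needs no JSON escaping. Whether an eventual implementation would set `doc_as_upsert` is not stated in this thread; it is included here only as an assumption.

```java
// Sketch only: builds the two-line entries that the Elasticsearch bulk API expects.
// BulkActionSketch and its method names are illustrative, not part of ElasticsearchIO.
final class BulkActionSketch {

  // Full document upsert: "index" creates the document or replaces it entirely.
  static String indexEntry(String id, String docJson) {
    return "{ \"index\" : { \"_id\" : \"" + id + "\" } }\n" + docJson + "\n";
  }

  // Partial update: "update" merges the supplied fields into the existing document.
  // Assumption: doc_as_upsert is set so a missing document is created from the fields.
  static String partialUpdateEntry(String id, String docJson) {
    return "{ \"update\" : { \"_id\" : \"" + id + "\" } }\n"
        + "{ \"doc\" : " + docJson + ", \"doc_as_upsert\" : true }\n";
  }

  public static void main(String[] args) {
    String geo = "{\"country\":\"DK\"}";
    System.out.print(indexEntry("42", geo));
    System.out.print(partialUpdateEntry("42", geo));
  }
}
```

The only difference on the wire is the action name and the `doc` wrapper, which is why a single switch on the writer could plausibly be enough.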



was (Author: timrobertson100):
I was just pondering that [~echauchot] - what about something like (pseudo 
code):

{code}
// Mode is "partial update" or "document upsert"; if not set, the default is "insert".
public Write withUpdateMode(Mode mode)
{code}

It looks like v2 and v5 both support this. I'd favour supporting only 
insert (default), partial update, and document upsert (i.e. not scripted 
upserts).

Thanks for the input
 

> Enable updates and upserts for Elasticsearch
> 
>
> Key: BEAM-4389
> URL: https://issues.apache.org/jira/browse/BEAM-4389
> Project: Beam
> Issue Type: New Feature
> Components: io-java-elasticsearch
> Affects Versions: 2.4.0
> Reporter: Tim Robertson
> Assignee: Tim Robertson
> Priority: Major
>
> Expose a configuration option on the {{ElasticsearchIO}} to enable partial 
> updates rather than full document inserts. 
> Rationale: We have the case where different pipelines process different 
> categories of information of the target entity (e.g. one for taxonomic 
> processing, another for geospatial processing). A read and merge is not 
> possible inside the batch call, meaning the only way to do it is through a 
> join. The join approach is slow, and also stops the ability to run a single 
> process in isolation (e.g. reprocess the geospatial component of all docs).
> This configuration parameter only makes sense when used in conjunction with 
> controlling the document ID (possible since BEAM-3201).
> The client API would include a {{withUseUpdate(...)}} option, such as:
> {code}
> source.apply(
>   ElasticsearchIO.write()
>     .withConnectionConfiguration(connectionConfiguration)
>     .withIdFn(new ExtractValueFn("id"))
>     .withUseUpdate(true));
> {code}
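
The rationale in the issue description can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Elasticsearch or Beam code: the class name, document ID, and field names are made up, and the merge is shallow (top-level fields only), whereas Elasticsearch merges nested objects as well.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an index, used only to contrast the two write modes.
final class PartialUpdateSketch {
  private final Map<String, Map<String, Object>> docs = new HashMap<>();

  // Full document upsert: the stored document is replaced wholesale.
  void index(String id, Map<String, Object> doc) {
    docs.put(id, new HashMap<>(doc));
  }

  // Partial update (with upsert): only the supplied top-level fields are overwritten.
  void update(String id, Map<String, Object> fields) {
    docs.computeIfAbsent(id, k -> new HashMap<>()).putAll(fields);
  }

  Map<String, Object> get(String id) {
    return docs.get(id);
  }

  public static void main(String[] args) {
    PartialUpdateSketch store = new PartialUpdateSketch();
    // Two independent pipelines write different fields of the same entity.
    store.update("occ-1", Collections.singletonMap("kingdom", "Animalia")); // taxonomic
    store.update("occ-1", Collections.singletonMap("country", "DK"));       // geospatial
    System.out.println(store.get("occ-1"));
  }
}
```

With partial updates, either pipeline can be rerun alone without clobbering the other's fields; with `index`, the second write would discard the first.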



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-4389) Enable updates and upserts for Elasticsearch

2018-05-23 Thread Tim Robertson (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487245#comment-16487245
 ] 

Tim Robertson edited comment on BEAM-4389 at 5/23/18 1:47 PM:
--

I was just pondering that [~echauchot].
[edited response follows]

Default behaviour when explicitly controlling the document ID is a full 
document upsert already (create or replace doc). This will add partial updates 
only.

Elasticsearch also has the notion of scripted updates (useful for e.g. 
incrementing counters) which I don't propose we support.

Thanks for the input





