> we decided to postpone the feature That makes sense.
I believe the ES6 branch is in-part working (I've looked at the code but not used it) which you can see here [1] and the jira to watch or contribute is [2]. It would be a useful addition to test independently and report any observations or improvement requests on that jira. The offer to assist in your first PR remains open for the future - please don't hesitate to ask. Thanks, Tim [1] https://github.com/jsteggink/beam/tree/BEAM-3199/sdks/java/io/elasticsearch-6/src/main/java/org/apache/beam/sdk/io/elasticsearch [2] https://issues.apache.org/jira/browse/BEAM-3199 On Mon, Jul 30, 2018 at 10:55 AM, Wout Scheepers < [email protected]> wrote: > Hey Tim, > > > > Thanks for your proposal to mentor me through my first PR. > > As we’re definitely planning to upgrade to ES6 when Beam supports it, we > decided to postpone the feature (we have a fix that works for us, for now). > > When Beam supports ES6, I’ll be happy to make a contribution to get bulk > deletes working. > > > > For reference, I opened a ticket (https://issues.apache.org/ > jira/browse/BEAM-5042). > > > > Cheers, > > Wout > > > > > > *From: *Tim Robertson <[email protected]> > *Reply-To: *"[email protected]" <[email protected]> > *Date: *Friday, 27 July 2018 at 17:43 > *To: *"[email protected]" <[email protected]> > *Subject: *Re: ElasticsearchIO bulk delete > > > > Hi Wout, > > > > This is great, thank you. I wrote the partial update support you reference > and I'll be happy to mentor you through your first PR - welcome aboard. Can > you please open a Jira to reference this work and we'll assign it to you? > > > > We discussed having the "_xxx" fields in the document and triggering > actions based on that in the partial update jira but opted to avoid > it. Based on that discussion the ActionFn would likely be the preferred > approach. Would that be possible? > > > > It will be important to provide unit and integration tests as well. > > > > Please be aware that there is a branch and work underway for ES6 already > which is rather different on the write() path so this may become redundant > rather quickly. > > > > Thanks, > > Tim > > > > @timrobertson100 on the Beam slack channel > > > > > > > > On Fri, Jul 27, 2018 at 2:53 PM, Wout Scheepers <Wout.Scheepers@vente- > exclusive.com> wrote: > > Hey all, > > > > A while ago, I patched ElasticsearchIO to be able to do partial updates > and deletes. > > However, I did not consider my patch pull-request-worthy as the json > parsing was done inefficient (parsed it twice per document). > > > > Since Beam 2.5.0 partial updates are supported, so the only thing I’m > missing is the ability to send bulk *delete* requests. > > We’re using entity updates for event sourcing in our data lake and need to > persist deleted entities in elastic. > > We’ve been using my patch in production for the last year, but I would > like to contribute to get the functionality we need into one of the next > releases. > > > > I’ve created a gist that works for me, but is still inefficient (parsing > twice: once to check the ‘_action` field, once to get the metadata). > > Each document I want to delete needs an additional ‘_action’ field with > the value ‘delete’. It doesn’t matter the document still contains the > redundant field, as the delete action only requires the metadata. > > I’ve added the method isDelete() and made some changes to the > processElement() method. > > https://gist.github.com/wscheep/26cca4bda0145ffd38faf7efaf2c21b9 > > > > I would like to make my solution more generic to fit into the current > ElasticsearchIO and create a proper pull request. > > As this would be my first pull request for beam, can anyone point me in > the right direction before I spent too much time creating something that > will be rejected? > > > > Some questions on the top of my mind are: > > - Is it a good idea it to make the ‘action’ part for the bulk api > generic? > - Should it be even more generic? (e.g.: set an ‘ActionFn’ on the > ElasticsearchIO) > - If I want to avoid parsing twice, the parsing should be done outside > of the getDocumentMetaData() method. Would this be acceptable? > - Is it possible to avoid passing the action as a field in the > document? > - Is there another or better way to get the delete functionality in > general? > > > > All feedback is more than welcome. > > > Cheers, > Wout > > > > > > >
