I’m second for this question. We have a similar (maybe a bit less painful) issue for KafkaIO and it would be useful to have a general strategy for such cases about how to deal with that.
> On 24 Jan 2020, at 21:54, Kenneth Knowles <[email protected]> wrote: > > Would it make sense to have different version-specialized connectors with a > common core library and common API package? > > On Fri, Jan 24, 2020 at 11:52 AM Chamikara Jayalath <[email protected] > <mailto:[email protected]>> wrote: > Thanks for the contribution. I agree with Alexey that we should try to add > any new features brought in with the new PR into existing connector instead > of trying to maintain two implementations. > > Thanks, > Cham > > On Fri, Jan 24, 2020 at 9:01 AM Alexey Romanenko <[email protected] > <mailto:[email protected]>> wrote: > Hi Ludovic, > > Thank you for working on this and sharing the details with us. This is really > great job! > > As I recall, we already have some support of Elasticsearch7 in current > ElasticsearchIO (afaik, at least they are compatible), thanks to Zhong Chen > and Etienne Chauchot, who were working on adding this [1][2] and it should be > released in Beam 2.19. > > Would you think you can leverage this in your work on adding new > Elasticsearch7 features? IMHO, supporting two different related IOs can be > quite tough task and I‘d rather raise my hand to add a new functionality into > existing IO than creating a new one, if it’s possible. > > [1] https://issues.apache.org/jira/browse/BEAM-5192 > <https://issues.apache.org/jira/browse/BEAM-5192> > [2] https://github.com/apache/beam/pull/10433 > <https://github.com/apache/beam/pull/10433> > >> On 22 Jan 2020, at 19:23, Ludovic Boutros <[email protected] >> <mailto:[email protected]>> wrote: >> >> Dear all, >> >> I have written a completely reworked Elasticsearch 7+ IO module. >> It can be found here: >> https://github.com/ludovic-boutros/beam/tree/fresh-reworked-elasticsearch-io-v7/sdks/java/io/elasticsearch7 >> >> <https://github.com/ludovic-boutros/beam/tree/fresh-reworked-elasticsearch-io-v7/sdks/java/io/elasticsearch7> >> >> This is a quite advance WIP work but I'm a quite new user of Apache Beam and >> I would like to get some help on this :) >> >> I can create a JIRA issue now but I prefer to wait for your wise avises >> first. >> >> Why a new module ? >> >> The current module was compliant with Elasticsearch 2.x, 5.x and 6.x. This >> seems to be a good point but so many things have been changed since >> Elasticsearch 2.x. > > > Probably this is not correct anymore due to > https://github.com/apache/beam/pull/10433 > <https://github.com/apache/beam/pull/10433> ? > >> Elasticsearch 7.x is now partially supported (document type are removed, >> occ, updates...). >> >> A fresh new module, only compliant with the last version of Elasticsearch, >> can easily benefit a lot from the last evolutions of Elasticsearch (Java >> High Level Http Client). >> >> It is therefore far simpler than the current one. >> >> Error management >> >> Currently, errors are caught and transformed into simple exceptions. This is >> not always what is needed. If we would like to do specific processing on >> these errors (send documents in error topics for instance), it is not >> possible with the current module. > > > Seems like this is some sort of a dead letter queue implementation.. This > will be a very good feature to add to the existing connector. > >> >> Philosophy >> >> This module directly uses the Elasticsearch Java client classes as inputs >> and outputs. >> >> This way you can configure any options you need directly in the >> `DocWriteRequest` objects. >> >> For instance: >> - If you need to use external versioning >> (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning >> >> <https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning>), >> you can. >> - If you need to use an ingest pipelines, you can. >> - If you need to configure an update document/script, you can. >> - If you need to use upserts, you can. >> >> Actually, you should be able to do everything you can do directly with >> Elasticsearch. >> >> Furthermore, it should be easier to keep updating the module with future >> Elasticsearch evolutions. >> >> Write outputs >> >> Two outputs are available: >> - Successful indexing output ; >> - Failed indexing output. >> >> They are available in a `WriteResult` object. >> >> These two outputs are represented by >> `PCollection<BulkItemResponseContainer>` objects. >> >> A `BulkItemResponseContainer` contains: >> - the original index request ; >> - the Elasticsearch response ; >> - a batch id. >> >> You can apply any process afterwards (reprocessing, alerting, ...). >> >> Read input >> >> You can read documents from Elasticsearch with this module. >> You can specify a `QueryBuilder` in order to filter the retrieved documents. >> By default, it retrieves the whole document collection. >> >> If the Elasticsearch index is sharded, multiple slices can be used during >> fetch. That many bundles are created. The maximum bundle count is equal to >> the index shard count. >> >> Thank you ! >> >> Ludovic >
