[ 
https://issues.apache.org/jira/browse/BEAM-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Egbert updated BEAM-6920:
-------------------------
    Description: 
I am building a pipeline that needs to process all records in a set of indexes, 
each suffixed with a timestamp. I have an alias that matches all of these 
indexes at once. However, I cannot use the alias name in ElasticsearchIO as it 
will try to read stats from this specific index. Because it is an alias and not 
an actual index, the response contains no count for the alias name itself and 
therefore Beam (Dataflow?) will estimate the size as being 0. This makes the 
pipeline end without even executing the query on the alias, even though that 
would have returned loads of documents.

This should be easy to fix, as the results of /<aliasname>/_stats contain only 
the indexes referenced by that alias. Instead of looking for a key <aliasname> 
under the `indices` key in the returned JSON, it should consider all returned 
indexes and add their estimated sizes together.

  was:
I am building a pipeline that needs to process all records in a set of indexes, 
each suffixed with a timestamp. I have an alias that matches all of these 
indexes at once. However, I cannot use these alias name in ElasticsearchIO as 
it will try to read stats from this specific index, as it is an alias and not 
an actual index, will estimate the size as being 0. This makes the pipeline end 
without even executing the query on the alias, which will return loads of 
documents.

This should be easy to fix as the results of /<aliasname>/_stats only contains 
indexes references by that alias, so instead of looking for a key <aliasname> 
in the `indices` key in the returned JSON, it should consider all returned 
indexes and add the estimated sizes together.


> Expand aliases in ElasticsearchIO.Read
> --------------------------------------
>
>                 Key: BEAM-6920
>                 URL: https://issues.apache.org/jira/browse/BEAM-6920
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-elasticsearch
>    Affects Versions: 2.11.0
>            Reporter: Egbert
>            Priority: Major
>
> I am building a pipeline that needs to process all records in a set of 
> indexes, each suffixed with a timestamp. I have an alias that matches all of 
> these indexes at once. However, I cannot use the alias name in 
> ElasticsearchIO as it will try to read stats from this specific index. 
> Because it is an alias and not an actual index, the response contains no 
> count for the alias name itself and therefore Beam (Dataflow?) will estimate 
> the size as being 0. This makes the pipeline end without even executing the 
> query on the alias, even though that would have returned loads of documents.
> This should be easy to fix, as the results of /<aliasname>/_stats contain 
> only the indexes referenced by that alias. Instead of looking for a key 
> <aliasname> under the `indices` key in the returned JSON, it should consider 
> all returned indexes and add their estimated sizes together.
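
The proposed fix can be sketched roughly as follows. This is an illustration only, not the actual ElasticsearchIO code: the class and method names are hypothetical, and the `indices` object of the _stats response is assumed to have already been parsed into a map from index name to size in bytes (in a real response the size would presumably come from a nested field such as `primaries.store.size_in_bytes`). The index names and sizes are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch; names and data are hypothetical, not Beam's API. */
final class AliasSizeEstimate {

  /**
   * Behaviour described in the issue: look for a single key equal to the
   * requested name. For an alias no such key exists, so the estimate is 0.
   */
  static long sizeBySingleKey(Map<String, Long> indexSizes, String requestedName) {
    return indexSizes.getOrDefault(requestedName, 0L);
  }

  /**
   * Proposed behaviour: the stats response for an alias lists only the
   * indexes behind that alias, so summing every entry gives the estimate.
   */
  static long sizeByAllIndices(Map<String, Long> indexSizes) {
    long total = 0L;
    for (long size : indexSizes.values()) {
      total += size;
    }
    return total;
  }

  public static void main(String[] args) {
    // Assumed, already-parsed content of the `indices` object returned for
    // a hypothetical alias "logs" that covers two timestamped indexes.
    Map<String, Long> indexSizes = new LinkedHashMap<>();
    indexSizes.put("logs-2019-03-01", 100L);
    indexSizes.put("logs-2019-03-02", 200L);

    System.out.println(sizeBySingleKey(indexSizes, "logs")); // prints 0
    System.out.println(sizeByAllIndices(indexSizes));        // prints 300
  }
}
```

Summing all entries also works unchanged when the requested name is a real index, since the response then contains exactly one entry.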



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
