[jira] [Commented] (CALCITE-4069) Elasticsearch the result of count(*) will be added to the result list twice when use multiple aggregations without group by

Andrei Sereda (Jira) Wed, 17 Jun 2020 15:25:16 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138874#comment-17138874
 ]


Andrei Sereda commented on CALCITE-4069:
----------------------------------------

What would happen in the case below ? 

{code:sql}
select count(*), sum(field1),  min(field2), count(*) from foo
{code}

> Elasticsearch the result of count(*) will be added to the result list twice 
> when use multiple aggregations without group by
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-4069
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4069
>             Project: Calcite
>          Issue Type: Improvement
>          Components: elasticsearch-adapter
>            Reporter: yeyonghao
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> when you write a sql:
> {code:java}
> select count(*) from mytable
> {code}
> in:
> {code:java}
> org.apache.calcite.adapter.elasticsearch.ElasticsearchTable#aggregate
> {code}
> As an optimization, totalHit will be the result of the query for count *:
> {code:java}
> // elastic exposes total number of documents matching a query in 
> "/hits/total" path
> // this can be used for simple "select count(*) from table"
> final long total = res.searchHits().total().value();
> {code}
> It is then added to the result set:
> {code:java}
> if (groupBy.isEmpty()) {
>   // put totals automatically for count(*) expression(s), unless they contain 
> group by
>   for (String expr : countAll) {
>     result.forEach(m -> m.put(expr, total));
>   }
> }
> {code}
> This is fine if there is only one count * aggregate function in SQL.
> But when you write multiple aggregate functions in your SQL:
> {code:java}
> select count(*),sum(field1) from mytable{code}
> You'll notice that the result of count * is repeatedly added to the result 
> set (it doesn't affect the final result, but I think it can be optimized).
> The result of count * is first added to the result set along with other 
> aggregate function result values in the following code:
> {code:java}
> final List<Map<String, Object>> result = new ArrayList<>();
> if (res.aggregations() != null) {
>   // collect values
>   ElasticsearchJson.visitValueNodes(res.aggregations(), m -> {
>     // using 'Collectors.toMap' will trigger Java 8 bug here
>     Map<String, Object> newMap = new LinkedHashMap<>();
>     for (String key: m.keySet()) {
>       newMap.put(fieldMap.getOrDefault(key, key), m.get(key));
>     }
>     result.add(newMap);
>   });
> } else {
>   // probably no group by. add single result
>   result.add(new LinkedHashMap<>());
> }
> {code}
> The second time is added again in the following code:
> {code:java}
> // elastic exposes total number of documents matching a query in 
> "/hits/total" path
> // this can be used for simple "select count(*) from table"
> final long total = res.searchHits().total().value();
> if (groupBy.isEmpty()) {
>   // put totals automatically for count(*) expression(s), unless they contain 
> group by
>   for (String expr : countAll) {
>     result.forEach(m -> m.put(expr, total));
>   }
> }
> {code}
> Although such operations have no effect on the final result when there are 
> multiple aggregate functions, it is obvious that they can be 
> optimized.TotalHit is added to the result set only when groupby. isEmpty() 
> and aggregation. size() == 1 simultaneously satisfy:
> {code:java}
> if (groupBy.isEmpty() && aggregations.size() == 1) {
>    ...
> }
> {code}
> This avoids repeatedly adding count * to the result set.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (CALCITE-4069) Elasticsearch the result of count(*) will be added to the result list twice when use multiple aggregations without group by

Reply via email to