[ https://issues.apache.org/jira/browse/CALCITE-4069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138874#comment-17138874 ]
Andrei Sereda edited comment on CALCITE-4069 at 6/17/20, 10:24 PM: ------------------------------------------------------------------- What would happen in the case below ? {code:sql} select count(*), sum(field1), min(field2), count(*) from foo {code} Assume {{aggregations.size() > 1}} ? was (Author: asereda): What would happen in the case below ? {code:sql} select count(*), sum(field1), min(field2), count(*) from foo {code} > Elasticsearch the result of count(*) will be added to the result list twice > when use multiple aggregations without group by > --------------------------------------------------------------------------------------------------------------------------- > > Key: CALCITE-4069 > URL: https://issues.apache.org/jira/browse/CALCITE-4069 > Project: Calcite > Issue Type: Improvement > Components: elasticsearch-adapter > Reporter: yeyonghao > Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > when you write a sql: > {code:java} > select count(*) from mytable > {code} > in: > {code:java} > org.apache.calcite.adapter.elasticsearch.ElasticsearchTable#aggregate > {code} > As an optimization, totalHit will be the result of the query for count *: > {code:java} > // elastic exposes total number of documents matching a query in > "/hits/total" path > // this can be used for simple "select count(*) from table" > final long total = res.searchHits().total().value(); > {code} > It is then added to the result set: > {code:java} > if (groupBy.isEmpty()) { > // put totals automatically for count(*) expression(s), unless they contain > group by > for (String expr : countAll) { > result.forEach(m -> m.put(expr, total)); > } > } > {code} > This is fine if there is only one count * aggregate function in SQL. > But when you write multiple aggregate functions in your SQL: > {code:java} > select count(*),sum(field1) from mytable{code} > You'll notice that the result of count * is repeatedly added to the result > set (it doesn't affect the final result, but I think it can be optimized). > The result of count * is first added to the result set along with other > aggregate function result values in the following code: > {code:java} > final List<Map<String, Object>> result = new ArrayList<>(); > if (res.aggregations() != null) { > // collect values > ElasticsearchJson.visitValueNodes(res.aggregations(), m -> { > // using 'Collectors.toMap' will trigger Java 8 bug here > Map<String, Object> newMap = new LinkedHashMap<>(); > for (String key: m.keySet()) { > newMap.put(fieldMap.getOrDefault(key, key), m.get(key)); > } > result.add(newMap); > }); > } else { > // probably no group by. add single result > result.add(new LinkedHashMap<>()); > } > {code} > The second time is added again in the following code: > {code:java} > // elastic exposes total number of documents matching a query in > "/hits/total" path > // this can be used for simple "select count(*) from table" > final long total = res.searchHits().total().value(); > if (groupBy.isEmpty()) { > // put totals automatically for count(*) expression(s), unless they contain > group by > for (String expr : countAll) { > result.forEach(m -> m.put(expr, total)); > } > } > {code} > Although such operations have no effect on the final result when there are > multiple aggregate functions, it is obvious that they can be > optimized.TotalHit is added to the result set only when groupby. isEmpty() > and aggregation. size() == 1 simultaneously satisfy: > {code:java} > if (groupBy.isEmpty() && aggregations.size() == 1) { > ... > } > {code} > This avoids repeatedly adding count * to the result set. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)