Hi Andrei, I haven't gone entirely over the new PR, but I think there are cases where the result of the queries are going to be wrong (when values collide with sentinels). Another approach would be to introduce a rule in order to push the aggregation in ElasticSearch *only* if the field in questions is *not* nullable. This can make queries a bit more verbose but it guarantees that there will be no suprises to the end-user.
select max(amount), date from orders group by amount, date -> Aggregation not pushed on ES select max(amount), date from orders where amount is not null and date is not null group by amount, date -> Aggregation pushed on ES Sorry for the late reply! Best, Stamatis Στις Κυρ, 2 Δεκ 2018 στις 2:01 π.μ., ο/η Julian Hyde <jh...@apache.org> έγραψε: > Interesting. > > One of the benefits that a SQL layer such as Calcite can bring is that it > hides the details necessary to make operations like this work. > > Julian > > > > On Dec 1, 2018, at 5:22 AM, Kevin Risden <kris...@apache.org> wrote: > > > > I haven't had a chance to review but saw that Elastic has the same issue > > with aggregations. > > > > https://github.com/elastic/elasticsearch/issues/35745 > > > > Kevin Risden > > > > On Wed, Nov 28, 2018, 20:46 Andrei Sereda <and...@sereda.cc wrote: > > > >> Greetings , > >> We have discovered an issue with ES aggregations when grouping on > >> non-textual fields (date, long). Currently the following sql fails > because > >> for missing value > >> < > >> > https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_missing_value_13 > >>> > >> we inject __MISSING__ sentinel which is not date / number parseable (it > >> can’t be null either) : > >> > >> select max(amount), date from orders group by date -- special ES type > >> > >> The solution is to make sentinel type specific : > >> > >> 1. Integer.MIN_VALUE for integers > >> 2. 9999-12-31 for dates etc. > >> > >> For low cardinality types like boolean, byte, short I’m not sure what > to do > >> since there is high probability of collision between missing field and > >> actual value (eg. what value to choose for missing boolean?) : > >> > >> select max(amount), isActive from orders group by isActive -- boolean > type > >> > >> Let me know if you solved this problem differently before. Composite > >> aggregations > >> < > >> > https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html > >>> > >> (available since 6.1) should help in future. > >> > >> PR: https://github.com/apache/calcite/pull/946 > >> JIRA: https://issues.apache.org/jira/browse/CALCITE-2689 > >> > >> Many Thanks in Advance, > >> Andrei. > >> > >