Interesting.

One of the benefits that a SQL layer such as Calcite can bring is that it hides 
the details necessary to make operations like this work.

Julian


> On Dec 1, 2018, at 5:22 AM, Kevin Risden <kris...@apache.org> wrote:
> 
> I haven't had a chance to review but saw that Elastic has the same issue
> with aggregations.
> 
> https://github.com/elastic/elasticsearch/issues/35745
> 
> Kevin Risden
> 
> On Wed, Nov 28, 2018, 20:46 Andrei Sereda <and...@sereda.cc wrote:
> 
>> Greetings ,
>> We have discovered an issue with ES aggregations when grouping on
>> non-textual fields (date, long). Currently the following sql fails because
>> for missing value
>> <
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_missing_value_13
>>> 
>> we inject __MISSING__ sentinel which is not date / number parseable (it
>> can’t be null either) :
>> 
>> select max(amount), date from orders group by date -- special ES type
>> 
>> The solution is to make sentinel type specific :
>> 
>>   1. Integer.MIN_VALUE for integers
>>   2. 9999-12-31 for dates etc.
>> 
>> For low cardinality types like boolean, byte, short I’m not sure what to do
>> since there is high probability of collision between missing field and
>> actual value (eg. what value to choose for missing boolean?) :
>> 
>> select max(amount), isActive from orders group by isActive -- boolean type
>> 
>> Let me know if you solved this problem differently before. Composite
>> aggregations
>> <
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
>>> 
>> (available since 6.1) should help in future.
>> 
>> PR: https://github.com/apache/calcite/pull/946
>> JIRA: https://issues.apache.org/jira/browse/CALCITE-2689
>> 
>> Many Thanks in Advance,
>> Andrei.
>> 

Reply via email to