[jira] [Created] (CALCITE-4896) GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter

2021-11-19 Thread ZheHu (Jira)
ZheHu created CALCITE-4896:
--

 Summary: GROUP BY might get wrong results under certain 
circumstances in Elasticsearch Adapter
 Key: CALCITE-4896
 URL: https://issues.apache.org/jira/browse/CALCITE-4896
 Project: Calcite
  Issue Type: Bug
  Components: elasticsearch-adapter
Affects Versions: 1.28.0
Reporter: ZheHu


In Elasticsearch Adapter, if one field is grouped by along with missing value, 
SQL user might get incorrect results. Take the following case as an example:
{code:java}
ES mappings: {"int_field1" : integer, "int_field2" : integer}

doc1 = {"int_field1":1, "int_field2": -2147483648}
doc1 = {"int_field1":2}
{code}

When I try "select count(1) as CNT from view group by int_field2", the result 
is 2.

For Integer type, the missing value is replaced by Integer.MIN_VALUE, so doc1 
and doc2 will be divided in the same group.

Any other data types like short、long、float、double, they also have such problem.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [jira] [Created] (CALCITE-4896) GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter

2021-11-21 Thread Justin Swanhart
If I am understanding correctly, the field2 value is NULL.

Count(0) should be 2 and count(field2)==2 because of the way aggregates
treat NULL values.

On Fri, Nov 19, 2021, 8:21 PM ZheHu (Jira)  wrote:

> ZheHu created CALCITE-4896:
> --
>
>  Summary: GROUP BY might get wrong results under certain
> circumstances in Elasticsearch Adapter
>  Key: CALCITE-4896
>  URL: https://issues.apache.org/jira/browse/CALCITE-4896
>  Project: Calcite
>   Issue Type: Bug
>   Components: elasticsearch-adapter
> Affects Versions: 1.28.0
> Reporter: ZheHu
>
>
> In Elasticsearch Adapter, if one field is grouped by along with missing
> value, SQL user might get incorrect results. Take the following case as an
> example:
> {code:java}
> ES mappings: {"int_field1" : integer, "int_field2" : integer}
>
> doc1 = {"int_field1":1, "int_field2": -2147483648}
> doc1 = {"int_field1":2}
> {code}
>
> When I try "select count(1) as CNT from view group by int_field2", the
> result is 2.
>
> For Integer type, the missing value is replaced by Integer.MIN_VALUE, so
> doc1 and doc2 will be divided in the same group.
>
> Any other data types like short、long、float、double, they also have such
> problem.
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.20.1#820001)
>


Re: [jira] [Created] (CALCITE-4896) GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter

2021-11-21 Thread Justin Swanhart
Typo.  Count(int_field) should be 1.  Oops.

On Sun, Nov 21, 2021, 6:56 AM Justin Swanhart  wrote:

> If I am understanding correctly, the field2 value is NULL.
>
> Count(0) should be 2 and count(field2)==2 because of the way aggregates
> treat NULL values.
>
> On Fri, Nov 19, 2021, 8:21 PM ZheHu (Jira)  wrote:
>
>> ZheHu created CALCITE-4896:
>> --
>>
>>  Summary: GROUP BY might get wrong results under certain
>> circumstances in Elasticsearch Adapter
>>  Key: CALCITE-4896
>>  URL: https://issues.apache.org/jira/browse/CALCITE-4896
>>  Project: Calcite
>>   Issue Type: Bug
>>   Components: elasticsearch-adapter
>> Affects Versions: 1.28.0
>> Reporter: ZheHu
>>
>>
>> In Elasticsearch Adapter, if one field is grouped by along with missing
>> value, SQL user might get incorrect results. Take the following case as an
>> example:
>> {code:java}
>> ES mappings: {"int_field1" : integer, "int_field2" : integer}
>>
>> doc1 = {"int_field1":1, "int_field2": -2147483648}
>> doc1 = {"int_field1":2}
>> {code}
>>
>> When I try "select count(1) as CNT from view group by int_field2", the
>> result is 2.
>>
>> For Integer type, the missing value is replaced by Integer.MIN_VALUE, so
>> doc1 and doc2 will be divided in the same group.
>>
>> Any other data types like short、long、float、double, they also have such
>> problem.
>>
>>
>>
>> --
>> This message was sent by Atlassian Jira
>> (v8.20.1#820001)
>>
>


Re: [jira] [Created] (CALCITE-4896) GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter

2021-11-21 Thread Zhe Hu
Yes, when you group an Integer field in Elasticsearch Adapter, NULL and 
Integer.MIN_VALUE will be in the same group, which is not correct for SQL users.


On 11/21/2021 19:56,Justin Swanhart wrote:
If I am understanding correctly, the field2 value is NULL.

Count(0) should be 2 and count(field2)==2 because of the way aggregates
treat NULL values.

On Fri, Nov 19, 2021, 8:21 PM ZheHu (Jira)  wrote:

ZheHu created CALCITE-4896:
--

Summary: GROUP BY might get wrong results under certain
circumstances in Elasticsearch Adapter
Key: CALCITE-4896
URL: https://issues.apache.org/jira/browse/CALCITE-4896
Project: Calcite
Issue Type: Bug
Components: elasticsearch-adapter
Affects Versions: 1.28.0
Reporter: ZheHu


In Elasticsearch Adapter, if one field is grouped by along with missing
value, SQL user might get incorrect results. Take the following case as an
example:
{code:java}
ES mappings: {"int_field1" : integer, "int_field2" : integer}

doc1 = {"int_field1":1, "int_field2": -2147483648}
doc1 = {"int_field1":2}
{code}

When I try "select count(1) as CNT from view group by int_field2", the
result is 2.

For Integer type, the missing value is replaced by Integer.MIN_VALUE, so
doc1 and doc2 will be divided in the same group.

Any other data types like short、long、float、double, they also have such
problem.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)



Re: [jira] [Created] (CALCITE-4896) GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter

2021-11-21 Thread Justin Swanhart
I should not send emails before I have coffee.

On Sun, Nov 21, 2021, 6:57 AM Justin Swanhart  wrote:

> Typo.  Count(int_field) should be 1.  Oops.
>
> On Sun, Nov 21, 2021, 6:56 AM Justin Swanhart  wrote:
>
>> If I am understanding correctly, the field2 value is NULL.
>>
>> Count(0) should be 2 and count(field2)==2 because of the way aggregates
>> treat NULL values.
>>
>> On Fri, Nov 19, 2021, 8:21 PM ZheHu (Jira)  wrote:
>>
>>> ZheHu created CALCITE-4896:
>>> --
>>>
>>>  Summary: GROUP BY might get wrong results under certain
>>> circumstances in Elasticsearch Adapter
>>>  Key: CALCITE-4896
>>>  URL: https://issues.apache.org/jira/browse/CALCITE-4896
>>>  Project: Calcite
>>>   Issue Type: Bug
>>>   Components: elasticsearch-adapter
>>> Affects Versions: 1.28.0
>>> Reporter: ZheHu
>>>
>>>
>>> In Elasticsearch Adapter, if one field is grouped by along with missing
>>> value, SQL user might get incorrect results. Take the following case as an
>>> example:
>>> {code:java}
>>> ES mappings: {"int_field1" : integer, "int_field2" : integer}
>>>
>>> doc1 = {"int_field1":1, "int_field2": -2147483648}
>>> doc1 = {"int_field1":2}
>>> {code}
>>>
>>> When I try "select count(1) as CNT from view group by int_field2", the
>>> result is 2.
>>>
>>> For Integer type, the missing value is replaced by Integer.MIN_VALUE, so
>>> doc1 and doc2 will be divided in the same group.
>>>
>>> Any other data types like short、long、float、double, they also have such
>>> problem.
>>>
>>>
>>>
>>> --
>>> This message was sent by Atlassian Jira
>>> (v8.20.1#820001)
>>>
>>


Re: [jira] [Created] (CALCITE-4896) GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter

2021-11-21 Thread Justin Swanhart
Thanks for explaining because I did not get that at all, but it makes
sense.  Some column stores use a reserved value for NULL too which reduces
the overall range of values available.

On Sun, Nov 21, 2021 at 7:09 AM Zhe Hu  wrote:

> Yes, when you group an Integer field in Elasticsearch Adapter, NULL and
> Integer.MIN_VALUE will be in the same group, which is not correct for SQL
> users.
>
>
> On 11/21/2021 19:56,Justin Swanhart wrote:
> If I am understanding correctly, the field2 value is NULL.
>
> Count(0) should be 2 and count(field2)==2 because of the way aggregates
> treat NULL values.
>
> On Fri, Nov 19, 2021, 8:21 PM ZheHu (Jira)  wrote:
>
> ZheHu created CALCITE-4896:
> --
>
> Summary: GROUP BY might get wrong results under certain
> circumstances in Elasticsearch Adapter
> Key: CALCITE-4896
> URL: https://issues.apache.org/jira/browse/CALCITE-4896
> Project: Calcite
> Issue Type: Bug
> Components: elasticsearch-adapter
> Affects Versions: 1.28.0
> Reporter: ZheHu
>
>
> In Elasticsearch Adapter, if one field is grouped by along with missing
> value, SQL user might get incorrect results. Take the following case as an
> example:
> {code:java}
> ES mappings: {"int_field1" : integer, "int_field2" : integer}
>
> doc1 = {"int_field1":1, "int_field2": -2147483648}
> doc1 = {"int_field1":2}
> {code}
>
> When I try "select count(1) as CNT from view group by int_field2", the
> result is 2.
>
> For Integer type, the missing value is replaced by Integer.MIN_VALUE, so
> doc1 and doc2 will be divided in the same group.
>
> Any other data types like short、long、float、double, they also have such
> problem.
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.20.1#820001)
>
>


Re: [jira] [Created] (CALCITE-4896) GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter

2021-11-21 Thread Zhe Hu
I do agree it’s reasonable in Calcite(however not 100% correct).
And as far as I know, in Elasticsearch, reserved value for NULL is often used 
when people know specific data range.
As for this issue, someone else may find a better solution. Are you interested? 
Swanhart.




On 11/21/2021 21:30,Justin Swanhart wrote:
Thanks for explaining because I did not get that at all, but it makes
sense.  Some column stores use a reserved value for NULL too which reduces
the overall range of values available.

On Sun, Nov 21, 2021 at 7:09 AM Zhe Hu  wrote:

Yes, when you group an Integer field in Elasticsearch Adapter, NULL and
Integer.MIN_VALUE will be in the same group, which is not correct for SQL
users.


On 11/21/2021 19:56,Justin Swanhart wrote:
If I am understanding correctly, the field2 value is NULL.

Count(0) should be 2 and count(field2)==2 because of the way aggregates
treat NULL values.

On Fri, Nov 19, 2021, 8:21 PM ZheHu (Jira)  wrote:

ZheHu created CALCITE-4896:
--

Summary: GROUP BY might get wrong results under certain
circumstances in Elasticsearch Adapter
Key: CALCITE-4896
URL: https://issues.apache.org/jira/browse/CALCITE-4896
Project: Calcite
Issue Type: Bug
Components: elasticsearch-adapter
Affects Versions: 1.28.0
Reporter: ZheHu


In Elasticsearch Adapter, if one field is grouped by along with missing
value, SQL user might get incorrect results. Take the following case as an
example:
{code:java}
ES mappings: {"int_field1" : integer, "int_field2" : integer}

doc1 = {"int_field1":1, "int_field2": -2147483648}
doc1 = {"int_field1":2}
{code}

When I try "select count(1) as CNT from view group by int_field2", the
result is 2.

For Integer type, the missing value is replaced by Integer.MIN_VALUE, so
doc1 and doc2 will be divided in the same group.

Any other data types like short、long、float、double, they also have such
problem.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)




Re: [jira] [Created] (CALCITE-4896) GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter

2021-11-21 Thread Justin Swanhart
I do not have enough familiarity with calcite to offer any opinions on
this.  It is very implementation specific and an RDBMS will expose those
reserved values as NULL.

On Sun, Nov 21, 2021, 8:54 AM Zhe Hu  wrote:

> I do agree it’s reasonable in Calcite(however not 100% correct).
> And as far as I know, in Elasticsearch, reserved value for NULL is often
> used when people know specific data range.
> As for this issue, someone else may find a better solution. Are you
> interested? Swanhart.
>
>
>
>
> On 11/21/2021 21:30,Justin Swanhart wrote:
> Thanks for explaining because I did not get that at all, but it makes
> sense.  Some column stores use a reserved value for NULL too which reduces
> the overall range of values available.
>
> On Sun, Nov 21, 2021 at 7:09 AM Zhe Hu  wrote:
>
> Yes, when you group an Integer field in Elasticsearch Adapter, NULL and
> Integer.MIN_VALUE will be in the same group, which is not correct for SQL
> users.
>
>
> On 11/21/2021 19:56,Justin Swanhart wrote:
> If I am understanding correctly, the field2 value is NULL.
>
> Count(0) should be 2 and count(field2)==2 because of the way aggregates
> treat NULL values.
>
> On Fri, Nov 19, 2021, 8:21 PM ZheHu (Jira)  wrote:
>
> ZheHu created CALCITE-4896:
> --
>
> Summary: GROUP BY might get wrong results under certain
> circumstances in Elasticsearch Adapter
> Key: CALCITE-4896
> URL: https://issues.apache.org/jira/browse/CALCITE-4896
> Project: Calcite
> Issue Type: Bug
> Components: elasticsearch-adapter
> Affects Versions: 1.28.0
> Reporter: ZheHu
>
>
> In Elasticsearch Adapter, if one field is grouped by along with missing
> value, SQL user might get incorrect results. Take the following case as an
> example:
> {code:java}
> ES mappings: {"int_field1" : integer, "int_field2" : integer}
>
> doc1 = {"int_field1":1, "int_field2": -2147483648}
> doc1 = {"int_field1":2}
> {code}
>
> When I try "select count(1) as CNT from view group by int_field2", the
> result is 2.
>
> For Integer type, the missing value is replaced by Integer.MIN_VALUE, so
> doc1 and doc2 will be divided in the same group.
>
> Any other data types like short、long、float、double, they also have such
> problem.
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.20.1#820001)
>
>
>