[jira] [Created] (CALCITE-4896) GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter
ZheHu created CALCITE-4896: -- Summary: GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter Key: CALCITE-4896 URL: https://issues.apache.org/jira/browse/CALCITE-4896 Project: Calcite Issue Type: Bug Components: elasticsearch-adapter Affects Versions: 1.28.0 Reporter: ZheHu In Elasticsearch Adapter, if one field is grouped by along with missing value, SQL user might get incorrect results. Take the following case as an example: {code:java} ES mappings: {"int_field1" : integer, "int_field2" : integer} doc1 = {"int_field1":1, "int_field2": -2147483648} doc1 = {"int_field1":2} {code} When I try "select count(1) as CNT from view group by int_field2", the result is 2. For Integer type, the missing value is replaced by Integer.MIN_VALUE, so doc1 and doc2 will be divided in the same group. Any other data types like short、long、float、double, they also have such problem. -- This message was sent by Atlassian Jira (v8.20.1#820001)
Re: [jira] [Created] (CALCITE-4896) GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter
If I am understanding correctly, the field2 value is NULL. Count(0) should be 2 and count(field2)==2 because of the way aggregates treat NULL values. On Fri, Nov 19, 2021, 8:21 PM ZheHu (Jira) wrote: > ZheHu created CALCITE-4896: > -- > > Summary: GROUP BY might get wrong results under certain > circumstances in Elasticsearch Adapter > Key: CALCITE-4896 > URL: https://issues.apache.org/jira/browse/CALCITE-4896 > Project: Calcite > Issue Type: Bug > Components: elasticsearch-adapter > Affects Versions: 1.28.0 > Reporter: ZheHu > > > In Elasticsearch Adapter, if one field is grouped by along with missing > value, SQL user might get incorrect results. Take the following case as an > example: > {code:java} > ES mappings: {"int_field1" : integer, "int_field2" : integer} > > doc1 = {"int_field1":1, "int_field2": -2147483648} > doc1 = {"int_field1":2} > {code} > > When I try "select count(1) as CNT from view group by int_field2", the > result is 2. > > For Integer type, the missing value is replaced by Integer.MIN_VALUE, so > doc1 and doc2 will be divided in the same group. > > Any other data types like short、long、float、double, they also have such > problem. > > > > -- > This message was sent by Atlassian Jira > (v8.20.1#820001) >
Re: [jira] [Created] (CALCITE-4896) GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter
Typo. Count(int_field) should be 1. Oops. On Sun, Nov 21, 2021, 6:56 AM Justin Swanhart wrote: > If I am understanding correctly, the field2 value is NULL. > > Count(0) should be 2 and count(field2)==2 because of the way aggregates > treat NULL values. > > On Fri, Nov 19, 2021, 8:21 PM ZheHu (Jira) wrote: > >> ZheHu created CALCITE-4896: >> -- >> >> Summary: GROUP BY might get wrong results under certain >> circumstances in Elasticsearch Adapter >> Key: CALCITE-4896 >> URL: https://issues.apache.org/jira/browse/CALCITE-4896 >> Project: Calcite >> Issue Type: Bug >> Components: elasticsearch-adapter >> Affects Versions: 1.28.0 >> Reporter: ZheHu >> >> >> In Elasticsearch Adapter, if one field is grouped by along with missing >> value, SQL user might get incorrect results. Take the following case as an >> example: >> {code:java} >> ES mappings: {"int_field1" : integer, "int_field2" : integer} >> >> doc1 = {"int_field1":1, "int_field2": -2147483648} >> doc1 = {"int_field1":2} >> {code} >> >> When I try "select count(1) as CNT from view group by int_field2", the >> result is 2. >> >> For Integer type, the missing value is replaced by Integer.MIN_VALUE, so >> doc1 and doc2 will be divided in the same group. >> >> Any other data types like short、long、float、double, they also have such >> problem. >> >> >> >> -- >> This message was sent by Atlassian Jira >> (v8.20.1#820001) >> >
Re: [jira] [Created] (CALCITE-4896) GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter
Yes, when you group an Integer field in Elasticsearch Adapter, NULL and Integer.MIN_VALUE will be in the same group, which is not correct for SQL users. On 11/21/2021 19:56,Justin Swanhart wrote: If I am understanding correctly, the field2 value is NULL. Count(0) should be 2 and count(field2)==2 because of the way aggregates treat NULL values. On Fri, Nov 19, 2021, 8:21 PM ZheHu (Jira) wrote: ZheHu created CALCITE-4896: -- Summary: GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter Key: CALCITE-4896 URL: https://issues.apache.org/jira/browse/CALCITE-4896 Project: Calcite Issue Type: Bug Components: elasticsearch-adapter Affects Versions: 1.28.0 Reporter: ZheHu In Elasticsearch Adapter, if one field is grouped by along with missing value, SQL user might get incorrect results. Take the following case as an example: {code:java} ES mappings: {"int_field1" : integer, "int_field2" : integer} doc1 = {"int_field1":1, "int_field2": -2147483648} doc1 = {"int_field1":2} {code} When I try "select count(1) as CNT from view group by int_field2", the result is 2. For Integer type, the missing value is replaced by Integer.MIN_VALUE, so doc1 and doc2 will be divided in the same group. Any other data types like short、long、float、double, they also have such problem. -- This message was sent by Atlassian Jira (v8.20.1#820001)
Re: [jira] [Created] (CALCITE-4896) GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter
I should not send emails before I have coffee. On Sun, Nov 21, 2021, 6:57 AM Justin Swanhart wrote: > Typo. Count(int_field) should be 1. Oops. > > On Sun, Nov 21, 2021, 6:56 AM Justin Swanhart wrote: > >> If I am understanding correctly, the field2 value is NULL. >> >> Count(0) should be 2 and count(field2)==2 because of the way aggregates >> treat NULL values. >> >> On Fri, Nov 19, 2021, 8:21 PM ZheHu (Jira) wrote: >> >>> ZheHu created CALCITE-4896: >>> -- >>> >>> Summary: GROUP BY might get wrong results under certain >>> circumstances in Elasticsearch Adapter >>> Key: CALCITE-4896 >>> URL: https://issues.apache.org/jira/browse/CALCITE-4896 >>> Project: Calcite >>> Issue Type: Bug >>> Components: elasticsearch-adapter >>> Affects Versions: 1.28.0 >>> Reporter: ZheHu >>> >>> >>> In Elasticsearch Adapter, if one field is grouped by along with missing >>> value, SQL user might get incorrect results. Take the following case as an >>> example: >>> {code:java} >>> ES mappings: {"int_field1" : integer, "int_field2" : integer} >>> >>> doc1 = {"int_field1":1, "int_field2": -2147483648} >>> doc1 = {"int_field1":2} >>> {code} >>> >>> When I try "select count(1) as CNT from view group by int_field2", the >>> result is 2. >>> >>> For Integer type, the missing value is replaced by Integer.MIN_VALUE, so >>> doc1 and doc2 will be divided in the same group. >>> >>> Any other data types like short、long、float、double, they also have such >>> problem. >>> >>> >>> >>> -- >>> This message was sent by Atlassian Jira >>> (v8.20.1#820001) >>> >>
Re: [jira] [Created] (CALCITE-4896) GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter
Thanks for explaining because I did not get that at all, but it makes sense. Some column stores use a reserved value for NULL too which reduces the overall range of values available. On Sun, Nov 21, 2021 at 7:09 AM Zhe Hu wrote: > Yes, when you group an Integer field in Elasticsearch Adapter, NULL and > Integer.MIN_VALUE will be in the same group, which is not correct for SQL > users. > > > On 11/21/2021 19:56,Justin Swanhart wrote: > If I am understanding correctly, the field2 value is NULL. > > Count(0) should be 2 and count(field2)==2 because of the way aggregates > treat NULL values. > > On Fri, Nov 19, 2021, 8:21 PM ZheHu (Jira) wrote: > > ZheHu created CALCITE-4896: > -- > > Summary: GROUP BY might get wrong results under certain > circumstances in Elasticsearch Adapter > Key: CALCITE-4896 > URL: https://issues.apache.org/jira/browse/CALCITE-4896 > Project: Calcite > Issue Type: Bug > Components: elasticsearch-adapter > Affects Versions: 1.28.0 > Reporter: ZheHu > > > In Elasticsearch Adapter, if one field is grouped by along with missing > value, SQL user might get incorrect results. Take the following case as an > example: > {code:java} > ES mappings: {"int_field1" : integer, "int_field2" : integer} > > doc1 = {"int_field1":1, "int_field2": -2147483648} > doc1 = {"int_field1":2} > {code} > > When I try "select count(1) as CNT from view group by int_field2", the > result is 2. > > For Integer type, the missing value is replaced by Integer.MIN_VALUE, so > doc1 and doc2 will be divided in the same group. > > Any other data types like short、long、float、double, they also have such > problem. > > > > -- > This message was sent by Atlassian Jira > (v8.20.1#820001) > >
Re: [jira] [Created] (CALCITE-4896) GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter
I do agree it’s reasonable in Calcite(however not 100% correct). And as far as I know, in Elasticsearch, reserved value for NULL is often used when people know specific data range. As for this issue, someone else may find a better solution. Are you interested? Swanhart. On 11/21/2021 21:30,Justin Swanhart wrote: Thanks for explaining because I did not get that at all, but it makes sense. Some column stores use a reserved value for NULL too which reduces the overall range of values available. On Sun, Nov 21, 2021 at 7:09 AM Zhe Hu wrote: Yes, when you group an Integer field in Elasticsearch Adapter, NULL and Integer.MIN_VALUE will be in the same group, which is not correct for SQL users. On 11/21/2021 19:56,Justin Swanhart wrote: If I am understanding correctly, the field2 value is NULL. Count(0) should be 2 and count(field2)==2 because of the way aggregates treat NULL values. On Fri, Nov 19, 2021, 8:21 PM ZheHu (Jira) wrote: ZheHu created CALCITE-4896: -- Summary: GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter Key: CALCITE-4896 URL: https://issues.apache.org/jira/browse/CALCITE-4896 Project: Calcite Issue Type: Bug Components: elasticsearch-adapter Affects Versions: 1.28.0 Reporter: ZheHu In Elasticsearch Adapter, if one field is grouped by along with missing value, SQL user might get incorrect results. Take the following case as an example: {code:java} ES mappings: {"int_field1" : integer, "int_field2" : integer} doc1 = {"int_field1":1, "int_field2": -2147483648} doc1 = {"int_field1":2} {code} When I try "select count(1) as CNT from view group by int_field2", the result is 2. For Integer type, the missing value is replaced by Integer.MIN_VALUE, so doc1 and doc2 will be divided in the same group. Any other data types like short、long、float、double, they also have such problem. -- This message was sent by Atlassian Jira (v8.20.1#820001)
Re: [jira] [Created] (CALCITE-4896) GROUP BY might get wrong results under certain circumstances in Elasticsearch Adapter
I do not have enough familiarity with calcite to offer any opinions on this. It is very implementation specific and an RDBMS will expose those reserved values as NULL. On Sun, Nov 21, 2021, 8:54 AM Zhe Hu wrote: > I do agree it’s reasonable in Calcite(however not 100% correct). > And as far as I know, in Elasticsearch, reserved value for NULL is often > used when people know specific data range. > As for this issue, someone else may find a better solution. Are you > interested? Swanhart. > > > > > On 11/21/2021 21:30,Justin Swanhart wrote: > Thanks for explaining because I did not get that at all, but it makes > sense. Some column stores use a reserved value for NULL too which reduces > the overall range of values available. > > On Sun, Nov 21, 2021 at 7:09 AM Zhe Hu wrote: > > Yes, when you group an Integer field in Elasticsearch Adapter, NULL and > Integer.MIN_VALUE will be in the same group, which is not correct for SQL > users. > > > On 11/21/2021 19:56,Justin Swanhart wrote: > If I am understanding correctly, the field2 value is NULL. > > Count(0) should be 2 and count(field2)==2 because of the way aggregates > treat NULL values. > > On Fri, Nov 19, 2021, 8:21 PM ZheHu (Jira) wrote: > > ZheHu created CALCITE-4896: > -- > > Summary: GROUP BY might get wrong results under certain > circumstances in Elasticsearch Adapter > Key: CALCITE-4896 > URL: https://issues.apache.org/jira/browse/CALCITE-4896 > Project: Calcite > Issue Type: Bug > Components: elasticsearch-adapter > Affects Versions: 1.28.0 > Reporter: ZheHu > > > In Elasticsearch Adapter, if one field is grouped by along with missing > value, SQL user might get incorrect results. Take the following case as an > example: > {code:java} > ES mappings: {"int_field1" : integer, "int_field2" : integer} > > doc1 = {"int_field1":1, "int_field2": -2147483648} > doc1 = {"int_field1":2} > {code} > > When I try "select count(1) as CNT from view group by int_field2", the > result is 2. > > For Integer type, the missing value is replaced by Integer.MIN_VALUE, so > doc1 and doc2 will be divided in the same group. > > Any other data types like short、long、float、double, they also have such > problem. > > > > -- > This message was sent by Atlassian Jira > (v8.20.1#820001) > > >