[ https://issues.apache.org/jira/browse/IMPALA-6660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982782#comment-16982782 ]

ASF subversion and git services commented on IMPALA-6660:
---------------------------------------------------------

Commit bf031a2142db67d2c8741059d2a3ee551f02e4a4 in impala's branch 
refs/heads/master from norbert.luksa
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bf031a2 ]

IMPALA-6660: Change -0/+0 floating point to compare as equal in hash table

Currently -0.0/+0.0 values are hashed to different values due to
their different binary representations, while -0.0==+0.0 is true in
C++. This caused them to be treated as distinct keys in hash tables
despite comparing as equal.
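
To make the mismatch concrete, here is a minimal Python sketch (not Impala code) that hashes the raw IEEE 754 single-precision bytes of a value, standing in for "hashing the binary representation". The FNV-1a hash and the helper names `float_bytes` and `bytes_hash` are illustrative assumptions; Impala's real hash function is different.

```python
import struct


def float_bytes(x: float) -> bytes:
    """Return the little-endian IEEE 754 single-precision encoding of x."""
    return struct.pack("<f", x)


def bytes_hash(data: bytes) -> int:
    """64-bit FNV-1a over raw bytes (illustrative stand-in for a byte-wise hash)."""
    h = 0xCBF29CE484222325  # FNV-1a 64-bit offset basis
    for b in data:
        h ^= b
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF  # FNV prime, wrapped to 64 bits
    return h


neg_zero, pos_zero = -0.0, 0.0
print(neg_zero == pos_zero)  # True: the values compare equal
print(float_bytes(neg_zero).hex(), float_bytes(pos_zero).hex())  # 00000080 00000000
print(bytes_hash(float_bytes(neg_zero)) == bytes_hash(float_bytes(pos_zero)))  # False
```

Only the sign bit differs, yet that is enough to send the two values to different hash buckets. Python's built-in `hash()` already maps -0.0 and 0.0 to the same value, which is why the sketch hashes the packed bytes explicitly.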

This commit fixes the hashing of -0.0/+0.0, which changes the
behaviour of both hash joins and aggregations (aggregations follow
the behaviour of the join). As a result, +0 is now the canonical
form of -0/+0.
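
The idea behind the fix can be sketched as follows, assuming the hash key is built from the value's bytes only after -0.0 has been folded into +0.0. The helpers `normalize_zero`, `join_key` and `hash_join` are hypothetical and only illustrate the behaviour described above for a hash join; they are not Impala's hash-table code.

```python
import struct


def float_bytes(x: float) -> bytes:
    """Return the little-endian IEEE 754 single-precision encoding of x."""
    return struct.pack("<f", x)


def normalize_zero(x: float) -> float:
    """Map -0.0 to +0.0; every other value (including NaN) is returned unchanged."""
    return 0.0 if x == 0.0 else x


def join_key(x: float) -> bytes:
    """Byte-wise key for the toy hash table, computed after normalization."""
    return float_bytes(normalize_zero(x))


def hash_join(build_rows, probe_rows):
    """Toy equi-join on one float column; returns matching (build, probe) pairs."""
    table = {}
    for b in build_rows:
        table.setdefault(join_key(b), []).append(b)
    matches = []
    for p in probe_rows:
        for b in table.get(join_key(p), []):
            if b == p:  # final equality check on the original values
                matches.append((b, p))
    return matches


print(hash_join([0.0], [-0.0]))  # [(0.0, -0.0)]
```

Because normalization happens before the bytes are taken, -0.0 and +0.0 produce the same key and land in the same bucket, while NaN still never matches because `NaN == NaN` is false.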

Tests:
 - Added e2e tests for aggregation (group by and distinct) and
   join queries with -0.0 and +0.0 present.

Change-Id: I6bb1a817c81c452d041238c19cb6c9f602a5d565
Reviewed-on: http://gerrit.cloudera.org:8080/14588
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> -0/+0 floating point do not compare as equal in hash table
> ----------------------------------------------------------
>
>                 Key: IMPALA-6660
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6660
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
>            Reporter: Tim Armstrong
>            Assignee: Norbert Luksa
>            Priority: Major
>              Labels: correctness, ramp-up
>
> This can happen because we hash the binary representation of the numbers. 
> -0/0 should be treated as equal for hash joins.
> {noformat}
> [localhost:21000] > select * from (select cast("-0" as float) c1) v1,  (select cast("0" as float) c2) v2 where v1.c1 = v2.c2;
> Fetched 0 row(s) in 0.12s
> [localhost:21000] > select * from (select cast("0" as float) c1) v1,  (select cast("0" as float) c2) v2 where v1.c1 = v2.c2;
> +----+----+
> | c1 | c2 |
> +----+----+
> | 0  | 0  |
> +----+----+
> Fetched 1 row(s) in 0.11s
> [localhost:21000] > select * from (select cast("-0" as float) c1) v1,  (select cast("-0" as float) c2) v2 where v1.c1 = v2.c2;
> +----+----+
> | c1 | c2 |
> +----+----+
> | -0 | -0 |
> +----+----+
> Fetched 1 row(s) in 0.11s
> {noformat}
> With aggregations, we get separate groups. I could see the argument either 
> way on whether this is the preferred behaviour for group by, since group by 
> already handles equality of NULL differently. The behaviour here is tied to 
> the behaviour in the join right now, so we should make sure to add a test for 
> this case when fixing the join.
> {noformat}
> [localhost:21000] > select distinct * from (values(cast("-0" as float)), (cast("0" as float))) v;
> +---------------------+
> | cast('-0' as float) |
> +---------------------+
> | -0                  |
> | 0                   |
> +---------------------+
> {noformat}
> *Workaround*
> Casting the floating point numbers to decimal fixes the problem.
> *Proposed solution*
> The frontend could wrap floating point expressions in the hash join or hash 
> aggregation in a normalisation function that converts -0 to +0.
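
The proposed normalisation can be sketched for the DISTINCT case, assuming a hypothetical `normalize_zero` function applied to each floating point grouping key before it is hashed. The toy `distinct_floats` helper keys groups on the float's bytes, so without normalisation it reproduces the two separate groups shown in the DISTINCT query above; it models the idea only, not Impala's planner or executor.

```python
import struct


def normalize_zero(x: float) -> float:
    """Hypothetical normalisation function: -0.0 becomes +0.0, all else unchanged."""
    return 0.0 if x == 0.0 else x


def distinct_floats(values, normalize=False):
    """Toy hash-based DISTINCT keyed on each value's float32 bytes.

    The (possibly normalised) key value of the first row in each group is
    kept as that group's output value.
    """
    groups = {}
    for v in values:
        key_value = normalize_zero(v) if normalize else v
        groups.setdefault(struct.pack("<f", key_value), key_value)
    return list(groups.values())


rows = [-0.0, 0.0]
print(distinct_floats(rows))                  # [-0.0, 0.0]: two groups
print(distinct_floats(rows, normalize=True))  # [0.0]: one group, canonical +0
```

Since the normalised value is what gets stored, the surviving group is reported as +0, matching the canonical form described in the commit message.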



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
