[ https://issues.apache.org/jira/browse/IMPALA-6660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982782#comment-16982782 ]
ASF subversion and git services commented on IMPALA-6660:
---------------------------------------------------------

Commit bf031a2142db67d2c8741059d2a3ee551f02e4a4 in impala's branch refs/heads/master from norbert.luksa
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=bf031a2 ]

IMPALA-6660: Change -0/+0 floating point to compare as equal in hash table

Currently -0.0/+0.0 values are hashed to different values due to
their different binary representation, while -0.0==+0.0 is true in C++.
This caused them to be distinct values in hash maps despite being
treated as equal in comparisons.

This commit fixes the hashing of -0.0/+0.0, thus changing the
behaviour of hash joins and aggregations (since aggregations
follow the behaviour of the join).
That way, the canonical form for -0/+0 is changed to +0.

Tests:
- Added e2e tests for aggregation (group by and distinct) and
  join queries with -0.0 and +0.0 present.

Change-Id: I6bb1a817c81c452d041238c19cb6c9f602a5d565
Reviewed-on: http://gerrit.cloudera.org:8080/14588
Reviewed-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


> -0/+0 floating point do not compare as equal in hash table
> ----------------------------------------------------------
>
>                 Key: IMPALA-6660
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6660
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0,
>                      Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
>            Reporter: Tim Armstrong
>            Assignee: Norbert Luksa
>            Priority: Major
>              Labels: correctness, ramp-up
>
> This can happen because we hash the binary representation of the numbers.
> -0/0 should be treated as equal for hash joins.
> {noformat}
> [localhost:21000] > select * from (select cast("-0" as float) c1) v1, (select cast("0" as float) c2) v2 where v1.c1 = v2.c2;
> Fetched 0 row(s) in 0.12s
> [localhost:21000] > select * from (select cast("0" as float) c1) v1, (select cast("0" as float) c2) v2 where v1.c1 = v2.c2;
> +----+----+
> | c1 | c2 |
> +----+----+
> | 0  | 0  |
> +----+----+
> Fetched 1 row(s) in 0.11s
> [localhost:21000] > select * from (select cast("-0" as float) c1) v1, (select cast("-0" as float) c2) v2 where v1.c1 = v2.c2;
> +----+----+
> | c1 | c2 |
> +----+----+
> | -0 | -0 |
> +----+----+
> Fetched 1 row(s) in 0.11s
> {noformat}
> With aggregations, we get separate groups. I could see the argument either
> way on whether this is the preferred behaviour for group by, since group by
> already handles equality of NULL differently. The behaviour here is tied to
> the behaviour in the join right now, so we should make sure to add a test for
> this case when fixing the join.
> {noformat}
> [localhost:21000] > select distinct * from (values(cast("-0" as float)), (cast("0" as float))) v;
> +---------------------+
> | cast('-0' as float) |
> +---------------------+
> | -0                  |
> | 0                   |
> +---------------------+
> {noformat}
> *Workaround*
> Casting the floating point numbers to decimal fixes the problem.
> *Proposed solution*
> The frontend could wrap floating point expressions in the hash join or hash
> aggregation in a normalisation function that converts -0 to +0.
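
The mismatch described above, and the normalisation idea from the proposed solution, can be illustrated with a minimal standalone C++ sketch. This is not the code from the Impala commit: the helper names (FloatBits, CanonicalizeZero, HashRawBits, HashCanonical, CheckZeroHandling) are hypothetical, std::hash stands in for whatever hash function the backend really uses, and IEEE 754 32-bit floats are assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>

// Hypothetical helper: returns the raw bit pattern of a 32-bit float.
inline std::uint32_t FloatBits(float v) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &v, sizeof(bits));  // well-defined way to reinterpret the bytes
  return bits;
}

// Hypothetical normalisation: maps -0.0f to +0.0f, leaves every other value
// (including NaN, which never compares equal to zero) unchanged.
inline float CanonicalizeZero(float v) {
  return v == 0.0f ? 0.0f : v;
}

// Hashes the binary representation directly. std::hash is only a stand-in
// for a real hash function; the point is that the input is the raw bits.
inline std::size_t HashRawBits(float v) {
  return std::hash<std::uint32_t>{}(FloatBits(v));
}

// Hashes the value after normalising the sign of zero.
inline std::size_t HashCanonical(float v) {
  return HashRawBits(CanonicalizeZero(v));
}

// Small self-check of the properties discussed in the issue.
inline void CheckZeroHandling() {
  const float neg = -0.0f;
  const float pos = 0.0f;
  assert(neg == pos);                                // equal under operator==
  assert(FloatBits(neg) != FloatBits(pos));          // but the sign bit differs
  assert(FloatBits(CanonicalizeZero(neg)) == FloatBits(pos));
  assert(HashCanonical(neg) == HashCanonical(pos));  // same bucket after normalising
}
```

Calling CheckZeroHandling() from a test or from main exercises all four properties. Normalising before hashing, rather than changing the comparison, keeps the hash consistent with the existing equality semantics, which is the invariant a hash table relies on.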