[ https://issues.apache.org/jira/browse/IMPALA-6660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on IMPALA-6660 started by Norbert Luksa. --------------------------------------------- > -0/+0 floating point do not compare as equal in hash table > ---------------------------------------------------------- > > Key: IMPALA-6660 > URL: https://issues.apache.org/jira/browse/IMPALA-6660 > Project: IMPALA > Issue Type: Bug > Components: Backend > Affects Versions: Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, > Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0 > Reporter: Tim Armstrong > Assignee: Norbert Luksa > Priority: Major > Labels: correctness, ramp-up > > This can happen because we hash the binary representation of the numbers. > -0/0 should be treated as equal for hash joins. > {noformat} > [localhost:21000] > select * from (select cast("-0" as float) c1) v1, > (select cast("0" as float) c2) v2 where v1.c1 = v2.c2; > Fetched 0 row(s) in 0.12s > [localhost:21000] > select * from (select cast("0" as float) c1) v1, (select > cast("0" as float) c2) v2 where v1.c1 = v2.c2; > +----+----+ > | c1 | c2 | > +----+----+ > | 0 | 0 | > +----+----+ > Fetched 1 row(s) in 0.11s > [localhost:21000] > select * from (select cast("-0" as float) c1) v1, > (select cast("-0" as float) c2) v2 where v1.c1 = v2.c2; > +----+----+ > | c1 | c2 | > +----+----+ > | -0 | -0 | > +----+----+ > Fetched 1 row(s) in 0.11s > {noformat} > With aggregations, we get separate groups. I could see the argument either > way on whether this is the preferred behaviour for group by, since group by > already handles equality of NULL differently. The behaviour here is tied to > the behaviour in the join right now, so we should make sure to add a test for > this case when fixing the join. > {noformat} > [localhost:21000] > select distinct * from (values(cast("-0" as float)), > (cast("0" as float))) v; > +---------------------+ > | cast('-0' as float) | > +---------------------+ > | -0 | > | 0 | > +---------------------+ > {noformat} > *Workaround* > Casting the floating point numbers to decimal fixes the problem. > *Proposed solution* > The frontend could wrap floating point expressions in the hash join or hash > aggregation in a normalisation function that converts -0 to +0. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org