Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11535 )

Change subject: IMPALA-6661 Make NaN values equal for grouping purposes.
......................................................................


Patch Set 16:

(3 comments)

A few more comments as I learn this code.

http://gerrit.cloudera.org:8080/#/c/11535/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11535/16//COMMIT_MSG@17
PS16, Line 17:   additional logic to consider NaN values as equal.
The comment mentions equality. Is the rule that NaN = NaN is always false? But 
that Nan <=> NaN is true?

Looks like the IS DISTINCT (<=>) code is implemented in operators-ir.cc, but 
that file is not in this patch set. Does the implementation of IS DISTINCT need 
to change?


http://gerrit.cloudera.org:8080/#/c/11535/16/be/src/runtime/raw-value.cc
File be/src/runtime/raw-value.cc:

http://gerrit.cloudera.org:8080/#/c/11535/16/be/src/runtime/raw-value.cc@33
PS16, Line 33: const float RawValue::CANONICAL_FLOAT_NAN = nanf("");
NaN is not a single bit pattern (which is why you've defined a canonical form), 
rather it is a family of bit patterns. Do the above functions guarantee to 
return the exact same bit pattern on all OSs, all versions, and all runs?

Or, should we use the LLVM trick to specify the exact bit pattern we want to 
use for NAN so that it is guaranteed to be the same on all hosts in the 
cluster, even if they use slightly different CPUs or library versions?


http://gerrit.cloudera.org:8080/#/c/11535/16/testdata/workloads/functional-query/queries/QueryTest/exprs.test
File testdata/workloads/functional-query/queries/QueryTest/exprs.test:

http://gerrit.cloudera.org:8080/#/c/11535/16/testdata/workloads/functional-query/queries/QueryTest/exprs.test@3074
PS16, Line 3074: ---- QUERY
> I've assumed that NaN tests have existed in other places.  Exhaustive valid
The thought is that we might want to ensure all paths work to handle the NaN 
case, not just the grouping case.

For example, we we have tests for all three (I believe) places that this can 
occur:

* SELECT clause: SELECT sqrt(-1) op floatCol ...
* WHERE clause: WHERE sqrt(-1) op floatCol ...
* JOIN: what is tested here.

Along with two operators

* = (NaN's are not equal)
* <=> (NaN's are equal)

We also have both the code gen and interpreted cases:

* SELECT: never code generated (when in the root fragment)
* WHERE, JOIN: code generated if enough rows, interpreted otherwise

The thought is that we might want to cover all these cases just to save the 
hassle of fixing any gaps later.



--
To view, visit http://gerrit.cloudera.org:8080/11535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I996c4a2e1934fd887046ed0c55457b7285375086
Gerrit-Change-Number: 11535
Gerrit-PatchSet: 16
Gerrit-Owner: Michal Ostrowski <mostr...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: Michal Ostrowski <mostr...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <par0...@yahoo.com>
Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Oct 2018 18:04:58 +0000
Gerrit-HasComments: Yes

Reply via email to