Paul Rogers has posted comments on this change. ( http://gerrit.cloudera.org:8080/11535 )
Change subject: IMPALA-6661 Make NaN values equal for grouping purposes. ...................................................................... Patch Set 16: (3 comments) A few more comments as I learn this code. http://gerrit.cloudera.org:8080/#/c/11535/16//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11535/16//COMMIT_MSG@17 PS16, Line 17: additional logic to consider NaN values as equal. The comment mentions equality. Is the rule that NaN = NaN is always false? But that Nan <=> NaN is true? Looks like the IS DISTINCT (<=>) code is implemented in operators-ir.cc, but that file is not in this patch set. Does the implementation of IS DISTINCT need to change? http://gerrit.cloudera.org:8080/#/c/11535/16/be/src/runtime/raw-value.cc File be/src/runtime/raw-value.cc: http://gerrit.cloudera.org:8080/#/c/11535/16/be/src/runtime/raw-value.cc@33 PS16, Line 33: const float RawValue::CANONICAL_FLOAT_NAN = nanf(""); NaN is not a single bit pattern (which is why you've defined a canonical form), rather it is a family of bit patterns. Do the above functions guarantee to return the exact same bit pattern on all OSs, all versions, and all runs? Or, should we use the LLVM trick to specify the exact bit pattern we want to use for NAN so that it is guaranteed to be the same on all hosts in the cluster, even if they use slightly different CPUs or library versions? http://gerrit.cloudera.org:8080/#/c/11535/16/testdata/workloads/functional-query/queries/QueryTest/exprs.test File testdata/workloads/functional-query/queries/QueryTest/exprs.test: http://gerrit.cloudera.org:8080/#/c/11535/16/testdata/workloads/functional-query/queries/QueryTest/exprs.test@3074 PS16, Line 3074: ---- QUERY > I've assumed that NaN tests have existed in other places. Exhaustive valid The thought is that we might want to ensure all paths work to handle the NaN case, not just the grouping case. For example, we we have tests for all three (I believe) places that this can occur: * SELECT clause: SELECT sqrt(-1) op floatCol ... * WHERE clause: WHERE sqrt(-1) op floatCol ... * JOIN: what is tested here. Along with two operators * = (NaN's are not equal) * <=> (NaN's are equal) We also have both the code gen and interpreted cases: * SELECT: never code generated (when in the root fragment) * WHERE, JOIN: code generated if enough rows, interpreted otherwise The thought is that we might want to cover all these cases just to save the hassle of fixing any gaps later. -- To view, visit http://gerrit.cloudera.org:8080/11535 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I996c4a2e1934fd887046ed0c55457b7285375086 Gerrit-Change-Number: 11535 Gerrit-PatchSet: 16 Gerrit-Owner: Michal Ostrowski <mostr...@cloudera.com> Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Michael Ho <k...@cloudera.com> Gerrit-Reviewer: Michal Ostrowski <mostr...@cloudera.com> Gerrit-Reviewer: Paul Rogers <par0...@yahoo.com> Gerrit-Reviewer: Thomas Marshall <thomasmarsh...@cmu.edu> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Mon, 22 Oct 2018 18:04:58 +0000 Gerrit-HasComments: Yes