[GitHub] spark issue #21794: [SPARK-24834][CORE] use java comparison for float and do...
Github user bavardage commented on the issue: https://github.com/apache/spark/pull/21794

Yep, fair. The intent, I think, was clarity rather than necessarily perf: it's misleading to have a method named 'nan safe' that has no special handling of NaNs. I'll look at opening a different PR that could increase clarity and may have a minor perf benefit.
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21794

I think we'd have to close this due to the behavior change, but would merge an optimization of the existing behavior.
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21794

`spark-sql` suggests that -0 and 0 are considered the same, though: `SELECT -0.0 == 0.0;` returns `true`. It's probably essential not to change behavior here, but if performance is the issue, I think the method can be optimized.
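[Editor's note: a minimal sketch of reproducing the check above from the Scala shell. It assumes a live `SparkSession` named `spark`; the variable names are illustrative, not from the PR.]

```
// Hypothetical reproduction of the spark-sql equality check quoted above.
// Per the comment, SQL-level comparison treats -0.0 and 0.0 as equal.
val equal = spark.sql("SELECT -0.0 == 0.0").collect()(0).getBoolean(0)
assert(equal) // true: SQL comparison does not distinguish signed zeros
```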
Github user bavardage commented on the issue: https://github.com/apache/spark/pull/21794

It does seem that Spark currently distinguishes -0 and 0, at least as far as groupBy goes:

```
scala> case class Thing(x: Float)
defined class Thing

scala> val df = Seq(Thing(0.0f), Thing(-0.0f), Thing(0.0f), Thing(-0.0f), Thing(0.0f), Thing(-0.0f), Thing(0.0f), Thing(-0.0f)).toDF
df: org.apache.spark.sql.DataFrame = [x: float]

scala> df.groupBy($"x").count
res0: org.apache.spark.sql.DataFrame = [x: float, count: bigint]

scala> res0.collect
res1: Array[org.apache.spark.sql.Row] = Array([-0.0,4], [0.0,4])
```

Doubles are hashed via `doubleToLongBits` (https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L338), which gives different bitwise representations for positive and negative zero.
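[Editor's note: a small sketch, plain Scala with nothing Spark-specific assumed, of why hashing via `doubleToLongBits` separates the two zeros: the sign bit survives the conversion, so the group keys hash differently even though the values compare equal.]

```
// doubleToLongBits preserves the sign bit of zero, so the two zeros
// produce different hash inputs even though -0.0 == 0.0 numerically.
val posZeroBits = java.lang.Double.doubleToLongBits(0.0)  // 0x0000000000000000L
val negZeroBits = java.lang.Double.doubleToLongBits(-0.0) // 0x8000000000000000L
assert(posZeroBits != negZeroBits) // different hash inputs => different groups
assert(-0.0 == 0.0)                // yet the primitive comparison says equal
```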
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21794

BTW, if it becomes necessary to not change the semantics, I think the methods could at least be streamlined a bit:

```
if (x < y) {
  -1
} else if (x > y) {
  1
} else if (x == y) {
  0
} else {
  if (java.lang.Double.isNaN(x)) {
    if (java.lang.Double.isNaN(y)) {
      0
    } else {
      1
    }
  } else {
    -1
  }
}
```

More tests can't hurt, too.
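[Editor's note: a hedged sketch of what such tests might look like, with the snippet above wrapped in a standalone method for illustration. The wrapper name mirrors the existing `Utils.nanSafeCompareDoubles`, but this copy is not the PR's code.]

```
// Standalone copy of the streamlined comparator, for illustration only.
def nanSafeCompareDoubles(x: Double, y: Double): Int = {
  if (x < y) -1
  else if (x > y) 1
  else if (x == y) 0
  else if (java.lang.Double.isNaN(x)) {
    if (java.lang.Double.isNaN(y)) 0 else 1
  } else {
    -1
  }
}

assert(nanSafeCompareDoubles(Double.NaN, Double.NaN) == 0) // NaN equals NaN
assert(nanSafeCompareDoubles(Double.NaN, 1.0) == 1)        // NaN sorts after numbers
assert(nanSafeCompareDoubles(1.0, Double.NaN) == -1)       // numbers sort before NaN
assert(nanSafeCompareDoubles(-0.0, 0.0) == 0)              // signed zeros compare equal
```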
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21794

It would be good to add test cases for them, since they are not covered now.
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21794

I think this behavior is required, since Spark SQL does not distinguish 0.0 from -0.0. Am I correct? cc @gatorsmile @maropu
Github user bavardage commented on the issue: https://github.com/apache/spark/pull/21794

There is a way in which the `nanSafeCompare*` methods do differ from the Java built-ins, but it wasn't captured in the test cases (or in the suggestive naming of the methods): the handling of `-0.0` vs `0.0`.

```
java.lang.Double.compare(-0.0d, 0.0d) == -1
Utils.nanSafeCompareDoubles(-0.0d, 0.0d) == 0
```

If that behaviour is indeed required, I could abandon this PR and instead fix the test cases to capture it.
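[Editor's note: a sketch of the assertions such a test fix might add, assuming the `Utils` in question is `org.apache.spark.util.Utils` and that `nanSafeCompareFloats` follows the same contract as the doubles variant.]

```
import org.apache.spark.util.Utils

// Java's comparator orders the signed zeros; Spark's nan-safe variant does not.
assert(java.lang.Double.compare(-0.0d, 0.0d) == -1)
assert(Utils.nanSafeCompareDoubles(-0.0d, 0.0d) == 0)
assert(Utils.nanSafeCompareFloats(-0.0f, 0.0f) == 0) // assumed same contract for floats
```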
Github user bavardage commented on the issue: https://github.com/apache/spark/pull/21794

cc @JoshRosen who introduced this code originally, I think
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21794

Can one of the admins verify this patch?