[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23043

Do we need to consider `GenerateSafeProjection`, too? In other words, if the generated code or runtime does not use the data in its `Unsafe` form, this `+0.0/-0.0` problem may still exist. Am I correct?

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23043

Is it better to update this PR title now?
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23043

@srowen #21794 is what I was thinking of.
Github user adoron commented on the issue: https://github.com/apache/spark/pull/23043

@cloud-fan changing writeDouble/writeFloat in `UnsafeWriter` indeed does the trick! I'll fix the PR. I was also thinking about making the change in `Platform::putDouble` so that all accesses are affected, in both `UnsafeRow` and `UnsafeWriter`.
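A minimal sketch of the normalize-on-write idea being discussed (the class and method names here are hypothetical, not the actual `UnsafeWriter` code): canonicalize the value before its raw bytes are written, so only one zero bit pattern ever reaches the row buffer.

```java
// Hypothetical sketch of normalizing -0.0 on the write path; not actual Spark code.
public class WriteNormalization {
    // Primitive comparison treats -0.0 == 0.0 as true, so one branch
    // catches both zeros and always emits the +0.0 bit pattern.
    static double normalizeDouble(double v) {
        return (v == 0.0d) ? 0.0d : v;
    }

    static float normalizeFloat(float v) {
        return (v == 0.0f) ? 0.0f : v;
    }

    public static void main(String[] args) {
        // After normalization, -0.0 and 0.0 share one bit pattern, so
        // byte-level hashing and comparison treat them as equal.
        System.out.println(Double.doubleToRawLongBits(normalizeDouble(-0.0d))); // prints 0
        System.out.println(Float.floatToRawIntBits(normalizeFloat(-0.0f)));     // prints 0
    }
}
```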
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23043

`UnsafeRow.set` is not the only place that writes float/double as binary data; can you check other places, like `UnsafeWriter`?
Github user sabanas commented on the issue: https://github.com/apache/spark/pull/23043

@adoron indeed this doesn't pass through setFloat, but all values go through https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala#L209, so code like this, for example, solves your example:

```
override def update(i: Int, value: Any): Unit = {
  val ignoreMinusZeroValue = value match {
    case v: Double => if (v == 0d) 0d else value
    case v: Float => if (v == 0f) 0f else value
    case _ => value
  }
  values(i) = ignoreMinusZeroValue
}
```

Not sure if that holds for the other cases mentioned in this PR, though.
Github user adoron commented on the issue: https://github.com/apache/spark/pull/23043

@cloud-fan that's what I thought as well at first, but the flow doesn't go through that code; I verified by running `Seq(0.0d, 0.0d, -0.0d).toDF("i").groupBy("i").count().collect()` with a breakpoint set there.

The reason -0.0 and 0.0 end up in different "group by" buckets is in `UnsafeFixedWidthAggregationMap::getAggregationBufferFromUnsafeRow()`:

```
public UnsafeRow getAggregationBufferFromUnsafeRow(UnsafeRow key) {
    return getAggregationBufferFromUnsafeRow(key, key.hashCode());
}
```

The hashing is done on the UnsafeRow, and by that point the whole row is hashed as a unit, so it's hard to find the double columns and their values.
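The root cause can be illustrated standalone (this is not Spark code): the two zeros compare equal as doubles, but their raw bit patterns differ, so any hash computed over the serialized row bytes lands them in different buckets.

```java
public class ZeroBitPatterns {
    public static void main(String[] args) {
        long pos = Double.doubleToRawLongBits(0.0d);  // 0x0000000000000000L
        long neg = Double.doubleToRawLongBits(-0.0d); // 0x8000000000000000L

        // Numerically equal...
        System.out.println(0.0d == -0.0d); // prints true

        // ...but the raw bytes differ, so a hash computed over the
        // serialized row separates them into different buckets.
        System.out.println(pos == neg);    // prints false
    }
}
```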
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23043

Looking at `UnsafeRow.putFloat`, it normalizes `Float.NaN` values. I think we should do the same there for `-0.0`, and in other related places (check how we handle `Float.NaN`).
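A sketch of canonicalizing both special cases together, modeled on the NaN handling mentioned above (the method is illustrative, not Spark's actual code; the `-0.0` branch is the proposed addition):

```java
public class Canonicalize {
    static float canonicalizeFloat(float v) {
        if (Float.isNaN(v)) return Float.NaN; // collapse every NaN bit pattern
        if (v == 0.0f) return 0.0f;           // collapse -0.0 into +0.0
        return v;
    }

    public static void main(String[] args) {
        // -0.0f is rewritten to the +0.0 bit pattern (all zero bits).
        System.out.println(Float.floatToRawIntBits(canonicalizeFloat(-0.0f))); // prints 0
        // Ordinary values pass through unchanged.
        System.out.println(canonicalizeFloat(1.5f)); // prints 1.5
    }
}
```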
Github user srowen commented on the issue: https://github.com/apache/spark/pull/23043

They do, FWIW:

```
scala> java.lang.Double.doubleToLongBits(0.0)
res1: Long = 0

scala> java.lang.Double.doubleToLongBits(-0.0)
res2: Long = -9223372036854775808
```
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23043

Before rushing to a fix that replaces -0.0 with 0.0, I'd like to know how this bug happens. One possible reason is that 0.0 and -0.0 have different binary representations. Spark uses the unsafe API to write float/double values; maybe we can investigate that first.
Github user srowen commented on the issue: https://github.com/apache/spark/pull/23043

@kiszk This spun out of https://issues.apache.org/jira/browse/SPARK-24834 and https://github.com/apache/spark/pull/21794; is that what you may be thinking of? I'm not aware of others.
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/23043

IIUC, we discussed handling `+0.0` and `-0.0` before in another PR. @srowen do you remember the previous discussion?
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23043

This only works for attributes, not literals or intermediate results. Is there a better place to fix it?
Github user adoron commented on the issue: https://github.com/apache/spark/pull/23043

@srowen @gatorsmile @cloud-fan
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23043

Can one of the admins verify this patch?