[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-18 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23043
  
Do we need to consider `GenerateSafeProjection`, too? In other words, if 
the generated code or runtime does not keep the data in `Unsafe` form, the 
`+0.0/-0.0` problem may still exist.  
Am I correct?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-18 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23043
  
Is it better to update this PR title now?


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-18 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23043
  
@srowen #21794 is what I thought.


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-17 Thread adoron
Github user adoron commented on the issue:

https://github.com/apache/spark/pull/23043
  
@cloud-fan changing writeDouble/writeFloat in `UnsafeWriter` indeed does the 
trick!
I'll fix the PR. I was also thinking about making the change in 
`Platform::putDouble`, so that all accesses are affected, in `UnsafeRow` as 
well as `UnsafeWriter`.


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-16 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23043
  
`UnsafeRow.set` is not the only place that writes float/double values as binary 
data; can you check the other places, like `UnsafeWriter`?


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-16 Thread sabanas
Github user sabanas commented on the issue:

https://github.com/apache/spark/pull/23043
  
@adoron indeed this doesn't pass through setFloat, but all values go 
through - 

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala#L209

which goes through - 

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/rows.scala#L209

so, for example, code like this fixes your example:
```
override def update(i: Int, value: Any): Unit = {
  val ignoreMinusZeroValue = value match {
    case v: Double => if (v == 0d) 0d else value
    case v: Float => if (v == 0f) 0f else value
    case _ => value
  }
  values(i) = ignoreMinusZeroValue
}
```

Not sure whether that holds for the other cases mentioned in this PR, though.


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-16 Thread adoron
Github user adoron commented on the issue:

https://github.com/apache/spark/pull/23043
  
@cloud-fan that's what I thought as well at first, but the flow doesn't go 
through that code; I checked by running 
`Seq(0.0d, 0.0d, -0.0d).toDF("i").groupBy("i").count().collect()` 
with a breakpoint set.

The reason -0.0 and 0.0 end up in different buckets of the group-by is in 
`UnsafeFixedWidthAggregationMap::getAggregationBufferFromUnsafeRow()`:
```
public UnsafeRow getAggregationBufferFromUnsafeRow(UnsafeRow key) {
  return getAggregationBufferFromUnsafeRow(key, key.hashCode());
}
```
The hashing is done on the UnsafeRow, and by this point the whole row is 
hashed as a single unit, so it is hard to locate the double columns and 
their values.
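
To illustrate the point in plain Java (a standalone sketch, not Spark code): the two zeros compare equal as `double` values, but their stored bit patterns differ, so any hash computed over the serialized bytes of a row separates them into different buckets.

```java
public class ZeroBits {
    public static void main(String[] args) {
        // As double values the two zeros are equal...
        System.out.println(0.0d == -0.0d);         // prints "true"

        // ...but the raw bit patterns written to memory differ, so a
        // hash over the serialized bytes (as UnsafeRow.hashCode
        // effectively computes) treats them as distinct keys.
        long pos = Double.doubleToRawLongBits(0.0d);
        long neg = Double.doubleToRawLongBits(-0.0d);
        System.out.println(Long.toHexString(pos)); // prints "0"
        System.out.println(Long.toHexString(neg)); // prints "8000000000000000"
    }
}
```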



---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-16 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23043
  
Looking at `UnsafeRow.putFloat`, it normalizes the value of `Float.NaN`. I 
think we should do the same there for `-0.0`, and in other related places 
(check how we handle `Float.NaN`). 
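
A hedged sketch of what such a write-path normalization could look like (hypothetical code, not the actual Spark implementation), following the same canonicalization pattern used for NaN:

```java
public class NormalizeZero {
    // Hypothetical normalization applied before writing raw bits:
    // collapse all NaN bit patterns to the canonical NaN, and map
    // -0.0 to +0.0, so equal values always serialize identically.
    static double normalize(double value) {
        if (Double.isNaN(value)) {
            return Double.NaN;
        }
        if (value == 0.0d) {   // true for both 0.0 and -0.0
            return 0.0d;
        }
        return value;
    }

    public static void main(String[] args) {
        long a = Double.doubleToRawLongBits(normalize(-0.0d));
        long b = Double.doubleToRawLongBits(normalize(0.0d));
        System.out.println(a == b); // prints "true"
    }
}
```

With this in place, both zeros serialize to the same bytes, so hashing and binary comparison agree with `==` semantics.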


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-16 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/23043
  
They do, FWIW:
```
scala> java.lang.Double.doubleToLongBits(0.0)
res1: Long = 0

scala> java.lang.Double.doubleToLongBits(-0.0)
res2: Long = -9223372036854775808
```


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-16 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23043
  
Before rushing to a fix that replaces -0.0 with 0.0, I'd like to understand how 
this bug happens.

One possible reason might be that 0.0 and -0.0 have different binary 
representations. Spark uses the unsafe API to write float/double values; maybe 
we can investigate that first.


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-15 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/23043
  
@kiszk This spun out of https://issues.apache.org/jira/browse/SPARK-24834 
and https://github.com/apache/spark/pull/21794 ; is that what you may be 
thinking of? I'm not aware of others.


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-15 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/23043
  
IIUC, we discussed handling `+0.0` and `-0.0` before in another PR. 
@srowen do you remember the previous discussion?


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-15 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23043
  
This only works for attributes, not literals or intermediate results. Is there 
a better place to fix it?


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-15 Thread adoron
Github user adoron commented on the issue:

https://github.com/apache/spark/pull/23043
  
@srowen @gatorsmile  @cloud-fan 


---




[GitHub] spark issue #23043: [SPARK-26021][SQL] replace minus zero with zero in Unsaf...

2018-11-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23043
  
Can one of the admins verify this patch?


---



