[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...

2018-05-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21312


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...

2018-05-15 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21312#discussion_r188196324
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala 
---
@@ -133,6 +133,14 @@ private[arrow] abstract class ArrowFieldWriter {
 valueVector match {
   case fixedWidthVector: BaseFixedWidthVector => 
fixedWidthVector.reset()
   case variableWidthVector: BaseVariableWidthVector => 
variableWidthVector.reset()
+  case listVector: ListVector =>
--- End diff --

Yeah, I think so.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...

2018-05-15 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21312#discussion_r188184724
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala 
---
@@ -133,6 +133,14 @@ private[arrow] abstract class ArrowFieldWriter {
 valueVector match {
   case fixedWidthVector: BaseFixedWidthVector => 
fixedWidthVector.reset()
   case variableWidthVector: BaseVariableWidthVector => 
variableWidthVector.reset()
+  case listVector: ListVector =>
--- End diff --

good catch! So this bug was there since the day 1 when we have arrow 
writer, right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...

2018-05-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21312#discussion_r187787882
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala 
---
@@ -311,6 +311,7 @@ private[arrow] class ArrayWriter(
   override def reset(): Unit = {
 super.reset()
 elementWriter.reset()
+valueVector.clear()
--- End diff --

I've also noticed that @BryanCutler added `reset` to `ListVector`. But we 
can only use `clear` for now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...

2018-05-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21312#discussion_r187787511
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala 
---
@@ -311,6 +311,7 @@ private[arrow] class ArrayWriter(
   override def reset(): Unit = {
 super.reset()
 elementWriter.reset()
+valueVector.clear()
--- End diff --

Yeah, I think so.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...

2018-05-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21312#discussion_r187787343
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala 
---
@@ -311,6 +311,7 @@ private[arrow] class ArrayWriter(
   override def reset(): Unit = {
 super.reset()
 elementWriter.reset()
+valueVector.clear()
--- End diff --

Looks @BryanCutler added `reset()` interface in 0.9.0 mentioned in:


https://github.com/apache/spark/blob/eb386be1ed383323da6e757f63f3b8a7ced38cc4/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala#L132

at 
https://github.com/apache/arrow/commit/4dbce607d50031a405af39d36e08cd03c5ffc764 
and https://issues.apache.org/jira/browse/ARROW-1962

but if we think about backporting, probably I guess we can go this way as a 
bug fix as is? Roughly looks making sense. 

Would it be also safe to do:

```
valueVector match {
  case fixedWidthVector: BaseFixedWidthVector => 
fixedWidthVector.reset()
  case variableWidthVector: BaseVariableWidthVector => 
variableWidthVector.reset()
  case repeatedValueVector: BaseRepeatedValueVector => 
repeatedValueVector.clear()
  case _ =>
}
```

? @icexelloss, @BryanCutler and @viirya?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...

2018-05-12 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/21312

[SPARK-24259][SQL] ArrayWriter for Arrow produces wrong output

## What changes were proposed in this pull request?

Right now `ArrayWriter` used to output Arrow data for array type, doesn't 
do `clear` or `reset` after each batch. It produces wrong output.

## How was this patch tested?

Added test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-24259

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21312.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21312


commit 093728ef75f4cecbac5d5f4f82fcce0cc47759b5
Author: Liang-Chi Hsieh 
Date:   2018-05-13T00:29:09Z

Call clear after each batch.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org