[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21312 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21312#discussion_r188196324 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala --- @@ -133,6 +133,14 @@ private[arrow] abstract class ArrowFieldWriter { valueVector match { case fixedWidthVector: BaseFixedWidthVector => fixedWidthVector.reset() case variableWidthVector: BaseVariableWidthVector => variableWidthVector.reset() + case listVector: ListVector => --- End diff -- Yeah, I think so. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21312#discussion_r188184724 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala --- @@ -133,6 +133,14 @@ private[arrow] abstract class ArrowFieldWriter { valueVector match { case fixedWidthVector: BaseFixedWidthVector => fixedWidthVector.reset() case variableWidthVector: BaseVariableWidthVector => variableWidthVector.reset() + case listVector: ListVector => --- End diff -- good catch! So this bug was there since the day 1 when we have arrow writer, right? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21312#discussion_r187787882 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala --- @@ -311,6 +311,7 @@ private[arrow] class ArrayWriter( override def reset(): Unit = { super.reset() elementWriter.reset() +valueVector.clear() --- End diff -- I've also noticed that @BryanCutler added `reset` to `ListVector`. But we can only use `clear` for now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21312#discussion_r187787511 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala --- @@ -311,6 +311,7 @@ private[arrow] class ArrayWriter( override def reset(): Unit = { super.reset() elementWriter.reset() +valueVector.clear() --- End diff -- Yeah, I think so. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21312#discussion_r187787343 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala --- @@ -311,6 +311,7 @@ private[arrow] class ArrayWriter( override def reset(): Unit = { super.reset() elementWriter.reset() +valueVector.clear() --- End diff -- Looks @BryanCutler added `reset()` interface in 0.9.0 mentioned in: https://github.com/apache/spark/blob/eb386be1ed383323da6e757f63f3b8a7ced38cc4/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala#L132 at https://github.com/apache/arrow/commit/4dbce607d50031a405af39d36e08cd03c5ffc764 and https://issues.apache.org/jira/browse/ARROW-1962 but if we think about backporting, probably I guess we can go this way as a bug fix as is? Roughly looks making sense. Would it be also safe to do: ``` valueVector match { case fixedWidthVector: BaseFixedWidthVector => fixedWidthVector.reset() case variableWidthVector: BaseVariableWidthVector => variableWidthVector.reset() case repeatedValueVector: BaseRepeatedValueVector => repeatedValueVector.clear() case _ => } ``` ? @icexelloss, @BryanCutler and @viirya? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/21312 [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong output ## What changes were proposed in this pull request? Right now `ArrayWriter` used to output Arrow data for array type, doesn't do `clear` or `reset` after each batch. It produces wrong output. ## How was this patch tested? Added test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 SPARK-24259 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21312.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21312 commit 093728ef75f4cecbac5d5f4f82fcce0cc47759b5 Author: Liang-Chi HsiehDate: 2018-05-13T00:29:09Z Call clear after each batch. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org