[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-16 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21312 @BryanCutler Thanks! I'm happy we could identify this possible bug. Looking forward to the fix.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-16 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21312 > Not sure why, but previously calling ListVector.clear, I must change the reset order to reset element writer first to pass the test @viirya I looked into this and found it to be a bug

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-15 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21312 Thanks @HyukjinKwon @BryanCutler @cloud-fan @icexelloss

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-15 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21312 thanks, merging to master/2.3!

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Merged build finished. Test PASSed.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90626/ Test PASSed.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21312 **[Test build #90626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90626/testReport)** for PR 21312 at commit

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3222/

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Merged build finished. Test PASSed.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21312 **[Test build #90626 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90626/testReport)** for PR 21312 at commit

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Merged build finished. Test FAILed.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90621/ Test FAILed.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-15 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21312 retest this please.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21312 **[Test build #90621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90621/testReport)** for PR 21312 at commit

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90618/ Test PASSed.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Merged build finished. Test PASSed.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21312 **[Test build #90618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90618/testReport)** for PR 21312 at commit

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21312 Not sure why, but previously calling `ListVector.clear`, I must change the reset order to reset element writer first to pass the test: ```scala override def reset(): Unit = { +

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Merged build finished. Test PASSed.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3218/

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21312 **[Test build #90621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90621/testReport)** for PR 21312 at commit

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3216/

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Merged build finished. Test PASSed.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21312 **[Test build #90618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90618/testReport)** for PR 21312 at commit

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21312 It looks like the `ListVector` also needs `setLastSet` to be called with 0, which is only in `ListVector`. This is fine though, since `ListVector` is the only vector extending

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21312 Ok. I will use manual reset for now and leave a TODO comment.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21312 I'm okay with either way.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21312 @BryanCutler I had the same thought but wondered whether it would be a good idea. If @HyukjinKwon and @icexelloss also agree on a manual reset like this, I'm fine with it.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21312 Agree on both points.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21312 @viirya I looked into it a bit more and calling `clear()` won't cause any problems but it does trigger a reallocation of the vector buffers the next time writing. What do you think about

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21312 @icexelloss It only happens when there are more than one batch in each partition. Existing tests do not hit this condition. That is why the added test here is doing a `repartition`: `df =
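
A minimal sketch of why only a second batch exposes this kind of problem. `ToyArrayWriter` is a hypothetical pure-Python model, not Spark's `ArrayWriter` or the Arrow API, and the stale-offsets mechanism is an assumption chosen for illustration: it shows how state that survives an incomplete reset leaves a single batch correct but corrupts the next one.

```python
class ToyArrayWriter:
    """Hypothetical model of an array column writer (not Spark or Arrow code)."""

    def __init__(self, full_reset):
        self.full_reset = full_reset
        self.values = []    # flattened child elements
        self.offsets = [0]  # row i spans values[offsets[i]:offsets[i + 1]]
        self.count = 0      # rows written in the current batch

    def write(self, row):
        self.values.extend(row)
        self.offsets.append(len(self.values))
        self.count += 1

    def finish(self):
        # A reader decodes the first `count` rows from the offsets.
        return [self.values[self.offsets[i]:self.offsets[i + 1]]
                for i in range(self.count)]

    def reset(self):
        self.count = 0
        self.values = []
        if self.full_reset:
            self.offsets = [0]  # the incomplete reset leaves old offsets behind


def write_batches(writer, batches):
    """Write each batch through the same writer, resetting between batches."""
    out = []
    for batch in batches:
        for row in batch:
            writer.write(row)
        out.append(writer.finish())
        writer.reset()
    return out


batches = [[[1, 2], [3]], [[4], [5, 6]]]
good = write_batches(ToyArrayWriter(full_reset=True), batches)
bad = write_batches(ToyArrayWriter(full_reset=False), batches)
```

In this model the first batch comes out identical in both cases, while the incompletely reset writer decodes the second batch with the first batch's row boundaries. A test that produces only one batch per partition would therefore pass either way.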

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21312 @viirya Thanks for catching this! I think we have many tests that exercise the array types. I am curious why this is not caught by existing tests, e.g:

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-14 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/21312 Thanks for catching this @viirya! Looks good from a first glance, but my only concern is that `clear()` will release the vector buffers, where `reset()` just zeros them out. Let me look into
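
The distinction drawn here between `clear()` releasing buffers and `reset()` zeroing them, and the reallocation on the next write mentioned elsewhere in the thread, can be illustrated with a small model. `ToyBuffer` and `allocations_after` are hypothetical names; this is a pure-Python sketch of the trade-off, not the Arrow vector API.

```python
class ToyBuffer:
    """Hypothetical stand-in for a vector buffer; this is not the Arrow API."""

    def __init__(self, capacity):
        self.data = bytearray(capacity)
        self.allocations = 1  # counts how often backing memory was allocated

    def reset(self):
        # Zero the contents in place and keep the allocation.
        self.data[:] = bytes(len(self.data))

    def clear(self):
        # Release the backing memory entirely.
        self.data = bytearray(0)

    def write(self, index, value):
        if index >= len(self.data):
            # Not enough capacity: allocate a larger buffer and copy over.
            grown = bytearray(max(index + 1, 2 * len(self.data)))
            grown[:len(self.data)] = self.data
            self.data = grown
            self.allocations += 1
        self.data[index] = value


def allocations_after(recycle):
    """Write once, recycle the buffer, write again; return the allocation count."""
    buf = ToyBuffer(capacity=4)
    buf.write(0, 7)
    recycle(buf)
    buf.write(0, 9)
    return buf.allocations


reset_allocs = allocations_after(ToyBuffer.reset)
clear_allocs = allocations_after(ToyBuffer.clear)
```

In this model both strategies leave the buffer holding correct data after the second write; the difference is that the `clear` path pays for one extra allocation per batch, which is the cost being weighed in the discussion.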

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Merged build finished. Test PASSed.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90547/ Test PASSed.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21312 **[Test build #90547 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90547/testReport)** for PR 21312 at commit

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Merged build finished. Test PASSed.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90544/ Test PASSed.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21312 **[Test build #90544 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90544/testReport)** for PR 21312 at commit

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-12 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21312 Thanks @HyukjinKwon

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Merged build finished. Test PASSed.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3173/

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21312 **[Test build #90547 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90547/testReport)** for PR 21312 at commit

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-12 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21312 cc @HyukjinKwon @BryanCutler

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3170/

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21312 Merged build finished. Test PASSed.

[GitHub] spark issue #21312: [SPARK-24259][SQL] ArrayWriter for Arrow produces wrong ...

2018-05-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21312 **[Test build #90544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90544/testReport)** for PR 21312 at commit