[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #98629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98629/testReport)** for PR 22275 at commit [`7dc92c8`](https://github.com/apache/spark/commit/7dc92c8d0dca69e254088fd6e1f3e15da1f90fbe). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #98624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98624/testReport)** for PR 22275 at commit [`725cd47`](https://github.com/apache/spark/commit/725cd4725004b9db1281e21de961679dec359e48).
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4866/
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Merged build finished. Test PASSed.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98538/
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #98538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98538/testReport)** for PR 22275 at commit [`bf2feec`](https://github.com/apache/spark/commit/bf2feec2ef023177d72ac1137dbd1b3a02eb9a89).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4806/
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #98538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98538/testReport)** for PR 22275 at commit [`bf2feec`](https://github.com/apache/spark/commit/bf2feec2ef023177d72ac1137dbd1b3a02eb9a89).
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98285/
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #98285 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98285/testReport)** for PR 22275 at commit [`6457e42`](https://github.com/apache/spark/commit/6457e420e3b8366c1373e7adb0bf56df03b9cc19).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98284/
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #98284 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98284/testReport)** for PR 22275 at commit [`7d19977`](https://github.com/apache/spark/commit/7d1997738c8c76f15068d306c858d6127733db7d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4644/
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4643/
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #98285 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98285/testReport)** for PR 22275 at commit [`6457e42`](https://github.com/apache/spark/commit/6457e420e3b8366c1373e7adb0bf56df03b9cc19).
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22275 Apologies for the delay in circling back to this. I reorganized a little to simplify and expanded the comments to hopefully better describe the code. A quick summary of the changes: I changed `ArrowStreamSerializer` so that it no longer holds any state, since that seemed to complicate things. Instead of saving the batch order indices, they are now loaded on the last iteration of `load_stream`, and this logic was put in a special serializer, `ArrowCollectSerializer`, so that it is clear where it is used. I also consolidated all the batch ordering calls within `_collectAsArrow` so that the whole process is easier to follow.
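The design described in the comment above can be illustrated with a small, self-contained sketch. This is not the code from the PR: the wire format (length-prefixed payloads, a `-1` end marker, then a count followed by the order indices), the class name `CollectSerializerSketch`, and the helper `collect_in_order` are all assumptions made up for illustration, and plain bytes stand in for Arrow record batches. It only shows the idea of a serializer that keeps no state and yields the batch order as the final item of `load_stream`, with the reordering done in one place by the caller.

```python
import io
import struct


def write_int(value, stream):
    """Write a 4-byte big-endian signed int (hypothetical wire format)."""
    stream.write(struct.pack("!i", value))


def read_int(stream):
    """Read a 4-byte big-endian signed int."""
    return struct.unpack("!i", stream.read(4))[0]


class CollectSerializerSketch:
    """Illustrative stand-in for a collect serializer; holds no state."""

    def load_stream(self, stream):
        # Yield each batch payload in the order it arrived.
        while True:
            length = read_int(stream)
            if length == -1:  # assumed end-of-batches marker
                break
            yield stream.read(length)
        # Last iteration: read the batch order indices and yield them as a list.
        num = read_int(stream)
        yield [read_int(stream) for _ in range(num)]


def collect_in_order(stream):
    """All ordering logic in one place, analogous in spirit to `_collectAsArrow`."""
    results = list(CollectSerializerSketch().load_stream(stream))
    batches, batch_order = results[:-1], results[-1]
    # Assumption: batch_order[i] is the arrival position of the i-th batch
    # in the correct order.
    return [batches[i] for i in batch_order]


def demo():
    buf = io.BytesIO()
    # Batches arrive out of order: partition 1 finished before partition 0.
    for payload in (b"p1-b0", b"p0-b0", b"p0-b1"):
        write_int(len(payload), buf)
        buf.write(payload)
    write_int(-1, buf)
    order = [1, 2, 0]
    write_int(len(order), buf)
    for index in order:
        write_int(index, buf)
    buf.seek(0)
    return collect_in_order(buf)
```

Because the order list travels at the end of the stream, the serializer object itself never needs to remember anything between calls; the caller separates the last yielded item from the batches.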
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #98284 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98284/testReport)** for PR 22275 at commit [`7d19977`](https://github.com/apache/spark/commit/7d1997738c8c76f15068d306c858d6127733db7d).
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98202/
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #98202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98202/testReport)** for PR 22275 at commit [`d6fefee`](https://github.com/apache/spark/commit/d6fefee68c30aa579b345c32d9f00b32bf9a505b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class ArrowStreamSerializer(Serializer):`
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4587/
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #98202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98202/testReport)** for PR 22275 at commit [`d6fefee`](https://github.com/apache/spark/commit/d6fefee68c30aa579b345c32d9f00b32bf9a505b).
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22275 retest this please
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22275 Thanks for the review, @holdenk! I haven't had time to follow up yet, but I'll take a look through this and see what I can do to make things clearer.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/22275 Got it. So the size of each batch could grow.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22275
> generally, is this going to limit how much data to pass along because of the bit length of the index?

The index passed to Python is the RecordBatch index, not an element index, so it would limit the number of batches to `Int.MaxValue`. I wouldn't expect that to be likely, and you can always set the number of batches to 1 per partition, so that would be the limiting factor then. WDYT @felixcheung?
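To put rough numbers on the point above, here is a small arithmetic sketch. The figures are assumptions chosen for illustration (a 32-bit signed batch index and 10,000 rows per batch); the helper names are hypothetical and not part of Spark. It shows that a per-batch index caps the batch count rather than the row count, so larger batches raise the ceiling on total rows.

```python
# Largest value of a 32-bit signed integer, i.e. the JVM's Int.MaxValue.
INT_MAX = 2**31 - 1


def batches_needed(total_rows, rows_per_batch):
    """Number of batches required to hold total_rows (ceiling division)."""
    return -(-total_rows // rows_per_batch)


def max_rows(rows_per_batch, max_batches=INT_MAX):
    """Upper bound on rows when only the batch count is capped."""
    return rows_per_batch * max_batches


# A trillion rows at an assumed 10,000 rows per batch needs 100 million
# batches, which is far below the cap on the batch index.
trillion_row_batches = batches_needed(10**12, 10_000)
fits_in_index = trillion_row_batches <= INT_MAX
```

Doubling `rows_per_batch` halves the number of batches needed, which is why the batch index is unlikely to be the practical limit.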
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/22275 Sure, I'll take a look on Friday if it's not urgent.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22275 @holdenk I was wondering if you had any thoughts on this? Thanks!
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22275 Thanks, @viirya! What are your thoughts, @HyukjinKwon? I consolidated the batch order serializer from before into `ArrowStreamSerializer` to simplify things a little.
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95441/
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #95441 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95441/testReport)** for PR 22275 at commit [`d6fefee`](https://github.com/apache/spark/commit/d6fefee68c30aa579b345c32d9f00b32bf9a505b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class ArrowStreamSerializer(Serializer):`
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22275 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2686/
[GitHub] spark issue #22275: [SPARK-25274][PYTHON][SQL] In toPandas with Arrow send o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22275 **[Test build #95441 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95441/testReport)** for PR 22275 at commit [`d6fefee`](https://github.com/apache/spark/commit/d6fefee68c30aa579b345c32d9f00b32bf9a505b).