[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 I just opened another PR for adding a configuration - https://github.com/apache/spark/pull/20678. Let me close this one. ---

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-16 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20567 Thanks! Happy Lunar New Year! ---

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 I just opened https://github.com/apache/spark/pull/20625. I believe this is the smallest and simplest change. I will repurpose this PR to add a configuration later. ---

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 Yup, I will. Sorry for delaying it. I was trying to make the fix as small as possible. Let me just open it in the simplest way. ---

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-15 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20567 @HyukjinKwon Will you submit a fix for the binary type today? We are very close to RC4. This is kind of urgent if we still want to block it in the Spark 2.3.0 release. ---

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 Sure. ---

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-15 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20567 ^ this change LGTM. Can we make a PR for this change only and leave the fallback part for Spark 2.4? ---

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 > The binary type bug sounds like a blocker, can we just fix it surgically by checking the supported data types before going to the arrow optimization path? For now we can stick with that the

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-15 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20567 The binary type bug sounds like a blocker, can we just fix it surgically by checking the supported data types before going to the arrow optimization path? For now we can stick with that the

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 The root cause is that the Arrow conversion on the Python side interprets binaries as `str`, and here I avoided this by checking whether the type is one we support. This is the most trivial
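The check described in this comment (verify the column types before entering the Arrow path) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function names, the use of plain strings for type names, and the contents of `ARROW_SUPPORTED_TYPES` are all assumptions made for the example and do not reflect Spark's actual list of supported types.

```python
# Hypothetical sketch: type names are plain strings, not pyspark.sql.types objects,
# and this set is illustrative only (it is NOT Spark's real supported-type list).
ARROW_SUPPORTED_TYPES = {
    "boolean", "int", "bigint", "float", "double", "string", "date", "timestamp",
}


def unsupported_fields(schema):
    """Return the names of fields whose type is not in the supported set.

    `schema` is a list of (field_name, type_name) pairs.
    """
    return [name for name, type_name in schema
            if type_name not in ARROW_SUPPORTED_TYPES]


def check_arrow_schema(schema):
    """Raise TypeError up front instead of failing (or mis-converting) later."""
    bad = unsupported_fields(schema)
    if bad:
        raise TypeError(
            "Unsupported type in conversion with Arrow for field(s): "
            + ", ".join(bad)
        )
```

With a guard of this shape, a schema such as `[("id", "bigint"), ("payload", "binary")]` would be rejected before any conversion starts, rather than letting the binary column go through an inconsistent conversion.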

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20567 What is the root cause? Do we have a trivial fix to resolve/block it? ---

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 There is one more thing - https://github.com/apache/spark/pull/20567#issuecomment-364639922 We haven't completed binary type support on the Python side yet, but there is a hole here .. ---

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20567 The behavior inconsistency between `toPandas` and `createDataFrame` looks confusing to end users, I have to admit. In the current stage, we are unable to merge the fix for these new

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 I mean the actual change here is small. The diff may look larger here because of the removed `else`. Please check out the diff. It's quite a safe change. ---

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20567 Then, let us wait for the release of Spark 2.3.0. Thanks! ---

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 Just FYI, except for option 3, the complexity of the other options and the PR size will all be similar - https://github.com/apache/spark/pull/20567#issuecomment-364806378 and

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-14 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20567 We are unable to include option 3 in Spark 2.3.0. It is too big to merge at the current stage. We can still do it in 2.3.1. If needed, I am fine to throw a better error message if

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 @gatorsmile and @rxin, The problem here is that `toPandas` just fails on unsupported types later and allows `BinaryType` with inconsistent conversion

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 RC3 is out. This change could be in 2.3.1 if the vote passes, or in 2.3.0 if the vote fails. For the reason above, we can't backport and change anything in the main codes until the release

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-12 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20567 RC3 is out. Just to avoid new regressions that might be introduced in the new PR. ---

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 ^ I am not saying that we should merge it now. I can do the opposite for `createDataFrame` given https://github.com/apache/spark/pull/20567#issuecomment-365100434 . My point is why it should

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-12 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20567 > Is there any specific worry from this change, that might shake the 2.3.0 release specifically? In this way, we can't backport anything. I am surprised that this PR is considered to be excluded

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-12 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20567 > I thought this is another step. We need to make them consistent first. Based on the comments from @icexelloss , I do not think we should blindly switch back to the original version. At

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20567 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87355/ Test PASSed. ---

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20567 Merged build finished. Test PASSed. ---

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20567 **[Test build #87355 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87355/testReport)** for PR 20567 at commit

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-12 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20567 The feedback is partially from @rxin. Maybe he can provide more input later. ---

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 > This issue does not cause the regression since spark.sql.execution.arrow.enabled is off by default. It doesn't block the release but we can still backport it because it fixes an

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-12 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20567 This issue does not cause the regression since `spark.sql.execution.arrow.enabled` is off by default. We need to make it configurable before merging it. Merging this to 2.3.0 might cause the
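The idea of a configurable fallback raised here can be sketched in a few lines. This is a hedged illustration only: `to_pandas_with_fallback`, its two callable parameters, and the `fallback_enabled` flag are hypothetical names invented for this example, and the sketch does not use or reproduce any actual Spark configuration key or code path.

```python
import warnings


def to_pandas_with_fallback(convert_with_arrow, convert_without_arrow,
                            fallback_enabled=True):
    """Try the Arrow-based conversion first; optionally fall back on failure.

    Both `convert_*` arguments are zero-argument callables (hypothetical
    stand-ins for the optimized and non-optimized conversion paths).
    """
    try:
        return convert_with_arrow()
    except TypeError as exc:
        if not fallback_enabled:
            # Fallback disabled: surface the original error to the caller.
            raise
        warnings.warn(
            "Arrow conversion failed (%s); falling back to the non-Arrow path."
            % exc
        )
        return convert_without_arrow()
```

Keeping the switch as an explicit parameter mirrors the request in the thread that the behavior be configurable: users who prefer a hard failure over a silent slowdown can turn the fallback off.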

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20567 **[Test build #87355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87355/testReport)** for PR 20567 at commit

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20567 Merged build finished. Test PASSed. ---

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20567 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/827/

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 > My proposal is to merge the fix after the 2.3 release. We can still backport it to SPARK 2.3, but it will be available in SPARK 2.3.1. Mind if I ask you to elaborate on why? Want to know why

[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...

2018-02-12 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20567 My proposal is to merge the fix after the 2.3 release. We can still backport it to SPARK 2.3, but it will be available in SPARK 2.3.1. ---