Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
I just opened another PR for adding a configuration -
https://github.com/apache/spark/pull/20678. Let me close this one.
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/20567
Thanks! Happy Lunar New Year!
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands,
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
I just opened https://github.com/apache/spark/pull/20625. I believe this is
the smallest and simplest change ..
Will turn this PR to add a configuration later.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
Yup, I will. Sorry for the delay. I was trying to make the fix as small as
I can. Let me just open it in the simplest way.
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/20567
@HyukjinKwon Will you submit a fix for the binary type today? We are very
close to RC4. This is kind of urgent if we still want to block it in the Spark
2.3.0 release.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
Sure.
---
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20567
^ this change LGTM. Can we make a PR for this change only and leave the
fallback part for Spark 2.4?
---
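For illustration only, the "fallback part" mentioned above could look roughly like the sketch below. It is not the code from this PR or its follow-ups: `convert`, `arrow_enabled`, `fallback_enabled`, and the two conversion callables are hypothetical names, and plain Python lists stand in for the real data.

```python
import warnings


def convert(rows, arrow_enabled, fallback_enabled, arrow_path, plain_path):
    """Try the Arrow path first; optionally fall back to the plain path.

    All parameter names are hypothetical. `arrow_path` and `plain_path`
    are callables that stand in for the two conversion routes.
    """
    if not arrow_enabled:
        # Arrow optimization is off: always use the plain path.
        return plain_path(rows)
    try:
        return arrow_path(rows)
    except TypeError as exc:
        if not fallback_enabled:
            # No fallback allowed: surface the original error.
            raise
        warnings.warn(f"Arrow path failed ({exc}); falling back to the non-Arrow path.")
        return plain_path(rows)


def failing_arrow_path(rows):
    # Stand-in for a conversion that rejects an unsupported column type.
    raise TypeError("unsupported type: binary")


def plain_path(rows):
    # Stand-in for the non-Arrow conversion.
    return list(rows)
```

Whether the fallback is silent, emits a warning, or is disabled by default is exactly the kind of behavior a configuration would have to pin down.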
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
> The binary type bug sounds like a blocker, can we just fix it surgically
> by checking the supported data types before going to the arrow optimization
> path? For now we can stick with that the
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20567
The binary type bug sounds like a blocker, can we just fix it surgically by
checking the supported data types before going to the arrow optimization path?
For now we can stick with that the
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
The root cause is that the Arrow conversion on the Python side interprets
binaries as `str`. Here I avoided this by checking whether the type is one we
support.
This is the most trivial
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/20567
What is the root cause? Do we have a trivial fix to resolve/block it?
---
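As a rough sketch of the approach discussed in this thread (check every column type before taking the Arrow path, and fail early otherwise), something like the following could work. It is not the code from the PR: plain strings stand in for Spark SQL data types, and `ARROW_SUPPORTED_TYPES`, `unsupported_columns`, and `require_arrow_support` are hypothetical names with an illustrative, assumed list of supported types.

```python
# Hypothetical sketch: plain strings stand in for Spark SQL data types,
# and this set is an illustrative assumption, not Spark's real list.
ARROW_SUPPORTED_TYPES = {
    "boolean", "int", "bigint", "float", "double",
    "string", "date", "timestamp",
}


def unsupported_columns(schema):
    """Return the (name, type) pairs that the Arrow path cannot handle."""
    return [(name, dtype) for name, dtype in schema
            if dtype not in ARROW_SUPPORTED_TYPES]


def require_arrow_support(schema):
    """Raise before any conversion starts if a column type is unsupported."""
    bad = unsupported_columns(schema)
    if bad:
        details = ", ".join(f"{name}: {dtype}" for name, dtype in bad)
        raise TypeError(f"Unsupported type(s) for Arrow conversion: {details}")


# Example: a binary column is rejected up front instead of being
# silently converted in an inconsistent way later on.
schema = [("id", "bigint"), ("payload", "binary")]
try:
    require_arrow_support(schema)
except TypeError as exc:
    print(exc)
```

Checking up front keeps the change small: the conversion code itself is untouched, and an unsupported type produces a clear error rather than a wrong result.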
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
There is one more thing -
https://github.com/apache/spark/pull/20567#issuecomment-364639922 We haven't
completed binary type support on the Python side yet, but there is a hole here ..
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/20567
The behavior inconsistency between `toPandas` and `createDataFrame` looks
confusing to end users, I have to admit.
In the current stage, we are unable to merge the fix for these new
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
I mean the actual change here is small. The diff may look larger here
because of the removed `else`. Please check out the diff. It's quite a safe change.
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/20567
Then, let us wait for the release of Spark 2.3.0. Thanks!
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
Just FYI, except option 3., the complexity in other options and the PR size
will be all similar -
https://github.com/apache/spark/pull/20567#issuecomment-364806378 and
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/20567
We are unable to include option 3 in Spark 2.3.0. It is too big to merge
at the current stage. We can still do it in 2.3.1.
If needed, I am fine to throw a better error message if
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
@gatorsmile and @rxin,
The problem here is that `toPandas` just fails on unsupported types later
and allows `BinaryType` with inconsistent conversion
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
RC3 is out. This change could be in 2.3.1 if the vote passes, or in 2.3.0 if
the vote fails. For the reason above, we can't backport and change anything in
the main codes until the release
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/20567
RC3 is out. Just to avoid new regressions that might be introduced in the
new PR.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
^ I am not saying that we should merge it now. I can do the opposite for
`createDataFrame` given
https://github.com/apache/spark/pull/20567#issuecomment-365100434 . My point is
why it should
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/20567
> Is there any specific worry from this change, that might shake the 2.3.0
release specifically? In this way, we can't backport anything. I am surprised
that this PR is considered to be excluded
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/20567
> I thought this is another step. We need to make them consistent first.
Based on the comments from @icexelloss , I do not think we should blindly
switch back to the original version. At
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20567
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87355/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20567
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20567
**[Test build #87355 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87355/testReport)**
for PR 20567 at commit
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/20567
The feedback is partially from @rxin. Maybe he can provide more input
later.
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
> This issue does not cause the regression since
> `spark.sql.execution.arrow.enabled` is off by default.
It doesn't block the release but we can still backport it because it fixes
an
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/20567
This issue does not cause the regression since
`spark.sql.execution.arrow.enabled` is off by default. We need to make it
configurable before merging it. Merging this to 2.3.0 might cause the
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20567
**[Test build #87355 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87355/testReport)**
for PR 20567 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20567
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20567
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/827/
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20567
> My proposal is to merge the fix after the 2.3 release. We can still
> backport it to SPARK 2.3, but it will be available in SPARK 2.3.1.
Mind if I ask to elaborate why? Want to know why
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/20567
My proposal is to merge the fix after the 2.3 release. We can still
backport it to SPARK 2.3, but it will be available in SPARK 2.3.1.
---