[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22140 We are very conservative about backporting PRs to released versions. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22140

> Thanks for your understanding. Normally, we are very conservative to introduce any potential behavior change to the released version.

Yes, I know. At the time it seemed to me that the change only failed fast rather than later and improved the error message, but it is best to be safe. Thanks!
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22140

@BryanCutler @HyukjinKwon Thanks for your understanding. Normally, we are very conservative about introducing any potential behavior change into a released version. I just reverted it from branch-2.3. Thanks!
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22140

> Can we just simply take this out from branch-2.3?

Thanks @HyukjinKwon, that is fine with me. What do you think, @gatorsmile?
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22140

Yea, actually I wouldn't backport this to branch-2.3 at least, since the release is very close. It looks like a bug to me as well. One nitpick is the case with an RDD operation:

```python
>>> from pyspark.sql import Row
>>> row_class = Row("c1", "c2")
>>> row = row_class(1, 2, 3)
>>> spark.sparkContext.parallelize([row]).map(lambda r: r.c1).collect()
[1]
```

This is really unlikely, and I even wonder if it makes any sense, but there might still be such a case, although creating the namedtuple-like row itself should be disallowed, as fixed here. Can we just simply take this out from branch-2.3?
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22140

@gatorsmile it seemed like a straightforward bug to me. Rows with extra values lead to incorrect output and exceptions when used in `DataFrames`, so it did not seem possible that this would break existing code. For example:

```
In [1]: MyRow = Row('a','b')

In [2]: print(MyRow(1,2,3))
Row(a=1, b=2)

In [3]: spark.createDataFrame([MyRow(1,2,3)])
Out[3]: DataFrame[a: bigint, b: bigint]

In [4]: spark.createDataFrame([MyRow(1,2,3)]).show()
18/09/08 21:55:48 ERROR Executor: Exception in task 2.0 in stage 2.0 (TID 7)
java.lang.IllegalStateException: Input row doesn't have expected number of values required by the schema. 2 fields are required while 3 values are provided.

In [5]: spark.createDataFrame([MyRow(1,2,3)], schema="x: int, y: int").show()
ValueError: Length of object (3) does not match with length of fields (2)
```

Maybe I was too hasty with backporting and this needed some discussion. Do you know of a use case that this change would break?
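For readers without a Spark session at hand, the fail-fast idea described in this comment can be sketched in plain Python. This is only an illustration under stated assumptions: `RowFactory` is a hypothetical stand-in for a custom `Row` class, it returns a `dict` instead of a real `Row`, and the error message wording is made up for the example. It is not the PySpark implementation.

```python
class RowFactory(tuple):
    """Hypothetical stand-in for a custom Row class such as Row('a', 'b')."""

    def __new__(cls, *fields):
        # Store the field names as the tuple's own elements.
        return super().__new__(cls, fields)

    def __call__(self, *values):
        # Fail fast: reject more values than declared fields at creation time,
        # instead of failing later when the row is used in a DataFrame.
        if len(values) > len(self):
            raise ValueError(
                "Can not create row with fields %s, expected %d values but got %s"
                % (tuple(self), len(self), values)
            )
        # A dict is used here only to keep the sketch self-contained.
        return dict(zip(self, values))


MyRow = RowFactory("a", "b")
ok = MyRow(1, 2)

try:
    MyRow(1, 2, 3)
    error = None
except ValueError as exc:
    error = str(exc)
```

With a check like this, the mistake surfaces where the row is built rather than inside a later `show()` call.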
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22140

```
@xuanyuanking Could you please update the document?
```

#22369

Thanks for the reminder, I'll pay attention to this in future work.
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22140

@BryanCutler What is the reason for backporting this PR? This sounds like a behavior change.

@xuanyuanking Could you please update the document?
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22140 Thanks @BryanCutler @HyukjinKwon !
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22140

Merged to master, branch-2.4 and branch-2.3. Thanks @xuanyuanking!
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22140

> yea, to me it looks less sense actually but seems at least working for now:

Good point, I guess it only fails when you supply a schema.
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22140 Merged build finished. Test PASSed.
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22140 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95756/ Test PASSed.
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22140

**[Test build #95756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95756/testReport)** for PR 22140 at commit [`eb3f506`](https://github.com/apache/spark/commit/eb3f506817e6cb99230853ffd5c50e3299527d4b).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22140 **[Test build #95756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95756/testReport)** for PR 22140 at commit [`eb3f506`](https://github.com/apache/spark/commit/eb3f506817e6cb99230853ffd5c50e3299527d4b).
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22140 Merged build finished. Test PASSed.
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22140 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2901/ Test PASSed.
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22140

@BryanCutler, for https://github.com/apache/spark/pull/22140#issuecomment-414802978, yea, to me it actually makes less sense, but it at least seems to work for now:

```python
from pyspark.sql import Row
rowClass = Row("c1", "c2")
spark.createDataFrame([rowClass(1)]).show()
```

```
+---+
| c1|
+---+
|  1|
+---+
```

I think we should consider disallowing it in 3.0.0 given the test above.
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22140

Gentle ping @HyukjinKwon @BryanCutler
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22140

AFAIC, the fix should forbid passing illegal extra values. If there are fewer values than fields, accessing a missing field should raise an `AttributeError`, as the current implementation does, rather than the row being banned here? What do you think :) @HyukjinKwon @BryanCutler Thanks.
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/22140

Does it make any sense to have fewer values than fields? Maybe we should check that they are equal, wdyt @HyukjinKwon?
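The two policies weighed in this exchange (reject only extra values, or require the counts to be equal) can be compared with a small plain-Python sketch. `check_row_arity` and its `strict` flag are hypothetical names introduced for illustration; they are not part of PySpark, and the messages are invented for the example.

```python
def check_row_arity(fields, values, strict=False):
    """Hypothetical helper, not part of PySpark: validate a value count against field names."""
    # Both policies reject more values than fields.
    if len(values) > len(fields):
        raise ValueError(
            "expected at most %d values but got %d" % (len(fields), len(values))
        )
    # Only the strict policy also rejects fewer values than fields.
    if strict and len(values) < len(fields):
        raise ValueError(
            "expected exactly %d values but got %d" % (len(fields), len(values))
        )
    return dict(zip(fields, values))


fields = ("c1", "c2")

# Lenient policy: a short row is accepted and simply lacks the trailing field.
lenient = check_row_arity(fields, (1,))

# Strict policy: the same short row is rejected at creation time.
try:
    check_row_arity(fields, (1,), strict=True)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
```

The lenient variant keeps short rows working, while the strict variant would turn them into an error, which is why it would count as a behavior change.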
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22140 cc @BryanCutler as well since we discussed an issue about this code path before.
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22140 cc @HyukjinKwon
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22140 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94920/ Test PASSed.
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22140 Merged build finished. Test PASSed.
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22140

**[Test build #94920 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94920/testReport)** for PR 22140 at commit [`b8c6522`](https://github.com/apache/spark/commit/b8c6522bccde51584e9878144924fd7b92f8785f).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22140 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2296/ Test PASSed.
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22140 Merged build finished. Test PASSed.
[GitHub] spark issue #22140: [SPARK-25072][PySpark] Forbid extra value for custom Row
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22140 **[Test build #94920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94920/testReport)** for PR 22140 at commit [`b8c6522`](https://github.com/apache/spark/commit/b8c6522bccde51584e9878144924fd7b92f8785f).