[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-05-11 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20280 closing now, will revisit for Spark 3.0 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-05-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20280 holding off is fine; however, I am less sure about the configuration if that's not something you guys feel strongly. --- -

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-05-01 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20280 @HyukjinKwon , I was also thinking about holding off on this until 3.0.0 and then make a clean switch. What do you think about that @holdenk ? ---

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20280 Probably that'd work but also it'd be trickier to add / remove that configuration. Another similar option maybe just close this for now and target this for 3.0.0 since we already started to

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-26 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/20280 I'm worried that people might have two rows with different meanings but the same type and their application will start producing garbage #s. I think a lot of people go from RDDs of Rows to DFs in

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-23 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20280 Thanks @HyukjinKwon and @felixcheung , I'm a bit worried too that this might break someones code, but it doesn't affect `createDataFrame` from `Row`s, it's only when the Row is serialized like

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20280 @BryanCutler, mind if I ask to clarify what happens for end-to-end cases in the PR description (like before & after with explaining the reasons)? the change looks small but possibly a breaking

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20280 BTW, I believe it's not so easy to pass a configuration from a very quick look because the exception usually would be thrown in a Python worker process. ---

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20280 If the renaming scenario works in most of cases as expected, I think it'd be worthwhile to have a configuration; however, the previous behaviour looks actually odd because it's going to work

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-21 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20280 I'm kinda worry the example you give above is actually fairly common - construct with kwargs, and then (re-)name the columns. perhaps worthwhile to consider a config switch? ---

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20280 Right. Will triple check for sure but I am with you for now. Yup, something in the migration guide makes much more sense to me too. ---

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20280 oops, I missed this. will take a look shortly. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-20 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20280 @HyukjinKwon @holdenk and @MrBago have any thoughts on moving forward with this change? --- - To unsubscribe, e-mail:

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-18 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20280 Also, this will cause a breaking change if `Row`s are defined with kwargs and schema changes field names, like this: ``` data = [Row(key=i, value=str(i)) for i in range(100)]

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-18 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20280 > I think we should raise an error if __from_dict__ is set and the user tries to index using a position or a slice. I'd also like to follow up with another PR to address some of the

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20280 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89530/ Test PASSed. ---

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20280 **[Test build #89530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89530/testReport)** for PR 20280 at commit

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20280 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-18 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20280 Let me restate what I think the intended behavior of Row is: If a `Row` is made from kwargs, then the order of the fields can not be relied upon and whenever accessing data, it must be

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-18 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20280 Looking at this again, I'm back to thinking this is the right fix. Based on #14469, if the `Row` objects were made with named arguments, then the intent is for elements to be looked up by

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20280 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2452/

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20280 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20280 **[Test build #89530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89530/testReport)** for PR 20280 at commit

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-13 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/20280 Awesome, looking forward to the update. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-10 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20280 Hey @holdenk , yeah I've been meaning to circle back to this and get some kind of resolution. I'll try to take another look later this week. ---

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-04-06 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/20280 Hey @BryanCutler is this still on your radar? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20280 Thanks @HyukjinKwon and @MrBago for reviewing. After thinking about this some more, I don't think this is the right solution. Like @HyukjinKwon pointed out, the supplied schema names should

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20280 ^ Yup, let's leave the performance issue out. I think we might have to raise an error too but it's kind of a radical change. As a note, sorted fields are documented:

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/20280 BTW the performance issue is orthogonal to the serialization issue raised in this jira/PR. Maybe we should avoid scope creep in this thread. ---

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20280 Ahh. @zero323 too if you are available because I think we had a talk about this somewhere long time ago. Yea, I am aware of the issue itself. Will take a look soon. ---

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread MrBago
Github user MrBago commented on the issue: https://github.com/apache/spark/pull/20280 I think we should raise an error if `__from_dict__` is set and the user tries to index using a position or a slice. Indexing by field name takes the same code path for Rows that are and are

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20280 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86205/ Test PASSed. ---

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20280 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20280 **[Test build #86205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86205/testReport)** for PR 20280 at commit

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20280 **[Test build #86205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86205/testReport)** for PR 20280 at commit

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20280 **[Test build #86204 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86204/testReport)** for PR 20280 at commit

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20280 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86204/ Test FAILed. ---

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20280 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20280 **[Test build #86204 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86204/testReport)** for PR 20280 at commit

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20280 **[Test build #86193 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86193/testReport)** for PR 20280 at commit

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20280 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86193/ Test PASSed. ---

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20280 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20280 @MrBago @HyukjinKwon I think the above behavior of the `Row` class is a little screwy, but at least this fixes it to be more consistent. I'm not sure if there is a way to rectify the two

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20280 After looking into this, it seems like the behavior of the `Row` class is as follows: If a `Row` is made from kwargs, then the order of the fields can not be relied upon and whenever

[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-01-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20280 **[Test build #86193 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86193/testReport)** for PR 20280 at commit