[GitHub] spark issue #19505: [WIP][SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().ap...

2017-10-19 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19505 Sure, I'd close this. @icexelloss Of course you can open a separate JIRA and another PR. Thanks! --- - To unsubscribe,

[GitHub] spark issue #19505: [WIP][SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().ap...

2017-10-19 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19505 @ueshin Maybe close this PR?

[GitHub] spark issue #19505: [WIP][SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().ap...

2017-10-19 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/19505 @viirya @cloud-fan I updated my original summary. I think it answers the `group_transform` question. I also added more examples for each type. @HyukjinKwon @viirya I agree we can move this to

[GitHub] spark issue #19505: [WIP][SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().ap...

2017-10-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19505 The `group_transform` UDFs look a bit weird to me. @icexelloss Can you explain the use case for them?

[GitHub] spark issue #19505: [WIP][SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().ap...

2017-10-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19505 +1 for a separate JIRA to clarify the proposal, and +0 for 3 out of those three, too.

[GitHub] spark issue #19505: [WIP][SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().ap...

2017-10-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19505 @icexelloss The summary and proposal 3 look great. To prevent confusion, can you also put the usage of each function type in proposal 3? E.g., group_map is for `groupby().apply()`, transform is

[GitHub] spark issue #19505: [WIP][SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().ap...

2017-10-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19505 Btw, I think the scope of this change is more than just a follow-up. Should we create another JIRA for it?

[GitHub] spark issue #19505: [WIP][SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().ap...

2017-10-19 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/19505 @cloud-fan asked: "What's the difference between transform and group_transform? Seems we don't need to care about it both in usage and implementation." My answer is:

[GitHub] spark issue #19505: [WIP][SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().ap...

2017-10-19 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/19505 Here is a summary of the current proposals from some offline discussion: I. Use only `pandas_udf` -- The main issues with this approach as a few people

[GitHub] spark issue #19505: [WIP][SPARK-20396][SQL][PySpark][FOLLOW-UP] groupby().ap...

2017-10-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19505 I meant to ask if others agree with the current change as I could not see the ongoing discussion at that time.