[jira] [Commented] (SPARK-19468) Dataset slow because of unnecessary shuffles

2019-05-22 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845603#comment-16845603 ] Josh Rosen commented on SPARK-19468: Chiming in to add a strong +1 here, since this seems like it

[jira] [Commented] (SPARK-19468) Dataset slow because of unnecessary shuffles

2019-02-05 Thread Mitesh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761309#comment-16761309 ] Mitesh commented on SPARK-19468: Also this may be a dupe of SPARK-19981 > Dataset slow because of

[jira] [Commented] (SPARK-19468) Dataset slow because of unnecessary shuffles

2019-02-05 Thread Mitesh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761262#comment-16761262 ] Mitesh commented on SPARK-19468: Also curious why in the fix for SPARK-19931, it was only fixed for

[jira] [Commented] (SPARK-19468) Dataset slow because of unnecessary shuffles

2019-02-05 Thread Mitesh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761207#comment-16761207 ] Mitesh commented on SPARK-19468: +1 I'm seeing the same behavior. It seems like any physical operator

[jira] [Commented] (SPARK-19468) Dataset slow because of unnecessary shuffles

2017-03-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15906827#comment-15906827 ] Liang-Chi Hsieh commented on SPARK-19468: We need a holistic solution for this issue. I

[jira] [Commented] (SPARK-19468) Dataset slow because of unnecessary shuffles

2017-03-06 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896941#comment-15896941 ] Apache Spark commented on SPARK-19468: User 'viirya' has created a pull request for this issue:

[jira] [Commented] (SPARK-19468) Dataset slow because of unnecessary shuffles

2017-03-01 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891658#comment-15891658 ] Kazuaki Ishizaki commented on SPARK-19468: Interesting. For {{val joined1 = ds1.joinWith(ds2,

[jira] [Commented] (SPARK-19468) Dataset slow because of unnecessary shuffles

2017-02-06 Thread koert kuipers (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854242#comment-15854242 ] koert kuipers commented on SPARK-19468: so to summarize: RDD does what we would expect, DataFrame

[jira] [Commented] (SPARK-19468) Dataset slow because of unnecessary shuffles

2017-02-06 Thread koert kuipers (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854152#comment-15854152 ] koert kuipers commented on SPARK-19468: inserting unnecessary shuffles makes things very slow. and

[jira] [Commented] (SPARK-19468) Dataset slow because of unnecessary shuffles

2017-02-06 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853809#comment-15853809 ] Sean Owen commented on SPARK-19468: I am unclear whether this is a bug report. You're saying one thing