[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...

2016-07-05 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/14039 I haven't got anything more concrete to offer at this time than the descriptions in the relevant JIRAs, but I do have this running in production with 1.6, and it does work. Essentially, you

2016-07-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14039 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

2016-07-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14039 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61738/ Test PASSed.

2016-07-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14039 **[Test build #61738 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61738/consoleFull)** for PR 14039 at commit

2016-07-04 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/14039 @markhamstra Thanks for the comment. I think the reuse of fragments depends heavily on the user's queries, the Catalyst optimizer, cluster resources... Reusing `ShuffledRowRDD` shuffle data in a single job

2016-07-04 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/14039 @srowen My understanding is that shuffle data from stages may be shared within a job. However, once the job has finished, the current implementation can no longer reuse that shuffle data. So, we can

2016-07-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14039 **[Test build #61738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61738/consoleFull)** for PR 14039 at commit

2016-07-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14039 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61717/ Test PASSed.

2016-07-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14039 Merged build finished. Test PASSed.

2016-07-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14039 **[Test build #61717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61717/consoleFull)** for PR 14039 at commit

2016-07-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14039 **[Test build #61717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61717/consoleFull)** for PR 14039 at commit

2016-07-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14039 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61715/ Test PASSed.

2016-07-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14039 Merged build finished. Test PASSed.

2016-07-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14039 **[Test build #61715 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61715/consoleFull)** for PR 14039 at commit

2016-07-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14039 **[Test build #61715 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61715/consoleFull)** for PR 14039 at commit

2016-07-04 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/14039 Actually, they can be reused -- not in Spark as distributed, but it is an open question whether reusing shuffle files within Spark SQL is something that we should be doing and want to support.

2016-07-04 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/14039 @srowen Thanks for the comment. Yeah, I noticed that, and I'm fixing this to remove only shuffle files generated by `ShuffleExchange`.

2016-07-04 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14039 I don't think we do this in general. The shuffle files are supposed to remain to potentially be reused if the stage needs to be re-executed.
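The disagreement in this thread is about when shuffle output may safely be discarded. The following is a deliberately simplified toy model in Python, not Spark's implementation: `ToyShuffleStore`, `run_job`, `map_side`, and the `cleanup_after_job` flag are hypothetical names introduced only for illustration. It just counts how often a map stage has to be computed when shuffle files are kept, versus when they are dropped as soon as each job ends.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShuffleStore:
    """Hypothetical toy model of shuffle output kept on local disk.

    It does not mirror Spark's internal classes; it only tracks whether
    a map stage must be recomputed.
    """
    cleanup_after_job: bool = False
    files: dict = field(default_factory=dict)  # shuffle_id -> map output
    map_stage_runs: int = 0                    # times a map stage was computed

    def get_or_compute(self, shuffle_id, compute):
        # Reuse the shuffle output if it is still present; otherwise
        # (re)compute the map stage and store its output.
        if shuffle_id not in self.files:
            self.map_stage_runs += 1
            self.files[shuffle_id] = compute()
        return self.files[shuffle_id]

    def run_job(self, shuffle_ids, compute):
        # One job may read the same shuffle several times (e.g. a self-join),
        # so within a single job the output is computed at most once.
        results = [self.get_or_compute(sid, compute) for sid in shuffle_ids]
        if self.cleanup_after_job:
            # Simplified version of the policy discussed in this PR:
            # drop all shuffle output as soon as the job finishes.
            self.files.clear()
        return results


def map_side():
    # Hypothetical map-stage computation producing (key, value) pairs.
    return [(k, k * k) for k in range(3)]


kept = ToyShuffleStore(cleanup_after_job=False)
kept.run_job([0, 0], map_side)  # one job reading shuffle 0 twice
kept.run_job([0], map_side)     # a later job over the same shuffle
print(kept.map_stage_runs)      # 1

eager = ToyShuffleStore(cleanup_after_job=True)
eager.run_job([0, 0], map_side)
eager.run_job([0], map_side)
print(eager.map_stage_runs)     # 2
```

In this toy model, both policies share shuffle output within a single job. Only when the files are kept can a later job skip the map stage; with eager cleanup, that stage has to be recomputed.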

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14039 Merged build finished. Test FAILed.

2016-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14039 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61702/ Test FAILed.

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14039 **[Test build #61702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61702/consoleFull)** for PR 14039 at commit

2016-07-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14039 **[Test build #61702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61702/consoleFull)** for PR 14039 at commit