[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-27 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/19184 Thanks @mridulm, @jerryshao, @viirya. Closing this PR.

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-27 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19184 Thanks @jerryshao and @mridulm for investigating this further. It is very reasonable. I think we don't need this fix as the spill won't be too frequent in window operations now.

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-27 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/19184 Thanks to @jerryshao for pointing me to SPARK-21595. The tests @rajeshbalamohan ran were against a version that did not include the changes from SPARK-21595; and unfortunately my local repo
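
For context, SPARK-21595 split the window operator's row buffering into two SQL configs: an in-memory threshold and a spill threshold. Below is a minimal sketch of tuning them from a session; the config keys are the ones introduced around SPARK-21595, and the values shown are illustrative assumptions rather than recommendations.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("window-spill-tuning")
  .getOrCreate()

// Rows the window operator keeps in a plain in-memory buffer before handing
// them to the external sorter.
spark.conf.set("spark.sql.windowExec.buffer.in.memory.threshold", 4096)

// Rows the external sorter buffers before it is forced to spill to disk;
// a larger value means fewer, bigger spill files per window partition.
spark.conf.set("spark.sql.windowExec.buffer.spill.threshold", 1024 * 1024)
```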

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-27 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19184 After discussing with @mridulm offline: though the patch here cannot address the issue of `getSortedIterator` - which uses a PriorityQueue - it somehow solves the problem of `getIterator(...)`
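
To illustrate the distinction being drawn here, a simplified Scala sketch follows; `SpillFile` and `open()` are hypothetical stand-ins for Spark's on-disk spill readers, not the actual `UnsafeExternalSorter` code. A priority-queue merge (as in `getSortedIterator`) must hold a reader open on every spill file simultaneously, whereas an unsorted chained iteration (as in `getIterator(...)`) can open spill files one at a time.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a spill file; in Spark this would wrap an on-disk
// reader that holds an open file handle.
final case class SpillFile(records: Seq[Long]) {
  def open(): Iterator[Long] = records.iterator // pretend a file handle is opened here
}

// Sorted merge: every spill file is opened up front so the priority queue can
// always compare the head record of each stream. Open handles == number of spills.
def mergedSorted(spills: Seq[SpillFile]): Iterator[Long] = {
  implicit val minFirst: Ordering[(Long, Iterator[Long])] =
    Ordering.by[(Long, Iterator[Long]), Long](_._1).reverse
  val heads = mutable.PriorityQueue.empty[(Long, Iterator[Long])]
  spills.foreach { s =>
    val it = s.open() // one handle per spill file, held until that stream is drained
    if (it.hasNext) heads.enqueue((it.next(), it))
  }
  new Iterator[Long] {
    def hasNext: Boolean = heads.nonEmpty
    def next(): Long = {
      val (value, it) = heads.dequeue()
      if (it.hasNext) heads.enqueue((it.next(), it))
      value
    }
  }
}

// Unsorted iteration: spill files are chained lazily, so only one handle needs
// to be open at any given time.
def chained(spills: Seq[SpillFile]): Iterator[Long] =
  spills.iterator.flatMap(_.open())

val spills = Seq(SpillFile(Seq(1L, 4L, 7L)), SpillFile(Seq(2L, 5L)), SpillFile(Seq(3L, 6L)))
mergedSorted(spills).toList // List(1, 2, 3, 4, 5, 6, 7)
chained(spills).toList      // List(1, 4, 7, 2, 5, 3, 6)
```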

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-26 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/19184 @jerryshao Actually the second half of your comment does not apply in this case. The PR is not targeting the merge sort, but is relevant when iterating over all tuples.

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-26 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19184 Hi @mridulm, sorry for the late response. I agree with you that the scenario here is different from shuffle, but the underlying structure and the way data is spilled are the same, so the

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-24 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/19184 @viirya @jerryshao To take a step back here: this specific issue is applicable to window operations and not to shuffle. In shuffle, you have a much larger volume of data written per
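
As a concrete illustration of the kind of window workload being discussed (this sketch is illustrative only, not taken from the JIRA or from the TPC-DS queries mentioned in this thread): a window function evaluated over a very large partition has to buffer every row of that partition, so a low row-count spill threshold turns one partition into thousands of small spill files, unlike a shuffle spill, which is driven by memory pressure and writes far fewer, larger files.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("window-spill-demo").getOrCreate()
import spark.implicits._

// Every row shares the same key, so the window operator must buffer (and
// potentially spill) all 10M rows for a single window partition.
val df = spark.range(10000000L).select(lit(1).as("key"), $"id".as("value"))

val w = Window.partitionBy($"key").orderBy($"value")
df.select($"key", $"value", rank().over(w).as("rnk")).count()
```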

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19184 Merged build finished. Test PASSed.

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19184 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81628/

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19184 **[Test build #81628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81628/testReport)** for PR 19184 at commit

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19184 @rajeshbalamohan Thanks for updating. I think we need a complete fix, as the previous comments from the reviewers @jerryshao, @kiszk, @jiangxb1987 suggested. Can you try to fix this according to the

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19184 **[Test build #81628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81628/testReport)** for PR 19184 at commit

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-11 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/19184 Thanks @viirya. I have updated the patch to address your comments. This fixes the "too many open files" issue for queries that involve window functions (e.g. Q67, Q72, Q14); but for

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-10 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19184 cc @cloud-fan @jiangxb1987

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19184 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81614/

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19184 Merged build finished. Test PASSed.

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19184 **[Test build #81614 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81614/testReport)** for PR 19184 at commit

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-10 Thread rajeshbalamohan
Github user rajeshbalamohan commented on the issue: https://github.com/apache/spark/pull/19184 I ran into this with a limit of 32K. "unlimited" is another option that can work around this, but that may not be preferable in production systems. For example, with Q67 I
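
A rough back-of-envelope calculation, with every number below assumed purely for illustration (none are measurements from this thread), shows why a 32K limit can still be exhausted when each spill file holds only a few thousand rows and a sorted merge keeps every spill file of a task open at once:

```scala
// All figures are illustrative assumptions, not measurements from the JIRA.
val rowsPerWindowPartition = 50000000L // rows buffered for one large window partition
val rowsPerSpill           = 4096L     // rows written per spill file (a low threshold)
val concurrentTasks        = 4L        // window tasks running in one executor

val spillFilesPerTask    = math.ceil(rowsPerWindowPartition.toDouble / rowsPerSpill).toLong
val openFilesDuringMerge = spillFilesPerTask * concurrentTasks

println(s"spill files per task:     $spillFilesPerTask")    // 12208
println(s"open files during merges: $openFilesDuringMerge") // 48832 > 32768 (32K ulimit)
```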

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19184 Hmm, shouldn't we just change the system config to increase the open file limit?

[GitHub] spark issue #19184: [SPARK-21971][CORE] Too many open files in Spark due to ...

2017-09-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19184 **[Test build #81614 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81614/testReport)** for PR 19184 at commit