Re: Too many open files issue

2015-09-03 Thread Sigurd Knippenberg
I don't think that is the issue. I have it set up to run in a thread pool, but I have set the pool size to 1 for this test until I get this resolved. I am having some problems with using the Spark web portal since it is picking a random port, and with the way my environment is set up, by the time I have
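
The "random port" complaint refers to the Spark web UI, which defaults to 4040 and moves to another port if that one is taken. A minimal sketch of pinning it with the standard spark.ui.port property is below; the app name and port number are illustrative only, not from the thread.

    import org.apache.spark.{SparkConf, SparkContext}

    // Sketch: pin the web UI to a known port instead of letting Spark
    // move to the next free one when the default (4040) is taken.
    val conf = new SparkConf()
      .setAppName("open-files-test")          // illustrative name
      .set("spark.ui.port", "4050")           // fixed port so the UI is reachable through a firewall
      .set("spark.port.maxRetries", "0")      // fail fast instead of silently binding elsewhere
    val sc = new SparkContext(conf)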

Re: Too many open files issue

2015-09-02 Thread Steve Loughran
On 31 Aug 2015, at 19:49, Sigurd Knippenberg wrote: I know I can adjust the max open files allowed by the OS but I'd rather fix the underlying issue. Bumping up the OS handle limits is step #1 of installing a Hadoop cluster.
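
As a diagnostic aside (not code from the thread): a raised limit only helps if the Spark JVMs actually inherit it, and the JDK's UnixOperatingSystemMXBean can report what the process sees. A minimal sketch:

    import java.lang.management.ManagementFactory
    import com.sun.management.UnixOperatingSystemMXBean

    // Check the descriptor limit and current usage as seen by this JVM;
    // limits changed in limits.conf only apply to newly started sessions/daemons.
    ManagementFactory.getOperatingSystemMXBean match {
      case os: UnixOperatingSystemMXBean =>
        println(s"max file descriptors:  ${os.getMaxFileDescriptorCount}")
        println(s"currently open:        ${os.getOpenFileDescriptorCount}")
      case _ =>
        println("file descriptor counts not available on this platform")
    }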

Re: Too many open files issue

2015-09-02 Thread Saisai Shao
Here is the code in which NewHadoopRDD registers a close handler to be called when the task is completed ( https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L136 ). From my understanding, possibly the reason is that this `foreach` code in your
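
For readers who don't follow the link, a simplified sketch of the pattern at that line: the record reader is opened when the partition iterator is created, and a task-completion listener closes it, so the underlying file handle is held until the task finishes rather than being released after the last record. SimpleReader and openReader below are hypothetical stand-ins for the Hadoop RecordReader plumbing.

    import org.apache.spark.TaskContext

    // Hypothetical reader standing in for Hadoop's RecordReader; it holds an
    // open file handle until close() is called.
    trait SimpleReader {
      def hasNext: Boolean
      def next(): String
      def close(): Unit
    }

    def readPartition(context: TaskContext, openReader: () => SimpleReader): Iterator[String] = {
      val reader = openReader()
      // Register an on-task-completion callback to close the reader, mirroring
      // NewHadoopRDD: the handle stays open until the task completes.
      context.addTaskCompletionListener { _ => reader.close() }
      new Iterator[String] {
        def hasNext: Boolean = reader.hasNext
        def next(): String = reader.next()
      }
    }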

Re: Too many open files issue

2015-09-02 Thread Steve Loughran
Ah, now that does sound suspicious... On 2 Sep 2015, at 14:09, Sigurd Knippenberg wrote: Yep. I know. It was set to 32K when I ran this test. If I bump it to 64K the issue goes away. It still doesn't make sense to me that the Spark job

Re: Too many open files issue

2015-09-02 Thread Sigurd Knippenberg
Yep. I know. It was set to 32K when I ran this test. If I bump it to 64K the issue goes away. It still doesn't make sense to me that the Spark job doesn't release its file handles until the end of the job instead of doing that while my loop iterates. Sigurd On Wed, Sep 2, 2015 at 4:33 AM,
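
One way to confirm that behaviour (a diagnostic sketch, not code from the thread) is to log the open-descriptor count after each loop iteration in the JVM that hits the limit; this only sees handles in its own process, so it is most telling in local mode or wherever the reads actually run. The names dirs and processDirectory are hypothetical placeholders for the actual loop.

    import java.lang.management.ManagementFactory
    import com.sun.management.UnixOperatingSystemMXBean

    def openFdCount(): Long = ManagementFactory.getOperatingSystemMXBean match {
      case os: UnixOperatingSystemMXBean => os.getOpenFileDescriptorCount
      case _                             => -1L
    }

    val dirs: Seq[String] = Seq("hdfs:///data/day1", "hdfs:///data/day2")  // hypothetical input paths
    def processDirectory(dir: String): Unit = ()                           // placeholder for the per-directory Spark work

    // If the count keeps climbing across iterations instead of returning to a
    // baseline, handles are indeed being held until the end of the job.
    dirs.foreach { dir =>
      processDirectory(dir)
      println(s"open file descriptors after $dir: ${openFdCount()}")
    }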

Too many open files issue

2015-08-31 Thread Sigurd Knippenberg
I am running into a 'too many open files' issue, and before I posted this I searched the web to see if anyone already had a solution to my particular problem, but I did not see anything that helped. I know I can adjust the max open files allowed by the OS but I'd rather fix the underlying
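
The original code is not included in the archive; the sketch below is only a hypothetical reconstruction of the kind of loop described later in the thread, where a driver-side loop reads one directory per iteration and the handles opened along the way are not released until the job ends.

    import org.apache.spark.{SparkConf, SparkContext}

    object OpenFilesLoop {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("open-files-loop"))
        val dirs: Seq[String] = args.toSeq   // hypothetical: one HDFS directory per argument

        dirs.foreach { dir =>
          // Each iteration reads one directory and writes a word count next to it.
          // If the handles opened here are only released when the whole job ends,
          // the process's open-file count grows with every iteration.
          sc.textFile(dir)
            .flatMap(_.split("\\s+"))
            .map(word => (word, 1L))
            .reduceByKey(_ + _)
            .saveAsTextFile(dir + ".wordcount")
        }

        sc.stop()
      }
    }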