Hi Song,

Here is what I know about sort-based shuffle.

Normally the number of files open in parallel is much smaller for sort-based 
shuffle than for hash-based shuffle.

In hash-based shuffle, the number of files open in parallel is C * R (where C 
is the number of cores in use and R is the number of reducers). For example, 
with 16 cores and 1,000 reducers that is 16,000 files open at once on a node. 
As you can see, the file count is tied to the reducer count, no matter how 
large the shuffle data is.

In sort-based shuffle, by contrast, each map task produces only a single final 
output file. To achieve this we sort the records by partition, which generates 
some intermediate spill files; the number of spill files depends on the shuffle 
data size and the memory available for shuffle, not on the reducer count.

So if you hit “too many open files” with sort-based shuffle, my guess is that 
you are producing a large number of spill files during the shuffle write. One 
way to alleviate this is to increase the memory available for shuffle; raising 
the ulimit is another option.
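
For example, something along these lines might reduce spilling (a rough sketch 
only: the property names are the Spark 1.x shuffle settings, the app name is a 
placeholder, and 0.4 is an illustrative value that needs tuning for your job):

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative values only: 0.4 is simply larger than the Spark 1.x
    // default of 0.2 for spark.shuffle.memoryFraction, letting the shuffle
    // buffer more data in memory before it spills a file to disk.
    val conf = new SparkConf()
      .setAppName("shuffle-tuning-sketch")          // placeholder app name
      .set("spark.shuffle.manager", "sort")
      .set("spark.shuffle.memoryFraction", "0.4")
    val sc = new SparkContext(conf)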

On YARN I think you have to do that system configuration manually; Spark cannot 
set the ulimit for you automatically, and I don’t think that is something Spark 
should take care of.
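
As a rough sketch of that manual setup (this is OS configuration, not a Spark 
feature; it assumes the NodeManager runs as the "yarn" user, and the values are 
only illustrative), you would raise the open-file limit on every NodeManager 
host and then check what a running executor actually inherited:

    # /etc/security/limits.conf on each NodeManager host
    yarn    soft    nofile    65536
    yarn    hard    nofile    65536

    # after restarting the NodeManager, verify from a live executor process
    cat /proc/<executor-pid>/limits | grep "Max open files"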

Thanks
Jerry

From: Chen Song [mailto:chen.song...@gmail.com]
Sent: Tuesday, October 21, 2014 9:10 AM
To: Andrew Ash
Cc: Sunny Khatri; Lisonbee, Todd; u...@spark.incubator.apache.org
Subject: Re: Shuffle files

My observation is the opposite. When my job runs with the default 
spark.shuffle.manager, I don't see this exception. However, when it runs with 
the SORT-based manager, I start seeing this error. How could that be possible?

I am running my job on YARN, and I noticed that the YARN process limits (cat 
/proc/$PID/limits) are not consistent with the system-wide limits (shown by 
ulimit -a); I don't know how that happened. Is there a way to let the Spark 
driver propagate this setting (ulimit -n <number>) to the Spark executors 
before startup?




On Tue, Oct 7, 2014 at 11:53 PM, Andrew Ash <and...@andrewash.com> wrote:
You will need to restart your Mesos workers to pick up the new limits as well.

On Tue, Oct 7, 2014 at 4:02 PM, Sunny Khatri <sunny.k...@gmail.com> wrote:
@SK:
Make sure the ulimit has taken effect, as Todd mentioned. You can verify via 
ulimit -a. Also make sure you have the proper kernel parameters set in 
/etc/sysctl.conf (on Mac OS X).

On Tue, Oct 7, 2014 at 3:57 PM, Lisonbee, Todd <todd.lison...@intel.com> wrote:

Are you sure the new ulimit has taken effect?

How many cores are you using?  How many reducers?

        "In general if a node in your cluster has C assigned cores and you run
        a job with X reducers then Spark will open C*X files in parallel and
        start writing. Shuffle consolidation will help decrease the total
        number of files created but the number of file handles open at any
        time doesn't change so it won't help the ulimit problem."

Quoted from Patrick at:
http://apache-spark-user-list.1001560.n3.nabble.com/quot-Too-many-open-files-quot-exception-on-reduceByKey-td2462.html
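
For reference, consolidation with the hash-based manager is controlled by a 
single property; a minimal sketch (Spark 1.x property name, placeholder app 
name):

    import org.apache.spark.{SparkConf, SparkContext}

    // With the hash-based manager, reuse one output file per reducer per core
    // instead of creating a new file for every map task. This cuts the total
    // number of files created but, as noted above, not the number of handles
    // open at any one time.
    val conf = new SparkConf()
      .setAppName("consolidation-sketch")           // placeholder app name
      .set("spark.shuffle.consolidateFiles", "true")
    val sc = new SparkContext(conf)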

Thanks,

Todd

-----Original Message-----
From: SK [mailto:skrishna...@gmail.com]
Sent: Tuesday, October 7, 2014 2:12 PM
To: u...@spark.incubator.apache.org
Subject: Re: Shuffle files

- We set ulimit to 500000, but I still get the same "too many open files"
warning.

- I tried setting consolidateFiles to True, but that did not help either.

I am using a Mesos cluster.   Does Mesos have any limit on the number of
open files?

thanks












--
Chen Song
