Hi Song,

From what I know of sort-based shuffle:

The number of files open in parallel is normally much smaller for sort-based shuffle than for hash-based shuffle. In hash-based shuffle, the number of files open in parallel is C * R (where C is the number of cores in use and R is the number of reducers); the file count scales with the reducer count, no matter how large the shuffle data is. In sort-based shuffle, by contrast, each map task produces only one final output file. To achieve this, the data is sorted by partition, which generates some intermediate spill files; the number of spill files depends on the shuffle data size and the memory available for shuffle, not on the reducer count.

So if you hit "too many open files" with sort-based shuffle, my guess is that you are creating a large number of spill files during shuffle write. One way to alleviate this is to increase the memory available for shuffle; raising the ulimit is another.
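For example, something like the following (a minimal sketch using the Spark 1.x configuration keys; the values are illustrative assumptions, not recommendations):

    import org.apache.spark.{SparkConf, SparkContext}

    // Spark 1.x shuffle settings mentioned in this thread. The values here
    // are examples only; tune them against your own workload.
    val conf = new SparkConf()
      .setAppName("shuffle-tuning-sketch")
      .set("spark.shuffle.manager", "sort")           // "hash" is the 1.1 default
      .set("spark.shuffle.memoryFraction", "0.4")     // default 0.2; more shuffle memory, fewer spills
      .set("spark.shuffle.consolidateFiles", "true")  // hash shuffle only: consolidates map output files
    val sc = new SparkContext(conf)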
As for YARN, I guess you have to do that system configuration manually; Spark cannot set the ulimit for you automatically, and I don't think it is an issue Spark should take care of.

Thanks,
Jerry

From: Chen Song [mailto:chen.song...@gmail.com]
Sent: Tuesday, October 21, 2014 9:10 AM
To: Andrew Ash
Cc: Sunny Khatri; Lisonbee, Todd; u...@spark.incubator.apache.org
Subject: Re: Shuffle files

My observation is the opposite. When my job runs with the default spark.shuffle.manager, I don't see this exception; when it runs with the SORT-based manager, I start seeing the error. How is that possible?

I am running my job on YARN, and I noticed that the YARN process limits (cat /proc/$PID/limits) are not consistent with the system-wide limits (shown by ulimit -a). I don't know how that happened. Is there a way to have the Spark driver propagate this setting (ulimit -n <number>) to the Spark executors before startup?
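One way to at least confirm what limit the executors actually got (a Linux-only sketch; openFilesLimit is a made-up helper name): each executor can read its own /proc/self/limits, which is the same file as /proc/$PID/limits for that process:

    import scala.io.Source

    // Read this JVM's own limits file and pull out the "Max open files" row.
    // Linux-only: /proc/self/limits is the per-process limits file, seen
    // from inside the process itself.
    def openFilesLimit(): Option[String] = {
      val src = Source.fromFile("/proc/self/limits")
      try src.getLines().find(_.startsWith("Max open files"))
      finally src.close()
    }

    // Run it once per executor, e.g. with a throwaway job; the output lands
    // in each executor's stdout log:
    // sc.parallelize(1 to 100, 100).foreachPartition(_ => println(openFilesLimit()))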
On Tue, Oct 7, 2014 at 11:53 PM, Andrew Ash <and...@andrewash.com> wrote:

You will need to restart your Mesos workers to pick up the new limits as well.

On Tue, Oct 7, 2014 at 4:02 PM, Sunny Khatri <sunny.k...@gmail.com> wrote:

@SK: Make sure the ulimit has taken effect, as Todd mentioned. You can verify via ulimit -a. Also make sure you have the proper kernel parameters set in /etc/sysctl.conf (MacOSX).

On Tue, Oct 7, 2014 at 3:57 PM, Lisonbee, Todd <todd.lison...@intel.com> wrote:

Are you sure the new ulimit has taken effect? How many cores are you using? How many reducers?

"In general, if a node in your cluster has C assigned cores and you run a job with X reducers, then Spark will open C*X files in parallel and start writing. Shuffle consolidation will help decrease the total number of files created, but the number of file handles open at any time doesn't change, so it won't help the ulimit problem."

Quoted from Patrick at:
http://apache-spark-user-list.1001560.n3.nabble.com/quot-Too-many-open-files-quot-exception-on-reduceByKey-td2462.html
(A small worked example of this C*X rule follows at the end of the thread.)

Thanks,
Todd

-----Original Message-----
From: SK [mailto:skrishna...@gmail.com]
Sent: Tuesday, October 7, 2014 2:12 PM
To: u...@spark.incubator.apache.org
Subject: Re: Shuffle files

- We set ulimit to 500000, but I still get the same "too many open files" warning.
- I tried setting consolidateFiles to true, but that did not help either.

I am using a Mesos cluster. Does Mesos have any limit on the number of open files?

Thanks

--
Chen Song
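A back-of-the-envelope sketch of the C*X rule quoted from Patrick above; the core and reducer counts are made-up assumptions for illustration:

    // Hash-based shuffle: each of C concurrently running tasks on a node
    // keeps one open file per reducer, so a node holds roughly C * X handles.
    val coresPerNode = 16    // hypothetical C
    val numReducers  = 2000  // hypothetical X
    val openHandles  = coresPerNode * numReducers
    println(s"~$openHandles open files per node during shuffle write")
    // ~32000 -- already far past a default ulimit of 1024 or 4096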