Hi,
I'm running Spark on YARN from an edge node, and the tasks run on the Data
Nodes. My job fails with a "Too many open files" error once it gets to
groupByKey(). Alternatively, I can make it fail immediately if I repartition
the data when I create the RDD.
Where do I need to make sure that the open files limit (ulimit -n) is raised:
on the edge node where I launch the job, or on the Data Nodes?
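To make the shape of the job concrete, here is a minimal sketch of the kind of
pipeline I mean (the input path, field layout, and partition count below are
placeholders, not the actual job):

    import org.apache.spark.SparkContext._   // pair RDD implicits (needed on Spark < 1.3)
    import org.apache.spark.{SparkConf, SparkContext}

    object GroupByKeyRepro {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("GroupByKeyRepro"))

        val pairs = sc.textFile("hdfs:///data/input")   // placeholder path
          .repartition(2000)                            // placeholder count; repartitioning here is what makes it fail immediately
          .map { line =>
            val fields = line.split('\t')
            (fields(0), fields(1))                      // placeholder (key, value) layout
          }

        // groupByKey() forces a full shuffle; with hash-based shuffle each map task
        // can open one file per reduce partition, which is where the limit is hit.
        pairs.groupByKey().count()

        sc.stop()
      }
    }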
Hi Arun,
The limit for the YARN user on the cluster nodes should be all that
matters. What version of Spark are you using? If you can turn on
sort-based shuffle, it should solve this problem.
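On Spark 1.1 and later the shuffle implementation is controlled by the
spark.shuffle.manager setting (sort became the default in 1.2). A minimal
sketch of switching it in code; the app name is just a placeholder:

    import org.apache.spark.{SparkConf, SparkContext}

    // Sort-based shuffle writes one sorted output file (plus an index) per map task,
    // instead of one file per reduce partition, so far fewer descriptors stay open.
    val conf = new SparkConf()
      .setAppName("SortShuffleExample")       // placeholder
      .set("spark.shuffle.manager", "sort")   // "hash" is the pre-1.2 default
    val sc = new SparkContext(conf)

The same thing can be passed to spark-submit with
--conf spark.shuffle.manager=sort. The descriptor limit itself can be checked
on the cluster nodes with ulimit -n as the user the YARN containers run as.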
-Sandy
On Tue, Feb 10, 2015 at 1:16 PM, Arun Luthra arun.lut...@gmail.com wrote: