Hi,

I'm running Spark on YARN from an edge node, and the tasks run on the Data
Nodes. My job fails with a "Too many open files" error once it gets to
groupByKey(). Alternatively, I can make it fail immediately by repartitioning
the data when I create the RDD.
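
For reference, here is roughly the shape of the job in PySpark (the input
path, partition count, and key extraction are just placeholders, not my real
code):

    # Rough sketch of the failing pipeline (illustrative names/paths only)
    from pyspark import SparkContext

    sc = SparkContext(appName="group-by-key-repro")

    # With the repartition() here the job fails immediately; without it,
    # it fails later, at the groupByKey() shuffle.
    rdd = sc.textFile("hdfs:///path/to/input").repartition(200)

    pairs = rdd.map(lambda line: (line.split(",")[0], line))
    grouped = pairs.groupByKey()   # "Too many open files" shows up around here
    grouped.count()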

Where do I need to make sure that ulimit -n is high enough?

On the edge node it is small (1024), but on the data nodes the "yarn" user
has a high limit (32k). Is the "yarn" user even the relevant user, though?
And is the 1024 limit for my own user on the edge node a problem, or is that
limit not relevant?
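
To see what limit the executor processes actually get inside the YARN
containers, I was thinking of running something like this untested sketch
(the app name and partition count are arbitrary):

    # Sketch: report the soft/hard nofile limits as seen from inside tasks
    import resource
    import socket
    from pyspark import SparkContext

    sc = SparkContext(appName="check-nofile-limit")

    def nofile_limit(_):
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        yield (socket.gethostname(), soft, hard)

    limits = (sc.parallelize(range(100), 100)
                .mapPartitions(nofile_limit)
                .distinct()
                .collect())
    for host, soft, hard in limits:
        print(host, soft, hard)

If that reports 32k on the data nodes, I assume the executor side is fine and
only the edge-node limit remains in question.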

Arun
