Gents,

I'm building Spark from the current master branch and deploying it to
Google Compute Engine on top of Hadoop 2.4/YARN via bdutil, Google's Hadoop
cluster provisioning tool. bdutil configures Spark with

spark.local.dir=/hadoop/spark/tmp,

but this option is ignored in combination with YARN. bdutil also
configures YARN with:

  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/mnt/pd1/hadoop/yarn/nm-local-dir</value>
    <description>
      Directories on the local machine in which to store application temp files.
    </description>
  </property>
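If it helps anyone answer: as I understand it, on YARN the NodeManager is
supposed to hand these directories to each container through the LOCAL_DIRS
environment variable (YARN_LOCAL_DIRS on some versions), and Spark executors
should use that instead of spark.local.dir. A minimal check from inside a
container would be something like this (the variable names are my assumption
from reading the docs, not something I've confirmed on this cluster):

```shell
# Sketch: print the local dirs YARN hands to this container, if any.
# Run from inside an executor container; outside YARN it prints the fallback.
echo "LOCAL_DIRS=${LOCAL_DIRS:-<not set - not inside a YARN container>}"
echo "YARN_LOCAL_DIRS=${YARN_LOCAL_DIRS:-<not set>}"
```

If these come back empty inside an executor, that would point at the
NodeManager config; if they're set correctly, the problem is on Spark's side.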

This is the right directory for Spark to store temporary data in. Still,
Spark is creating directories like this:

/tmp/spark-51388ee6-9de6-411d-b9b9-ab6f9502d01e

and filling them with gigabytes' worth of output files, which fills up the
very small root filesystem.

How can I diagnose why my Spark installation is not picking up the
yarn.nodemanager.local-dirs setting from YARN?
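In case it matters, here is the kind of check I can run on a worker node; the
directory name above is just one example, and whether the owner is the driver
user or the yarn user should tell me which process is ignoring the config:

```shell
# Sketch: list stray Spark scratch dirs on the root filesystem with their
# owners and sizes. A dir owned by my login user (rather than yarn) would
# suggest the driver, not the executors, is the one writing to /tmp.
dirs=$(ls -d /tmp/spark-* 2>/dev/null || true)
echo "spark scratch dirs: ${dirs:-none found}"
# du -sh /tmp/spark-*   # uncomment to see how much space each one uses
```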

Alex
