Hello Spark community

SPARK_LOCAL_DIRS (or the spark.local.dir property) is supposed to accept a
comma-separated list of directories.

I want to list one local (fast) drive, followed by a GPFS network drive,
similar to what is done here:
https://cug.org/proceedings/cug2016_proceedings/includes/files/pap129s2-file1.pdf
"Thus it is preferable to bias the data towards faster storage by including
multiple directories on the faster devices (e.g., SPARK LOCAL
DIRS=/tmp/spark1, /tmp/spark2, /tmp/spark3, /lus/scratch/sparkscratch/)."
The goal is to get the speed of local storage while also avoiding "out of
space" errors.

However, for me, Spark only uses the first directory in the list:
export SPARK_LOCAL_DIRS="/tmp, /share/xxxx"
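In case it is relevant: my assumption (I have not checked Spark's source) is
that Spark splits this value on bare commas without trimming whitespace, so
the space after the comma would become part of the second path. A minimal
Python sketch of that assumed splitting behaviour:

```python
# Sketch of the comma split I assume Spark applies to
# SPARK_LOCAL_DIRS / spark.local.dir (not Spark's actual code).
value = "/tmp, /share/xxxx"
dirs = value.split(",")
print(dirs)  # ['/tmp', ' /share/xxxx'] -- note the leading space
```

If that assumption is right, " /share/xxxx" (with a leading space) would be
an invalid or unintended path, which might explain why only /tmp is used.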

I am using Spark 3.4.1. Does anyone have experience getting this to work? If
so, can you suggest a simple example I can try, and tell me which version of
Spark you are using?

Regards
Andrew


-- 
Andrew Petersen, PhD
Advanced Computing, Office of Information Technology
2620 Hillsborough Street
datascience.oit.ncsu.edu
