Actually, that did work, thanks.
What I had tried before, which did not work, was:
#BSUB -env "all,SPARK_LOCAL_DIRS=/tmp,/share/xxxx,SPARK_PID_DIR=..."
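
A minimal sketch of an LSF job script that sets the variable in the script body instead of through #BSUB -env. It assumes (not verified on this cluster) that -env parses its argument as its own comma-separated list, so the comma inside the SPARK_LOCAL_DIRS value is read as a separator there. The job name and output file are hypothetical, and the elided SPARK_PID_DIR value is left out.

```shell
#!/bin/bash
#BSUB -J spark_job            # hypothetical job name
#BSUB -o spark_job.%J.out     # hypothetical output file (%J = LSF job ID)
# Set the variable in the job body instead of via "#BSUB -env": -env takes
# its own comma-separated list, so the comma inside this value is likely to
# be read as a separator there (assumption).
export SPARK_LOCAL_DIRS="/tmp,/share/xxxx"
echo "SPARK_LOCAL_DIRS=$SPARK_LOCAL_DIRS"
# spark-submit ... (application command elided)
```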

However, I am still getting "No space left on device" errors. It seems that
I need hierarchical directories; round-robin distribution is not good
enough. Is there a way to get Spark to write to dir2 when dir1 runs out of
space? Or could the round robin be arranged so that a task's first attempt
writes to dir1 and, if that attempt fails, the second attempt uses dir2?
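
The Spark configuration documentation describes spark.local.dir only as a list of scratch directories to spread files over, with no fallback from one entry to the next; in the Spark source, block files are assigned to entries by a hash of the file name, not by task attempt. A workaround that does not depend on Spark is to decide at job launch which directories to list. The sketch below drops the fast disk when it is already low on space. The 10 GiB threshold is an arbitrary assumption, and this cannot help if /tmp fills up partway through a job.

```shell
#!/usr/bin/env bash
# Hypothetical launch-time fallback: Spark will not switch directories
# mid-job, but the job script can leave the fast disk out of the list
# when it is already short on space.
set -euo pipefail

fast_dir="/tmp"                      # node-local disk
slow_dir="/share/xxxx"               # GPFS scratch (path elided in the original)
min_free_kb=$((10 * 1024 * 1024))    # assumed threshold: 10 GiB in KiB

# Available KiB on the filesystem holding $1 (POSIX df: line 2, column 4).
free_kb() { df -Pk "$1" | awk 'NR==2 {print $4}'; }

if [ "$(free_kb "$fast_dir")" -ge "$min_free_kb" ]; then
  export SPARK_LOCAL_DIRS="$fast_dir,$slow_dir"
else
  export SPARK_LOCAL_DIRS="$slow_dir"
fi
echo "SPARK_LOCAL_DIRS=$SPARK_LOCAL_DIRS"
```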

On Fri, Jan 12, 2024 at 10:23 PM Koert Kuipers <ko...@tresata.com> wrote:

> Try it without spaces?
> export SPARK_LOCAL_DIRS="/tmp,/share/xxxx"
>
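A small sketch of the same fix applied mechanically: strip the spaces from a hand-typed list before exporting it, so no entry keeps a leading blank. It assumes none of the directory paths themselves contain spaces.

```shell
#!/usr/bin/env bash
set -euo pipefail

raw="/tmp, /share/xxxx"               # the list as it was first typed
# Delete every space (assumes the paths themselves contain none).
export SPARK_LOCAL_DIRS="${raw// /}"
echo "SPARK_LOCAL_DIRS=$SPARK_LOCAL_DIRS"
```

This prints SPARK_LOCAL_DIRS=/tmp,/share/xxxx.
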
> On Fri, Jan 12, 2024 at 5:00 PM Andrew Petersen <aapet...@ncsu.edu.invalid>
> wrote:
>
>> Hello Spark community
>>
>> SPARK_LOCAL_DIRS (or spark.local.dir) is supposed to accept a
>> comma-separated list of directories.
>>
>> I want to list one local (fast) drive followed by a GPFS network drive,
>> similar to what is done here:
>>
>> https://cug.org/proceedings/cug2016_proceedings/includes/files/pap129s2-file1.pdf
>> "Thus it is preferable to bias the data towards faster storage by
>> including multiple directories on the faster devices (e.g., SPARK LOCAL
>> DIRS=/tmp/spark1, /tmp/spark2, /tmp/spark3, /lus/scratch/sparkscratch/)."
>> The purpose is to get the speed of the local drive while also avoiding
>> "out of space" errors.
>>
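The biasing idea from the CUG 2016 paper can be sketched as follows, assuming three entries on the fast disk (the /tmp/sparkN names are hypothetical) and one on GPFS. If block files hash roughly uniformly over the entries, the fast disk receives about fast_copies / total of them, here 3/4; the flip side is that /tmp then fills up correspondingly faster.

```shell
#!/usr/bin/env bash
set -euo pipefail

fast_copies=3                        # entries on the fast disk, as in the paper's example
dirs=()
for i in $(seq 1 "$fast_copies"); do
  mkdir -p "/tmp/spark$i"            # hypothetical subdirectory names, created up front
  dirs+=("/tmp/spark$i")
done
dirs+=("/share/xxxx")                # GPFS scratch (path elided in the original)

# Join the entries with commas and no spaces.
SPARK_LOCAL_DIRS="$(IFS=,; echo "${dirs[*]}")"
export SPARK_LOCAL_DIRS

total=${#dirs[@]}
echo "SPARK_LOCAL_DIRS=$SPARK_LOCAL_DIRS"
# Integer percentage, assuming a uniform hash over all entries.
echo "expected share on fast disk: $(( 100 * fast_copies / total ))%"
```

This prints SPARK_LOCAL_DIRS=/tmp/spark1,/tmp/spark2,/tmp/spark3,/share/xxxx followed by an expected share of 75%.
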
>> However, Spark only uses the first directory in my list:
>> export SPARK_LOCAL_DIRS="/tmp, /share/xxxx"
>>
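To see why the version with a space misbehaves, this sketch splits the value on commas without trimming, which is how the Spark source appears to read the list (an assumption based on Utils.getConfiguredLocalDirs, worth checking for 3.4.1). The second entry keeps its leading space, so it is no longer the absolute path /share/xxxx.

```shell
#!/usr/bin/env bash
# Split the value on commas only; IFS contains no space, so blanks are kept.
set -euo pipefail

value="/tmp, /share/xxxx"            # the setting that misbehaved
IFS=',' read -r -a entries <<< "$value"

for e in "${entries[@]}"; do
  case "$e" in
    /*) kind="absolute path" ;;
    *)  kind="NOT an absolute path" ;;
  esac
  printf '[%s] -> %s\n' "$e" "$kind"
done
```

This prints "[/tmp] -> absolute path" and "[ /share/xxxx] -> NOT an absolute path".
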
>> I am using Spark 3.4.1. Does anyone have experience getting this to
>> work? If so, can you suggest a simple example I can try, and tell me which
>> version of Spark you are using?
>>
>> Regards
>> Andrew
>>
>> --
>> Andrew Petersen, PhD
>> Advanced Computing, Office of Information Technology
>> 2620 Hillsborough Street
>> datascience.oit.ncsu.edu
>>



