Re: [spark.local.dir] comma separated list does not work
Actually, that did work, thanks. What I previously tried, which did not work, was:

#BSUB -env "all,SPARK_LOCAL_DIRS=/tmp,/share/,SPARK_PID_DIR=..."

However, I am still getting "No space left on device" errors. It seems that I need hierarchical directories, and round-robin distribution is not good enough. Any suggestions for getting Spark to write to dir2 when dir1 fails? Or can round robin be implemented so that the first task attempt writes to dir1, but if the first attempt fails, the second attempt goes to dir2?

On Fri, Jan 12, 2024 at 10:23 PM Koert Kuipers wrote:
> try it without spaces?
>
> export SPARK_LOCAL_DIRS="/tmp,/share/"
>
> CONFIDENTIALITY NOTICE: This electronic communication and any files transmitted with it are confidential, privileged and intended solely for the use of the individual or entity to whom they are addressed. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution (electronic or otherwise) or forwarding of, or the taking of any action in reliance on the contents of this transmission is strictly prohibited. Please notify the sender immediately by e-mail if you have received this email by mistake and delete this email from your system.
>
> Is it necessary to print this email? If you care about the environment like we do, please refrain from printing emails. It helps to keep the environment forested and litter-free.

--
Andrew Petersen, PhD
Advanced Computing, Office of Information Technology
2620 Hillsborough Street
datascience.oit.ncsu.edu
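One way to approximate the failover behavior asked about above is the biasing trick from the CUG paper quoted earlier in the thread: list the fast device several times so it receives proportionally more files, while the large network drive absorbs the overflow. The sketch below illustrates the idea with a simplified, hypothetical selection function (Spark's actual DiskBlockManager hashes file names across the configured directories; this is an approximation for illustration, not Spark's implementation):

```python
# Sketch: approximate how files spread across the directories listed in
# spark.local.dir when each file is assigned by hashing its name.
# Listing a fast directory multiple times biases traffic toward it.

def pick_local_dir(filename: str, local_dirs: list[str]) -> str:
    # Python's % with a positive divisor is non-negative, so a raw
    # hash works here; Spark uses its own non-negative hash utility.
    return local_dirs[hash(filename) % len(local_dirs)]

# Biased list: the fast /tmp device appears three times (as three
# directories), the GPFS mount once, so roughly 3/4 of files land on
# the fast storage. Paths are examples only.
dirs = ["/tmp/spark1", "/tmp/spark2", "/tmp/spark3", "/share/spark"]

counts = {d: 0 for d in dirs}
for i in range(1000):
    counts[pick_local_dir(f"shuffle_{i}.data", dirs)] += 1
```

Under this scheme the fast directories collectively receive most of the files, but nothing retries on a different directory after an ENOSPC failure; the bias only reduces how often the fast device fills up.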
Re: [spark.local.dir] comma separated list does not work
Without spaces was the first thing I tried. The information in the PDF file inspired me to try the space.

On Fri, Jan 12, 2024 at 10:23 PM Koert Kuipers wrote:
> try it without spaces?
>
> export SPARK_LOCAL_DIRS="/tmp,/share/"

--
Andrew Petersen, PhD
Advanced Computing, Office of Information Technology
2620 Hillsborough Street
datascience.oit.ncsu.edu
Re: [spark.local.dir] comma separated list does not work
try it without spaces?

export SPARK_LOCAL_DIRS="/tmp,/share/"

On Fri, Jan 12, 2024 at 5:00 PM Andrew Petersen wrote:
> Hello Spark community
>
> SPARK_LOCAL_DIRS or spark.local.dir is supposed to accept a list.
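The suggestion above matters because a comma-separated value with spaces does not split into the paths you intended: a plain comma split keeps the leading space on the second entry, so it is no longer the absolute path "/share/". The snippet below demonstrates the parsing problem with an ordinary string split (Spark's exact trimming behavior may differ by version, so treat this as an illustration of the failure mode):

```python
# With a space after the comma, a plain split yields a second entry
# that starts with a space and is not the path "/share/".
raw = "/tmp, /share/"
dirs = raw.split(",")
print(dirs)  # ['/tmp', ' /share/']
assert dirs[1] != "/share/"

# Without spaces, the list parses as intended.
clean = "/tmp,/share/".split(",")
assert clean == ["/tmp", "/share/"]
```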
[spark.local.dir] comma separated list does not work
Hello Spark community

SPARK_LOCAL_DIRS or spark.local.dir is supposed to accept a list.

I want to list one local (fast) drive, followed by a GPFS network drive, similar to what is done here:

https://cug.org/proceedings/cug2016_proceedings/includes/files/pap129s2-file1.pdf

"Thus it is preferable to bias the data towards faster storage by including multiple directories on the faster devices (e.g., SPARK LOCAL DIRS=/tmp/spark1, /tmp/spark2, /tmp/spark3, /lus/scratch/sparkscratch/)."

The purpose of this is to get both the benefit of speed and avoid "out of space" errors.

However, for me, Spark is only considering the first directory in the list:

export SPARK_LOCAL_DIRS="/tmp, /share/"

I am using Spark 3.4.1. Does anyone have any experience getting this to work? If so, can you suggest a simple example I can try, and tell me which version of Spark you are using?

Regards
Andrew

I am trying to use 2 local drives

--
Andrew Petersen, PhD
Advanced Computing, Office of Information Technology
2620 Hillsborough Street
datascience.oit.ncsu.edu
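For anyone building the value programmatically (for example inside a batch-submission script), a small helper that strips stray whitespace before joining avoids the problem described above. The helper name and validation are hypothetical, not part of any Spark API:

```python
def make_local_dirs_value(dirs):
    # Join candidate directories into the comma-separated form that
    # SPARK_LOCAL_DIRS / spark.local.dir expects: no spaces around commas.
    cleaned = [d.strip() for d in dirs]
    for d in cleaned:
        if not d.startswith("/"):
            raise ValueError(f"not an absolute path: {d!r}")
    return ",".join(cleaned)

value = make_local_dirs_value(["/tmp", " /share/"])
print(value)  # /tmp,/share/
```

The resulting string can then be exported as SPARK_LOCAL_DIRS or passed as, e.g., `--conf spark.local.dir=/tmp,/share/` at submit time.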