Hi Ken,

sorry, can I just ask you a question about this? Thibaut (post below) was actually able to use your function to set Julia up on our departmental SGE cluster, and it works fine. That facility is down with a disk failure for the time being, though, so I have been trying to get going on a different PBS cluster. I can run an interactive session on one node, but I am stuck when going across nodes. I'm not sure I understand what you mean when you say "convert to IB interface names". The names in my PBS_NODEFILE are the same for each node, i.e. all CPUs on node 1 are called node 1.
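
For context, a Torque/PBS node file normally repeats each hostname once per allocated core, which would explain why every CPU on a node carries the same name. The following is a minimal shell sketch, assuming that one-hostname-per-line layout, that collapses such a file into one line per host with its slot count. The hostnames node1 and node2 and the temporary file are made up for illustration; inside a real job you would read "$PBS_NODEFILE" instead.

```shell
#!/bin/sh
# Hypothetical stand-in for $PBS_NODEFILE: one hostname per allocated core.
nodefile=$(mktemp)
printf 'node1\nnode1\nnode2\nnode2\nnode2\n' > "$nodefile"

# Collapse repeated hostnames into "host count" pairs, one line per host.
summary=$(sort "$nodefile" | uniq -c | awk '{print $2, $1}')
echo "$summary"

rm -f "$nodefile"
```

With the sample data this prints "node1 2" and "node2 3". Whether the plain hostname or an interface-specific name (such as the "-ipoib.ipoib" suffix in the quoted script) is the right thing to pass on depends on the cluster's network setup.
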
I wrote this up in more detail in this SO post:
http://stackoverflow.com/questions/25089733/julia-on-pbs-cluster-what-to-give-to-addprocs

It ends up throwing an SSH error. Do you have any clue what might be going wrong here?

Thanks!
Florian

On Wednesday, 7 May 2014 20:52:17 UTC+1, ken...@sdsc.edu wrote:
>
> I've run parallel julia on a Torque cluster with Infiniband. I start an
> interactive session with qsub -I, look for allocated nodes in
> $PBS_NODEFILE, convert to IB interface names, and addprocs.
>
> filestream = open(ENV["PBS_NODEFILE"])
> seekstart(filestream)
> linearray = readlines(filestream)
> strippedarray = similar(linearray)
> for i in 1:length(linearray)
>     strippedarray[i] = strip(linearray[i]) * "-ipoib.ipoib"
> end
> for i in 1:length(strippedarray)
>     singlearray = [strip(strippedarray[i])]
>     addprocs(singlearray)
> end
> print(workers())
>
> To start an interactive job, depending on your node configuration and
> queue names:
> qsub -I -l nodes=2:ppn=32,walltime=00:30:00 -q normal
>
> When you get your nodes, start julia with the above setup file with:
> julia --load setupfilename
>
> This should addprocs then give you the julia prompt.
>
> But it looks like something is wrong with your modules?
>
> On Friday, April 25, 2014 5:09:57 AM UTC-7, Isaac wrote:
>>
>> Hi All,
>>
>> I also tried to submit the julia jobs on the cluster but failed. I wrote
>> the job script as follows:
>>
>> for((i = 1; i < 10; i++))
>> do
>> echo "# cd /data
>> #PBS -l walltime=00:10:00
>> module add gcc/4.7.2
>> module add julia/0.2.0
>> module load julia
>> include("test.jl")
>> test($i)">test1job$i;
>> qsub test1job$i;
>> done
>>
>> I got the errors:
>> julia/0.2.0(16):ERROR:151: Module 'julia/0.2.0' depends on one of the module(s) 'gcc/4.7.2'
>> julia/0.2.0(16):ERROR:102: Tcl command execution failed: prereq gcc/4.7.2
>>
>> /cm/local/apps/torque/current/mom_priv/jobs/1053.cluster.SC: line 7: syntax error near unexpected token `a0d0.jl'
>> /cm/local/apps/torque/current/mom_priv/jobs/1053.cluster.SC: line 7: `include(a0d0.jl)'
>>
>> Does anybody know how to write the job script to submit julia job on a
>> cluster? Could you give an example?
>> Thanks in advance.
>>
>> Isaac
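
On the job-script question quoted above: the "syntax error near unexpected token" message is what a shell reports when Julia statements such as include(...) appear as bare lines in a batch script, since the script is run by the shell rather than by Julia. A rough sketch of one way around that is to pass the Julia code to the julia executable with -e. The module names, /data, test.jl and test are taken from the original post; the output directory variable, the assumption that test.jl lives in /data, and the qsub guard are my own additions, and this does not address the module dependency error.

```shell
#!/bin/sh
# Sketch: write one Torque job script per index. The Julia code is handed
# to the julia executable via -e instead of being left as bare shell lines.
outdir=${OUTDIR:-$(mktemp -d)}   # hypothetical output directory

i=1
while [ "$i" -lt 10 ]; do
    # Unquoted EOF so that $i is expanded into each generated script.
    cat > "$outdir/test1job$i" <<EOF
#!/bin/bash
#PBS -l walltime=00:10:00
cd /data
module add gcc/4.7.2
module add julia/0.2.0
julia -e 'include("test.jl"); test($i)'
EOF
    # Submit only where qsub exists, so the sketch also runs off-cluster.
    if command -v qsub >/dev/null 2>&1; then
        qsub "$outdir/test1job$i"
    fi
    i=$((i + 1))
done
echo "wrote job scripts to $outdir"
```

Each generated file is an ordinary shell script, so it can be inspected with cat before anything is submitted.
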