[Wien] running wien2k on cluster
Hello,

I have a problem running WIEN2k on a cluster: I can't run a parallel calculation.

dayfile:

    start (Wed Apr 26 21:48:01 CET 2017) with lapw0 (40/99 to go)

    cycle 1 (Wed Apr 26 21:48:01 CET 2017) (40/99 to go)

>   lapw0 -p (21:48:01) starting parallel lapw0 at Wed Apr 26 21:48:16 CET 2017
.machine0 : 24 processors
0.030u 0.080s 0:12.95 0.8% 0+0k 0+304io 0pf+0w
>   lapw1 -p -c (21:48:24) starting parallel lapw1 at Wed Apr 26 21:48:39 CET 2017
->  starting parallel LAPW1 jobs at Wed Apr 26 21:48:44 CET 2017
running LAPW1 in parallel mode (using .machines)
running lapw1c in single mode

I get this error in job.out:

/tmp/slurmd/job03057/slurm_script: line 12: hostlist: command not found

The resulting .machines file:

lapw0: :24
granularity:1
extrafine:1

I'm using this Slurm script:

#!/bin/bash
#SBATCH --mem=1024
#SBATCH --ntasks=12
#SBATCH --nodes=2
#SBATCH --output=job.out

# set .machines for parallel job
# lapw0 running on one node
echo -n "lapw0: " > .machines
echo -n $(hostlist -e $SLURM_JOB_NODELIST | tail -1) >> .machines
echo "$i:24" >> .machines

echo granularity:1 >> .machines
echo extrafine:1 >> .machines

run_lapw -p -NI

___
Wien mailing list
Wien@zeus.theochem.tuwien.ac.at
http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
SEARCH the MAILING-LIST at: http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
Re: [Wien] running wien2k on cluster
As constructed, your .machines file will only run lapw0 using MPI; it is missing the lines that tell WIEN2k how to run lapw1. That is why your dayfile reports "running lapw1c in single mode". User error.

On Wed, Apr 26, 2017 at 4:15 PM, ahmed amine wrote:
> Hello,
>
> i have problem running on a cluster, I can't run paralel calculation
> [...]
> .machine
> lapw0: :24
> granularity:1
> extrafine:1
> [...]

--
Professor Laurence Marks
"Research is to see what everybody else has seen, and to think what nobody else has thought", Albert Szent-Gyorgi
www.numis.northwestern.edu ; Corrosion in 4D: MURI4D.numis.northwestern.edu
Partner of the CFW 100% program for gender equity, www.cfw.org/100-percent
Co-Editor, Acta Cryst A
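To illustrate what the missing lapw1 lines could look like, here is a minimal sketch that writes a .machines file with one k-point parallel job per core. The function name `write_machines` and the host names are hypothetical, and the value of 6 cores per node is only an assumption derived from `--ntasks=12` over `--nodes=2` in the script above; adjust both to the actual cluster.

```shell
#!/bin/bash
# Hypothetical helper (not part of WIEN2k): write a .machines file.
# Usage: write_machines <output file> <cores per node> <host> [<host> ...]
write_machines() {
    local out=$1 cores=$2
    shift 2
    local host i
    {
        # lapw0 runs with MPI on the first host only (assumes an MPI build of lapw0)
        echo "lapw0: $1:$cores"
        # Each "1:host" line starts one k-point parallel lapw1/lapw2 job on that host
        for host in "$@"; do
            for ((i = 0; i < cores; i++)); do
                echo "1:$host"
            done
        done
        echo "granularity:1"
        echo "extrafine:1"
    } > "$out"
}
```

In the job script this would be called before `run_lapw -p -NI`, for example as `write_machines .machines 6 node01 node02`, where `node01` and `node02` are placeholders for the nodes allocated to the job.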
Re: [Wien] running wien2k on cluster
Also, it looks like job.out tells you what the problem is:

slurm_script: line 12: hostlist: command not found

My guess is that whoever set up Slurm did not also get hostlist from the Slurm download website [1] and install it. If pip is installed, it can likely be installed using [2]:

sudo pip install python-hostlist

For one MPI job on each node using Slurm, refer to [3].

References

[1] https://slurm.schedmd.com/download.html
[2] https://packaging.python.org/installing/#use-pip-for-installing
[3] https://www.nsc.liu.se/systems/triolith/software/triolith-software-apps-wien2k.html

On 4/26/2017 3:22 PM, Laurence Marks wrote:
> As constructed your .machines file will only run lapw0 using mpi, and is
> missing lines for how to run lapw1. User error.
>
> On Wed, Apr 26, 2017 at 4:15 PM, ahmed amine wrote:
>> [...]
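If installing hostlist is not an option, the node list can usually be expanded with `scontrol show hostnames`, which ships with Slurm itself. The sketch below mirrors the `hostlist -e ... | tail -1` step of the original script; the function names `expand_nodes` and `lapw0_line` are hypothetical, and the core count is passed explicitly because `$i` in the original script was never set.

```shell
#!/bin/bash
# Expand a compressed Slurm node list (e.g. "node[01-02]") into one host per line.
# Uses scontrol from Slurm, so the separate hostlist tool is not needed.
expand_nodes() {
    scontrol show hostnames "$1"
}

# Read host names on stdin and print the lapw0 line for the last host,
# as "hostlist -e ... | tail -1" did in the original script.
# Usage: <host names> | lapw0_line <cores>
lapw0_line() {
    local cores=$1 host
    host=$(tail -n 1)
    echo "lapw0: $host:$cores"
}
```

Inside the job script, `expand_nodes "$SLURM_JOB_NODELIST" | lapw0_line 6 > .machines` would then replace the three `echo` commands that build the lapw0 line, with 6 standing in for the real number of cores per node.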