Re: [OMPI users] Unable to find the following executable
Which executable is it not finding? mpirun? Your application?

On Wed, Nov 17, 2010 at 7:49 PM, Tushar Andriyas wrote:
> Hi there,
>
> I am new to using MPI commands and am stuck on a problem running a code. When I
> submit my job through a batch file, the job exits with the message that the
> executable could not be found on the machines. I have tried a lot of options,
> such as PBS -V and so on, but the problem persists. If someone is interested,
> I can send the full info on the cluster, the compiler and Open MPI settings,
> and other details. BTW, the launcher is Torque (which you might have guessed).
> The code does not have a forum, so I am in a deep mire.
>
> Thanks,
> Tushar
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Unable to find the following executable
Hello Tushar,

Have you tried supplying the full path of the executable, just to check?

Rangam

From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of Tushar Andriyas [thugnomic...@gmail.com]
Sent: Wednesday, November 17, 2010 8:49 PM
To: us...@open-mpi.org
Subject: [OMPI users] Unable to find the following executable
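Rangam's suggestion can be illustrated with a small shell sketch (the `SWMF.exe` name is taken from later in this thread; the path-resolution helper itself is only an illustration, not part of anyone's actual batch file):

```shell
#!/bin/sh
# Resolve a relative executable path to an absolute one before handing it
# to mpirun, so the remote nodes are not at the mercy of $PATH or of the
# submission shell's working directory.
APP=./SWMF.exe
ABS="$(cd "$(dirname "$APP")" && pwd)/$(basename "$APP")"
echo "$ABS"
# mpirun "$ABS"   # then launch with the absolute path
```

If the absolute-path form works where the bare name did not, the problem is path resolution on the remote nodes rather than the build itself.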
Re: [OMPI users] Unable to find the following executable
Hi there,

Thanks for the prompt reply. The thing is that although mpirun is set up correctly (a simple hello world works), when I run the main SWMF.exe executable, the cluster machines somehow fail to find it.

I have attached a sample error file from one of the runs (SWMF.e143438) and also the makefiles, so that you can better gauge the problem. The makefiles have Linux as the OS, pgf90 as the compiler, and mpif90 as the linker. I am using openmpi-1.2.7-pgi. The job is submitted using a batch file (job.bats) and the scheduler is Torque (I am not sure of the version, but I can see three on the machines, viz. 2.0.0, 2.2.1, and 2.5.2).

I have also attached error files from the WASATCH cluster (SWMF.e143439) and from UINTA (SWMF.e143440), with the *whole path of the exe, as Srirangam suggested*, in the batch file:

mpirun --prefix /opt/libraries/openmpi/openmpi-1.2.7-pgi /home/A00945081/SWMF_v2.3/run/SWMF.exe > runlog_`date +%y%m%d%H%M`

I have tried both mpirun and mpiexec, but nothing seems to work.

Tushar

Attachments: job.bats, Makefile.def, Makefile.conf, SWMF.e143438, SWMF.e143439, SWMF.e143440
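For reference, a batch file like the job.bats described above might look roughly like this minimal sketch (the resource request and job name are guesses; the `--prefix` and executable path are taken from the message):

```shell
#!/bin/sh
#PBS -N SWMF
#PBS -l nodes=2:ppn=8     # resource request is a guess; adjust to the cluster
#PBS -V                   # export the submission environment to the job
cd "$PBS_O_WORKDIR"       # start in the directory the job was submitted from
mpirun --prefix /opt/libraries/openmpi/openmpi-1.2.7-pgi \
    /home/A00945081/SWMF_v2.3/run/SWMF.exe > runlog_`date +%y%m%d%H%M`
```

The `--prefix` option points the remote Open MPI daemons at the right installation tree, which matters when the library is not on the default PATH of the compute nodes.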
Re: [OMPI users] Unable to find the following executable
Is your "hello world" test program in the same directory as SWMF? Is it possible that the path you are specifying is not available on all of the remote machines? That's the most common problem we see.
Re: [OMPI users] Unable to find the following executable
No, it's not in the same directory as SWMF. I guess the path is the same, since all the machines in a cluster are configured the same way. How do I know if this is not the case?
Re: [OMPI users] Unable to find the following executable
You can qsub a simple "ls" on that path - that will tell you if the path is valid on all machines in that allocation.

What typically happens is that home directories aren't remotely mounted, or are mounted in a different location.
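Ralph's check can be done in one line (a sketch; the executable path is from earlier in the thread, and the resource request is a guess to match the later scripts):

```shell
# Submit a trivial job that lists the executable's path on an allocated node.
# If the job's stderr file reports "No such file or directory", that node
# does not have the home directory mounted (or has it mounted elsewhere).
echo 'ls -l /home/A00945081/SWMF_v2.3/run/SWMF.exe' | qsub -N pathcheck -l nodes=2:ppn=8
```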
Re: [OMPI users] Unable to find the following executable
It just gives back info on the folders in my home directory. Don't get me wrong, but I'm kind of new at this. So, could you type out the full command that I need to give?

Tushar
Re: [OMPI users] Unable to find the following executable
Hello Tushar,

Try the following script.

#!/bin/sh
#PBS -V
#PBS -q wasatch
#PBS -N SWMF
#PBS -l nodes=2:ppn=8
# change to the run directory
#cd $SWMF_v2.3/run
cat `echo ${PBS_NODEFILE}` > list_of_nodes
#mpirun --preload-binary SWMF.exe
ls /home/A00945081/SWMF_v2.3/run/SWMF.exe

The objective is to check whether your user directories are auto-mounted on the compute nodes and available at run time. If the job returns information about SWMF.exe, then it can safely be assumed that user directories are being auto-mounted.

Rangam
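An alternative sketch that checks every node of the allocation in a single job uses Torque's `pbsdsh` (this assumes `pbsdsh` is available on the compute nodes; the path is from this thread):

```shell
#!/bin/sh
#PBS -N pathcheck
#PBS -l nodes=2:ppn=8
# Run the listing once per allocated node (-u = one task per unique node).
# Any node that cannot see the file prints an error to the job's stderr
# file, identifying exactly which mount is missing.
pbsdsh -u ls -l /home/A00945081/SWMF_v2.3/run/SWMF.exe
```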
Re: [OMPI users] Unable to find the following executable
Hey Rangam,

I tried out the batch script. The error file comes out empty, and the output file has /home/A00945081/SWM_v2.3/run/SWMF.exe (when run on a single machine), and the same with multiple machines in the run. So does that mean that the exe is auto-mounted? What should I do next?

Tushar
Re: [OMPI users] Unable to find the following executable
Hello Tushar,

After looking at the log files you attached, it appears that there are multiple issues:

[0,1,11]: Myrinet/GM on host wasatch-55 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.

These warnings usually occur when there is a mismatch between the mpirun version and the MCA BTL selection. I suggest the following to check whether the job actually works on a single node:

#!/bin/sh
#PBS -V
#PBS -q wasatch
#PBS -N SWMF
#PBS -l nodes=2:ppn=8
# change to the run directory
#cd $SWMF_v2.3/run
cat `echo ${PBS_NODEFILE}` > list_of_nodes
mpirun -np 8 -machinefile list_of_nodes /home/A00945081/SWMF_v2.3/run/SWMF.exe > run.log

Rangam
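If the Myrinet/GM warning turns out to be a real problem rather than noise, one possible workaround (a sketch, not something suggested in the thread itself) is to restrict Open MPI's byte-transfer layer to TCP explicitly; `--mca btl tcp,self` is a standard Open MPI MCA option, and the rest of the command follows Rangam's script:

```shell
# Force the TCP transport (plus "self" for loopback) so Open MPI stops
# probing for Myrinet/GM NICs that the wasatch nodes do not have.
mpirun --mca btl tcp,self -np 8 -machinefile list_of_nodes \
    /home/A00945081/SWMF_v2.3/run/SWMF.exe > run.log
```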
Re: [OMPI users] Unable to find the following executable
Hi Rangam,

I ran the batch file that you gave and have attached the error file. Also, since the WASATCH cluster is kind of small, people usually run on UINTA. So, if possible, could you look at the UINTA error files?

Tushar

On Fri, Nov 19, 2010 at 12:31 PM, Addepalli, Srirangam V <srirangam.v.addepa...@ttu.edu> wrote:
> Hello Tushar,
> After looking at the log files you attached, it appears that there are multiple issues.
>
> [0,1,11]: Myrinet/GM on host wasatch-55 was unable to find any NICs.
> Another transport will be used instead, although this may result in
> lower performance.
>
> Warnings like this usually occur when there is a mismatch between the mpirun version and the MCA btl selection. I suggest the following, to check whether the job actually works on a single node:
>
> #!/bin/sh
> #PBS -V
> #PBS -q wasatch
> #PBS -N SWMF
> #PBS -l nodes=2:ppn=8
> # change to the run directory
> #cd $SWMF_v2.3/run
> cat `echo ${PBS_NODEFILE}` > list_of_nodes
> mpirun -np 8 -machinefile list_of_nodes /home/A00945081/SWMF_v2.3/run/SWMF.exe > run.log
>
> Rangam
>
> On Fri, Nov 19, 2010 at 1:11 PM, Tushar Andriyas <thugnomic...@gmail.com> wrote:
> > Hey Rangam,
> >
> > I tried out the batch script; the error file comes out empty, and the output file has /home/A00945081/SWM_v2.3/run/SWMF.exe (WHEN RUN ON A SINGLE MACHINE), and the same with multiple machines in the run. So, does that mean that the exe is auto-mounted? What should I do next?
> >
> > Tushar
> >
> > On Fri, Nov 19, 2010 at 10:05 AM, Addepalli, Srirangam V <srirangam.v.addepa...@ttu.edu> wrote:
> > > Hello Tushar,
> > >
> > > Try the following script.
> > >
> > > #!/bin/sh
> > > #PBS -V
> > > #PBS -q wasatch
> > > #PBS -N SWMF
> > > #PBS -l nodes=1:ppn=8
> > > # change to the run directory
> > > #cd $SWMF_v2.3/run
> > > cat `echo ${PBS_NODEFILE}` > list_of_nodes
> > >
> > > The objective is to check whether your user directories are auto-mounted on the compute nodes and available at run time. If the job returns information about SWMF.exe, then it can safely be assumed that the user directories are being auto-mounted.
> > >
> > > Rangam
> > >
> > > On Fri, Nov 19, 2010 at 8:35 AM, Tushar Andriyas <thugnomic...@gmail.com> wrote:
> > > > It just gives back the info on the folders in my home directory. Don't get me wrong, but I'm kind of new at this. So, could you type out the full command which I need to give?
> > > >
> > > > Tushar
> > > >
> > > > On Thu, Nov 18, 2010 at 8:35 AM, Ralph Castain <r...@open-mpi.org> wrote:
> > > > > You can qsub a simple "ls" on that path - that will tell you if the path is valid on all machines in that allocation.
> > > > >
> > > > > What typically happens is that home directories aren't remotely mounted, or are mounted in a different location.
> > > > >
> > > > > On Thu, Nov 18, 2010 at 8:31 AM, Tushar Andriyas <thugnomic...@gmail.com> wrote:
> > > > > > No, it's not in the same directory as SWMF. I guess the path is the same, since all the machines in a cluster are configured the same way. How do I know if this is not the case?
> > > > > >
> > > > > > On Thu, Nov 18, 2010 at 8:25 AM, Ralph Castain <r...@open-mpi.org> wrote:
> > > > > > > Is your "hello world" test program in the same directory as SWMF? Is it possible that the path you are specifying is not available on all of the remote machines? That's the most common problem we see.
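Ralph's qsub-a-simple-"ls" suggestion can be packaged as a throwaway Torque job. This is only a sketch: the job name is made up, and it assumes Torque's pbsdsh utility is installed (it normally ships with Torque), so the check runs on every allocated node rather than just the first one.

```shell
#!/bin/sh
#PBS -V
#PBS -q wasatch
#PBS -N pathcheck
#PBS -l nodes=2:ppn=8
# pbsdsh -u runs the command once per unique allocated host; ls will
# report "No such file or directory" in the job's error file for any
# node where the home directory is not mounted at the expected path.
pbsdsh -u ls -l /home/A00945081/SWMF_v2.3/run/SWMF.exe
```

If every node prints the same file listing, a missing mount can be ruled out as the cause.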
Re: [OMPI users] Unable to find the following executable
Hello Tushar,

mpirun is not able to spawn processes on the allocated nodes. This should help:

#!/bin/sh
#PBS -V
#PBS -q wasatch
#PBS -N SWMF
#PBS -l nodes=2:ppn=8
# change to the run directory
#cd $SWMF_v2.3/run
cat `echo ${PBS_NODEFILE}` > list_of_nodes
mpirun -np 8 /home/A00945081/SWMF_v2.3/run/SWMF.exe > run.log

Rangam
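As a side note, the list_of_nodes file that the scripts in this thread create from $PBS_NODEFILE is also a quick way to see what Torque actually granted. A sketch with a made-up nodefile (in a real job you would cat $PBS_NODEFILE instead of the here-document):

```shell
# Hypothetical stand-in for the file Torque writes to $PBS_NODEFILE:
# one line per allocated slot, so nodes=2:ppn=2 yields four lines.
cat > list_of_nodes <<'EOF'
wasatch-29
wasatch-29
wasatch-30
wasatch-30
EOF

# The -np passed to mpirun should not exceed the slot count, and every
# distinct host in the list must mount the directory holding SWMF.exe.
echo "slots: $(wc -l < list_of_nodes | tr -d ' ')"
echo "hosts: $(sort -u list_of_nodes | wc -l | tr -d ' ')"
```

Comparing these two numbers against the #PBS -l line is a cheap way to catch a stale or truncated nodefile before blaming mpirun.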
Re: [OMPI users] Unable to find the following executable
Rangam,

It does not want to run at all. Attached is the log file from the run of the batch file you sent.

Tushar
Re: [OMPI users] Unable to find the following executable
Hello Tushar,

Can you send me the output of ompi_info? Have you tried using just tcp instead of IB, to narrow things down?

Rangam

#!/bin/sh
#PBS -V
#PBS -q wasatch
#PBS -N SWMF
#PBS -l nodes=1:ppn=8
# change to the run directory
#cd $SWMF_v2.3/run
cat `echo ${PBS_NODEFILE}` > list_of_nodes

mpirun --mca btl self,sm,tcp --mca btl_base_verbose 30 -np 8 /home/A00945081/SWMF_v2.3/run/SWMF.exe > run.log
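Before debugging SWMF.exe any further, it can be worth confirming that mpirun can launch anything at all on the allocation. A minimal sketch of such a job, using the same queue and resource line as the scripts in this thread (the job name is made up, and hostname stands in for the real application):

```shell
#!/bin/sh
#PBS -V
#PBS -q wasatch
#PBS -N mpicheck
#PBS -l nodes=1:ppn=8
# If this prints eight host names, daemon startup and the self/sm/tcp
# BTLs are working, and any remaining failure is specific to SWMF.exe.
mpirun --mca btl self,sm,tcp -np 8 hostname
```

Rerunning it with nodes=2:ppn=8 and -np 16 would then separate single-node from multi-node launch problems.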
Re: [OMPI users] Unable to find the following executable
rmgr: urm (MCA v1.0, API v2.0, Component v1.2.7)
MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.7)
MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.7)
MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.7)
MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.7)
MCA pls: slurm (MCA v1.0, API v1.3, Component v1.2.7)
MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.7)
MCA sds: env (MCA v1.0, API v1.0, Component v1.2.7)
MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.7)
MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.7)
MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.7)
MCA sds: slurm (MCA v1.0, API v1.0, Component v1.2.7)

How do you invoke tcp? I know for sure that the launcher on the clusters is Torque.

Tushar
Re: [OMPI users] Unable to find the following executable
Use

mpirun --mca btl self,sm,tcp --mca btl_base_verbose 30 -np 8 /home/A00945081/SWMF_v2.3/run/SWMF.exe > run.log

to run over the tcp interface in the job submission script.

Rangam

On Sat, Nov 20, 2010 at 1:36 PM, Tushar Andriyas <thugnomic...@gmail.com> wrote:
> Yeah sure, here is the list:
>
> Open MPI: 1.2.7
> Open MPI SVN revision: r19401
> Open RTE: 1.2.7
> Open RTE SVN revision: r19401
> OPAL: 1.2.7
> OPAL SVN revision: r19401
> Prefix: /opt/libraries/openmpi/openmpi-1.2.7-pgi
> Configured architecture: x86_64-unknown-linux-gnu
> Configured by: A00017402
> Configured on: Thu Sep 18 15:00:05 MDT 2008
> Configure host: volvox.hpc.usu.edu
> Built by: A00017402
> Built on: Thu Sep 18 15:20:06 MDT 2008
> Built host: volvox.hpc.usu.edu
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: large
> C compiler: pgcc
> C compiler absolute: /opt/apps/pgi/linux86-64/7.2/bin/pgcc
> C++ compiler: pgCC
> C++ compiler absolute: /opt/apps/pgi/linux86-64/7.2/bin/pgCC
> Fortran77 compiler: pgf77
> Fortran77 compiler abs: /opt/apps/pgi/linux86-64/7.2/bin/pgf77
> Fortran90 compiler: pgf90
> Fortran90 compiler abs: /opt/apps/pgi/linux86-64/7.2/bin/pgf90
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: no
> Thread support: posix (mpi: no, progress: no)
> Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: yes
> mpirun default --prefix: no
> MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.7)
> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.7)
> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.7)
> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.7)
> MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.7)
> MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.7)
> MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.7)
> MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.7)
> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.7)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.7)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.7)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.7)
> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.7)
> MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.7)
> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.7)
> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.7)
> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.7)
> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.7)
> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.7)
> MCA btl: gm (MCA v1.0, API v1.0.1, Component v1.2.7)
> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.7)
> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.7)
> MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.7)
> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.7)
> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.7)
> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.7)
> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.7)
> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.7)
> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.7)
> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.7)
> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.7)
> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.7)
> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.7)
> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.7)
> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.7)
> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.7)
> MCA ras: localhost (MCA v1.0, API v1.3, C
Re: [OMPI users] Unable to find the following executable
I tried out the TCP connection, and here is what the error file came out as:

[wasatch-29:05042] [0,0,0] ORTE_ERROR_LOG: Timeout in file ../../../../orte/mca/pls/base/pls_base_orted_cmds.c at line 275
[wasatch-29:05042] [0,0,0] ORTE_ERROR_LOG: Timeout in file ../../../../../orte/mca/pls/tm/pls_tm_module.c at line 572
[wasatch-29:05042] [0,0,0] ORTE_ERROR_LOG: Timeout in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp.c at line 90
[wasatch-29:05042] [0,0,0] ORTE_ERROR_LOG: Timeout in file ../../../../orte/mca/pls/base/pls_base_orted_cmds.c at line 188
[wasatch-29:05042] [0,0,0] ORTE_ERROR_LOG: Timeout in file ../../../../../orte/mca/pls/tm/pls_tm_module.c at line 603
--
mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
--
[wasatch-29:05044] OOB: Connection to HNP lost
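Those ORTE_ERROR_LOG timeouts are mpirun giving up on its orted daemons: the daemons on the remote nodes either never started or never connected back to mpirun. A hedged sketch of a job that checks whether orted is even on the non-interactive PATH of each allocated host (the job name is made up, and it assumes passwordless ssh between nodes, which Torque clusters normally provide):

```shell
#!/bin/sh
#PBS -V
#PBS -q wasatch
#PBS -N ortedcheck
#PBS -l nodes=2:ppn=8
# A missing (or version-mismatched) orted on a compute node produces
# exactly this kind of launch timeout; check each unique host once.
for node in $(sort -u "${PBS_NODEFILE}"); do
    ssh "$node" 'printf "%s: " "$(hostname)"; which orted || echo "orted not found"'
done
```

Any node that reports "orted not found", or a different install path than the head node, would explain the timeouts.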
Re: [OMPI users] Unable to find the following executable
If I run the same TCP connection on the other cluster, UINTA, I get the following error log:

mca/pls/tm/pls_tm_module.c at line 572
[uinta-0039:14508] [0,0,0] ORTE_ERROR_LOG: Timeout in file ../../../../../orte/mca/errmgr/hnp/errmgr_hnp.c at line 90
[uinta-0039:14508] [0,0,0] ORTE_ERROR_LOG: Timeout in file ../../../../orte/mca/pls/base/pls_base_orted_cmds.c at line 188
[uinta-0039:14508] [0,0,0] ORTE_ERROR_LOG: Timeout in file ../../../../../orte/mca/pls/tm/pls_tm_module.c at line 603
--
mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
--
[uinta-0039:14510] OOB: Connection to HNP lost
[uinta-0038:15165] OOB: Connection to HNP lost