Re: [OMPI users] Problem with running openMPI program
@Gus, the applications in the links you have sent are really high-level, and I believe really expensive too, as I will have to have a physical apparatus for various measurements along with the cluster. Am I right?
Re: [OMPI users] Problem with running openMPI program
Are there any applications that I can implement on a small scale, in a lab or something? Also, what can I do for clustering web servers?

On Wed, Apr 29, 2009 at 2:46 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> Hi Ankush
>
> Glad to hear that your MPI and cluster project were successful.
>
> I don't know if you would call these "mathematical computation"
> or "real life applications" of MPI and clusters, but here are a
> few samples I am familiar with (Earth Science):
>
> Weather forecast:
> http://www.wrf-model.org/index.php
> http://www.mmm.ucar.edu/mm5/
>
> Climate, Atmosphere and Ocean circulation modeling:
> http://www.ccsm.ucar.edu/models/ccsm3.0/
> http://www.jamstec.go.jp/esc/index.en.html
> http://www.metoffice.gov.uk/climatechange/
> http://www.gfdl.noaa.gov/fms
> http://www.nemo-ocean.eu/
>
> Earthquakes, computational seismology, and solid Earth dynamics:
> http://www.gps.caltech.edu/~jtromp/research/index.html
> http://www-esd.lbl.gov/GG/CCS/
>
> A couple of other areas:
>
> Computational Fluid Dynamics, Finite Element Method, etc.:
> http://www.foamcfd.org/
> http://www.cimec.org.ar/twiki/bin/view/Cimec/PETScFEM
>
> Computational Chemistry, molecular dynamics, etc.:
> http://www.tddft.org/programs/octopus/wiki/index.php/Main_Page
> http://classic.chem.msu.su/gran/gamess/
> http://ambermd.org/
> http://www.gromacs.org/
> http://www.charmm.org/
>
> Gus Correa
>
> Ankush Kaul wrote:
>> Thanks everyone (especially Gus and Jeff) for the support and guidance. We are
>> almost at the verge of completing our project, which would not have been
>> possible without all of you.
>>
>> I would like to know one more thing: what are the real-life applications that
>> I can use the cluster for (apart from mathematical computation)? Can I use it
>> for my web server, and if yes, how?
>>
>> On Fri, Apr 24, 2009 at 12:01 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
>>> Excellent answer. One addendum -- we had a really nice FAQ entry
>>> about this kind of stuff on the LAM/MPI web site, which I was
>>> horrified to see that we had not copied to the Open MPI site. So I
>>> copied it over this morning. :-)
>>>
>>> Have a look at these 3 (brand new) FAQ entries:
>>>
>>> http://www.open-mpi.org/faq/?category=building#overwrite-pre-installed-ompi
>>> http://www.open-mpi.org/faq/?category=building#where-to-install
>>> http://www.open-mpi.org/faq/?category=running#do-i-need-a-common-filesystem
>>>
>>> Hope that helps.
>>>
>>> On Apr 23, 2009, at 10:34 AM, Gus Correa wrote:
>>>> Hi Ankush
>>>>
>>>> Jeff already sent clarifications about image processing,
>>>> and the portable API nature of OpenMPI (and other MPI
>>>> implementations).
>>>>
>>>> As for "mpicc: command not found", this is again a problem with your
>>>> PATH. Remember the "locate" command? :)
>>>> Find where mpicc is installed, and put that directory on your PATH.
>>>>
>>>> In any case, I would suggest that you choose a central NFS-mounted
>>>> file system on your cluster master node, and install OpenMPI there,
>>>> configuring and building it from source (not from yum).
>>>> If this directory is mounted on all nodes, the same OpenMPI will be
>>>> available on all nodes.
>>>> This will give you a single standard version of OpenMPI across
>>>> the board.
>>>>
>>>> Clustering can become a very confusing and tricky business if you
>>>> have heterogeneous nodes, with different OS/Linux versions,
>>>> different MPI versions, software installed in different locations
>>>> on each node, etc., regardless of whether you use mpi-selector,
>>>> set the PATH variable on each node, use the environment modules
>>>> package, or use any other technique to set up your environment.
>>>> Installing less software, rather than more,
>>>> and doing so in a standardized, homogeneous way across all
>>>> cluster nodes, will give you a cleaner environment, which is
>>>> easier to understand, control, upgrade, and update.
>>>>
>>>> A relatively simple way to install a homogeneous cluster is
>>>> to use the Rocks Clusters "rolls" suite,
>>>> which is free and based on CentOS.
>>>> It will
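Following up on Gus's suggestion above, a minimal sketch of the source-on-NFS install (the version number, tarball name, and /work directory are only examples; adjust to your setup):

    $ tar xzf openmpi-1.3.tar.gz && cd openmpi-1.3
    $ ./configure --prefix=/work/openmpi   # /work must be NFS-mounted on all nodes
    $ make all install
    # then, on every node (e.g. in ~/.bashrc):
    $ export PATH=/work/openmpi/bin:$PATH
    $ export LD_LIBRARY_PATH=/work/openmpi/lib:$LD_LIBRARY_PATH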
Re: [OMPI users] Problem with running openMPI program
I have gone through that course, but I am still not at a stage where I can develop an MPI program, so I was looking for some image processing programs on the net. I will try the imageproc.c program, which I found at http://lam-mpi.lzu.edu.cn/tutorials/nd/part1/ -- hope it runs on OpenMPI.

On Thu, Apr 23, 2009 at 5:07 PM, Jeff Squyres <jsquy...@cisco.com> wrote:
> Yes, they will run. Note that these are toy image processing examples;
> they are no substitute for a real image processing application.
>
> You might want to look at a full MPI tutorial to get an understanding of
> MPI itself:
>
> http://ci-tutor.ncsa.uiuc.edu/login.php
>
> Register (it's free), log in, and look for the Introduction to MPI tutorial.
> It's quite good.
>
> On Apr 23, 2009, at 6:59 AM, Ankush Kaul wrote:
>> I found some programs on this link:
>> http://lam-mpi.lzu.edu.cn/tutorials/nd/part1/
>>
>> Will these programs run on my OpenMPI cluster?
>>
>> Actually, I want to run some image processing programs on my cluster; as I
>> cannot write the entire code of the program, can anyone tell me where I can
>> get image processing programs?
>>
>> I know this is the wrong place to ask, but I thought I would give it a try,
>> as I cannot find anything on the net.
>
> --
> Jeff Squyres
> Cisco Systems
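For completeness, the usual compile-and-run cycle for a single-file MPI program like this one looks roughly as follows (the process count is an example; with no -host or -hostfile option, mpirun falls back to the default hostfile):

    $ mpicc imageproc.c -o imageproc
    $ mpirun -np 4 ./imageproc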
Re: [OMPI users] Problem with running openMPI program
I found some programs on this link: http://lam-mpi.lzu.edu.cn/tutorials/nd/part1/

Will these programs run on my OpenMPI cluster?

Actually, I want to run some image processing programs on my cluster; as I cannot write the entire code of the program, can anyone tell me where I can get image processing programs?

I know this is the wrong place to ask, but I thought I would give it a try, as I cannot find anything on the net.
Re: [OMPI users] Problem with running openMPI program
@Gus, Eugene: I read all your mails and even followed the same procedure; it was BLAS that was giving the problem. Thanks.

I am again stuck on a problem. I connected a new node to my cluster and installed CentOS 5.2 on it. After that, I used yum to install openmpi, openmpi-libs, and openmpi-devel successfully. But still, when I run the mpicc command, it gives me the error:

bash: mpicc: command not found

I found out there is a command, mpi-selector, but I don't know how to use it. Is this a new version of OpenMPI? How do I configure it?
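mpi-selector is not a new version of OpenMPI; it is a small utility (shipped with some distributions and OFED) that chooses which installed MPI's compiler wrappers go on your PATH. A hedged sketch of typical usage (the installation name below is an example; take the real one from the --list output, and check the man page for your version):

    $ mpi-selector --list                 # show MPI installations it knows about
    $ mpi-selector --set openmpi-1.2.5    # make one of them the default
    $ mpi-selector --query                # confirm the setting
    # log out and back in (new login shell) so the PATH change takes effect
    $ which mpicc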
Re: [OMPI users] Problem with running openMPI program
@Gus, we are not able to make HPL successfully. I think it has something to do with BLAS; I cannot find the BLAS tar file on the net. I found an RPM, but the installation steps are for the tar file.

# locate blas

gave us the following result:

[root@ccomp1 hpl]# locate blas
/hpl/include/hpl_blas.h
/hpl/makes/Make.blas
/hpl/src/blas
/hpl/src/blas/HPL_daxpy.c
/hpl/src/blas/HPL_dcopy.c
/hpl/src/blas/HPL_dgemm.c
/hpl/src/blas/HPL_dgemv.c
/hpl/src/blas/HPL_dger.c
/hpl/src/blas/HPL_dscal.c
/hpl/src/blas/HPL_dswap.c
/hpl/src/blas/HPL_dtrsm.c
/hpl/src/blas/HPL_dtrsv.c
/hpl/src/blas/HPL_idamax.c
/hpl/src/blas/ccomp
/hpl/src/blas/i386
/hpl/src/blas/ccomp/Make.inc
/hpl/src/blas/ccomp/Makefile
/hpl/src/blas/i386/Make.inc
/hpl/src/blas/i386/Makefile
/usr/include/boost/numeric/ublas
/usr/include/boost/numeric/ublas/banded.hpp
/usr/include/boost/numeric/ublas/blas.hpp
/usr/include/boost/numeric/ublas/detail
/usr/include/boost/numeric/ublas/exception.hpp
/usr/include/boost/numeric/ublas/expression_types.hpp
/usr/include/boost/numeric/ublas/functional.hpp
/usr/include/boost/numeric/ublas/fwd.hpp
/usr/include/boost/numeric/ublas/hermitian.hpp
/usr/include/boost/numeric/ublas/io.hpp
/usr/include/boost/numeric/ublas/lu.hpp
/usr/include/boost/numeric/ublas/matrix.hpp
/usr/include/boost/numeric/ublas/matrix_expression.hpp
/usr/include/boost/numeric/ublas/matrix_proxy.hpp
/usr/include/boost/numeric/ublas/matrix_sparse.hpp
/usr/include/boost/numeric/ublas/operation.hpp
/usr/include/boost/numeric/ublas/operation_blocked.hpp
/usr/include/boost/numeric/ublas/operation_sparse.hpp
/usr/include/boost/numeric/ublas/storage.hpp
/usr/include/boost/numeric/ublas/storage_sparse.hpp
/usr/include/boost/numeric/ublas/symmetric.hpp
/usr/include/boost/numeric/ublas/traits.hpp
/usr/include/boost/numeric/ublas/triangular.hpp
/usr/include/boost/numeric/ublas/vector.hpp
/usr/include/boost/numeric/ublas/vector_expression.hpp
/usr/include/boost/numeric/ublas/vector_of_vector.hpp
/usr/include/boost/numeric/ublas/vector_proxy.hpp
/usr/include/boost/numeric/ublas/vector_sparse.hpp
/usr/include/boost/numeric/ublas/detail/concepts.hpp
/usr/include/boost/numeric/ublas/detail/config.hpp
/usr/include/boost/numeric/ublas/detail/definitions.hpp
/usr/include/boost/numeric/ublas/detail/documentation.hpp
/usr/include/boost/numeric/ublas/detail/duff.hpp
/usr/include/boost/numeric/ublas/detail/iterator.hpp
/usr/include/boost/numeric/ublas/detail/matrix_assign.hpp
/usr/include/boost/numeric/ublas/detail/raw.hpp
/usr/include/boost/numeric/ublas/detail/returntype_deduction.hpp
/usr/include/boost/numeric/ublas/detail/temporary.hpp
/usr/include/boost/numeric/ublas/detail/vector_assign.hpp
/usr/lib/libblas.so.3
/usr/lib/libblas.so.3.1
/usr/lib/libblas.so.3.1.1
/usr/lib/openoffice.org/basis3.0/share/gallery/htmlexpo/cublast.gif
/usr/lib/openoffice.org/basis3.0/share/gallery/htmlexpo/cublast_.gif
/usr/share/backgrounds/images/tiny_blast_of_red.jpg
/usr/share/doc/blas-3.1.1
/usr/share/doc/blas-3.1.1/blasqr.ps
/usr/share/man/manl/intro_blas1.l.gz

When we try to make using the following command:

# make arch=ccomp

it gives the error:

Makefile:47: Make.inc: No such file or directory
make[2]: *** No rule to make target `Make.inc'. Stop.
make[2]: Leaving directory `/hpl/src/auxil/ccomp'
make[1]: *** [build_src] Error 2
make[1]: Leaving directory `/hpl'
make: *** [build] Error 2

The ccomp folder is created, but the xhpl file is not. Is it some problem with the config file?
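For the archives, a hedged sketch of the HPL configuration step that usually avoids this error (the BLAS paths, MPI prefix, and the blas-devel package name are assumptions; HPL ships editable templates in its setup/ directory):

    $ cd /hpl
    $ cp setup/Make.Linux_PII_CBLAS Make.ccomp
    # edit Make.ccomp; the variables that matter most:
    #   TOPdir = /hpl                   (top-level HPL directory)
    #   MPdir  = <your OpenMPI prefix>
    #   LAdir  = /usr/lib               (where BLAS lives)
    #   LAlib  = -L$(LAdir) -lblas      (the locate output above shows only
    #                                    libblas.so.3; 'yum install blas-devel'
    #                                    may be needed for the libblas.so/.a links)
    $ make arch=ccomp clean_arch_all    # clear half-built arch dirs from earlier tries
    $ make arch=ccomp                   # on success, bin/ccomp/xhpl should appear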
On Wed, Apr 22, 2009 at 11:40 AM, Ankush Kaul <ankush.rk...@gmail.com> wrote:
> I feel the above problem occurred due to installing the MPICH package; now even
> normal MPI programs are not running.
> What should we do? We even tried yum remove mpich, but it says there are no
> packages to remove.
> Please help!
>
> On Wed, Apr 22, 2009 at 11:34 AM, Ankush Kaul <ankush.rk...@gmail.com> wrote:
>> We are facing another problem. We were trying to install different
>> benchmarking packages.
>>
>> Now, whenever we try to run the mpirun command (which was working perfectly
>> before), we get this error:
>>
>> usr/local/bin/mpdroot: open failed for root's mpd conf file
>> mpdtrace (__init__ 1190): forked process failed; status=255
>>
>> What is the problem here?
>>
>> On Tue, Apr 21, 2009 at 11:45 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>>> Hi Ankush
>>>
>>> Ankush Kaul wrote:
>>>> @Eugene
>>>> They are OK, but we wanted something better, which would more clearly
>>>> show the difference between using a single PC and the cluster.
>>>>
>>>> @Prakash
>>>> I had problems running the programs, as they were compiling with mpcc
>>>> and not mpicc.
>>>>
>>>> @Gus
>>>> We are trying to figure out the HPL config; it's quite complicated.
Re: [OMPI users] Problem with running openMPI program
I feel the above problem occurred due to installing the MPICH package; now even normal MPI programs are not running. What should we do? We even tried yum remove mpich, but it says there are no packages to remove. Please help!

On Wed, Apr 22, 2009 at 11:34 AM, Ankush Kaul <ankush.rk...@gmail.com> wrote:
> We are facing another problem. We were trying to install different
> benchmarking packages.
>
> Now, whenever we try to run the mpirun command (which was working perfectly
> before), we get this error:
>
> usr/local/bin/mpdroot: open failed for root's mpd conf file
> mpdtrace (__init__ 1190): forked process failed; status=255
>
> What is the problem here?
>
> On Tue, Apr 21, 2009 at 11:45 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>> Hi Ankush
>>
>> Ankush Kaul wrote:
>>> @Eugene
>>> They are OK, but we wanted something better, which would more clearly
>>> show the difference between using a single PC and the cluster.
>>>
>>> @Prakash
>>> I had problems running the programs, as they were compiling with mpcc and
>>> not mpicc.
>>>
>>> @Gus
>>> We are trying to figure out the HPL config; it's quite complicated.
>>
>> I sent you some sketchy instructions on how to build HPL
>> in my last message to this thread.
>> I built HPL and ran it here yesterday that way.
>> Did you try my suggestions?
>> Where did you get stuck?
>>
>>> Also, the locate command lists lots of confusing results.
>>
>> I would say the list is just long, not really confusing.
>> You can find what you need if you want.
>> Pipe the output of locate through "more", and search carefully.
>> If you are talking about BLAS, try "locate libblas.a" and
>> "locate libgoto.a".
>> Those are the libraries you need, and if they are not there
>> you need to install one of them.
>> Read my previous email for details.
>> I hope it will help you get HPL working, if you are interested in HPL.
>>
>> I hope this helps.
>>
>> Gus Correa
>>
>>> @Jeff
>>> I think you are correct; we may have installed OpenMPI without VT support.
>>> Is there anything we can do now?
>>>
>>> One more thing: I found this program but don't know how to run it:
>>> http://www.cis.udel.edu/~pollock/367/manual/node35.html
>>>
>>> Thanks to all you guys for putting in so much effort to help us out.
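For the record, mpdroot and mpdtrace belong to MPICH2's mpd process manager, not to Open MPI, so the error above means an MPICH/MPICH2 mpirun under /usr/local is being picked up instead of Open MPI's. A hedged way to check and work around it (package and path names are guesses; verify on your system):

    $ which mpirun                # if it resolves under /usr/local, that's the MPICH copy
    $ rpm -qa | grep -i mpich     # 'yum remove mpich' fails if the package is e.g. 'mpich2'
    $ export PATH=/usr/bin:$PATH  # or wherever Open MPI's mpirun lives -- an assumption
    $ which mpirun                # should now point at Open MPI's mpirun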
Re: [OMPI users] Problem with running openMPI program
We are facing another problem. We were trying to install different benchmarking packages.

Now, whenever we try to run the mpirun command (which was working perfectly before), we get this error:

usr/local/bin/mpdroot: open failed for root's mpd conf file
mpdtrace (__init__ 1190): forked process failed; status=255

What is the problem here?

On Tue, Apr 21, 2009 at 11:45 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> Hi Ankush
>
> Ankush Kaul wrote:
>> @Eugene
>> They are OK, but we wanted something better, which would more clearly show
>> the difference between using a single PC and the cluster.
>>
>> @Prakash
>> I had problems running the programs, as they were compiling with mpcc and
>> not mpicc.
>>
>> @Gus
>> We are trying to figure out the HPL config; it's quite complicated.
>
> I sent you some sketchy instructions on how to build HPL
> in my last message to this thread.
> I built HPL and ran it here yesterday that way.
> Did you try my suggestions?
> Where did you get stuck?
>
>> Also, the locate command lists lots of confusing results.
>
> I would say the list is just long, not really confusing.
> You can find what you need if you want.
> Pipe the output of locate through "more", and search carefully.
> If you are talking about BLAS, try "locate libblas.a" and
> "locate libgoto.a".
> Those are the libraries you need, and if they are not there
> you need to install one of them.
> Read my previous email for details.
> I hope it will help you get HPL working, if you are interested in HPL.
>
> I hope this helps.
>
> Gus Correa
>
>> @Jeff
>> I think you are correct; we may have installed OpenMPI without VT support.
>> Is there anything we can do now?
>>
>> One more thing: I found this program but don't know how to run it:
>> http://www.cis.udel.edu/~pollock/367/manual/node35.html
>>
>> Thanks to all you guys for putting in so much effort to help us out.
Re: [OMPI users] Problem with running openMPI program
@Eugene
They are OK, but we wanted something better, which would more clearly show the difference between using a single PC and the cluster.

@Prakash
I had problems running the programs, as they were compiling with mpcc and not mpicc.

@Gus
We are trying to figure out the HPL config; it's quite complicated. Also, the locate command lists lots of confusing results.

@Jeff
I think you are correct; we may have installed OpenMPI without VT support. Is there anything we can do now?

One more thing: I found this program but don't know how to run it:
http://www.cis.udel.edu/~pollock/367/manual/node35.html

Thanks to all you guys for putting in so much effort to help us out.
Re: [OMPI users] Problem with running openMPI program
Let me describe what I want to do. I had taken Linux clustering as my final-year engineering project, as I am really interested in networking. To tell the truth, our college does not have any professor with knowledge of clustering. The aim of our project was just to build a cluster, which we did. Now we have to show and explain our project to the professors, so I want something to show them how the cluster works -- some program or benchmarking software. Hope you get the problem. And thanks again; we really appreciate your patience.
Re: [OMPI users] Problem with running openMPI program
Thanks a lot, I am implementing the passwordless cluster.

I am also trying different benchmarking software, and got fed up with all the problems in all the software I try. I will list a few:

1) VampirTrace

I extracted the tar in /vt, then followed these steps:

$ ./configure --prefix=/vti
[...lots of output...]
$ make all install

After this, the FAQ on open-mpi.org says to "simply replace the compiler wrappers to activate VampirTrace", but it does not tell how I replace the compiler wrappers. I try to run

mpicc-vt -c hello.c -o hello

but it gives an error:

bash: mpicc-vt: command not found

2) HPL

For this I didn't understand the installation steps. I extracted the tar in /hpl. Then it asks to "create a file Make.<arch> in the top-level directory"; I created a file Make.i386. Then it says "this file essentially contains the compilers and libraries with their paths to be used" -- how do I fill that in? After that it asks to run the command

make arch=i386

but it gives the error:

make[3]: Entering directory `/hpl'
make -f Make.top startup_dir arch=i386
make[4]: Entering directory `/hpl'
Make.top:161: warning: overriding commands for target `clean_arch_all'
Make.i386:84: warning: ignoring old commands for target `clean_arch_all'
include/i386
make[4]: include/i386: Command not found
make[4]: [startup_dir] Error 127 (ignored)
lib
make[4]: lib: Command not found
make[4]: [startup_dir] Error 127 (ignored)
lib/i386
make[4]: lib/i386: Command not found
make[4]: [startup_dir] Error 127 (ignored)
bin
make[4]: bin: Command not found
make[4]: [startup_dir] Error 127 (ignored)
bin/i386
make[4]: bin/i386: Command not found
make[4]: [startup_dir] Error 127 (ignored)
make[4]: Leaving directory `/hpl'
make -f Make.top startup_src arch=i386
make[4]: Entering directory `/hpl'
Make.top:161: warning: overriding commands for target `clean_arch_all'
Make.i386:84: warning: ignoring old commands for target `clean_arch_all'
make -f Make.top leaf le=src/auxil arch=i386
make[5]: Entering directory `/hpl'
Make.top:161: warning: overriding commands for target `clean_arch_all'
Make.i386:84: warning: ignoring old commands for target `clean_arch_all'
( src/auxil ; i386 )
/bin/sh: src/auxil: is a directory

Then it drops back to the shell prompt. Please help -- is there a simpler benchmarking software? I don't want to give up at this point :(
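A hedged guess at the mpicc-vt failure: the wrappers get installed under the --prefix passed to configure (/vti here), and that bin directory is not on the PATH. Note also that, as far as I know, mpicc-vt is the wrapper Open MPI itself provides when built with VampirTrace support; a standalone VampirTrace build may instead install wrappers named vtcc and friends. Worth checking what actually landed in /vti/bin:

    $ ls /vti/bin                     # see which wrappers were actually installed
    $ export PATH=/vti/bin:$PATH
    $ which mpicc-vt || which vtcc    # one of these should now resolve
    $ mpicc-vt -c hello.c -o hello.o  # or vtcc, depending on the above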
Re: [OMPI users] Problem with running openMPI program
Also, how can I find out where my MPI libraries and include directories are?

On Sat, Apr 18, 2009 at 2:29 PM, Ankush Kaul <ankush.rk...@gmail.com> wrote:
> Let me explain in detail.
>
> When we had only 2 nodes -- 1 master (192.168.67.18) + 1 compute node
> (192.168.45.65) -- my openmpi-default-hostfile looked like:
>
> 192.168.67.18 slots=2
> 192.168.45.65 slots=2
>
> After this, on running the command mpirun /work/Pi on the master node, we got:
>
> # root@192.168.45.65's password:
>
> After entering the password, the program ran on both nodes.
>
> Now, after connecting a second compute node and editing the hostfile:
>
> 192.168.67.18 slots=2
> 192.168.45.65 slots=2
> 192.168.67.241 slots=2
>
> and then running the command mpirun /work/Pi on the master node, we got:
>
> # root@192.168.45.65's password: root@192.168.67.241's password:
>
> which does not accept the password.
>
> Although we are trying to implement the passwordless cluster, I would like to
> know why this problem is occurring.
>
> On Sat, Apr 18, 2009 at 3:40 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>> Ankush
>>
>> You need to set up passwordless ssh connections to the node you just
>> added. You (or somebody else) probably did this already on the first
>> compute node; otherwise the MPI programs wouldn't run
>> across the network.
>>
>> See the very last sentence on this FAQ:
>>
>> http://www.open-mpi.org/faq/?category=running#run-prereqs
>>
>> And try this recipe (if you use RSA keys instead of DSA, replace all "dsa"
>> by "rsa"):
>>
>> http://www.sshkeychain.org/mirrors/SSH-with-Keys-HOWTO/SSH-with-Keys-HOWTO-4.html#ss4.3
>>
>> I hope this helps.
>>
>> Gus Correa
>>
>> Ankush Kaul wrote:
>>> Thank you, I am reading up on the tools you suggested.
>>>
>>> I am facing another problem. My cluster is working fine with 2 hosts (1
>>> master + 1 compute node), but when I tried to add another node (1 master +
>>> 2 compute nodes) it is not working. It works fine when I give the command
>>> mpirun -host /work/Pi, but when I try to run mpirun /work/Pi it gives the
>>> following error:
>>>
>>> root@192.168.45.65's password: root@192.168.67.241's password:
>>> Permission denied, please try again.
>>> root@192.168.45.65's password:
>>> Permission denied, please try again.
>>> root@192.168.45.65's password:
>>> Permission denied (publickey,gssapi-with-mic,password).
>>> Permission denied, please try again.
>>> root@192.168.67.241's password:
>>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>> base/pls_base_orted_cmds.c at line 275
>>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>> pls_rsh_module.c at line 1166
>>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>> errmgr_hnp.c at line 90
>>> [ccomp1.cluster:03503] ERROR: A daemon on node 192.168.45.65 failed to
>>> start as expected.
>>> [ccomp1.cluster:03503] ERROR: There may be more information available from
>>> [ccomp1.cluster:03503] ERROR: the remote shell (see above).
>>> [ccomp1.cluster:03503] ERROR: The daemon exited unexpectedly with status
>>> 255.
>>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>> base/pls_base_orted_cmds.c at line 188
>>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>> pls_rsh_module.c at line 1198
>>>
>>> What is the problem here?
>>>
>>> mpirun was unable to cleanly terminate the daemons for this job. Returned
>>> value Timeout instead of ORTE_SUCCESS
>>>
>>> On Tue, Apr 14, 2009 at 7:15 PM, Eugene Loh <eugene@
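As for the question at the top of this message -- where the MPI libraries and include directories are -- Open MPI's wrapper compilers can report exactly what they add, e.g.:

    $ mpicc --showme            # the full underlying compile/link command line
    $ mpicc --showme:compile    # just the include flags (-I...)
    $ mpicc --showme:link       # just the library flags (-L... -l...)
    $ ompi_info | head          # general information about the installation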
Re: [OMPI users] Problem with running openMPI program
Let me explain in detail.

When we had only 2 nodes -- 1 master (192.168.67.18) + 1 compute node (192.168.45.65) -- my openmpi-default-hostfile looked like:

192.168.67.18 slots=2
192.168.45.65 slots=2

After this, on running the command mpirun /work/Pi on the master node, we got:

# root@192.168.45.65's password:

After entering the password, the program ran on both nodes.

Now, after connecting a second compute node and editing the hostfile:

192.168.67.18 slots=2
192.168.45.65 slots=2
192.168.67.241 slots=2

and then running the command mpirun /work/Pi on the master node, we got:

# root@192.168.45.65's password: root@192.168.67.241's password:

which does not accept the password.

Although we are trying to implement the passwordless cluster, I would like to know why this problem is occurring.

On Sat, Apr 18, 2009 at 3:40 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> Ankush
>
> You need to set up passwordless ssh connections to the node you just
> added. You (or somebody else) probably did this already on the first
> compute node; otherwise the MPI programs wouldn't run
> across the network.
>
> See the very last sentence on this FAQ:
>
> http://www.open-mpi.org/faq/?category=running#run-prereqs
>
> And try this recipe (if you use RSA keys instead of DSA, replace all "dsa"
> by "rsa"):
>
> http://www.sshkeychain.org/mirrors/SSH-with-Keys-HOWTO/SSH-with-Keys-HOWTO-4.html#ss4.3
>
> I hope this helps.
>
> Gus Correa
>
> Ankush Kaul wrote:
>> Thank you, I am reading up on the tools you suggested.
>>
>> I am facing another problem. My cluster is working fine with 2 hosts (1
>> master + 1 compute node), but when I tried to add another node (1 master +
>> 2 compute nodes) it is not working. It works fine when I give the command
>> mpirun -host /work/Pi, but when I try to run mpirun /work/Pi it gives the
>> following error:
>>
>> root@192.168.45.65's password: root@192.168.67.241's password:
>> Permission denied, please try again.
>> root@192.168.45.65's password:
>> Permission denied, please try again.
>> root@192.168.45.65's password:
>> Permission denied (publickey,gssapi-with-mic,password).
>> Permission denied, please try again.
>> root@192.168.67.241's password:
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 275
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> pls_rsh_module.c at line 1166
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> errmgr_hnp.c at line 90
>> [ccomp1.cluster:03503] ERROR: A daemon on node 192.168.45.65 failed to
>> start as expected.
>> [ccomp1.cluster:03503] ERROR: There may be more information available from
>> [ccomp1.cluster:03503] ERROR: the remote shell (see above).
>> [ccomp1.cluster:03503] ERROR: The daemon exited unexpectedly with status
>> 255.
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 188
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> pls_rsh_module.c at line 1198
>>
>> What is the problem here?
>>
>> mpirun was unable to cleanly terminate the daemons for this job. Returned
>> value Timeout instead of ORTE_SUCCESS
>>
>> On Tue, Apr 14, 2009 at 7:15 PM, Eugene Loh <eugene@sun.com> wrote:
>>> Ankush Kaul wrote:
>>>> Finally, after mentioning the hostfiles the cluster is working
>>>> fine. We downloaded a few benchmarking packages, but I would like
>>>> to know if there is any GUI-based benchmarking software, so that
>>>> it is easier to demonstrate the working of our cluster while
>>>> displaying it.
>>>
>>> I'm confused about what you're looking for here, but thought I'd venture
>>> a suggestion.
>>>
>>> There are GUI-based performance analysis and tracing tools. E.g.,
>>> run a pro
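In case the recipe link above ever goes stale, a minimal sketch of the passwordless-ssh setup being discussed (DSA chosen to match Gus's note; the IPs are the ones from this thread; run as the user that launches mpirun):

    $ ssh-keygen -t dsa                 # accept defaults; empty passphrase
    $ ssh-copy-id root@192.168.45.65    # repeat for every compute node
    $ ssh-copy-id root@192.168.67.241
    $ ssh 192.168.45.65 hostname        # should now succeed without a password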
Re: [OMPI users] Problem with running openMPI program
Thank you, I am reading up on the tools you suggested.

I am facing another problem. My cluster is working fine with 2 hosts (1 master + 1 compute node), but when I tried to add another node (1 master + 2 compute nodes) it is not working. It works fine when I give the command mpirun -host /work/Pi, but when I try to run mpirun /work/Pi it gives the following error:

root@192.168.45.65's password: root@192.168.67.241's password:
Permission denied, please try again.
root@192.168.45.65's password:
Permission denied, please try again.
root@192.168.45.65's password:
Permission denied (publickey,gssapi-with-mic,password).
Permission denied, please try again.
root@192.168.67.241's password:
[ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
[ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
[ccomp1.cluster:03503] ERROR: A daemon on node 192.168.45.65 failed to start as expected.
[ccomp1.cluster:03503] ERROR: There may be more information available from
[ccomp1.cluster:03503] ERROR: the remote shell (see above).
[ccomp1.cluster:03503] ERROR: The daemon exited unexpectedly with status 255.
[ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1198

What is the problem here?

mpirun was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.

On Tue, Apr 14, 2009 at 7:15 PM, Eugene Loh <eugene@sun.com> wrote:
> Ankush Kaul wrote:
>> Finally, after mentioning the hostfiles the cluster is working fine. We
>> downloaded a few benchmarking packages, but I would like to know if there is
>> any GUI-based benchmarking software, so that it is easier to demonstrate the
>> working of our cluster while displaying it.
>
> I'm confused about what you're looking for here, but thought I'd venture a
> suggestion.
>
> There are GUI-based performance analysis and tracing tools. E.g., run a
> program, [[semi-]automatically] collect performance data, run a GUI-based
> analysis tool on the data, visualize what happened on your cluster. Would
> this suit your purposes?
>
> If so, there are a variety of tools out there you could try. Some are
> platform-specific or cost money. Some are widely/freely available.
> Examples of these tools include Intel Trace Analyzer, Jumpshot, Vampir,
> TAU, etc. I do know that Sun Studio (Performance Analyzer) is available via
> free download on x86 and SPARC, on Linux and Solaris, and works with OMPI.
> Possibly the same with Jumpshot. VampirTrace instrumentation is already in
> OMPI, but then you need to figure out the analysis-tool part. (I think the
> Vampir GUI tool requires a license, but I'm not sure. Maybe you can convert
> to TAU, which is probably available for free download.)
>
> Anyhow, I don't even know if that sort of thing fits your requirements.
> Just an idea.
Re: [OMPI users] Problem with running openMPI program
Finally, after specifying the hostfiles, the cluster is working fine. We downloaded a few benchmarking packages, but I would like to know if there is any GUI-based benchmarking software, so that it is easier to demonstrate how our cluster works while we display it.

Regards
Ankush
Re: [OMPI users] Problem with running openMPI program
I am able to run the program on the server node, but on the compute node the program only runs in the directory on which /work is mounted (/work on the server contains the Pi program). Also, while running Pi, top shows the process running only on the server, not on the compute node.

On Sat, Apr 11, 2009 at 1:34 PM, Ankush Kaul <ankush.rk...@gmail.com> wrote:
> Can you please suggest a simple benchmarking software? Are there any GUI
> benchmarking software packages available?
>
> On Tue, Apr 7, 2009 at 2:29 PM, Ankush Kaul <ankush.rk...@gmail.com> wrote:
>> Thank you sir, thanks a lot.
>>
>> The information you provided helped us a lot. I am currently going through
>> the OpenMPI FAQ and will contact you in case of any doubts.
>>
>> Regards,
>> Ankush Kaul
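A hedged guess at why top shows processes only on the server: if mpirun is not told how many processes to start, it may run everything on the local node. Assuming the default hostfile from earlier in this thread lists both nodes with slots=2, something like this should spread the work (the path is an example):

    $ mpirun -np 4 /work/Pi    # 4 ranks: 2 on the master, 2 on the compute node
    # then, on the compute node, watch the ranks arrive:
    $ top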
Re: [OMPI users] Problem with running openMPI program
Can you please suggest a simple benchmarking software? Are there any GUI benchmarking software packages available?

On Tue, Apr 7, 2009 at 2:29 PM, Ankush Kaul <ankush.rk...@gmail.com> wrote:
> Thank you sir, thanks a lot.
>
> The information you provided helped us a lot. I am currently going through
> the OpenMPI FAQ and will contact you in case of any doubts.
>
> Regards,
> Ankush Kaul
Re: [OMPI users] Problem with running openMPI program
I am not able to check whether the NFS export/mount of /tmp is working. When I give the command

ssh 192.168.45.65 192.168.67.18

I get the error: bash: 192.168.67.18: command not found

Let me explain what I understood, using an example. First, I make a folder, '/work', on my master node. Then I mount this directory on a folder named '/mnt/work' on the slave node. Is this correct?

Also, how and where (is it on the master node?) do I give the list of hosts? And by hosts you mean the compute nodes? Please bear with me, as this is the first time I am doing a project on Linux clustering.

On Mon, Apr 6, 2009 at 9:27 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> Hi Ankush
>
> If I remember right,
> mpirun will put you on your home directory, not on /tmp,
> when it starts your ssh session.
> To run on /tmp (or on /mnt/nfs)
> you may need to use the "-path" option.
>
> Likewise, you may want to give mpirun a list of hosts (-host option)
> or a hostfile (-hostfile option), to specify where you want the
> program to run.
>
> Do "/full/path/to/openmpi/mpirun -help" for details.
>
> Make sure your NFS export/mount of /tmp is working,
> say, by doing:
>
> ssh slave_node 'hostname; ls /tmp; ls /mnt/nfs'
>
> or similar, and see if your program "pi" is really there (and where).
>
> Actually, it may be confusing to export /tmp, as it is part
> of the basic Linux directory tree,
> which is the reason why you mounted it on /mnt/nfs.
> You may want to choose to export/mount
> a directory that is not so generic as /tmp,
> so that you can use a consistent name on both computers.
> For instance, you can create a /my_export or /work directory
> (or whatever name you prefer) on the master node,
> export it to the slave node, mount it on the slave node
> with the same name/mountpoint, and use it for your MPI work.
>
> I hope this helps.
>
> Gus Correa
>
> Ankush Kaul wrote:
>> Thank you sir,
>> one more thing I am confused about: suppose I have to run a 'pi' program
>> using Open MPI, where do I place the program?
>>
>> Currently I have placed it in the /tmp folder on the master node. This /tmp
>> folder is mounted on /mnt/nfs of the compute node.
>>
>> I run the program from the /tmp folder on the master node. Is this correct?
>>
>> I am a newbie and really need some help. Thanks in advance.
>>
>> On Mon, Apr 6, 2009 at 8:43 PM, John Hearns <hear...@googlemail.com> wrote:
>>> 2009/4/6 Ankush Kaul <ankush.rk...@gmail.com>:
>>>> Also how do I come to know that the program is using the resources of
>>>> both the nodes?
>>>
>>> Log into the second node before you start the program.
>>> Run 'top'.
>>> Seriously - top is a very, very useful utility.
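On the failed check: in Gus's test, the second argument to ssh has to be a command to run on the remote node, not another IP address -- hence "bash: 192.168.67.18: command not found". A sketch of the full export/mount/verify cycle for a shared /work directory, using this thread's addresses (the exports options shown are common defaults, not the only choice):

    # on the master (192.168.67.18):
    $ echo '/work 192.168.45.65(rw,sync)' >> /etc/exports
    $ service nfs start      # if NFS is not already running
    $ exportfs -ra
    # on the slave (192.168.45.65):
    $ mkdir -p /work
    $ mount 192.168.67.18:/work /work
    # back on the master, verify:
    $ ssh 192.168.45.65 'hostname; ls /work'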
Re: [OMPI users] Problem with running openMPI program
Thank you sir,
one more thing I am confused about: suppose I have to run a 'pi' program using Open MPI, where do I place the program?

Currently I have placed it in the /tmp folder on the master node. This /tmp folder is mounted on /mnt/nfs of the compute node.

I run the program from the /tmp folder on the master node. Is this correct?

I am a newbie and really need some help. Thanks in advance.

On Mon, Apr 6, 2009 at 8:43 PM, John Hearns <hear...@googlemail.com> wrote:
> 2009/4/6 Ankush Kaul <ankush.rk...@gmail.com>:
>> Also how do I come to know that the program is using the resources of both
>> the nodes?
>
> Log into the second node before you start the program.
> Run 'top'.
> Seriously - top is a very, very useful utility.