On 11/30/12 12:12 AM, Gus Correa wrote:
> On 11/29/2012 06:35 AM, Duke Nguyen wrote:
>> On 11/29/12 5:52 PM, Duke Nguyen wrote:
>>> On 11/28/12 1:56 AM, Gus Correa wrote:
>>>> On 11/27/2012 01:52 PM, Gus Correa wrote:
>>>>> On 11/27/2012 02:14 AM, Duke Nguyen wrote:
>>>>>> On 11/27/12 1:44 PM, Christopher Samuel wrote:
>>>>>>> On 27/11/12 15:51, Duke Nguyen wrote:
>>>>>>>
>>>>>>>> Thanks! Yes, I am trying to get the system to work with
>>>>>>>> Torque/Maui/OpenMPI now.
>>>>>>>
>>>>>>> Make sure you build Open MPI with support for Torque's TM interface;
>>>>>>> that will save you a lot of hassle, as it means mpiexec/mpirun will
>>>>>>> find out directly from Torque what nodes and processors have been
>>>>>>> allocated for the job.
>>>>>>
>>>>>> Christopher, how would I check that? I got Torque/Maui/OpenMPI up,
>>>>>> working with root (not with a normal user yet :( ), tried mpirun,
>>>>>> and it worked fine:
>>>>>>
>>>> PS - Do 'qsub myjob' as a regular user, not as root.
>>>>
>>>>>> # /usr/lib64/openmpi/bin/mpirun -pernode --hostfile \
>>>>>>     /home/mpiwulf/.openmpihostfile /home/mpiwulf/test/mpihello
>>>>>> Hello world! I am process number: 3 on host node0118
>>>>>> Hello world! I am process number: 1 on host node0104
>>>>>> Hello world! I am process number: 0 on host node0103
>>>>>> Hello world! I am process number: 2 on host node0117
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> D.
>>>>>
>>>>> D.
>>>>>
>>>>> Try omitting the hostfile from your mpirun command line: put the
>>>>> mpirun command inside a Torque/PBS script, and submit it with qsub.
>>>>> Like this:
>>>>>
>>>>> *********************************
>>>>> myPBSScript.tcsh
>>>>> *********************************
>>>>> #!/bin/tcsh
>>>>> #PBS -l nodes=2:ppn=8 [Assuming your Torque 'nodes' file has np=8]
>>>>> #PBS -q [email protected]
>>>>> #PBS -N hello
>>>>> @ NP = `cat $PBS_NODEFILE | wc -l`
>>>>> mpirun -np ${NP} ./mpihello
>>>>> *********************************
>>>>>
>>>>> $ qsub myPBSScript.tcsh
>>>>>
>>>>> If OpenMPI was built with Torque support,
>>>>> the job will run on the nodes/processors allocated by Torque.
>>>>> [The nodes/processors are listed in $PBS_NODEFILE,
>>>>> but you don't need to refer to it in the mpirun line if
>>>>> OpenMPI was built with Torque support. If OpenMPI lacks
>>>>> Torque support, then you can use $PBS_NODEFILE as your hostfile:
>>>>> mpirun -hostfile $PBS_NODEFILE.]
>>>>>
>>>>> If Torque was installed in a standard place, say under /usr,
>>>>> then OpenMPI's configure will pick it up automatically.
>>>>> If it is not in a standard location, then add
>>>>> --with-tm=/torque/directory
>>>>> to the OpenMPI configure line.
>>>>> [./configure --help is your friend!]
>>>>>
>>>>> Another check:
>>>>>
>>>>> $ ompi_info [tons of output that you can grep for "tm" to see
>>>>> if Torque was picked up.]
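A condensed sketch of that build-and-check sequence, for anyone finding
this thread in the archives later. The /opt/torque prefix below is only
an example; point --with-tm at wherever your Torque headers and
libraries actually live:

    $ ./configure --prefix=/usr/local --with-tm=/opt/torque
    $ make -j4 && make install    # or roll an RPM, as I ended up doing
    $ ompi_info | grep tm         # the tm ras/plm/ess components should appear

My earlier reply below shows how that turned out on my cluster.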
>>> OK, after a huge headache with torque/maui things, I finally found
>>> out that my master node's system was a mess :D. Multiple versions of
>>> Torque (via yum, via src, etc.) were causing confusion for different
>>> users logging in (root or normal users) -- well, mainly because I had
>>> followed different guides on the net. So I decided to delete
>>> everything related to PBS (torque, maui, openmpi) and start from
>>> scratch: I built torque RPMs for the master/nodes and installed them,
>>> then built a maui RPM with support for torque, then built an openmpi
>>> RPM with support for torque too. This time I think I got almost
>>> everything:
>>>
>>> [mpiwulf@biobos:~]$ ompi_info | grep tm
>>>              MCA ras: tm (MCA v2.0, API v2.0, Component v1.6.3)
>>>              MCA plm: tm (MCA v2.0, API v2.0, Component v1.6.3)
>>>              MCA ess: tm (MCA v2.0, API v2.0, Component v1.6.3)
>>>
>>> OpenMPI now works with InfiniBand:
>>>
>>> [mpiwulf@biobos:~]$ /usr/local/bin/mpirun -mca btl ^tcp -pernode \
>>>     --hostfile /home/mpiwulf/.openmpihostfile /home/mpiwulf/test/mpihello
>>> Hello world! I am process number: 3 on host node0118
>>> Hello world! I am process number: 1 on host node0104
>>> Hello world! I am process number: 2 on host node0117
>>> Hello world! I am process number: 0 on host node0103
>>>
>>> OpenMPI also works with Torque:
>>>
>>> ----------------
>>> [mpiwulf@biobos:~]$ cat test/KCBATCH
>>> #!/bin/bash
>>> #
>>> #PBS -l nodes=6:ppn=1
>>> #PBS -N kcTEST
>>> #PBS -m be
>>> #PBS -e qsub.er.log
>>> #PBS -o qsub.ou.log
>>> #
>>> { time {
>>> /usr/local/bin/mpirun /home/mpiwulf/test/mpihello
>>> } } &> output.log
>>>
>>> [mpiwulf@biobos:~]$ qsub test/KCBATCH
>>> 21.biobos
>>>
>>> [mpiwulf@biobos:~]$ cat output.log
>>> --------------------------------------------------------------------------
>>> The OpenFabrics (openib) BTL failed to initialize while trying to
>>> allocate some locked memory. This typically can indicate that the
>>> memlock limits are set too low. For most HPC installations, the
>>> memlock limits should be set to "unlimited". The failure occured
>>> here:
>>>
>>>   Local host:    node0103
>>>   OMPI source:   btl_openib_component.c:1200
>>>   Function:      ompi_free_list_init_ex_new()
>>>   Device:        mthca0
>>>   Memlock limit: 65536
>>>
>>> You may need to consult with your system administrator to get this
>>> problem fixed. This FAQ entry on the Open MPI web site may also be
>>> helpful:
>>>
>>>   http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>>> --------------------------------------------------------------------------
>>>
>>> --------------------------------------------------------------------------
>>> WARNING: There was an error initializing an OpenFabrics device.
>>>
>>>   Local host:   node0103
>>>   Local device: mthca0
>>> --------------------------------------------------------------------------
>>>
>>> Hello world! I am process number: 5 on host node0103
>>> Hello world! I am process number: 0 on host node0104
>>> Hello world! I am process number: 2 on host node0110
>>> Hello world! I am process number: 4 on host node0118
>>> Hello world! I am process number: 1 on host node0109
>>> Hello world! I am process number: 3 on host node0117
>>> [node0104:02221] 5 more processes have sent help message
>>> help-mpi-btl-openib.txt / init-fail-no-mem
>>> [node0104:02221] Set MCA parameter "orte_base_help_aggregate" to 0 to
>>> see all help / error messages
>>> [node0104:02221] 5 more processes have sent help message
>>> help-mpi-btl-openib.txt / error in device init
>>>
>>> real    0m0.291s
>>> user    0m0.034s
>>> sys     0m0.043s
>>> ----------------
>>>
>>> Unfortunately, I still got the "error registering openib memory"
>>> problem with non-interactive jobs. Any experience with this would be
>>> great.
>>
>> Got it now, though I *do not* really like the solution. I had to edit
>> the pbs_mom init script:
>>
>> # vi /etc/rc.d/init.d/pbs_mom
>>
>> and make sure it contains:
>>
>> ulimit -l unlimited
>> #ulimit -n 32768
>>
>> and now openib works fine :).
>>
>> D.
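By the way, a quick sanity check that the new limit actually reaches
batch jobs (and not just interactive shells) is to ask the batch system
itself. Something along these lines; the job id and output file name
are from my test box and will differ elsewhere:

    [mpiwulf@biobos:~]$ echo 'ulimit -l' | qsub -l nodes=1:ppn=1
    22.biobos
    [mpiwulf@biobos:~]$ cat STDIN.o22
    unlimited

If that still prints 65536, the limit was raised for login sessions
only (e.g. via /etc/security/limits.conf), while pbs_mom itself is
still running with, and handing its children, the old limit.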
> Hi Duke
>
> It is great news that you've figured it all out and got everything
> working.

Thanks! :)

> Yes, in a cluster, installing Torque, Maui,
> and any MPI (OpenMPI, MVAPICH2, MPICH2, etc.)
> works much better from source than from yum,
> because they can/will be configured to match your hardware,
> the compilers of your choice, allow resource manager support (Torque),
> etc.
> But the yum RPMs are probably fine for work on a single workstation.
> I should have mentioned that in my previous email.

No problem. It was actually better for me, I learnt more :).

> It is worth keeping an eye on the Torque and Maui admin guides and
> mailing lists, and likewise on the various MPI mailing lists,
> as they are active and helpful:
>
> http://www.adaptivecomputing.com/support/documentation/
> http://www.supercluster.org/mailman/listinfo/torqueusers
> http://www.supercluster.org/mailman/listinfo/mauiusers
>
> http://www.open-mpi.org/community/lists/ompi.php
> http://mvapich.cse.ohio-state.edu/support/mailing_lists.shtml
> http://www.mpich.org/support/mailing-lists/

Thanks, yes, I am already on most of these lists (but have not voiced
much, since I don't even know what to ask :D).

> ***
>
> If you have an NFS filesystem or directory shared across the cluster,
> you can install applications (MPI, compilers, etc.) there
> (but for Torque it is better to install it on local disks, as you did).
> This scheme scales OK for small clusters,
> and simplifies the installation and maintenance process.
> Say, if you want to keep different versions of MPI,
> compiled with different compilers, etc., maintaining everything
> can be time consuming.
> However, your solution of creating RPMs works for any cluster,
> and is probably the best for large clusters.

Hmm, good to know this :). I decided to build RPMs because, after
installing from source (./configure, make, make install), it is
sometimes very hard to remove a package. Installing from an RPM is much
better if I later want to remove it with rpm or yum.

> ***
>
> I suggest that you take a look at the environment modules package,
> which is a great tool that allows users to switch
> their environment across different compilers, MPI versions, etc.:
>
> http://modules.sourceforge.net/
>
> In my opinion they work much better than hardwiring static choices
> in .bashrc/.tcshrc or in /etc/profile.d

Very interesting and useful advice. I will try this for sure.

> ***
>
> Yes, limits on locked memory are a hurdle for OpenFabrics
> registered memory.
> It seems to be an OpenFabrics problem, not an OpenMPI problem.
> It may affect only the OMPI 1.6 series, not the older 1.4, but I
> am not sure about this.
> There have been several recent posts on the OpenMPI list about
> problems similar to the one you had, if you care to check their
> archives:
>
> http://www.open-mpi.org/community/lists/users/
>
> For most real applications and number-crunching parallel jobs,
> the default (and small) Linux stack size may also be a hurdle,
> so you may want to make it unlimited, or at least larger.
> Likewise, you may want to increase the maximum
> number of open file handles. This can be done in the
> pbs_mom script and perhaps also in /etc/security/limits.conf.

I found a much better way: instead of modifying /etc/init.d/pbs_mom, I
created a file, /etc/sysconfig/pbs_mom, and set the memory limit there.

Bests,

D.
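P.S. For the archives, here is roughly what that file looks like. The
pbs_mom init script sources /etc/sysconfig/pbs_mom if it exists, so
plain shell commands placed there take effect before the daemon starts;
the exact contents below are just a sketch of the idea:

    # /etc/sysconfig/pbs_mom
    # Raise the locked-memory limit before pbs_mom starts, so every
    # job it spawns inherits it (needed for openib registered memory).
    ulimit -l unlimited
    # Uncomment to also raise the open-file limit, if your jobs need it:
    # ulimit -n 32768

This keeps the fix out of the init script itself, so it survives a
Torque upgrade that replaces /etc/init.d/pbs_mom.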
