Re: [OMPI users] Error building openmpi on Raspberry pi 2
Faraz, which OS are you running?

IIRC, I faced similar issues, and the root cause is that though ARMv7 does support these instructions, the compiler only generates ARMv6 code and hence fails to build Open MPI.

Cheers,

Gilles

On Wed, Sep 27, 2017 at 10:32 AM, Faraz Hussain wrote:
> I am receiving the make errors below on my pi 2:
>
> pi@pi001:~/openmpi-2.1.1 $ uname -a
> Linux pi001 4.9.35-v7+ #1014 SMP Fri Jun 30 14:47:43 BST 2017 armv7l GNU/Linux
>
> pi@pi001:~/openmpi-2.1.1 $ make -j 4
> ...
> make[2]: Entering directory '/home/pi/openmpi-2.1.1/opal/asm'
>   CPPAS    atomic-asm.lo
> atomic-asm.S: Assembler messages:
> atomic-asm.S:7: Error: selected processor does not support ARM mode `dmb'
> atomic-asm.S:15: Error: selected processor does not support ARM mode `dmb'
> atomic-asm.S:23: Error: selected processor does not support ARM mode `dmb'
> atomic-asm.S:55: Error: selected processor does not support ARM mode `dmb'
> atomic-asm.S:70: Error: selected processor does not support ARM mode `dmb'
> atomic-asm.S:86: Error: selected processor does not support ARM mode `ldrexd r4,r5,[r0]'
> atomic-asm.S:91: Error: selected processor does not support ARM mode `strexd r1,r6,r7,[r0]'
> atomic-asm.S:107: Error: selected processor does not support ARM mode `ldrexd r4,r5,[r0]'
> atomic-asm.S:112: Error: selected processor does not support ARM mode `strexd r1,r6,r7,[r0]'
> atomic-asm.S:115: Error: selected processor does not support ARM mode `dmb'
> atomic-asm.S:130: Error: selected processor does not support ARM mode `ldrexd r4,r5,[r0]'
> atomic-asm.S:135: Error: selected processor does not support ARM mode `dmb'
> atomic-asm.S:136: Error: selected processor does not support ARM mode `strexd r1,r6,r7,[r0]'
> Makefile:1743: recipe for target 'atomic-asm.lo' failed
> make[2]: *** [atomic-asm.lo] Error 1
> make[2]: Leaving directory '/home/pi/openmpi-2.1.1/opal/asm'
> Makefile:2307: recipe for target 'all-recursive' failed
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory '/home/pi/openmpi-2.1.1/opal'
> Makefile:1806: recipe for target 'all-recursive' failed
> make: *** [all-recursive] Error 1

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
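If Gilles's diagnosis is right (the compiler defaults to ARMv6 code even though the Pi 2's CPU is ARMv7, so the assembler rejects `dmb`/`ldrexd`/`strexd`), one possible workaround is to force the target architecture at configure time. This is only a sketch under that assumption; the thread does not confirm these exact flags:

```shell
# Hedged sketch: force ARMv7 code generation for both the C compiler and the
# assembler-driven files (CCASFLAGS covers atomic-asm.S). Flags are an
# assumption based on Gilles's diagnosis, not taken from the thread.
./configure CFLAGS="-march=armv7-a" CCASFLAGS="-march=armv7-a"
make -j 4
```

Alternatively, a Raspbian image or cross-toolchain whose gcc already defaults to ARMv7 would avoid the issue entirely.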
Re: [OMPI users] Fwd: OpenMPI does not obey hostfile
Anthony, a few things...

- Open MPI v1.10 is no longer supported - you should at least use v2.0, preferably v2.1, or even the newly released 3.0
- if you need to run under Torque/PBS, then Open MPI should be built with tm support
- openhpc.org provides Open MPI 1.10.7 with tm support

Cheers,

Gilles

On Wed, Sep 27, 2017 at 12:57 PM, Anthony Thyssen wrote:
> This is not explained in the manual, when giving a hostfile (though I was
> suspecting that was the case).
>
> However, running one process on each node listed WAS the default behaviour
> in the past. In fact, that is the default behaviour of the old version 1.5.4
> OpenMPI I have on an old cluster which I am replacing.
>
> I suggest that this be explicitly explained in at least the manpages, and
> preferably the OpenMPI FAQ too.
>
> It explains why the manpages and FAQ seem to avoid specifying a host twice
> in a --hostfile, and yet specifically do specify a host twice in the next
> section on the --host option. But no explanation is given!
>
> It explains why, if I give a --pernode option, it runs only one process on
> each host BUT ignores the fact that a host was listed twice. And if a -np
> option was also given with --pernode, it errors with "more processes than
> the ppr".
>
> What that does NOT explain is why it completely ignores the "ALLOCATED
> NODES" that were reported in the debug output, as shown above.
>
> The only reason I posted for help was because the debug output seems to
> indicate that it should be performing as I expected.
>
> ---
>
> Is there an option to force OpenMPI to use the OLD behaviour? Just as many
> web pages indicate it should be doing?
> I have found no such option in the man pages.
>
> Without such an option, passing the $PBS_NODEFILE (from Torque) to the
> "mpirun" command is much more difficult. Which is why I developed the "awk"
> script above, or try to convert it to a comma-separated --host argument,
> which does work.
>
> It seems a LOT of webpages on the net all assume the old behaviour of
> --hostfile, which is why this new behaviour is confusing me, especially with
> no explicit mention of this behaviour in the manual or OpenMPI FAQ pages.
>
> ---
>
> I have seen many PBS guides specify a --np option for the MPI command,
> though I could not see the point of it.
>
> A quick test seemed to indicate that it works, so I thought perhaps that was
> the way to specify the old behaviour.
>
> # mpirun --hostfile hostfile.txt hostname
> node21.emperor
> node22.emperor
> node21.emperor
> node22.emperor
> node23.emperor
> node23.emperor
>
> # mpirun --hostfile hostfile.txt --np $(wc -l < hostfile.txt) hostname
> node21.emperor
> node22.emperor
> node22.emperor
> node21.emperor
>
> I think however that was purely a fluke, as when I expand it to a PBS batch
> script command, to run on a larger number of nodes...
>
> mpirun --hostfile $PBS_NODEFILE -np $PBS_NP hostname
>
> the result is that OpenMPI still runs as many of the processes as it can (up
> to the NP limit) on the first few nodes given, and not as Torque PBS
> specified.
>
> ---
>
> ASIDE: The auto-discovery does not appear to work very well. Tests with a
> mix of dual- and quad-core machines often result in only 2 processes on
> some of the quad-core machines.
>
> I saw mention of a --hetero-nodes option which makes auto-discovery work as
> expected. BUT it is NOT mentioned in the manual, and to me "hetero" implies
> a heterogeneous set of computers (all the same) rather than a mix of
> computer types. As such the option name does not make any real sense to me.
>
> ---
>
> Now I have attempted to recompile the OpenMPI package to include Torque
> support, but the RPM build specification is overly complex (as is typical
> for RHEL). I have yet to succeed in getting a replacement OpenMPI package
> with the "tm" resource manager that works. Red Hat has declared that it
> will not do it, as "Torque" is in EPEL while "OpenMPI" is in RHEL.
>
> Also I hate having to build local versions of packages, as it means I then
> no longer get package updates automatically.
>
>
> On Wed, Sep 27, 2017 at 12:40 PM, r...@open-mpi.org wrote:
>>
>> That is correct. If you don’t specify a slot count, we auto-discover the
>> number of cores on each node and set #slots to that number. If an RM is
>> involved, then we use what they give us.
>>
>> Sent from my iPad
>>
>> On Sep 26, 2017, at 8:11 PM, Anthony Thyssen wrote:
>>
>> I have been having problems with OpenMPI on a new cluster of machines,
>> using stock RHEL7 packages.
>>
>> ASIDE: This will be used with Torque-PBS (from EPEL archives), though
>> OpenMPI (currently) does not have the "tm" resource manager configured to
>> use PBS, as you will be able to see in the debug output below.
>>
>> # mpirun -V
>> mpirun (Open MPI) 1.10.6
>>
>> # sudo yum list installed openmpi
>> ...
>> Installed Packages
>> openmpi.x86_64
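Building Open MPI with the tm support Gilles recommends would look roughly like the following. `--with-tm` is a real Open MPI configure option; the Torque install prefix and the chosen `--prefix` are placeholder assumptions for illustration:

```shell
# Rough sketch: build Open MPI against Torque's tm library so mpirun reads
# the PBS allocation directly instead of needing $PBS_NODEFILE conversions.
# /usr/local/torque and the install prefix are placeholder paths.
./configure --with-tm=/usr/local/torque --prefix=$HOME/openmpi-tm
make -j 4 && make install
```

With tm support built in, `mpirun` inside a PBS job should pick up the node allocation from the resource manager without any `--hostfile` argument.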
Re: [OMPI users] Fwd: OpenMPI does not obey hostfile
This is not explained in the manual, when giving a hostfile (though I was suspecting that was the case).

However, running one process on each node listed WAS the default behaviour in the past. In fact, that is the default behaviour of the old version 1.5.4 OpenMPI I have on an old cluster which I am replacing.

I suggest that this be explicitly explained in at least the manpages, and preferably the OpenMPI FAQ too.

It explains why the manpages and FAQ seem to avoid specifying a host twice in a --hostfile, and yet specifically do specify a host twice in the next section on the --host option. But no explanation is given!

It explains why, if I give a --pernode option, it runs only one process on each host BUT ignores the fact that a host was listed twice. And if a -np option was also given with --pernode, it errors with "more processes than the ppr".

What that does NOT explain is why it completely ignores the "ALLOCATED NODES" that were reported in the debug output, as shown above.

The only reason I posted for help was because the debug output seems to indicate that it should be performing as I expected.

---

*Is there an option to force OpenMPI to use the OLD behaviour?* Just as many web pages indicate it should be doing?
I have found no such option in the man pages.

Without such an option, passing the $PBS_NODEFILE (from Torque) to the "mpirun" command is much more difficult. Which is why I developed the "awk" script above, or try to convert it to a comma-separated --host argument, which does work.

It seems a LOT of webpages on the net all assume the old behaviour of --hostfile, which is why this new behaviour is confusing me, especially with no explicit mention of this behaviour in the manual or OpenMPI FAQ pages.

---

I have seen many PBS guides specify a --np option for the MPI command, though I could not see the point of it.

A quick test seemed to indicate that it works, so I thought perhaps that was the way to specify the old behaviour.

*# mpirun --hostfile hostfile.txt hostname*
node21.emperor
node22.emperor
node21.emperor
node22.emperor
node23.emperor
node23.emperor

*# mpirun --hostfile hostfile.txt --np $(wc -l < hostfile.txt) hostname*
node21.emperor
node22.emperor
node22.emperor
node21.emperor

I think however that was purely a fluke, as when I expand it to a PBS batch script command, to run on a larger number of nodes...

mpirun --hostfile $PBS_NODEFILE -np $PBS_NP hostname

the result is that OpenMPI still runs as many of the processes as it can (up to the NP limit) on the first few nodes given, and not as Torque PBS specified.

---

ASIDE: The auto-discovery does not appear to work very well. Tests with a mix of dual- and quad-core machines often result in only 2 processes on some of the quad-core machines.

I saw mention of a --hetero-nodes option which makes auto-discovery work as expected. BUT it is NOT mentioned in the manual, and to me "hetero" implies a heterogeneous set of computers (all the same) rather than a mix of computer types. As such the option name does not make any real sense to me.

---

Now I have attempted to recompile the OpenMPI package to include Torque support, but the RPM build specification is overly complex (as is typical for RHEL). I have yet to succeed in getting a replacement OpenMPI package with the "tm" resource manager that works. Red Hat has declared that it will not do it, as "Torque" is in EPEL while "OpenMPI" is in RHEL.

Also I hate having to build local versions of packages, as it means I then no longer get package updates automatically.


On Wed, Sep 27, 2017 at 12:40 PM, r...@open-mpi.org wrote:
> That is correct. If you don’t specify a slot count, we auto-discover the
> number of cores on each node and set #slots to that number. If an RM is
> involved, then we use what they give us.
>
> Sent from my iPad
>
> On Sep 26, 2017, at 8:11 PM, Anthony Thyssen wrote:
>
> I have been having problems with OpenMPI on a new cluster of machines,
> using stock RHEL7 packages.
>
> ASIDE: This will be used with Torque-PBS (from EPEL archives), though
> OpenMPI (currently) does not have the "tm" resource manager configured to
> use PBS, as you will be able to see in the debug output below.
>
> *# mpirun -V*
> mpirun (Open MPI) 1.10.6
>
> *# sudo yum list installed openmpi*
> ...
> Installed Packages
> openmpi.x86_64    1.10.6-2.el7    @rhel-7-server-rpms
> ...
>
> More than likely I am doing something fundamentally stupid, but I have no
> idea what.
>
> The problem is that OpenMPI is not obeying the given hostfile by running
> one process on each host given in the list. The manual and all my (meagre)
> experience say that is what it is meant to do.
>
> Instead it runs the maximum number of processes that is allowed to run for
> the CPU of that machine. That is a nice feature, but NOT what is wanted.
>
> There is no "/etc/openmpi-x86_64/openmpi-default-hostfile"
Re: [OMPI users] Fwd: OpenMPI does not obey hostfile
That is correct. If you don’t specify a slot count, we auto-discover the number of cores on each node and set #slots to that number. If an RM is involved, then we use what they give us.

Sent from my iPad

> On Sep 26, 2017, at 8:11 PM, Anthony Thyssen wrote:
>
> I have been having problems with OpenMPI on a new cluster of machines,
> using stock RHEL7 packages.
>
> ASIDE: This will be used with Torque-PBS (from EPEL archives), though
> OpenMPI (currently) does not have the "tm" resource manager configured to
> use PBS, as you will be able to see in the debug output below.
>
> # mpirun -V
> mpirun (Open MPI) 1.10.6
>
> # sudo yum list installed openmpi
> ...
> Installed Packages
> openmpi.x86_64    1.10.6-2.el7    @rhel-7-server-rpms
> ...
>
> More than likely I am doing something fundamentally stupid, but I have no
> idea what.
>
> The problem is that OpenMPI is not obeying the given hostfile by running
> one process on each host given in the list. The manual and all my (meagre)
> experience say that is what it is meant to do.
>
> Instead it runs the maximum number of processes that is allowed to run for
> the CPU of that machine. That is a nice feature, but NOT what is wanted.
>
> There is no "/etc/openmpi-x86_64/openmpi-default-hostfile" configuration
> present.
>
> For example, given the hostfile
>
> # cat hostfile.txt
> node21.emperor
> node22.emperor
> node22.emperor
> node23.emperor
>
> running OpenMPI on the head node "shrek", I get the following
> (ras debugging enabled to see the result):
>
> # mpirun --hostfile hostfile.txt --mca ras_base_verbose 5 mpi_hello
> [shrek.emperor:93385] mca:base:select:( ras) Querying component [gridengine]
> [shrek.emperor:93385] mca:base:select:( ras) Skipping component [gridengine]. Query failed to return a module
> [shrek.emperor:93385] mca:base:select:( ras) Querying component [loadleveler]
> [shrek.emperor:93385] mca:base:select:( ras) Skipping component [loadleveler]. Query failed to return a module
> [shrek.emperor:93385] mca:base:select:( ras) Querying component [simulator]
> [shrek.emperor:93385] mca:base:select:( ras) Skipping component [simulator]. Query failed to return a module
> [shrek.emperor:93385] mca:base:select:( ras) Querying component [slurm]
> [shrek.emperor:93385] mca:base:select:( ras) Skipping component [slurm]. Query failed to return a module
> [shrek.emperor:93385] mca:base:select:( ras) No component selected!
>
> == ALLOCATED NODES ==
> node21.emperor: slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
> node22.emperor: slots=2 max_slots=0 slots_inuse=0 state=UNKNOWN
> node23.emperor: slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
> =
> Hello World! from process 0 out of 6 on node21.emperor
> Hello World! from process 2 out of 6 on node22.emperor
> Hello World! from process 1 out of 6 on node21.emperor
> Hello World! from process 3 out of 6 on node22.emperor
> Hello World! from process 4 out of 6 on node23.emperor
> Hello World! from process 5 out of 6 on node23.emperor
>
> These machines are all dual-core CPUs. If a quad-core is added to the list
> I get 4 processes on that node. And so on, BUT NOT always.
>
> Note that the "ALLOCATED NODES" list is NOT obeyed.
>
> If on the other hand I add "slots=#" to the provided hostfile, it works as
> expected! (The debug output is not included as it is essentially the same
> as above.)
>
> # awk '{n[$0]++} END {for(i in n)print i,"slots="n[i]}' hostfile.txt > hostfile_slots.txt
> # cat hostfile_slots.txt
> node23.emperor slots=1
> node22.emperor slots=2
> node21.emperor slots=1
>
> # mpirun --hostfile hostfile_slots.txt mpi_hello
> Hello World! from process 0 out of 4 on node23.emperor
> Hello World! from process 1 out of 4 on node22.emperor
> Hello World! from process 3 out of 4 on node21.emperor
> Hello World! from process 2 out of 4 on node22.emperor
>
> Or if I convert the hostfile into a comma-separated host list, it also
> works.
>
> # tr '\n' , < hostfile.txt
> node21.emperor,node22.emperor,node22.emperor,node23.emperor,
> # mpirun --host $(tr '\n' , < hostfile.txt) mpi_hello
> Hello World! from process 0 out of 4 on node21.emperor
> Hello World! from process 1 out of 4 on node22.emperor
> Hello World! from process 3 out of 4 on node23.emperor
> Hello World! from process 2 out of 4 on node22.emperor
>
> Any help as to why --hostfile does not work as expected, while the debug
> output says it should, would be appreciated.
>
> As you can see, I have been studying this problem a long time. Google has
> not been very helpful. All I seem to get are man pages and general help
> guides.
>
> Anthony Thyssen ( System Programmer )
> --
> All the books of Power had their own particular nature. The "Octavo" was
> harsh and imperious. The "Bumper Fun Grimore" went in for deadly practical
> jokes. The "Joy of Tantric Sex" had to be kept under iced water.
> -- Terry Pratchett, "Moving Pictures"
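The hostfile-to-`--host` conversion discussed above can be exercised standalone. The node names are sample data, and the sed step trimming the trailing comma is my addition (the thread's bare `tr` output ends with a comma):

```shell
# Build a comma-separated --host list from a one-host-per-line hostfile.
# Sample data stands in for a real hostfile; sed trims the trailing comma.
printf 'node21.emperor\nnode22.emperor\nnode22.emperor\nnode23.emperor\n' > hostfile.txt
hosts=$(tr '\n' ',' < hostfile.txt | sed 's/,$//')
echo "$hosts"
# -> node21.emperor,node22.emperor,node22.emperor,node23.emperor
```

The resulting string would then be passed as `mpirun --host "$hosts" ...`, which is the form the thread reports as working.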
[OMPI users] Error building openmpi on Raspberry pi 2
I am receiving the make errors below on my pi 2:

pi@pi001:~/openmpi-2.1.1 $ uname -a
Linux pi001 4.9.35-v7+ #1014 SMP Fri Jun 30 14:47:43 BST 2017 armv7l GNU/Linux

pi@pi001:~/openmpi-2.1.1 $ make -j 4
...
make[2]: Entering directory '/home/pi/openmpi-2.1.1/opal/asm'
  CPPAS    atomic-asm.lo
atomic-asm.S: Assembler messages:
atomic-asm.S:7: Error: selected processor does not support ARM mode `dmb'
atomic-asm.S:15: Error: selected processor does not support ARM mode `dmb'
atomic-asm.S:23: Error: selected processor does not support ARM mode `dmb'
atomic-asm.S:55: Error: selected processor does not support ARM mode `dmb'
atomic-asm.S:70: Error: selected processor does not support ARM mode `dmb'
atomic-asm.S:86: Error: selected processor does not support ARM mode `ldrexd r4,r5,[r0]'
atomic-asm.S:91: Error: selected processor does not support ARM mode `strexd r1,r6,r7,[r0]'
atomic-asm.S:107: Error: selected processor does not support ARM mode `ldrexd r4,r5,[r0]'
atomic-asm.S:112: Error: selected processor does not support ARM mode `strexd r1,r6,r7,[r0]'
atomic-asm.S:115: Error: selected processor does not support ARM mode `dmb'
atomic-asm.S:130: Error: selected processor does not support ARM mode `ldrexd r4,r5,[r0]'
atomic-asm.S:135: Error: selected processor does not support ARM mode `dmb'
atomic-asm.S:136: Error: selected processor does not support ARM mode `strexd r1,r6,r7,[r0]'
Makefile:1743: recipe for target 'atomic-asm.lo' failed
make[2]: *** [atomic-asm.lo] Error 1
make[2]: Leaving directory '/home/pi/openmpi-2.1.1/opal/asm'
Makefile:2307: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/home/pi/openmpi-2.1.1/opal'
Makefile:1806: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1
[OMPI users] Fwd: OpenMPI does not obey hostfile
I have been having problems with OpenMPI on a new cluster of machines, using stock RHEL7 packages.

ASIDE: This will be used with Torque-PBS (from EPEL archives), though OpenMPI (currently) does not have the "tm" resource manager configured to use PBS, as you will be able to see in the debug output below.

*# mpirun -V*
mpirun (Open MPI) 1.10.6

*# sudo yum list installed openmpi*
...
Installed Packages
openmpi.x86_64    1.10.6-2.el7    @rhel-7-server-rpms
...

More than likely I am doing something fundamentally stupid, but I have no idea what.

The problem is that OpenMPI is not obeying the given hostfile by running one process on each host given in the list. The manual and all my (meagre) experience say that is what it is meant to do.

Instead it runs the maximum number of processes that is allowed to run for the CPU of that machine. That is a nice feature, but NOT what is wanted.

There is no "/etc/openmpi-x86_64/openmpi-default-hostfile" configuration present.

For example, given the hostfile

*# cat hostfile.txt*
node21.emperor
node22.emperor
node22.emperor
node23.emperor

running OpenMPI on the head node "shrek", I get the following
(ras debugging enabled to see the result):

*# mpirun --hostfile hostfile.txt --mca ras_base_verbose 5 mpi_hello*
[shrek.emperor:93385] mca:base:select:( ras) Querying component [gridengine]
[shrek.emperor:93385] mca:base:select:( ras) Skipping component [gridengine]. Query failed to return a module
[shrek.emperor:93385] mca:base:select:( ras) Querying component [loadleveler]
[shrek.emperor:93385] mca:base:select:( ras) Skipping component [loadleveler]. Query failed to return a module
[shrek.emperor:93385] mca:base:select:( ras) Querying component [simulator]
[shrek.emperor:93385] mca:base:select:( ras) Skipping component [simulator]. Query failed to return a module
[shrek.emperor:93385] mca:base:select:( ras) Querying component [slurm]
[shrek.emperor:93385] mca:base:select:( ras) Skipping component [slurm]. Query failed to return a module
[shrek.emperor:93385] mca:base:select:( ras) No component selected!

== ALLOCATED NODES ==
node21.emperor: slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
node22.emperor: slots=2 max_slots=0 slots_inuse=0 state=UNKNOWN
node23.emperor: slots=1 max_slots=0 slots_inuse=0 state=UNKNOWN
=
Hello World! from process 0 out of 6 on node21.emperor
Hello World! from process 2 out of 6 on node22.emperor
Hello World! from process 1 out of 6 on node21.emperor
Hello World! from process 3 out of 6 on node22.emperor
Hello World! from process 4 out of 6 on node23.emperor
Hello World! from process 5 out of 6 on node23.emperor

These machines are all dual-core CPUs. If a quad-core is added to the list I get 4 processes on that node. And so on, BUT NOT always.

*Note that the "ALLOCATED NODES" list is NOT obeyed.*

If on the other hand I add "slots=#" to the provided hostfile, it works as expected! (The debug output is not included as it is essentially the same as above.)

*# awk '{n[$0]++} END {for(i in n)print i,"slots="n[i]}' hostfile.txt > hostfile_slots.txt*
*# cat hostfile_slots.txt*
node23.emperor slots=1
node22.emperor slots=2
node21.emperor slots=1

*# mpirun --hostfile hostfile_slots.txt mpi_hello*
Hello World! from process 0 out of 4 on node23.emperor
Hello World! from process 1 out of 4 on node22.emperor
Hello World! from process 3 out of 4 on node21.emperor
Hello World! from process 2 out of 4 on node22.emperor

Or if I convert the hostfile into a comma-separated host list, it also works.

*# tr '\n' , < hostfile.txt*
node21.emperor,node22.emperor,node22.emperor,node23.emperor,
*# mpirun --host $(tr '\n' , < hostfile.txt) mpi_hello*
Hello World! from process 0 out of 4 on node21.emperor
Hello World! from process 1 out of 4 on node22.emperor
Hello World! from process 3 out of 4 on node23.emperor
Hello World! from process 2 out of 4 on node22.emperor

Any help as to why --hostfile does not work as expected, while the debug output says it should, would be appreciated.

As you can see, I have been studying this problem a long time. Google has not been very helpful. All I seem to get are man pages and general help guides.

Anthony Thyssen ( System Programmer )
--
All the books of Power had their own particular nature. The "Octavo" was harsh and imperious. The "Bumper Fun Grimore" went in for deadly practical jokes. The "Joy of Tantric Sex" had to be kept under iced water.
-- Terry Pratchett, "Moving Pictures"
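The awk one-liner above (collapsing repeated host lines into the `slots=` form that this Open MPI version's hostfile parsing obeys) can be tried standalone. The node names below are sample data standing in for a real `$PBS_NODEFILE`:

```shell
# Collapse a one-host-per-slot node list (Torque $PBS_NODEFILE style) into
# "host slots=N" entries, as in the thread; sort only makes output stable.
printf 'node21.emperor\nnode22.emperor\nnode22.emperor\nnode23.emperor\n' > nodefile.txt
awk '{n[$0]++} END {for (i in n) print i, "slots=" n[i]}' nodefile.txt | sort > hostfile_slots.txt
cat hostfile_slots.txt
# -> node21.emperor slots=1
#    node22.emperor slots=2
#    node23.emperor slots=1
```

The generated file is then usable as `mpirun --hostfile hostfile_slots.txt ...`, which the thread reports honours the per-host process counts.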