[OMPI users] pmix error with openmpi-master-201702010209-6cb484a on Linux

2017-02-02 Thread Siegmar Gross

Hi,

I have installed openmpi-master-201702010209-6cb484a on my "SUSE Linux
Enterprise Server 12.2 (x86_64)" with Sun C 5.14 and gcc-6.3.0.
Unfortunately, I get errors when I run my spawn programs.


loki spawn 107 mpiexec -np 1 --host loki,loki,nfs1 spawn_intra_comm
Parent process 0: I create 2 slave processes
[nfs1:27716] PMIX ERROR: ERROR in file 
../../../../../../../openmpi-master-201702010209-6cb484a/opal/mca/pmix/pmix2x/pmix/src/dstore/pmix_esh.c 
at line 1029
[nfs1:27716] PMIX ERROR: ERROR in file 
../../../../../../../openmpi-master-201702010209-6cb484a/opal/mca/pmix/pmix2x/pmix/src/server/pmix_server_get.c 
at line 501

--
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[42193,2],1]) is on host: nfs1
  Process 2 ([[42193,1],0]) is on host: unknown!
  BTLs attempted: self tcp

Your MPI job is now going to abort; sorry.
--
[nfs1:27727] [[42193,2],1] ORTE_ERROR_LOG: Unreachable in file 
../../openmpi-master-201702010209-6cb484a/ompi/dpm/dpm.c at line 426

--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_dpm_dyn_init() failed
  --> Returned "Unreachable" (-12) instead of "Success" (0)
--
[nfs1:27727] *** An error occurred in MPI_Init
[nfs1:27727] *** reported by process [2765160450,1]
[nfs1:27727] *** on a NULL communicator
[nfs1:27727] *** Unknown error
[nfs1:27727] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[nfs1:27727] ***    and potentially your MPI job)
loki spawn 108



I used the following commands to build and install the package.
${SYSTEM_ENV} is "Linux" and ${MACHINE_ENV} is "x86_64" for my
Linux machine. Options "--enable-mpi-cxx-bindings" and
"--enable-mpi-thread-multiple" are now unrecognized. Probably
they are now supported automatically. "configure" reports a
warning that I should report.


mkdir openmpi-master-201702010209-6cb484a-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
cd openmpi-master-201702010209-6cb484a-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc

../openmpi-master-201702010209-6cb484a/configure \
  --prefix=/usr/local/openmpi-master_64_cc \
  --libdir=/usr/local/openmpi-master_64_cc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
  --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
  JAVA_HOME=/usr/local/jdk1.8.0_66 \
  LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack" CC="cc" CXX="CC" FC="f95" \
  CFLAGS="-m64 -mt" CXXFLAGS="-m64" FCFLAGS="-m64" \
  CPP="cpp" CXXCPP="cpp" \
  --enable-mpi-cxx \
  --enable-mpi-cxx-bindings \
  --enable-cxx-exceptions \
  --enable-mpi-java \
  --enable-mpi-thread-multiple \
  --with-hwloc=internal \
  --without-verbs \
  --with-wrapper-cflags="-m64 -mt" \
  --with-wrapper-cxxflags="-m64" \
  --with-wrapper-fcflags="-m64" \
  --with-wrapper-ldflags="-mt" \
  --enable-debug \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc

make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc
rm -r /usr/local/openmpi-master_64_cc.old
mv /usr/local/openmpi-master_64_cc /usr/local/openmpi-master_64_cc.old
make install |& tee log.make-install.$SYSTEM_ENV.$MACHINE_ENV.64_cc
make check |& tee log.make-check.$SYSTEM_ENV.$MACHINE_ENV.64_cc



...
checking numaif.h usability... no
checking numaif.h presence... yes
configure: WARNING: numaif.h: present but cannot be compiled
configure: WARNING: numaif.h: check for missing prerequisite headers?
configure: WARNING: numaif.h: see the Autoconf documentation
configure: WARNING: numaif.h: section "Present But Cannot Be Compiled"
configure: WARNING: numaif.h: proceeding with the compiler's result
configure: WARNING: ## ------------------------------------------------------ ##
configure: WARNING: ## Report this to http://www.open-mpi.org/community/help/  ##
configure: WARNING: ## ------------------------------------------------------ ##
checking for numaif.h... no
...




I get the following errors if I run "spawn_master" or "spawn_multiple_master".

loki spawn 108 mpiexec -np 1 --host loki,loki,loki,nfs1,nfs1 spawn_master

Parent process 0 running on loki
  I create 4 slave processes

[nfs1:29189] *** Process received signal ***
[nfs1:29189] 

[OMPI users] problem with opal_list_remove_item for openmpi-v2.x-201702010255-8b16747 on Linux

2017-02-02 Thread Siegmar Gross

Hi,

I have installed openmpi-v2.x-201702010255-8b16747 on my "SUSE Linux
Enterprise Server 12.2 (x86_64)" with Sun C 5.14 and gcc-6.3.0.
Unfortunately, I get a warning from "opal_list_remove_item" about a
missing item when I run one of my programs.

loki spawn 115 mpiexec -np 1 --host loki,loki,nfs1 spawn_intra_comm
Parent process 0: I create 2 slave processes

Parent process 0 running on loki
MPI_COMM_WORLD ntasks:  1
COMM_CHILD_PROCESSES ntasks_local:  1
COMM_CHILD_PROCESSES ntasks_remote: 2
COMM_ALL_PROCESSES ntasks:  3
mytid in COMM_ALL_PROCESSES:0

Child process 0 running on loki
MPI_COMM_WORLD ntasks:  2
COMM_ALL_PROCESSES ntasks:  3
mytid in COMM_ALL_PROCESSES:1

Child process 1 running on nfs1
MPI_COMM_WORLD ntasks:  2
COMM_ALL_PROCESSES ntasks:  3
mytid in COMM_ALL_PROCESSES:2
 Warning :: opal_list_remove_item - the item 0xc45f80 is not on the list 
0x7f5bb1f34978

loki spawn 116


I used the following commands to build and install the package.
${SYSTEM_ENV} is "Linux" and ${MACHINE_ENV} is "x86_64" for my
Linux machine. Option "--enable-mpi-cxx-bindings" is now
unrecognized. Are cxx bindings now supported automatically?
"configure" reports a warning that I should report.

mkdir openmpi-v2.x-201702010255-8b16747-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc
cd openmpi-v2.x-201702010255-8b16747-${SYSTEM_ENV}.${MACHINE_ENV}.64_cc

../openmpi-v2.x-201702010255-8b16747/configure \
  --prefix=/usr/local/openmpi-2.1.0_64_cc \
  --libdir=/usr/local/openmpi-2.1.0_64_cc/lib64 \
  --with-jdk-bindir=/usr/local/jdk1.8.0_66/bin \
  --with-jdk-headers=/usr/local/jdk1.8.0_66/include \
  JAVA_HOME=/usr/local/jdk1.8.0_66 \
  LDFLAGS="-m64 -mt -Wl,-z -Wl,noexecstack" CC="cc" CXX="CC" FC="f95" \
  CFLAGS="-m64 -mt" CXXFLAGS="-m64" FCFLAGS="-m64" \
  CPP="cpp" CXXCPP="cpp" \
  --enable-mpi-cxx \
  --enable-mpi-cxx-bindings \
  --enable-cxx-exceptions \
  --enable-mpi-java \
  --enable-mpi-thread-multiple \
  --with-hwloc=internal \
  --without-verbs \
  --with-wrapper-cflags="-m64 -mt" \
  --with-wrapper-cxxflags="-m64" \
  --with-wrapper-fcflags="-m64" \
  --with-wrapper-ldflags="-mt" \
  --enable-debug \
  |& tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc

make |& tee log.make.$SYSTEM_ENV.$MACHINE_ENV.64_cc
rm -r /usr/local/openmpi-2.1.0_64_cc.old
mv /usr/local/openmpi-2.1.0_64_cc /usr/local/openmpi-2.1.0_64_cc.old
make install |& tee log.make-install.$SYSTEM_ENV.$MACHINE_ENV.64_cc
make check |& tee log.make-check.$SYSTEM_ENV.$MACHINE_ENV.64_cc




...
checking numaif.h usability... no
checking numaif.h presence... yes
configure: WARNING: numaif.h: present but cannot be compiled
configure: WARNING: numaif.h: check for missing prerequisite headers?
configure: WARNING: numaif.h: see the Autoconf documentation
configure: WARNING: numaif.h: section "Present But Cannot Be Compiled"
configure: WARNING: numaif.h: proceeding with the compiler's result
configure: WARNING: ## ------------------------------------------------------ ##
configure: WARNING: ## Report this to http://www.open-mpi.org/community/help/  ##
configure: WARNING: ## ------------------------------------------------------ ##
checking for numaif.h... no
...


I would be grateful if somebody could fix the problems. Do you need anything
else? Thank you very much in advance for any help.


Kind regards

Siegmar
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define NUM_SLAVES	2		/* create NUM_SLAVES processes	*/


int main (int argc, char *argv[])
{
  MPI_Comm COMM_ALL_PROCESSES,		/* intra-communicator		*/
	   COMM_CHILD_PROCESSES,	/* inter-communicator		*/
	   COMM_PARENT_PROCESSES;	/* inter-communicator		*/
  int	   ntasks_world,		/* # of tasks in MPI_COMM_WORLD	*/
	   ntasks_local,		/* COMM_CHILD_PROCESSES local	*/
	   ntasks_remote,		/* COMM_CHILD_PROCESSES remote	*/
	   ntasks_all,			/* tasks in COMM_ALL_PROCESSES	*/
	   mytid_world,			/* my task id in MPI_COMM_WORLD	*/
	   mytid_all,			/* id in COMM_ALL_PROCESSES	*/
	   namelen;			/* length of processor name	*/
  char	   processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init (&argc, &argv);
  MPI_Comm_rank (MPI_COMM_WORLD, &mytid_world);
  /* At first we must decide if this program is executed from a parent
   * or child process because only a parent is allowed to spawn child
   * processes (otherwise the child process with rank 0 would spawn
   * itself child processes and so on). "MPI_Comm_get_parent ()"
   * returns the parent inter-communicator for a spawned MPI rank and
   * MPI_COMM_NULL if the process wasn't spawned, i.e. it was started
   * statically via "mpiexec" on the command line.
   */
  MPI_Comm_get_parent (&COMM_PARENT_PROCESSES);
  if (COMM_PARENT_PROCESSES == MPI_COMM_NULL)
  {
/* All parent processes must call "MPI_Comm_spawn ()" but only
 * the root process (in our case the process with rank 0) will
 * spawn child processes. All other processes of the
 * 
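The attached source is cut off mid-comment. For readers following the parent/child logic it describes, here is a minimal, self-contained sketch of the same MPI_Comm_spawn pattern; it is illustrative only (not Siegmar's actual program) and assumes the spawned binary is the same executable:

```c
/* Minimal sketch of the parent/child pattern described above:
 * a process started by mpiexec (MPI_Comm_get_parent returns
 * MPI_COMM_NULL) spawns NUM_SLAVES copies of itself; spawned
 * copies detect the parent inter-communicator and do not spawn. */
#include <stdio.h>
#include "mpi.h"

#define NUM_SLAVES 2

int main (int argc, char *argv[])
{
  MPI_Comm parent, children;

  MPI_Init (&argc, &argv);
  MPI_Comm_get_parent (&parent);
  if (parent == MPI_COMM_NULL) {
    /* Started statically via mpiexec: root (rank 0) spawns children. */
    MPI_Comm_spawn (argv[0], MPI_ARGV_NULL, NUM_SLAVES, MPI_INFO_NULL,
                    0, MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);
    printf ("parent: spawned %d child processes\n", NUM_SLAVES);
  } else {
    int rank;
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    printf ("child %d: I was spawned\n", rank);
  }
  MPI_Finalize ();
  return 0;
}
```

MPI_Comm_spawn is collective over the parent communicator, so all parent ranks call it, but only the designated root (here rank 0) actually launches the children.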

Re: [OMPI users] OpenMPI not running any job on Mac OS X 10.12

2017-02-02 Thread Michel Lesoinne
Howard,

First, thanks to you and Jeff for looking into this with me. 
I tried ../configure --disable-shared --enable-static --prefix ~/.local
The result is the same as without --disable-shared, i.e., I get the
following error:

[Michels-MacBook-Pro.local:92780] [[46617,0],0] ORTE_ERROR_LOG: Bad
parameter in file ../../orte/orted/pmix/pmix_server.c at line 262

[Michels-MacBook-Pro.local:92780] [[46617,0],0] ORTE_ERROR_LOG: Bad
parameter in file ../../../../../orte/mca/ess/hnp/ess_hnp_module.c at line
666

--

It looks like orte_init failed for some reason; your parallel process is

likely to abort.  There are many reasons that a parallel process can

fail during orte_init; some of which are due to configuration or

environment problems.  This failure appears to be an internal failure;

here's some additional information (which may only be relevant to an

Open MPI developer):


  pmix server init failed

  --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS

--

On Thu, Feb 2, 2017 at 12:29 PM, Howard Pritchard 
wrote:

> Hi Michel
>
> Try adding --enable-static to the configure.
> That fixed the problem for me.
>
> Howard
>
> Michel Lesoinne  schrieb am Mi. 1. Feb. 2017 um
> 19:07:
>
>> I have compiled OpenMPI 2.0.2 on a new Macbook running OS X 10.12 and
>> have been trying to run simple program.
>> I configured openmpi with
>> ../configure --disable-shared --prefix ~/.local
>> make all install
>>
>> Then I have  a simple code only containing a call to MPI_Init.
>> I compile it with
>> mpirun -np 2 ./mpitest
>>
>> The output is:
>>
>> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
>> unable to open mca_patcher_overwrite: File not found (ignored)
>>
>> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
>> unable to open mca_shmem_mmap: File not found (ignored)
>>
>> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
>> unable to open mca_shmem_posix: File not found (ignored)
>>
>> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
>> unable to open mca_shmem_sysv: File not found (ignored)
>>
>> 
>> --
>>
>> It looks like opal_init failed for some reason; your parallel process is
>>
>> likely to abort.  There are many reasons that a parallel process can
>>
>> fail during opal_init; some of which are due to configuration or
>>
>> environment problems.  This failure appears to be an internal failure;
>>
>> here's some additional information (which may only be relevant to an
>>
>> Open MPI developer):
>>
>>
>>   opal_shmem_base_select failed
>>
>>   --> Returned value -1 instead of OPAL_SUCCESS
>>
>> 
>> --
>>
>> Without the --disable-shared in the configuration, then I get:
>>
>>
>> [Michels-MacBook-Pro.local:68818] [[53415,0],0] ORTE_ERROR_LOG: Bad
>> parameter in file ../../orte/orted/pmix/pmix_server.c at line 264
>>
>> [Michels-MacBook-Pro.local:68818] [[53415,0],0] ORTE_ERROR_LOG: Bad
>> parameter in file ../../../../../orte/mca/ess/hnp/ess_hnp_module.c at
>> line 666
>>
>> 
>> --
>>
>> It looks like orte_init failed for some reason; your parallel process is
>>
>> likely to abort.  There are many reasons that a parallel process can
>>
>> fail during orte_init; some of which are due to configuration or
>>
>> environment problems.  This failure appears to be an internal failure;
>>
>> here's some additional information (which may only be relevant to an
>>
>> Open MPI developer):
>>
>>
>>   pmix server init failed
>>
>>   --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
>>
>> 
>> --
>>
>>
>>
>>
>> Has anyone seen this? What am I missing?
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OpenMPI not running any job on Mac OS X 10.12

2017-02-02 Thread Michel Lesoinne
No previous version installed. This is a brand new laptop on which no MPI
of any kind was installed previously.

On Thu, Feb 2, 2017 at 6:11 AM, Jeff Squyres (jsquyres) 
wrote:

> Michel --
>
> Also, did you install Open MPI v2.0.2 over a prior version of Open MPI
> (i.e., with the same prefix value to configure)?  That would almost
> certainly cause a problem.
>
>
> > On Feb 2, 2017, at 7:56 AM, Howard Pritchard 
> wrote:
> >
> > Hi Michel
> >
> > It's somewhat unusual to use the disable-shared  configure option.  That
> may be causing this.  Could you try to build without using this option and
> see if you still see the problem?
> >
> >
> > Thanks,
> >
> > Howard
> >
> > Michel Lesoinne  schrieb am Mi. 1. Feb. 2017
> um 21:07:
> > I have compiled OpenMPI 2.0.2 on a new Macbook running OS X 10.12 and
> have been trying to run simple program.
> > I configured openmpi with
> > ../configure --disable-shared --prefix ~/.local
> > make all install
> >
> > Then I have  a simple code only containing a call to MPI_Init.
> > I compile it with
> > mpirun -np 2 ./mpitest
> >
> > The output is:
> >
> > [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
> unable to open mca_patcher_overwrite: File not found (ignored)
> >
> > [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
> unable to open mca_shmem_mmap: File not found (ignored)
> >
> > [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
> unable to open mca_shmem_posix: File not found (ignored)
> >
> > [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
> unable to open mca_shmem_sysv: File not found (ignored)
> >
> > 
> --
> >
> > It looks like opal_init failed for some reason; your parallel process is
> >
> > likely to abort.  There are many reasons that a parallel process can
> >
> > fail during opal_init; some of which are due to configuration or
> >
> > environment problems.  This failure appears to be an internal failure;
> >
> > here's some additional information (which may only be relevant to an
> >
> > Open MPI developer):
> >
> >
> >
> >   opal_shmem_base_select failed
> >
> >   --> Returned value -1 instead of OPAL_SUCCESS
> >
> > 
> --
> >
> > Without the --disable-shared in the configuration, then I get:
> >
> >
> >
> > [Michels-MacBook-Pro.local:68818] [[53415,0],0] ORTE_ERROR_LOG: Bad
> parameter in file ../../orte/orted/pmix/pmix_server.c at line 264
> >
> > [Michels-MacBook-Pro.local:68818] [[53415,0],0] ORTE_ERROR_LOG: Bad
> parameter in file ../../../../../orte/mca/ess/hnp/ess_hnp_module.c at
> line 666
> >
> > 
> --
> >
> > It looks like orte_init failed for some reason; your parallel process is
> >
> > likely to abort.  There are many reasons that a parallel process can
> >
> > fail during orte_init; some of which are due to configuration or
> >
> > environment problems.  This failure appears to be an internal failure;
> >
> > here's some additional information (which may only be relevant to an
> >
> > Open MPI developer):
> >
> >
> >
> >   pmix server init failed
> >
> >   --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
> >
> >
> > 
> --
> >
> >
> >
> >
> >
> >
> >
> > Has anyone seen this? What am I missing?
> >
> > ___
> > users mailing list
> > users@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
> > ___
> > users mailing list
> > users@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OpenMPI not running any job on Mac OS X 10.12

2017-02-02 Thread Michel Lesoinne
I am disabling shared to make things a bit easier, so I don't have to set
up DYLD_LIBRARY_PATH. But not disabling 'shared' results in similar
behavior, though with different messages (the line numbers are shifted a
bit because I tried to debug some arguments):

[Michels-MacBook-Pro.local:68818] [[53415,0],0] ORTE_ERROR_LOG: Bad
parameter in file ../../orte/orted/pmix/pmix_server.c at line 264

[Michels-MacBook-Pro.local:68818] [[53415,0],0] ORTE_ERROR_LOG: Bad
parameter in file ../../../../../orte/mca/ess/hnp/ess_hnp_module.c at line
666

--

It looks like orte_init failed for some reason; your parallel process is

likely to abort.  There are many reasons that a parallel process can

fail during orte_init; some of which are due to configuration or

environment problems.  This failure appears to be an internal failure;

here's some additional information (which may only be relevant to an

Open MPI developer):


  pmix server init failed

  --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS

--

On Thu, Feb 2, 2017 at 5:56 AM, Howard Pritchard 
wrote:

> Hi Michel
>
> It's somewhat unusual to use the disable-shared  configure option.  That
> may be causing this.  Could you try to build without using this option and
> see if you still see the problem?
>
>
> Thanks,
>
> Howard
>
> Michel Lesoinne  schrieb am Mi. 1. Feb. 2017 um
> 21:07:
>
>> I have compiled OpenMPI 2.0.2 on a new Macbook running OS X 10.12 and
>> have been trying to run simple program.
>> I configured openmpi with
>> ../configure --disable-shared --prefix ~/.local
>> make all install
>>
>> Then I have  a simple code only containing a call to MPI_Init.
>> I compile it with
>> mpirun -np 2 ./mpitest
>>
>> The output is:
>>
>> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
>> unable to open mca_patcher_overwrite: File not found (ignored)
>>
>> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
>> unable to open mca_shmem_mmap: File not found (ignored)
>>
>> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
>> unable to open mca_shmem_posix: File not found (ignored)
>>
>> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
>> unable to open mca_shmem_sysv: File not found (ignored)
>>
>> 
>> --
>>
>> It looks like opal_init failed for some reason; your parallel process is
>>
>> likely to abort.  There are many reasons that a parallel process can
>>
>> fail during opal_init; some of which are due to configuration or
>>
>> environment problems.  This failure appears to be an internal failure;
>>
>> here's some additional information (which may only be relevant to an
>>
>> Open MPI developer):
>>
>>
>>   opal_shmem_base_select failed
>>
>>   --> Returned value -1 instead of OPAL_SUCCESS
>>
>> 
>> --
>>
>> Without the --disable-shared in the configuration, then I get:
>>
>>
>> [Michels-MacBook-Pro.local:68818] [[53415,0],0] ORTE_ERROR_LOG: Bad
>> parameter in file ../../orte/orted/pmix/pmix_server.c at line 264
>>
>> [Michels-MacBook-Pro.local:68818] [[53415,0],0] ORTE_ERROR_LOG: Bad
>> parameter in file ../../../../../orte/mca/ess/hnp/ess_hnp_module.c at
>> line 666
>>
>> 
>> --
>>
>> It looks like orte_init failed for some reason; your parallel process is
>>
>> likely to abort.  There are many reasons that a parallel process can
>>
>> fail during orte_init; some of which are due to configuration or
>>
>> environment problems.  This failure appears to be an internal failure;
>>
>> here's some additional information (which may only be relevant to an
>>
>> Open MPI developer):
>>
>>
>>   pmix server init failed
>>
>>   --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
>>
>> 
>> --
>>
>>
>>
>>
>> Has anyone seen this? What am I missing?
>> ___
>> users mailing list
>> users@lists.open-mpi.org
>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OpenMPI not running any job on Mac OS X 10.12

2017-02-02 Thread Howard Pritchard
Hi Michel

Try adding --enable-static to the configure.
That fixed the problem for me.

Howard

Michel Lesoinne  schrieb am Mi. 1. Feb. 2017 um
19:07:

> I have compiled OpenMPI 2.0.2 on a new Macbook running OS X 10.12 and have
> been trying to run simple program.
> I configured openmpi with
> ../configure --disable-shared --prefix ~/.local
> make all install
>
> Then I have  a simple code only containing a call to MPI_Init.
> I compile it with
> mpirun -np 2 ./mpitest
>
> The output is:
>
> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
> unable to open mca_patcher_overwrite: File not found (ignored)
>
> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
> unable to open mca_shmem_mmap: File not found (ignored)
>
> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
> unable to open mca_shmem_posix: File not found (ignored)
>
> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
> unable to open mca_shmem_sysv: File not found (ignored)
>
> --
>
> It looks like opal_init failed for some reason; your parallel process is
>
> likely to abort.  There are many reasons that a parallel process can
>
> fail during opal_init; some of which are due to configuration or
>
> environment problems.  This failure appears to be an internal failure;
>
> here's some additional information (which may only be relevant to an
>
> Open MPI developer):
>
>
>   opal_shmem_base_select failed
>
>   --> Returned value -1 instead of OPAL_SUCCESS
>
> --
>
> Without the --disable-shared in the configuration, then I get:
>
>
> [Michels-MacBook-Pro.local:68818] [[53415,0],0] ORTE_ERROR_LOG: Bad
> parameter in file ../../orte/orted/pmix/pmix_server.c at line 264
>
> [Michels-MacBook-Pro.local:68818] [[53415,0],0] ORTE_ERROR_LOG: Bad
> parameter in file ../../../../../orte/mca/ess/hnp/ess_hnp_module.c at line
> 666
>
> --
>
> It looks like orte_init failed for some reason; your parallel process is
>
> likely to abort.  There are many reasons that a parallel process can
>
> fail during orte_init; some of which are due to configuration or
>
> environment problems.  This failure appears to be an internal failure;
>
> here's some additional information (which may only be relevant to an
>
> Open MPI developer):
>
>
>   pmix server init failed
>
>   --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
>
> --
>
>
>
>
> Has anyone seen this? What am I missing?
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Error using hpcc benchmark

2017-02-02 Thread wodel youchi
Hi Cabral, and thank you.

I started the hpcc benchmark using -x PSM_MEMORY=large without any error. I
haven't finished the test yet, but I waited about 10 minutes and this time
there were no errors. I even increased the Ns variable in hpccint.txt and
started the test again without problems.

The cluster is composed of:
- one management node
- 32 compute nodes, each with 16 cores (2 sockets x 8 cores), 32 GB of
RAM, and an Intel QLE7340 single-port 40 Gb/s InfiniBand card

I used this site to generate the input file for hpcc :
http://www.advancedclustering.com/act-kb/tune-hpl-dat-file/
with some modifications :

1        # of problems sizes (N)
331520   Ns
1        # of NBs
128      NBs
0        PMAP process mapping (0=Row-,1=Column-major)
1        # of process grids (P x Q)
16       Ps
32       Qs

The Ns value here represents almost 90% of the total memory of the cluster.
The total number of processes is 512; each node starts 16 processes, one
per core.

Before modifying the PSM_MEMORY value, the test exited with the mentioned
error, even with lower values of Ns.

I find it strange that there is no mention of this variable anywhere on the
net, not even in the Intel True Scale OFED+ documentation.
Thanks again.




2017-02-01 22:12 GMT+01:00 Cabral, Matias A :

> Hi Wodel,
>
>
>
> As you already figured out, mpirun -x 

[hwloc-users] Difference in core numbering for output formats

2017-02-02 Thread Gunter, David O
Can anyone explain why I get different core IDs in lstopo's graphical
output versus the text output I get with lstopo --only core?

On a dual 10-core Sandy Bridge node I see two sets of cores, with IDs
#0,1,2,3,4,8,9,10,11,12. This corresponds to the core IDs I see if I cat
/proc/cpuinfo.

However, the output from lstopo --only core gives
Core L#0
Core L#1
Core L#2
Core L#3
Core L#4
Core L#5
Core L#6
Core L#7
Core L#8
Core L#9
Core L#10
Core L#11
Core L#12
Core L#13
Core L#14
Core L#15
Core L#16
Core L#17
Core L#18
Core L#19

Why would it be different from the previous?
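(For context: lstopo's `L#` numbers are hwloc's logical indexes, assigned contiguously in topology order, while /proc/cpuinfo and the graphical `P#` labels show the OS/physical indexes, which need not be contiguous. A short sketch using the hwloc C API prints both side by side; this is an illustrative snippet, not part of the original question:)

```c
/* Print each core's hwloc logical index (L#) next to its OS/physical
 * index (P#).  Compile with:  cc probe.c -lhwloc */
#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    int ncores = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE);
    for (int i = 0; i < ncores; i++) {
        hwloc_obj_t core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, i);
        /* logical_index is always 0..ncores-1; os_index matches cpuinfo */
        printf("Core L#%u -> P#%u\n", core->logical_index, core->os_index);
    }
    hwloc_topology_destroy(topo);
    return 0;
}
```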

Thanks,
david
--
David Gunter
HPC-ENV: Applications Readiness Team
Los Alamos National Laboratory




___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/hwloc-users

Re: [OMPI users] OpenMPI not running any job on Mac OS X 10.12

2017-02-02 Thread Howard Pritchard
Hi Michael,

I reproduced this problem on my Mac too:

pn1249323:~/ompi/examples (v2.0.x *)$ mpirun -np 2 ./ring_c

[pn1249323.lanl.gov:94283] mca_base_component_repository_open: unable to
open mca_patcher_overwrite: File not found (ignored)

[pn1249323.lanl.gov:94283] mca_base_component_repository_open: unable to
open mca_shmem_mmap: File not found (ignored)

[pn1249323.lanl.gov:94283] mca_base_component_repository_open: unable to
open mca_shmem_posix: File not found (ignored)

[pn1249323.lanl.gov:94283] mca_base_component_repository_open: unable to
open mca_shmem_sysv: File not found (ignored)

--

It looks like opal_init failed for some reason; your parallel process is

likely to abort.  There are many reasons that a parallel process can

fail during opal_init; some of which are due to configuration or

environment problems.  This failure appears to be an internal failure;

here's some additional information (which may only be relevant to an

Open MPI developer):


  opal_shmem_base_select failed

  --> Returned value -1 instead of OPAL_SUCCESS

Is there a reason why you are using the --disable-shared option?  Can you
use the --disable-dlopen instead?

I'll do some more investigating and open an issue.

Howard



2017-02-01 19:05 GMT-07:00 Michel Lesoinne :

> I have compiled OpenMPI 2.0.2 on a new Macbook running OS X 10.12 and have
> been trying to run simple program.
> I configured openmpi with
> ../configure --disable-shared --prefix ~/.local
> make all install
>
> Then I have  a simple code only containing a call to MPI_Init.
> I compile it with
> mpirun -np 2 ./mpitest
>
> The output is:
>
> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
> unable to open mca_patcher_overwrite: File not found (ignored)
>
> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
> unable to open mca_shmem_mmap: File not found (ignored)
>
> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
> unable to open mca_shmem_posix: File not found (ignored)
>
> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
> unable to open mca_shmem_sysv: File not found (ignored)
>
> --
>
> It looks like opal_init failed for some reason; your parallel process is
>
> likely to abort.  There are many reasons that a parallel process can
>
> fail during opal_init; some of which are due to configuration or
>
> environment problems.  This failure appears to be an internal failure;
>
> here's some additional information (which may only be relevant to an
>
> Open MPI developer):
>
>
>   opal_shmem_base_select failed
>
>   --> Returned value -1 instead of OPAL_SUCCESS
>
> --
>
> Without the --disable-shared in the configuration, then I get:
>
>
> [Michels-MacBook-Pro.local:68818] [[53415,0],0] ORTE_ERROR_LOG: Bad
> parameter in file ../../orte/orted/pmix/pmix_server.c at line 264
>
> [Michels-MacBook-Pro.local:68818] [[53415,0],0] ORTE_ERROR_LOG: Bad
> parameter in file ../../../../../orte/mca/ess/hnp/ess_hnp_module.c at
> line 666
>
> --
>
> It looks like orte_init failed for some reason; your parallel process is
>
> likely to abort.  There are many reasons that a parallel process can
>
> fail during orte_init; some of which are due to configuration or
>
> environment problems.  This failure appears to be an internal failure;
>
> here's some additional information (which may only be relevant to an
>
> Open MPI developer):
>
>
>   pmix server init failed
>
>   --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
>
> --
>
>
>
>
> Has anyone seen this? What am I missing?
>
> ___
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] OpenMPI not running any job on Mac OS X 10.12

2017-02-02 Thread Jeff Squyres (jsquyres)
Michel --

Also, did you install Open MPI v2.0.2 over a prior version of Open MPI (i.e., 
with the same prefix value to configure)?  That would almost certainly cause a 
problem.


> On Feb 2, 2017, at 7:56 AM, Howard Pritchard  wrote:
> 
> Hi Michel
> 
> It's somewhat unusual to use the disable-shared  configure option.  That may 
> be causing this.  Could you try to build without using this option and see if 
> you still see the problem?
> 
> 
> Thanks,
> 
> Howard
> 
> Michel Lesoinne  schrieb am Mi. 1. Feb. 2017 um 
> 21:07:
> I have compiled OpenMPI 2.0.2 on a new Macbook running OS X 10.12 and have 
> been trying to run simple program.
> I configured openmpi with
> ../configure --disable-shared --prefix ~/.local
> make all install
> 
> Then I have  a simple code only containing a call to MPI_Init.
> I compile it with
> mpirun -np 2 ./mpitest
> 
> The output is:
>
> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open: unable 
> to open mca_patcher_overwrite: File not found (ignored)
> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open: unable 
> to open mca_shmem_mmap: File not found (ignored)
> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open: unable 
> to open mca_shmem_posix: File not found (ignored)
> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open: unable 
> to open mca_shmem_sysv: File not found (ignored)
> --
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_shmem_base_select failed
>   --> Returned value -1 instead of OPAL_SUCCESS
> --
>
> Without --disable-shared in the configuration, I get:
>
> [Michels-MacBook-Pro.local:68818] [[53415,0],0] ORTE_ERROR_LOG: Bad parameter 
> in file ../../orte/orted/pmix/pmix_server.c at line 264
> [Michels-MacBook-Pro.local:68818] [[53415,0],0] ORTE_ERROR_LOG: Bad parameter 
> in file ../../../../../orte/mca/ess/hnp/ess_hnp_module.c at line 666
> --
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   pmix server init failed
>   --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
> --
>
> Has anyone seen this? What am I missing?
> 


-- 
Jeff Squyres
jsquy...@cisco.com

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] OpenMPI not running any job on Mac OS X 10.12

2017-02-02 Thread Howard Pritchard
Hi Michel

It's somewhat unusual to use the --disable-shared configure option.  That
may be causing this.  Could you try to build without using this option and
see if you still see the problem?


Thanks,

Howard

Michel Lesoinne  wrote on Wed. 1 Feb. 2017 at
21:07:

> I have compiled OpenMPI 2.0.2 on a new Macbook running OS X 10.12 and have
> been trying to run a simple program.
> I configured openmpi with
> ../configure --disable-shared --prefix ~/.local
> make all install
>
> Then I have a simple code only containing a call to MPI_Init.
> I compile it and run it with
> mpirun -np 2 ./mpitest
>
> The output is:
>
> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
> unable to open mca_patcher_overwrite: File not found (ignored)
>
> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
> unable to open mca_shmem_mmap: File not found (ignored)
>
> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
> unable to open mca_shmem_posix: File not found (ignored)
>
> [Michels-MacBook-Pro.local:45101] mca_base_component_repository_open:
> unable to open mca_shmem_sysv: File not found (ignored)
>
> --
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   opal_shmem_base_select failed
>   --> Returned value -1 instead of OPAL_SUCCESS
> --
>
> Without --disable-shared in the configuration, I get:
>
>
> [Michels-MacBook-Pro.local:68818] [[53415,0],0] ORTE_ERROR_LOG: Bad
> parameter in file ../../orte/orted/pmix/pmix_server.c at line 264
>
> [Michels-MacBook-Pro.local:68818] [[53415,0],0] ORTE_ERROR_LOG: Bad
> parameter in file ../../../../../orte/mca/ess/hnp/ess_hnp_module.c at line
> 666
>
> --
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   pmix server init failed
>   --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
> --
>
>
>
>
> Has anyone seen this? What am I missing?
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] Performance Issues on SMP Workstation

2017-02-02 Thread Gilles Gouaillardet
I cannot remember what the default binding (if any) is in Open MPI 1.6,
nor whether the default is the same with or without PBS.

You can simply run
mpirun --tag-output grep Cpus_allowed_list /proc/self/status
and see if you note any discrepancy between your systems
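On Linux, the same affinity field can be checked directly for any process; a minimal sketch (the mpirun form assumes an Open MPI install):

```shell
# Print this process's CPU affinity mask as seen by the Linux kernel.
grep Cpus_allowed_list /proc/self/status

# Under mpirun, --tag-output prefixes each rank's lines with its rank id,
# which makes discrepancies between hosts easy to spot:
# mpirun --tag-output -np 4 grep Cpus_allowed_list /proc/self/status
```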

You might also consider upgrading to the latest Open MPI 2.0.2, and see how
things go.

Cheers,

Gilles

On Thursday, February 2, 2017,  wrote:

> Hello Andy,
>
> You can also use the --report-bindings option of mpirun to check which
> cores
> your program will use and to which cores the processes are bound to.
>
> Are you using the same backend compiler on both systems?
>
> Do you have performance tools available on the systems where you can see in
> which part of the Program the time is lost? Common tools would be Score-P/
> Vampir/CUBE, TAU, extrae/Paraver.
>
> Best
> Christoph
>
> On Wednesday, 1 February 2017 21:09:28 CET Andy Witzig wrote:
> > Thank you, Bennet.  From my testing, I've seen that the application
> > usually performs better at much smaller ranks on the workstation.  I've
> > tested on the cluster and do not see the same response (i.e. see better
> > performance with ranks of -np 15 or 20).  The workstation is not shared
> > and is not doing any other work.  I ran the application on the workstation
> > with top and confirmed that 20 procs were fully loaded.
> >
> > I'll look into the diagnostics you mentioned and get back with you.
> >
> > Best regards,
> > Andy
> >
> > On Feb 1, 2017, at 6:15 PM, Bennet Fauber  wrote:
> >
> > How do they compare if you run a much smaller number of ranks, say -np 2
> > or 4?
> >
> > Is the workstation shared and doing any other work?
> >
> > You could insert some diagnostics into your script, for example,
> > uptime and free, both before and after running your MPI program and
> > compare.
> >
> > You could also run top in batch mode in the background for your own
> > username, then run your MPI program, and compare the results from top.
> > We've seen instances where the MPI ranks only get distributed to a
> > small number of processors, which you see if they all have small
> > percentages of CPU.
> >
> > Just flailing in the dark...
> >
> > -- bennet
> >
> > On Wed, Feb 1, 2017 at 6:36 PM, Andy Witzig  wrote:
> > > Thanks for the idea.  I did the test and only get a single host.
> > >
> > > Thanks,
> > > Andy
> > >
> > > On Feb 1, 2017, at 5:04 PM, r...@open-mpi.org  wrote:
> > >
> > > Simple test: replace your executable with "hostname". If you see multiple
> > > hosts come out on your cluster, then you know why the performance is
> > > different.
> > >
> > > On Feb 1, 2017, at 2:46 PM, Andy Witzig  wrote:
> > >
> > > Honestly, I'm not exactly sure what scheme is being used.  I am using the
> > > default template from Penguin Computing for job submission.  It looks
> > > like:
> > >
> > > #PBS -S /bin/bash
> > > #PBS -q T30
> > > #PBS -l walltime=24:00:00,nodes=1:ppn=20
> > > #PBS -j oe
> > > #PBS -N test
> > > #PBS -r n
> > >
> > > mpirun $EXECUTABLE $INPUT_FILE
> > >
> > > I'm not configuring OpenMPI anywhere else. It is possible the Penguin
> > > Computing folks have pre-configured my MPI environment.  I'll see what I
> > > can find.
> > >
> > > Best regards,
> > > Andy
> > >
> > > On Feb 1, 2017, at 4:32 PM, Douglas L Reeder  wrote:
> > >
> > > Andy,
> > >
> > > What allocation scheme are you using on the cluster? For some codes we
> > > see noticeable differences using fillup vs round robin, not 4x though.
> > > Fillup uses more shared memory while round robin uses more InfiniBand.
> > >
> > > Doug
> > >
> > > On Feb 1, 2017, at 3:25 PM, Andy Witzig  wrote:
> > >
> > > Hi Tom,
> > >
> > > The cluster uses an Infiniband interconnect.  On the cluster I'm
> > > requesting: #PBS -l walltime=24:00:00,nodes=1:ppn=20.  So technically,
> > > the run on the cluster should be SMP on the node, since there are 20
> > > cores/node.  On the workstation I'm just using the command: mpirun -np
> > > 20 ... I haven't finished setting Torque/PBS up yet.
> > >
> > > Best regards,
> > > Andy
> > >
> > > On Feb 1, 2017, at 4:10 PM, Elken, Tom  wrote:
> > >
> > > For this case: "a cluster system with 2.6GHz Intel Haswell with 20 cores
> > > / node and 128GB RAM/node."
> > >
> > > are you running 5 ranks per node on 4 nodes?
> > > What interconnect are you using for the cluster?
> > >
> > > -Tom
> > >
> > > -Original Message-
> > > From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Andrew
> > > Witzig
> > > Sent: Wednesday, February 01, 2017 1:37 PM
> > > To: Open MPI Users
> > > Subject: Re: [OMPI users] Performance Issues on SMP Workstation
> > >
> > > By the way, the workstation has a total of 36 cores / 72 threads, so
> > > using mpirun -np 20 is possible (and should be equivalent) on both
> > > platforms.

Re: [OMPI users] Performance Issues on SMP Workstation

2017-02-02 Thread niethammer
Hello Andy,

You can also use the --report-bindings option of mpirun to check which cores 
your program will use and to which cores the processes are bound to.
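A sketch of that binding check (assumes Open MPI's mpirun is on PATH; guarded so the launch is skipped where no MPI is installed):

```shell
# --report-bindings makes each Open MPI daemon print, before the program
# starts, a per-rank map (on stderr) of the cores each rank is bound to.
if command -v mpirun >/dev/null 2>&1; then
    mpirun --report-bindings -np 2 hostname || true
    bindings_checked=yes
else
    bindings_checked=no    # no MPI install here; nothing to report
fi
```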

Are you using the same backend compiler on both systems?

Do you have performance tools available on the systems where you can see in 
which part of the Program the time is lost? Common tools would be Score-P/
Vampir/CUBE, TAU, extrae/Paraver.

Best
Christoph

On Wednesday, 1 February 2017 21:09:28 CET Andy Witzig wrote:
> Thank you, Bennet.  From my testing, I've seen that the application usually
> performs better at much smaller ranks on the workstation.  I've tested on
> the cluster and do not see the same response (i.e. see better performance
> with ranks of -np 15 or 20).  The workstation is not shared and is not
> doing any other work.  I ran the application on the workstation with top
> and confirmed that 20 procs were fully loaded.
> 
> I'll look into the diagnostics you mentioned and get back with you.
> 
> Best regards,
> Andy
> 
> On Feb 1, 2017, at 6:15 PM, Bennet Fauber  wrote:
> 
> How do they compare if you run a much smaller number of ranks, say -np 2 or
> 4?
> 
> Is the workstation shared and doing any other work?
> 
> You could insert some diagnostics into your script, for example,
> uptime and free, both before and after running your MPI program and
> compare.
> 
> You could also run top in batch mode in the background for your own
> username, then run your MPI program, and compare the results from top.
> We've seen instances where the MPI ranks only get distributed to a
> small number of processors, which you see if they all have small
> percentages of CPU.
> 
> Just flailing in the dark...
> 
> -- bennet
> 
> On Wed, Feb 1, 2017 at 6:36 PM, Andy Witzig  wrote:
> > Thanks for the idea.  I did the test and only get a single host.
> > 
> > Thanks,
> > Andy
> > 
> > On Feb 1, 2017, at 5:04 PM, r...@open-mpi.org wrote:
> > 
> > Simple test: replace your executable with "hostname". If you see multiple
> > hosts come out on your cluster, then you know why the performance is
> > different.
> > 
> > On Feb 1, 2017, at 2:46 PM, Andy Witzig  wrote:
> > 
> > Honestly, I'm not exactly sure what scheme is being used.  I am using the
> > default template from Penguin Computing for job submission.  It looks
> > like:
> > 
> > #PBS -S /bin/bash
> > #PBS -q T30
> > #PBS -l walltime=24:00:00,nodes=1:ppn=20
> > #PBS -j oe
> > #PBS -N test
> > #PBS -r n
> > 
> > mpirun $EXECUTABLE $INPUT_FILE
> > 
> > I'm not configuring OpenMPI anywhere else. It is possible the Penguin
> > Computing folks have pre-configured my MPI environment.  I'll see what I
> > can find.
> > 
> > Best regards,
> > Andy
> > 
> > On Feb 1, 2017, at 4:32 PM, Douglas L Reeder  wrote:
> > 
> > Andy,
> > 
> > What allocation scheme are you using on the cluster? For some codes we see
> > noticeable differences using fillup vs round robin, not 4x though. Fillup
> > uses more shared memory while round robin uses more InfiniBand.
> > 
> > Doug
> > 
> > On Feb 1, 2017, at 3:25 PM, Andy Witzig  wrote:
> > 
> > Hi Tom,
> > 
> > The cluster uses an Infiniband interconnect.  On the cluster I'm
> > requesting: #PBS -l walltime=24:00:00,nodes=1:ppn=20.  So technically,
> > the run on the cluster should be SMP on the node, since there are 20
> > cores/node.  On the workstation I'm just using the command: mpirun -np 20
> > ... I haven't finished setting Torque/PBS up yet.
> > 
> > Best regards,
> > Andy
> > 
> > On Feb 1, 2017, at 4:10 PM, Elken, Tom  wrote:
> > 
> > For this case:  " a cluster system with 2.6GHz Intel Haswell with 20 cores
> > / node and 128GB RAM/node.  "
> > 
> > are you running 5 ranks per node on 4 nodes?
> > What interconnect are you using for the cluster?
> > 
> > -Tom
> > 
> > -Original Message-
> > From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Andrew
> > Witzig
> > Sent: Wednesday, February 01, 2017 1:37 PM
> > To: Open MPI Users
> > Subject: Re: [OMPI users] Performance Issues on SMP Workstation
> > 
> > By the way, the workstation has a total of 36 cores / 72 threads, so using
> > mpirun -np 20 is possible (and should be equivalent) on both platforms.
> > 
> > Thanks,
> > cap79
> > 
> > On Feb 1, 2017, at 2:52 PM, Andy Witzig  wrote:
> > 
> > Hi all,
> > 
> > I'm testing my application on an SMP workstation (dual Intel Xeon E5-2697
> > V4 2.3 GHz Intel Broadwell (boost 2.8-3.1GHz) processors, 128GB RAM) and
> > am seeing a 4x performance drop compared to a cluster system with 2.6GHz
> > Intel Haswell with 20 cores / node and 128GB RAM/node.  Both applications
> > have been compiled using OpenMPI 1.6.4.  I have tried running:
> > 
> > 
> > mpirun -np 20 $EXECUTABLE $INPUT_FILE
> > mpirun -np 20 --mca btl self,sm $EXECUTABLE $INPUT_FILE
> > 
> > and others, but cannot achieve the same