Re: [OMPI devel] orte-restart and PATH

2014-03-12 Thread Ralph Castain
That's what the --enable-orterun-prefix-by-default configure option is for
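For reference, that option is given at configure time; a minimal sketch, with the install prefix shown purely as an example:

  ./configure --prefix=/opt/openmpi --enable-orterun-prefix-by-default ...
  make all install

With it enabled, orterun/mpirun behave as if --prefix <installdir> had been passed, so the tools can locate the rest of the installation even when PATH and LD_LIBRARY_PATH are not set.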


On Mar 12, 2014, at 9:28 AM, Adrian Reber  wrote:

> I am using orte-restart without setting my PATH to my Open MPI
> installation. I am running /full/path/to/orte-restart and orte-restart
> tries to run mpirun to restart the process. This fails on my system
> because I do not have any mpirun in my PATH. Is it expected for an Open
> MPI installation to set up the PATH variable or should it work using the
> absolute path to the binaries?
> 
> Should I just set my PATH correctly and be done with it or should
> orte-restart figure out the full path to its accompanying mpirun and
> start mpirun with the full path?
> 
>   Adrian



[OMPI devel] orte-restart and PATH

2014-03-12 Thread Adrian Reber
I am using orte-restart without setting my PATH to my Open MPI
installation. I am running /full/path/to/orte-restart and orte-restart
tries to run mpirun to restart the process. This fails on my system
because I do not have any mpirun in my PATH. Is it expected for an Open
MPI installation to set up the PATH variable or should it work using the
absolute path to the binaries?

Should I just set my PATH correctly and be done with it or should
orte-restart figure out the full path to its accompanying mpirun and
start mpirun with the full path?

Adrian
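For illustration, here is a hypothetical sketch of the second option -- resolving the sibling mpirun from the running tool's own location. It is Linux-specific (/proc/self/exe) and not taken from the Open MPI sources, which have their own installdirs framework recording the configured bindir:

#include <libgen.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char self[PATH_MAX];
    ssize_t len = readlink("/proc/self/exe", self, sizeof(self) - 1);
    if (len < 0) {
        perror("readlink");
        return 1;
    }
    self[len] = '\0';

    /* dirname() may modify its argument; 'self' is not needed afterwards. */
    char mpirun[PATH_MAX];
    snprintf(mpirun, sizeof(mpirun), "%s/mpirun", dirname(self));
    printf("would exec: %s\n", mpirun);   /* e.g. hand this to an exec*() call */
    return 0;
}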


Re: [OMPI devel] Loading Open MPI from MPJ Express (Java) fails

2014-03-12 Thread Ralph Castain
Here's what I had to do to load the library correctly (we were only using ORTE, 
so substitute "libmpi") - this was called at the beginning of "init":

/* first, load the required ORTE library */
#if OPAL_WANT_LIBLTDL
    lt_dladvise advise;

    if (lt_dlinit() != 0) {
        fprintf(stderr, "LT_DLINIT FAILED - CANNOT LOAD LIBMRPLUS\n");
        return JNI_FALSE;
    }

#if OPAL_HAVE_LTDL_ADVISE
    /* open the library into the global namespace */
    if (lt_dladvise_init(&advise)) {
        fprintf(stderr, "LT_DLADVISE INIT FAILED - CANNOT LOAD LIBMRPLUS\n");
        return JNI_FALSE;
    }

    if (lt_dladvise_ext(&advise)) {
        fprintf(stderr, "LT_DLADVISE EXT FAILED - CANNOT LOAD LIBMRPLUS\n");
        lt_dladvise_destroy(&advise);
        return JNI_FALSE;
    }

    if (lt_dladvise_global(&advise)) {
        fprintf(stderr, "LT_DLADVISE GLOBAL FAILED - CANNOT LOAD LIBMRPLUS\n");
        lt_dladvise_destroy(&advise);
        return JNI_FALSE;
    }

    /* we don't care about the return value
     * on dlopen - it might return an error
     * because the lib is already loaded,
     * depending on the way we were built
     */
    lt_dlopenadvise("libopen-rte", advise);
    lt_dladvise_destroy(&advise);
#else
    fprintf(stderr, "NO LT_DLADVISE - CANNOT LOAD LIBMRPLUS\n");
    /* need to balance the ltdl inits */
    lt_dlexit();
    /* if we don't have advise, then we are hosed */
    return JNI_FALSE;
#endif
#endif
    /* if dlopen was disabled, then all symbols
     * should have been pulled up into the libraries,
     * so we don't need to do anything as the symbols
     * are already available.
     */
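As a point of comparison, a minimal sketch of the same idea using plain POSIX dlfcn instead of libltdl; the library name and error handling are illustrative only, not taken from the Open MPI sources:

#include <dlfcn.h>
#include <stdio.h>

/* Load the Open MPI runtime library into the process-wide (global) symbol
 * namespace so that MCA components dlopened later can resolve symbols such
 * as opal_show_help from it. */
static int load_ompi_runtime(const char *libname)
{
    void *handle = dlopen(libname, RTLD_NOW | RTLD_GLOBAL);
    if (NULL == handle) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", libname, dlerror());
        return -1;
    }
    return 0;
}

int main(void)
{
    /* The exact library name/suffix depends on the platform and install layout. */
    return (0 == load_ompi_runtime("libopen-rte.so")) ? 0 : 1;
}

The key point in both versions is the global namespace: the JVM typically loads JNI libraries with local symbol visibility, so the MCA component .so files cannot see opal_show_help and friends unless the core library is (re)opened globally.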

On Mar 12, 2014, at 6:32 AM, Jeff Squyres (jsquyres)  wrote:

> Check out how we did this with the embedded java bindings in Open MPI; see 
> the comment describing exactly this issue starting here:
> 
>
> https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/java/c/mpi_MPI.c#L79
> 
> Feel free to compare MPJ to the OMPI java bindings -- they're shipping in 
> 1.7.4 and have a bunch of improvements in the soon-to-be-released 1.7.5, but 
> you must enable them since they aren't enabled by default:
> 
>./configure --enable-mpi-java ...
> 
> FWIW, we found a few places in the Java bindings where it was necessary for 
> the bindings to have some insight into the internals of the MPI 
> implementation.  Did you find the same thing with MPJ Express?
> 
> Are your bindings similar in style/signature to ours?
> 
> 
> 
> On Mar 12, 2014, at 6:40 AM, Bibrak Qamar  wrote:
> 
>> Hi all,
>> 
>> I am writing a new device for MPJ Express that uses a native MPI library for 
>> communication. It's based on JNI wrappers like the original mpiJava. The 
>> device works fine with MPICH3 (and MVAPICH2.2). Here is the issue with 
>> loading Open MPI 1.7.4 from MPJ Express.
>> 
>> I have generated the following error message from a simple JNI-to-MPI 
>> application, for clarity and to make the error easy to reproduce. I 
>> have attached the app for your consideration.
>> 
>> 
>> [bibrak@localhost JNI_to_MPI]$ mpirun -np 2 java -cp . 
>> -Djava.library.path=/home/bibrak/work/JNI_to_MPI/ simpleJNI_MPI
>> [localhost.localdomain:29086] mca: base: component_find: unable to open 
>> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap: 
>> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap.so: 
>> undefined symbol: opal_show_help (ignored)
>> [localhost.localdomain:29085] mca: base: component_find: unable to open 
>> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap: 
>> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap.so: 
>> undefined symbol: opal_show_help (ignored)
>> [localhost.localdomain:29085] mca: base: component_find: unable to open 
>> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix: 
>> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix.so: 
>> undefined symbol: opal_shmem_base_framework (ignored)
>> [localhost.localdomain:29086] mca: base: component_find: unable to open 
>> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix: 
>> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix.so: 
>> undefined symbol: opal_shmem_base_framework (ignored)
>> [localhost.localdomain:29086] mca: base: component_find: unable to open 
>> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv: 
>> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv.so: 
>> undefined symbol: opal_show_help (ignored)
>> --
>> It looks like opal_init failed for some reason; your parallel process is
>> likely to abort.  There are many reasons that a parallel process can
>> fail during opal_init; some of which are due to configuration or
>> environment problems.  This failure appears to be an internal failure;
>> here's some additional information

Re: [OMPI devel] Loading Open MPI from MPJ Express (Java) fails

2014-03-12 Thread Jeff Squyres (jsquyres)
Check out how we did this with the embedded java bindings in Open MPI; see the 
comment describing exactly this issue starting here:


https://svn.open-mpi.org/trac/ompi/browser/trunk/ompi/mpi/java/c/mpi_MPI.c#L79

Feel free to compare MPJ to the OMPI java bindings -- they're shipping in 1.7.4 
and have a bunch of improvements in the soon-to-be-released 1.7.5, but you must 
enable them since they aren't enabled by default:

./configure --enable-mpi-java ...

FWIW, we found a few places in the Java bindings where it was necessary for the 
bindings to have some insight into the internals of the MPI implementation.  
Did you find the same thing with MPJ Express?

Are your bindings similar in style/signature to ours?



On Mar 12, 2014, at 6:40 AM, Bibrak Qamar  wrote:

> Hi all,
> 
> I am writing a new device for MPJ Express that uses a native MPI library for 
> communication. It's based on JNI wrappers like the original mpiJava. The 
> device works fine with MPICH3 (and MVAPICH2.2). Here is the issue with 
> loading Open MPI 1.7.4 from MPJ Express.
> 
> I have generated the following error message from a simple JNI-to-MPI 
> application, for clarity and to make the error easy to reproduce. I 
> have attached the app for your consideration.
> 
> 
> [bibrak@localhost JNI_to_MPI]$ mpirun -np 2 java -cp . 
> -Djava.library.path=/home/bibrak/work/JNI_to_MPI/ simpleJNI_MPI
> [localhost.localdomain:29086] mca: base: component_find: unable to open 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap: 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap.so: 
> undefined symbol: opal_show_help (ignored)
> [localhost.localdomain:29085] mca: base: component_find: unable to open 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap: 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap.so: 
> undefined symbol: opal_show_help (ignored)
> [localhost.localdomain:29085] mca: base: component_find: unable to open 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix: 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix.so: 
> undefined symbol: opal_shmem_base_framework (ignored)
> [localhost.localdomain:29086] mca: base: component_find: unable to open 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix: 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix.so: 
> undefined symbol: opal_shmem_base_framework (ignored)
> [localhost.localdomain:29086] mca: base: component_find: unable to open 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv: 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv.so: 
> undefined symbol: opal_show_help (ignored)
> --
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>   opal_shmem_base_select failed
>   --> Returned value -1 instead of OPAL_SUCCESS
> --
> [localhost.localdomain:29085] mca: base: component_find: unable to open 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv: 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv.so: 
> undefined symbol: opal_show_help (ignored)
> --
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>   opal_init failed
>   --> Returned value Error (-1) instead of ORTE_SUCCESS
> --
> --
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> 
>   ompi_mpi_init: ompi_rte_init failed
>   --> Returned "Error" (-1) instead of "Success" (0)
> --
> *** An error occurred in MPI_Init
> *** on a NULL 

Re: [OMPI devel] Loading Open MPI from MPJ Express (Java) fails

2014-03-12 Thread Ralph Castain
If you are going to use OMPI via JNI, then you have to load the OMPI library 
from within your code. This is a little tricky from Java as OMPI by default 
builds as a set of dynamic libraries, and each component is a dynamic library 
as well. The solution is either to build OMPI statically, or to use lt_dladvise and 
friends to open the library into the global namespace so that the dynamically 
loaded components can resolve its symbols.
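For the static-build route, the configure-time recipe is usually something along these lines (the prefix is just an example; --disable-dlopen builds the MCA components into the main libraries, so there is nothing left to dlopen at run time):

  ./configure --prefix=/opt/openmpi --enable-static --disable-shared --disable-dlopen
  make all install

The lt_dladvise route is the code posted earlier in this thread.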


On Mar 12, 2014, at 3:40 AM, Bibrak Qamar  wrote:

> Hi all,
> 
> I am writing a new device for MPJ Express that uses a native MPI library for 
> communication. It's based on JNI wrappers like the original mpiJava. The 
> device works fine with MPICH3 (and MVAPICH2.2). Here is the issue with 
> loading Open MPI 1.7.4 from MPJ Express.
> 
> I have generated the following error message from a simple JNI-to-MPI 
> application, for clarity and to make the error easy to reproduce. I 
> have attached the app for your consideration.
> 
> 
> [bibrak@localhost JNI_to_MPI]$ mpirun -np 2 java -cp . 
> -Djava.library.path=/home/bibrak/work/JNI_to_MPI/ simpleJNI_MPI
> [localhost.localdomain:29086] mca: base: component_find: unable to open 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap: 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap.so: 
> undefined symbol: opal_show_help (ignored)
> [localhost.localdomain:29085] mca: base: component_find: unable to open 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap: 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap.so: 
> undefined symbol: opal_show_help (ignored)
> [localhost.localdomain:29085] mca: base: component_find: unable to open 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix: 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix.so: 
> undefined symbol: opal_shmem_base_framework (ignored)
> [localhost.localdomain:29086] mca: base: component_find: unable to open 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix: 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix.so: 
> undefined symbol: opal_shmem_base_framework (ignored)
> [localhost.localdomain:29086] mca: base: component_find: unable to open 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv: 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv.so: 
> undefined symbol: opal_show_help (ignored)
> --
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>   opal_shmem_base_select failed
>   --> Returned value -1 instead of OPAL_SUCCESS
> --
> [localhost.localdomain:29085] mca: base: component_find: unable to open 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv: 
> /home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv.so: 
> undefined symbol: opal_show_help (ignored)
> --
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> 
>   opal_init failed
>   --> Returned value Error (-1) instead of ORTE_SUCCESS
> --
> --
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> 
>   ompi_mpi_init: ompi_rte_init failed
>   --> Returned "Error" (-1) instead of "Success" (0)
> --
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***and potentially your MPI job)
> [localhost.localdomain:29086] Local abort before MPI_INIT completed 
> successfully; not able to aggregate error messages, and not able to guarantee 
> that all other processes were killed!
> -

[OMPI devel] Loading Open MPI from MPJ Express (Java) fails

2014-03-12 Thread Bibrak Qamar
Hi all,

I am writing a new device for MPJ Express that uses a native MPI library
for communication. It's based on JNI wrappers like the original mpiJava. The
device works fine with MPICH3 (and MVAPICH2.2). Here is the issue with
loading Open MPI 1.7.4 from MPJ Express.

I have generated the following error message from a simple JNI-to-MPI
application, for clarity and to make the error easy to reproduce. I
have attached the app for your consideration.


[bibrak@localhost JNI_to_MPI]$ mpirun -np 2 java -cp .
-Djava.library.path=/home/bibrak/work/JNI_to_MPI/ simpleJNI_MPI
[localhost.localdomain:29086] mca: base: component_find: unable to open
/home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap:
/home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap.so:
undefined symbol: opal_show_help (ignored)
[localhost.localdomain:29085] mca: base: component_find: unable to open
/home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap:
/home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_mmap.so:
undefined symbol: opal_show_help (ignored)
[localhost.localdomain:29085] mca: base: component_find: unable to open
/home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix:
/home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix.so:
undefined symbol: opal_shmem_base_framework (ignored)
[localhost.localdomain:29086] mca: base: component_find: unable to open
/home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix:
/home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_posix.so:
undefined symbol: opal_shmem_base_framework (ignored)
[localhost.localdomain:29086] mca: base: component_find: unable to open
/home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv:
/home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv.so:
undefined symbol: opal_show_help (ignored)
--
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--
[localhost.localdomain:29085] mca: base: component_find: unable to open
/home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv:
/home/bibrak/work/installs/OpenMPI_installed/lib/openmpi/mca_shmem_sysv.so:
undefined symbol: opal_show_help (ignored)
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: ompi_rte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[localhost.localdomain:29086] Local abort before MPI_INIT completed
successfully; not able to aggregate error messages, and not able to
guarantee that all other processes were killed!
--
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--
-

Re: [OMPI devel] [Score-P support] Compile errors of Fedora rawhide

2014-03-12 Thread Jeff Squyres (jsquyres)
On Mar 11, 2014, at 11:25 PM, Orion Poplawski  wrote:

>> Did you find any others, perchance?
> 
> Also:
> MPI_Type_hindexed
> MPI_Type_struct
> 
> But these were also deprecated in MPI-2.0, so probably gone in MPI-3.

Correct -- to confirm: MPI_Type_hindexed and MPI_Type_struct were deprecated in 
MPI-2.0 (1997), and finally removed in MPI-3 (2012).

> That's it as far as score-p is concerned.  Note that dropping functions
> has serious ABI/soname implications.  

Keep in mind that MPI-3 deleted these interfaces, but they had already been 
deprecated for 2012 - 1997 = 15 years.

MPI-3 also made it clear that MPI implementations can keep providing these 
interfaces, but they must adhere to the prototypes that were published in prior 
versions of the MPI specification (i.e., no const).

At this point, I believe both Open MPI and MPICH will issue deprecation warnings 
(if your compiler supports them) when you use these functions.  Open MPI doesn't 
yet have any plans to actually remove the functions.  If/when we do remove 
them, we'll do it at the beginning of a new feature series.

> I'll probably have to hack up
> something for scorep to handle these, probably just be removing them.

It would be best to migrate to the new MPI_Type_create_* interfaces; the change 
should be quite trivial (the replacements take the same arguments, with the 
displacements passed as address-sized integers, i.e., MPI_Aint).  15 years is 
enough.  :-)
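As an illustration of the migration, here is a minimal sketch that builds a datatype for a hypothetical C struct with MPI_Type_create_struct, the MPI-2 replacement for the removed MPI_Type_struct (the struct layout and names are made up for the example):

#include <mpi.h>
#include <stddef.h>

struct record {
    int    id;
    double values[3];
};

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int          blocklens[2] = { 1, 3 };
    MPI_Aint     displs[2]    = { offsetof(struct record, id),
                                  offsetof(struct record, values) };
    MPI_Datatype types[2]     = { MPI_INT, MPI_DOUBLE };
    MPI_Datatype rectype;

    /* Old, removed in MPI-3:
     *   MPI_Type_struct(2, blocklens, displs, types, &rectype);
     * New, available since MPI-2: */
    MPI_Type_create_struct(2, blocklens, displs, types, &rectype);
    MPI_Type_commit(&rectype);

    MPI_Type_free(&rectype);
    MPI_Finalize();
    return 0;
}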

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/