Re: Re: Open MPI problem After F17 -> F18 Upgrade

2013-07-21 Thread Kevin H. Hobbs
On Sun Jul 21 11:47:51 UTC 2013 Susi Lehtola wrote:
> Are you sure you have the correct MPI runtime loaded? All sorts of
> strange things can happen if you try running an MPI binary with the
> wrong runtime libraries, e.g. mpich2 binary with openmpi.

Yes, I had only one MPI installed at the time, and I did ldd
every which way.

There are more details in Bugzilla:

  https://bugzilla.redhat.com/show_bug.cgi?id=986409

and on the Open MPI users list:

  http://www.open-mpi.org/community/lists/users/2013/07/22346.php

but basically, on my home workstation, Open MPI does not get along
with the hwloc library it is linked against in Fedora 18.



-- 
users mailing list
users@lists.fedoraproject.org
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
Have a question? Ask away: http://ask.fedoraproject.org


Re: Open MPI problem After F17 -> F18 Upgrade

2013-07-21 Thread Susi Lehtola
On Thu, 18 Jul 2013 16:53:47 -0400
"Kevin H. Hobbs" wrote:

> After I updated my home computer from Fedora 17 to 18 MPI programs
> stopped working.

(clip)
 
> I'm actually able to run the generated executable on another computer
> just fine.
> 
> I tried using gdb to step through util/nidmap.c from 148 but whatever
> goes wrong is too far away for me to find it.
> 
> Does anybody have any clue where I should look?
 
Are you sure you have the correct MPI runtime loaded? All sorts of
strange things can happen if you try running an MPI binary with the
wrong runtime libraries, e.g. mpich2 binary with openmpi.
-- 
Susi Lehtola
Fedora Project Contributor
jussileht...@fedoraproject.org


Open MPI problem After F17 -> F18 Upgrade

2013-07-18 Thread Kevin H. Hobbs
After I updated my home computer from Fedora 17 to 18, MPI programs
stopped working.

I have a very simple program :

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main( int argc, char * argv[] )
{

  int rank, size;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  printf("my rank is %i of %i\n", rank, size );

  MPI_Finalize();

  return EXIT_SUCCESS;
}

I can compile it with :

  mpicc -g -o mpi_simple mpi_simple.c

but when I run it with :

  mpirun -n 1 ./mpi_simple

I get :

[murron.hobbs-hancock:05668] [[55801,1],0] ORTE_ERROR_LOG: Error in file util/nidmap.c at line 148
[murron.hobbs-hancock:05668] [[55801,1],0] ORTE_ERROR_LOG: Error in file ess_env_module.c at line 174
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_util_nidmap_init failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS
--
--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "Error" (-1) instead of "Success" (0)
--
[murron.hobbs-hancock:05668] [[55801,1],0] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 128
[murron.hobbs-hancock:5668] *** An error occurred in MPI_Init
[murron.hobbs-hancock:5668] *** on a NULL communicator
[murron.hobbs-hancock:5668] *** Unknown error
[murron.hobbs-hancock:5668] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
--
An MPI process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly.  You should
double check that everything has shut down cleanly.

  Reason: Before MPI_INIT completed
  Local host: murron.hobbs-hancock
  PID:5668
--
--
mpirun has exited due to process rank 0 with PID 5668 on
node murron.hobbs-hancock exiting improperly. There are two reasons this
could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--

I'm actually able to run the generated executable on another computer
just fine.

I tried using gdb to step through util/nidmap.c from line 148, but
whatever goes wrong is too far away for me to find it.

Does anybody have any clue where I should look?


