Re: [OMPI users] openmpi-1.7.4a1r29646 with -hostfile option under Torque manager

2013-11-24 Thread Ralph Castain
Fixed and scheduled to move to 1.7.4. Thanks again!


On Nov 17, 2013, at 6:11 PM, Ralph Castain  wrote:

> Thanks! That's precisely where I was going to look when I had time :-)
> 
> I'll update tomorrow.
> Ralph
> 
> On Sun, Nov 17, 2013 at 7:01 PM,  wrote:
> 
> 
> Hi Ralph,
> 
> This is the continuation of "Segmentation fault in oob_tcp.c of
> openmpi-1.7.4a1r29646".
> 
> I found the cause.
> 
> First, I noticed that your hostfile works and mine does not.
> 
> Your host file:
> cat hosts
> bend001 slots=12
> 
> My host file:
> cat hosts
> node08
> node08
> ...(total 8 lines)
> 
> I modified my script so that it appends "slots=1" to each line of my
> hostfile (e.g. with sed -e 's/$/ slots=1/') just before launching
> mpirun. Then it worked.
> 
> My host file (modified):
> cat hosts
> node08 slots=1
> node08 slots=1
> ...(total 8 lines)
> 
> Second, I confirmed that there's a slight difference between
> orte/util/hostfile/hostfile.c of 1.7.3 and that of 1.7.4a1r29646.
> 
> $ diff hostfile.c.org ../../../../openmpi-1.7.3/orte/util/hostfile/hostfile.c
> 394,401c394,399
> <         if (got_count) {
> <             node->slots_given = true;
> <         } else if (got_max) {
> <             node->slots = node->slots_max;
> <             node->slots_given = true;
> <         } else {
> <             /* should be set by obj_new, but just to be clear */
> <             node->slots_given = false;
> ---
> >         if (!got_count) {
> >             if (got_max) {
> >                 node->slots = node->slots_max;
> >             } else {
> >                 ++node->slots;
> >             }
> 
> 
> Finally, as a tentative trial, I added line 402 below. Then it worked.
> 
> cat -n orte/util/hostfile/hostfile.c:
>     ...
>     394          if (got_count) {
>     395              node->slots_given = true;
>     396          } else if (got_max) {
>     397              node->slots = node->slots_max;
>     398              node->slots_given = true;
>     399          } else {
>     400              /* should be set by obj_new, but just to be clear */
>     401              node->slots_given = false;
>     402              ++node->slots; /* added by tmishima */
>     403          }
>     ...
> 
> Please fix the problem properly, because my change is based only on a
> rough guess. The problem is related to how a hostfile is treated when
> no slots information is given.
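> 
> To make the intended semantics concrete, here is a minimal,
> self-contained sketch (hypothetical code, not the actual orte
> parser): each bare occurrence of a hostname contributes one slot,
> while an explicit "slots=N" pins the count.
> 
> #include <stdbool.h>
> #include <stdio.h>
> #include <string.h>
> 
> typedef struct {
>     char name[64];
>     int  slots;
>     bool slots_given;
> } node_t;
> 
> /* Apply one hostfile line for "host" to its node entry. */
> static void apply_line(node_t *node, const char *host,
>                        int count, bool got_count)
> {
>     strncpy(node->name, host, sizeof(node->name) - 1);
>     if (got_count) {
>         node->slots = count;   /* explicit "slots=N" */
>         node->slots_given = true;
>     } else {
>         node->slots_given = false;
>         ++node->slots;         /* bare hostname: one more slot */
>     }
> }
> 
> int main(void)
> {
>     node_t node = { "", 0, false };
>     /* "node08" listed 8 times with no slots= clause */
>     for (int i = 0; i < 8; ++i) {
>         apply_line(&node, "node08", 0, false);
>     }
>     printf("%s slots=%d\n", node.name, node.slots);  /* node08 slots=8 */
>     return 0;
> }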
> 
> Regards,
> Tetsuya Mishima
> 



Re: [OMPI users] MPI process hangs if OpenMPI is compiled with --enable-thread-multiple

2013-11-24 Thread Dominique Orban
Pierre,

Thank you for pointing out the erroneous flags. I am indeed compiling from
Homebrew. After using the flags mentioned in the link you gave, this is the
output of Ralph's test program:

$ mpirun -n 2 ./testmpi2
Calling MPI_Init_thread...
Calling MPI_Init_thread...
MPI_Init_thread returned, provided = 3
MPI_Init_thread returned, provided = 3
[warn] select: Bad file descriptor
[warn] select: Bad file descriptor

It doesn't hang anymore but I'm not sure what to make of the warnings. Some 
runs don't trigger the warnings. Please pardon my MPI ignorance.

My question originates from a hang in the PETSc tests similar to the one I
described in my first message. They still hang after I corrected the Open MPI
compile flags. I'm in touch with the PETSc folks about this as well.

Dominique


On 2013-11-23, at 9:22 PM, Pierre Jolivet  wrote:

> Dominique,
> It looks like you are compiling Open MPI with Homebrew. The flags they use in
> the formula when --enable-mpi-thread-multiple is requested are wrong.
> Cf. a similar problem with MacPorts:
> https://lists.macosforge.org/pipermail/macports-tickets/2013-June/138145.html
> 
> Pierre
> 
> On Nov 23, 2013, at 4:56 PM, Ralph Castain  wrote:
> 
>> Hmmm...well, it seems to work for me:
>> 
>> $ mpirun -n 4 ./thread_init
>> Calling MPI_Init_thread...
>> Calling MPI_Init_thread...
>> Calling MPI_Init_thread...
>> Calling MPI_Init_thread...
>> MPI_Init_thread returned, provided = 3
>> MPI_Init_thread returned, provided = 3
>> MPI_Init_thread returned, provided = 3
>> MPI_Init_thread returned, provided = 3
>> $
>> 
>> This is with the current 1.7 code branch, so it's possible something has 
>> been updated. You might try it with the next nightly tarball and see if it 
>> helps.
>> 
>> BTW: The correct configure option is --enable-mpi-thread-multiple
>> 
>> My test program:
>> 
>> #include <stdio.h>
>> #include <mpi.h>
>> 
>> int main(int argc, const char* argv[]) {
>>     int provided = -1;
>>     printf("Calling MPI_Init_thread...\n");
>>     MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &provided);
>>     printf("MPI_Init_thread returned, provided = %d\n", provided);
>>     MPI_Finalize();
>>     return 0;
>> }
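>> 
>> For reference, in Open MPI's headers the levels MPI_THREAD_SINGLE
>> through MPI_THREAD_MULTIPLE are the integers 0 through 3, so
>> "provided = 3" above means MPI_THREAD_MULTIPLE was granted. A more
>> defensive variant of the same test (a sketch, not part of the
>> original program) compares against the symbolic constant and fails
>> loudly otherwise:
>> 
>> #include <stdio.h>
>> #include <mpi.h>
>> 
>> int main(int argc, char* argv[]) {
>>     int provided = -1;
>>     MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>>     if (provided < MPI_THREAD_MULTIPLE) {
>>         /* the library granted a lower level than requested */
>>         fprintf(stderr, "MPI_THREAD_MULTIPLE unavailable (got %d)\n",
>>                 provided);
>>         MPI_Abort(MPI_COMM_WORLD, 1);
>>     }
>>     printf("got MPI_THREAD_MULTIPLE (provided = %d)\n", provided);
>>     MPI_Finalize();
>>     return 0;
>> }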
>> 
>> 
>> On Nov 21, 2013, at 1:36 PM, Dominique Orban  
>> wrote:
>> 
>>> Hi,
>>> 
>>> I'm compiling the example code at the bottom of the following page, which
>>> illustrates MPI_Init_thread():
>>> 
>>> http://mpi.deino.net/mpi_functions/mpi_init_thread.html
>>> 
>>> I have Open MPI 1.7.3 installed on OS X 10.8.5, configured with
>>> --enable-thread-multiple and compiled with clang-425.0.28. I can reproduce
>>> the following on OS X 10.9 (clang-500), and another user was able to
>>> reproduce it on some flavor of Linux:
>>> 
>>> $ mpicc -g -o testmpi testmpi.c -lmpi
>>> $ mpirun -n 2 ./testmpi
>>> $ # hangs forever
>>> 
>>> I have no experience debugging MPI programs, but it was suggested that I
>>> do this:
>>> 
>>> $ mpirun -n 2 xterm -e gdb ./testmpi
>>> 
>>> In the first xterm, I type 'run' in gdb, interrupt the program after a
>>> while, and get a backtrace:
>>> 
>>> ^C
>>> Program received signal SIGINT, Interrupt.
>>> 0x7fff99116322 in select$DARWIN_EXTSN ()
>>>from /usr/lib/system/libsystem_kernel.dylib
>>> (gdb) where
>>> #0  0x7fff99116322 in select$DARWIN_EXTSN ()
>>>from /usr/lib/system/libsystem_kernel.dylib
>>> #1  0x0001001963c2 in select_dispatch ()
>>>from /usr/local/Cellar/open-mpi/1.7.3/lib/libopen-pal.6.dylib
>>> #2  0x00010018f178 in opal_libevent2021_event_base_loop ()
>>>from /usr/local/Cellar/open-mpi/1.7.3/lib/libopen-pal.6.dylib
>>> #3  0x00010015f059 in opal_progress ()
>>>from /usr/local/Cellar/open-mpi/1.7.3/lib/libopen-pal.6.dylib
>>> #4  0x000100019321 in ompi_mpi_init () from 
>>> /usr/local/lib/libmpi.1.dylib
>>> #5  0x0001000334da in MPI_Init_thread () from 
>>> /usr/local/lib/libmpi.1.dylib
>>> #6  0x00010ddb in main (argc=1, argv=0x7fff5fbfedc0) at 
>>> testmpi.c:9
>>> (gdb)
>>> 
>>> In the second xterm window:
>>> 
>>> ^C
>>> Program received signal SIGINT, Interrupt.
>>> 0x0001002e9a28 in mca_common_sm_init ()
>>>from /usr

Re: [OMPI users] MPI process hangs if OpenMPI is compiled with --enable-thread-multiple

2013-11-24 Thread Jed Brown
Pierre Jolivet  writes:
> It looks like you are compiling Open MPI with Homebrew. The flags they use in
> the formula when --enable-mpi-thread-multiple is requested are wrong.
> Cf. a similar problem with MacPorts:
> https://lists.macosforge.org/pipermail/macports-tickets/2013-June/138145.html

If these "wrong" configure flags cause deadlock, wouldn't you consider
it to be an Open MPI bug?  In decreasing order of preference, I would
say

1. simple configure flags work to enable feature

2. configure errors due to inconsistent flags

3. configure succeeds, but feature is not actually enabled (so no
   deadlock, though this is arguably already a bug)




Re: [OMPI users] MPI process hangs if OpenMPI is compiled with --enable-thread-multiple

2013-11-24 Thread Jed Brown
Dominique Orban  writes:
> My question originates from a hang in the PETSc tests similar to the
> one I described in my first message. They still hang after I corrected
> the Open MPI compile flags. I'm in touch with the PETSc folks about
> this as well.

Do you have an updated stack trace?




Re: [OMPI users] MPI process hangs if OpenMPI is compiled with --enable-thread-multiple

2013-11-24 Thread Ralph Castain
Given that we have no idea what Homebrew uses, I don't know how we could 
clarify/respond.


On Nov 24, 2013, at 12:43 PM, Jed Brown  wrote:

> Pierre Jolivet  writes:
>> It looks like you are compiling Open MPI with Homebrew. The flags they use
>> in the formula when --enable-mpi-thread-multiple is requested are wrong.
>> Cf. a similar problem with MacPorts:
>> https://lists.macosforge.org/pipermail/macports-tickets/2013-June/138145.html
> 
> If these "wrong" configure flags cause deadlock, wouldn't you consider
> it to be an Open MPI bug?  In decreasing order of preference, I would
> say
> 
> 1. simple configure flags work to enable feature
> 
> 2. configure errors due to inconsistent flags
> 
> 3. configure succeeds, but feature is not actually enabled (so no
>   deadlock, though this is arguably already a bug)



Re: [OMPI users] MPI process hangs if OpenMPI is compiled with --enable-thread-multiple

2013-11-24 Thread Jed Brown
Ralph Castain  writes:

> Given that we have no idea what Homebrew uses, I don't know how we
> could clarify/respond.

Pierre provided a link to MacPorts saying that all of the following
options were needed to properly enable threads.

  --enable-event-thread-support --enable-opal-multi-threads
  --enable-orte-progress-threads --enable-mpi-thread-multiple

If that is indeed the case, and if passing some subset of these options
results in deadlock, it's not exactly user-friendly.

Maybe --enable-mpi-thread-multiple is enough, in which case MacPorts is
doing something needlessly complicated and Pierre's link was a red
herring?
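
Empirically, one can also ask the installed library what it actually
grants, independent of which flags it was built with. A small sketch
using only standard MPI calls (my code, not from this thread):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int provided, level, is_main;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Query_thread(&level);      /* current thread support level */
    MPI_Is_thread_main(&is_main);  /* is this the thread that called Init? */
    printf("provided=%d query=%d main=%d\n", provided, level, is_main);
    MPI_Finalize();
    return 0;
}

If a build configured with only --enable-mpi-thread-multiple reports
provided = 3 here, the extra options would indeed look redundant.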

