[hwloc-devel] Create success (hwloc git dev-65-gbb80f0f)

2014-01-30 Thread MPI Team
Creating nightly hwloc snapshot git tarball was a success.

Snapshot:   hwloc dev-65-gbb80f0f
Start time: Thu Jan 30 21:01:01 EST 2014
End time:   Thu Jan 30 21:03:31 EST 2014

Your friendly daemon,
Cyrador


Re: [OMPI devel] Intermittent mpirun crash?

2014-01-30 Thread Ralph Castain
Sure! cmr it across to v1.7.4 and we'll add it before we release

Thanks!
Ralph

On Jan 30, 2014, at 11:53 AM, Rolf vandeVaart  wrote:

> I ran mpirun through valgrind and I got some strange complaints about an 
> issue with thread 2.  I hunted around mpirun code and I see that we start a 
> thread, but we never have it finish during shutdown.  Therefore, I added this 
> snippet of code (probably in the wrong place) and I no longer see my 
> intermittent crashes.
> 
> Ralph, what do you think?  Does this seem reasonable?
> 
> Rolf
> 
> [rvandevaart@drossetti-ivy0 ompi-v1.7]$ svn diff
> Index: orte/mca/oob/tcp/oob_tcp_component.c
> ===
> --- orte/mca/oob/tcp/oob_tcp_component.c  (revision 30500)
> +++ orte/mca/oob/tcp/oob_tcp_component.c  (working copy)
> @@ -631,6 +631,10 @@
> opal_output_verbose(2, orte_oob_base_framework.framework_output,
> "%s TCP SHUTDOWN",
> ORTE_NAME_PRINT(ORTE_PROC_MY_NAME));
> +if (ORTE_PROC_IS_HNP) {
> +mca_oob_tcp_component.listen_thread_active = 0;
> +opal_thread_join(&mca_oob_tcp_component.listen_thread, NULL);
> +}
> 
> while (NULL != (item = 
> opal_list_remove_first(&mca_oob_tcp_component.listeners))) {
> OBJ_RELEASE(item);
> 
> 
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph
>> Castain
>> Sent: Thursday, January 30, 2014 12:35 PM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] Intermittent mpirun crash?
>> 
>> That option might explain why your test process is failing (which segfaulted 
>> as
>> well), but obviously wouldn't have anything to do with mpirun
>> 
>> On Jan 30, 2014, at 9:29 AM, Rolf vandeVaart 
>> wrote:
>> 
>>> I just retested with --mca mpi_leave_pinned 0 and that made no difference.
>> I still see the mpirun crash.
>>> 
 -Original Message-
 From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George
 Bosilca
 Sent: Thursday, January 30, 2014 11:59 AM
 To: Open MPI Developers
 Subject: Re: [OMPI devel] Intermittent mpirun crash?
 
 I got something similar 2 days ago, with a large software package
 abusing MPI_Waitany/MPI_Waitsome (that was working seamlessly a
 month ago). I had to find a quick fix. Upon figuring out that turning
 the leave_pinned off fixes the problem, I did not investigate any further.
 
 Do you see a similar behavior?
 
 George.
 
 On Jan 30, 2014, at 17:26 , Rolf vandeVaart 
>> wrote:
 
> I am seeing this happening to me very intermittently.  Looks like
> mpirun is
 getting a SEGV.  Is anyone else seeing this?
> This is 1.7.4 built yesterday.  (Note that I added some stuff to
> what is being printed out so the message is slightly different than
> 1.7.4
> output)
> 
> mpirun - -np 6 -host
> drossetti-ivy0,drossetti-ivy1,drossetti-ivy2,drossetti-ivy3 --mca
> btl_openib_warn_default_gid_prefix 0  --  `pwd`/src/MPI_Waitsome_p_c
> MPITEST info  (0): Starting:  MPI_Waitsome_p:  Persistent Waitsome
> using two nodes
> MPITEST_results: MPI_Waitsome_p:  Persistent Waitsome using two
> nodes all tests PASSED (742) [drossetti-ivy0:10353] *** Process
> (mpirun)received signal *** [drossetti-ivy0:10353] Signal:
> Segmentation fault (11) [drossetti-ivy0:10353] Signal code: Address
> not mapped (1) [drossetti-ivy0:10353] Failing at address:
> 0x7fd31e5f208d [drossetti-ivy0:10353] End of signal information -
> not sleeping
> gmake[1]: *** [MPI_Waitsome_p_c] Segmentation fault (core dumped)
> gmake[1]: Leaving directory `/geppetto/home/rvandevaart/public/ompi-
 tests/trunk/intel_tests'
> 
> (gdb) where
> #0  0x7fd31f620807 in ?? () from /lib64/libgcc_s.so.1
> #1  0x7fd31f6210b9 in _Unwind_Backtrace () from
> /lib64/libgcc_s.so.1
> #2  0x7fd31fb2893e in backtrace () from /lib64/libc.so.6
> #3  0x7fd320b0d622 in opal_backtrace_buffer
 (message_out=0x7fd31e5e33a0, len_out=0x7fd31e5e33ac)
>  at
> ../../../../../opal/mca/backtrace/execinfo/backtrace_execinfo.c:57
> #4  0x7fd320b0a794 in show_stackframe (signo=11,
> info=0x7fd31e5e3930, p=0x7fd31e5e3800) at
> ../../../opal/util/stacktrace.c:354
> #5  <signal handler called>
> #6  0x7fd31e5f208d in ?? ()
> #7  0x7fd31e5e46d8 in ?? ()
> #8  0xc2a8 in ?? ()
> #9  0x in ?? ()
> 
> 
> 
> --
> - This email message is for the sole use of the intended
> recipient(s) and may contain confidential information.  Any
> unauthorized review, use, disclosure or distribution is prohibited.
> If you are not the intended recipient, please contact the sender by
> reply email and destroy all copies of the original message.

Re: [OMPI devel] [EXTERNAL] SPARC V8+ question

2014-01-30 Thread Barrett, Brian W
Following up on the mailing list, Paul and I think this is gcc being silly; it 
didn't pass the right architecture flag to the assembler, which barfed at the 
Sparc V9 instruction (compare and swap).  So the test worked as it should and 
we'll figure out the gcc thing as we go.

I've filed a change for v1.7 to fix the warning message.  The reference to 
FFLAGS wasn't the only problem, so it's a slightly more generic error message 
now.

Brian

On 1/29/14 4:10 PM, "Paul Hargrove" 
> wrote:

I know Open MPI dropped support for the SPARC V8 ABI some time ago.
So, I configured with CC="gcc -mv8plus", but I still get:

checking if have Sparc v8+/v9 support... no
configure: WARNING: Sparc v8 target is not supported in this release of Open 
MPI.
configure: WARNING: You must specify the target architecture v8plus
configure: WARNING: (cc: -xarch=v8plus, gcc: -mcpu=v9) for CFLAGS, CXXFLAGS,
configure: WARNING: FFLAGS, and FCFLAGS to compile Open MPI in 32 bit mode on
configure: WARNING: Sparc processors
configure: error: Can not continue.

So, I am wondering if there is something flawed in the "have Sparc v8+/v9 
support" test, or if gcc's "-mv8plus" is flawed.
Of course, I will follow the advice in the warning and use -mcpu=v9 instead of 
-mv8plus, but I don't see why the latter didn't work.  Any ideas what is going on?

And since this was found in 1.7.4rc2:
WOULD SOMEBODY PLEASE REMOVE "FFLAGS" FROM THAT MESSAGE!!

-Paul

--
Paul H. Hargrove  
phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


--
  Brian W. Barrett
  Scalable System Software Group
  Sandia National Laboratories


Re: [OMPI devel] Intermittent mpirun crash?

2014-01-30 Thread Rolf vandeVaart
I ran mpirun through valgrind and I got some strange complaints about an issue 
with thread 2.  I hunted around mpirun code and I see that we start a thread, 
but we never have it finish during shutdown.  Therefore, I added this snippet 
of code (probably in the wrong place) and I no longer see my intermittent 
crashes.

Ralph, what do you think?  Does this seem reasonable?

Rolf

[rvandevaart@drossetti-ivy0 ompi-v1.7]$ svn diff
Index: orte/mca/oob/tcp/oob_tcp_component.c
===
--- orte/mca/oob/tcp/oob_tcp_component.c(revision 30500)
+++ orte/mca/oob/tcp/oob_tcp_component.c(working copy)
@@ -631,6 +631,10 @@
 opal_output_verbose(2, orte_oob_base_framework.framework_output,
 "%s TCP SHUTDOWN",
 ORTE_NAME_PRINT(ORTE_PROC_MY_NAME));
+if (ORTE_PROC_IS_HNP) {
+mca_oob_tcp_component.listen_thread_active = 0;
+opal_thread_join(&mca_oob_tcp_component.listen_thread, NULL);
+}
 
 while (NULL != (item = 
opal_list_remove_first(&mca_oob_tcp_component.listeners))) {
 OBJ_RELEASE(item);


>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph
>Castain
>Sent: Thursday, January 30, 2014 12:35 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] Intermittent mpirun crash?
>
>That option might explain why your test process is failing (which segfaulted as
>well), but obviously wouldn't have anything to do with mpirun
>
>On Jan 30, 2014, at 9:29 AM, Rolf vandeVaart 
>wrote:
>
>> I just retested with --mca mpi_leave_pinned 0 and that made no difference.
>I still see the mpirun crash.
>>
>>> -Original Message-
>>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George
>>> Bosilca
>>> Sent: Thursday, January 30, 2014 11:59 AM
>>> To: Open MPI Developers
>>> Subject: Re: [OMPI devel] Intermittent mpirun crash?
>>>
>>> I got something similar 2 days ago, with a large software package
>>> abusing MPI_Waitany/MPI_Waitsome (that was working seamlessly a
>>> month ago). I had to find a quick fix. Upon figuring out that turning
>>> the leave_pinned off fixes the problem, I did not investigate any further.
>>>
>>> Do you see a similar behavior?
>>>
>>> George.
>>>
>>> On Jan 30, 2014, at 17:26 , Rolf vandeVaart 
>wrote:
>>>
 I am seeing this happening to me very intermittently.  Looks like
 mpirun is
>>> getting a SEGV.  Is anyone else seeing this?
 This is 1.7.4 built yesterday.  (Note that I added some stuff to
 what is being printed out so the message is slightly different than
 1.7.4
 output)

 mpirun - -np 6 -host
 drossetti-ivy0,drossetti-ivy1,drossetti-ivy2,drossetti-ivy3 --mca
 btl_openib_warn_default_gid_prefix 0  --  `pwd`/src/MPI_Waitsome_p_c
 MPITEST info  (0): Starting:  MPI_Waitsome_p:  Persistent Waitsome
 using two nodes
 MPITEST_results: MPI_Waitsome_p:  Persistent Waitsome using two
 nodes all tests PASSED (742) [drossetti-ivy0:10353] *** Process
 (mpirun)received signal *** [drossetti-ivy0:10353] Signal:
 Segmentation fault (11) [drossetti-ivy0:10353] Signal code: Address
 not mapped (1) [drossetti-ivy0:10353] Failing at address:
 0x7fd31e5f208d [drossetti-ivy0:10353] End of signal information -
 not sleeping
 gmake[1]: *** [MPI_Waitsome_p_c] Segmentation fault (core dumped)
 gmake[1]: Leaving directory `/geppetto/home/rvandevaart/public/ompi-
>>> tests/trunk/intel_tests'

 (gdb) where
 #0  0x7fd31f620807 in ?? () from /lib64/libgcc_s.so.1
 #1  0x7fd31f6210b9 in _Unwind_Backtrace () from
 /lib64/libgcc_s.so.1
 #2  0x7fd31fb2893e in backtrace () from /lib64/libc.so.6
 #3  0x7fd320b0d622 in opal_backtrace_buffer
>>> (message_out=0x7fd31e5e33a0, len_out=0x7fd31e5e33ac)
   at
 ../../../../../opal/mca/backtrace/execinfo/backtrace_execinfo.c:57
 #4  0x7fd320b0a794 in show_stackframe (signo=11,
 info=0x7fd31e5e3930, p=0x7fd31e5e3800) at
 ../../../opal/util/stacktrace.c:354
 #5  <signal handler called>
 #6  0x7fd31e5f208d in ?? ()
 #7  0x7fd31e5e46d8 in ?? ()
 #8  0xc2a8 in ?? ()
 #9  0x in ?? ()


 
 --
 - This email message is for the sole use of the intended
 recipient(s) and may contain confidential information.  Any
 unauthorized review, use, disclosure or distribution is prohibited.
 If you are not the intended recipient, please contact the sender by
 reply email and destroy all copies of the original message.
 
 --
 - ___
 devel mailing list
 de...@open-mpi.org
 

Re: [OMPI devel] Intermittent mpirun crash?

2014-01-30 Thread Ralph Castain
That option might explain why your test process is failing (which segfaulted as 
well), but obviously wouldn't have anything to do with mpirun

On Jan 30, 2014, at 9:29 AM, Rolf vandeVaart  wrote:

> I just retested with --mca mpi_leave_pinned 0 and that made no difference.  I 
> still see the mpirun crash.
> 
>> -Original Message-
>> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George
>> Bosilca
>> Sent: Thursday, January 30, 2014 11:59 AM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] Intermittent mpirun crash?
>> 
>> I got something similar 2 days ago, with a large software package abusing
>> MPI_Waitany/MPI_Waitsome (that was working seamlessly a month ago). I
>> had to find a quick fix. Upon figuring out that turning the leave_pinned off
>> fixes the problem, I did not investigate any further.
>> 
>> Do you see a similar behavior?
>> 
>> George.
>> 
>> On Jan 30, 2014, at 17:26 , Rolf vandeVaart  wrote:
>> 
>>> I am seeing this happening to me very intermittently.  Looks like mpirun is
>> getting a SEGV.  Is anyone else seeing this?
>>> This is 1.7.4 built yesterday.  (Note that I added some stuff to what
>>> is being printed out so the message is slightly different than 1.7.4
>>> output)
>>> 
>>> mpirun - -np 6 -host
>>> drossetti-ivy0,drossetti-ivy1,drossetti-ivy2,drossetti-ivy3 --mca
>>> btl_openib_warn_default_gid_prefix 0  --  `pwd`/src/MPI_Waitsome_p_c
>>> MPITEST info  (0): Starting:  MPI_Waitsome_p:  Persistent Waitsome
>>> using two nodes
>>> MPITEST_results: MPI_Waitsome_p:  Persistent Waitsome using two nodes
>>> all tests PASSED (742) [drossetti-ivy0:10353] *** Process
>>> (mpirun)received signal *** [drossetti-ivy0:10353] Signal:
>>> Segmentation fault (11) [drossetti-ivy0:10353] Signal code: Address
>>> not mapped (1) [drossetti-ivy0:10353] Failing at address:
>>> 0x7fd31e5f208d [drossetti-ivy0:10353] End of signal information - not
>>> sleeping
>>> gmake[1]: *** [MPI_Waitsome_p_c] Segmentation fault (core dumped)
>>> gmake[1]: Leaving directory `/geppetto/home/rvandevaart/public/ompi-
>> tests/trunk/intel_tests'
>>> 
>>> (gdb) where
>>> #0  0x7fd31f620807 in ?? () from /lib64/libgcc_s.so.1
>>> #1  0x7fd31f6210b9 in _Unwind_Backtrace () from
>>> /lib64/libgcc_s.so.1
>>> #2  0x7fd31fb2893e in backtrace () from /lib64/libc.so.6
>>> #3  0x7fd320b0d622 in opal_backtrace_buffer
>> (message_out=0x7fd31e5e33a0, len_out=0x7fd31e5e33ac)
>>>   at
>>> ../../../../../opal/mca/backtrace/execinfo/backtrace_execinfo.c:57
>>> #4  0x7fd320b0a794 in show_stackframe (signo=11,
>>> info=0x7fd31e5e3930, p=0x7fd31e5e3800) at
>>> ../../../opal/util/stacktrace.c:354
>>> #5  <signal handler called>
>>> #6  0x7fd31e5f208d in ?? ()
>>> #7  0x7fd31e5e46d8 in ?? ()
>>> #8  0xc2a8 in ?? ()
>>> #9  0x in ?? ()
>>> 
>>> 
>>> --
>>> - This email message is for the sole use of the intended
>>> recipient(s) and may contain confidential information.  Any
>>> unauthorized review, use, disclosure or distribution is prohibited.
>>> If you are not the intended recipient, please contact the sender by
>>> reply email and destroy all copies of the original message.
>>> --
>>> - ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] Intermittent mpirun crash?

2014-01-30 Thread Rolf vandeVaart
I just retested with --mca mpi_leave_pinned 0 and that made no difference.  I 
still see the mpirun crash.

>-Original Message-
>From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of George
>Bosilca
>Sent: Thursday, January 30, 2014 11:59 AM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] Intermittent mpirun crash?
>
>I got something similar 2 days ago, with a large software package abusing
>MPI_Waitany/MPI_Waitsome (that was working seamlessly a month ago). I
>had to find a quick fix. Upon figuring out that turning the leave_pinned off
>fixes the problem, I did not investigate any further.
>
>Do you see a similar behavior?
>
>  George.
>
>On Jan 30, 2014, at 17:26 , Rolf vandeVaart  wrote:
>
>> I am seeing this happening to me very intermittently.  Looks like mpirun is
>getting a SEGV.  Is anyone else seeing this?
>> This is 1.7.4 built yesterday.  (Note that I added some stuff to what
>> is being printed out so the message is slightly different than 1.7.4
>> output)
>>
>> mpirun - -np 6 -host
>> drossetti-ivy0,drossetti-ivy1,drossetti-ivy2,drossetti-ivy3 --mca
>> btl_openib_warn_default_gid_prefix 0  --  `pwd`/src/MPI_Waitsome_p_c
>> MPITEST info  (0): Starting:  MPI_Waitsome_p:  Persistent Waitsome
>> using two nodes
>> MPITEST_results: MPI_Waitsome_p:  Persistent Waitsome using two nodes
>> all tests PASSED (742) [drossetti-ivy0:10353] *** Process
>> (mpirun)received signal *** [drossetti-ivy0:10353] Signal:
>> Segmentation fault (11) [drossetti-ivy0:10353] Signal code: Address
>> not mapped (1) [drossetti-ivy0:10353] Failing at address:
>> 0x7fd31e5f208d [drossetti-ivy0:10353] End of signal information - not
>> sleeping
>> gmake[1]: *** [MPI_Waitsome_p_c] Segmentation fault (core dumped)
>> gmake[1]: Leaving directory `/geppetto/home/rvandevaart/public/ompi-
>tests/trunk/intel_tests'
>>
>> (gdb) where
>> #0  0x7fd31f620807 in ?? () from /lib64/libgcc_s.so.1
>> #1  0x7fd31f6210b9 in _Unwind_Backtrace () from
>> /lib64/libgcc_s.so.1
>> #2  0x7fd31fb2893e in backtrace () from /lib64/libc.so.6
>> #3  0x7fd320b0d622 in opal_backtrace_buffer
>(message_out=0x7fd31e5e33a0, len_out=0x7fd31e5e33ac)
>>at
>> ../../../../../opal/mca/backtrace/execinfo/backtrace_execinfo.c:57
>> #4  0x7fd320b0a794 in show_stackframe (signo=11,
>> info=0x7fd31e5e3930, p=0x7fd31e5e3800) at
>> ../../../opal/util/stacktrace.c:354
>> #5  <signal handler called>
>> #6  0x7fd31e5f208d in ?? ()
>> #7  0x7fd31e5e46d8 in ?? ()
>> #8  0xc2a8 in ?? ()
>> #9  0x in ?? ()
>>
>>
>> --
>> - This email message is for the sole use of the intended
>> recipient(s) and may contain confidential information.  Any
>> unauthorized review, use, disclosure or distribution is prohibited.
>> If you are not the intended recipient, please contact the sender by
>> reply email and destroy all copies of the original message.
>> --
>> - ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>___
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel


Re: [OMPI devel] Intermittent mpirun crash?

2014-01-30 Thread George Bosilca
I got something similar 2 days ago, with a large software package abusing 
MPI_Waitany/MPI_Waitsome (that was working seamlessly a month ago). I had to 
find a quick fix. Upon figuring out that turning leave_pinned off fixes the 
problem, I did not investigate any further.

Do you see a similar behavior?

  George.

On Jan 30, 2014, at 17:26 , Rolf vandeVaart  wrote:

> I am seeing this happening to me very intermittently.  Looks like mpirun is 
> getting a SEGV.  Is anyone else seeing this?
> This is 1.7.4 built yesterday.  (Note that I added some stuff to what is 
> being printed out so the message is slightly different than 1.7.4 output)
> 
> mpirun - -np 6 -host 
> drossetti-ivy0,drossetti-ivy1,drossetti-ivy2,drossetti-ivy3 --mca 
> btl_openib_warn_default_gid_prefix 0  --  `pwd`/src/MPI_Waitsome_p_c
> MPITEST info  (0): Starting:  MPI_Waitsome_p:  Persistent Waitsome using two 
> nodes
> MPITEST_results: MPI_Waitsome_p:  Persistent Waitsome using two nodes all 
> tests PASSED (742)
> [drossetti-ivy0:10353] *** Process (mpirun)received signal ***
> [drossetti-ivy0:10353] Signal: Segmentation fault (11)
> [drossetti-ivy0:10353] Signal code: Address not mapped (1)
> [drossetti-ivy0:10353] Failing at address: 0x7fd31e5f208d
> [drossetti-ivy0:10353] End of signal information - not sleeping
> gmake[1]: *** [MPI_Waitsome_p_c] Segmentation fault (core dumped)
> gmake[1]: Leaving directory 
> `/geppetto/home/rvandevaart/public/ompi-tests/trunk/intel_tests'
> 
> (gdb) where
> #0  0x7fd31f620807 in ?? () from /lib64/libgcc_s.so.1
> #1  0x7fd31f6210b9 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
> #2  0x7fd31fb2893e in backtrace () from /lib64/libc.so.6
> #3  0x7fd320b0d622 in opal_backtrace_buffer (message_out=0x7fd31e5e33a0, 
> len_out=0x7fd31e5e33ac)
>at ../../../../../opal/mca/backtrace/execinfo/backtrace_execinfo.c:57
> #4  0x7fd320b0a794 in show_stackframe (signo=11, info=0x7fd31e5e3930, 
> p=0x7fd31e5e3800) at ../../../opal/util/stacktrace.c:354
> #5  <signal handler called>
> #6  0x7fd31e5f208d in ?? ()
> #7  0x7fd31e5e46d8 in ?? ()
> #8  0xc2a8 in ?? ()
> #9  0x in ?? ()
> 
> 
> ---
> This email message is for the sole use of the intended recipient(s) and may 
> contain
> confidential information.  Any unauthorized review, use, disclosure or 
> distribution
> is prohibited.  If you are not the intended recipient, please contact the 
> sender by
> reply email and destroy all copies of the original message.
> ---
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] Intermittent mpirun crash?

2014-01-30 Thread Ralph Castain
Huh - not much info there, I'm afraid. I gather you didn't build this with 
--enable-debug?

On Jan 30, 2014, at 8:26 AM, Rolf vandeVaart  wrote:

> I am seeing this happening to me very intermittently.  Looks like mpirun is 
> getting a SEGV.  Is anyone else seeing this?
> This is 1.7.4 built yesterday.  (Note that I added some stuff to what is 
> being printed out so the message is slightly different than 1.7.4 output)
> 
> mpirun - -np 6 -host 
> drossetti-ivy0,drossetti-ivy1,drossetti-ivy2,drossetti-ivy3 --mca 
> btl_openib_warn_default_gid_prefix 0  --  `pwd`/src/MPI_Waitsome_p_c
> MPITEST info  (0): Starting:  MPI_Waitsome_p:  Persistent Waitsome using two 
> nodes
> MPITEST_results: MPI_Waitsome_p:  Persistent Waitsome using two nodes all 
> tests PASSED (742)
> [drossetti-ivy0:10353] *** Process (mpirun)received signal ***
> [drossetti-ivy0:10353] Signal: Segmentation fault (11)
> [drossetti-ivy0:10353] Signal code: Address not mapped (1)
> [drossetti-ivy0:10353] Failing at address: 0x7fd31e5f208d
> [drossetti-ivy0:10353] End of signal information - not sleeping
> gmake[1]: *** [MPI_Waitsome_p_c] Segmentation fault (core dumped)
> gmake[1]: Leaving directory 
> `/geppetto/home/rvandevaart/public/ompi-tests/trunk/intel_tests'
> 
> (gdb) where
> #0  0x7fd31f620807 in ?? () from /lib64/libgcc_s.so.1
> #1  0x7fd31f6210b9 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
> #2  0x7fd31fb2893e in backtrace () from /lib64/libc.so.6
> #3  0x7fd320b0d622 in opal_backtrace_buffer (message_out=0x7fd31e5e33a0, 
> len_out=0x7fd31e5e33ac)
>at ../../../../../opal/mca/backtrace/execinfo/backtrace_execinfo.c:57
> #4  0x7fd320b0a794 in show_stackframe (signo=11, info=0x7fd31e5e3930, 
> p=0x7fd31e5e3800) at ../../../opal/util/stacktrace.c:354
> #5  <signal handler called>
> #6  0x7fd31e5f208d in ?? ()
> #7  0x7fd31e5e46d8 in ?? ()
> #8  0xc2a8 in ?? ()
> #9  0x in ?? ()
> 
> 
> ---
> This email message is for the sole use of the intended recipient(s) and may 
> contain
> confidential information.  Any unauthorized review, use, disclosure or 
> distribution
> is prohibited.  If you are not the intended recipient, please contact the 
> sender by
> reply email and destroy all copies of the original message.
> ---
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel



[OMPI devel] Intermittent mpirun crash?

2014-01-30 Thread Rolf vandeVaart
I am seeing this happening to me very intermittently.  Looks like mpirun is 
getting a SEGV.  Is anyone else seeing this?
This is 1.7.4 built yesterday.  (Note that I added some stuff to what is being 
printed out so the message is slightly different than 1.7.4 output)

mpirun - -np 6 -host 
drossetti-ivy0,drossetti-ivy1,drossetti-ivy2,drossetti-ivy3 --mca 
btl_openib_warn_default_gid_prefix 0  --  `pwd`/src/MPI_Waitsome_p_c
MPITEST info  (0): Starting:  MPI_Waitsome_p:  Persistent Waitsome using two 
nodes
MPITEST_results: MPI_Waitsome_p:  Persistent Waitsome using two nodes all tests 
PASSED (742)
[drossetti-ivy0:10353] *** Process (mpirun)received signal ***
[drossetti-ivy0:10353] Signal: Segmentation fault (11)
[drossetti-ivy0:10353] Signal code: Address not mapped (1)
[drossetti-ivy0:10353] Failing at address: 0x7fd31e5f208d
[drossetti-ivy0:10353] End of signal information - not sleeping
gmake[1]: *** [MPI_Waitsome_p_c] Segmentation fault (core dumped)
gmake[1]: Leaving directory 
`/geppetto/home/rvandevaart/public/ompi-tests/trunk/intel_tests'

(gdb) where
#0  0x7fd31f620807 in ?? () from /lib64/libgcc_s.so.1
#1  0x7fd31f6210b9 in _Unwind_Backtrace () from /lib64/libgcc_s.so.1
#2  0x7fd31fb2893e in backtrace () from /lib64/libc.so.6
#3  0x7fd320b0d622 in opal_backtrace_buffer (message_out=0x7fd31e5e33a0, 
len_out=0x7fd31e5e33ac)
at ../../../../../opal/mca/backtrace/execinfo/backtrace_execinfo.c:57
#4  0x7fd320b0a794 in show_stackframe (signo=11, info=0x7fd31e5e3930, 
p=0x7fd31e5e3800) at ../../../opal/util/stacktrace.c:354
#5  <signal handler called>
#6  0x7fd31e5f208d in ?? ()
#7  0x7fd31e5e46d8 in ?? ()
#8  0xc2a8 in ?? ()
#9  0x in ?? ()


---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---


Re: [OMPI devel] Still getting 100% trunk failure on 32 bit platform: coll ml

2014-01-30 Thread Shamis, Pavel
Let me know if you need any help.

Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory






On Jan 30, 2014, at 10:27 AM, Nathan Hjelm 
> wrote:

Ok. Looks like I need to fix one more. Will take a look now.

-Nathan

On Thu, Jan 30, 2014 at 01:25:44PM +, Jeff Squyres (jsquyres) wrote:
MTT shows 100% trunk failure on 32 bit platform:

   http://mtt.open-mpi.org/index.php?do_redir=2144

It's seg faulting in mca_coll_ml_comm_query().

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



Re: [OMPI devel] Still getting 100% trunk failure on 32 bit platform: coll ml

2014-01-30 Thread Nathan Hjelm
Ok. Looks like I need to fix one more. Will take a look now.

-Nathan

On Thu, Jan 30, 2014 at 01:25:44PM +, Jeff Squyres (jsquyres) wrote:
> MTT shows 100% trunk failure on 32 bit platform:
> 
> http://mtt.open-mpi.org/index.php?do_redir=2144
> 
> It's seg faulting in mca_coll_ml_comm_query().
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] Still getting 100% trunk failure on 32 bit platform: coll ml

2014-01-30 Thread Jeff Squyres (jsquyres)
MTT shows 100% trunk failure on 32 bit platform:

http://mtt.open-mpi.org/index.php?do_redir=2144

It's seg faulting in mca_coll_ml_comm_query().

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/