Re: [OMPI devel] possible bugs and unexpected values in returned errors classes

2009-02-16 Thread Lisandro Dalcin
On Thu, Feb 12, 2009 at 10:02 PM, Jeff Squyres  wrote:
> On Feb 11, 2009, at 8:24 AM, Lisandro Dalcin wrote:
>
>> Below a list of stuff that I've got by running mpi4py testsuite. Never
>> reported them before just because some of them are not actually
>> errors, but anyway, I want to raise the discussion.
>>
>> - Likely bugs (regarding my interpretation of the MPI standard)
>>
>> 1) When passing MPI_REQUEST_NULL, MPI_Request_free() DOES NOT fail.
>>
>> 2) When passing MPI_REQUEST_NULL, MPI_Cancel() DOES NOT fail.
>>
>> 3) When passing MPI_REQUEST_NULL, MPI_Request_get_status() DOES NOT fail.
>
> I agree with all of these; I'm not sure why we allowed MPI_REQUEST_NULL.  I
> double checked LAM/MPI -- it errors in all of these cases.  So OMPI now
> does, too.
>
>> 4) When passing MPI_WIN_NULL, MPI_Win_get_errhandler() and
>> MPI_Win_set_errhandler() DO NOT fail.
>
> I was a little more dubious here; the param checking code was specifically
> checking for MPI_WIN_NULL and not classifying it as an error.  Digging to
> find out why we did that, the best that I can come up with is that it is
> *not* an error to call MPI_File_set|get_errhandler on MPI_FILE_NULL (to set
> behavior for what happens when FILE_OPEN fails); I'm *guessing* that we
> simply copied the _File_ code to the _Win_ code and forgot to remove that
> extra check.
>
> I can't find anything in MPI-2.1 that says it is legal to call set|get
> errhandler on MPI_WIN_NULL.  I checked LAM as well; LAM errors in this case.
>  So I made this now be an error in OMPI as well.
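(Side note for readers: the MPI_FILE_NULL exception Jeff describes is
indeed standard-sanctioned -- the error handler attached to
MPI_FILE_NULL governs how file errors are raised when no valid file
handle exists yet. A minimal illustration:)

MPI_File fh;
int err;

/* Affects how a later failed MPI_File_open() is reported: at that
 * point there is no open file handle to carry an error handler. */
MPI_File_set_errhandler(MPI_FILE_NULL, MPI_ERRORS_RETURN);
err = MPI_File_open(MPI_COMM_SELF, "/nonexistent/path",
                    MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);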
>
> Do you need these in the 1.3 series?  Or are you ok waiting for 1.4
> (assuming 1.4 takes significantly less time to release than 1.3 :-) ).
>

I do not have a strong need to get those fixes into the 1.3 series.
In mpi4py, I have a small compatibility layer that works on an
implementation-by-implementation (well, actually just MPICH 1/2,
Open MPI, and LAM) and release-by-release basis, trying to hide those
small discrepancies and bugs in the MPIs out there.
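
(For reference, a minimal test of the MPI_REQUEST_NULL cases above --
a hypothetical sketch, assuming MPI_ERRORS_RETURN is set on
MPI_COMM_WORLD so errors come back as return codes instead of
aborting; MPI_Cancel() can be checked the same way:)

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int err, cls, flag;
    MPI_Request req = MPI_REQUEST_NULL;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    /* Per the interpretation above, both calls should now fail,
     * presumably with error class MPI_ERR_REQUEST. */
    err = MPI_Request_free(&req);
    if (MPI_SUCCESS != err) {
        MPI_Error_class(err, &cls);
        printf("MPI_Request_free: error class %d\n", cls);
    }
    req = MPI_REQUEST_NULL;
    err = MPI_Request_get_status(req, &flag, &status);
    if (MPI_SUCCESS != err) {
        MPI_Error_class(err, &cls);
        printf("MPI_Request_get_status: error class %d\n", cls);
    }

    MPI_Finalize();
    return 0;
}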

>> - Unexpected errors classes (at least for me)
>>
>> 1) When passing MPI_COMM_NULL, MPI_Comm_get_errhandler() fails with
>> MPI_ERR_ARG. I would expect MPI_ERR_COMM.
>
> I don't have a strong feeling on this one; I think you could probably argue
> either way.  That being said, we haven't paid too close attention to the
> error values that we return.  Unfortunately, I don't think there's much
> standardization between different MPI implementations, unless they share a
> common code ancestry.
>

You are right... However, IMHO, some agreement between Open MPI and
MPICH2 would be great, right? :) In the end, they are the
reference/basis for other implementations.

>> 2) MPI_Type_free() fails with MPI_ERR_INTERN when passing predefined
>> datatypes like MPI_INT or MPI_FLOAT. I would expect MPI_ERR_TYPE.
>
> Ya, that seems weird.  Fixed.
>
>> - Controversial (I'm even fine with the current behavior)
>>
>> 1) MPI_Info_get_nthkey(info, n) returns MPI_ERR_INFO_KEY when "n" is
>> larger than the number of keys. Perhaps MPI_ERR_ARG would be more
>> appropriate? A possible rationale would be that the error is not
>> related to the contents of a 'key' string, but an out of range value
>> for "n".
>
> I don't have a particular opinion on this one.
>
>> That's all. Sorry for being so pedantic :-) and not offering help for
>> the patches, but I'm really busy.
>
>
> No worries; this stuff is great.  Thanks -- and keep it coming!  (we usually
> remember to cite people who submit stuff like this; e.g.,
> https://svn.open-mpi.org/trac/ompi/changeset/20537 and
> https://svn.open-mpi.org/trac/ompi/changeset/20538).
>

Jeff, once again, many thanks for your fast response, and even more
thanks for fixing the issues!


> --
> Jeff Squyres
> Cisco Systems



-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



Re: [OMPI devel] possible bugs and unexpected values in returned errors classes

2009-02-16 Thread Lisandro Dalcin
Just found something new to comment on after diving into the actual sources.

On Thu, Feb 12, 2009 at 10:02 PM, Jeff Squyres  wrote:
> On Feb 11, 2009, at 8:24 AM, Lisandro Dalcin wrote:
>>
>> 1) When passing MPI_COMM_NULL, MPI_Comm_get_errhandler() fails with
>> MPI_ERR_ARG. I would expect MPI_ERR_COMM.
>
> I don't have a strong feeling on this one; I think you could probably argue
> either way.  That being said, we haven't paid too close attention to the
> error values that we return.  Unfortunately, I don't think there's much
> standardization between different MPI implementations, unless they share a
> common code ancestry.
>

After running my testsuite again and then looking at
"ompi/mpi/c/comm_set_errhandler.c", I noticed that
MPI_Comm_set_errhandler() does return MPI_ERR_COMM when an invalid
communicator is passed. IMHO, for the sake of consistency, you
should fix MPI_Comm_get_errhandler() to behave the same as the
setter. Would this rationale be enough?
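
(For illustration, the consistency argument in a few lines -- a
hypothetical sketch, again assuming MPI_ERRORS_RETURN on
MPI_COMM_WORLD so the calls below return instead of aborting:)

MPI_Errhandler eh;
int cls_get, cls_set;

/* Both calls are handed the same invalid handle... */
MPI_Error_class(MPI_Comm_get_errhandler(MPI_COMM_NULL, &eh),
                &cls_get);
MPI_Error_class(MPI_Comm_set_errhandler(MPI_COMM_NULL,
                                        MPI_ERRORS_RETURN),
                &cls_set);
/* ...so one would expect cls_get == cls_set == MPI_ERR_COMM;
 * currently the getter reports MPI_ERR_ARG instead. */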


-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594



[OMPI devel] RFC: New Open MPI release procedure

2009-02-16 Thread Jeff Squyres

What: Proposal for a new release methodology for the Open MPI Project.

Why: We have [at least] 2 competing forces in Open MPI:
  - desire to release new features quickly.  Fast is good.
  - desire to release based on production quality.  Slow is good.

  The competition between these two forces has created some tension
  in the Open MPI community, as well as a Very Long release cycle
  for OMPI v1.3 (yes, it was our specific and deliberate choice to
  be feature driven -- but it was still very long).

How: Take some ideas from other well-established release paradigms,  
such as:

  - Linux kernel "odd/even" version number release methodology
  - Red Hat/Fedora stable vs. feature releases
  - Agile development models

When: For all releases after the v1.3 series (i.e., this proposal does
not include any releases in the v1.3 series).

--> Ralph and I will talk through all the details and answer any
questions on tomorrow's teleconference (Tue, 17 Feb 2009).

=============================================================================



Details:

In v1.3, we let a lot of really good features sit in development for a
long, long time.  Yes, we chose to do this and there were good reasons
for doing so, but the fact remains that we had some really good stuff
done and stable for long periods of time that wasn't generally
available to users who wanted to use it.  Even for users who are
willing to be out on the bleeding edge, trunk tarballs are just too
scary.

Given the two competing forces mentioned above (feature/fast releases
+ stable/slow releases), it seems like we really want two different --
but overlapping -- release mechanisms.

Taking inspiration from other well-established paradigms, Ralph and I
propose the following for all releases starting with v1.4.0:

- Have two concurrent release series:
  1. "Super stable": for production users who care about stability
 above all else.  They're willing to wait long periods of time
 before updating to a new version of Open MPI.
  2. "Feature driven": for users who are willing to take a few chances
 to get new OMPI features -- but cannot endure the chaos of
 nightly trunk tarballs.

- The general idea is that a feature driven release is developed for a
  while in an RM-regulated yet somewhat agile development style.  When
  specific criteria are met (e.g., feature complete, schedule driven,
  etc.), the feature release series is morphed into a super stable
  state and released.  At this point, all development stops on that
  release series; only bug fixes are allowed.
- RMs therefore become responsible for *two* release series: a
  feature driven series and the corresponding super stable series that
  emerges from it.

***KEY POINT*** This "two release" methodology allows for the release
(and real-world testing) of new features in a much more timely fashion
than our current release methodology.

Here's a crude ASCII art representation of how branches will work
using this proposal in SVN:

                 v1.3 series/super stable
            v1.3.0  v1.3.2                        v1.6.0
           /---|-------|--------------- ... ---------|----->
          /        v1.3.1
trunk ---+
          \    v1.4.0  v1.4.2  v1.4.4  ...     v1.5.0   v1.5.1
           \--|---|---|---|---|---|---|- ... -||--|--------|-->
                v1.4.1  v1.4.3        ...now becomes
              v1.4/feature driven     v1.5/super stable

Here's how a typical release cycle works:

- Assume that a "super stable" version exists; a release series that
  has an odd minor number: v1.3, v1.5, v1.7, ...etc.
- For this example, let's assume that the super stable is v1.3.
- Only bug fixes go into the "super stable" series.

- Plans for the next "super stable" are drawn up (v1.5 in this
  example), including a list of goals, new features, a timeline, etc.

- A new feature release series is created shortly after the first
  super stable release with a minor version number that is even (e.g.,
  v1.4, v1.6, v1.8, ...etc.).
- In this example, the feature release series will be v1.4.
- The v1.4 branch is taken to a point with at least some degree of
  stability and released as v1.4.0.

- Development on the SVN trunk continues.

- According to a public schedule (probably tied to our teleconference
  schedule), the RMs will approve the moving of features and bug
  fixes to the feature release.
  - Rather than submitting CMRs for the Gatekeeper to move, the
"owner" of a particular feature/bug fix will be assigned a
specific time to move the item to the feature branch.
  - For example, George will have from Tues-Fri to move his cool new
feature X to the v1.4 branch.
  - Friday night, new 1.4 tarballs are cut and everyone's MTT tries
them out.
  - Iterate for the next week or so to get the v1.4

Re: [OMPI devel] [OMPI svn] svn:open-mpi r20562

2009-02-16 Thread Josh Hursey
This commit seems to have broken the tools. If I use orte-ps then on  
finalize I get an abort() with the following stack:


shell$ orte-ps
...
(gdb) bt
#0  0x2bcee155 in raise () from /lib64/libc.so.6
#1  0x2bcefbf0 in abort () from /lib64/libc.so.6
#2  0x2bce75d6 in __assert_fail () from /lib64/libc.so.6
#3  0x2af734e1 in opal_evsignal_dealloc (base=0x609f50) at signal.c:295
#4  0x2af73f36 in poll_dealloc (base=0x609f50, arg=0x60a9a0) at poll.c:390
#5  0x2af70667 in opal_event_base_free (base=0x609f50) at event.c:530
#6  0x2af70519 in opal_event_fini () at event.c:390
#7  0x2af5f624 in opal_finalize () at runtime/opal_finalize.c:117
#8  0x2acd4fc4 in orte_finalize () at runtime/orte_finalize.c:84
#9  0x0040196a in main (argc=1, argv=0x7fffdf38) at orte-ps.c:275


Any thoughts on why this is happening for only the tools case?

-- Josh

On Feb 14, 2009, at 4:51 PM, bosi...@osl.iu.edu wrote:


Author: bosilca
Date: 2009-02-14 16:51:09 EST (Sat, 14 Feb 2009)
New Revision: 20562
URL: https://svn.open-mpi.org/trac/ompi/changeset/20562

Log:
Release the default base on finalize.

Text files modified:
  trunk/opal/event/event.c | 4 ++++
  1 files changed, 4 insertions(+), 0 deletions(-)

Modified: trunk/opal/event/event.c
==============================================================================
--- trunk/opal/event/event.c	(original)
+++ trunk/opal/event/event.c	2009-02-14 16:51:09 EST (Sat, 14 Feb 2009)
@@ -386,6 +386,10 @@
     if (NULL != opal_event_module_include) {
         opal_argv_free(opal_event_module_include);
     }
+    if( NULL != opal_current_base ) {
+        event_base_free(opal_current_base);
+        opal_current_base = NULL;
+    }
     return OPAL_SUCCESS;
 }





Re: [OMPI devel] RFC: Eliminate ompi/class/ompi_[circular_buffer_]fifo.h

2009-02-16 Thread Matney Sr, Kenneth D.
Sorry about the premature send...

The basic mechanics of this are similar to the problem with the portals
BTL that I fixed.  However, in my case, the problem manifested itself
with the Intel test MPI_Send_Fairness_c (and MPI_Isend_Fairness_c) at
60 processes (the limit that MTT imposes on the Intel tests).

The original code followed the portals design document for MPI pretty 
well.  When the receiver is overwhelmed, a "reject" entry is used to 
handle the excess messages.  One of the features of this "reject" entry 
is that the receiver (at the BTL level) never interacts with the actual 
message.  The problem was that the sender did not recognize the return 
ACK from portals [in mca_btl_portals_component_progress()] as a failure.
So, the sender did not resend a message that the receiver was expecting.
While I fixed it in the trunk, I had to disable mca_btl_portals_sendi()
because there is a potential for this function to be used with a 0-byte
portals message payload.  For this particular test,
https://svn.open-mpi.org/trac/ompi/ticket/1791, we would not have
seen a failure because the root process would not know that it had
missed a message and the non-root processes would not have diagnosed
a need to resend.  As corrected, the root process still is FD&H and
the non-root processes will keep re-transmitting until success.

Sorry for boring you about portals.  In the sm case, the non-root 
processes continually are appending to FIFOs.  Obviously, these blasters 
can append to the FIFOs much more quickly than the processes can remove 
entries:  

S7 --> S0
S6 --> S1
S5 --> S2
S4 --> S3

In the first cycle, everyone is busy.  In the second cycle, S7, S6, S5, 
and S4 are ready for the next reduction, but S3, S2, S1, and S0 still 
are on the hook, meaning that the latter FIFOs are going to grow at
a faster rate:

S3 --> S0
S2 --> S1

Now, S3 and S2 are ready for the next reduction, but S0 and S1 still have
work left in the current reduction:

S1 --> S0

Since S0 (the root process) takes a little time to finish processing
the reduction, it is going to be a little behind S1.  So we end up
with the following timings:

S0: (3+Δ)T
S1: 3T
S2: 2T
S3: 2T
S4: 1T
S5: 1T
S6: 1T
S7: 1T

If sm used a system of ACKs as in portals, we would know when we
are overloading the root process.  However, since it does not, and the
reduction itself is non-blocking, we have the potential to exhaust memory.
I guess that the real question is whether the reduction should be blocking
or whether we expect the user to protect himself.
--

-Original Message-
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf 
Of Eugene Loh
Sent: Friday, February 13, 2009 11:42 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] RFC: Eliminate ompi/class/ompi_[circular_buffer_]fifo.h

George Bosilca wrote:

> I can't confirm or deny. The only thing I can tell is that the same
> test works fine over other BTLs, so this tends to pinpoint a
> problem in the sm BTL or in a particular path in the PML (the one
> used by the sm BTL). I'll have to dig a little bit more into it, but
> I was hoping to do it in the context of the new sm BTL (just to
> avoid having to do it twice).

Okay.  I'll try to get "single queue" put back soon and might look at 
1791 along the way.

But here is what I wonder.  Let's say you have one-way traffic -- either 
rank A sending rank B messages without ever any traffic in the other 
direction, or repeated MPI_Reduce operations always with the same root 
-- and the senders somehow get well ahead of the receiver.  Say, A wants 
to pump 1,000,000 messages over and B is busy doing something else.  
What should happen?  What should the PML and BTL do?  The conditions 
could range from B not being in MPI at all, or B listening to the BTL 
without yet having the posted receives to match.  Should the connection 
become congested and force the sender to wait -- and if so, is this in 
the BTL or PML?  Or, should B keep on queueing up the unexpected messages?
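
(To make the alternatives concrete: with a bounded FIFO the sender
eventually sees "full" and must stall or retry -- back-pressure;
with unbounded queueing, memory grows with the backlog. A generic
sketch, not the actual sm BTL code:)

#define FIFO_DEPTH 128

typedef struct {
    void *slot[FIFO_DEPTH];
    volatile int head, tail;    /* single-writer / single-reader */
} fifo_t;

/* Returns 0 on success, -1 if the FIFO is full.  On -1 the caller
 * must choose: spin until the receiver drains entries (congestion,
 * i.e. flow control in the BTL or PML) or queue the fragment
 * locally (unbounded memory growth, as in the reduction case). */
static int fifo_push(fifo_t *f, void *frag)
{
    int next = (f->head + 1) % FIFO_DEPTH;
    if (next == f->tail) {
        return -1;              /* receiver has fallen behind */
    }
    f->slot[f->head] = frag;
    f->head = next;
    return 0;
}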

After some basic "single queue" putbacks, I'll try to look at the code 
and understand what the PML is doing in cases like this.



Re: [OMPI devel] [OMPI svn] svn:open-mpi r20562

2009-02-16 Thread George Bosilca

Josh,

Spending a few minutes trying to understand this could have pointed
you to the real culprit: the tool itself!


The assert in the code states that on finalize there is still a
registered signal handler. A quick gdb session shows that this is the
SIG_CHLD handler. Tracking the signal addition in the tool
(breakpoint in gdb on opal_event_queue_insert) clearly highlights the
place where this happens, i.e. orte_wait_init in
orte/runtime/orte_wait.c:274. So far so good: we're right to track
SIG_CHLD, but we're not supposed to leave the handler there when
we're done (as the signal is registered with the PERSISTENT option).
Leaving... ah, there is a function to cleanly unregister it, right
next to orte_wait_init, with a very clear name: orte_wait_finalize.
Wonderful, except that in the case of a tool it is never called.
Strange, isn't it, that no other component in the ompi tree exhibits
such a behavior. Maybe grep can help... There we are:


[bosilca@dancer ompi]$ find . -name "*.c" -exec grep -Hn orte_wait_finalize {} \;

./orte/mca/ess/hnp/ess_hnp_module.c:486:orte_wait_finalize();
./orte/mca/ess/base/ess_base_std_app.c:222:orte_wait_finalize();
./orte/mca/ess/base/ess_base_std_orted.c:310:orte_wait_finalize();
./orte/runtime/orte_wait.c:280:orte_wait_finalize(void)
./orte/runtime/orte_wait.c:872:orte_wait_finalize(void)
./orte/runtime/orte_wait.c:1182:orte_wait_finalize(void)

This clearly shows that, with the exception of the tools, everybody
else clears their state before leaving. And here we are: a quick
patch that really fixes the problem, without removing code that had a
really good reason to be there.


Index: orte/mca/ess/base/ess_base_std_tool.c
===
--- orte/mca/ess/base/ess_base_std_tool.c   (revision 20564)
+++ orte/mca/ess/base/ess_base_std_tool.c   (working copy)
@@ -158,6 +158,8 @@
 
 int orte_ess_base_tool_finalize(void)
 {
+    orte_wait_finalize();
+
     /* if I am a tool, then all I will have done is
      * a very small subset of orte_init - ensure that
      * I only back those elements out


  george.
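
(Editorial context: the pattern at issue is roughly the following --
a sketch in the style of the OPAL event API as used in this thread;
the function names, flags, and callback signature here are
assumptions for illustration, not quotes from the tree:)

static opal_event_t sigchld_ev;

static void sigchld_cb(int fd, short flags, void *arg)
{
    /* reap children, etc. */
}

static void my_wait_init(void)
{
    /* A PERSISTENT signal event stays registered until it is
     * explicitly deleted. */
    opal_event_set(&sigchld_ev, SIGCHLD,
                   OPAL_EV_SIGNAL | OPAL_EV_PERSIST,
                   sigchld_cb, NULL);
    opal_event_add(&sigchld_ev, NULL);
}

static void my_wait_finalize(void)
{
    /* Without this, opal_event_base_free() later finds the leftover
     * handler and asserts -- exactly the abort in the backtrace. */
    opal_event_del(&sigchld_ev);
}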


On Feb 16, 2009, at 12:57, Josh Hursey wrote:

This commit seems to have broken the tools. If I use orte-ps then on  
finalize I get an abort() with the following stack:


shell$ orte-ps
...
(gdb) bt
#0  0x2bcee155 in raise () from /lib64/libc.so.6
#1  0x2bcefbf0 in abort () from /lib64/libc.so.6
#2  0x2bce75d6 in __assert_fail () from /lib64/libc.so.6
#3  0x2af734e1 in opal_evsignal_dealloc (base=0x609f50) at signal.c:295
#4  0x2af73f36 in poll_dealloc (base=0x609f50, arg=0x60a9a0) at poll.c:390
#5  0x2af70667 in opal_event_base_free (base=0x609f50) at event.c:530
#6  0x2af70519 in opal_event_fini () at event.c:390
#7  0x2af5f624 in opal_finalize () at runtime/opal_finalize.c:117
#8  0x2acd4fc4 in orte_finalize () at runtime/orte_finalize.c:84
#9  0x0040196a in main (argc=1, argv=0x7fffdf38) at orte-ps.c:275


Any thoughts on why this is happening for only the tools case?

-- Josh

On Feb 14, 2009, at 4:51 PM, bosi...@osl.iu.edu wrote:


Author: bosilca
Date: 2009-02-14 16:51:09 EST (Sat, 14 Feb 2009)
New Revision: 20562
URL: https://svn.open-mpi.org/trac/ompi/changeset/20562

Log:
Release the default base on finalize.

Text files modified:
 trunk/opal/event/event.c | 4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

Modified: trunk/opal/event/event.c
==============================================================================
--- trunk/opal/event/event.c	(original)
+++ trunk/opal/event/event.c	2009-02-14 16:51:09 EST (Sat, 14 Feb 2009)
@@ -386,6 +386,10 @@
     if (NULL != opal_event_module_include) {
         opal_argv_free(opal_event_module_include);
     }
+    if( NULL != opal_current_base ) {
+        event_base_free(opal_current_base);
+        opal_current_base = NULL;
+    }
     return OPAL_SUCCESS;
 }





Re: [OMPI devel] [OMPI svn] svn:open-mpi r20562

2009-02-16 Thread Jeff Squyres

George --

Will you commit?

On Feb 16, 2009, at 2:59 PM, George Bosilca wrote:


Josh,

Spending a few minutes trying to understand this could have pointed
you to the real culprit: the tool itself!


The assert in the code states that on finalize there is still a
registered signal handler. A quick gdb session shows that this is the
SIG_CHLD handler. Tracking the signal addition in the tool
(breakpoint in gdb on opal_event_queue_insert) clearly highlights the
place where this happens, i.e. orte_wait_init in
orte/runtime/orte_wait.c:274. So far so good: we're right to track
SIG_CHLD, but we're not supposed to leave the handler there when
we're done (as the signal is registered with the PERSISTENT option).
Leaving... ah, there is a function to cleanly unregister it, right
next to orte_wait_init, with a very clear name: orte_wait_finalize.
Wonderful, except that in the case of a tool it is never called.
Strange, isn't it, that no other component in the ompi tree exhibits
such a behavior. Maybe grep can help... There we are:


[bosilca@dancer ompi]$ find . -name "*.c" -exec grep -Hn orte_wait_finalize {} \;

./orte/mca/ess/hnp/ess_hnp_module.c:486:orte_wait_finalize();
./orte/mca/ess/base/ess_base_std_app.c:222:orte_wait_finalize();
./orte/mca/ess/base/ess_base_std_orted.c:310:orte_wait_finalize();
./orte/runtime/orte_wait.c:280:orte_wait_finalize(void)
./orte/runtime/orte_wait.c:872:orte_wait_finalize(void)
./orte/runtime/orte_wait.c:1182:orte_wait_finalize(void)

This clearly shows that, with the exception of the tools, everybody
else clears their state before leaving. And here we are: a quick
patch that really fixes the problem, without removing code that had a
really good reason to be there.


Index: orte/mca/ess/base/ess_base_std_tool.c
===
--- orte/mca/ess/base/ess_base_std_tool.c   (revision 20564)
+++ orte/mca/ess/base/ess_base_std_tool.c   (working copy)
@@ -158,6 +158,8 @@
 
 int orte_ess_base_tool_finalize(void)
 {
+    orte_wait_finalize();
+
     /* if I am a tool, then all I will have done is
      * a very small subset of orte_init - ensure that
      * I only back those elements out


 george.


On Feb 16, 2009, at 12:57, Josh Hursey wrote:

This commit seems to have broken the tools. If I use orte-ps then  
on finalize I get an abort() with the following stack:


shell$ orte-ps
...
(gdb) bt
#0  0x2bcee155 in raise () from /lib64/libc.so.6
#1  0x2bcefbf0 in abort () from /lib64/libc.so.6
#2  0x2bce75d6 in __assert_fail () from /lib64/libc.so.6
#3  0x2af734e1 in opal_evsignal_dealloc (base=0x609f50) at signal.c:295
#4  0x2af73f36 in poll_dealloc (base=0x609f50, arg=0x60a9a0) at poll.c:390
#5  0x2af70667 in opal_event_base_free (base=0x609f50) at event.c:530
#6  0x2af70519 in opal_event_fini () at event.c:390
#7  0x2af5f624 in opal_finalize () at runtime/opal_finalize.c:117
#8  0x2acd4fc4 in orte_finalize () at runtime/orte_finalize.c:84
#9  0x0040196a in main (argc=1, argv=0x7fffdf38) at orte-ps.c:275


Any thoughts on why this is happening for only the tools case?

-- Josh

On Feb 14, 2009, at 4:51 PM, bosi...@osl.iu.edu wrote:


Author: bosilca
Date: 2009-02-14 16:51:09 EST (Sat, 14 Feb 2009)
New Revision: 20562
URL: https://svn.open-mpi.org/trac/ompi/changeset/20562

Log:
Release the default base on finalize.

Text files modified:
 trunk/opal/event/event.c | 4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

Modified: trunk/opal/event/event.c
==============================================================================
--- trunk/opal/event/event.c	(original)
+++ trunk/opal/event/event.c	2009-02-14 16:51:09 EST (Sat, 14 Feb 2009)
@@ -386,6 +386,10 @@
     if (NULL != opal_event_module_include) {
         opal_argv_free(opal_event_module_include);
     }
+    if( NULL != opal_current_base ) {
+        event_base_free(opal_current_base);
+        opal_current_base = NULL;
+    }
     return OPAL_SUCCESS;
 }




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [OMPI svn] svn:open-mpi r20562

2009-02-16 Thread Jeff Squyres

Never mind -- you just did.  Thanks!  :-)

On Feb 16, 2009, at 3:07 PM, Jeff Squyres wrote:


George --

Will you commit?

On Feb 16, 2009, at 2:59 PM, George Bosilca wrote:


Josh,

Spending a few minutes trying to understand this could have pointed
you to the real culprit: the tool itself!


The assert in the code states that on finalize there is still a
registered signal handler. A quick gdb session shows that this is the
SIG_CHLD handler. Tracking the signal addition in the tool
(breakpoint in gdb on opal_event_queue_insert) clearly highlights the
place where this happens, i.e. orte_wait_init in
orte/runtime/orte_wait.c:274. So far so good: we're right to track
SIG_CHLD, but we're not supposed to leave the handler there when
we're done (as the signal is registered with the PERSISTENT option).
Leaving... ah, there is a function to cleanly unregister it, right
next to orte_wait_init, with a very clear name: orte_wait_finalize.
Wonderful, except that in the case of a tool it is never called.
Strange, isn't it, that no other component in the ompi tree exhibits
such a behavior. Maybe grep can help... There we are:


[bosilca@dancer ompi]$ find . -name "*.c" -exec grep -Hn orte_wait_finalize {} \;

./orte/mca/ess/hnp/ess_hnp_module.c:486:orte_wait_finalize();
./orte/mca/ess/base/ess_base_std_app.c:222:orte_wait_finalize();
./orte/mca/ess/base/ess_base_std_orted.c:310:orte_wait_finalize();
./orte/runtime/orte_wait.c:280:orte_wait_finalize(void)
./orte/runtime/orte_wait.c:872:orte_wait_finalize(void)
./orte/runtime/orte_wait.c:1182:orte_wait_finalize(void)

This clearly shows that, with the exception of the tools, everybody
else clears their state before leaving. And here we are: a quick
patch that really fixes the problem, without removing code that had a
really good reason to be there.


Index: orte/mca/ess/base/ess_base_std_tool.c
===
--- orte/mca/ess/base/ess_base_std_tool.c   (revision 20564)
+++ orte/mca/ess/base/ess_base_std_tool.c   (working copy)
@@ -158,6 +158,8 @@
 
 int orte_ess_base_tool_finalize(void)
 {
+    orte_wait_finalize();
+
     /* if I am a tool, then all I will have done is
      * a very small subset of orte_init - ensure that
      * I only back those elements out


george.


On Feb 16, 2009, at 12:57, Josh Hursey wrote:

This commit seems to have broken the tools. If I use orte-ps then  
on finalize I get an abort() with the following stack:


shell$ orte-ps
...
(gdb) bt
#0  0x2bcee155 in raise () from /lib64/libc.so.6
#1  0x2bcefbf0 in abort () from /lib64/libc.so.6
#2  0x2bce75d6 in __assert_fail () from /lib64/libc.so.6
#3  0x2af734e1 in opal_evsignal_dealloc (base=0x609f50) at signal.c:295
#4  0x2af73f36 in poll_dealloc (base=0x609f50, arg=0x60a9a0) at poll.c:390
#5  0x2af70667 in opal_event_base_free (base=0x609f50) at event.c:530
#6  0x2af70519 in opal_event_fini () at event.c:390
#7  0x2af5f624 in opal_finalize () at runtime/opal_finalize.c:117
#8  0x2acd4fc4 in orte_finalize () at runtime/orte_finalize.c:84
#9  0x0040196a in main (argc=1, argv=0x7fffdf38) at orte-ps.c:275


Any thoughts on why this is happening for only the tools case?

-- Josh

On Feb 14, 2009, at 4:51 PM, bosi...@osl.iu.edu wrote:


Author: bosilca
Date: 2009-02-14 16:51:09 EST (Sat, 14 Feb 2009)
New Revision: 20562
URL: https://svn.open-mpi.org/trac/ompi/changeset/20562

Log:
Release the default base on finalize.

Text files modified:
 trunk/opal/event/event.c | 4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

Modified: trunk/opal/event/event.c
==============================================================================
--- trunk/opal/event/event.c	(original)
+++ trunk/opal/event/event.c	2009-02-14 16:51:09 EST (Sat, 14 Feb 2009)
@@ -386,6 +386,10 @@
     if (NULL != opal_event_module_include) {
         opal_argv_free(opal_event_module_include);
     }
+    if( NULL != opal_current_base ) {
+        event_base_free(opal_current_base);
+        opal_current_base = NULL;
+    }
     return OPAL_SUCCESS;
 }




--
Jeff Squyres
Cisco Systems




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [OMPI svn] svn:open-mpi r20562

2009-02-16 Thread Josh Hursey

Thanks for the fix.
-- Josh

On Feb 16, 2009, at 2:59 PM, George Bosilca wrote:


Josh,

Spending a few minutes trying to understand this could have pointed
you to the real culprit: the tool itself!


The assert in the code states that on finalize there is still a
registered signal handler. A quick gdb session shows that this is the
SIG_CHLD handler. Tracking the signal addition in the tool
(breakpoint in gdb on opal_event_queue_insert) clearly highlights the
place where this happens, i.e. orte_wait_init in
orte/runtime/orte_wait.c:274. So far so good: we're right to track
SIG_CHLD, but we're not supposed to leave the handler there when
we're done (as the signal is registered with the PERSISTENT option).
Leaving... ah, there is a function to cleanly unregister it, right
next to orte_wait_init, with a very clear name: orte_wait_finalize.
Wonderful, except that in the case of a tool it is never called.
Strange, isn't it, that no other component in the ompi tree exhibits
such a behavior. Maybe grep can help... There we are:


[bosilca@dancer ompi]$ find . -name "*.c" -exec grep -Hn orte_wait_finalize {} \;

./orte/mca/ess/hnp/ess_hnp_module.c:486:orte_wait_finalize();
./orte/mca/ess/base/ess_base_std_app.c:222:orte_wait_finalize();
./orte/mca/ess/base/ess_base_std_orted.c:310:orte_wait_finalize();
./orte/runtime/orte_wait.c:280:orte_wait_finalize(void)
./orte/runtime/orte_wait.c:872:orte_wait_finalize(void)
./orte/runtime/orte_wait.c:1182:orte_wait_finalize(void)

This clearly shows that, with the exception of the tools, everybody
else clears their state before leaving. And here we are: a quick
patch that really fixes the problem, without removing code that had a
really good reason to be there.


Index: orte/mca/ess/base/ess_base_std_tool.c
===
--- orte/mca/ess/base/ess_base_std_tool.c   (revision 20564)
+++ orte/mca/ess/base/ess_base_std_tool.c   (working copy)
@@ -158,6 +158,8 @@
 
 int orte_ess_base_tool_finalize(void)
 {
+    orte_wait_finalize();
+
     /* if I am a tool, then all I will have done is
      * a very small subset of orte_init - ensure that
      * I only back those elements out


  george.


On Feb 16, 2009, at 12:57, Josh Hursey wrote:

This commit seems to have broken the tools. If I use orte-ps then  
on finalize I get an abort() with the following stack:


shell$ orte-ps
...
(gdb) bt
#0  0x2bcee155 in raise () from /lib64/libc.so.6
#1  0x2bcefbf0 in abort () from /lib64/libc.so.6
#2  0x2bce75d6 in __assert_fail () from /lib64/libc.so.6
#3  0x2af734e1 in opal_evsignal_dealloc (base=0x609f50) at signal.c:295
#4  0x2af73f36 in poll_dealloc (base=0x609f50, arg=0x60a9a0) at poll.c:390
#5  0x2af70667 in opal_event_base_free (base=0x609f50) at event.c:530
#6  0x2af70519 in opal_event_fini () at event.c:390
#7  0x2af5f624 in opal_finalize () at runtime/opal_finalize.c:117
#8  0x2acd4fc4 in orte_finalize () at runtime/orte_finalize.c:84
#9  0x0040196a in main (argc=1, argv=0x7fffdf38) at orte-ps.c:275


Any thoughts on why this is happening for only the tools case?

-- Josh

On Feb 14, 2009, at 4:51 PM, bosi...@osl.iu.edu wrote:


Author: bosilca
Date: 2009-02-14 16:51:09 EST (Sat, 14 Feb 2009)
New Revision: 20562
URL: https://svn.open-mpi.org/trac/ompi/changeset/20562

Log:
Release the default base on finalize.

Text files modified:
 trunk/opal/event/event.c | 4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

Modified: trunk/opal/event/event.c
==============================================================================
--- trunk/opal/event/event.c	(original)
+++ trunk/opal/event/event.c	2009-02-14 16:51:09 EST (Sat, 14 Feb 2009)
@@ -386,6 +386,10 @@
     if (NULL != opal_event_module_include) {
         opal_argv_free(opal_event_module_include);
     }
+    if( NULL != opal_current_base ) {
+        event_base_free(opal_current_base);
+        opal_current_base = NULL;
+    }
     return OPAL_SUCCESS;
 }





Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568

2009-02-16 Thread Jeff Squyres
Unfortunately, this doesn't fully fix the problem -- I'm still getting  
bad frees:


[16:47] svbu-mpi:~/mpi % ./hello
stdout: Hello, world!  I am 0 of 1 (svbu-mpi.cisco.com)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi.cisco.com)
malloc debug: Invalid free (min_heap.h, 58)

[16:48] svbu-mpi:~/mpi % mpirun -np 1 hello
[svbu-mpi001:27549] ** Parsing receive_queues
stdout: Hello, world!  I am 0 of 1 (svbu-mpi001)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi001)
malloc debug: Invalid free (min_heap.h, 58)


On Feb 16, 2009, at 7:20 PM, bosi...@osl.iu.edu wrote:


Author: bosilca
Date: 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
New Revision: 20568
URL: https://svn.open-mpi.org/trac/ompi/changeset/20568

Log:
Make sure we correctly unregister all persistent events
and signal handlers.

Text files modified:
  trunk/orte/orted/orted_main.c  | 8 ++++++++
  trunk/orte/runtime/orte_wait.c | 4 ++--
  2 files changed, 10 insertions(+), 2 deletions(-)

Modified: trunk/orte/orted/orted_main.c
==============================================================================
--- trunk/orte/orted/orted_main.c	(original)
+++ trunk/orte/orted/orted_main.c	2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
@@ -754,6 +754,14 @@
         exit(ORTE_ERROR_DEFAULT_EXIT_CODE);
     }
 
+    /* Release all local signal handlers */
+    opal_event_del(&term_handler);
+    opal_event_del(&int_handler);
+#ifndef __WINDOWS__
+    opal_signal_del(&sigusr1_handler);
+    opal_signal_del(&sigusr2_handler);
+#endif  /* __WINDOWS__ */
+
     /* Finalize and clean up ourselves */
     ret = orte_finalize();
     exit(ret);

Modified: trunk/orte/runtime/orte_wait.c
==============================================================================
--- trunk/orte/runtime/orte_wait.c	(original)
+++ trunk/orte/runtime/orte_wait.c	2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
@@ -517,8 +517,8 @@
     /* define the event to fire when someone writes to the pipe */
     opal_event_set(*event, p[0], OPAL_EV_READ, cbfunc, NULL);
 
-   /* Add it to the active events, without a timeout */
-   opal_event_add(*event, NULL);
+    /* Add it to the active events, without a timeout */
+    opal_event_add(*event, NULL);
 
     /* all done */
     return ORTE_SUCCESS;



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568

2009-02-16 Thread Jeff Squyres

r20569 fixes the problem, but I'm not 100% sure it's the Right Way.

Short version: now that we're guaranteeing to free the event base,  
we're exercising a code path that was never used before.  Apparently  
the orted initializes the ev->timebase min_heap_t structure, but then  
never uses it.  So the pointer to the array of events in the heap is  
still NULL when we get to the destructor.  Previously, the destructor  
just unconditionally freed the array.  I put in a NULL check, which  
avoids the problem.


But it begs the question -- why is that data structure being  
initialized/freed if we're never using it?  Is it something inherent  
in libevent?
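
(The r20569 guard amounts to a null check in the heap destructor --
sketched below; the struct layout and field names are assumptions
modeled on libevent's min_heap.h, not a quote of the actual code:)

#include <stdlib.h>

typedef struct min_heap {
    struct event **p;    /* array of events; stays NULL if the heap
                          * was initialized but never used */
    unsigned n, a;       /* elements used / allocated */
} min_heap_t;

static void min_heap_dtor(min_heap_t *s)
{
    if (NULL != s->p) {  /* previously freed unconditionally */
        free(s->p);
        s->p = NULL;
    }
}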



On Feb 16, 2009, at 7:49 PM, Jeff Squyres (jsquyres) wrote:


Unfortunately, this doesn't fully fix the problem -- I'm still getting
bad frees:

[16:47] svbu-mpi:~/mpi % ./hello
stdout: Hello, world!  I am 0 of 1 (svbu-mpi.cisco.com)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi.cisco.com)
malloc debug: Invalid free (min_heap.h, 58)

[16:48] svbu-mpi:~/mpi % mpirun -np 1 hello
[svbu-mpi001:27549] ** Parsing receive_queues
stdout: Hello, world!  I am 0 of 1 (svbu-mpi001)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi001)
malloc debug: Invalid free (min_heap.h, 58)


On Feb 16, 2009, at 7:20 PM, bosi...@osl.iu.edu wrote:

> Author: bosilca
> Date: 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
> New Revision: 20568
> URL: https://svn.open-mpi.org/trac/ompi/changeset/20568
>
> Log:
> Make sure we correctly unregister all persistent events
> and signal handlers.
>
> Text files modified:
>   trunk/orte/orted/orted_main.c  | 8 ++++++++
>   trunk/orte/runtime/orte_wait.c | 4 ++--
>   2 files changed, 10 insertions(+), 2 deletions(-)
>
> Modified: trunk/orte/orted/orted_main.c
> ==============================================================================
> --- trunk/orte/orted/orted_main.c	(original)
> +++ trunk/orte/orted/orted_main.c	2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
> @@ -754,6 +754,14 @@
>          exit(ORTE_ERROR_DEFAULT_EXIT_CODE);
>      }
>
> +    /* Release all local signal handlers */
> +    opal_event_del(&term_handler);
> +    opal_event_del(&int_handler);
> +#ifndef __WINDOWS__
> +    opal_signal_del(&sigusr1_handler);
> +    opal_signal_del(&sigusr2_handler);
> +#endif  /* __WINDOWS__ */
> +
>      /* Finalize and clean up ourselves */
>      ret = orte_finalize();
>      exit(ret);
>
> Modified: trunk/orte/runtime/orte_wait.c
> ==============================================================================
> --- trunk/orte/runtime/orte_wait.c	(original)
> +++ trunk/orte/runtime/orte_wait.c	2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
> @@ -517,8 +517,8 @@
>      /* define the event to fire when someone writes to the pipe */
>      opal_event_set(*event, p[0], OPAL_EV_READ, cbfunc, NULL);
>
> -   /* Add it to the active events, without a timeout */
> -   opal_event_add(*event, NULL);
> +    /* Add it to the active events, without a timeout */
> +    opal_event_add(*event, NULL);
>
>      /* all done */
>      return ORTE_SUCCESS;


--
Jeff Squyres
Cisco Systems




--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568

2009-02-16 Thread George Bosilca
Based on several man pages, free() is capable of handling a NULL
argument. What is really puzzling is that on your system it doesn't ...


I tried on two systems, a 64-bit Debian and my Mac OS X with all
memory allocator options on, and I'm unable to get such a warning :(
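
(C itself guarantees this: per C99 7.20.3.2, if the pointer passed to
free() is NULL, "no action occurs", so the snippet below is valid and
must not crash -- although an interposing debug allocator is free to
warn about it anyway:)

#include <stdlib.h>

int main(void)
{
    free(NULL);   /* well-defined no-op per the C standard */
    return 0;
}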


  george.

On Feb 16, 2009, at 20:13, Jeff Squyres wrote:


r20569 fixes the problem, but I'm not 100% sure it's the Right Way.

Short version: now that we're guaranteeing to free the event base,  
we're exercising a code path that was never used before.  Apparently  
the orted initializes the ev->timebase min_heap_t structure, but  
then never uses it.  So the pointer to the array of events in the  
heap is still NULL when we get to the destructor.  Previously, the  
destructor just unconditionally freed the array.  I put in a NULL  
check, which avoids the problem.


But it begs the question -- why is that data structure being  
initialized/freed if we're never using it?  Is it something inherent  
in libevent?



On Feb 16, 2009, at 7:49 PM, Jeff Squyres (jsquyres) wrote:

Unfortunately, this doesn't fully fix the problem -- I'm still
getting bad frees:

[16:47] svbu-mpi:~/mpi % ./hello
stdout: Hello, world!  I am 0 of 1 (svbu-mpi.cisco.com)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi.cisco.com)
malloc debug: Invalid free (min_heap.h, 58)

[16:48] svbu-mpi:~/mpi % mpirun -np 1 hello
[svbu-mpi001:27549] ** Parsing receive_queues
stdout: Hello, world!  I am 0 of 1 (svbu-mpi001)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi001)
malloc debug: Invalid free (min_heap.h, 58)


On Feb 16, 2009, at 7:20 PM, bosi...@osl.iu.edu wrote:

> Author: bosilca
> Date: 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
> New Revision: 20568
> URL: https://svn.open-mpi.org/trac/ompi/changeset/20568
>
> Log:
> Make sure we correctly unregister all persistent events
> and signal handlers.
>
> Text files modified:
>   trunk/orte/orted/orted_main.c  | 8 ++++++++
>   trunk/orte/runtime/orte_wait.c | 4 ++--
>   2 files changed, 10 insertions(+), 2 deletions(-)
>
> Modified: trunk/orte/orted/orted_main.c
> ==============================================================================
> --- trunk/orte/orted/orted_main.c	(original)
> +++ trunk/orte/orted/orted_main.c	2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
> @@ -754,6 +754,14 @@
>          exit(ORTE_ERROR_DEFAULT_EXIT_CODE);
>      }
>
> +    /* Release all local signal handlers */
> +    opal_event_del(&term_handler);
> +    opal_event_del(&int_handler);
> +#ifndef __WINDOWS__
> +    opal_signal_del(&sigusr1_handler);
> +    opal_signal_del(&sigusr2_handler);
> +#endif  /* __WINDOWS__ */
> +
>      /* Finalize and clean up ourselves */
>      ret = orte_finalize();
>      exit(ret);
>
> Modified: trunk/orte/runtime/orte_wait.c
> ==============================================================================
> --- trunk/orte/runtime/orte_wait.c	(original)
> +++ trunk/orte/runtime/orte_wait.c	2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
> @@ -517,8 +517,8 @@
>      /* define the event to fire when someone writes to the pipe */
>      opal_event_set(*event, p[0], OPAL_EV_READ, cbfunc, NULL);
>
> -   /* Add it to the active events, without a timeout */
> -   opal_event_add(*event, NULL);
> +    /* Add it to the active events, without a timeout */
> +    opal_event_add(*event, NULL);
>
>      /* all done */
>      return ORTE_SUCCESS;


--
Jeff Squyres
Cisco Systems




--
Jeff Squyres
Cisco Systems
