Re: [OMPI devel] possible bugs and unexpected values in returned error classes
On Thu, Feb 12, 2009 at 10:02 PM, Jeff Squyres wrote:
> On Feb 11, 2009, at 8:24 AM, Lisandro Dalcin wrote:
>
>> Below is a list of things that I got by running the mpi4py testsuite. I
>> never reported them before just because some of them are not actually
>> errors, but I want to raise the discussion anyway.
>>
>> - Likely bugs (regarding my interpretation of the MPI standard)
>>
>> 1) When passed MPI_REQUEST_NULL, MPI_Request_free() does NOT fail.
>>
>> 2) When passed MPI_REQUEST_NULL, MPI_Cancel() does NOT fail.
>>
>> 3) When passed MPI_REQUEST_NULL, MPI_Request_get_status() does NOT fail.
>
> I agree with all of these; I'm not sure why we allowed MPI_REQUEST_NULL. I
> double-checked LAM/MPI -- it errors in all of these cases. So OMPI now
> does, too.
>
>> 4) When passed MPI_WIN_NULL, MPI_Win_get_errhandler() and
>> MPI_Win_set_errhandler() do NOT fail.
>
> I was a little more dubious here; the param-checking code was specifically
> checking for MPI_WIN_NULL and not classifying it as an error. Digging to
> find out why we did that, the best that I can come up with is that it is
> *not* an error to call MPI_File_set|get_errhandler on MPI_FILE_NULL (to
> set the behavior for what happens when FILE_OPEN fails); I'm *guessing*
> that we simply copied the _File_ code to the _Win_ code and forgot to
> remove that extra check.
>
> I can't find anything in MPI-2.1 that says it is legal to call set|get
> errhandler on MPI_WIN_NULL. I checked LAM as well; LAM errors in this
> case. So I made this an error in OMPI as well.
>
> Do you need these in the 1.3 series? Or are you OK waiting for 1.4
> (assuming 1.4 takes significantly less time to release than 1.3 :-) )?
>

I do not have a strong need to get those fixes into the 1.3 series. In
mpi4py, I have a compatibility layer maintained on an
implementation-by-implementation (well, actually just MPICH 1/2, Open MPI,
and LAM) and release-by-release basis that tries to hide those small
discrepancies and bugs in the MPIs out there.

>> - Unexpected error classes (at least for me)
>>
>> 1) When passed MPI_COMM_NULL, MPI_Comm_get_errhandler() fails with
>> MPI_ERR_ARG. I would expect MPI_ERR_COMM.
>
> I don't have a strong feeling on this one; I think you could probably
> argue it either way. That being said, we haven't paid too close attention
> to the error values that we return. Unfortunately, I don't think there's
> much standardization between different MPI implementations, unless they
> share a common code ancestry.
>

You are right. However, IMHO, some agreement between Open MPI and MPICH2
would be great, right :) ? In the end, they are the reference/basis for
other implementations.

>> 2) MPI_Type_free() fails with MPI_ERR_INTERN when passed predefined
>> datatypes like MPI_INT or MPI_FLOAT. I would expect MPI_ERR_TYPE.
>
> Ya, that seems weird. Fixed.
>
>> - Controversial (I'm even fine with the current behavior)
>>
>> 1) MPI_Info_get_nthkey(info, n) returns MPI_ERR_INFO_KEY when "n" is
>> larger than the number of keys. Perhaps MPI_ERR_ARG would be more
>> appropriate? A possible rationale would be that the error is not related
>> to the contents of a 'key' string, but to an out-of-range value for "n".
>
> I don't have a particular opinion on this one.
>
>> That's all. Sorry for being so pedantic :-) and for not offering help
>> with the patches, but I'm really busy.
>
> No worries; this stuff is great. Thanks -- and keep it coming!
> (We usually remember to cite people who submit stuff like this; e.g.,
> https://svn.open-mpi.org/trac/ompi/changeset/20537 and
> https://svn.open-mpi.org/trac/ompi/changeset/20538.)
>

Jeff, once again, many thanks for your fast response, and even more thanks
for fixing the issues!

> --
> Jeff Squyres
> Cisco Systems

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
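The kind of check being discussed is easy to reproduce from a small C
program. The following is a minimal sketch (my own illustration, not code
from the mpi4py testsuite); it assumes the implementation routes errors on a
null request handle through MPI_COMM_WORLD's error handler, which is why
MPI_ERRORS_RETURN is installed there first:

/* Hypothetical standalone check: verify that freeing MPI_REQUEST_NULL is
 * reported as an error once error handlers are set to return. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int err, eclass;
    MPI_Request req = MPI_REQUEST_NULL;

    MPI_Init(&argc, &argv);
    /* Let errors be returned instead of aborting the job. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    err = MPI_Request_free(&req);   /* should fail, not silently succeed */
    if (err != MPI_SUCCESS) {
        MPI_Error_class(err, &eclass);
        printf("MPI_Request_free(MPI_REQUEST_NULL) -> error class %d\n", eclass);
    } else {
        printf("MPI_Request_free(MPI_REQUEST_NULL) returned MPI_SUCCESS\n");
    }

    MPI_Finalize();
    return 0;
}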
Re: [OMPI devel] possible bugs and unexpected values in returned error classes
Just found something new to comment on after diving into the actual sources.

On Thu, Feb 12, 2009 at 10:02 PM, Jeff Squyres wrote:
> On Feb 11, 2009, at 8:24 AM, Lisandro Dalcin wrote:
>>
>> 1) When passed MPI_COMM_NULL, MPI_Comm_get_errhandler() fails with
>> MPI_ERR_ARG. I would expect MPI_ERR_COMM.
>
> I don't have a strong feeling on this one; I think you could probably
> argue it either way. That being said, we haven't paid too close attention
> to the error values that we return. Unfortunately, I don't think there's
> much standardization between different MPI implementations, unless they
> share a common code ancestry.
>

After running my testsuite again and then looking at
"ompi/mpi/c/comm_set_errhandler.c", I noticed that MPI_Comm_set_errhandler()
does return MPI_ERR_COMM when an invalid communicator is passed. IMHO, for
the sake of consistency, you should fix MPI_Comm_get_errhandler() to behave
the same as the setter. Would this rationale be enough?

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
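A similarly small sketch (again mine, not mpi4py code) makes the consistency
argument concrete: hand MPI_COMM_NULL to both the setter and the getter
under MPI_ERRORS_RETURN and compare the error classes. Note the assumption
that an invalid-handle error is returned through MPI_COMM_WORLD's handler
rather than aborting; that detail is implementation-dependent:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int err, cls_set, cls_get;
    MPI_Errhandler eh;

    MPI_Init(&argc, &argv);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    /* Both calls are erroneous; the question is only which class comes back. */
    err = MPI_Comm_set_errhandler(MPI_COMM_NULL, MPI_ERRORS_RETURN);
    MPI_Error_class(err, &cls_set);

    err = MPI_Comm_get_errhandler(MPI_COMM_NULL, &eh);
    MPI_Error_class(err, &cls_get);

    printf("set_errhandler class: %s\n",
           cls_set == MPI_ERR_COMM ? "MPI_ERR_COMM" : "something else");
    printf("get_errhandler class: %s\n",
           cls_get == MPI_ERR_COMM ? "MPI_ERR_COMM" : "something else");

    MPI_Finalize();
    return 0;
}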
[OMPI devel] RFC: New Open MPI release procedure
What: Proposal for a new release methodology for the Open MPI Project.

Why: We have [at least] 2 competing forces in Open MPI:

  - desire to release new features quickly. Fast is good.
  - desire to release based on production quality. Slow is good.

  The competition between these two forces has both created some tension in
  the Open MPI community and created a Very Long release cycle for OMPI v1.3
  (yes, it was our specific and deliberate choice to be feature driven --
  but it was still very long).

How: Take some ideas from other well-established release paradigms, such as:

  - Linux kernel "odd/even" version number release methodology
  - Red Hat/Fedora stable vs. feature releases
  - Agile development models

When: For all releases after the v1.3 series (i.e., this proposal does not
include any releases in the v1.3 series).

--> Ralph and I will talk through all the details and answer any questions
    on tomorrow's teleconference (Tue, 17 Feb 2009).

=======================================================================

Details:

In v1.3, we let a lot of really good features sit in development for a
long, long time. Yes, we chose to do this and there were good reasons for
doing so, but the fact remains that we had some really good stuff done and
stable for long periods of time, but it wasn't generally available to users
who wanted to use it. Even for users who are willing to be out on the
bleeding edge, trunk tarballs are just too scary.

Given the two competing forces mentioned above (feature/fast releases +
stable/slow releases), it seems like we really want two different -- but
overlapping -- release mechanisms. Taking inspiration from other
well-established paradigms, Ralph and I propose the following for all
releases starting with v1.4.0:

- Have two concurrent release series:

  1. "Super stable": for production users who care about stability above
     all else. They're willing to wait long periods of time before updating
     to a new version of Open MPI.

  2. "Feature driven": for users who are willing to take a few chances to
     get new OMPI features -- but cannot endure the chaos of nightly trunk
     tarballs.

- The general idea is that a feature-driven release is developed for a while
  in an RM-regulated yet somewhat agile development style. When specific
  criteria are met (i.e., feature complete, schedule driven, etc.), the
  feature release series is morphed into a super stable state and released.
  At this point, all development stops on that release series; only bug
  fixes are allowed.

- RM's therefore become responsible for *two* release series: a feature
  driven series and the corresponding super stable series that emerges from
  it.

***KEY POINT*** This "two release" methodology allows for the release (and
real-world testing) of new features in a much more timely fashion than our
current release methodology.

Here's a crude ASCII art representation of how branches will work using
this proposal in SVN:

[Branch diagram: the SVN trunk, with the v1.3 super stable branch (v1.3.0,
v1.3.1, v1.3.2, ...) above it and the v1.4 feature driven branch (v1.4.0,
v1.4.1, v1.4.2, v1.4.3, v1.4.4, ...) below it; the v1.4 branch eventually
becomes the v1.5 super stable series (v1.5.0, v1.5.1, ...), and a later
v1.6.0 branch starts the next cycle.]

Here's how a typical release cycle works:

- Assume that a "super stable" version exists; a release series that has an
  odd minor number: v1.3, v1.5, v1.7, ...etc.
  - For this example, let's assume that the super stable is v1.3.
  - Only bug fixes go into the "super stable" series.

- Plans for the next "super stable" are drawn up (v1.5 in this example),
  including a list of goals, new features, a timeline, etc.
- A new feature release series is created shortly after the first super
  stable release with a minor version number that is even (e.g., v1.4,
  v1.6, v1.8, ...etc.).
  - In this example, the feature release series will be v1.4.

- The v1.4 branch is taken to a point with at least some degree of
  stability and released as v1.4.0.

- Development on the SVN trunk continues.

- According to a public schedule (probably tied to our teleconference
  schedule), the RM's will approve the moving of features and bug fixes to
  the feature release.
  - Rather than submitting CMRs for the Gatekeeper to move, the "owner" of
    a particular feature/bug fix will be assigned a specific time to move
    the item to the feature branch.
  - For example, George will have from Tues-Fri to move his cool new
    feature X to the v1.4 branch.

- Friday night, new 1.4 tarballs are cut and everyone's MTT tries them out.

- Iterate for the next week or so to get the v1.4
Re: [OMPI devel] [OMPI svn] svn:open-mpi r20562
This commit seems to have broken the tools. If I use orte-ps then on
finalize I get an abort() with the following stack:

shell$ orte-ps
...
(gdb) bt
#0  0x2bcee155 in raise () from /lib64/libc.so.6
#1  0x2bcefbf0 in abort () from /lib64/libc.so.6
#2  0x2bce75d6 in __assert_fail () from /lib64/libc.so.6
#3  0x2af734e1 in opal_evsignal_dealloc (base=0x609f50) at signal.c:295
#4  0x2af73f36 in poll_dealloc (base=0x609f50, arg=0x60a9a0) at poll.c:390
#5  0x2af70667 in opal_event_base_free (base=0x609f50) at event.c:530
#6  0x2af70519 in opal_event_fini () at event.c:390
#7  0x2af5f624 in opal_finalize () at runtime/opal_finalize.c:117
#8  0x2acd4fc4 in orte_finalize () at runtime/orte_finalize.c:84
#9  0x0040196a in main (argc=1, argv=0x7fffdf38) at orte-ps.c:275

Any thoughts on why this is happening for only the tools case?

-- Josh

On Feb 14, 2009, at 4:51 PM, bosi...@osl.iu.edu wrote:

Author: bosilca
Date: 2009-02-14 16:51:09 EST (Sat, 14 Feb 2009)
New Revision: 20562
URL: https://svn.open-mpi.org/trac/ompi/changeset/20562

Log:
Release the default base on finalize.

Text files modified:
   trunk/opal/event/event.c | 4
1 files changed, 4 insertions(+), 0 deletions(-)

Modified: trunk/opal/event/event.c
==============================================================================
--- trunk/opal/event/event.c	(original)
+++ trunk/opal/event/event.c	2009-02-14 16:51:09 EST (Sat, 14 Feb 2009)
@@ -386,6 +386,10 @@
     if (NULL != opal_event_module_include) {
         opal_argv_free(opal_event_module_include);
     }
+    if( NULL != opal_current_base ) {
+        event_base_free(opal_current_base);
+        opal_current_base = NULL;
+    }
     return OPAL_SUCCESS;
 }
Re: [OMPI devel] RFC: Eliminate ompi/class/ompi_[circular_buffer_]fifo.h
The basic mechanics of this are similar to the problem with the portals BTL
that I fixed. However, in my case, the problem manifested itself with the
Intel test MPI_Send_Fairness_c (and MPI_Isend_Fairness_c) at 60 processes
(the limit that MTT imposes on the Intel tests).

The original code followed the portals design document for MPI pretty well.
When the receiver is overwhelmed, a "reject" entry is used to handle the
excess messages. One of the features of this "reject" entry is that the
receiver (at the BTL level) never interacts with the actual message. The
problem was that the sender did not recognize the return ACK from portals
[in mca_btl_portals_component_progress()] as a failure. So, the sender did
not resend a message that the receiver was expecting. While I fixed it in
the trunk, I had to disable mca_btl_portals_sendi() because there is a
potential for this function to be used with a 0-byte portals message
payload.

For this particular test, https://svn.open-mpi.org/trac/ompi/ticket/1791,
we would not have seen a failure because the root process would not know
that it had missed a message and the non-root processes would not have
diagnosed a need to resend. As corrected, the root process still is FD&H
and the non-root processes will keep re-transmitting until success. Sorry
for boring you about portals.

In the sm case, the non-root processes are continually appending to FIFOs.
Obviously, these blasters can append to the FIFOs much more quickly than
the processes can remove entries:

S7 --> S0
S6 --> S1
S5 --> S2
S4 --> S3

In the first cycle, everyone is busy. In the second cycle, S7, S6, S5, and
S4 are ready for the next reduction, but S3, S2, S1, and S0 are still on
the hook, meaning that the latter FIFOs are going to grow at a faster rate:

S3 --> S0
S2 --> S1

Now, S3 and S2 are ready for the next reduction, but S0 and S1 still have
work left in the current reduction:

S1 --> S0

Since S0 (the root process) takes a little time to finish processing the
reduction, it is going to be a little behind S1. So we end up with the
following timings:

S0: (3+Δ)T
S1: 3T
S2: 2T
S2: 2T
S2: 2T
S2: 2T

-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Eugene Loh
Sent: Friday, February 13, 2009 11:42 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] RFC: Eliminate ompi/class/ompi_[circular_buffer_]fifo.h

George Bosilca wrote:

> I can't confirm or deny. The only thing I can tell is that the same
> test works fine over other BTLs, so this tends either to pinpoint a
> problem in the sm BTL or in a particular path in the PML (the one
> used by the sm BTL). I'll have to dig a little bit more into it, but
> I was hoping to do it in the context of the new sm BTL (just to avoid
> having to do it twice).

Okay. I'll try to get "single queue" put back soon and might look at 1791
along the way. But here is what I wonder. Let's say you have one-way
traffic -- either rank A sending rank B messages without ever any traffic
in the other direction, or repeated MPI_Reduce operations always with the
same root -- and the senders somehow get well ahead of the receiver. Say,
A wants to pump 1,000,000 messages over and B is busy doing something
else. What should happen? What should the PML and BTL do? The conditions
could range from B not being in MPI at all, to B listening to the BTL
without yet having the posted receives to match. Should the connection
become congested and force the sender to wait -- and if so, is this in the
BTL or PML?
Or, should B keep on queueing up the unexpected messages? After some basic
"single queue" putbacks, I'll try to look at the code and understand what
the PML is doing in cases like this.
Re: [OMPI devel] RFC: Eliminate ompi/class/ompi_[circular_buffer_]fifo.h
Sorry about the premature send...

The basic mechanics of this are similar to the problem with the portals BTL
that I fixed. However, in my case, the problem manifested itself with the
Intel test MPI_Send_Fairness_c (and MPI_Isend_Fairness_c) at 60 processes
(the limit that MTT imposes on the Intel tests).

The original code followed the portals design document for MPI pretty well.
When the receiver is overwhelmed, a "reject" entry is used to handle the
excess messages. One of the features of this "reject" entry is that the
receiver (at the BTL level) never interacts with the actual message. The
problem was that the sender did not recognize the return ACK from portals
[in mca_btl_portals_component_progress()] as a failure. So, the sender did
not resend a message that the receiver was expecting. While I fixed it in
the trunk, I had to disable mca_btl_portals_sendi() because there is a
potential for this function to be used with a 0-byte portals message
payload.

For this particular test, https://svn.open-mpi.org/trac/ompi/ticket/1791,
we would not have seen a failure because the root process would not know
that it had missed a message and the non-root processes would not have
diagnosed a need to resend. As corrected, the root process still is FD&H
and the non-root processes will keep re-transmitting until success. Sorry
for boring you about portals.

In the sm case, the non-root processes are continually appending to FIFOs.
Obviously, these blasters can append to the FIFOs much more quickly than
the processes can remove entries:

S7 --> S0
S6 --> S1
S5 --> S2
S4 --> S3

In the first cycle, everyone is busy. In the second cycle, S7, S6, S5, and
S4 are ready for the next reduction, but S3, S2, S1, and S0 are still on
the hook, meaning that the latter FIFOs are going to grow at a faster rate:

S3 --> S0
S2 --> S1

Now, S3 and S2 are ready for the next reduction, but S0 and S1 still have
work left in the current reduction:

S1 --> S0

Since S0 (the root process) takes a little time to finish processing the
reduction, it is going to be a little behind S1. So we end up with the
following timings:

S0: (3+Δ)T
S1: 3T
S2: 2T
S3: 2T
S4: 1T
S5: 1T
S6: 1T
S7: 1T

If sm used a system of ACKs as in portals, we would know when we are
overloading the root process. However, since it does not, and the reduction
itself is non-blocking, we have the potential to exhaust memory. I guess
that the real question is whether the reduction should be blocking or
whether we expect the user to protect himself.

--

-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
Behalf Of Eugene Loh
Sent: Friday, February 13, 2009 11:42 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] RFC: Eliminate ompi/class/ompi_[circular_buffer_]fifo.h

George Bosilca wrote:

> I can't confirm or deny. The only thing I can tell is that the same
> test works fine over other BTLs, so this tends either to pinpoint a
> problem in the sm BTL or in a particular path in the PML (the one
> used by the sm BTL). I'll have to dig a little bit more into it, but
> I was hoping to do it in the context of the new sm BTL (just to avoid
> having to do it twice).

Okay. I'll try to get "single queue" put back soon and might look at 1791
along the way. But here is what I wonder. Let's say you have one-way
traffic -- either rank A sending rank B messages without ever any traffic
in the other direction, or repeated MPI_Reduce operations always with the
same root -- and the senders somehow get well ahead of the receiver.
Say, A wants to pump 1,000,000 messages over and B is busy doing something
else. What should happen? What should the PML and BTL do? The conditions
could range from B not being in MPI at all, to B listening to the BTL
without yet having the posted receives to match. Should the connection
become congested and force the sender to wait -- and if so, is this in the
BTL or PML? Or, should B keep on queueing up the unexpected messages?
After some basic "single queue" putbacks, I'll try to look at the code and
understand what the PML is doing in cases like this.
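To make the scenario concrete, the traffic pattern under discussion is
roughly the following (my own minimal sketch based on the description
above -- it is not one of the Intel or MTT tests): back-to-back reductions
onto a fixed root with no intervening synchronization, so non-root ranks
can run arbitrarily far ahead and keep appending fragments to the root's
shared-memory FIFOs.

#include <mpi.h>

int main(int argc, char **argv)
{
    int i;
    double in = 1.0, out;

    MPI_Init(&argc, &argv);

    /* Nothing throttles the senders: if rank 0 falls slightly behind on
     * each iteration, the other ranks keep posting new reduction
     * fragments, and those fragments accumulate as unexpected messages at
     * the root. */
    for (i = 0; i < 1000000; ++i) {
        MPI_Reduce(&in, &out, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}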
Re: [OMPI devel] [OMPI svn] svn:open-mpi r20562
Josh,

Spending a few minutes to understand could have pointed you to the real
culprit: the tool itself!

The assert in the code states that on finalize there is still a registered
signal handler. A quick gdb session shows that this is for the SIG_CHLD.
Tracking the signal addition in the tool (breakpoint in gdb on
opal_event_queue_insert) clearly highlights the place where this happens,
i.e. orte_wait_init in orte/runtime/orte_wait.c:274. So far so good, we're
right to track the SIG_CHLD, but we're not supposed to leave it there when
we're done (as the signal is registered with the PERSISTENT option).
Leaving ... ah, there is a function to cleanly unregister it, right next to
orte_wait_init, with a very clear name: orte_wait_finalize. Wonderful,
except that in the case of a tool this is never called. Strange, isn't it,
that no other component in the ompi tree exhibits such a behavior. Maybe
grep can help ... There we are:

[bosilca@dancer ompi]$ find . -name "*.c" -exec grep -Hn orte_wait_finalize {} \;
./orte/mca/ess/hnp/ess_hnp_module.c:486:orte_wait_finalize();
./orte/mca/ess/base/ess_base_std_app.c:222:orte_wait_finalize();
./orte/mca/ess/base/ess_base_std_orted.c:310:orte_wait_finalize();
./orte/runtime/orte_wait.c:280:orte_wait_finalize(void)
./orte/runtime/orte_wait.c:872:orte_wait_finalize(void)
./orte/runtime/orte_wait.c:1182:orte_wait_finalize(void)

This clearly shows that, with the exception of the tools, everybody else
clears their state before leaving. And here we are: a quick patch that
really fixes the problem without removing code that had a really good
reason to be there.

Index: orte/mca/ess/base/ess_base_std_tool.c
===================================================================
--- orte/mca/ess/base/ess_base_std_tool.c	(revision 20564)
+++ orte/mca/ess/base/ess_base_std_tool.c	(working copy)
@@ -158,6 +158,8 @@

 int orte_ess_base_tool_finalize(void)
 {
+    orte_wait_finalize();
+
     /* if I am a tool, then all I will have done is
      * a very small subset of orte_init - ensure that
      * I only back those elements out

  george.

On Feb 16, 2009, at 12:57 , Josh Hursey wrote:

This commit seems to have broken the tools. If I use orte-ps then on
finalize I get an abort() with the following stack: [...]
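To make the teardown rule concrete -- delete persistent events before
freeing their event base -- here is a small sketch in terms of stock
libevent 2.x. This is an assumption on my part: the OMPI tree of this era
carries its own older libevent copy behind opal_event_* wrappers, so the
names below do not appear in the tree; the sketch only illustrates the
ordering that calling orte_wait_finalize() restores for the tools.

#include <signal.h>
#include <event2/event.h>

static void on_sigchld(evutil_socket_t sig, short what, void *arg)
{
    (void)sig; (void)what; (void)arg;
    /* reap children here */
}

int main(void)
{
    struct event_base *base = event_base_new();

    /* Persistent signal event, analogous to what orte_wait_init() registers. */
    struct event *chld = evsignal_new(base, SIGCHLD, on_sigchld, NULL);
    event_add(chld, NULL);

    /* ... run the event loop, do work ... */

    /* Teardown: unregister the persistent event (the orte_wait_finalize()
     * step) before releasing the base; freeing the base while the signal
     * event is still registered is what tripped the assert in the tools. */
    event_del(chld);
    event_free(chld);
    event_base_free(base);
    return 0;
}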
Re: [OMPI devel] [OMPI svn] svn:open-mpi r20562
George -- Will you commit?

On Feb 16, 2009, at 2:59 PM, George Bosilca wrote:

Josh,

Spending a few minutes to understand could have pointed you to the real
culprit: the tool itself! [...]
--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] [OMPI svn] svn:open-mpi r20562
Never mind -- you just did.  Thanks!  :-)

On Feb 16, 2009, at 3:07 PM, Jeff Squyres wrote:

George -- Will you commit?

On Feb 16, 2009, at 2:59 PM, George Bosilca wrote:

Josh, [...]
--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] [OMPI svn] svn:open-mpi r20562
Thanks for the fix.

-- Josh

On Feb 16, 2009, at 2:59 PM, George Bosilca wrote:

Josh,

Spending a few minutes to understand could have pointed you to the real
culprit: the tool itself! [...]
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568
Unfortunately, this doesn't fully fix the problem -- I'm still getting bad
frees:

[16:47] svbu-mpi:~/mpi % ./hello
stdout: Hello, world!  I am 0 of 1 (svbu-mpi.cisco.com)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi.cisco.com)
malloc debug: Invalid free (min_heap.h, 58)
[16:48] svbu-mpi:~/mpi % mpirun -np 1 hello
[svbu-mpi001:27549] ** Parsing receive_queues
stdout: Hello, world!  I am 0 of 1 (svbu-mpi001)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi001)
malloc debug: Invalid free (min_heap.h, 58)

On Feb 16, 2009, at 7:20 PM, bosi...@osl.iu.edu wrote:

Author: bosilca
Date: 2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
New Revision: 20568
URL: https://svn.open-mpi.org/trac/ompi/changeset/20568

Log:
Make sure we correctly unregister all persistent events and signal handlers.

Text files modified:
   trunk/orte/orted/orted_main.c  | 8
   trunk/orte/runtime/orte_wait.c | 4 ++--
2 files changed, 10 insertions(+), 2 deletions(-)

Modified: trunk/orte/orted/orted_main.c
==============================================================================
--- trunk/orte/orted/orted_main.c	(original)
+++ trunk/orte/orted/orted_main.c	2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
@@ -754,6 +754,14 @@
         exit(ORTE_ERROR_DEFAULT_EXIT_CODE);
     }

+    /* Release all local signal handlers */
+    opal_event_del(&term_handler);
+    opal_event_del(&int_handler);
+#ifndef __WINDOWS__
+    opal_signal_del(&sigusr1_handler);
+    opal_signal_del(&sigusr2_handler);
+#endif /* __WINDOWS__ */
+
     /* Finalize and clean up ourselves */
     ret = orte_finalize();
     exit(ret);

Modified: trunk/orte/runtime/orte_wait.c
==============================================================================
--- trunk/orte/runtime/orte_wait.c	(original)
+++ trunk/orte/runtime/orte_wait.c	2009-02-16 19:20:05 EST (Mon, 16 Feb 2009)
@@ -517,8 +517,8 @@
     /* define the event to fire when someone writes to the pipe */
     opal_event_set(*event, p[0], OPAL_EV_READ, cbfunc, NULL);

- /* Add it to the active events, without a timeout */
- opal_event_add(*event, NULL);
+    /* Add it to the active events, without a timeout */
+    opal_event_add(*event, NULL);

     /* all done */
     return ORTE_SUCCESS;

--
Jeff Squyres
Cisco Systems
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568
r20569 fixes the problem, but I'm not 100% sure it's the Right Way.

Short version: now that we're guaranteeing to free the event base, we're
exercising a code path that was never used before. Apparently the orted
initializes the ev->timebase min_heap_t structure, but then never uses it.
So the pointer to the array of events in the heap is still NULL when we get
to the destructor. Previously, the destructor just unconditionally freed
the array. I put in a NULL check, which avoids the problem.

But it begs the question -- why is that data structure being
initialized/freed if we're never using it? Is it something inherent in
libevent?

On Feb 16, 2009, at 7:49 PM, Jeff Squyres (jsquyres) wrote:

Unfortunately, this doesn't fully fix the problem -- I'm still getting bad
frees: [...]

--
Jeff Squyres
Cisco Systems
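For readers who have not looked at that destructor, the guard Jeff
describes amounts to something like the following sketch (reconstructed
from his description; the struct fields and function name are simplified
stand-ins, not the actual contents of opal/event/min_heap.h):

#include <stdlib.h>

struct event;                       /* opaque here */

/* Simplified stand-in for libevent's min-heap of timer events. */
typedef struct min_heap {
    struct event **p;               /* array of events; NULL until first push */
    unsigned n;                     /* elements in use */
    unsigned a;                     /* allocated capacity */
} min_heap_t;

static void min_heap_dtor(min_heap_t *s)
{
    /* r20569-style guard: the orted constructs the heap but never pushes
     * an event, so s->p can still be NULL here.  free(NULL) is legal, but
     * some debugging allocators flag it, hence the explicit check. */
    if (NULL != s->p) {
        free(s->p);
    }
    s->p = NULL;
    s->n = s->a = 0;
}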
Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r20568
Based on several man pages, free is capable of handling a NULL argument.
What is really puzzling is that on your system it doesn't ... I tried on
two systems, a 64-bit Debian and my Mac OS X with all memory allocator
options on, and I'm unable to get such a warning :(

  george.

On Feb 16, 2009, at 20:13 , Jeff Squyres wrote:

r20569 fixes the problem, but I'm not 100% sure it's the Right Way.

Short version: now that we're guaranteeing to free the event base, we're
exercising a code path that was never used before. Apparently the orted
initializes the ev->timebase min_heap_t structure, but then never uses it.
So the pointer to the array of events in the heap is still NULL when we get
to the destructor. Previously, the destructor just unconditionally freed
the array. I put in a NULL check, which avoids the problem. [...]
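For reference, the C standard (C99 7.20.3.2) does require free(NULL) to be
a no-op, which this trivial standalone program (my own, not part of the
thread) demonstrates on a conforming libc; instrumented or debugging
allocators are still free to warn about it, which appears to be what the
"malloc debug" output above is doing.

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    char *p = NULL;
    free(p);    /* well-defined no-op per C99 7.20.3.2 */
    printf("free(NULL) returned without complaint\n");
    return 0;
}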