That's a great idea Jeff. I did not know it had made it on the agenda for
that meeting. I would like to attend, and my Friday morning is pretty open
at the moment. With timezones, would a doodle poll be useful here or do you
think we can sort it out via email?

Thanks.

Josh


On Fri, Dec 6, 2013 at 8:36 AM, Jeff Squyres (jsquyres)
<jsquy...@cisco.com>wrote:

> Good points.
>
> You know, this checkpoint stuff is all on the agenda to discuss next week
> at the OMPI dev meeting in Chicago.  Ralph correctly points out that since
> the fundamental design of ORTE has changed (which caused all this FT bit
> rot), a bunch of the original FT design isn't right (or necessary) any
> more, anyway.  We need to talk through this stuff to figure out where to go.
>
> Adrian: do you want to join us at the meeting via webex?  I think you're
> in Germany; we can do this part of the OMPI dev meeting first thing Friday
> morning US Central time, which would put it mid/late-afternoon for you.  It
> would probably be good for us to be introduced to you, and for you to hear
> all the discussion about how we think the FT design will need to be
> changed, etc.
>
>     https://svn.open-mpi.org/trac/ompi/wiki/Dec13Meeting
>
>
>
> On Dec 6, 2013, at 9:30 AM, Josh Hursey <jjhur...@open-mpi.org> wrote:
>
> > Since the blocking semantics are important for correctness of the prior
> code, I would not just replace send_buffer with send_buffer_nb. This makes
> the semantics incorrect, and will make things confusing later when you try
> to sort out prior calls to send_buffer_nb with those that you replaced.
> >
> > As an alternative I would suggest that you "#ifdef 0" out those sections
> of code and leave the send_buffer_nb alternative in a comment. Then leave a
> big TODO comment there for you to go back and fix the semantics - which
> will likely involve just rewriting large sections of that framework. But at
> least you will be able to see what was there before when you try to move it
> to a more nonblocking model.
> >
> > The bkmrk component is subtle, maybe more that it should be. So keeping
> the old blocking interfaces there will probably help quite a bit when you
> get to it later. In that component the blocking calls are critical to
> correctness, so we will need to sort out how to make that more asynchronous
> in our redesign.
> >
> > Other than that modification (#ifdef comments instead of nonblocking
> replacements), I think this patch is fine. As was mentioned previously, we
> will need to go back (after things compile) and figure out a new model for
> this behavior.
> >
> > Thanks!
> > Josh
> >
> >
> >
> > On Wed, Dec 4, 2013 at 9:58 AM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> > Err... upon further thought, I might be totally wrong about emulating
> blocking.  There might be (probably are?) rules/assumptions in the ORTE
> layer (of which I am *not* an expert) that disallow you from [emulating]
> blocking.
> >
> > If that's the case, then there's architectural issues with converting
> from blocking to nonblocking on both the sending and the receiving sides
> that might be a bit thorny to sort out.
> >
> >
> >
> > On Dec 4, 2013, at 10:54 AM, "Jeff Squyres (jsquyres)" <
> jsquy...@cisco.com> wrote:
> >
> > > On Nov 25, 2013, at 9:59 AM, Adrian Reber <adr...@lisas.de> wrote:
> > >
> > >> * Send Non-blocking
> > >> */
> > >> int orte_rml_ftrm_send_nb(orte_process_name_t* peer,
> > >>                          struct iovec* msg,
> > >>                          int count,
> > >>                          orte_rml_tag_t tag,
> > >> -                          int flags,
> > >>                          orte_rml_callback_fn_t cbfunc,
> > >>                          void* cbdata)
> > >> {
> > >>    int ret;
> > >>
> > >>    opal_output_verbose(20, rml_ftrm_output_handle,
> > >> -                        "orte_rml_ftrm: send_nb(%s, %d, %d, %d )",
> > >> -                        ORTE_NAME_PRINT(peer), count, tag, flags);
> > >> +                        "orte_rml_ftrm: send_nb(%s, %d, %d )",
> > >> +                        ORTE_NAME_PRINT(peer), count, tag);
> > >>
> > >>    if( NULL != orte_rml_ftrm_wrapped_module.send_nb ) {
> > >> -        if( ORTE_SUCCESS != (ret =
> orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, flags, cbfunc,
> cbdata) ) ) {
> > >> -            return ret;
> > >> -        }
> > >> -    }
> > >> -
> > >> -    return ORTE_SUCCESS;
> > >> -}
> > >> -
> > >> -/*
> > >> - * Send Buffer
> > >> - */
> > >> -int orte_rml_ftrm_send_buffer(orte_process_name_t* peer,
> > >> -                              opal_buffer_t* buffer,
> > >> -                              orte_rml_tag_t tag,
> > >> -                              int flags)
> > >> -{
> > >> -    int ret;
> > >> -
> > >> -    opal_output_verbose(20, rml_ftrm_output_handle,
> > >> -                        "orte_rml_ftrm: send_buffer(%s, %d, %d )",
> > >> -                        ORTE_NAME_PRINT(peer), tag, flags);
> > >> -
> > >> -    if( NULL != orte_rml_ftrm_wrapped_module.send_buffer ) {
> > >> -        if( ORTE_SUCCESS != (ret =
> orte_rml_ftrm_wrapped_module.send_buffer(peer, buffer, tag, flags) ) ) {
> > >> +        if( ORTE_SUCCESS != (ret =
> orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, cbfunc, cbdata)
> ) ) {
> > >>            return ret;
> > >>        }
> > >>    }
> > >
> > > Similar to my reply about patch 3, I don't think this hunk is correct.
> > >
> > > This routine accepts an iovec and sends it in a non-blocking fashion.
>  I'll bet that the caller frees the iovec upon return from the function
> (because it used to be a blocking send, and freeing it immediately was
> acceptable).
> > >
> > > But now the iovec may well still be in use when this function returns,
> so the caller should *not* free/reuse the iovec until it knows that the
> send has complete.
> > >
> > > It may be more desirable to keep the blocking send function
> orte_rml_ftrm_send_buffer() and emulate blocking by invoking send_nb under
> the covers, but then not returning until the send callback has actually
> been invoked.
> > >
> > > Then the blocking semantics expected by the caller may well be
> acceptable/safe.
> > >
> > > This loses some potential optimizations of asynchronicity, but it may
> be worth it: a) performance in this part of the code isn't too critical,
> and b) blocking semantics are usually simpler and easier to maintain, from
> the caller's perspective.
> > >
> > > This idea may also apply to what I said in reply to patch 3...?
>  (i.e., preserve a blocking send by using the _nb variant under the covers,
> but not returning until the nonblocking variant has actually completed the
> receive).
> > >
> > > Since this is a fairly large change, I didn't look too closely
> throughout the rest of this patch.  I assume that there are a few other
> architectural cases similar to this one.
> > >
> > > --
> > > Jeff Squyres
> > > jsquy...@cisco.com
> > > For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> > >
> > > _______________________________________________
> > > devel mailing list
> > > de...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
> >
> >
> > --
> > Joshua Hursey
> > Assistant Professor of Computer Science
> > University of Wisconsin-La Crosse
> > http://cs.uwlax.edu/~jjhursey
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>



-- 
Joshua Hursey
Assistant Professor of Computer Science
University of Wisconsin-La Crosse
http://cs.uwlax.edu/~jjhursey

Reply via email to