Re: [OMPI devel] [PATCH 2/4] Trying to get the C/R code to compile again. (send_*_nb)

2013-12-06 Thread Jeff Squyres (jsquyres)
Great!

Would any particular time be good for you?  Josh won't be there in person, but 
will be in the same timezone as the meeting (US Central).  I'm assuming you'd 
want a start time like 9am, or 10am US Central at the latest.


On Dec 6, 2013, at 4:12 PM, Adrian Reber  wrote:

> I saw that the meeting would be available via webex and was planning to
> join. So yes, I will be there and it would be great to hear what
> is discussed about the FT design and what needs to be changed.
> 
>   Adrian
> 
> On Fri, Dec 06, 2013 at 02:36:09PM +, Jeff Squyres (jsquyres) wrote:
>> Good points.
>> 
>> You know, this checkpoint stuff is all on the agenda to discuss next week at 
>> the OMPI dev meeting in Chicago.  Ralph correctly points out that since the 
>> fundamental design of ORTE has changed (which caused all this FT bit rot), a 
>> bunch of the original FT design isn't right (or necessary) any more, anyway. 
>>  We need to talk through this stuff to figure out where to go.
>> 
>> Adrian: do you want to join us at the meeting via webex?  I think you're in 
>> Germany; we can do this part of the OMPI dev meeting first thing Friday 
>> morning US Central time, which would put it mid/late-afternoon for you.  It 
>> would probably be good for us to be introduced to you, and for you to hear 
>> all the discussion about how we think the FT design will need to be changed, 
>> etc.
>> 
>>https://svn.open-mpi.org/trac/ompi/wiki/Dec13Meeting
>> 
>> 
>> 
>> On Dec 6, 2013, at 9:30 AM, Josh Hursey  wrote:
>> 
>>> Since the blocking semantics are important for correctness of the prior 
>>> code, I would not just replace send_buffer with send_buffer_nb. This makes 
>>> the semantics incorrect, and will make things confusing later when you try 
>>> to sort out prior calls to send_buffer_nb with those that you replaced.
>>> 
>>> As an alternative I would suggest that you "#ifdef 0" out those sections of 
>>> code and leave the send_buffer_nb alternative in a comment. Then leave a 
>>> big TODO comment there for you to go back and fix the semantics - which 
>>> will likely involve just rewriting large sections of that framework. But at 
>>> least you will be able to see what was there before when you try to move it 
>>> to a more nonblocking model.
>>> 
>>> The bkmrk component is subtle, maybe more that it should be. So keeping the 
>>> old blocking interfaces there will probably help quite a bit when you get 
>>> to it later. In that component the blocking calls are critical to 
>>> correctness, so we will need to sort out how to make that more asynchronous 
>>> in our redesign.
>>> 
>>> Other than that modification (#ifdef comments instead of nonblocking 
>>> replacements), I think this patch is fine. As was mentioned previously, we 
>>> will need to go back (after things compile) and figure out a new model for 
>>> this behavior.
>>> 
>>> Thanks!
>>> Josh
>>> 
>>> 
>>> 
>>> On Wed, Dec 4, 2013 at 9:58 AM, Jeff Squyres (jsquyres) 
>>>  wrote:
>>> Err... upon further thought, I might be totally wrong about emulating 
>>> blocking.  There might be (probably are?) rules/assumptions in the ORTE 
>>> layer (of which I am *not* an expert) that disallow you from [emulating] 
>>> blocking.
>>> 
>>> If that's the case, then there's architectural issues with converting from 
>>> blocking to nonblocking on both the sending and the receiving sides that 
>>> might be a bit thorny to sort out.
>>> 
>>> 
>>> 
>>> On Dec 4, 2013, at 10:54 AM, "Jeff Squyres (jsquyres)"  
>>> wrote:
>>> 
 On Nov 25, 2013, at 9:59 AM, Adrian Reber  wrote:
 
> * Send Non-blocking
> */
> int orte_rml_ftrm_send_nb(orte_process_name_t* peer,
> struct iovec* msg,
> int count,
> orte_rml_tag_t tag,
> -  int flags,
> orte_rml_callback_fn_t cbfunc,
> void* cbdata)
> {
>   int ret;
> 
>   opal_output_verbose(20, rml_ftrm_output_handle,
> -"orte_rml_ftrm: send_nb(%s, %d, %d, %d )",
> -ORTE_NAME_PRINT(peer), count, tag, flags);
> +"orte_rml_ftrm: send_nb(%s, %d, %d )",
> +ORTE_NAME_PRINT(peer), count, tag);
> 
>   if( NULL != orte_rml_ftrm_wrapped_module.send_nb ) {
> -if( ORTE_SUCCESS != (ret = 
> orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, flags, 
> cbfunc, cbdata) ) ) {
> -return ret;
> -}
> -}
> -
> -return ORTE_SUCCESS;
> -}
> -
> -/*
> - * Send Buffer
> - */
> -int orte_rml_ftrm_send_buffer(orte_process_name_t* peer,
> -  opal_buffer_t* buffer,
> -  orte_rml_tag_t 

Re: [OMPI devel] [PATCH 2/4] Trying to get the C/R code to compile again. (send_*_nb)

2013-12-06 Thread Adrian Reber
I saw that the meeting would be available via webex and was planning to
join. So yes, I will be there and it would be great to hear what
is discussed about the FT design and what needs to be changed.

Adrian

On Fri, Dec 06, 2013 at 02:36:09PM +, Jeff Squyres (jsquyres) wrote:
> Good points.
> 
> You know, this checkpoint stuff is all on the agenda to discuss next week at 
> the OMPI dev meeting in Chicago.  Ralph correctly points out that since the 
> fundamental design of ORTE has changed (which caused all this FT bit rot), a 
> bunch of the original FT design isn't right (or necessary) any more, anyway.  
> We need to talk through this stuff to figure out where to go.
> 
> Adrian: do you want to join us at the meeting via webex?  I think you're in 
> Germany; we can do this part of the OMPI dev meeting first thing Friday 
> morning US Central time, which would put it mid/late-afternoon for you.  It 
> would probably be good for us to be introduced to you, and for you to hear 
> all the discussion about how we think the FT design will need to be changed, 
> etc.
> 
> https://svn.open-mpi.org/trac/ompi/wiki/Dec13Meeting
> 
> 
> 
> On Dec 6, 2013, at 9:30 AM, Josh Hursey  wrote:
> 
> > Since the blocking semantics are important for correctness of the prior 
> > code, I would not just replace send_buffer with send_buffer_nb. This makes 
> > the semantics incorrect, and will make things confusing later when you try 
> > to sort out prior calls to send_buffer_nb with those that you replaced.
> > 
> > As an alternative I would suggest that you "#ifdef 0" out those sections of 
> > code and leave the send_buffer_nb alternative in a comment. Then leave a 
> > big TODO comment there for you to go back and fix the semantics - which 
> > will likely involve just rewriting large sections of that framework. But at 
> > least you will be able to see what was there before when you try to move it 
> > to a more nonblocking model.
> > 
> > The bkmrk component is subtle, maybe more that it should be. So keeping the 
> > old blocking interfaces there will probably help quite a bit when you get 
> > to it later. In that component the blocking calls are critical to 
> > correctness, so we will need to sort out how to make that more asynchronous 
> > in our redesign.
> > 
> > Other than that modification (#ifdef comments instead of nonblocking 
> > replacements), I think this patch is fine. As was mentioned previously, we 
> > will need to go back (after things compile) and figure out a new model for 
> > this behavior.
> > 
> > Thanks!
> > Josh
> > 
> > 
> > 
> > On Wed, Dec 4, 2013 at 9:58 AM, Jeff Squyres (jsquyres) 
> >  wrote:
> > Err... upon further thought, I might be totally wrong about emulating 
> > blocking.  There might be (probably are?) rules/assumptions in the ORTE 
> > layer (of which I am *not* an expert) that disallow you from [emulating] 
> > blocking.
> > 
> > If that's the case, then there's architectural issues with converting from 
> > blocking to nonblocking on both the sending and the receiving sides that 
> > might be a bit thorny to sort out.
> > 
> > 
> > 
> > On Dec 4, 2013, at 10:54 AM, "Jeff Squyres (jsquyres)"  
> > wrote:
> > 
> > > On Nov 25, 2013, at 9:59 AM, Adrian Reber  wrote:
> > >
> > >> * Send Non-blocking
> > >> */
> > >> int orte_rml_ftrm_send_nb(orte_process_name_t* peer,
> > >>  struct iovec* msg,
> > >>  int count,
> > >>  orte_rml_tag_t tag,
> > >> -  int flags,
> > >>  orte_rml_callback_fn_t cbfunc,
> > >>  void* cbdata)
> > >> {
> > >>int ret;
> > >>
> > >>opal_output_verbose(20, rml_ftrm_output_handle,
> > >> -"orte_rml_ftrm: send_nb(%s, %d, %d, %d )",
> > >> -ORTE_NAME_PRINT(peer), count, tag, flags);
> > >> +"orte_rml_ftrm: send_nb(%s, %d, %d )",
> > >> +ORTE_NAME_PRINT(peer), count, tag);
> > >>
> > >>if( NULL != orte_rml_ftrm_wrapped_module.send_nb ) {
> > >> -if( ORTE_SUCCESS != (ret = 
> > >> orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, flags, 
> > >> cbfunc, cbdata) ) ) {
> > >> -return ret;
> > >> -}
> > >> -}
> > >> -
> > >> -return ORTE_SUCCESS;
> > >> -}
> > >> -
> > >> -/*
> > >> - * Send Buffer
> > >> - */
> > >> -int orte_rml_ftrm_send_buffer(orte_process_name_t* peer,
> > >> -  opal_buffer_t* buffer,
> > >> -  orte_rml_tag_t tag,
> > >> -  int flags)
> > >> -{
> > >> -int ret;
> > >> -
> > >> -opal_output_verbose(20, rml_ftrm_output_handle,
> > >> -"orte_rml_ftrm: send_buffer(%s, %d, %d )",
> > >> -ORTE_NAME_PRINT(peer), 

Re: [OMPI devel] [PATCH 2/4] Trying to get the C/R code to compile again. (send_*_nb)

2013-12-06 Thread Josh Hursey
That's a great idea Jeff. I did not know it had made it on the agenda for
that meeting. I would like to attend, and my Friday morning is pretty open
at the moment. With timezones, would a doodle poll be useful here or do you
think we can sort it out via email?

Thanks.

Josh


On Fri, Dec 6, 2013 at 8:36 AM, Jeff Squyres (jsquyres)
wrote:

> Good points.
>
> You know, this checkpoint stuff is all on the agenda to discuss next week
> at the OMPI dev meeting in Chicago.  Ralph correctly points out that since
> the fundamental design of ORTE has changed (which caused all this FT bit
> rot), a bunch of the original FT design isn't right (or necessary) any
> more, anyway.  We need to talk through this stuff to figure out where to go.
>
> Adrian: do you want to join us at the meeting via webex?  I think you're
> in Germany; we can do this part of the OMPI dev meeting first thing Friday
> morning US Central time, which would put it mid/late-afternoon for you.  It
> would probably be good for us to be introduced to you, and for you to hear
> all the discussion about how we think the FT design will need to be
> changed, etc.
>
> https://svn.open-mpi.org/trac/ompi/wiki/Dec13Meeting
>
>
>
> On Dec 6, 2013, at 9:30 AM, Josh Hursey  wrote:
>
> > Since the blocking semantics are important for correctness of the prior
> code, I would not just replace send_buffer with send_buffer_nb. This makes
> the semantics incorrect, and will make things confusing later when you try
> to sort out prior calls to send_buffer_nb with those that you replaced.
> >
> > As an alternative I would suggest that you "#ifdef 0" out those sections
> of code and leave the send_buffer_nb alternative in a comment. Then leave a
> big TODO comment there for you to go back and fix the semantics - which
> will likely involve just rewriting large sections of that framework. But at
> least you will be able to see what was there before when you try to move it
> to a more nonblocking model.
> >
> > The bkmrk component is subtle, maybe more that it should be. So keeping
> the old blocking interfaces there will probably help quite a bit when you
> get to it later. In that component the blocking calls are critical to
> correctness, so we will need to sort out how to make that more asynchronous
> in our redesign.
> >
> > Other than that modification (#ifdef comments instead of nonblocking
> replacements), I think this patch is fine. As was mentioned previously, we
> will need to go back (after things compile) and figure out a new model for
> this behavior.
> >
> > Thanks!
> > Josh
> >
> >
> >
> > On Wed, Dec 4, 2013 at 9:58 AM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
> > Err... upon further thought, I might be totally wrong about emulating
> blocking.  There might be (probably are?) rules/assumptions in the ORTE
> layer (of which I am *not* an expert) that disallow you from [emulating]
> blocking.
> >
> > If that's the case, then there's architectural issues with converting
> from blocking to nonblocking on both the sending and the receiving sides
> that might be a bit thorny to sort out.
> >
> >
> >
> > On Dec 4, 2013, at 10:54 AM, "Jeff Squyres (jsquyres)" <
> jsquy...@cisco.com> wrote:
> >
> > > On Nov 25, 2013, at 9:59 AM, Adrian Reber  wrote:
> > >
> > >> * Send Non-blocking
> > >> */
> > >> int orte_rml_ftrm_send_nb(orte_process_name_t* peer,
> > >>  struct iovec* msg,
> > >>  int count,
> > >>  orte_rml_tag_t tag,
> > >> -  int flags,
> > >>  orte_rml_callback_fn_t cbfunc,
> > >>  void* cbdata)
> > >> {
> > >>int ret;
> > >>
> > >>opal_output_verbose(20, rml_ftrm_output_handle,
> > >> -"orte_rml_ftrm: send_nb(%s, %d, %d, %d )",
> > >> -ORTE_NAME_PRINT(peer), count, tag, flags);
> > >> +"orte_rml_ftrm: send_nb(%s, %d, %d )",
> > >> +ORTE_NAME_PRINT(peer), count, tag);
> > >>
> > >>if( NULL != orte_rml_ftrm_wrapped_module.send_nb ) {
> > >> -if( ORTE_SUCCESS != (ret =
> orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, flags, cbfunc,
> cbdata) ) ) {
> > >> -return ret;
> > >> -}
> > >> -}
> > >> -
> > >> -return ORTE_SUCCESS;
> > >> -}
> > >> -
> > >> -/*
> > >> - * Send Buffer
> > >> - */
> > >> -int orte_rml_ftrm_send_buffer(orte_process_name_t* peer,
> > >> -  opal_buffer_t* buffer,
> > >> -  orte_rml_tag_t tag,
> > >> -  int flags)
> > >> -{
> > >> -int ret;
> > >> -
> > >> -opal_output_verbose(20, rml_ftrm_output_handle,
> > >> -"orte_rml_ftrm: send_buffer(%s, %d, %d )",
> > >> -ORTE_NAME_PRINT(peer), tag, flags);
> > >> -
> > >> -if( NULL 

Re: [OMPI devel] [PATCH 2/4] Trying to get the C/R code to compile again. (send_*_nb)

2013-12-06 Thread Jeff Squyres (jsquyres)
Good points.

You know, this checkpoint stuff is all on the agenda to discuss next week at 
the OMPI dev meeting in Chicago.  Ralph correctly points out that since the 
fundamental design of ORTE has changed (which caused all this FT bit rot), a 
bunch of the original FT design isn't right (or necessary) any more, anyway.  
We need to talk through this stuff to figure out where to go.

Adrian: do you want to join us at the meeting via webex?  I think you're in 
Germany; we can do this part of the OMPI dev meeting first thing Friday morning 
US Central time, which would put it mid/late-afternoon for you.  It would 
probably be good for us to be introduced to you, and for you to hear all the 
discussion about how we think the FT design will need to be changed, etc.

https://svn.open-mpi.org/trac/ompi/wiki/Dec13Meeting



On Dec 6, 2013, at 9:30 AM, Josh Hursey  wrote:

> Since the blocking semantics are important for correctness of the prior code, 
> I would not just replace send_buffer with send_buffer_nb. This makes the 
> semantics incorrect, and will make things confusing later when you try to 
> sort out prior calls to send_buffer_nb with those that you replaced.
> 
> As an alternative I would suggest that you "#ifdef 0" out those sections of 
> code and leave the send_buffer_nb alternative in a comment. Then leave a big 
> TODO comment there for you to go back and fix the semantics - which will 
> likely involve just rewriting large sections of that framework. But at least 
> you will be able to see what was there before when you try to move it to a 
> more nonblocking model.
> 
> The bkmrk component is subtle, maybe more that it should be. So keeping the 
> old blocking interfaces there will probably help quite a bit when you get to 
> it later. In that component the blocking calls are critical to correctness, 
> so we will need to sort out how to make that more asynchronous in our 
> redesign.
> 
> Other than that modification (#ifdef comments instead of nonblocking 
> replacements), I think this patch is fine. As was mentioned previously, we 
> will need to go back (after things compile) and figure out a new model for 
> this behavior.
> 
> Thanks!
> Josh
> 
> 
> 
> On Wed, Dec 4, 2013 at 9:58 AM, Jeff Squyres (jsquyres)  
> wrote:
> Err... upon further thought, I might be totally wrong about emulating 
> blocking.  There might be (probably are?) rules/assumptions in the ORTE layer 
> (of which I am *not* an expert) that disallow you from [emulating] blocking.
> 
> If that's the case, then there's architectural issues with converting from 
> blocking to nonblocking on both the sending and the receiving sides that 
> might be a bit thorny to sort out.
> 
> 
> 
> On Dec 4, 2013, at 10:54 AM, "Jeff Squyres (jsquyres)"  
> wrote:
> 
> > On Nov 25, 2013, at 9:59 AM, Adrian Reber  wrote:
> >
> >> * Send Non-blocking
> >> */
> >> int orte_rml_ftrm_send_nb(orte_process_name_t* peer,
> >>  struct iovec* msg,
> >>  int count,
> >>  orte_rml_tag_t tag,
> >> -  int flags,
> >>  orte_rml_callback_fn_t cbfunc,
> >>  void* cbdata)
> >> {
> >>int ret;
> >>
> >>opal_output_verbose(20, rml_ftrm_output_handle,
> >> -"orte_rml_ftrm: send_nb(%s, %d, %d, %d )",
> >> -ORTE_NAME_PRINT(peer), count, tag, flags);
> >> +"orte_rml_ftrm: send_nb(%s, %d, %d )",
> >> +ORTE_NAME_PRINT(peer), count, tag);
> >>
> >>if( NULL != orte_rml_ftrm_wrapped_module.send_nb ) {
> >> -if( ORTE_SUCCESS != (ret = 
> >> orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, flags, cbfunc, 
> >> cbdata) ) ) {
> >> -return ret;
> >> -}
> >> -}
> >> -
> >> -return ORTE_SUCCESS;
> >> -}
> >> -
> >> -/*
> >> - * Send Buffer
> >> - */
> >> -int orte_rml_ftrm_send_buffer(orte_process_name_t* peer,
> >> -  opal_buffer_t* buffer,
> >> -  orte_rml_tag_t tag,
> >> -  int flags)
> >> -{
> >> -int ret;
> >> -
> >> -opal_output_verbose(20, rml_ftrm_output_handle,
> >> -"orte_rml_ftrm: send_buffer(%s, %d, %d )",
> >> -ORTE_NAME_PRINT(peer), tag, flags);
> >> -
> >> -if( NULL != orte_rml_ftrm_wrapped_module.send_buffer ) {
> >> -if( ORTE_SUCCESS != (ret = 
> >> orte_rml_ftrm_wrapped_module.send_buffer(peer, buffer, tag, flags) ) ) {
> >> +if( ORTE_SUCCESS != (ret = 
> >> orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, cbfunc, 
> >> cbdata) ) ) {
> >>return ret;
> >>}
> >>}
> >
> > Similar to my reply about patch 3, I don't think this hunk is correct.
> >
> > This routine accepts an iovec and sends it 

Re: [OMPI devel] [PATCH 2/4] Trying to get the C/R code to compile again. (send_*_nb)

2013-12-06 Thread Josh Hursey
Since the blocking semantics are important for correctness of the prior
code, I would not just replace send_buffer with send_buffer_nb. This makes
the semantics incorrect, and will make things confusing later when you try
to sort out prior calls to send_buffer_nb with those that you replaced.

As an alternative I would suggest that you "#ifdef 0" out those sections of
code and leave the send_buffer_nb alternative in a comment. Then leave a
big TODO comment there for you to go back and fix the semantics - which
will likely involve just rewriting large sections of that framework. But at
least you will be able to see what was there before when you try to move it
to a more nonblocking model.

The bkmrk component is subtle, maybe more that it should be. So keeping the
old blocking interfaces there will probably help quite a bit when you get
to it later. In that component the blocking calls are critical to
correctness, so we will need to sort out how to make that more asynchronous
in our redesign.

Other than that modification (#ifdef comments instead of nonblocking
replacements), I think this patch is fine. As was mentioned previously, we
will need to go back (after things compile) and figure out a new model for
this behavior.

Thanks!
Josh



On Wed, Dec 4, 2013 at 9:58 AM, Jeff Squyres (jsquyres)
wrote:

> Err... upon further thought, I might be totally wrong about emulating
> blocking.  There might be (probably are?) rules/assumptions in the ORTE
> layer (of which I am *not* an expert) that disallow you from [emulating]
> blocking.
>
> If that's the case, then there's architectural issues with converting from
> blocking to nonblocking on both the sending and the receiving sides that
> might be a bit thorny to sort out.
>
>
>
> On Dec 4, 2013, at 10:54 AM, "Jeff Squyres (jsquyres)" 
> wrote:
>
> > On Nov 25, 2013, at 9:59 AM, Adrian Reber  wrote:
> >
> >> * Send Non-blocking
> >> */
> >> int orte_rml_ftrm_send_nb(orte_process_name_t* peer,
> >>  struct iovec* msg,
> >>  int count,
> >>  orte_rml_tag_t tag,
> >> -  int flags,
> >>  orte_rml_callback_fn_t cbfunc,
> >>  void* cbdata)
> >> {
> >>int ret;
> >>
> >>opal_output_verbose(20, rml_ftrm_output_handle,
> >> -"orte_rml_ftrm: send_nb(%s, %d, %d, %d )",
> >> -ORTE_NAME_PRINT(peer), count, tag, flags);
> >> +"orte_rml_ftrm: send_nb(%s, %d, %d )",
> >> +ORTE_NAME_PRINT(peer), count, tag);
> >>
> >>if( NULL != orte_rml_ftrm_wrapped_module.send_nb ) {
> >> -if( ORTE_SUCCESS != (ret =
> orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, flags, cbfunc,
> cbdata) ) ) {
> >> -return ret;
> >> -}
> >> -}
> >> -
> >> -return ORTE_SUCCESS;
> >> -}
> >> -
> >> -/*
> >> - * Send Buffer
> >> - */
> >> -int orte_rml_ftrm_send_buffer(orte_process_name_t* peer,
> >> -  opal_buffer_t* buffer,
> >> -  orte_rml_tag_t tag,
> >> -  int flags)
> >> -{
> >> -int ret;
> >> -
> >> -opal_output_verbose(20, rml_ftrm_output_handle,
> >> -"orte_rml_ftrm: send_buffer(%s, %d, %d )",
> >> -ORTE_NAME_PRINT(peer), tag, flags);
> >> -
> >> -if( NULL != orte_rml_ftrm_wrapped_module.send_buffer ) {
> >> -if( ORTE_SUCCESS != (ret =
> orte_rml_ftrm_wrapped_module.send_buffer(peer, buffer, tag, flags) ) ) {
> >> +if( ORTE_SUCCESS != (ret =
> orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, cbfunc, cbdata)
> ) ) {
> >>return ret;
> >>}
> >>}
> >
> > Similar to my reply about patch 3, I don't think this hunk is correct.
> >
> > This routine accepts an iovec and sends it in a non-blocking fashion.
>  I'll bet that the caller frees the iovec upon return from the function
> (because it used to be a blocking send, and freeing it immediately was
> acceptable).
> >
> > But now the iovec may well still be in use when this function returns,
> so the caller should *not* free/reuse the iovec until it knows that the
> send has complete.
> >
> > It may be more desirable to keep the blocking send function
> orte_rml_ftrm_send_buffer() and emulate blocking by invoking send_nb under
> the covers, but then not returning until the send callback has actually
> been invoked.
> >
> > Then the blocking semantics expected by the caller may well be
> acceptable/safe.
> >
> > This loses some potential optimizations of asynchronicity, but it may be
> worth it: a) performance in this part of the code isn't too critical, and
> b) blocking semantics are usually simpler and easier to maintain, from the
> caller's perspective.
> >
> > This idea may also apply to what I said in reply to 

Re: [OMPI devel] [PATCH 2/4] Trying to get the C/R code to compile again. (send_*_nb)

2013-12-04 Thread Jeff Squyres (jsquyres)
Err... upon further thought, I might be totally wrong about emulating blocking. 
 There might be (probably are?) rules/assumptions in the ORTE layer (of which I 
am *not* an expert) that disallow you from [emulating] blocking.

If that's the case, then there's architectural issues with converting from 
blocking to nonblocking on both the sending and the receiving sides that might 
be a bit thorny to sort out.



On Dec 4, 2013, at 10:54 AM, "Jeff Squyres (jsquyres)"  
wrote:

> On Nov 25, 2013, at 9:59 AM, Adrian Reber  wrote:
> 
>> * Send Non-blocking
>> */
>> int orte_rml_ftrm_send_nb(orte_process_name_t* peer,
>>  struct iovec* msg,
>>  int count,
>>  orte_rml_tag_t tag,
>> -  int flags,
>>  orte_rml_callback_fn_t cbfunc,
>>  void* cbdata)
>> {
>>int ret;
>> 
>>opal_output_verbose(20, rml_ftrm_output_handle,
>> -"orte_rml_ftrm: send_nb(%s, %d, %d, %d )",
>> -ORTE_NAME_PRINT(peer), count, tag, flags);
>> +"orte_rml_ftrm: send_nb(%s, %d, %d )",
>> +ORTE_NAME_PRINT(peer), count, tag);
>> 
>>if( NULL != orte_rml_ftrm_wrapped_module.send_nb ) {
>> -if( ORTE_SUCCESS != (ret = 
>> orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, flags, cbfunc, 
>> cbdata) ) ) {
>> -return ret;
>> -}
>> -}
>> -
>> -return ORTE_SUCCESS;
>> -}
>> -
>> -/*
>> - * Send Buffer
>> - */
>> -int orte_rml_ftrm_send_buffer(orte_process_name_t* peer,
>> -  opal_buffer_t* buffer,
>> -  orte_rml_tag_t tag,
>> -  int flags)
>> -{
>> -int ret;
>> -
>> -opal_output_verbose(20, rml_ftrm_output_handle,
>> -"orte_rml_ftrm: send_buffer(%s, %d, %d )",
>> -ORTE_NAME_PRINT(peer), tag, flags);
>> -
>> -if( NULL != orte_rml_ftrm_wrapped_module.send_buffer ) {
>> -if( ORTE_SUCCESS != (ret = 
>> orte_rml_ftrm_wrapped_module.send_buffer(peer, buffer, tag, flags) ) ) {
>> +if( ORTE_SUCCESS != (ret = 
>> orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, cbfunc, cbdata) 
>> ) ) {
>>return ret;
>>}
>>}
> 
> Similar to my reply about patch 3, I don't think this hunk is correct.
> 
> This routine accepts an iovec and sends it in a non-blocking fashion.  I'll 
> bet that the caller frees the iovec upon return from the function (because it 
> used to be a blocking send, and freeing it immediately was acceptable).
> 
> But now the iovec may well still be in use when this function returns, so the 
> caller should *not* free/reuse the iovec until it knows that the send has 
> complete.
> 
> It may be more desirable to keep the blocking send function 
> orte_rml_ftrm_send_buffer() and emulate blocking by invoking send_nb under 
> the covers, but then not returning until the send callback has actually been 
> invoked.
> 
> Then the blocking semantics expected by the caller may well be 
> acceptable/safe.
> 
> This loses some potential optimizations of asynchronicity, but it may be 
> worth it: a) performance in this part of the code isn't too critical, and b) 
> blocking semantics are usually simpler and easier to maintain, from the 
> caller's perspective.
> 
> This idea may also apply to what I said in reply to patch 3...?  (i.e., 
> preserve a blocking send by using the _nb variant under the covers, but not 
> returning until the nonblocking variant has actually completed the receive).
> 
> Since this is a fairly large change, I didn't look too closely throughout the 
> rest of this patch.  I assume that there are a few other architectural cases 
> similar to this one.
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] [PATCH 2/4] Trying to get the C/R code to compile again. (send_*_nb)

2013-12-04 Thread Jeff Squyres (jsquyres)
On Nov 25, 2013, at 9:59 AM, Adrian Reber  wrote:

>  * Send Non-blocking
>  */
> int orte_rml_ftrm_send_nb(orte_process_name_t* peer,
>   struct iovec* msg,
>   int count,
>   orte_rml_tag_t tag,
> -  int flags,
>   orte_rml_callback_fn_t cbfunc,
>   void* cbdata)
> {
> int ret;
> 
> opal_output_verbose(20, rml_ftrm_output_handle,
> -"orte_rml_ftrm: send_nb(%s, %d, %d, %d )",
> -ORTE_NAME_PRINT(peer), count, tag, flags);
> +"orte_rml_ftrm: send_nb(%s, %d, %d )",
> +ORTE_NAME_PRINT(peer), count, tag);
> 
> if( NULL != orte_rml_ftrm_wrapped_module.send_nb ) {
> -if( ORTE_SUCCESS != (ret = 
> orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, flags, cbfunc, 
> cbdata) ) ) {
> -return ret;
> -}
> -}
> -
> -return ORTE_SUCCESS;
> -}
> -
> -/*
> - * Send Buffer
> - */
> -int orte_rml_ftrm_send_buffer(orte_process_name_t* peer,
> -  opal_buffer_t* buffer,
> -  orte_rml_tag_t tag,
> -  int flags)
> -{
> -int ret;
> -
> -opal_output_verbose(20, rml_ftrm_output_handle,
> -"orte_rml_ftrm: send_buffer(%s, %d, %d )",
> -ORTE_NAME_PRINT(peer), tag, flags);
> -
> -if( NULL != orte_rml_ftrm_wrapped_module.send_buffer ) {
> -if( ORTE_SUCCESS != (ret = 
> orte_rml_ftrm_wrapped_module.send_buffer(peer, buffer, tag, flags) ) ) {
> +if( ORTE_SUCCESS != (ret = 
> orte_rml_ftrm_wrapped_module.send_nb(peer, msg, count, tag, cbfunc, cbdata) ) 
> ) {
> return ret;
> }
> }

Similar to my reply about patch 3, I don't think this hunk is correct.

This routine accepts an iovec and sends it in a non-blocking fashion.  I'll bet 
that the caller frees the iovec upon return from the function (because it used 
to be a blocking send, and freeing it immediately was acceptable).

But now the iovec may well still be in use when this function returns, so the 
caller should *not* free/reuse the iovec until it knows that the send has 
complete.

It may be more desirable to keep the blocking send function 
orte_rml_ftrm_send_buffer() and emulate blocking by invoking send_nb under the 
covers, but then not returning until the send callback has actually been 
invoked.

Then the blocking semantics expected by the caller may well be acceptable/safe.

This loses some potential optimizations of asynchronicity, but it may be worth 
it: a) performance in this part of the code isn't too critical, and b) blocking 
semantics are usually simpler and easier to maintain, from the caller's 
perspective.

This idea may also apply to what I said in reply to patch 3...?  (i.e., 
preserve a blocking send by using the _nb variant under the covers, but not 
returning until the nonblocking variant has actually completed the receive).

Since this is a fairly large change, I didn't look too closely throughout the 
rest of this patch.  I assume that there are a few other architectural cases 
similar to this one.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/