Re: [OMPI devel] Barrier/coll_tuned/pml_ob1 segfault for derived data types

2012-06-15 Thread George Bosilca
There should be no datatype attached to the barrier, so it is normal that you 
see zero values in the convertor.

Something weird is definitely going on. As there is no data to be sent, the 
opal_convertor_set_position function is supposed to trigger the special path, 
mark the convertor as completed, and return successfully. However, this no 
longer seems to be the case: in your backtrace I see the call to 
opal_convertor_set_position_nocheck, which only happens if the test described 
above fails.
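
For reference, the early exit being described looks roughly like this (a 
paraphrase of the guard in opal_convertor.h, written from memory -- field 
names may not be exact):

    /* Paraphrased sketch of opal_convertor_set_position(); not the exact code. */
    static inline int32_t opal_convertor_set_position( opal_convertor_t* convertor,
                                                       size_t* position )
    {
        /* Special path: nothing to move (which includes the zero-byte case).
         * Mark the convertor completed and return without touching the stack. */
        if( convertor->local_size <= *position ) {
            convertor->flags |= CONVERTOR_COMPLETED;
            convertor->bConverted = convertor->local_size;
            *position = convertor->local_size;
            return OPAL_SUCCESS;
        }
        /* Otherwise rebuild the iteration stack -- this is the call that shows
         * up in the backtrace below when the special path is not taken. */
        return opal_convertor_set_position_nocheck( convertor, position );
    }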

I had some doubts about r26597, but I don't have time to look into it until 
Monday. Maybe you can remove it and see if you continue to have the same 
segfault.

  george.

On Jun 15, 2012, at 01:24 , Eugene Loh wrote:

> I see a segfault show up in trunk testing starting with r26598 when tests like
> 
>ibm  collective/struct_gatherv
>intel src/MPI_Type_free_[types|pending_msg]_[f|c]
> 
> are run over openib.  Here is a typical stack trace:
> 
>   opal_convertor_create_stack_at_begining(convertor = 0x689730, sizes), line 
> 404 in "opal_convertor.c"
>   opal_convertor_set_position_nocheck(convertor = 0x689730, position), line 
> 423 in "opal_convertor.c"
>   opal_convertor_set_position(convertor = 0x689730, position = 
> 0x7fffc36e0bf0), line 321 in "opal_convertor.h"
>   mca_pml_ob1_send_request_start_copy(sendreq, bml_btl = 0x6a3ea0, size = 0), 
> line 485 in "pml_ob1_sendreq.c"
>   mca_pml_ob1_send_request_start_btl(sendreq, bml_btl), line 387 in 
> "pml_ob1_sendreq.h"
>   mca_pml_ob1_send_request_start(sendreq = 0x689680), line 458 in 
> "pml_ob1_sendreq.h"
>   mca_pml_ob1_isend(buf = (nil), count = 0, datatype, dst = 2, tag = -16, 
> sendmode = MCA_PML_BASE_SEND_STANDARD, comm, request), line 87 in 
> "pml_ob1_isend.c"
>   ompi_coll_tuned_sendrecv_actual(sendbuf = (nil), scount = 0, sdatatype, 
> dest = 2, stag = -16, recvbuf = (nil), rcount = 0, rdatatype, source = 2, 
> rtag = -16, comm, status = (nil)), line 51 in "coll_tuned_util.c"
>   ompi_coll_tuned_barrier_intra_recursivedoubling(comm, module), line 172 in 
> "coll_tuned_barrier.c"
>   ompi_coll_tuned_barrier_intra_dec_fixed(comm, module), line 207 in 
> "coll_tuned_decision_fixed.c"
>   PMPI_Barrier(comm = 0x5195a0), line 62 in "pbarrier.c"
>   main(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x403219
> 
> The fact that some derived data types were sent before seems to have 
> something to do with it.  I see this sort of problem cropping up in Cisco and 
> Oracle testing.  Up at the level of pml_ob1_send_request_start_copy, at line 
> 485:
> 
>   MCA_PML_OB1_SEND_REQUEST_RESET(sendreq);
> 
> I see
> 
>*sendreq->req_send.req_base.req_convertor.use_desc = {
>length = 0
>used   = 0
>desc   = (nil)
>}
> 
> and I guess that desc=NULL is causing the segfault at opal_convertor.c line 
> 404.
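
(A hypothetical illustration of why desc = NULL would fault there -- the exact 
statement at opal_convertor.c line 404 may differ:)

    /* opal_convertor_create_stack_at_begining() walks the datatype description;
     * with use_desc->desc == NULL the first element access dereferences NULL. */
    const dt_elem_desc_t* pElems = convertor->use_desc->desc;   /* NULL here */
    pStack[0].type = pElems[0].elem.common.type;                /* segfault  */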
> 
> Anyhow, I'm trudging along, but thought I would share at least that much with 
> you helpful folks in case any of this is ringing a bell.




Re: [OMPI devel] RFC: Remove non-standard MPI_MAX_NAME_LEN constant

2012-06-15 Thread George Bosilca
Indeed, MPI_MAX_PORT_NAME is the right constant. A quick check indicates we're 
using the right one, so feel free to remove this little piece of historic LAM 
heritage.

  george.

On Jun 15, 2012, at 00:00 , Jeff Squyres wrote:

> On Jun 14, 2012, at 3:53 PM, Ralph Castain wrote:
> 
>> I believe we use that constant in several places to define a static array 
>> size - you might check to be safe.
> 
> I can't find it used anywhere in the code base other than mpi.h.in.
> 
> It's a non-standard name (i.e., it's not in the MPI spec).   I believe the 
> standard name is MPI_MAX_PORT_NAME (which is OPAL_MAX_PORT_NAME).
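> 
> (For any user code that picked up the non-standard name, the fix is a 
> one-line substitution -- sketch:)
> 
>     char port_name[MPI_MAX_PORT_NAME];   /* instead of: char port_name[MPI_MAX_NAME_LEN]; */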
> 
> 
>> On Jun 14, 2012, at 11:52 AM, Jeff Squyres wrote:
>> 
>>> WHAT: Remove non-standard MPI_MAX_NAME_LEN from mpi.h.
>>> 
>>> WHY: It looks like this was a carryover from LAM/MPI, but it's not in any 
>>> MPI spec.
>>> 
>>> WHERE: mpi.h
>>> 
>>> TIMEOUT: This seems non-controversial, so I'll set the timeout to the 
>>> teleconf next Tuesday: June 19, 2012
>>> 
>>> --
>>> 
>>> More details:
>>> 
>>> MPI_MAX_NAME_LEN is in mpi.h, but *not* in mpif.h, nor the C++ bindings.  
>>> It looks like this is some kind of holdover from LAM/MPI, per the comment 
>>> in mpi.h:
>>> 
>>> #define MPI_MAX_NAME_LEN MPI_MAX_PORT_NAME /* max port name 
>>> length, non-std. (LAM < 6.3b1) */
>>> 
>>> This really should be removed to avoid confusion.
>>> 
>>> If there's any discussion needed, I'm happy to push back the timeout -- I'm 
>>> just assuming that there won't need to be any.
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to: 
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 




Re: [OMPI devel] Modex

2012-06-15 Thread Josh Hursey
(I'm catching up on email from an unanticipated absence - forgive the delay)

Pineapple did not hit a roadblock during the call. It is still on
track. I will start a separate thread for the discussion. As I have
said many (many, many) times, if the pineapple interface needs to
change for OMPI/ORTE/OPAL then we will change it. George's problem (as
best I could tell) was not with the interface, but with pineapple
being a separate project in the tree versus being a framework in OMPI.
But that is a discussion we can have on another thread.

-- Josh

On Wed, Jun 13, 2012 at 9:07 AM, Ralph Castain  wrote:
> ?
>
> I'm talking about how to implement it, not what level holds the interface. 
> Besides, "pineapple" hit a roadblock during the call and is a totally 
> separate discussion.
>
>
> On Jun 13, 2012, at 7:03 AM, Richard Graham wrote:
>
>> I would suggest exposing the modex at the pineapple level, and not tying it to a 
>> particular instance of run-time instantiation.  This decouples the 
>> instantiation from the details of the run-time, and also gives the freedom 
>> to provide different instantiations for different job scenarios.
>>
>> Rich
>>
>> -Original Message-
>> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On 
>> Behalf Of Ralph Castain
>> Sent: Wednesday, June 13, 2012 12:10 AM
>> To: Open MPI Developers
>> Subject: [OMPI devel] Modex
>>
>> George raised something during this morning's call that I wanted to 
>> follow up on, relating to improving our modex operation. I've been playing 
>> with an approach that sounded similar to what he suggested, and perhaps we 
>> could pursue it in accordance with moving the BTL's to OPAL.
>>
>> We currently block on exchange of contact information for the BTL's when we 
>> perform an all-to-all operation we term the "modex". At the end of that 
>> operation, each process constructs a list of information for all processes 
>> in the job, and each process contains the complete BTL contact info for 
>> every process in its modex database. This consumes a significant amount of 
>> memory, especially as we scale to ever larger applications. In addition, the 
>> modex operation itself is one of the largest time consumers during MPI_Init.
>>
>> An alternative approach is for the BTL's to "add proc" only on "first 
>> message" to or from that process - i.e., we would not construct a list of 
>> all procs during MPI_Init, but only add an entry for a process with which we 
>> communicate. The method would go like this:
>>
>> 1. during MPI_Init, each BTL posts its contact info to the local modex
>>
>> 2. the "modex" call in MPI_Init simply sends that data to the local daemon, 
>> which asynchronously executes an all-to-all collective with the other 
>> daemons in the job. At the end of that operation, each daemon holds a 
>> complete modex database for the job. Meantime, the application process 
>> continues to run.
>>
>> 3. we remove the "add_procs" call within MPI_Init, and perhaps can eliminate 
>> the ORTE barrier at the end of MPI_Init. The reason we had that barrier was 
>> to ensure that all procs were ready to communicate before we allowed anyone 
>> to send a message. However, with this method, that may no longer be required.
>>
>> 4. we modify the BTL's so they (a) can receive a message from an unknown 
>> source, adding that source to their local proc list, and (b) when sending a 
>> message to another process, obtain the required contact info from their 
>> local daemon if they don't already have it. Thus, we will see an increased 
>> latency on first message - but we will ONLY store info for processes with 
>> which we actually communicate (thus reducing the memory burden) and will 
>> wireup much faster than we do today.
>>
>> I'm not (yet) that familiar with the details of many of the BTLs, but my 
>> initial review of them didn't see any showstoppers for this approach. If 
>> people think this might work and be an interesting approach, I'd be happy to 
>> help implement a prototype to quantify its behavior.
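
A rough sketch of the lazy, first-message lookup described in step 4 above -- 
every function and helper name here is hypothetical, for illustration only:

    /* Hypothetical sketch: resolve a peer's BTL endpoint on first use. */
    static mca_btl_base_endpoint_t* get_endpoint(ompi_proc_t* proc)
    {
        mca_btl_base_endpoint_t* ep = lookup_cached_endpoint(proc);
        if( NULL == ep ) {
            /* First contact with this peer: ask the local daemon for the
             * peer's contact info (one-time latency hit), then build and
             * cache the endpoint for all later traffic. */
            opal_buffer_t* info = fetch_contact_info_from_daemon(proc);
            ep = create_endpoint_from_contact_info(proc, info);
            cache_endpoint(proc, ep);
        }
        return ep;   /* subsequent sends skip the daemon round-trip */
    }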
>>
>> Ralph



-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey



Re: [OMPI devel] Barrier/coll_tuned/pml_ob1 segfault for derived data types

2012-06-15 Thread Eugene Loh
Backing out r26597 solves my particular test cases.  I'll back it out of 
the trunk as well unless someone has objections.


I like how you say "same segfault."  In certain cases, I just go on to 
different segfaults.  E.g.,


  [2] btl_openib_handle_incoming(openib_btl, ep, frag, byte_len = 20U), 
line 3208 in "btl_openib_component.c"

  [3] handle_wc(device, cq = 0, wc), line 3516 in "btl_openib_component.c"
  [4] poll_device(device, count = 1), line 3654 in "btl_openib_component.c"
  [5] progress_one_device(device), line 3762 in "btl_openib_component.c"
  [6] btl_openib_component_progress(), line 3787 in 
"btl_openib_component.c"

  [7] opal_progress(), line 207 in "opal_progress.c"
  [8] opal_condition_wait(c, m), line 100 in "condition.h"
  [9] ompi_request_default_wait_all(count = 2U, requests, statuses), 
line 281 in "req_wait.c"
  [10] ompi_coll_tuned_sendrecv_actual(sendbuf = (nil), scount = 0, 
sdatatype, dest = 0, stag = -16, recvbuf = (nil), rcount = 0, rdatatype, 
source = 0, rtag = -16, comm, status = (nil)), line 54 in 
"coll_tuned_util.c"
  [11] ompi_coll_tuned_barrier_intra_recursivedoubling(comm, module), 
line 172 in "coll_tuned_barrier.c"
  [12] ompi_coll_tuned_barrier_intra_dec_fixed(comm, module), line 207 
in "coll_tuned_decision_fixed.c"

  [13] PMPI_Barrier(comm = 0x518370), line 62 in "pbarrier.c"

The reg->cbfunc is NULL.  I'm still considering whether that's an 
artifact of how I build that particular case, though.


On 06/15/12 09:44, George Bosilca wrote:

There should be no datatype attached to the barrier, so it is normal that you 
see zero values in the convertor.

Something weird is definitely going on. As there is no data to be sent, the 
opal_convertor_set_position function is supposed to trigger the special path, 
mark the convertor as completed, and return successfully. However, this no 
longer seems to be the case: in your backtrace I see the call to 
opal_convertor_set_position_nocheck, which only happens if the test described 
above fails.

I had some doubts about r26597, but I don't have time to look into it until 
Monday. Maybe you can remove it and see if you continue to have the same 
segfault.

   george.

On Jun 15, 2012, at 01:24 , Eugene Loh wrote:


I see a segfault show up in trunk testing starting with r26598 when tests like

ibm  collective/struct_gatherv
intel src/MPI_Type_free_[types|pending_msg]_[f|c]

are run over openib.  Here is a typical stack trace:

   opal_convertor_create_stack_at_begining(convertor = 0x689730, sizes), line 404 in 
"opal_convertor.c"
   opal_convertor_set_position_nocheck(convertor = 0x689730, position), line 423 in 
"opal_convertor.c"
   opal_convertor_set_position(convertor = 0x689730, position = 0x7fffc36e0bf0), line 321 
in "opal_convertor.h"
   mca_pml_ob1_send_request_start_copy(sendreq, bml_btl = 0x6a3ea0, size = 0), line 485 
in "pml_ob1_sendreq.c"
   mca_pml_ob1_send_request_start_btl(sendreq, bml_btl), line 387 in 
"pml_ob1_sendreq.h"
   mca_pml_ob1_send_request_start(sendreq = 0x689680), line 458 in 
"pml_ob1_sendreq.h"
   mca_pml_ob1_isend(buf = (nil), count = 0, datatype, dst = 2, tag = -16, sendmode = 
MCA_PML_BASE_SEND_STANDARD, comm, request), line 87 in "pml_ob1_isend.c"
   ompi_coll_tuned_sendrecv_actual(sendbuf = (nil), scount = 0, sdatatype, dest = 2, stag 
= -16, recvbuf = (nil), rcount = 0, rdatatype, source = 2, rtag = -16, comm, status = 
(nil)), line 51 in "coll_tuned_util.c"
   ompi_coll_tuned_barrier_intra_recursivedoubling(comm, module), line 172 in 
"coll_tuned_barrier.c"
   ompi_coll_tuned_barrier_intra_dec_fixed(comm, module), line 207 in 
"coll_tuned_decision_fixed.c"
   PMPI_Barrier(comm = 0x5195a0), line 62 in "pbarrier.c"
   main(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x403219

The fact that some derived data types were sent before seems to have something 
to do with it.  I see this sort of problem cropping up in Cisco and Oracle 
testing.  Up at the level of pml_ob1_send_request_start_copy, at line 485:

   MCA_PML_OB1_SEND_REQUEST_RESET(sendreq);

I see

*sendreq->req_send.req_base.req_convertor.use_desc = {
length = 0
used   = 0
desc   = (nil)
}

and I guess that desc=NULL is causing the segfault at opal_convertor.c line 404.

Anyhow, I'm trudging along, but thought I would share at least that much with 
you helpful folks in case any of this is ringing a bell.


Re: [OMPI devel] Barrier/coll_tuned/pml_ob1 segfault for derived data types

2012-06-15 Thread Nathan Hjelm
Seems like either a bug in the convertor code or in setting up the send 
request. r26597 ensures correctness in the case where the BTL's sendi does all 
three of the following: returns an error, changes the convertor, and returns 
a descriptor.

Until we can find the root cause, I pushed a change that protects the reset by 
checking if size > 0.
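
Roughly, the change guards the reset like this (a sketch of the idea, not the 
exact diff):

    /* pml_ob1_sendreq.c, around line 485: only rewind the convertor when there
     * is data to pack; a zero-byte send may have no description to walk. */
    if( size > 0 ) {
        MCA_PML_OB1_SEND_REQUEST_RESET(sendreq);
    }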

Let me know if that works for you.

-Nathan

On Fri, Jun 15, 2012 at 12:34:32PM -0400, Eugene Loh wrote:
> Backing out r26597 solves my particular test cases.  I'll back it
> out of the trunk as well unless someone has objections.
> 
> I like how you say "same segfault."  In certain cases, I just go on
> to different segfaults.  E.g.,
> 
>   [2] btl_openib_handle_incoming(openib_btl, ep, frag, byte_len =
> 20U), line 3208 in "btl_openib_component.c"
>   [3] handle_wc(device, cq = 0, wc), line 3516 in "btl_openib_component.c"
>   [4] poll_device(device, count = 1), line 3654 in "btl_openib_component.c"
>   [5] progress_one_device(device), line 3762 in "btl_openib_component.c"
>   [6] btl_openib_component_progress(), line 3787 in
> "btl_openib_component.c"
>   [7] opal_progress(), line 207 in "opal_progress.c"
>   [8] opal_condition_wait(c, m), line 100 in "condition.h"
>   [9] ompi_request_default_wait_all(count = 2U, requests, statuses),
> line 281 in "req_wait.c"
>   [10] ompi_coll_tuned_sendrecv_actual(sendbuf = (nil), scount = 0,
> sdatatype, dest = 0, stag = -16, recvbuf = (nil), rcount = 0,
> rdatatype, source = 0, rtag = -16, comm, status = (nil)), line 54 in
> "coll_tuned_util.c"
>   [11] ompi_coll_tuned_barrier_intra_recursivedoubling(comm,
> module), line 172 in "coll_tuned_barrier.c"
>   [12] ompi_coll_tuned_barrier_intra_dec_fixed(comm, module), line
> 207 in "coll_tuned_decision_fixed.c"
>   [13] PMPI_Barrier(comm = 0x518370), line 62 in "pbarrier.c"
> 
> The reg->cbfunc is NULL.  I'm still considering whether that's an
> artifact of how I build that particular case, though.
> 
> On 06/15/12 09:44, George Bosilca wrote:
> >There should be no datatype attached to the barrier, so it is normal that you 
> >see zero values in the convertor.
> >
> >Something weird is definitely going on. As there is no data to be sent, 
> >the opal_convertor_set_position function is supposed to trigger the special 
> >path, mark the convertor as completed, and return successfully. However, this 
> >no longer seems to be the case: in your backtrace I see the call to 
> >opal_convertor_set_position_nocheck, which only happens if the test 
> >described above fails.
> >
> >I had some doubts about r26597, but I don't have time to look into it until 
> >Monday. Maybe you can remove it and see if you continue to have the same 
> >segfault.
> >
> >   george.
> >
> >On Jun 15, 2012, at 01:24 , Eugene Loh wrote:
> >
> >>I see a segfault show up in trunk testing starting with r26598 when tests 
> >>like
> >>
> >>ibm  collective/struct_gatherv
> >>intel src/MPI_Type_free_[types|pending_msg]_[f|c]
> >>
> >>are run over openib.  Here is a typical stack trace:
> >>
> >>   opal_convertor_create_stack_at_begining(convertor = 0x689730, sizes), 
> >> line 404 in "opal_convertor.c"
> >>   opal_convertor_set_position_nocheck(convertor = 0x689730, position), 
> >> line 423 in "opal_convertor.c"
> >>   opal_convertor_set_position(convertor = 0x689730, position = 
> >> 0x7fffc36e0bf0), line 321 in "opal_convertor.h"
> >>   mca_pml_ob1_send_request_start_copy(sendreq, bml_btl = 0x6a3ea0, size = 
> >> 0), line 485 in "pml_ob1_sendreq.c"
> >>   mca_pml_ob1_send_request_start_btl(sendreq, bml_btl), line 387 in 
> >> "pml_ob1_sendreq.h"
> >>   mca_pml_ob1_send_request_start(sendreq = 0x689680), line 458 in 
> >> "pml_ob1_sendreq.h"
> >>   mca_pml_ob1_isend(buf = (nil), count = 0, datatype, dst = 2, tag = -16, 
> >> sendmode = MCA_PML_BASE_SEND_STANDARD, comm, request), line 87 in 
> >> "pml_ob1_isend.c"
> >>   ompi_coll_tuned_sendrecv_actual(sendbuf = (nil), scount = 0, sdatatype, 
> >> dest = 2, stag = -16, recvbuf = (nil), rcount = 0, rdatatype, source = 2, 
> >> rtag = -16, comm, status = (nil)), line 51 in "coll_tuned_util.c"
> >>   ompi_coll_tuned_barrier_intra_recursivedoubling(comm, module), line 172 
> >> in "coll_tuned_barrier.c"
> >>   ompi_coll_tuned_barrier_intra_dec_fixed(comm, module), line 207 in 
> >> "coll_tuned_decision_fixed.c"
> >>   PMPI_Barrier(comm = 0x5195a0), line 62 in "pbarrier.c"
> >>   main(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x403219
> >>
> >>The fact that some derived data types were sent before seems to have 
> >>something to do with it.  I see this sort of problem cropping up in Cisco 
> >>and Oracle testing.  Up at the level of pml_ob1_send_request_start_copy, at 
> >>line 485:
> >>
> >>   MCA_PML_OB1_SEND_REQUEST_RESET(sendreq);
> >>
> >>I see
> >>
> >>*sendreq->req_send.req_base.req_convertor.use_desc = {
> >>length = 0
> >>used   = 0
> >>desc   = (nil)
> >>}
> >>
> >>and I guess that desc=NULL is causing the segfault at opal_conver

[OMPI devel] RFC: Pineapple Runtime Interposition Project

2012-06-15 Thread Josh Hursey
What: A Runtime Interposition Project - Codename Pineapple

Why: Define a clear API and semantics for the runtime requirements of the OMPI layer.

When:
 - F June 22, 2012 - Work completed
 - T June 26, 2012 - Discuss on teleconf
 - R June 28, 2012 - Commit to trunk

Where: Trunk (development BitBucket branch below)
  https://bitbucket.org/jjhursey/ompi-pineapple

Attached:
  PDF of slides presented on the June 12, 2012 teleconf. Note that the
timeline was slightly adjusted above (work completed date moved
earlier).


Description: Short Version
--
Define, in an 'rte.h', the interfaces and semantics that the OMPI
layer requires of a runtime environment. Currently this interface
matches the subset of ORTE functionality that is used by the OMPI
layer. Runtime symbols (e.g., orte_ess.proc_get_locality) are isolated
to a framework inside this project to provide linker-level protection
against accidental breakage of the pineapple interposition layer.
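
As a purely hypothetical illustration of the idea (the identifiers below are 
invented for this example and are not the actual pineapple headers):

    /* rte.h -- sketch of an interposition header the OMPI layer programs against. */
    #ifndef OMPI_RTE_H
    #define OMPI_RTE_H

    #include <stdint.h>

    int ompi_rte_init(int* argc, char*** argv);
    int ompi_rte_finalize(void);
    int ompi_rte_abort(int error_code, char* fmt, ...);

    /* A runtime component (ORTE by default) maps these onto its own symbols;
     * e.g. the locality query mentioned above (orte_ess.proc_get_locality)
     * would sit behind a call such as: */
    uint8_t ompi_rte_proc_get_locality(const void* proc_name);

    #endif /* OMPI_RTE_H */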

The interposition project provides researchers working on side
projects above and below the 'rte.h' interface a single location in
the code base to watch for interface and semantic changes that they
need to be concerned about. Researchers working above the pineapple
layer might explore something other than (or in addition to) OMPI
(e.g., Extended OMPI, UPC+OMPI). Researchers working below the
pineapple layer might explore something other than (or in addition to)
ORTE under OMPI (e.g., specialized runtimes for specific
environments).


Description: Other notes

The pineapple interface provides OMPI developers with a runtime API to
program against without requiring detailed knowledge of the layout of
ORTE and its frameworks. In some places in OMPI a single source file
needs to include >5 (up to 12 in one place) different header files to
get all of the necessary symbols. Developers must not only know where
these headers are, but must also understand the differences between
the various frameworks in ORTE to use ORTE. The developer must also be
aware that there are certain APIs and data structure fields that are
not available to the MPI process, so should not be used. The pineapple
project provides an API representing the small subset of ORTE that is
used by OMPI. With this API a developer only needs to look at a single
location in the code base to understand what is provided by the
runtime for use in the OMPI layer.

A similar statement could be made for runtime developers trying to
figure out what the OMPI layer requires from a runtime
environment. Currently they need a deep understanding of the behavior
of ORTE to understand the semantics of various calls to ORTE from the
OMPI layer. Then they must develop a custom patch for the OMPI layer
that extracts the ORTE symbols, and replaces them with their own
symbols. This process is messy, error-prone, and tedious to say the
least. Having a single set of interfaces and semantics will allow such
developers to focus their efforts on supporting the Open MPI community-defined
API, and not necessarily the evolution of the ORTE or OMPI
project internals. This is advantageous when porting Open MPI to an
environment with a full featured runtime already running on the
machine, and for researchers exploring radical runtime designs for
future systems. The pineapple API allows such projects to develop
beside the mainline Open MPI trunk a little more easily than without
the pineapple API.


FAQ:

(1) Why is this a separate project and not a framework of OMPI? or a
framework of ORTE?

After much deliberation between the developers, from a software
engineering perspective, making the pineapple rte.h interface a
separate project was the most flexible solution. So neither the OMPI
layer nor the ORTE layer 'owns' the interface; rather, it is 'owned' by the
Open MPI project primarily to support the interaction between these
two layers.

Consider that if we decided to place the interface in the OMPI layer
as a framework then we would be able to place something other than (or
in addition to) ORTE underneath OMPI, but we would be limited in our
ability to place something other than (or in addition to) OMPI over
ORTE. Alternatively, if we decided to place the rte.h interface in the
ORTE layer then we would be able to place something other than (or in
addition to) OMPI over ORTE, but we would be limited in our ability to
place something other than (or in addition to) ORTE under OMPI.
Defining the interposition layer as a separate project between these
two layers allows maximal flexibility for the project and researchers
working on side branches.


(2) What if another project outside of Open MPI needs interface
changes to the pineapple 'rte.h'?

The rule of thumb is that 'The OMPI/ORTE/OPAL stack is king!'. This
means that the pineapple project should always err on the side of
supporting the OMPI/ORTE/OPAL stack, as that is the flagship product
of the Open MPI project. Interface suggestions are always w

Re: [OMPI devel] Barrier/coll_tuned/pml_ob1 segfault for derived data types

2012-06-15 Thread George Bosilca
On Jun 15, 2012, at 20:59 , Nathan Hjelm wrote:

> Seems like either a bug in the convertor code or in setting up the send 
> request. r26597 ensures correctness in the case where the BTL's sendi does all 
> three of the following: returns an error, changes the convertor, and returns 
> a descriptor.

None of the above. There is a shortcut in the PML that prevents the creation of a 
convertor when the amount of data is zero. This shortcut saves a few tens of 
instructions in the critical path.
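
A simplified illustration of the kind of shortcut being referred to (not the 
exact code, just the general shape of the send-request setup):

    /* When the message carries zero bytes the convertor is never fully
     * prepared, so a later reset/set_position must not assume a usable
     * datatype description. */
    if( 0 == count ) {
        sendreq->req_send.req_bytes_packed = 0;   /* nothing to pack */
    } else {
        opal_convertor_copy_and_prepare_for_send( ompi_proc->proc_convertor,
                                                  &datatype->super, count, buf, 0,
                                                  &sendreq->req_send.req_base.req_convertor );
        opal_convertor_get_packed_size( &sendreq->req_send.req_base.req_convertor,
                                        &sendreq->req_send.req_bytes_packed );
    }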

  george.



> 
> Until we can find the root cause I pushed a change that protects the reset by 
> checking if size > 0.
> 
> Let me know if that works for you.
> 
> -Nathan
> 
> On Fri, Jun 15, 2012 at 12:34:32PM -0400, Eugene Loh wrote:
>> Backing out r26597 solves my particular test cases.  I'll back it
>> out of the trunk as well unless someone has objections.
>> 
>> I like how you say "same segfault."  In certain cases, I just go on
>> to different segfaults.  E.g.,
>> 
>>  [2] btl_openib_handle_incoming(openib_btl, ep, frag, byte_len =
>> 20U), line 3208 in "btl_openib_component.c"
>>  [3] handle_wc(device, cq = 0, wc), line 3516 in "btl_openib_component.c"
>>  [4] poll_device(device, count = 1), line 3654 in "btl_openib_component.c"
>>  [5] progress_one_device(device), line 3762 in "btl_openib_component.c"
>>  [6] btl_openib_component_progress(), line 3787 in
>> "btl_openib_component.c"
>>  [7] opal_progress(), line 207 in "opal_progress.c"
>>  [8] opal_condition_wait(c, m), line 100 in "condition.h"
>>  [9] ompi_request_default_wait_all(count = 2U, requests, statuses),
>> line 281 in "req_wait.c"
>>  [10] ompi_coll_tuned_sendrecv_actual(sendbuf = (nil), scount = 0,
>> sdatatype, dest = 0, stag = -16, recvbuf = (nil), rcount = 0,
>> rdatatype, source = 0, rtag = -16, comm, status = (nil)), line 54 in
>> "coll_tuned_util.c"
>>  [11] ompi_coll_tuned_barrier_intra_recursivedoubling(comm,
>> module), line 172 in "coll_tuned_barrier.c"
>>  [12] ompi_coll_tuned_barrier_intra_dec_fixed(comm, module), line
>> 207 in "coll_tuned_decision_fixed.c"
>>  [13] PMPI_Barrier(comm = 0x518370), line 62 in "pbarrier.c"
>> 
>> The reg->cbfunc is NULL.  I'm still considering whether that's an
>> artifact of how I build that particular case, though.
>> 
>> On 06/15/12 09:44, George Bosilca wrote:
>>> There should be no datatype attached to the barrier, so it is normal that 
>>> you see zero values in the convertor.
>>> 
>>> Something weird is definitely going on. As there is no data to be sent, 
>>> the opal_convertor_set_position function is supposed to trigger the special 
>>> path, mark the convertor as completed, and return successfully. However, 
>>> this no longer seems to be the case: in your backtrace I see the call 
>>> to opal_convertor_set_position_nocheck, which only happens if the test 
>>> described above fails.
>>> 
>>> I had some doubts about r26597, but I don't have time to look into it 
>>> until Monday. Maybe you can remove it and see if you continue to have the 
>>> same segfault.
>>> 
>>>  george.
>>> 
>>> On Jun 15, 2012, at 01:24 , Eugene Loh wrote:
>>> 
 I see a segfault show up in trunk testing starting with r26598 when tests 
 like
 
   ibm  collective/struct_gatherv
   intel src/MPI_Type_free_[types|pending_msg]_[f|c]
 
 are run over openib.  Here is a typical stack trace:
 
  opal_convertor_create_stack_at_begining(convertor = 0x689730, sizes), 
 line 404 in "opal_convertor.c"
  opal_convertor_set_position_nocheck(convertor = 0x689730, position), line 
 423 in "opal_convertor.c"
  opal_convertor_set_position(convertor = 0x689730, position = 
 0x7fffc36e0bf0), line 321 in "opal_convertor.h"
  mca_pml_ob1_send_request_start_copy(sendreq, bml_btl = 0x6a3ea0, size = 
 0), line 485 in "pml_ob1_sendreq.c"
  mca_pml_ob1_send_request_start_btl(sendreq, bml_btl), line 387 in 
 "pml_ob1_sendreq.h"
  mca_pml_ob1_send_request_start(sendreq = 0x689680), line 458 in 
 "pml_ob1_sendreq.h"
  mca_pml_ob1_isend(buf = (nil), count = 0, datatype, dst = 2, tag = -16, 
 sendmode = MCA_PML_BASE_SEND_STANDARD, comm, request), line 87 in 
 "pml_ob1_isend.c"
  ompi_coll_tuned_sendrecv_actual(sendbuf = (nil), scount = 0, sdatatype, 
 dest = 2, stag = -16, recvbuf = (nil), rcount = 0, rdatatype, source = 2, 
 rtag = -16, comm, status = (nil)), line 51 in "coll_tuned_util.c"
  ompi_coll_tuned_barrier_intra_recursivedoubling(comm, module), line 172 
 in "coll_tuned_barrier.c"
  ompi_coll_tuned_barrier_intra_dec_fixed(comm, module), line 207 in 
 "coll_tuned_decision_fixed.c"
  PMPI_Barrier(comm = 0x5195a0), line 62 in "pbarrier.c"
  main(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x403219
 
 The fact that some derived data types were sent before seems to have 
 something to do with it.  I see this sort of problem cropping up in Cisco 
 and Oracle testing.  Up at the level of pml_ob1_send_request_start_co

Re: [OMPI devel] Barrier/coll_tuned/pml_ob1 segfault for derived data types

2012-06-15 Thread Eugene Loh

On 6/15/2012 11:59 AM, Nathan Hjelm wrote:

Until we can find the root cause I pushed a change that protects the reset by 
checking if size > 0.
Let me know if that works for you.

It does.