Re: [OMPI devel] Amateur Guidance

2008-11-03 Thread Eugene Loh




Timothy Hayes wrote:
> I'm a regular OpenMPI user but I'm new to the strange
> world of development and hence this mailing list. I'm currently working
> on a project that involves OpenMPI and I was wondering if I might get
> some guidance and pointers in the right direction.
>
> The problem I'm having is jumping into the
> OpenMPI code. I've read two papers I found on the homepage: "Open MPI:
> Goals, Concept, and Design of a Next Generation MPI Implementation" and
> "TEG: A High-Performance, Scalable, Multi-Network Point-to-Point
> Communications Methodology" which gave me some insight about the MCA,
> PML and PTL. However, I'm finding it quite difficult to get a foothold
> into the codebase and I'm wondering if anyone might be able to point me
> to a guide or some documentation that might help get me started.
>
> I'm very eager to do this project well and
> contribute to the OpenMPI community, and if anyone has some advice or
> pointers I'd really appreciate it.

I'm no expert.  Indeed, I'm quite the opposite.  I started looking at
OMPI a few months ago.  As a newbie, I'd say:

There seem to be no really great docs here for developers.  You just
need to start reading source code, asking questions, stepping through
with a debugger, etc., and immerse yourself for a while ... a few
months?  This is a little frustrating since one of the objectives of
OMPI is to provide a framework in which a researcher should be able to
modify only one component and do something interesting.  Meanwhile,
there is no good description of what the interfaces are among the
various components nor what they all really do.  And, you do kind of
need an understanding of what other pieces are doing and what your
component is supposed to do.  So, instead of just reading up on one
component and writing it, you end up having to study a big body of
source code, reverse engineering a number of its parts, and then try
implementing the piece you're interested in playing with.

I do have a bunch of notes I've accumulated that could theoretically
help someone else who is trying to learn the same things I am.  My
focus has been on the sm BTL, so they might not be 100% of interest to you.
I've walked through and found the code paths of interest to me,
expanded data structures, done some analysis, etc.  I guess I should
try to clean these notes up for other people and share them.  There are
lots of pointers in there to source code so one can look at the notes
and click to see the relevant source code.  These notes are invaluable
to me (and the product of 3 buckets full of blood, sweat, and tears),
but again reflect my own interests.  The pointers to the source code
use OpenGrok -- http://opensolaris.org/os/project/opengrok/ -- but you
may have your own favorite tools.

Main answer: no great docs to look at.  I think I've asked some OMPI
experts and that was basically the answer they gave me.




Re: [OMPI devel] Amateur Guidance

2008-11-03 Thread Shipman, Galen M.
The TEG paper is woefully out of date; we don't use that interface anymore.

Try the following for dated, but more relevant info:

http://www.open-mpi.org/papers/euro-pvmmpi-2006-hpc-protocols

http://www.open-mpi.org/papers/workshop-2006/wed_01_pt2pt.pdf


These cover the point-to-point infrastructure which has changed in some ways
but not dramatically.

There are quite a few other areas covered here:

http://www.open-mpi.org/papers/workshop-2006/


Happy hunting.

- Galen 






[OMPI devel] Error after ompi-restart

2008-11-03 Thread Leonardo Fialho

Hi All,

I think there is an error in the trunk version when trying to
restore a checkpoint.


The function orte_util_decode_pidmap, when it attempts to execute the
following code,


   /* store the data */
   for (i=0; i < num_procs; i++) {
       pmap.node = nodes[i];
       pmap.local_rank = local_rank[i];
       pmap.node_rank = node_rank[i];
       opal_value_array_set_item(procs, i, &pmap);
   }

produces a segmentation fault:

[nodo2:18027] *** Process received signal ***
[nodo2:18027] Signal: Segmentation fault (11)
[nodo2:18027] Signal code: Address not mapped (1)
[nodo2:18027] Failing at address: (nil)

I tried to trace the problem, and I think it occurs at the line

   opal_value_array_set_item(procs, i, &pmap);


Thanks,

--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edifcio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478



Re: [OMPI devel] Amateur Guidance

2008-11-03 Thread Jeff Squyres

On Nov 3, 2008, at 10:39 AM, Eugene Loh wrote:

> Main answer: no great docs to look at.  I think I've asked some OMPI
> experts and that was basically the answer they gave me.


This is unfortunately the current state of the art -- no one has had  
time to write up good docs.


Galen pointed to the new papers -- our main PML these days is  
"ob1" (teg died a long time ago).


PML = Point-to-point Messaging Layer; it's basically the layer that is
right behind MPI_SEND and friends.


The ob1 PML uses BTL modules underneath.  BTL = Byte Transfer Layer;
individual modules that send bytes back and forth over individual
transports (e.g., shared memory, TCP, OpenFabrics, etc.).  There's a
BTL for each of the major transports that we support.  The protocols
that ob1 uses are described nicely in the papers that Galen sent, but
the specific function interfaces are best described only in
ompi/mca/btl/btl.h.


Alternatively, we have a "cm" PML which uses MTL modules underneath.   
MTL = Matching transport layer; it's basically for transports that  
expose very MPI-like interfaces (e.g., elan, tports, PSM, portals,  
MX).  This cm component is extremely thin; it basically provides a  
shim between Open MPI and the underlying transport.


The big difference between cm and ob1 is that ob1 is a progress engine
that tracks multiple transport interfaces (e.g., shared memory, TCP,
OpenFabrics, etc. -- and therefore potentially multiple BTL module
instances), whereas cm is a thin shim that simply translates between
OMPI and the back-end interface -- cm will only use *ONE* MTL module
instance.  Specifically: it is expected that the one MTL module will
do all the progression, striping, or whatever else it wants to do to
move bytes from A to B by itself (with very little or no help from
OMPI's infrastructure).


Does that help some?

--
Jeff Squyres
Cisco Systems