Re: [OMPI devel] Amateur Guidance
Timothy Hayes wrote:
> I'm a regular OpenMPI user but I'm new to the strange world of development
> and hence this mailing list. I'm currently working on a project that involves
> OpenMPI and I was wondering if I might get some guidance and pointers in the
> right direction.
>
> The problem I'm having is jumping into the OpenMPI code. I've read two papers
> I found on the homepage: "Open MPI: Goals, Concept, and Design of a Next
> Generation MPI Implementation" and "TEG: A High-Performance, Scalable,
> Multi-Network Point-to-Point Communications Methodology" which gave me some
> insight about the MCA, PML and PTL. However, I'm finding it quite difficult
> to get a foothold into the codebase and I'm wondering if anyone might be able
> to point me to a guide or some documentation that might help get me started.
>
> I'm very eager to do this project well and contribute to the OpenMPI
> community, and if anyone has some advice or pointers I'd really appreciate
> it.

I'm no expert. Indeed, I'm quite the opposite. I started looking at OMPI a few months ago. As a newbie, I'd say:

There seem to be no really great docs here for developers. You just need to start reading source code, asking questions, stepping through with a debugger, etc., and immerse yourself for a while ... a few months?

This is a little frustrating since one of the objectives of OMPI is to provide a framework in which a researcher should be able to modify only one component and do something interesting. Meanwhile, there is no good description of what the interfaces are among the various components nor what they all really do. And, you do kind of need an understanding of what other pieces are doing and what your component is supposed to do. So, instead of just reading up on one component and writing it, you end up having to study a big body of source code, reverse engineer a number of its parts, and then try implementing the piece you're interested in playing with.
I do have a bunch of notes I've accumulated that could theoretically help someone else who is trying to learn the same things I am. My focus has been on the sm BTL, so they might not be 100% of interest to you. I've walked through and found the code paths of interest to me, expanded data structures, done some analysis, etc. I guess I should try to clean these notes up for other people and share them. There are lots of pointers in there to source code so one can look at the notes and click to see the relevant source code. These notes are invaluable to me (and the product of 3 buckets full of blood, sweat, and tears), but again reflect my own interests. The pointers to the source code use OpenGrok -- http://opensolaris.org/os/project/opengrok/ -- but you may have your own favorite tools.

Main answer: no great docs to look at. I think I've asked some OMPI experts and that was basically the answer they gave me.
Re: [OMPI devel] Amateur Guidance
The TEG paper is woefully out of date; we don't use that interface anymore. Try the following for dated, but more relevant, info:

http://www.open-mpi.org/papers/euro-pvmmpi-2006-hpc-protocols
http://www.open-mpi.org/papers/workshop-2006/wed_01_pt2pt.pdf

These cover the point-to-point infrastructure, which has changed in some ways but not dramatically. There are quite a few other areas covered here:

http://www.open-mpi.org/papers/workshop-2006/

Happy Hunting..

- Galen

On 11/3/08 11:39 AM, "Eugene Loh" wrote:
> Timothy Hayes wrote:
>> The problem I'm having is jumping into the OpenMPI code. I've read two papers
>> I found on the homepage: "Open MPI: Goals, Concept, and Design of a Next
>> Generation MPI Implementation" and "TEG: A High-Performance, Scalable,
>> Multi-Network Point-to-Point Communications Methodology" which gave me some
>> insight about the MCA, PML and PTL. However, I'm finding it quite difficult
>> to get a foothold into the codebase and I'm wondering if anyone might be able
>> to point me to a guide or some documentation that might help get me started.
[OMPI devel] Error after ompi-restart
Hi All,

I think there is an error in the trunk version when trying to restore a checkpoint. When the function orte_util_decode_pidmap attempts to execute the following code

    /* store the data */
    for (i=0; i < num_procs; i++) {
        pmap.node = nodes[i];
        pmap.local_rank = local_rank[i];
        pmap.node_rank = node_rank[i];
        opal_value_array_set_item(procs, i, &pmap);
    }

it produces a segmentation fault:

    [nodo2:18027] *** Process received signal ***
    [nodo2:18027] Signal: Segmentation fault (11)
    [nodo2:18027] Signal code: Address not mapped (1)
    [nodo2:18027] Failing at address: (nil)

I tried to trace the problem and I think that it occurs in the line

    opal_value_array_set_item(procs, i, &pmap);

Thanks,

--
Leonardo Fialho
Computer Architecture and Operating Systems Department - CAOS
Universidad Autonoma de Barcelona - UAB
ETSE, Edifcio Q, QC/3088
http://www.caos.uab.es
Phone: +34-93-581-2888
Fax: +34-93-581-2478
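A failing address of (nil) means some NULL pointer was dereferenced; the report does not say which one. As a rough illustration only, here is a self-contained sketch of a set-item routine that validates its inputs instead of faulting. Everything named toy_* is hypothetical and is NOT the opal_value_array API; the int field types in toy_pmap_t are also an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable value array (fixed capacity here). */
typedef struct {
    unsigned char *items;   /* backing storage, NULL until initialized */
    size_t item_size;       /* size of one element in bytes */
    size_t capacity;        /* number of elements that fit */
} toy_value_array_t;

/* Hypothetical per-process record; field types are assumed. */
typedef struct {
    int node;
    int local_rank;
    int node_rank;
} toy_pmap_t;

/* Allocate zeroed storage for `capacity` items; returns 0 on success. */
int toy_array_init(toy_value_array_t *a, size_t item_size, size_t capacity)
{
    if (a == NULL || item_size == 0) {
        return -1;
    }
    a->items = calloc(capacity, item_size);
    if (a->items == NULL && capacity > 0) {
        return -1;
    }
    a->item_size = item_size;
    a->capacity = capacity;
    return 0;
}

/* Copy one item into slot `index`; returns -1 instead of dereferencing
 * a NULL array, NULL storage, NULL item, or an out-of-range index. */
int toy_array_set_item(toy_value_array_t *a, size_t index, const void *item)
{
    if (a == NULL || a->items == NULL || item == NULL) {
        return -1;
    }
    if (index >= a->capacity) {
        return -1;
    }
    memcpy(a->items + index * a->item_size, item, a->item_size);
    return 0;
}

/* Release the storage and reset the array to an empty state. */
void toy_array_free(toy_value_array_t *a)
{
    if (a != NULL) {
        free(a->items);
        a->items = NULL;
        a->capacity = 0;
    }
}
```

With checks like these, passing an uninitialized or NULL array returns an error code that can be logged, which may be easier to trace than a bare SIGSEGV. Whether procs, nodes, local_rank, or node_rank is the NULL pointer in the reported case would still need to be confirmed with a debugger.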
Re: [OMPI devel] Amateur Guidance
On Nov 3, 2008, at 10:39 AM, Eugene Loh wrote:

> Main answer: no great docs to look at. I think I've asked some OMPI experts
> and that was basically the answer they gave me.

This is unfortunately the current state of the art -- no one has had time to write up good docs.

Galen pointed to the new papers -- our main PML these days is "ob1" (teg died a long time ago).

PML = Point to point messaging layer; it's basically the layer that is right behind MPI_SEND and friends.

The ob1 PML uses BTL modules underneath. BTL = Byte transfer layer; individual modules that send bytes back and forth over individual transports (e.g., shared memory, TCP, openfabrics, etc.). There's a BTL for each of the major transports that we support. The protocols that ob1 uses are described nicely in the papers that Galen sent, but the specific function interfaces are best described only in ompi/mca/btl/btl.h.

Alternatively, we have a "cm" PML which uses MTL modules underneath. MTL = Matching transport layer; it's basically for transports that expose very MPI-like interfaces (e.g., elan, tports, PSM, portals, MX). This cm component is extremely thin; it basically provides a shim between Open MPI and the underlying transport.

The big difference between cm and ob1 is that ob1 is a progress engine that tracks multiple transport interfaces (e.g., shared memory, tcp, openfabrics, etc. -- and therefore potentially multiple BTL module instances) and cm is a thin shim that simply translates between OMPI and the back-end interface -- cm will only use *ONE* MTL module instance. Specifically: it is expected that the one MTL module will do all the progression, striping, or whatever it wants to do to move bytes from A to B by itself (very little/no help at all from OMPI's infrastructure).

Does that help some?

--
Jeff Squyres
Cisco Systems
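To make the ob1/cm contrast concrete, here is a toy model in C. It is only a sketch: every toy_* name is hypothetical, none of this is the real interface in ompi/mca/btl/btl.h, and the even split across modules is a simplification standing in for whatever scheduling ob1 actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte-transfer module: one instance per transport. */
typedef struct toy_btl {
    const char *transport;   /* e.g. "sm", "tcp" */
    size_t bytes_sent;       /* running total, for illustration */
    size_t (*send)(struct toy_btl *self, const void *buf, size_t len);
} toy_btl_t;

/* Hypothetical matching-transport module: exposes an MPI-like send. */
typedef struct toy_mtl {
    const char *transport;   /* e.g. "mx" */
    size_t messages_sent;
    int (*send)(struct toy_mtl *self, int dest, int tag,
                const void *buf, size_t len);
} toy_mtl_t;

/* Toy BTL send: just counts the bytes it was asked to move. */
size_t toy_btl_send(toy_btl_t *self, const void *buf, size_t len)
{
    (void)buf;
    self->bytes_sent += len;
    return len;
}

/* Toy MTL send: the transport handles matching and delivery itself. */
int toy_mtl_send(toy_mtl_t *self, int dest, int tag,
                 const void *buf, size_t len)
{
    (void)dest; (void)tag; (void)buf; (void)len;
    self->messages_sent++;
    return 0;
}

/* ob1-style: the PML does the scheduling and spreads one message
 * over several BTL modules (here: a naive even split). */
size_t toy_ob1_send(toy_btl_t **btls, size_t nbtls,
                    const void *buf, size_t len)
{
    const char *p = buf;
    size_t sent = 0;
    if (nbtls == 0) {
        return 0;
    }
    size_t chunk = (len + nbtls - 1) / nbtls;   /* ceiling division */
    for (size_t i = 0; i < nbtls && sent < len; i++) {
        size_t n = (len - sent < chunk) ? len - sent : chunk;
        sent += btls[i]->send(btls[i], p + sent, n);
    }
    return sent;
}

/* cm-style: a thin shim that forwards the whole message to the
 * single MTL module and lets it do everything else. */
int toy_cm_send(toy_mtl_t *mtl, int dest, int tag,
                const void *buf, size_t len)
{
    return mtl->send(mtl, dest, tag, buf, len);
}
```

In this model, toy_ob1_send called with two BTLs and a 10-byte buffer hands 5 bytes to each module, while toy_cm_send makes exactly one call into its MTL regardless of message size.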