[OMPI devel] how to add a component in the ompi?

2010-03-10 Thread hu yaohui
Hi Jeff & All,
I want to add a new component to Open MPI:
1: I created a directory ~/mca/btl/ht.
2: Then I worked out the basic functions I need to implement,
such as:
mca_btl_ht_add_procs,
mca_btl_ht_del_procs,
mca_btl_ht_alloc,
mca_btl_ht_free,
mca_btl_ht_finalize.
3: After writing these functions, I need to compile them, so I copied the Makefile
from ~/mca/btl/tcp/, where I saw these lines:

 MCA_btl_ALL_COMPONENTS =  self sm elan gm mx ofud openib portals tcp udapl
 MCA_btl_ALL_SUBDIRS =  mca/btl/self mca/btl/sm mca/btl/elan mca/btl/gm mca/btl/mx mca/btl/ofud mca/btl/openib mca/btl/portals mca/btl/tcp mca/btl/udapl

Adding my component to these two lines is apparently not enough.
Can you help me write a correct Makefile for my component under
~/mca/btl/ht?

Thanks & Regards,
Yaohui Hu


Re: [OMPI devel] RFC: increase default AC/AM/LT requirements

2010-03-10 Thread Jeff Squyres
Brian and I chatted about this in person -- he doesn't have too strong of an 
opinion here.

We checked versions shipped in RHEL:

RHEL4: AC 2.59, AM 1.9.2, LT 1.5.6
RHEL5: AC 2.59, AM 1.9.2, LT 1.5.22

Meaning: they're both really ancient.

I personally don't mind forcing developers to have more modern versions because 
we're a fairly small group of people (vs. users).  Does anyone else have an 
opinion here?
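
For reference, the RFC below mentions that there is reasonable m4 these days to
make autogen/configure abort early if the Autotools are too old.  A minimal
sketch of such guards in configure.ac, assuming the proposed minimum versions
(this is not the actual Open MPI configury):

    dnl Abort at autoconf time if Autoconf is older than 2.65
    AC_PREREQ([2.65])
    dnl Abort at libtoolize/aclocal time if Libtool is older than 2.2.6
    LT_PREREQ([2.2.6])
    dnl Automake checks its own minimum version when the Makefile.in files are generated
    AM_INIT_AUTOMAKE([1.11.1 foreign])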


On Feb 25, 2010, at 1:55 PM, Barrett, Brian W wrote:

> I think our last set of minimums was based on being able to use RHEL4 out of 
> the box.  Updating to whatever ships with RHEL5 probably makes sense, but I 
> think that still leaves you at a LT 1.5.x release.  Being higher than that 
> requires new Autotools, which seems like asking for trouble.
> 
> Brian
> 
> On Feb 25, 2010, at 4:47 PM, Jeff Squyres wrote:
> 
> > WHAT: Bump minimum required versions of GNU autotools up to modern 
> > versions.  I suggest the following, but could be talked down a version or 
> > two:
> >  Autoconf: 2.65
> >  Automake: 1.11.1
> >  Libtool: 2.2.6b
> >
> > WHY: Stop carrying patches and workarounds for old versions.
> >
> > WHERE: autogen.sh, make_dist_tarball, various Makefile.am's, configure.ac, 
> > *.m4.
> >
> > WHEN: No real rush.  Somewhere in 1.5.x.
> >
> > TIMEOUT: Friday March 5, 2010
> >
> > 
> >
> > I was debugging a complex Automake timestamp issue yesterday and discovered 
> > that it was caused by the fact that we are patching an old version of 
> > libtool.m4.  It took a little while to figure out both the problem and an 
> > acceptable workaround.  During this process, I noticed that autogen.sh 
> > still carries patches to fix bugs in some *really* old versions of Libtool 
> > (e.g., 1.5.22).  Hence, I am sending this RFC to increase the minimum required 
> > versions.
> >
> > Keep in mind:
> >
> > 1. This ONLY affects developers.  Those who build from tarballs don't even 
> > need to have the Autotools installed.
> > 2. Autotool patches should always be pushed upstream.  We should only 
> > maintain patches for things that have been pushed upstream but have not yet 
> > been released.
> > 3. We already have much more recent Autotools requirements for official 
> > distribution tarballs; see the chart here:
> >
> >http://www.open-mpi.org/svn/building.php
> >
> > Specifically: although official tarballs require recent Autotools, we allow 
> > developers to use much older versions.   Why are we still carrying around 
> > this old kruft?  Does some developer out there have a requirement to use 
> > older Autotools?
> >
> > If not, this RFC proposes to only allow recent versions of the Autotools to 
> > build Open MPI.  I believe there's reasonable m4 these days that can make 
> > autogen/configure/whatever abort early if the versions are not new enough.  
> > This would allow us, at a minimum, to drop some of the libtool patches 
> > we're carrying.  There may be some Makefile.am workarounds that are no 
> > longer necessary, too.
> >
> > There's no real rush on this; if this RFC passes, we can set a concrete, 
> > fixed date some point in the future where we switch over to requiring new 
> > versions.  This should give everyone plenty of time to update if you need 
> > to, etc.
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >
> >
> 
> --
>   Brian W. Barrett
>   Dept. 1423: Scalable System Software
>   Sandia National Laboratories
> 
> 
> 
> 
> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] RFC: Rename --enable-*-threads and ENABLE*THREAD* (take 2)

2010-03-10 Thread George Bosilca
There was way more information in this thread than I was looking for ;) And 
a lot of misunderstandings ...

If we want to allow ORTE to be on its own thread, then we should clearly banish 
the progress_thread from this equation. I would prefer ORTE to be as separated 
from the rest of the MPI library as possible, and therefore avoid most of the 
locks and overheads on the MPI side itself. Moving ORTE (as it only uses TCP 
sockets) onto its own poll loop is the best-looking approach, and this can be 
done easily once we upgrade our libevent to 2.0.
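
As a concrete illustration of that idea, here is a minimal standalone sketch
(hypothetical names, not the actual ORTE/OPAL code) of a second libevent 2.0
event base serviced entirely by its own thread, leaving the main thread alone.
Something like "gcc sketch.c -levent -levent_pthreads -lpthread" should build it:

    /* Sketch only: a dedicated event base polled by its own thread. */
    #include <event2/event.h>
    #include <event2/thread.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/time.h>

    static void on_tick(evutil_socket_t fd, short what, void *arg)
    {
        (void)fd; (void)what; (void)arg;
        printf("progress tick on the dedicated thread\n");
    }

    static void *rte_thread_main(void *arg)
    {
        /* Blocks servicing events until the base is told to stop. */
        event_base_dispatch((struct event_base *)arg);
        return NULL;
    }

    int main(void)
    {
        evthread_use_pthreads();                /* make libevent thread-aware */
        struct event_base *rte_base = event_base_new();
        struct timeval tick = { 1, 0 };

        /* A persistent timer stands in for the TCP sockets ORTE would register. */
        struct event *ev = event_new(rte_base, -1, EV_PERSIST, on_tick, NULL);
        event_add(ev, &tick);

        pthread_t tid;
        pthread_create(&tid, NULL, rte_thread_main, rte_base);

        sleep(3);                               /* the MPI side does its work here */

        event_base_loopbreak(rte_base);         /* ask the dedicated loop to exit */
        pthread_join(tid, NULL);
        event_free(ev);
        event_base_free(rte_base);
        return 0;
    }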

To be honest, the progress thread was not the smartest idea we had. It makes the 
code more complex, and the added benefit is quite small. Again, once we move to 
libevent 2 we will clean up the code and take a more consistent approach.

  george.

On Mar 8, 2010, at 10:11 , Ralph Castain wrote:

> 
> On Mar 8, 2010, at 6:43 AM, Jeff Squyres wrote:
> 
>> On Mar 7, 2010, at 8:13 PM, Ralph Castain wrote:
>> 
 How about calling it --enable-opal-event-progress-thread, or even 
 --enable-open-libevent-progress-thread?
>>> 
>>> Why not add another 100+ characters to the name while we are at it? :-/
>>> 
>> 
>> :-)
>> 
>> I didn't really think the length mattered here, since it's a configure 
>> argument.  There has been a *lot* of confusion over the name of this 
>> particular switch over the past few years, so I'm suggesting that a longer, 
>> more descriptive name might be a little better.  Just my $0.02...
> 
> I honestly don't think that is the source of the confusion. The revised name 
> tells you exactly what that configure option does - it enables a thread at 
> the opal layer that calls opal_progress. Period.
> 
> The confusion is over how that is used within the code, given that opal 
> doesn't have any communication system (as George pointed out). So having an 
> opal progress thread running will cause the event library to tick over, but 
> that does? It isn't directly tied to any existing subsystem, but rather 
> cuts across any of them that are sitting on sockets/file descriptors etc. 
> using the event library.
> 
> If you look at the other progress threads in the system (e.g., openib), 
> you'll find that they don't use the event library to monitor their fd's - 
> they poll them directly. So enabling the opal progress thread doesn't 
> directly affect them.
> 
> So I would say let's leave the name alone, and change it if/when someone 
> figures out how to utilize that capability.
> 
>> 
 The openib BTL can have up to 2 progress threads (!) -- the async verbs 
 event notifier and the RDMA CM agent.  They really should be consolidated. 
  If there's infrastructure to consolidate them via opal or something else, 
 then so much the better...
>>> 
>>> Agreed, though I think that is best done as a separate effort from this RFC.
>>> 
>> 
>> Agreed -- sorry, I wasn't clear.  I wasn't trying to propose that that work 
>> be added to this RFC; I was just trying to mention that there could be a 
>> good use for the work from this RFC if such infrastructure was provided.
>> 
>>> I believe there was a concern over latency if all the BTLs are driven by 
>>> one progress thread that sequentially runs across their respective file 
>>> descriptors, but I may be remembering it incorrectly...
>>> 
>> 
>> 
>> FWIW, I believe the openib progress threads were written the way they 
>> were (i.e., without any opal progress thread support) because, at least in 
>> the current setup, to get the opal progress thread support, you have to turn 
>> on all the heavyweight locks, etc.  These two progress threads now simply 
>> pthread_create() and do minimal locking between the main thread and 
>> themselves, without affecting the rest of the locking machinery in the code 
>> base.
>> 
>> I'm not saying that's perfect (or even good); I'm just saying that that's 
>> the way it currently is.  Indeed, at a minimum, Pasha and I have long talked 
>> about merging these two progress threads into 1.  It would be even better if 
>> we could merge these two progress threads into some other infrastructure.  
>> But it's always been somewhat of a low priority; we've never gotten a round 
>> tuit...
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
> 
> 




Re: [OMPI devel] RFC: Rename --enable-*-threads and ENABLE*THREAD* (take 2)

2010-03-10 Thread Ralph Castain
No problem with me. Why don't we then modify this RFC to simply eliminate the 
--enable-opal-progress-thread option, and leave the rest as-is so we can enable 
the opal thread machinery without enabling MPI thread multiple?

I agree that having an ORTE-level thread makes the most sense, if it is needed.

On Mar 10, 2010, at 10:20 AM, George Bosilca wrote:

> There was way more information in this thread than I was looking for ;) 
> And a lot of misunderstandings ...
> 
> If we want to allow ORTE to be on its own thread, then we should clearly 
> banish the progress_thread from this equation. I would prefer ORTE to be as 
> separated from the rest of the MPI library as possible, and therefore avoid 
> most of the locks and overheads on the MPI side itself. Moving ORTE (as it only 
> uses TCP sockets) onto its own poll loop is the best-looking approach, and this 
> can be done easily once we upgrade our libevent to 2.0.
> 
> To be honest, the progress thread was not the smartest idea we had. It makes 
> the code more complex, and the added benefit is quite small. Again, once we 
> move to libevent 2 we will clean up the code and take a more consistent 
> approach.
> 
>  george.
> 
> On Mar 8, 2010, at 10:11 , Ralph Castain wrote:
> 
>> 
>> On Mar 8, 2010, at 6:43 AM, Jeff Squyres wrote:
>> 
>>> On Mar 7, 2010, at 8:13 PM, Ralph Castain wrote:
>>> 
> How about calling it --enable-opal-event-progress-thread, or even 
> --enable-open-libevent-progress-thread?
 
 Why not add another 100+ characters to the name while we are at it? :-/
 
>>> 
>>> :-)
>>> 
>>> I didn't really think the length mattered here, since it's a configure 
>>> argument.  There has been a *lot* of confusion over the name of this 
>>> particular switch over the past few years, so I'm suggesting that a longer, 
>>> more descriptive name might be a little better.  Just my $0.02...
>> 
>> I honestly don't think that is the source of the confusion. The revised name 
>> tells you exactly what that configure option does - it enables a thread at 
>> the opal layer that calls opal_progress. Period.
>> 
>> The confusion is over how that is used within the code, given that opal 
>> doesn't have any communication system (as George pointed out). So having an 
>> opal progress thread running will cause the event library to tick over, but 
>> what does that accomplish? It isn't directly tied to any existing subsystem, but rather 
>> cuts across any of them that are sitting on sockets/file descriptors etc. 
>> using the event library.
>> 
>> If you look at the other progress threads in the system (e.g., openib), 
>> you'll find that they don't use the event library to monitor their fd's - 
>> they poll them directly. So enabling the opal progress thread doesn't 
>> directly affect them.
>> 
>> So I would say let's leave the name alone, and change it if/when someone 
>> figures out how to utilize that capability.
>> 
>>> 
> The openib BTL can have up to 2 progress threads (!) -- the async verbs 
> event notifier and the RDMA CM agent.  They really should be 
> consolidated.  If there's infrastructure to consolidate them via opal or 
> something else, then so much the better...
 
 Agreed, though I think that is best done as a separate effort from this 
 RFC.
 
>>> 
>>> Agreed -- sorry, I wasn't clear.  I wasn't trying to propose that that work 
>>> be added to this RFC; I was just trying to mention that there could be a 
>>> good use for the work from this RFC if such infrastructure was provided.
>>> 
 I believe there was a concern over latency if all the BTLs are driven by 
 one progress thread that sequentially runs across their respective file 
 descriptors, but I may be remembering it incorrectly...
 
>>> 
>>> 
>>> FWIW, I believe the openib progress threads were written the way they 
>>> were (i.e., without any opal progress thread support) because, at least in 
>>> the current setup, to get the opal progress thread support, you have to 
>>> turn on all the heavyweight locks, etc.  These two progress threads now 
>>> simply pthread_create() and do minimal locking between the main thread and 
>>> themselves, without affecting the rest of the locking machinery in the code 
>>> base.
>>> 
>>> I'm not saying that's perfect (or even good); I'm just saying that that's 
>>> the way it currently is.  Indeed, at a minimum, Pasha and I have long 
>>> talked about merging these two progress threads into 1.  It would be even 
>>> better if we could merge these two progress threads into some other 
>>> infrastructure.  But it's always been somewhat of a low priority; we've 
>>> never gotten a round tuit...
>>> 
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 

Re: [OMPI devel] RFC: increase default AC/AM/LT requirements

2010-03-10 Thread Rainer Keller
Hey Jeff,
As I said, I'd be in favor of updating.
This would get rid of some of the tests in autogen, speeding things up.

Regards,
Rainer

On Wednesday 10 March 2010 12:07:56 pm Jeff Squyres wrote:
> Brian and I chatted about this in person -- he doesn't have too strong of
>  an opinion here.
> 
> We checked versions shipped in RHEL:
> 
> RHEL4: AC 2.59, AM 1.9.2, LT 1.5.6
> RHEL5: AC 2.59, AM 1.9.2, LT 1.5.22
> 
> Meaning: they're both really ancient.
> 
> I personally don't mind forcing developers to have more modern versions
>  because we're a fairly small group of people (vs. users).  Does anyone
>  else have an opinion here?
> 
> On Feb 25, 2010, at 1:55 PM, Barrett, Brian W wrote:
> > I think our last set of minimums was based on being able to use RHEL4 out
> > of the box.  Updating to whatever ships with RHEL5 probably makes sense,
> > but I think that still leaves you at a LT 1.5.x release.  Being higher
> > than that requires new Autotools, which seems like asking for trouble.
> >
> > Brian
> >
> > On Feb 25, 2010, at 4:47 PM, Jeff Squyres wrote:
> > > WHAT: Bump minimum required versions of GNU autotools up to modern
> > > versions.  I suggest the following, but could be talked down a version
> > > or two:
> > >  Autoconf: 2.65
> > >  Automake: 1.11.1
> > >  Libtool: 2.2.6b
> > >
> > > WHY: Stop carrying patches and workarounds for old versions.
> > >
> > > WHERE: autogen.sh, make_dist_tarball, various Makefile.am's,
> > > configure.ac, *.m4.
> > >
> > > WHEN: No real rush.  Somewhere in 1.5.x.
> > >
> > > TIMEOUT: Friday March 5, 2010
> > >
> > > 
> > >
> > > I was debugging a complex Automake timestamp issue yesterday and
> > > discovered that it was caused by the fact that we are patching an old
> > > version of libtool.m4.  It took a little while to figure out both the
> > > problem and an acceptable workaround.  During this process, I noticed
> > > that autogen.sh still carries patches to fix bugs in some *really* old
> > > versions of Libtool (e.g., 1.5.22).  Hence, I am sending this RFC to
> > > increase the minimum required versions.
> > >
> > > Keep in mind:
> > >
> > > 1. This ONLY affects developers.  Those who build from tarballs don't
> > > even need to have the Autotools installed.
> > > 2. Autotool patches should always be pushed upstream.  We should only
> > > maintain patches for things that have been pushed upstream but have not
> > > yet been released.
> > > 3. We already have much more recent Autotools requirements for official
> > > distribution tarballs; see the chart here:
> > >
> > >http://www.open-mpi.org/svn/building.php
> > >
> > > Specifically: although official tarballs require recent Autotools, we
> > > allow developers to use much older versions.   Why are we still
> > > carrying around this old kruft?  Does some developer out there have a
> > > requirement to use older Autotools?
> > >
> > > If not, this RFC proposes to only allow recent versions of the
> > > Autotools to build Open MPI.  I believe there's reasonable m4 these
> > > days that can make autogen/configure/whatever abort early if the
> > > versions are not new enough.  This would allow us, at a minimum, to
> > > drop some of the libtool patches we're carrying.  There may be some
> > > Makefile.am workarounds that are no longer necessary, too.
> > >
> > > There's no real rush on this; if this RFC passes, we can set a
> > > concrete, fixed date some point in the future where we switch over to
> > > requiring new versions.  This should give everyone plenty of time to
> > > update if you need to, etc.
> > >
> > > --
> > > Jeff Squyres
> > > jsquy...@cisco.com
> > > For corporate legal information go to:
> > > http://www.cisco.com/web/about/doing_business/legal/cri/
> > >
> > >
> >
> > --
> >   Brian W. Barrett
> >   Dept. 1423: Scalable System Software
> >   Sandia National Laboratories
> >
> >
> >
> >
> >
> 

-- 

Rainer Keller, PhD  Tel: +1 (865) 241-6293
Oak Ridge National Lab  Fax: +1 (865) 241-4811
PO Box 2008 MS 6164   Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008AIM/Skype: rusraink



Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-03-10 Thread Josh Hursey
Yesterday evening George, Thomas and I discussed some of their concerns about 
this RFC at the MPI Forum meeting. After the discussion, we seemed to be in 
agreement that the RecoS framework is a good idea and the concepts and fixes in 
this RFC should move forward with a couple of notes:

 - They wanted to test the branch a bit more over the next couple of days. Some 
MCA parameters that you will need are at the bottom of this message.

 - Reiterate that this RFC only addresses ORTE stability, not OMPI stability. 
The OMPI stability extension is a second step in this line of work, and 
should/will fit in nicely with the RecoS framework being proposed in this RFC. 
The OMPI layer stability will require a significant amount of work, but the 
RecoS framework will provide the ORTE layer stability that is required as a 
foundation for OMPI layer stability in the future.

 - The purpose of the ErrMgr becomes slightly unclear with the addition of the 
RecoS framework, since both are focused on responding to faults in the system 
(and RecoS, when enabled, overrides most/all of the ErrMgr functionality). 
Should the RecoS framework be merged with the ErrMgr framework to create a new 
ErrMgr interface?

We are trying to decide if we should merge these frameworks, but at this point 
we are interested in hearing how other developers feel about merging the ErrMgr 
and RecoS frameworks, which would change the ErrMgr API. Are there any 
developers out there that are developing ErrMgr components, or are using any 
particular features of the existing ErrMgr framework that they would like to 
see preserved in the next revision? By default, the existing default abort 
behavior of the ErrMgr framework will be preserved, so the user will have to 
'opt-in' to any fault recovery capabilities.

So we are continuing the discussion a bit more off-list, and will return to the 
list with an updated RFC (and possibly a new branch) soon (hopefully end of the 
week/early next week). I would like to briefly discuss this RFC at the Open MPI 
teleconf next Tuesday.

-- Josh

On Feb 26, 2010, at 8:06 AM, Josh Hursey wrote:

> Sounds good to me.
> 
> For those casually following this RFC let me summarize its current state.
> 
> Josh and George (and anyone else who wishes to participate and is attending the 
> forum) will meet sometime at the next MPI Forum meeting (March 8-10). I will 
> post any relevant notes from this meeting back to the list afterwards. So the 
> RFC is on hold pending the outcome of that meeting. For those developers 
> interested in this RFC that will not be able to attend, feel free to continue 
> using this thread for discussion.
> 
> Thanks,
> Josh
> 
> On Feb 26, 2010, at 6:09 AM, George Bosilca wrote:
> 
>> 
>> On Feb 26, 2010, at 01:50 , Josh Hursey wrote:
>> 
>>> Any of those options are fine with me. I was thinking that if you wanted to 
>>> talk sooner, we might be able to help explain our intentions with this 
>>> framework a bit better. I figure that the framework interface will change a 
>>> bit as we all advance and incorporate our various techniques into it. I 
>>> think that the current interface is a good first step, but there are 
>>> certainly many more steps to come.
>>> 
>>> I am fine delaying this code a bit, just not too long. Meeting at the forum 
>>> for a while might be a good option (we could probably even arrange to call 
>>> in others if you wanted).
>> 
>> Sounds good, let's do this.
>> 
>> Thanks,
>>   george.
>> 
>>> 
>>> Cheers,
>>> Josh
>>> 
>>> On Feb 25, 2010, at 6:45 PM, Ralph Castain wrote:
>>> 
 If Josh is going to be at the forum, perhaps you folks could chat there? 
 Might as well take advantage of being colocated, if possible.
 
 Otherwise, I'm available pretty much any time. I can't contribute much 
 about the MPI recovery issues, but can contribute to the RTE issues if 
 that helps.
 
 
 On Thu, Feb 25, 2010 at 7:39 PM, George Bosilca  
 wrote:
 Josh,
 
 Next week is a little bit too early, as we will need some time to figure out 
 how to integrate with this new framework, and to what extent our code and 
 requirements fit in. Then the week after is the MPI Forum. How about on 
 Thursday 11 March?
 
 Thanks,
 george.
 
 On Feb 25, 2010, at 12:46 , Josh Hursey wrote:
 
> Per my previous suggestion, would it be useful to chat on the phone early 
> next week about our various strategies?
 
 
>>> 
>>> 
>> 
>> 

Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-03-10 Thread Wesley Bland
Josh,

You mentioned some MCA parameters that you would include in the email, but I
don't see those parameters anywhere.  Could you please post them here to
make testing easier for people?

Wesley

On Wed, Mar 10, 2010 at 1:26 PM, Josh Hursey  wrote:

> Yesterday evening George, Thomas and I discussed some of their concerns
> about this RFC at the MPI Forum meeting. After the discussion, we seemed to
> be in agreement that the RecoS framework is a good idea and the concepts and
> fixes in this RFC should move forward with a couple of notes:
>
>  - They wanted to test the branch a bit more over the next couple of days.
> Some MCA parameters that you will need are at the bottom of this message.
>
>  - Reiterate that this RFC only addresses ORTE stability, not OMPI
> stability. The OMPI stability extension is a second step for the line of
> work, and should/will fit in nicely with the RecoS framework being proposed
> in this RFC. The OMPI layer stability will require a significant amount of
> work, but the RecoS framework will provide the ORTE layer stability that is
> required as a foundation for OMPI layer stability in the future.
>
>  - The purpose of the ErrMgr becomes slightly unclear with the addition of
> the RecoS framework, since both are focused on responding to faults in the
> system (and RecoS, when enabled, overrides most/all of the ErrMgr
> functionality). Should the RecoS framework be merged with the ErrMgr
> framework to create a new ErrMgr interface?
>
> We are trying to decide if we should merge these frameworks, but at this
> point we are interested in hearing how other developers feel about merging
> the ErrMgr and RecoS frameworks, which would change the ErrMgr API. Are
> there any developers out there that are developing ErrMgr components, or are
> using any particular features of the existing ErrMgr framework that they
> would like to see preserved in the next revision? By default, the existing
> default abort behavior of the ErrMgr framework will be preserved, so the
> user will have to 'opt-in' to any fault recovery capabilities.
>
> So we are continuing the discussion a bit more off-list, and will return to
> the list with an updated RFC (and possibly a new branch) soon (hopefully end
> of the week/early next week). I would like to briefly discuss this RFC at
> the Open MPI teleconf next Tuesday.
>
> -- Josh
>
> On Feb 26, 2010, at 8:06 AM, Josh Hursey wrote:
>
> > Sounds good to me.
> >
> > For those casually following this RFC let me summarize its current state.
> >
> > Josh and George (and anyone else who wishes to participate and is attending the
> forum) will meet sometime at the next MPI Forum meeting (March 8-10). I will
> post any relevant notes from this meeting back to the list afterwards. So
> the RFC is on hold pending the outcome of that meeting. For those developers
> interested in this RFC that will not be able to attend, feel free to
> continue using this thread for discussion.
> >
> > Thanks,
> > Josh
> >
> > On Feb 26, 2010, at 6:09 AM, George Bosilca wrote:
> >
> >>
> >> On Feb 26, 2010, at 01:50 , Josh Hursey wrote:
> >>
> >>> Any of those options are fine with me. I was thinking that if you
> wanted to talk sooner, we might be able to help explain our intentions with
> this framework a bit better. I figure that the framework interface will
> change a bit as we all advance and incorporate our various techniques into
> it. I think that the current interface is a good first step, but there are
> certainly many more steps to come.
> >>>
> >>> I am fine delaying this code a bit, just not too long. Meeting at the
> forum for a while might be a good option (we could probably even arrange to
> call in others if you wanted).
> >>
> >> Sounds good, let's do this.
> >>
> >> Thanks,
> >>   george.
> >>
> >>>
> >>> Cheers,
> >>> Josh
> >>>
> >>> On Feb 25, 2010, at 6:45 PM, Ralph Castain wrote:
> >>>
>  If Josh is going to be at the forum, perhaps you folks could chat
> there? Might as well take advantage of being colocated, if possible.
> 
>  Otherwise, I'm available pretty much any time. I can't contribute much
> about the MPI recovery issues, but can contribute to the RTE issues if that
> helps.
> 
> 
>  On Thu, Feb 25, 2010 at 7:39 PM, George Bosilca 
> wrote:
>  Josh,
> 
>  Next week is a little bit too early, as we will need some time to figure
> out how to integrate with this new framework, and to what extent our code
> and requirements fit in. Then the week after is the MPI Forum. How about
> on Thursday 11 March?
> 
>  Thanks,
>  george.
> 
>  On Feb 25, 2010, at 12:46 , Josh Hursey wrote:
> 
> > Per my previous suggestion, would it be useful to chat on the phone
> early next week about our various strategies?
> 
> 
> 

Re: [OMPI devel] RFC: Merge tmp fault recovery branch into trunk

2010-03-10 Thread Josh Hursey
Wesley,

Thanks for catching that oversight. Below are the MCA parameters that you 
should need at the moment:
#
# Use the C/R Process Migration Recovery Supervisor
recos_base_enable=1
# Only use the 'rsh' launcher, other launchers will be supported later
plm=rsh
# The resilient mapper knows how to use RecoS and deal with recovering procs
rmaps=resilient
# 'cm' component is the only one that can handle failures at the moment
routed=cm
#
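
One way to feed these in (a sketch using the standard MCA mechanisms, with
"./my_app" as a placeholder application) is either on the mpirun command line
or via your $HOME/.openmpi/mca-params.conf file:

    # Command-line form (sketch):
    mpirun --mca recos_base_enable 1 --mca plm rsh \
           --mca rmaps resilient --mca routed cm -np 4 ./my_app

    # Or append the four parameter lines above to $HOME/.openmpi/mca-params.conf
    # so that every mpirun picks them up automatically.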

Let me know if you have any troubles.

-- Josh

On Mar 10, 2010, at 10:36 AM, Wesley Bland wrote:

> Josh,
> 
> You mentioned some MCA parameters that you would include in the email, but I 
> don't see those parameters anywhere.  Could you please post them here to 
> make testing easier for people?
> 
> Wesley
> 
> On Wed, Mar 10, 2010 at 1:26 PM, Josh Hursey  wrote:
> Yesterday evening George, Thomas and I discussed some of their concerns about 
> this RFC at the MPI Forum meeting. After the discussion, we seemed to be in 
> agreement that the RecoS framework is a good idea and the concepts and fixes 
> in this RFC should move forward with a couple of notes:
> 
>  - They wanted to test the branch a bit more over the next couple of days. 
> Some MCA parameters that you will need are at the bottom of this message.
> 
>  - Reiterate that this RFC only addresses ORTE stability, not OMPI stability. 
> The OMPI stability extension is a second step for the line of work, and 
> should/will fit in nicely with the RecoS framework being proposed in this 
> RFC. The OMPI layer stability will require a significant amount of work, but 
> the RecoS framework will provide the ORTE layer stability that is required as 
> a foundation for OMPI layer stability in the future.
> 
>  - The purpose of the ErrMgr becomes slightly unclear with the addition of 
> the RecoS framework, since both are focused on responding to faults in the 
> system (and RecoS, when enabled, overrides most/all of the ErrMgr 
> functionality). Should the RecoS framework be merged with the ErrMgr 
> framework to create a new ErrMgr interface?
> 
> We are trying to decide if we should merge these frameworks, but at this 
> point we are interested in hearing how other developers feel about merging 
> the ErrMgr and RecoS frameworks, which would change the ErrMgr API. Are there 
> any developers out there that are developing ErrMgr components, or are using 
> any particular features of the existing ErrMgr framework that they would like 
> to see preserved in the next revision? By default, the existing default abort 
> behavior of the ErrMgr framework will be preserved, so the user will have to 
> 'opt-in' to any fault recovery capabilities.
> 
> So we are continuing the discussion a bit more off-list, and will return to 
> the list with an updated RFC (and possibly a new branch) soon (hopefully end 
> of the week/early next week). I would like to briefly discuss this RFC at the 
> Open MPI teleconf next Tuesday.
> 
> -- Josh
> 
> On Feb 26, 2010, at 8:06 AM, Josh Hursey wrote:
> 
> > Sounds good to me.
> >
> > For those casually following this RFC let me summarize its current state.
> >
> > Josh and George (and anyone else who wishes to participate and is attending the 
> > forum) will meet sometime at the next MPI Forum meeting (March 8-10). I 
> > will post any relevant notes from this meeting back to the list afterwards. 
> > So the RFC is on hold pending the outcome of that meeting. For those 
> > developers interested in this RFC that will not be able to attend, feel 
> > free to continue using this thread for discussion.
> >
> > Thanks,
> > Josh
> >
> > On Feb 26, 2010, at 6:09 AM, George Bosilca wrote:
> >
> >>
> >> On Feb 26, 2010, at 01:50 , Josh Hursey wrote:
> >>
> >>> Any of those options are fine with me. I was thinking that if you wanted 
> >>> to talk sooner, we might be able to help explain our intentions with this 
> >>> framework a bit better. I figure that the framework interface will change 
> >>> a bit as we all advance and incorporate our various techniques into it. I 
> >>> think that the current interface is a good first step, but there are 
> >>> certainly many more steps to come.
> >>>
> >>> I am fine delaying this code a bit, just not too long. Meeting at the 
> >>> forum for a while might be a good option (we could probably even arrange 
> >>> to call in others if you wanted).
> >>
> >> Sounds good, let's do this.
> >>
> >> Thanks,
> >>   george.
> >>
> >>>
> >>> Cheers,
> >>> Josh
> >>>
> >>> On Feb 25, 2010, at 6:45 PM, Ralph Castain wrote:
> >>>
>  If Josh is going to be at the forum, perhaps you folks could chat there? 
>  Might as well take advantage of being colocated, if possible.
> 
>  Otherwise, I'm available pretty much any time. I can't contribute much 
>  about the MPI recovery issues, but can contribute to the RTE issues if 
>  that helps.
> 
> 
>  On Thu, Feb 25, 2010 at 7:39 PM, 

Re: [OMPI devel] how to add a component in the ompi?

2010-03-10 Thread Jeff Squyres
Once you add a directory under ompi/mca/btl/ and add the relevant files, then 
the next time you run "autogen.sh", it should just "find" the component and add 
it to the configure and build process.  You should not need to edit 
ompi/mca/btl/Makefile.am yourself.
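
Your component does, however, need its own Makefile.am under ompi/mca/btl/ht/.
Here is a rough sketch of what that usually looks like, written from memory of
the tcp component; the source file names just follow the usual
<framework>_<component>*.c convention.  Treat it only as a template: in
particular, the name of the DSO conditional (assumed here to be
MCA_BUILD_ompi_btl_ht_DSO) should be copied from ompi/mca/btl/tcp/Makefile.am
rather than from this sketch.

    # Sketch of ompi/mca/btl/ht/Makefile.am -- verify against an existing component.
    sources = \
            btl_ht.c \
            btl_ht.h \
            btl_ht_component.c

    # Build either a loadable DSO or a convenience library linked into libmpi,
    # depending on how configure decided to build this component.
    if MCA_BUILD_ompi_btl_ht_DSO
    component_noinst =
    component_install = mca_btl_ht.la
    else
    component_noinst = libmca_btl_ht.la
    component_install =
    endif

    mcacomponentdir = $(pkglibdir)
    mcacomponent_LTLIBRARIES = $(component_install)
    mca_btl_ht_la_SOURCES = $(sources)
    mca_btl_ht_la_LDFLAGS = -module -avoid-version

    noinst_LTLIBRARIES = $(component_noinst)
    libmca_btl_ht_la_SOURCES = $(sources)
    libmca_btl_ht_la_LDFLAGS = -module -avoid-version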

Have a look at these wiki pages; they explain point-by-point how to add a 
component into OMPI's source tree:

https://svn.open-mpi.org/trac/ompi/wiki/devel/CreateComponent
https://svn.open-mpi.org/trac/ompi/wiki/devel/CreateFramework
https://svn.open-mpi.org/trac/ompi/wiki/devel/Autogen
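
For completeness, here is also a rough sketch of how the functions you listed
typically get exposed through the module struct.  The field names follow
ompi/mca/btl/btl.h from memory and may be slightly off, "btl_ht.h" is a
hypothetical header declaring your functions, and existing components use
positional rather than designated initializers, so again use the tcp component
as the authoritative reference:

    /* Sketch of part of btl_ht.c -- verify the struct layout against btl.h. */
    #include "ompi/mca/btl/btl.h"
    #include "btl_ht.h"    /* hypothetical: declares the mca_btl_ht_* functions */

    mca_btl_base_module_t mca_btl_ht_module = {
        /* eager/rndv limits, flags, etc. omitted for brevity */
        .btl_add_procs = mca_btl_ht_add_procs,
        .btl_del_procs = mca_btl_ht_del_procs,
        .btl_finalize  = mca_btl_ht_finalize,
        .btl_alloc     = mca_btl_ht_alloc,
        .btl_free      = mca_btl_ht_free,
        /* btl_prepare_src, btl_send, ... still to be filled in */
    };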


On Mar 10, 2010, at 8:53 AM, hu yaohui wrote:

> Hi Jeff & All
> I want to add a new component to Open MPI:
> 1: I created a directory ~/mca/btl/ht.
> 2: Then I worked out the basic functions I need to implement,
> such as:
> mca_btl_ht_add_procs,
> mca_btl_ht_del_procs,
> mca_btl_ht_alloc
> mca_btl_ht_free
> mca_btl_ht_finalize.
> 3: After writing these functions, I need to compile them, so I copied the Makefile 
> from ~/mca/btl/tcp/,
> where I saw these lines:
> 
>  MCA_btl_ALL_COMPONENTS =  self sm elan gm mx ofud openib portals tcp udapl
>  MCA_btl_ALL_SUBDIRS =  mca/btl/self mca/btl/sm mca/btl/elan mca/btl/gm mca/btl/mx mca/btl/ofud mca/btl/openib mca/btl/portals mca/btl/tcp mca/btl/udapl
> 
> Adding my component to these two lines is apparently not enough.
> Can you help me write a correct Makefile for my component under 
> ~/mca/btl/ht?
>  
> Thanks & Regards,
> Yaohui Hu


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/