Re: [OMPI devel] [RFC] Default hostfile MCA param

2008-03-04 Thread Tim Prins
We have used '^' elsewhere to indicate "not", so maybe just make the syntax 
be that if you put '^' at the beginning of a line, that node is not used.


So we could have:
n0
n1
^headnode
n3

I understand the idea of having a flag to indicate that all nodes below 
a certain point should be ignored, but I think this might get confusing, 
and I'm unsure how useful it would be. I just see the usefulness of this 
to block out a couple of nodes by default. Besides, if you do want to 
block out many nodes, any reasonable text editor allows you to insert 
'^' in front of any number of lines easily.


Alternatively, for the particular situation that Edgar mentions, it may 
be good enough just to set rmaps_base_no_schedule_local in the mca 
params default file.
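
For reference, a minimal sketch of what that entry could look like in the
MCA params default file (the path below is the usual install-tree location,
so treat it as an assumption for your particular setup):

  # $prefix/etc/openmpi-mca-params.conf
  # keep mpirun from scheduling MPI processes on the local (front-end) node
  rmaps_base_no_schedule_local = 1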


One question though: If I am in a slurm allocation which contains n1, 
and there is a default hostfile that contains "^n1", will I run on 'n1'?


I'm not sure what the answer is, I know we talked about the precedence 
earlier...


Tim

Ralph H Castain wrote:

I personally have no objection, but I would ask then that the wiki be
modified to cover this case. All I require is that someone define the syntax
to be used to indicate "this is a node I do -not- want used", or
alternatively a flag that indicates "all nodes below are -not- to be used".

Implementation isn't too hard once I have that...


On 3/3/08 9:44 AM, "Edgar Gabriel"  wrote:


Ralph,

could this mechanism also be used to exclude a node, i.e. to indicate that a
job should never run there? Here is a problem I face quite often: students
working on their homework forget to allocate a partition on the cluster
and just type mpirun. Because of that, all jobs end up running on the
front-end node.

If we now had the ability to specify in a default hostfile that a job should
never run on a given node (e.g. the front-end node), users would get an
error message when trying to do that. I am aware that that's a little
ugly...

Thanks
edgar

Ralph Castain wrote:

I forget all the formatting we are supposed to use, so I hope you'll all
just bear with me.

George brought up the fact that we used to have an MCA param to specify a
hostfile to use for a job. The hostfile behavior described on the wiki,
however, doesn't provide for that option. It associates a hostfile with a
specific app_context, and provides a detailed hierarchical layout of how
mpirun is to interpret that information.

What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile"
to replace the deprecated capability. If found, the system's behavior will
be:

1. in a managed environment, the default hostfile will be used to filter the
discovered nodes to define the available node pool. Any hostfile and/or dash
host options provided to an app_context will be used to further filter the
node pool to define the specific nodes for use by that app_context. Thus,
nodes in the hostfile and dash host options given to an app_context -must-
also be in the default hostfile in order to be available for use by that
app_context - any nodes in the app_context options that are not in the
default hostfile will be ignored.

2. in an unmanaged environment, the default hostfile will be used to define
the available node pool. Any hostfile and/or dash host options provided to
an app_context will be used to filter the node pool to define the specific
nodes for use by that app_context, subject to the previous caveat. However,
add-hostfile and add-host options will add nodes to the node pool for use
-only- by the associated app_context.
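
To make the filtering concrete, a hedged illustration (the file contents and
the exact command line below are assumptions for the sake of example, not
part of the proposal):

  # default hostfile pointed to by the new param, e.g. via
  #   export OMPI_MCA_default_hostfile=/path/to/default-hostfile
  n0
  n1
  n2

  mpirun -host n1,n9 ./a.out
  # -> runs on n1; n9 is ignored because it is not in the default hostfile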


I believe this proposed behavior is consistent with that described on the
wiki, and would be relatively easy to implement. If nobody objects, I will
do so by end-of-day 3/6.

Comments, suggestions, objections - all are welcome!
Ralph






Re: [OMPI devel] [RFC] Default hostfile MCA param

2008-03-04 Thread Ralph H Castain



On 3/4/08 5:51 AM, "Tim Prins"  wrote:

> We have used '^' elsewhere to indicate not, so maybe just have the
> syntax be if you put '^' at the beginning of a line, that node is not used.
> 
> So we could have:
> n0
> n1
> ^headnode
> n3
> 

That works for me and sounds like the right solution.

> I understand the idea of having a flag to indicate that all nodes below
> a certain point should be ignored, but I think this might get confusing,
> and I'm unsure how useful it would be. I just see the usefulness of this
> to block out a couple of nodes by default. Besides, if you do want to
> block out many nodes, any reasonable text editor allows you to insert
> '^' in front of any number of lines easily.
> 
> Alternatively, for the particular situation that Edgar mentions, it may
> be good enough just to set rmaps_base_no_schedule_local in the mca
> params default file.
> 
> One question though: If I am in a slurm allocation which contains n1,
> and there is a default hostfile that contains "^n1", will I run on 'n1'?

According to the precedence rules in the wiki, you would -not- run on n1.
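
In other words - a hedged illustration, assuming the filtering described in
the RFC:

  SLURM allocation : n1 n2
  default hostfile : n0
                     ^n1
                     n2
  resulting pool   : n2   (n1 is excluded even though SLURM offered it)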

> 
> I'm not sure what the answer is, I know we talked about the precedence
> earlier...
> 
> Tim
> 
> Ralph H Castain wrote:
>> I personally have no objection, but I would ask then that the wiki be
>> modified to cover this case. All I require is that someone define the syntax
>> to be used to indicate "this is a node I do -not- want used", or
>> alternatively a flag that indicates "all nodes below are -not- to be used".
>> 
>> Implementation isn't too hard once I have that...
>> 
>> 
>> On 3/3/08 9:44 AM, "Edgar Gabriel"  wrote:
>> 
>>> Ralph,
>>> 
>>> could this mechanism be used also to exclude a node, indicating to never
>>> run a job there? Here is the problem that I face quite often: students
>>> working on the homework forget to allocate a partition  on the cluster,
>>> and just type mpirun. Because of that, all jobs end up running on the
>>> front-end node.
>>> 
>>> If we would have now the ability to specify in a default hostfile, to
>>> never run a job on a specified node (e.g. the front end node), users
>>> would get an error message when trying to do that. I am aware that
>>> that's a little ugly...
>>> 
>>> THanks
>>> edgar
>>> 
>>> Ralph Castain wrote:
 I forget all the formatting we are supposed to use, so I hope you'll all
 just bear with me.
 
 George brought up the fact that we used to have an MCA param to specify a
 hostfile to use for a job. The hostfile behavior described on the wiki,
 however, doesn't provide for that option. It associates a hostfile with a
 specific app_context, and provides a detailed hierarchical layout of how
 mpirun is to interpret that information.
 
 What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile"
 to replace the deprecated capability. If found, the system's behavior will
 be:
 
 1. in a managed environment, the default hostfile will be used to filter
 the
 discovered nodes to define the available node pool. Any hostfile and/or
 dash
 host options provided to an app_context will be used to further filter the
 node pool to define the specific nodes for use by that app_context. Thus,
 nodes in the hostfile and dash host options given to an app_context -must-
 also be in the default hostfile in order to be available for use by that
 app_context - any nodes in the app_context options that are not in the
 default hostfile will be ignored.
 
 2. in an unmanaged environment, the default hostfile will be used to define
 the available node pool. Any hostfile and/or dash host options provided to
 an app_context will be used to filter the node pool to define the specific
 nodes for use by that app_context, subject to the previous caveat. However,
 add-hostfile and add-host options will add nodes to the node pool for use
 -only- by the associated app_context.
 
 
 I believe this proposed behavior is consistent with that described on the
 wiki, and would be relatively easy to implement. If nobody objects, I will
 do so by end-of-day 3/6.
 
 Comments, suggestions, objections - all are welcome!
 Ralph
 
 




[OMPI devel] disabling vt by default

2008-03-04 Thread Jeff Squyres
Per prior e-mails on this list, I finally got around to disabling VT  
builds by default this morning (https://svn.open-mpi.org/trac/ompi/changeset/17683 
 -- I committed before 9am Eastern, so it's, er, sorta/mostly before  
the US workday :p ).


Once the VT configury stuff is incorporated into OMPI's autogen stuff  
and we don't have timestamp issues that can cause re-autoconfs/etc.,  
we can re-enable it by default.


Dresden: any estimates on when the integration will occur?

--
Jeff Squyres
Cisco Systems



[OMPI devel] make check failing

2008-03-04 Thread Tim Prins

Hi,

We have been having a problem lately with our MTT runs where make check 
would fail when mpi threads were enabled.


Turns out the problem is that opal_init now calls 
opal_base_carto_select, which cannot find any carto modules since we 
have not done an install yet. So it returns a failure. This causes 
opal_init to abort before initializing the event engine. So when we try 
to do the threading tests, the event engine is uninitialized and fails.


So this is why it fails, but I do not know how best to fix it. Any 
suggestions would be appreciated.


Tim


Re: [OMPI devel] make check failing

2008-03-04 Thread Ralph H Castain
Carto select failing if it doesn't find any modules was called out in an
earlier message (might have been a commit log) when we set an mca-no-build
flag on that framework. This should probably be fixed - there are times when
someone may not wish to build any carto modules.

Is there some reason why carto absolutely must find a module? Can we create a
default "none available" module in the base?


On 3/4/08 7:39 AM, "Tim Prins"  wrote:

> Hi,
> 
> We have been having a problem lately with our MTT runs where make check
> would fail when mpi threads were enabled.
> 
> Turns out the problem is that opal_init now calls
> opal_base_carto_select, which cannot find any carto modules since we
> have not done an install yet. So it returns a failure. This causes
> opal_init to abort before initializing the event engine. So when we try
> to do the threading tests, the event engine is uninitialized and fails.
> 
> So this is why it fails, but I do not know how best to fix it. Any
> suggestions would be appreciated.
> 
> Tim




Re: [OMPI devel] make check failing

2008-03-04 Thread Jeff Squyres
I think another important question is: why is this related to  
threads?  (i.e., why does it work in non-threaded builds)



On Mar 4, 2008, at 9:44 AM, Ralph H Castain wrote:

Carto select failing if it doesn't find any modules was called out in an
earlier message (might have been a commit log) when we set an mca-no-build
flag on that framework. This should probably be fixed - there are times when
someone may not wish to build any carto modules.

Is there some reason why carto absolutely must find a module? Can we create a
default "none available" module in the base?


On 3/4/08 7:39 AM, "Tim Prins"  wrote:


Hi,

We have been having a problem lately with our MTT runs where make check
would fail when mpi threads were enabled.

Turns out the problem is that opal_init now calls
opal_base_carto_select, which cannot find any carto modules since we
have not done an install yet. So it returns a failure. This causes
opal_init to abort before initializing the event engine. So when we try
to do the threading tests, the event engine is uninitialized and fails.


So this is why it fails, but I do not know how best to fix it. Any
suggestions would be appreciated.

Tim



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [RFC] Default hostfile MCA param

2008-03-04 Thread Edgar Gabriel

Tim Prins wrote:
We have used '^' elsewhere to indicate not, so maybe just have the 
syntax be if you put '^' at the beginning of a line, that node is not used.


So we could have:
n0
n1
^headnode
n3


this sounds fine to me.



I understand the idea of having a flag to indicate that all nodes below 
a certain point should be ignored, but I think this might get confusing, 
and I'm unsure how useful it would be. I just see the usefulness of this 
to block out a couple of nodes by default. Besides, if you do want to 
block out many nodes, any reasonable text editor allows you to insert 
'^' in front of any number of lines easily.


Alternatively, for the particular situation that Edgar mentions, it may 
be good enough just to set rmaps_base_no_schedule_local in the mca 
params default file.


hm, ok, here is another flag which I was not aware of. Anyway, I can
think of other scenarios where this feature could be useful, e.g. when
hunting down performance problems on a cluster where you would like to
avoid having to get a new allocation or do a major rewrite of the
hostfile every time. Or including an I/O node in an allocation (in
order to have it exclusively), but making sure that no MPI process gets
scheduled onto that node.


Thanks
Edgar



One question though: If I am in a slurm allocation which contains n1, 
and there is a default hostfile that contains "^n1", will I run on 'n1'?


I'm not sure what the answer is, I know we talked about the precedence 
earlier...


Tim

Ralph H Castain wrote:

I personally have no objection, but I would ask then that the wiki be
modified to cover this case. All I require is that someone define the syntax
to be used to indicate "this is a node I do -not- want used", or
alternatively a flag that indicates "all nodes below are -not- to be used".

Implementation isn't too hard once I have that...


On 3/3/08 9:44 AM, "Edgar Gabriel"  wrote:


Ralph,

could this mechanism be used also to exclude a node, indicating to never
run a job there? Here is the problem that I face quite often: students
working on the homework forget to allocate a partition  on the cluster,
and just type mpirun. Because of that, all jobs end up running on the
front-end node.

If we would have now the ability to specify in a default hostfile, to
never run a job on a specified node (e.g. the front end node), users
would get an error message when trying to do that. I am aware that
that's a little ugly...

THanks
edgar

Ralph Castain wrote:

I forget all the formatting we are supposed to use, so I hope you'll all
just bear with me.

George brought up the fact that we used to have an MCA param to specify a
hostfile to use for a job. The hostfile behavior described on the wiki,
however, doesn't provide for that option. It associates a hostfile with a
specific app_context, and provides a detailed hierarchical layout of how
mpirun is to interpret that information.

What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile"
to replace the deprecated capability. If found, the system's behavior will
be:

1. in a managed environment, the default hostfile will be used to filter the
discovered nodes to define the available node pool. Any hostfile and/or dash
host options provided to an app_context will be used to further filter the
node pool to define the specific nodes for use by that app_context. Thus,
nodes in the hostfile and dash host options given to an app_context -must-
also be in the default hostfile in order to be available for use by that
app_context - any nodes in the app_context options that are not in the
default hostfile will be ignored.

2. in an unmanaged environment, the default hostfile will be used to define
the available node pool. Any hostfile and/or dash host options provided to
an app_context will be used to filter the node pool to define the specific
nodes for use by that app_context, subject to the previous caveat. However,
add-hostfile and add-host options will add nodes to the node pool for use
-only- by the associated app_context.


I believe this proposed behavior is consistent with that described on the
wiki, and would be relatively easy to implement. If nobody objects, I will
do so by end-of-day 3/6.

Comments, suggestions, objections - all are welcome!
Ralph




--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI devel] make check failing

2008-03-04 Thread Tim Prins
Simple, because the test that eventually segfaults only runs if ompi is 
configured with threading. Otherwise it is a no-op.


Tim

Jeff Squyres wrote:
I think another important question is: why is this related to  
threads?  (i.e., why does it work in non-threaded builds)



On Mar 4, 2008, at 9:44 AM, Ralph H Castain wrote:

Carto select failing if it doesn't find any modules was called out in an
earlier message (might have been a commit log) when we set an mca-no-build
flag on that framework. This should probably be fixed - there are times when
someone may not wish to build any carto modules.

Is there some reason why carto absolutely must find a module? Can we create a
default "none available" module in the base?


On 3/4/08 7:39 AM, "Tim Prins"  wrote:


Hi,

We have been having a problem lately with our MTT runs where make  
check

would fail when mpi threads were enabled.

Turns out the problem is that opal_init now calls
opal_base_carto_select, which cannot find any carto modules since we
have not done an install yet. So it returns a failure. This causes
opal_init to abort before initializing the event engine. So when we  
try
to do the threading tests, the event engine is uninitialized and  
fails.


So this is why it fails, but I do not know how best to fix it. Any
suggestions would be appreciated.

Tim







[OMPI devel] new plpa

2008-03-04 Thread Jeff Squyres
I have the new PLPA merged down to a local copy of my trunk.  It  
involves some autogen-worthy changes, so I'll hold off committing it  
until tonight.


--
Jeff Squyres
Cisco Systems



[OMPI devel] [RFC] Reduce the number of tests run by make check

2008-03-04 Thread Tim Prins

WHAT: Reduce the number of tests run by make check

WHY: Some of the tests will not work properly until Open MPI is 
installed. Also, many of the tests do not really test anything.


WHERE: See below.

TIMEOUT: COB Friday March 14

DESCRIPTION:
We have been having many problems with make check over the years. People
tend to change things and not update the tests, which leads to tarball
generation failures and nightly test run failures. Furthermore, many of
the tests test things which have not changed for years.


So with this in mind, I propose only running the following tests when 
'make check' is run:

asm/atomic_barrier
asm/atomic_barrier_noinline
asm/atomic_spinlock
asm/atomic_spinlock_noinline
asm/atomic_math
asm/atomic_math_noinline
asm/atomic_cmpset
asm/atomic_cmpset_noinline

We would no longer run the following tests:
class/ompi_bitmap_t
class/opal_hash_table_t
class/opal_list_t
class/opal_value_array_t
class/opal_pointer_array
class/ompi_rb_tree_t
memory/opal_memory_basic
memory/opal_memory_speed
memory/opal_memory_cxx
threads/opal_thread
threads/opal_condition
datatype/ddt_test
datatype/checksum
datatype/position
peruse/mpi_peruse

These tests would not be deleted from the repository, just made so they 
do not run by default.


[OMPI devel] suggested patch for mca-btl-openib-hca-params.ini

2008-03-04 Thread Ralph Campbell
Here is a suggested patch for adding the QLogic QLE7240 and QLE7280
DDR HCA cards to the openib params file.

I would like the MTU to default to 4K for these HCAs but I don't see
any code using the ibv_port_attr.active_mtu field to limit the MTU
to the active MTU.  If you like, I can try to make a patch to do this.

--- ompi/mca/btl/openib/mca-btl-openib-hca-params.ini   2008-02-20 
08:28:32.0 -0800
+++ ompi/mca/btl/openib/mca-btl-openib-hca-params.ini.new   2008-02-25 
18:09:24.364877000 -0800
@@ -121,6 +121,12 @@

 [QLogic InfiniPath]
 vendor_id = 0x1fc1
-vendor_part_id = 13,16
+vendor_part_id = 13
 use_eager_rdma = 1
 mtu = 2048
+
+[QLogic InfiniPath]
+vendor_id = 0x1fc1,0x1077
+vendor_part_id = 16,29216
+use_eager_rdma = 1
+mtu = 4096
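
Regarding the active_mtu idea above, here is a hedged sketch of the kind of
clamp that could be added. It uses the standard libibverbs ibv_query_port()
call; the helper itself is made up for illustration and is not existing
btl/openib code:

  #include <infiniband/verbs.h>

  /* Hypothetical helper: clamp the MTU configured in the ini file to the
   * port's currently active MTU as reported by the HCA. */
  static enum ibv_mtu clamp_mtu_to_active(struct ibv_context *ctx,
                                          uint8_t port_num,
                                          enum ibv_mtu configured)
  {
      struct ibv_port_attr attr;

      if (ibv_query_port(ctx, port_num, &attr) != 0) {
          return configured;        /* query failed; keep the ini value */
      }
      /* enum ibv_mtu values are ordered (IBV_MTU_256 ... IBV_MTU_4096),
       * so the smaller of the two enum values is the safe choice. */
      return (configured > attr.active_mtu) ? attr.active_mtu : configured;
  }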





Re: [OMPI devel] suggested patch for mca-btl-openib-hca-params.ini

2008-03-04 Thread Jeff Squyres
Sounds good -- I don't remember who's on the schedule A for Qlogic,  
but I know that Christian Bell can commit.


Do you need this for v1.2.6?  We are literally rolling 1.2.6rc1 right  
*now*...



On Mar 4, 2008, at 2:12 PM, Ralph Campbell wrote:


Here is a suggested patch for adding the QLogic QLE7240 and QLE7280
DDR HCA cards to the openib params file.

I would like the MTU to default to 4K for these HCAs but I don't see
any code using the ibv_port_attr.active_mtu field to limit the MTU
to the active MTU.  If you like, I can try to make a patch to do this.

--- ompi/mca/btl/openib/mca-btl-openib-hca-params.ini	2008-02-20  
08:28:32.0 -0800
+++ ompi/mca/btl/openib/mca-btl-openib-hca-params.ini.new	2008-02-25  
18:09:24.364877000 -0800

@@ -121,6 +121,12 @@

[QLogic InfiniPath]
vendor_id = 0x1fc1
-vendor_part_id = 13,16
+vendor_part_id = 13
use_eager_rdma = 1
mtu = 2048
+
+[QLogic InfiniPath]
+vendor_id = 0x1fc1,0x1077
+vendor_part_id = 16,29216
+use_eager_rdma = 1
+mtu = 4096






--
Jeff Squyres
Cisco Systems



[OMPI devel] Fwd: OpenMPI changes

2008-03-04 Thread Greg Watson

Hi all,

Ralph informs me that significant functionality has been removed from  
ORTE in 1.3. Unfortunately this functionality was being used by PTP to  
provide support for OMPI, and without it, it seems unlikely that PTP  
will be able to work with 1.3. Apparently restoring this lost  
functionality is an "enhancement" of 1.3, and so is something that  
will not necessarily be done. Having worked with OMPI from a very  
early stage to ensure that we were able to provide robust support, I  
must say it is a bit disappointing that this approach is being taken.  
I hope that the community will view this "enhancement" as worthwhile.


Regards,

Greg

Begin forwarded message:



On 2/29/08 7:13 AM, "Gregory R Watson"  wrote:

>
>
> Ralph Castain  wrote on 02/29/2008 12:18:39 AM:
>
>> Ralph Castain 
>> 02/29/08 12:18 AM
>>
>> To
>>
>> Gregory R Watson/Watson/IBM@IBMUS
>>
>> cc
>>
>> Subject
>>
>> Re: OpenMPI changes
>>
>> Hi Greg
>>
>> All of the prior options (and some new ones) for spawning a job  
are fully
>> supported in the new interface. Instead of setting them with  
"attributes",
>> you create an orte_job_t object and just fill them in. This is  
precisely how
>> mpirun does it - you can look at that code if you want an  
example, though it
>> is somewhat complex. Alternatively, you can look at the way it is  
done for
>> comm_spawn, which may be more analogous to your situation - that  
code is in

>> ompi/mca/dpm/orte.
>>
>> All the tools library does is communicate the job object to the  
target
>> persistent daemon so it can do the work. This way, you don't have  
to open

>> all the frameworks, deal directly with the plm interface, etc.
>>
>> Alternatively, you are welcome to do a full orte_init and use the  
frameworks
>> yourself - there is no requirement to use the library. I only  
offer it as an

>> alternative.
>
> As far as I can tell, neither API provides the same functionality  
as that
> available in 1.2. While this might be beneficial for OMPI-specific  
activities,
> the changes appear to severely limit the interaction of tools with  
the

> runtime. At this point, I can't see either interface supporting PTP.

I went ahead and added a notification capability to the system -  
took about
30 minutes. I can provide notice of job and process state changes  
since I
see those. Node state changes, however, are different - I can notify  
on
them, but we have no way of seeing them. None of the environments we  
support

tell us when a node fails.

>
>>
>> I know that the tool library works because it uses the identical  
APIs as
>> comm_spawn and mpirun. I have also tested them by building my own  
tools.

>
> There's a big difference being on a code path that *must* work  
because it is
> used by core components, to one that is provided as an add-on for  
external

> tools. I may be worrying needlessly if this new interface becomes an
> "officially supported" API. Is that planned? At a minimum, it  
seems like it's
> going to complicate your testing process, since you're going to  
need to
> provide a separate set of tests that exercise this interface  
independent of

> the rest of OMPI.

It is an officially supported API. Testing is not as big a problem  
as you
might expect since the library exercises the same code paths as  
mpirun and

comm_spawn. Like I said, I have written my own tools that exercise the
library - no problem using them as tests.

>
>>
>> We do not launch an orted for any tool-library query. All we do is
>> communicate the query to the target persistent daemon or mpirun.  
Those
>> entities have recv's posted to catch any incoming messages and  
execute the

>> request.
>>
>> You are correct that we no longer have event driven notification  
in the
>> system. I repeatedly asked the community (on both devel and core  
lists) for
>> input on that question, and received no indications that anyone  
wanted it
>> supported. It can be added back into the system, but would  
require the
>> approval of the OMPI community. I don't know how problematic that  
would be -
>> there is a lot of concern over the amount of memory, overhead,  
and potential
>> reliability issues that surround event notification. If you want  
that
>> capability, I suggest we discuss it, come up with a plan that  
deals with
>> those issues, and then take a proposal to the devel list for  
discussion.

>>
>> As for reliability, the objectives of the last year's effort were  
precisely
>> scalability and reliability. We did a lot of work to eliminate  
recursive
>> deadlocks and improve the reliability of the code. Our current  
testing
>> indicates we had considerable success in that regard,  
particularly with the

>> recursion elimination commit earlier today.
>>
>> I would be happy to work with you to meet the PTP's needs - we'll  
just need
>> to work with the OMPI community to ensure everyone buys into the  
plan. If it
>> would help, I could come and review the new arch with the team (I  
already

[OMPI devel] getting config.guess/config.sub from upstream

2008-03-04 Thread Ralf Wildenhues
Hello,

Please note that the CVS repo for config.guess and config.sub is
outdated, development has moved to use git.
ompi_trunk/config/distscript.csh could be adjusted to pull from

and likewise for config.sub.  I'm too dumb to fix the csh script though,
for me it seems to always fail the download (it did that before, too).

Cheers,
Ralf


Re: [OMPI devel] getting config.guess/config.sub from upstream

2008-03-04 Thread Jeff Squyres

Done -- thanks!

https://svn.open-mpi.org/trac/ompi/changeset/17695
https://svn.open-mpi.org/trac/ompi/ticket/1226


On Mar 4, 2008, at 3:45 PM, Ralf Wildenhues wrote:


Hello,

Please note that the CVS repo for config.guess and config.sub is
outdated, development has moved to use git.
ompi_trunk/config/distscript.csh could be adjusted to pull from

and likewise for config.sub.  I'm too dumb to fix the csh script  
though,

for me it seems to always fail the download (it did that before, too).

Cheers,
Ralf



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Fwd: OpenMPI changes

2008-03-04 Thread Jeff Squyres

Greg --

I admit to being a bit puzzled here.  Ralph sent around RFCs about  
these changes many months ago.  Everyone said they didn't want this  
functionality -- it was seen as excess functionality that Open MPI  
didn't want or need -- so it was all removed.


As such, I have to agree with Ralph that it is an "enhancement" to re-add
the functionality.  That being said, patches are always welcome!
IBM has signed the OMPI 3rd party contribution agreement, so it could  
be contributed directly.


Sidenote: I was also under the impression that PTP was being re-geared  
towards STCI and moving away from ORTE anyway.  Is this incorrect?




On Mar 4, 2008, at 3:24 PM, Greg Watson wrote:


Hi all,

Ralph informs me that significant functionality has been removed from
ORTE in 1.3. Unfortunately this functionality was being used by PTP to
provide support for OMPI, and without it, it seems unlikely that PTP
will be able to work with 1.3. Apparently restoring this lost
functionality is an "enhancement" of 1.3, and so is something that
will not necessarily be done. Having worked with OMPI from a very
early stage to ensure that we were able to provide robust support, I
must say it is a bit disappointing that this approach is being taken.
I hope that the community will view this "enhancement" as worthwhile.

Regards,

Greg

Begin forwarded message:



On 2/29/08 7:13 AM, "Gregory R Watson"  wrote:




Ralph Castain  wrote on 02/29/2008 12:18:39 AM:


Ralph Castain 
02/29/08 12:18 AM

To

Gregory R Watson/Watson/IBM@IBMUS

cc

Subject

Re: OpenMPI changes

Hi Greg

All of the prior options (and some new ones) for spawning a job

are fully

supported in the new interface. Instead of setting them with

"attributes",

you create an orte_job_t object and just fill them in. This is

precisely how

mpirun does it - you can look at that code if you want an

example, though it

is somewhat complex. Alternatively, you can look at the way it is

done for

comm_spawn, which may be more analogous to your situation - that

code is in

ompi/mca/dpm/orte.

All the tools library does is communicate the job object to the

target

persistent daemon so it can do the work. This way, you don't have

to open

all the frameworks, deal directly with the plm interface, etc.

Alternatively, you are welcome to do a full orte_init and use the

frameworks

yourself - there is no requirement to use the library. I only

offer it as an

alternative.


As far as I can tell, neither API provides the same functionality

as that

available in 1.2. While this might be beneficial for OMPI-specific

activities,

the changes appear to severely limit the interaction of tools with

the

runtime. At this point, I can't see either interface supporting PTP.


I went ahead and added a notification capability to the system -
took about
30 minutes. I can provide notice of job and process state changes
since I
see those. Node state changes, however, are different - I can notify
on
them, but we have no way of seeing them. None of the environments we
support
tell us when a node fails.





I know that the tool library works because it uses the identical

APIs as

comm_spawn and mpirun. I have also tested them by building my own

tools.


There's a big difference being on a code path that *must* work

because it is

used by core components, to one that is provided as an add-on for

external

tools. I may be worrying needlessly if this new interface becomes an
"officially supported" API. Is that planned? At a minimum, it

seems like it's

going to complicate your testing process, since you're going to

need to

provide a separate set of tests that exercise this interface

independent of

the rest of OMPI.


It is an officially supported API. Testing is not as big a problem
as you
might expect since the library exercises the same code paths as
mpirun and
comm_spawn. Like I said, I have written my own tools that exercise  
the

library - no problem using them as tests.





We do not launch an orted for any tool-library query. All we do is
communicate the query to the target persistent daemon or mpirun.

Those

entities have recv's posted to catch any incoming messages and

execute the

request.

You are correct that we no longer have event driven notification

in the

system. I repeatedly asked the community (on both devel and core

lists) for

input on that question, and received no indications that anyone

wanted it

supported. It can be added back into the system, but would

require the

approval of the OMPI community. I don't know how problematic that

would be -

there is a lot of concern over the amount of memory, overhead,

and potential

reliability issues that surround event notification. If you want

that

capability, I suggest we discuss it, come up with a plan that

deals with

those issues, and then take a proposal to the devel list for

discussion.


As for reliability, the objectives of the last year's effort were

precisely

scalability and reliabi

Re: [OMPI devel] Fwd: OpenMPI changes

2008-03-04 Thread Ralph H Castain
It is buried deep-down in the thread, but I'll just reiterate it here. I
have "restored" the ability to "subscribe" to changes in job, proc, and node
state via OMPI's tool interface library. I have -not- checked this into the
trunk yet, though, until the community has a chance to consider whether or
not it wants it.

Restoring the ability to have such changes "callback" to user functions
raises the concern again about recursive behavior. We worked hard to remove
recursion from the code base, and it would be a concern to see it
potentially re-enter.

I realize there is some difference between ORTE calling back into itself vs
calling back into a user-specified function. However, unless that user truly
understands ORTE/OMPI and takes considerable precautions, it is very easy to
recreate the recursive behavior without intending to do so.

The tool interface library was built to accomplish two things:

1. help reduce the impact on external tools of changes to ORTE/OMPI
interfaces, and

2. provide a degree of separation to prevent the tool from inadvertently
causing OMPI to "behave badly"

I think we accomplished that - I would encourage you to at least consider
using the library. If there is something missing, we can always add it.

Ralph



On 3/4/08 2:37 PM, "Jeff Squyres"  wrote:

> Greg --
> 
> I admit to being a bit puzzled here.  Ralph sent around RFCs about
> these changes many months ago.  Everyone said they didn't want this
> functionality -- it was seen as excess functionality that Open MPI
> didn't want or need -- so it was all removed.
> 
> As such, I have to agree with Ralph that it is an "enhancement" to re-
> add the functionality.  That being said, patches are always welcome!
> IBM has signed the OMPI 3rd party contribution agreement, so it could
> be contributed directly.
> 
> Sidenote: I was also under the impression that PTP was being re-geared
> towards STCI and moving away from ORTE anyway.  Is this incorrect?
> 
> 
> 
> On Mar 4, 2008, at 3:24 PM, Greg Watson wrote:
> 
>> Hi all,
>> 
>> Ralph informs me that significant functionality has been removed from
>> ORTE in 1.3. Unfortunately this functionality was being used by PTP to
>> provide support for OMPI, and without it, it seems unlikely that PTP
>> will be able to work with 1.3. Apparently restoring this lost
>> functionality is an "enhancement" of 1.3, and so is something that
>> will not necessarily be done. Having worked with OMPI from a very
>> early stage to ensure that we were able to provide robust support, I
>> must say it is a bit disappointing that this approach is being taken.
>> I hope that the community will view this "enhancement" as worthwhile.
>> 
>> Regards,
>> 
>> Greg
>> 
>> Begin forwarded message:
>> 
>>> 
>>> On 2/29/08 7:13 AM, "Gregory R Watson"  wrote:
>>> 
 
 
 Ralph Castain  wrote on 02/29/2008 12:18:39 AM:
 
> Ralph Castain 
> 02/29/08 12:18 AM
> 
> To
> 
> Gregory R Watson/Watson/IBM@IBMUS
> 
> cc
> 
> Subject
> 
> Re: OpenMPI changes
> 
> Hi Greg
> 
> All of the prior options (and some new ones) for spawning a job
>>> are fully
> supported in the new interface. Instead of setting them with
>>> "attributes",
> you create an orte_job_t object and just fill them in. This is
>>> precisely how
> mpirun does it - you can look at that code if you want an
>>> example, though it
> is somewhat complex. Alternatively, you can look at the way it is
>>> done for
> comm_spawn, which may be more analogous to your situation - that
>>> code is in
> ompi/mca/dpm/orte.
> 
> All the tools library does is communicate the job object to the
>>> target
> persistent daemon so it can do the work. This way, you don't have
>>> to open
> all the frameworks, deal directly with the plm interface, etc.
> 
> Alternatively, you are welcome to do a full orte_init and use the
>>> frameworks
> yourself - there is no requirement to use the library. I only
>>> offer it as an
> alternative.
 
 As far as I can tell, neither API provides the same functionality
>>> as that
 available in 1.2. While this might be beneficial for OMPI-specific
>>> activities,
 the changes appear to severely limit the interaction of tools with
>>> the
 runtime. At this point, I can't see either interface supporting PTP.
>>> 
>>> I went ahead and added a notification capability to the system -
>>> took about
>>> 30 minutes. I can provide notice of job and process state changes
>>> since I
>>> see those. Node state changes, however, are different - I can notify
>>> on
>>> them, but we have no way of seeing them. None of the environments we
>>> support
>>> tell us when a node fails.
>>> 
 
> 
> I know that the tool library works because it uses the identical
>>> APIs as
> comm_spawn and mpirun. I have also tested them by building my own
>>> tools.
 
 There's a big difference being on a code path t

Re: [OMPI devel] Fwd: OpenMPI changes

2008-03-04 Thread Greg Watson
I certainly don't (nor anyone in PTP as far as I know) have the  
resources to re-add functionality to OMPI, so unfortunately it appears  
that 1.2 will be the end of the line for PTP supported versions. As I  
mentioned to Ralph, I don't follow your developer discussions closely  
enough to understand the details of every change that is proposed.  
Since PTP has provided requirements and been supported since 1.0, I  
was under the (seemingly incorrect) impression that this support would  
continue in future versions.


PTP will very likely support STCI when it becomes available. However,  
the intention was to continue to support OMPI also. Maybe this will be  
possible without ORTE, but it seems uncertain at this stage.


Greg

On Mar 4, 2008, at 4:37 PM, Jeff Squyres wrote:


Greg --

I admit to being a bit puzzled here.  Ralph sent around RFCs about
these changes many months ago.  Everyone said they didn't want this
functionality -- it was seen as excess functionality that Open MPI
didn't want or need -- so it was all removed.

As such, I have to agree with Ralph that it is an "enhancement" to re-
add the functionality.  That being said, patches are always welcome!
IBM has signed the OMPI 3rd party contribution agreement, so it could
be contributed directly.

Sidenote: I was also under the impression that PTP was being re-geared
towards STCI and moving away from ORTE anyway.  Is this incorrect?



On Mar 4, 2008, at 3:24 PM, Greg Watson wrote:


Hi all,

Ralph informs me that significant functionality has been removed from
ORTE in 1.3. Unfortunately this functionality was being used by PTP  
to

provide support for OMPI, and without it, it seems unlikely that PTP
will be able to work with 1.3. Apparently restoring this lost
functionality is an "enhancement" of 1.3, and so is something that
will not necessarily be done. Having worked with OMPI from a very
early stage to ensure that we were able to provide robust support, I
must say it is a bit disappointing that this approach is being taken.
I hope that the community will view this "enhancement" as worthwhile.

Regards,

Greg

Begin forwarded message:



On 2/29/08 7:13 AM, "Gregory R Watson"  wrote:




Ralph Castain  wrote on 02/29/2008 12:18:39 AM:


Ralph Castain 
02/29/08 12:18 AM

To

Gregory R Watson/Watson/IBM@IBMUS

cc

Subject

Re: OpenMPI changes

Hi Greg

All of the prior options (and some new ones) for spawning a job

are fully

supported in the new interface. Instead of setting them with

"attributes",

you create an orte_job_t object and just fill them in. This is

precisely how

mpirun does it - you can look at that code if you want an

example, though it

is somewhat complex. Alternatively, you can look at the way it is

done for

comm_spawn, which may be more analogous to your situation - that

code is in

ompi/mca/dpm/orte.

All the tools library does is communicate the job object to the

target

persistent daemon so it can do the work. This way, you don't have

to open

all the frameworks, deal directly with the plm interface, etc.

Alternatively, you are welcome to do a full orte_init and use the

frameworks

yourself - there is no requirement to use the library. I only

offer it as an

alternative.


As far as I can tell, neither API provides the same functionality

as that

available in 1.2. While this might be beneficial for OMPI-specific

activities,

the changes appear to severely limit the interaction of tools with

the
runtime. At this point, I can't see either interface supporting  
PTP.


I went ahead and added a notification capability to the system -
took about
30 minutes. I can provide notice of job and process state changes
since I
see those. Node state changes, however, are different - I can notify
on
them, but we have no way of seeing them. None of the environments we
support
tell us when a node fails.





I know that the tool library works because it uses the identical

APIs as

comm_spawn and mpirun. I have also tested them by building my own

tools.


There's a big difference being on a code path that *must* work

because it is

used by core components, to one that is provided as an add-on for

external
tools. I may be worrying needlessly if this new interface becomes  
an

"officially supported" API. Is that planned? At a minimum, it

seems like it's

going to complicate your testing process, since you're going to

need to

provide a separate set of tests that exercise this interface

independent of

the rest of OMPI.


It is an officially supported API. Testing is not as big a problem
as you
might expect since the library exercises the same code paths as
mpirun and
comm_spawn. Like I said, I have written my own tools that exercise
the
library - no problem using them as tests.





We do not launch an orted for any tool-library query. All we do is
communicate the query to the target persistent daemon or mpirun.

Those

entities have recv's posted to catch any incoming messages and

execute the

request.

You are 

Re: [OMPI devel] Fwd: OpenMPI changes

2008-03-04 Thread Greg Watson
I don't have a problem using a different interface, assuming it's  
adequately supported and provides the functionality we need. I presume  
the recursive behavior you're referring to is calling OMPI interfaces  
from the callback functions. Any event-based system has this issue,  
and it is usually solved by clearly specifying the allowable  
interfaces that can be called (possibly none). Since PTP doesn't call  
OMPI functions from callbacks, it's not a problem for us if no  
interfaces can be called.


The major missing features appear to be:

- Ability to request a process allocation without launching the job
- I/O forwarding callbacks

Without these, PTP support will be so limited that I'd be reluctant to  
say we support OMPI.


Greg

On Mar 4, 2008, at 4:50 PM, Ralph H Castain wrote:

It is buried deep-down in the thread, but I'll just reiterate it  
here. I
have "restored" the ability to "subscribe" to changes in job, proc,  
and node
state via OMPI's tool interface library. I have -not- checked this  
into the
trunk yet, though, until the community has a chance to consider  
whether or

not it wants it.

Restoring the ability to have such changes "callback" to user  
functions
raises the concern again about recursive behavior. We worked hard to  
remove

recursion from the code base, and it would be a concern to see it
potentially re-enter.

I realize there is some difference between ORTE calling back into  
itself vs
calling back into a user-specified function. However, unless that  
user truly
understands ORTE/OMPI and takes considerable precautions, it is very  
easy to

recreate the recursive behavior without intending to do so.

The tool interface library was built to accomplish two things:

1. help reduce the impact on external tools of changes to ORTE/OMPI
interfaces, and

2. provide a degree of separation to prevent the tool from  
inadvertently

causing OMPI to "behave badly"

I think we accomplished that - I would encourage you to at least  
consider
using the library. If there is something missing, we can always add  
it.


Ralph



On 3/4/08 2:37 PM, "Jeff Squyres"  wrote:


Greg --

I admit to being a bit puzzled here.  Ralph sent around RFCs about
these changes many months ago.  Everyone said they didn't want this
functionality -- it was seen as excess functionality that Open MPI
didn't want or need -- so it was all removed.

As such, I have to agree with Ralph that it is an "enhancement" to  
re-

add the functionality.  That being said, patches are always welcome!
IBM has signed the OMPI 3rd party contribution agreement, so it could
be contributed directly.

Sidenote: I was also under the impression that PTP was being re- 
geared

towards STCI and moving away from ORTE anyway.  Is this incorrect?



On Mar 4, 2008, at 3:24 PM, Greg Watson wrote:


Hi all,

Ralph informs me that significant functionality has been removed  
from
ORTE in 1.3. Unfortunately this functionality was being used by  
PTP to

provide support for OMPI, and without it, it seems unlikely that PTP
will be able to work with 1.3. Apparently restoring this lost
functionality is an "enhancement" of 1.3, and so is something that
will not necessarily be done. Having worked with OMPI from a very
early stage to ensure that we were able to provide robust support, I
must say it is a bit disappointing that this approach is being  
taken.
I hope that the community will view this "enhancement" as  
worthwhile.


Regards,

Greg

Begin forwarded message:



On 2/29/08 7:13 AM, "Gregory R Watson"  wrote:




Ralph Castain  wrote on 02/29/2008 12:18:39 AM:


Ralph Castain 
02/29/08 12:18 AM

To

Gregory R Watson/Watson/IBM@IBMUS

cc

Subject

Re: OpenMPI changes

Hi Greg

All of the prior options (and some new ones) for spawning a job

are fully

supported in the new interface. Instead of setting them with

"attributes",

you create an orte_job_t object and just fill them in. This is

precisely how

mpirun does it - you can look at that code if you want an

example, though it

is somewhat complex. Alternatively, you can look at the way it is

done for

comm_spawn, which may be more analogous to your situation - that

code is in

ompi/mca/dpm/orte.

All the tools library does is communicate the job object to the

target

persistent daemon so it can do the work. This way, you don't have

to open

all the frameworks, deal directly with the plm interface, etc.

Alternatively, you are welcome to do a full orte_init and use the

frameworks

yourself - there is no requirement to use the library. I only

offer it as an

alternative.


As far as I can tell, neither API provides the same functionality

as that

available in 1.2. While this might be beneficial for OMPI-specific

activities,

the changes appear to severely limit the interaction of tools with

the
runtime. At this point, I can't see either interface supporting  
PTP.


I went ahead and added a notification capability to the system -
took about
30 minutes. I can provide notic

Re: [OMPI devel] Fwd: OpenMPI changes

2008-03-04 Thread Ralph Castain
Yeah, the problem we had in the past was:

1. something would trigger in the system - e.g., a particular job state was
reached. This would cause us to execute a callback function via the GPR

2. the callback function would take some action. Typically, this involved
sending out a message or calling another function. Either way, the eventual
result of that action would be to cause another GPR trigger to fire - either
the job or a process changing state

This loop would continue ad infinitum. Sometimes, I would see stack traces
hundreds of calls deep. Debugging and maintaining something that intertwined
was impossible.
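
Purely to illustrate the shape of that loop - the types and functions below
are made up for the example, not actual ORTE APIs:

  /* hypothetical job object and trigger machinery */
  typedef struct { int state; } job_t;

  static void fire_trigger(job_t *job);

  /* callback invoked when a job changes state... */
  static void on_job_state(job_t *job)
  {
      job->state++;            /* ...the "action" changes state again...  */
      fire_trigger(job);       /* ...which re-enters this same callback   */
  }

  static void fire_trigger(job_t *job)
  {
      on_job_state(job);       /* unbounded mutual recursion */
  }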

People tried to impose order by establishing rules about what could and
could not be called from various situations, but that also proved
intractable. Problem was that we could get it to work for a "normal" code
path, but all the variety of failure modes, combined with all the
flexibility built into the code base, created so many code paths that you
inevitably wound up deadlocked under some corner case conditions.

Which we generally agreed was unacceptable.

It -is- possible to have callback functions that avoid this situation.
However, it is very easy to make a mistake and "hang" the whole system. Just
seemed easier to avoid the entire problem. (I don't get that option!)

The ability to get an allocation without launching is easy to add.

I/O forwarding is currently an issue. Our IOF doesn't seem to like it when I
try to create an "alternate" tap (the default always goes back through the
persistent orted, so the tool looks like a second "tap" on the flow). This
is noted as a "bug" on our tracker, and I expect it will be addressed prior
to releasing 1.3. I will ask that it be raised in priority.

I'll review what I had done and see about bringing it into the trunk by the
end of the week.

Ralph



On 3/4/08 4:00 PM, "Greg Watson"  wrote:

> I don't have a problem using a different interface, assuming it's
> adequately supported and provides the functionality we need. I presume
> the recursive behavior you're referring to is calling OMPI interfaces
> from the callback functions. Any event-based system has this issue,
> and it is usually solved by clearly specifying the allowable
> interfaces that can be called (possibly none). Since PTP doesn't call
> OMPI functions from callbacks, it's not a problem for us if no
> interfaces can be called.
> 
> The major missing features appear to be:
> 
> - Ability to request a process allocation without launching the job
> - I/O forwarding callbacks
> 
> Without these, PTP support will be so limited that I'd be reluctant to
> say we support OMPI.
> 
> Greg
> 
> On Mar 4, 2008, at 4:50 PM, Ralph H Castain wrote:
> 
>> It is buried deep-down in the thread, but I'll just reiterate it
>> here. I
>> have "restored" the ability to "subscribe" to changes in job, proc,
>> and node
>> state via OMPI's tool interface library. I have -not- checked this
>> into the
>> trunk yet, though, until the community has a chance to consider
>> whether or
>> not it wants it.
>> 
>> Restoring the ability to have such changes "callback" to user
>> functions
>> raises the concern again about recursive behavior. We worked hard to
>> remove
>> recursion from the code base, and it would be a concern to see it
>> potentially re-enter.
>> 
>> I realize there is some difference between ORTE calling back into
>> itself vs
>> calling back into a user-specified function. However, unless that
>> user truly
>> understands ORTE/OMPI and takes considerable precautions, it is very
>> easy to
>> recreate the recursive behavior without intending to do so.
>> 
>> The tool interface library was built to accomplish two things:
>> 
>> 1. help reduce the impact on external tools of changes to ORTE/OMPI
>> interfaces, and
>> 
>> 2. provide a degree of separation to prevent the tool from
>> inadvertently
>> causing OMPI to "behave badly"
>> 
>> I think we accomplished that - I would encourage you to at least
>> consider
>> using the library. If there is something missing, we can always add
>> it.
>> 
>> Ralph
>> 
>> 
>> 
>> On 3/4/08 2:37 PM, "Jeff Squyres"  wrote:
>> 
>>> Greg --
>>> 
>>> I admit to being a bit puzzled here.  Ralph sent around RFCs about
>>> these changes many months ago.  Everyone said they didn't want this
>>> functionality -- it was seen as excess functionality that Open MPI
>>> didn't want or need -- so it was all removed.
>>> 
>>> As such, I have to agree with Ralph that it is an "enhancement" to
>>> re-
>>> add the functionality.  That being said, patches are always welcome!
>>> IBM has signed the OMPI 3rd party contribution agreement, so it could
>>> be contributed directly.
>>> 
>>> Sidenote: I was also under the impression that PTP was being re-
>>> geared
>>> towards STCI and moving away from ORTE anyway.  Is this incorrect?
>>> 
>>> 
>>> 
>>> On Mar 4, 2008, at 3:24 PM, Greg Watson wrote:
>>> 
 Hi all,
 
 Ralph informs me that significant functionality has been rem

Re: [OMPI devel] Fwd: OpenMPI changes

2008-03-04 Thread Greg Watson

Ralph,

Looking at PTP, the only thing we need is to query the process  
information (PID, rank, node) when the job is created. Perhaps if only  
queries are allowed from callbacks then recursion would be eliminated?


If you can get this functionality into your new interface and back in
the trunk, I'll take a look at porting PTP to use it.


Thanks,
Greg

On Mar 4, 2008, at 6:14 PM, Ralph Castain wrote:


Yeah, the problem we had in the past was:

1. something would trigger in the system - e.g., a particular job  
state was
reached. This would cause us to execute a callback function via the  
GPR


2. the callback function would take some action. Typically, this  
involved
sending out a message or calling another function. Either way, the  
eventual
result of that action would be to cause another GPR trigger to fire  
- either

the job or a process changing state

This loop would continue ad infinitum. Sometimes, I would see stack  
traces
hundreds of calls deep. Debugging and maintaining something that  
intertwined

was impossible.

People tried to impose order by establishing rules about what could  
and

could not be called from various situations, but that also proved
intractable. Problem was that we could get it to work for a "normal"  
code

path, but all the variety of failure modes, combined with all the
flexibility built into the code base, created so many code paths  
that you

inevitably wound up deadlocked under some corner case conditions.

Which we generally agreed was unacceptable.

It -is- possible to have callback functions that avoid this situation.
However, it is very easy to make a mistake and "hang" the whole  
system. Just

seemed easier to avoid the entire problem. (I don't get that option!)

The ability to get an allocation without launching is easy to add.

I/O forwarding is currently an issue. Our IOF doesn't seem to like  
it when I
try to create an "alternate" tap (the default always goes back  
through the
persistent orted, so the tool looks like a second "tap" on the  
flow). This
is noted as a "bug" on our tracker, and I expect it will be  
addressed prior

to releasing 1.3. I will ask that it be raised in priority.

I'll review what I had done and see about bringing it into the trunk  
by the

end of the week.

Ralph



On 3/4/08 4:00 PM, "Greg Watson"  wrote:


I don't have a problem using a different interface, assuming it's
adequately supported and provides the functionality we need. I  
presume

the recursive behavior you're referring to is calling OMPI interfaces
from the callback functions. Any event-based system has this issue,
and it is usually solved by clearly specifying the allowable
interfaces that can be called (possibly none). Since PTP doesn't call
OMPI functions from callbacks, it's not a problem for us if no
interfaces can be called.

The major missing features appear to be:

- Ability to request a process allocation without launching the job
- I/O forwarding callbacks

Without these, PTP support will be so limited that I'd be reluctant  
to

say we support OMPI.

Greg

On Mar 4, 2008, at 4:50 PM, Ralph H Castain wrote:


It is buried deep-down in the thread, but I'll just reiterate it
here. I
have "restored" the ability to "subscribe" to changes in job, proc,
and node
state via OMPI's tool interface library. I have -not- checked this
into the
trunk yet, though, until the community has a chance to consider
whether or
not it wants it.

Restoring the ability to have such changes "callback" to user
functions
raises the concern again about recursive behavior. We worked hard to
remove
recursion from the code base, and it would be a concern to see it
potentially re-enter.

I realize there is some difference between ORTE calling back into
itself vs
calling back into a user-specified function. However, unless that
user truly
understands ORTE/OMPI and takes considerable precautions, it is very
easy to
recreate the recursive behavior without intending to do so.

The tool interface library was built to accomplish two things:

1. help reduce the impact on external tools of changes to ORTE/OMPI
interfaces, and

2. provide a degree of separation to prevent the tool from
inadvertently
causing OMPI to "behave badly"

I think we accomplished that - I would encourage you to at least
consider
using the library. If there is something missing, we can always add
it.

Ralph



On 3/4/08 2:37 PM, "Jeff Squyres"  wrote:


Greg --

I admit to being a bit puzzled here.  Ralph sent around RFCs about
these changes many months ago.  Everyone said they didn't want this
functionality -- it was seen as excess functionality that Open MPI
didn't want or need -- so it was all removed.

As such, I have to agree with Ralph that it is an "enhancement" to
re-
add the functionality.  That being said, patches are always  
welcome!
IBM has signed the OMPI 3rd party contribution agreement, so it  
could

be contributed directly.

Sidenote: I was also under the impression that PTP was being re-
geare

[OMPI devel] PLPA update: #@$!@#$

2008-03-04 Thread Jeff Squyres
My apologies; apparently the SVN merge of PLPA somehow didn't work  
properly.  The next time you "svn up", you'll get a conflict warning  
about opal/mca/paffinity/linux/plpa already existing.  Do this to fix  
the problem:


cd path-to-your-ompi-checkout
svn up
# see warning
rm -rf opal/mca/paffinity/linux/plpa
svn up

And then it should be fine.

I'm sorry about this.  :-(

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Fwd: OpenMPI changes

2008-03-04 Thread Ralph Castain
I'll try to get the code into the trunk before I go on vacation for a week
on Fri. If not, I'll let you know and get it the week I get back (3/17).
Basically, all I do is define an event in our event library that "fires" to
send a message to you when the defined trigger occurs.

If that is all you need, though, we could explore providing callback
capability inside of ORTE for it. The data you are looking for is sitting in
objects inside a set of global arrays - you could easily pick that off. We
could try having the event call your callback function and let you do
something with the data (I imagine display the status or something?).

We would just need to be careful about not getting locked in there and
blocking other events from "firing" as they are needed to do things like
respond to errors.
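
To make that concrete, here is a small self-contained sketch of the pattern
being described: an event tied to a job-state trigger invokes a tool-supplied
callback, and the callback copies what it needs and returns immediately so
other events can still fire. All of the names are invented for illustration;
this is not the actual ORTE event API:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical callback signature: the data already sits in the runtime's
 * global arrays, so the tool just gets a pointer to look at */
typedef void (*state_cb_fn_t)(const char *proc_data, void *cbdata);

static state_cb_fn_t job_cb;       /* one registration slot, for illustration */
static void         *job_cbdata;

/* "register a callback for the job-launched trigger" (made-up call) */
static void register_job_launched_cb(state_cb_fn_t cb, void *cbdata)
{
    job_cb     = cb;
    job_cbdata = cbdata;
}

/* runtime side: the trigger fires and calls whatever the tool registered */
static void fire_job_launched(const char *proc_data)
{
    if (job_cb) {
        job_cb(proc_data, job_cbdata);
    }
}

/* tool side: copy what is needed and get out -- no sends, no calls back
 * into the runtime, nothing that could block other events from firing */
static void tool_cb(const char *proc_data, void *cbdata)
{
    char **save_here = (char **) cbdata;
    *save_here = strdup(proc_data);
}

int main(void)
{
    char *snapshot = NULL;
    register_job_launched_cb(tool_cb, &snapshot);
    fire_job_launched("rank 0: pid 12345 on n0");   /* simulate the trigger */
    printf("tool saw: %s\n", snapshot);
    free(snapshot);
    return 0;
}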

I'm willing to create it on a tmp branch to try when I get back, if that's
something you want to do. Either way, we'll have the tool library interface
to fall back upon as it is needed by other tools.

Ralph



On 3/4/08 4:41 PM, "Greg Watson"  wrote:

> Ralph,
> 
> Looking at PTP, the only thing we need is to query the process
> information (PID, rank, node) when the job is created. Perhaps if only
> queries are allowed from callbacks then recursion would be eliminated?
> 
> If you can get this functionality into your new interface and back in
> the trunk, I'll take a look at porting PTP to use it.
> 
> Thanks,
> Greg
> 
> On Mar 4, 2008, at 6:14 PM, Ralph Castain wrote:
> 
>> Yeah, the problem we had in the past was:
>> 
>> 1. something would trigger in the system - e.g., a particular job
>> state was
>> reached. This would cause us to execute a callback function via the
>> GPR
>> 
>> 2. the callback function would take some action. Typically, this
>> involved
>> sending out a message or calling another function. Either way, the
>> eventual
>> result of that action would be to cause another GPR trigger to fire
>> - either
>> the job or a process changing state
>> 
>> This loop would continue ad infinitum. Sometimes, I would see stack
>> traces
>> hundreds of calls deep. Debugging and maintaining something that
>> intertwined
>> was impossible.
>> 
>> People tried to impose order by establishing rules about what could
>> and
>> could not be called from various situations, but that also proved
>> intractable. Problem was that we could get it to work for a "normal"
>> code
>> path, but all the variety of failure modes, combined with all the
>> flexibility built into the code base, created so many code paths
>> that you
>> inevitably wound up deadlocked under some corner case conditions.
>> 
>> Which we generally agreed was unacceptable.
>> 
>> It -is- possible to have callback functions that avoid this situation.
>> However, it is very easy to make a mistake and "hang" the whole
>> system. Just
>> seemed easier to avoid the entire problem. (I don't get that option!)
>> 
>> The ability to get an allocation without launching is easy to add.
>> 
>> I/O forwarding is currently an issue. Our IOF doesn't seem to like
>> it when I
>> try to create an "alternate" tap (the default always goes back
>> through the
>> persistent orted, so the tool looks like a second "tap" on the
>> flow). This
>> is noted as a "bug" on our tracker, and I expect it will be
>> addressed prior
>> to releasing 1.3. I will ask that it be raised in priority.
>> 
>> I'll review what I had done and see about bringing it into the trunk
>> by the
>> end of the week.
>> 
>> Ralph
>> 
>> 
>> 
>> On 3/4/08 4:00 PM, "Greg Watson"  wrote:
>> 
>>> I don't have a problem using a different interface, assuming it's
>>> adequately supported and provides the functionality we need. I
>>> presume
>>> the recursive behavior you're referring to is calling OMPI interfaces
>>> from the callback functions. Any event-based system has this issue,
>>> and it is usually solved by clearly specifying the allowable
>>> interfaces that can be called (possibly none). Since PTP doesn't call
>>> OMPI functions from callbacks, it's not a problem for us if no
>>> interfaces can be called.
>>> 
>>> The major missing features appear to be:
>>> 
>>> - Ability to request a process allocation without launching the job
>>> - I/O forwarding callbacks
>>> 
>>> Without these, PTP support will be so limited that I'd be reluctant
>>> to
>>> say we support OMPI.
>>> 
>>> Greg
>>> 
>>> On Mar 4, 2008, at 4:50 PM, Ralph H Castain wrote:
>>> 
 It is buried deep-down in the thread, but I'll just reiterate it
 here. I
 have "restored" the ability to "subscribe" to changes in job, proc,
 and node
 state via OMPI's tool interface library. I have -not- checked this
 into the
 trunk yet, though, until the community has a chance to consider
 whether or
 not it wants it.
 
 Restoring the ability to have such changes "callback" to user
 functions
 raises the concern again about recursive behavior. We worked hard to
 remove
 recursion from the code base

Re: [OMPI devel] PLPA update: #@$!@#$

2008-03-04 Thread Jeff Squyres
Never mind; this commit didn't work at all.  I'm going to back it  
out.  :-(


On Mar 4, 2008, at 7:25 PM, Jeff Squyres wrote:


My apologies; apparently the SVN merge of PLPA somehow didn't work
properly.  The next time you "svn up", you'll get a conflict warning
about opal/mca/paffinity/linux/plpa already existing.  Do this to fix
the problem:

cd path-to-your-ompi-checkout
svn up
# see warning
rm -rf opal/mca/paffinity/linux/plpa
svn up

And then it should be fine.

I'm sorry about this.  :-(

--
Jeff Squyres
Cisco Systems




--
Jeff Squyres
Cisco Systems



[OMPI devel] documentation trac ticket type

2008-03-04 Thread Jeff Squyres
I just added a "documentation" trac ticket type.  Its intent is for  
user-visible changes that are worth documenting for the v1.3 release  
(this likely means the FAQ for the moment).


Two obvious examples (that I just filed tickets for):

- the new OMPI_* environment variables for indicating COMM_WORLD rank,  
size, local rank, and the universe size  
(https://svn.open-mpi.org/trac/ompi/ticket/1228) -- see the sketch after  
this list


- the new socket/core rankfile notation  
(https://svn.open-mpi.org/trac/ompi/ticket/1229)
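
As a quick illustration of the first item, a process could read those
variables at startup along these lines. The exact variable names below are
an assumption on my part; check ticket #1228 / the FAQ for the
authoritative list:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* names assumed for illustration -- see the ticket/FAQ for the real ones */
    const char *rank  = getenv("OMPI_COMM_WORLD_RANK");
    const char *size  = getenv("OMPI_COMM_WORLD_SIZE");
    const char *local = getenv("OMPI_COMM_WORLD_LOCAL_RANK");
    const char *univ  = getenv("OMPI_UNIVERSE_SIZE");

    printf("rank=%s size=%s local_rank=%s universe=%s\n",
           rank  ? rank  : "(unset)",
           size  ? size  : "(unset)",
           local ? local : "(unset)",
           univ  ? univ  : "(unset)");
    return 0;
}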


--
Jeff Squyres
Cisco Systems



[OMPI devel] Orte cleanup

2008-03-04 Thread Aurélien Bouteiller
I noticed that the new release of orte is not as good as it used to be at  
cleaning up the mess left by crashed/aborted MPI processes. Recently we  
have been experiencing a lot of zombie or livelocked processes running on  
the cluster nodes and disturbing subsequent experiments. I haven't really  
had time to investigate the issue; maybe Ralph can open a ticket if he is  
able to reproduce this.


Aurelien
--
* Dr. Aurélien Bouteiller
* Sr. Research Associate at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321







[OMPI devel] plpa upgrade

2008-03-04 Thread Jeff Squyres
I am now reasonably sure that the trunk PLPA upgrade has been  
completed successfully.  You will still need to remove the old "plpa"  
directory when you "svn up":


cd path/to/your/ompi/checkout
rm -rf opal/mca/paffinity/linux/plpa
svn up

That shouldn't be necessary, but apparently I borked up somewhere  
along the line in the merge and couldn't figure out how to un-bork it  
(I can explain if anyone cares).  I'm sorry about that.  :-\


Please let me know if you run into any problems.

--
Jeff Squyres
Cisco Systems



[OMPI devel] Parallel debugger integration

2008-03-04 Thread Jeff Squyres

Per the teleconference today:

Because of the PLPA integration debacle tonight, I didn't get the new  
MPI handle debugging and TotalView message queue bootstrapping stuff  
merged in today.  So please hold off on testing it for another day or  
two.  I'll send an update when it is merged to the trunk and integrated  
with the ORTE changes.


Thanks.

--
Jeff Squyres
Cisco Systems