[OMPI devel] documentation trac ticket type

2008-03-04 Thread Jeff Squyres
I just added a "documentation" trac ticket type.  Its intent is for  
user-visible changes that are worth documenting for the v1.3 release  
(this likely means the FAQ for the moment).


Two obvious examples (that I just filed tickets for):

- the new OMPI_* environment variables for indicating COMM_WORLD rank,
size, local rank, and the universe size
(https://svn.open-mpi.org/trac/ompi/ticket/1228)

- the new socket/core rankfile notation
(https://svn.open-mpi.org/trac/ompi/ticket/1229)
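
As a quick illustration, a process started by mpirun can read these variables
straight from its environment, even before MPI is initialized. The variable
names used below (OMPI_COMM_WORLD_RANK, OMPI_COMM_WORLD_SIZE,
OMPI_COMM_WORLD_LOCAL_RANK, OMPI_UNIVERSE_SIZE) are my reading of what ticket
#1228 covers for v1.3; treat them as assumptions and check the ticket/FAQ for
the authoritative list.

#include <stdio.h>
#include <stdlib.h>

/* Read the OMPI-provided environment variables before (or instead of)
 * calling MPI_Init.  Variable names are assumptions based on ticket #1228. */
int main(void)
{
    const char *rank  = getenv("OMPI_COMM_WORLD_RANK");
    const char *size  = getenv("OMPI_COMM_WORLD_SIZE");
    const char *lrank = getenv("OMPI_COMM_WORLD_LOCAL_RANK");
    const char *univ  = getenv("OMPI_UNIVERSE_SIZE");

    /* Unset variables mean the process was not started by mpirun. */
    printf("rank=%s size=%s local_rank=%s universe=%s\n",
           rank  ? rank  : "?", size ? size : "?",
           lrank ? lrank : "?", univ ? univ : "?");
    return 0;
}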


--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] Fwd: OpenMPI changes

2008-03-04 Thread Greg Watson
I don't have a problem using a different interface, assuming it's  
adequately supported and provides the functionality we need. I presume  
the recursive behavior you're referring to is calling OMPI interfaces  
from the callback functions. Any event-based system has this issue,  
and it is usually solved by clearly specifying the allowable  
interfaces that can be called (possibly none). Since PTP doesn't call  
OMPI functions from callbacks, it's not a problem for us if no  
interfaces can be called.


The major missing features appear to be:

- Ability to request a process allocation without launching the job
- I/O forwarding callbacks

Without these, PTP support will be so limited that I'd be reluctant to  
say we support OMPI.


Greg

On Mar 4, 2008, at 4:50 PM, Ralph H Castain wrote:

It is buried deep-down in the thread, but I'll just reiterate it here. I
have "restored" the ability to "subscribe" to changes in job, proc, and node
state via OMPI's tool interface library. I have -not- checked this into the
trunk yet, though, until the community has a chance to consider whether or
not it wants it.

Restoring the ability to have such changes "callback" to user functions
raises the concern again about recursive behavior. We worked hard to remove
recursion from the code base, and it would be a concern to see it
potentially re-enter.

I realize there is some difference between ORTE calling back into itself vs
calling back into a user-specified function. However, unless that user truly
understands ORTE/OMPI and takes considerable precautions, it is very easy to
recreate the recursive behavior without intending to do so.

The tool interface library was built to accomplish two things:

1. help reduce the impact on external tools of changes to ORTE/OMPI
interfaces, and

2. provide a degree of separation to prevent the tool from inadvertently
causing OMPI to "behave badly"

I think we accomplished that - I would encourage you to at least consider
using the library. If there is something missing, we can always add it.

Ralph



On 3/4/08 2:37 PM, "Jeff Squyres"  wrote:


Greg --

I admit to being a bit puzzled here.  Ralph sent around RFCs about
these changes many months ago.  Everyone said they didn't want this
functionality -- it was seen as excess functionality that Open MPI
didn't want or need -- so it was all removed.

As such, I have to agree with Ralph that it is an "enhancement" to
re-add the functionality.  That being said, patches are always welcome!
IBM has signed the OMPI 3rd party contribution agreement, so it could
be contributed directly.

Sidenote: I was also under the impression that PTP was being re-geared
towards STCI and moving away from ORTE anyway.  Is this incorrect?



On Mar 4, 2008, at 3:24 PM, Greg Watson wrote:


Hi all,

Ralph informs me that significant functionality has been removed from
ORTE in 1.3. Unfortunately this functionality was being used by PTP to
provide support for OMPI, and without it, it seems unlikely that PTP
will be able to work with 1.3. Apparently restoring this lost
functionality is an "enhancement" of 1.3, and so is something that
will not necessarily be done. Having worked with OMPI from a very
early stage to ensure that we were able to provide robust support, I
must say it is a bit disappointing that this approach is being taken.
I hope that the community will view this "enhancement" as worthwhile.


Regards,

Greg

Begin forwarded message:



On 2/29/08 7:13 AM, "Gregory R Watson"  wrote:




Ralph Castain  wrote on 02/29/2008 12:18:39 AM:


Ralph Castain 
02/29/08 12:18 AM

To

Gregory R Watson/Watson/IBM@IBMUS

cc

Subject

Re: OpenMPI changes

Hi Greg

All of the prior options (and some new ones) for spawning a job are fully
supported in the new interface. Instead of setting them with "attributes",
you create an orte_job_t object and just fill them in. This is precisely how
mpirun does it - you can look at that code if you want an example, though it
is somewhat complex. Alternatively, you can look at the way it is done for
comm_spawn, which may be more analogous to your situation - that code is in
ompi/mca/dpm/orte.

All the tools library does is communicate the job object to the target
persistent daemon so it can do the work. This way, you don't have to open
all the frameworks, deal directly with the plm interface, etc.

Alternatively, you are welcome to do a full orte_init and use the frameworks
yourself - there is no requirement to use the library. I only offer it as an
alternative.


As far as I can tell, neither API provides the same functionality as that
available in 1.2. While this might be beneficial for OMPI-specific activities,
the changes appear to severely limit the interaction of tools with the
runtime. At this point, I can't see either interface supporting PTP.


I went ahead and added a notification 

Re: [OMPI devel] Fwd: OpenMPI changes

2008-03-04 Thread Ralph H Castain
It is buried deep-down in the thread, but I'll just reiterate it here. I
have "restored" the ability to "subscribe" to changes in job, proc, and node
state via OMPI's tool interface library. I have -not- checked this into the
trunk yet, though, until the community has a chance to consider whether or
not it wants it.

Restoring the ability to have such changes "callback" to user functions
raises the concern again about recursive behavior. We worked hard to remove
recursion from the code base, and it would be a concern to see it
potentially re-enter.

I realize there is some difference between ORTE calling back into itself vs
calling back into a user-specified function. However, unless that user truly
understands ORTE/OMPI and takes considerable precautions, it is very easy to
recreate the recursive behavior without intending to do so.

The tool interface library was built to accomplish two things:

1. help reduce the impact on external tools of changes to ORTE/OMPI
interfaces, and

2. provide a degree of separation to prevent the tool from inadvertently
causing OMPI to "behave badly"

I think we accomplished that - I would encourage you to at least consider
using the library. If there is something missing, we can always add it.

Ralph



On 3/4/08 2:37 PM, "Jeff Squyres"  wrote:

> Greg --
> 
> I admit to being a bit puzzled here.  Ralph sent around RFCs about
> these changes many months ago.  Everyone said they didn't want this
> functionality -- it was seen as excess functionality that Open MPI
> didn't want or need -- so it was all removed.
> 
> As such, I have to agree with Ralph that it is an "enhancement" to re-
> add the functionality.  That being said, patches are always welcome!
> IBM has signed the OMPI 3rd party contribution agreement, so it could
> be contributed directly.
> 
> Sidenote: I was also under the impression that PTP was being re-geared
> towards STCI and moving away from ORTE anyway.  Is this incorrect?
> 
> 
> 
> On Mar 4, 2008, at 3:24 PM, Greg Watson wrote:
> 
>> Hi all,
>> 
>> Ralph informs me that significant functionality has been removed from
>> ORTE in 1.3. Unfortunately this functionality was being used by PTP to
>> provide support for OMPI, and without it, it seems unlikely that PTP
>> will be able to work with 1.3. Apparently restoring this lost
>> functionality is an "enhancement" of 1.3, and so is something that
>> will not necessarily be done. Having worked with OMPI from a very
>> early stage to ensure that we were able to provide robust support, I
>> must say it is a bit disappointing that this approach is being taken.
>> I hope that the community will view this "enhancement" as worthwhile.
>> 
>> Regards,
>> 
>> Greg
>> 
>> Begin forwarded message:
>> 
>>> 
>>> On 2/29/08 7:13 AM, "Gregory R Watson"  wrote:
>>> 
 
 
 Ralph Castain  wrote on 02/29/2008 12:18:39 AM:
 
> Ralph Castain 
> 02/29/08 12:18 AM
> 
> To
> 
> Gregory R Watson/Watson/IBM@IBMUS
> 
> cc
> 
> Subject
> 
> Re: OpenMPI changes
> 
> Hi Greg
> 
> All of the prior options (and some new ones) for spawning a job
>>> are fully
> supported in the new interface. Instead of setting them with
>>> "attributes",
> you create an orte_job_t object and just fill them in. This is
>>> precisely how
> mpirun does it - you can look at that code if you want an
>>> example, though it
> is somewhat complex. Alternatively, you can look at the way it is
>>> done for
> comm_spawn, which may be more analogous to your situation - that
>>> code is in
> ompi/mca/dpm/orte.
> 
> All the tools library does is communicate the job object to the
>>> target
> persistent daemon so it can do the work. This way, you don't have
>>> to open
> all the frameworks, deal directly with the plm interface, etc.
> 
> Alternatively, you are welcome to do a full orte_init and use the
>>> frameworks
> yourself - there is no requirement to use the library. I only
>>> offer it as an
> alternative.
 
 As far as I can tell, neither API provides the same functionality
>>> as that
 available in 1.2. While this might be beneficial for OMPI-specific
>>> activities,
 the changes appear to severely limit the interaction of tools with
>>> the
 runtime. At this point, I can't see either interface supporting PTP.
>>> 
>>> I went ahead and added a notification capability to the system -
>>> took about
>>> 30 minutes. I can provide notice of job and process state changes
>>> since I
>>> see those. Node state changes, however, are different - I can notify
>>> on
>>> them, but we have no way of seeing them. None of the environments we
>>> support
>>> tell us when a node fails.
>>> 
 
> 
> I know that the tool library works because it uses the identical
>>> APIs as
> comm_spawn and mpirun. I have also tested them by building my own
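
To make the "fill in an orte_job_t" description above concrete, here is a
minimal sketch of the pattern. It is not the tool-interface library itself,
and the header paths, struct fields (app, argv, num_procs, apps, num_apps)
and the orte_plm.spawn entry point are recalled from the 1.3 trunk rather
than verified against it - treat them as assumptions and use
ompi/mca/dpm/orte or orterun as the real reference.

/* Sketch only: the "create an orte_job_t and fill it in" pattern described
 * above.  Field names, header paths, and the launch entry point are
 * assumptions from memory of the 1.3 trunk, not a verified API reference. */
#include <string.h>

#include "orte/runtime/orte_globals.h"   /* assumed location of orte_job_t */
#include "orte/mca/plm/plm.h"            /* assumed location of orte_plm */
#include "opal/util/argv.h"

static int spawn_hostname_job(void)
{
    orte_job_t *jdata = OBJ_NEW(orte_job_t);            /* job object replaces the old attributes */
    orte_app_context_t *app = OBJ_NEW(orte_app_context_t);

    app->app = strdup("hostname");                      /* executable (assumed field name) */
    opal_argv_append_nosize(&app->argv, "hostname");    /* argv[0] */
    app->num_procs = 4;                                 /* how many to launch (assumed field name) */

    opal_pointer_array_add(jdata->apps, app);           /* attach app_context to the job */
    jdata->num_apps = 1;

    /* A tool that did a full orte_init could launch directly; a tool using
     * the tool-interface library would instead hand jdata to the persistent
     * daemon/mpirun and let it do the work. */
    return orte_plm.spawn(jdata);
}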

Re: [OMPI devel] Fwd: OpenMPI changes

2008-03-04 Thread Jeff Squyres

Greg --

I admit to being a bit puzzled here.  Ralph sent around RFCs about  
these changes many months ago.  Everyone said they didn't want this  
functionality -- it was seen as excess functionality that Open MPI  
didn't want or need -- so it was all removed.


As such, I have to agree with Ralph that it is an "enhancement" to
re-add the functionality.  That being said, patches are always welcome!
IBM has signed the OMPI 3rd party contribution agreement, so it could
be contributed directly.


Sidenote: I was also under the impression that PTP was being re-geared  
towards STCI and moving away from ORTE anyway.  Is this incorrect?




On Mar 4, 2008, at 3:24 PM, Greg Watson wrote:


Hi all,

Ralph informs me that significant functionality has been removed from
ORTE in 1.3. Unfortunately this functionality was being used by PTP to
provide support for OMPI, and without it, it seems unlikely that PTP
will be able to work with 1.3. Apparently restoring this lost
functionality is an "enhancement" of 1.3, and so is something that
will not necessarily be done. Having worked with OMPI from a very
early stage to ensure that we were able to provide robust support, I
must say it is a bit disappointing that this approach is being taken.
I hope that the community will view this "enhancement" as worthwhile.

Regards,

Greg

Begin forwarded message:



On 2/29/08 7:13 AM, "Gregory R Watson"  wrote:




Ralph Castain  wrote on 02/29/2008 12:18:39 AM:


Ralph Castain 
02/29/08 12:18 AM

To

Gregory R Watson/Watson/IBM@IBMUS

cc

Subject

Re: OpenMPI changes

Hi Greg

All of the prior options (and some new ones) for spawning a job are fully
supported in the new interface. Instead of setting them with "attributes",
you create an orte_job_t object and just fill them in. This is precisely how
mpirun does it - you can look at that code if you want an example, though it
is somewhat complex. Alternatively, you can look at the way it is done for
comm_spawn, which may be more analogous to your situation - that code is in
ompi/mca/dpm/orte.

All the tools library does is communicate the job object to the target
persistent daemon so it can do the work. This way, you don't have to open
all the frameworks, deal directly with the plm interface, etc.

Alternatively, you are welcome to do a full orte_init and use the frameworks
yourself - there is no requirement to use the library. I only offer it as an
alternative.


As far as I can tell, neither API provides the same functionality as that
available in 1.2. While this might be beneficial for OMPI-specific activities,
the changes appear to severely limit the interaction of tools with the
runtime. At this point, I can't see either interface supporting PTP.


I went ahead and added a notification capability to the system - took about
30 minutes. I can provide notice of job and process state changes since I
see those. Node state changes, however, are different - I can notify on
them, but we have no way of seeing them. None of the environments we
support tell us when a node fails.





I know that the tool library works because it uses the identical APIs as
comm_spawn and mpirun. I have also tested them by building my own tools.


There's a big difference between being on a code path that *must* work
because it is used by core components, and one that is provided as an
add-on for external tools. I may be worrying needlessly if this new
interface becomes an "officially supported" API. Is that planned? At a
minimum, it seems like it's going to complicate your testing process,
since you're going to need to provide a separate set of tests that
exercise this interface independent of the rest of OMPI.


It is an officially supported API. Testing is not as big a problem as you
might expect since the library exercises the same code paths as mpirun and
comm_spawn. Like I said, I have written my own tools that exercise the
library - no problem using them as tests.





We do not launch an orted for any tool-library query. All we do is
communicate the query to the target persistent daemon or mpirun. Those
entities have recv's posted to catch any incoming messages and execute the
request.

You are correct that we no longer have event driven notification in the
system. I repeatedly asked the community (on both devel and core lists) for
input on that question, and received no indications that anyone wanted it
supported. It can be added back into the system, but would require the
approval of the OMPI community. I don't know how problematic that would be -
there is a lot of concern over the amount of memory, overhead, and potential
reliability issues that surround event notification. If you want that
capability, I suggest we discuss it, come up with a plan that deals with
those issues, and then take a proposal to the devel list for discussion.


As for reliability, the objectives of the last year's 

[OMPI devel] getting config.guess/config.sub from upstream

2008-03-04 Thread Ralf Wildenhues
Hello,

Please note that the CVS repo for config.guess and config.sub is
outdated; development has moved to git.
ompi_trunk/config/distscript.csh could be adjusted to pull from

and likewise for config.sub.  I'm too dumb to fix the csh script though,
for me it seems to always fail the download (it did that before, too).

Cheers,
Ralf


[OMPI devel] Fwd: OpenMPI changes

2008-03-04 Thread Greg Watson

Hi all,

Ralph informs me that significant functionality has been removed from  
ORTE in 1.3. Unfortunately this functionality was being used by PTP to  
provide support for OMPI, and without it, it seems unlikely that PTP  
will be able to work with 1.3. Apparently restoring this lost  
functionality is an "enhancement" of 1.3, and so is something that  
will not necessarily be done. Having worked with OMPI from a very  
early stage to ensure that we were able to provide robust support, I  
must say it is a bit disappointing that this approach is being taken.  
I hope that the community will view this "enhancement" as worthwhile.


Regards,

Greg

Begin forwarded message:



On 2/29/08 7:13 AM, "Gregory R Watson"  wrote:

>
>
> Ralph Castain  wrote on 02/29/2008 12:18:39 AM:
>
>> Ralph Castain 
>> 02/29/08 12:18 AM
>>
>> To
>>
>> Gregory R Watson/Watson/IBM@IBMUS
>>
>> cc
>>
>> Subject
>>
>> Re: OpenMPI changes
>>
>> Hi Greg
>>
>> All of the prior options (and some new ones) for spawning a job are fully
>> supported in the new interface. Instead of setting them with "attributes",
>> you create an orte_job_t object and just fill them in. This is precisely how
>> mpirun does it - you can look at that code if you want an example, though it
>> is somewhat complex. Alternatively, you can look at the way it is done for
>> comm_spawn, which may be more analogous to your situation - that code is in
>> ompi/mca/dpm/orte.
>>
>> All the tools library does is communicate the job object to the target
>> persistent daemon so it can do the work. This way, you don't have to open
>> all the frameworks, deal directly with the plm interface, etc.
>>
>> Alternatively, you are welcome to do a full orte_init and use the frameworks
>> yourself - there is no requirement to use the library. I only offer it as an
>> alternative.
>
> As far as I can tell, neither API provides the same functionality as that
> available in 1.2. While this might be beneficial for OMPI-specific activities,
> the changes appear to severely limit the interaction of tools with the
> runtime. At this point, I can't see either interface supporting PTP.

I went ahead and added a notification capability to the system - took about
30 minutes. I can provide notice of job and process state changes since I
see those. Node state changes, however, are different - I can notify on
them, but we have no way of seeing them. None of the environments we
support tell us when a node fails.

>
>>
>> I know that the tool library works because it uses the identical APIs as
>> comm_spawn and mpirun. I have also tested them by building my own tools.
>
> There's a big difference between being on a code path that *must* work
> because it is used by core components, and one that is provided as an
> add-on for external tools. I may be worrying needlessly if this new
> interface becomes an "officially supported" API. Is that planned? At a
> minimum, it seems like it's going to complicate your testing process,
> since you're going to need to provide a separate set of tests that
> exercise this interface independent of the rest of OMPI.

It is an officially supported API. Testing is not as big a problem as you
might expect since the library exercises the same code paths as mpirun and
comm_spawn. Like I said, I have written my own tools that exercise the
library - no problem using them as tests.

>
>>
>> We do not launch an orted for any tool-library query. All we do is
>> communicate the query to the target persistent daemon or mpirun. Those
>> entities have recv's posted to catch any incoming messages and execute the
>> request.
>>
>> You are correct that we no longer have event driven notification in the
>> system. I repeatedly asked the community (on both devel and core lists) for
>> input on that question, and received no indications that anyone wanted it
>> supported. It can be added back into the system, but would require the
>> approval of the OMPI community. I don't know how problematic that would be -
>> there is a lot of concern over the amount of memory, overhead, and potential
>> reliability issues that surround event notification. If you want that
>> capability, I suggest we discuss it, come up with a plan that deals with
>> those issues, and then take a proposal to the devel list for discussion.
>>
>> As for reliability, the objectives of the last year's effort were precisely
>> scalability and reliability. We did a lot of work to eliminate recursive
>> deadlocks and improve the reliability of the code. Our current testing
>> indicates we had considerable success in that regard, particularly with the
>> recursion elimination commit earlier today.
>>
>> I would be happy to work with you to meet the PTP's needs - we'll just need
>> to work with the OMPI community to ensure everyone buys into the plan. If it
>> would help, I could come and 

Re: [OMPI devel] suggested patch for mca-btl-openib-hca-params.ini

2008-03-04 Thread Jeff Squyres
Sounds good -- I don't remember who's on the schedule A for Qlogic,  
but I know that Christian Bell can commit.


Do you need this for v1.2.6?  We are literally rolling 1.2.6rc1 right  
*now*...



On Mar 4, 2008, at 2:12 PM, Ralph Campbell wrote:


Here is a suggested patch for adding the QLogic QLE7240 and QLE7280
DDR HCA cards to the openib params file.

I would like the MTU to default to 4K for these HCAs but I don't see
any code using the ibv_port_attr.active_mtu field to limit the MTU
to the active MTU.  If you like, I can try to make a patch to do this.

--- ompi/mca/btl/openib/mca-btl-openib-hca-params.ini	2008-02-20 08:28:32.0 -0800
+++ ompi/mca/btl/openib/mca-btl-openib-hca-params.ini.new	2008-02-25 18:09:24.364877000 -0800

@@ -121,6 +121,12 @@

[QLogic InfiniPath]
vendor_id = 0x1fc1
-vendor_part_id = 13,16
+vendor_part_id = 13
use_eager_rdma = 1
mtu = 2048
+
+[QLogic InfiniPath]
+vendor_id = 0x1fc1,0x1077
+vendor_part_id = 16,29216
+use_eager_rdma = 1
+mtu = 4096



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



[OMPI devel] suggested patch for mca-btl-openib-hca-params.ini

2008-03-04 Thread Ralph Campbell
Here is a suggested patch for adding the QLogic QLE7240 and QLE7280
DDR HCA cards to the openib params file.

I would like the MTU to default to 4K for these HCAs but I don't see
any code using the ibv_port_attr.active_mtu field to limit the MTU
to the active MTU.  If you like, I can try to make a patch to do this.

--- ompi/mca/btl/openib/mca-btl-openib-hca-params.ini   2008-02-20 08:28:32.0 -0800
+++ ompi/mca/btl/openib/mca-btl-openib-hca-params.ini.new   2008-02-25 18:09:24.364877000 -0800
@@ -121,6 +121,12 @@

 [QLogic InfiniPath]
 vendor_id = 0x1fc1
-vendor_part_id = 13,16
+vendor_part_id = 13
 use_eager_rdma = 1
 mtu = 2048
+
+[QLogic InfiniPath]
+vendor_id = 0x1fc1,0x1077
+vendor_part_id = 16,29216
+use_eager_rdma = 1
+mtu = 4096
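
For the active_mtu idea above, the check could look roughly like the
following. This is only a sketch of the clamp, not the actual openib btl
change; it relies on standard libibverbs calls (ibv_query_port and the
active_mtu field of struct ibv_port_attr).

#include <stdint.h>
#include <infiniband/verbs.h>

/* Sketch (not the actual openib btl code): clamp the MTU configured from the
 * ini file to the port's currently active MTU, so a 4096-byte default cannot
 * exceed what the fabric actually negotiated. */
static int clamp_mtu_to_active(struct ibv_context *ctx, uint8_t port_num,
                               int configured_mtu_bytes)
{
    struct ibv_port_attr attr;
    int active_bytes;

    if (ibv_query_port(ctx, port_num, &attr) != 0) {
        return configured_mtu_bytes;   /* cannot query: leave as configured */
    }
    /* enum ibv_mtu: IBV_MTU_256 == 1 ... IBV_MTU_4096 == 5 */
    active_bytes = 128 << attr.active_mtu;

    return configured_mtu_bytes < active_bytes ? configured_mtu_bytes
                                               : active_bytes;
}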





[OMPI devel] [RFC] Reduce the number of tests run by make check

2008-03-04 Thread Tim Prins

WHAT: Reduce the number of tests run by make check

WHY: Some of the tests will not work properly until Open MPI is 
installed. Also, many of the tests do not really test anything.


WHERE: See below.

TIMEOUT: COB Friday March 14

DESCRIPTION:
We have been having many problems with make check over the years. People
tend to change things and not update the tests, which leads to tarball
generation failures and nightly test run failures. Furthermore, many of
the tests test things which have not changed for years.


So with this in mind, I propose only running the following tests when 
'make check' is run:

asm/atomic_barrier
asm/atomic_barrier_noinline
asm/atomic_spinlock
asm/atomic_spinlock_noinline
asm/atomic_math
asm/atomic_math_noinline
asm/atomic_cmpset
asm/atomic_cmpset_noinline

We would no longer run the following tests:
class/ompi_bitmap_t
class/opal_hash_table_t
class/opal_list_t
class/opal_value_array_t
class/opal_pointer_array
class/ompi_rb_tree_t
memory/opal_memory_basic
memory/opal_memory_speed
memory/opal_memory_cxx
threads/opal_thread
threads/opal_condition
datatype/ddt_test
datatype/checksum
datatype/position
peruse/mpi_peruse

These tests would not be deleted from the repository, just made so they 
do not run by default.


Re: [OMPI devel] make check failing

2008-03-04 Thread Tim Prins
Simple, because the test that eventually segfaults only runs if ompi is 
configured with threading. Otherwise it is a no-op.


Tim

Jeff Squyres wrote:
I think another important question is: why is this related to  
threads?  (i.e., why does it work in non-threaded builds)



On Mar 4, 2008, at 9:44 AM, Ralph H Castain wrote:

Carto select failing if it doesn't find any modules was called out in an
earlier message (might have been a commit log) when we set an mca-no-build
flag on that framework. This should probably be fixed - there are times when
someone may not wish to build any carto modules.

Is there some reason why carto absolutely must find a module? Can we create a
default "none available" module in the base?


On 3/4/08 7:39 AM, "Tim Prins"  wrote:


Hi,

We have been having a problem lately with our MTT runs where make check
would fail when mpi threads were enabled.

Turns out the problem is that opal_init now calls
opal_base_carto_select, which cannot find any carto modules since we
have not done an install yet. So it returns a failure. This causes
opal_init to abort before initializing the event engine. So when we try
to do the threading tests, the event engine is uninitialized and fails.


So this is why it fails, but I do not know how best to fix it. Any
suggestions would be appreciated.

Tim
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel







Re: [OMPI devel] [RFC] Default hostfile MCA param

2008-03-04 Thread Edgar Gabriel

Tim Prins wrote:
We have used '^' elsewhere to indicate not, so maybe just have the 
syntax be if you put '^' at the beginning of a line, that node is not used.


So we could have:
n0
n1
^headnode
n3


this would sound fine for me.



I understand the idea of having a flag to indicate that all nodes below 
a certain point should be ignored, but I think this might get confusing, 
and I'm unsure how useful it would be. I just see the usefulness of this 
to block out a couple of nodes by default. Besides, if you do want to 
block out many nodes, any reasonable text editor allows you to insert 
'^' in front of any number of lines easily.


Alternatively, for the particular situation that Edgar mentions, it may 
be good enough just to set rmaps_base_no_schedule_local in the mca 
params default file.


hm, ok, here is another flag which I was not aware of. Anyway, I can
think of other scenarios where this feature could be useful, e.g. when
hunting down performance problems on a cluster and you would like to
avoid having to get a new allocation or do a major rewrite of the
hostfile every time. Or including an I/O node in an allocation (in
order to have it exclusively) while making sure that no MPI process gets
scheduled onto the node.


Thanks
Edgar



One question though: If I am in a slurm allocation which contains n1, 
and there is a default hostfile that contains "^n1", will I run on 'n1'?


I'm not sure what the answer is, I know we talked about the precedence 
earlier...


Tim

Ralph H Castain wrote:

I personally have no objection, but I would ask then that the wiki be
modified to cover this case. All I require is that someone define the syntax
to be used to indicate "this is a node I do -not- want used", or
alternatively a flag that indicates "all nodes below are -not- to be used".

Implementation isn't too hard once I have that...


On 3/3/08 9:44 AM, "Edgar Gabriel"  wrote:


Ralph,

could this mechanism be used also to exclude a node, indicating to never
run a job there? Here is the problem that I face quite often: students
working on the homework forget to allocate a partition  on the cluster,
and just type mpirun. Because of that, all jobs end up running on the
front-end node.

If we now had the ability to specify in a default hostfile that a job
should never run on a specified node (e.g. the front end node), users
would get an error message when trying to do that. I am aware that
that's a little ugly...

THanks
edgar

Ralph Castain wrote:

I forget all the formatting we are supposed to use, so I hope you'll all
just bear with me.

George brought up the fact that we used to have an MCA param to specify a
hostfile to use for a job. The hostfile behavior described on the wiki,
however, doesn't provide for that option. It associates a hostfile with a
specific app_context, and provides a detailed hierarchical layout of how
mpirun is to interpret that information.

What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile"
to replace the deprecated capability. If found, the system's behavior will
be:

1. in a managed environment, the default hostfile will be used to filter the
discovered nodes to define the available node pool. Any hostfile and/or dash
host options provided to an app_context will be used to further filter the
node pool to define the specific nodes for use by that app_context. Thus,
nodes in the hostfile and dash host options given to an app_context -must-
also be in the default hostfile in order to be available for use by that
app_context - any nodes in the app_context options that are not in the
default hostfile will be ignored.

2. in an unmanaged environment, the default hostfile will be used to define
the available node pool. Any hostfile and/or dash host options provided to
an app_context will be used to filter the node pool to define the specific
nodes for use by that app_context, subject to the previous caveat. However,
add-hostfile and add-host options will add nodes to the node pool for use
-only- by the associated app_context.


I believe this proposed behavior is consistent with that described on the
wiki, and would be relatively easy to implement. If nobody objects, I will
do so by end-of-day 3/6.

Comments, suggestions, objections - all are welcome!
Ralph


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 

Re: [OMPI devel] make check failing

2008-03-04 Thread Jeff Squyres
I think another important question is: why is this related to  
threads?  (i.e., why does it work in non-threaded builds)



On Mar 4, 2008, at 9:44 AM, Ralph H Castain wrote:

Carto select failing if it doesn't find any modules was called out in an
earlier message (might have been a commit log) when we set an mca-no-build
flag on that framework. This should probably be fixed - there are times when
someone may not wish to build any carto modules.

Is there some reason why carto absolutely must find a module? Can we create a
default "none available" module in the base?


On 3/4/08 7:39 AM, "Tim Prins"  wrote:


Hi,

We have been having a problem lately with our MTT runs where make check
would fail when mpi threads were enabled.

Turns out the problem is that opal_init now calls
opal_base_carto_select, which cannot find any carto modules since we
have not done an install yet. So it returns a failure. This causes
opal_init to abort before initializing the event engine. So when we try
to do the threading tests, the event engine is uninitialized and fails.


So this is why it fails, but I do not know how best to fix it. Any
suggestions would be appreciated.

Tim
___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] make check failing

2008-03-04 Thread Ralph H Castain
Carto select failing if it doesn't find any modules was called out in an
earlier message (might have been a commit log) when we set an mca-no-build
flag on that framework. This should probably be fixed - there are times when
someone may not wish to build any carto modules.

Is there some reason why carto absolutely must find a module? Can we create a
default "none available" module in the base?
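
In generic terms (deliberately not the real OPAL carto/MCA API - all names
below are hypothetical), the "none available" fallback would just make the
select step treat "no component found" as success instead of an error, so
opal_init could continue:

#include <stdio.h>

#define MY_SUCCESS        0
#define MY_ERR_NOT_FOUND -1

/* Hypothetical module type and a built-in do-nothing fallback. */
typedef struct { const char *name; } carto_module_t;
static carto_module_t carto_none_module = { "none" };
static carto_module_t *selected_module  = NULL;

/* Stand-in for the component lookup; pretend nothing was built/installed. */
static int find_best_component(carto_module_t **mod)
{
    (void)mod;
    return MY_ERR_NOT_FOUND;
}

/* The point of the sketch: "no component" is not a fatal error. */
static int carto_select(void)
{
    int rc = find_best_component(&selected_module);
    if (rc == MY_ERR_NOT_FOUND) {
        selected_module = &carto_none_module;   /* default "none available" module */
        return MY_SUCCESS;
    }
    return rc;
}

int main(void)
{
    if (carto_select() == MY_SUCCESS) {
        printf("carto module: %s\n", selected_module->name);
    }
    return 0;
}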


On 3/4/08 7:39 AM, "Tim Prins"  wrote:

> Hi,
> 
> We have been having a problem lately with our MTT runs where make check
> would fail when mpi threads were enabled.
> 
> Turns out the problem is that opal_init now calls
> opal_base_carto_select, which cannot find any carto modules since we
> have not done an install yet. So it returns a failure. This causes
> opal_init to abort before initializing the event engine. So when we try
> to do the threading tests, the event engine is uninitialized and fails.
> 
> So this is why it fails, but I do not know how best to fix it. Any
> suggestions would be appreciated.
> 
> Tim
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




[OMPI devel] make check failing

2008-03-04 Thread Tim Prins

Hi,

We have been having a problem lately with our MTT runs where make check 
would fail when mpi threads were enabled.


Turns out the problem is that opal_init now calls 
opal_base_carto_select, which cannot find any carto modules since we 
have not done an install yet. So it returns a failure. This causes 
opal_init to abort before initializing the event engine. So when we try 
to do the threading tests, the event engine is uninitialized and fails.


So this is why it fails, but I do not know how best to fix it. Any 
suggestions would be appreciated.


Tim


[OMPI devel] disabling vt by default

2008-03-04 Thread Jeff Squyres
Per prior e-mails on this list, I finally got around to disabling VT
builds by default this morning
(https://svn.open-mpi.org/trac/ompi/changeset/17683 -- I committed before
9am Eastern, so it's, er, sorta/mostly before the US workday :p).


Once the VT configury stuff is incorporated into OMPI's autogen stuff  
and we don't have timestamp issues that can cause re-autoconfs/etc.,  
we can re-enable it by default.


Dresden: any estimates on when the integration will occur?

--
Jeff Squyres
Cisco Systems



Re: [OMPI devel] [RFC] Default hostfile MCA param

2008-03-04 Thread Ralph H Castain



On 3/4/08 5:51 AM, "Tim Prins"  wrote:

> We have used '^' elsewhere to indicate not, so maybe just have the
> syntax be if you put '^' at the beginning of a line, that node is not used.
> 
> So we could have:
> n0
> n1
> ^headnode
> n3
> 

That works for me and sounds like the right solution.

> I understand the idea of having a flag to indicate that all nodes below
> a certain point should be ignored, but I think this might get confusing,
> and I'm unsure how useful it would be. I just see the usefulness of this
> to block out a couple of nodes by default. Besides, if you do want to
> block out many nodes, any reasonable text editor allows you to insert
> '^' in front of any number of lines easily.
> 
> Alternatively, for the particular situation that Edgar mentions, it may
> be good enough just to set rmaps_base_no_schedule_local in the mca
> params default file.
> 
> One question though: If I am in a slurm allocation which contains n1,
> and there is a default hostfile that contains "^n1", will I run on 'n1'?

According to the precedence rules in the wiki, you would -not- run on n1.

> 
> I'm not sure what the answer is, I know we talked about the precedence
> earlier...
> 
> Tim
> 
> Ralph H Castain wrote:
>> I personally have no objection, but I would ask then that the wiki be
>> modified to cover this case. All I require is that someone define the syntax
>> to be used to indicate "this is a node I do -not- want used", or
>> alternatively a flag that indicates "all nodes below are -not- to be used".
>> 
>> Implementation isn't too hard once I have that...
>> 
>> 
>> On 3/3/08 9:44 AM, "Edgar Gabriel"  wrote:
>> 
>>> Ralph,
>>> 
>>> could this mechanism be used also to exclude a node, indicating to never
>>> run a job there? Here is the problem that I face quite often: students
>>> working on the homework forget to allocate a partition  on the cluster,
>>> and just type mpirun. Because of that, all jobs end up running on the
>>> front-end node.
>>> 
>>> If we now had the ability to specify in a default hostfile that a job
>>> should never run on a specified node (e.g. the front end node), users
>>> would get an error message when trying to do that. I am aware that
>>> that's a little ugly...
>>> 
>>> THanks
>>> edgar
>>> 
>>> Ralph Castain wrote:
 I forget all the formatting we are supposed to use, so I hope you'll all
 just bear with me.
 
 George brought up the fact that we used to have an MCA param to specify a
 hostfile to use for a job. The hostfile behavior described on the wiki,
 however, doesn't provide for that option. It associates a hostfile with a
 specific app_context, and provides a detailed hierarchical layout of how
 mpirun is to interpret that information.
 
 What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile"
 to replace the deprecated capability. If found, the system's behavior will
 be:
 
 1. in a managed environment, the default hostfile will be used to filter
 the
 discovered nodes to define the available node pool. Any hostfile and/or
 dash
 host options provided to an app_context will be used to further filter the
 node pool to define the specific nodes for use by that app_context. Thus,
 nodes in the hostfile and dash host options given to an app_context -must-
 also be in the default hostfile in order to be available for use by that
 app_context - any nodes in the app_context options that are not in the
 default hostfile will be ignored.
 
 2. in an unmanaged environment, the default hostfile will be used to define
 the available node pool. Any hostfile and/or dash host options provided to
 an app_context will be used to filter the node pool to define the specific
 nodes for use by that app_context, subject to the previous caveat. However,
 add-hostfile and add-host options will add nodes to the node pool for use
 -only- by the associated app_context.
 
 
 I believe this proposed behavior is consistent with that described on the
 wiki, and would be relatively easy to implement. If nobody objects, I will
 do so by end-of-day 3/6.
 
 Comments, suggestions, objections - all are welcome!
 Ralph
 
 
 ___
 devel mailing list
 de...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> 
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel




Re: [OMPI devel] [RFC] Default hostfile MCA param

2008-03-04 Thread Tim Prins
We have used '^' elsewhere to indicate not, so maybe just have the 
syntax be if you put '^' at the beginning of a line, that node is not used.


So we could have:
n0
n1
^headnode
n3

I understand the idea of having a flag to indicate that all nodes below 
a certain point should be ignored, but I think this might get confusing, 
and I'm unsure how useful it would be. I just see the usefulness of this 
to block out a couple of nodes by default. Besides, if you do want to 
block out many nodes, any reasonable text editor allows you to insert 
'^' in front of any number of lines easily.


Alternatively, for the particular situation that Edgar mentions, it may 
be good enough just to set rmaps_base_no_schedule_local in the mca 
params default file.


One question though: If I am in a slurm allocation which contains n1, 
and there is a default hostfile that contains "^n1", will I run on 'n1'?


I'm not sure what the answer is, I know we talked about the precedence 
earlier...


Tim

Ralph H Castain wrote:

I personally have no objection, but I would ask then that the wiki be
modified to cover this case. All I require is that someone define the syntax
to be used to indicate "this is a node I do -not- want used", or
alternatively a flag that indicates "all nodes below are -not- to be used".

Implementation isn't too hard once I have that...


On 3/3/08 9:44 AM, "Edgar Gabriel"  wrote:


Ralph,

could this mechanism be used also to exclude a node, indicating to never
run a job there? Here is the problem that I face quite often: students
working on the homework forget to allocate a partition  on the cluster,
and just type mpirun. Because of that, all jobs end up running on the
front-end node.

If we now had the ability to specify in a default hostfile that a job
should never run on a specified node (e.g. the front end node), users
would get an error message when trying to do that. I am aware that
that's a little ugly...

THanks
edgar

Ralph Castain wrote:

I forget all the formatting we are supposed to use, so I hope you'll all
just bear with me.

George brought up the fact that we used to have an MCA param to specify a
hostfile to use for a job. The hostfile behavior described on the wiki,
however, doesn't provide for that option. It associates a hostfile with a
specific app_context, and provides a detailed hierarchical layout of how
mpirun is to interpret that information.

What I propose to do is add an MCA param called "OMPI_MCA_default_hostfile"
to replace the deprecated capability. If found, the system's behavior will
be:

1. in a managed environment, the default hostfile will be used to filter the
discovered nodes to define the available node pool. Any hostfile and/or dash
host options provided to an app_context will be used to further filter the
node pool to define the specific nodes for use by that app_context. Thus,
nodes in the hostfile and dash host options given to an app_context -must-
also be in the default hostfile in order to be available for use by that
app_context - any nodes in the app_context options that are not in the
default hostfile will be ignored.

2. in an unmanaged environment, the default hostfile will be used to define
the available node pool. Any hostfile and/or dash host options provided to
an app_context will be used to filter the node pool to define the specific
nodes for use by that app_context, subject to the previous caveat. However,
add-hostfile and add-host options will add nodes to the node pool for use
-only- by the associated app_context.


I believe this proposed behavior is consistent with that described on the
wiki, and would be relatively easy to implement. If nobody objects, I will
do so by end-of-day 3/6.

Comments, suggestions, objections - all are welcome!
Ralph


___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel