Re: [O-MPI devel] configure hangs in libtool..

2005-09-02 Thread Peter Kjellström
On Friday 26 August 2005 19.37, Ralf Wildenhues wrote:
> * Peter Kjellström wrote on Fri, Aug 26, 2005 at 06:24:43PM CEST:
> > On Friday 26 August 2005 18.04, Ralf Wildenhues wrote:
> > > Libtool versions previous to 1.5.16 had a bug in the macro to detect
> > > this; it was often wrong before.
> >
> > my libtool is indeed rather ancient compared to 1.5.16; I knew ompi
> > needed a recent one but had assumed that an updated centos-4.1 (rhel4u1)
> > would be enough... (I have 1.5.6)
> >
> > Either way, I have managed to build ompi before, so I looked deeper and
> > found out that the intel compilers had been updated under my feet (-027
> > to -029).  Switching back made configure run just fine...
>
> OK.  But see, I'd still like to know whether libtool-1.5.18 copes with
> your newer icc.  I can't fix it (given there is something to fix) unless
> I know about the failure.

Just to end this thread nicely: I was able to build svn7132 with icc-8.1e-029 
just fine (now with the recommended libtool & co).  I think we fixed a bug in 
our compiler installation too, so I'm not sure whether it was the libtool 
upgrade or not, but either way it works now :-)

/Peter

>
> Cheers,
> Ralf




[O-MPI devel] IMPORTANT: Libtool version

2005-09-02 Thread Jeff Squyres
Haven't heard back from any of the other core developers about their 
versions of Libtool.


FAIR WARNING: If I don't hear back from anyone by COB today, I'm going 
to raise the required version of Libtool.


The issue: shared library builds [the default] for OMPI require some 
version of Libtool > 1.5, probably somewhere around 1.5.14 or 1.5.16, 
although 1.5.18 is the most current.  This is actually causing 
confusion and build issues for some users, so I want to make autogen.sh 
force the use of a more recent libtool (right now it only checks for >= 
1.5).
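
For illustration only, here is a minimal sketch (not OMPI's actual
autogen.sh code) of the kind of minimum-version check autogen.sh could
perform; it assumes "libtool --version" prints a first line like
"ltmain.sh (GNU libtool) 1.5.18 ...":

  #!/bin/sh
  lt_required=1.5.16
  lt_found=`libtool --version 2>/dev/null | \
            sed -n '1s/.*(GNU libtool) \([0-9][0-9.]*\).*/\1/p'`
  test -n "$lt_found" || { echo "GNU libtool not found" >&2; exit 1; }
  # Dotted-version compare: sort numerically; the older version sorts
  # first, so if the required version is not the oldest, we are too old.
  oldest=`printf '%s\n%s\n' "$lt_required" "$lt_found" | \
          sort -t . -k 1,1n -k 2,2n -k 3,3n | sed -n 1p`
  if test "$oldest" != "$lt_required"; then
      echo "Libtool >= $lt_required required; found $lt_found" >&2
      exit 1
  fi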


--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



[O-MPI devel] poe PLS component

2005-09-02 Thread Jeff Squyres

Thara / George --

Can you guys convert POE's configure.stub to configure.m4?

Thanks!

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] IMPORTANT: Libtool version

2005-09-02 Thread Ralf Wildenhues
Hi Jeff,

* Jeff Squyres wrote on Fri, Sep 02, 2005 at 12:23:35PM CEST:
> Haven't heard back from any of the other core developers about their 
> versions of Libtool.

I don't belong in this group, but FWIW: 1.5.20 (or CVS versions).

> FAIR WARNING: If I don't hear back from anyone by COB today, I'm going 
> to raise the required version of Libtool.
> 
> The issue: shared library builds [the default] for OMPI require some 
> version of Libtool > 1.5, probably somewhere around 1.5.14 or 1.5.16, 
> although 1.5.18 is the most current.

Not any more.  :)

> This is actually causing confusion and build issues for some users, so
> I want to make autogen.sh force the use of a more recent libtool
> (right now it only checks for >= 1.5).

It would be good to check for >= 1.5.16, because of the "-c -o" issue
that was recently discussed here.

Cheers,
Ralf


Re: [O-MPI devel] "fix" for romio configure.in

2005-09-02 Thread Jeff Squyres
Committed -- although I put most of your explanation in a comment.  
Thanks!



On Aug 31, 2005, at 7:27 AM, Ralf Wildenhues wrote:


This is a rather subtle issue, and pretty ugly, unfortunately.
For the curious reader, here is a rather technical explanation:

Somewhere, inside some
  if test "$arch_..."
branching construct (but not inside an Autoconf macro definition!), the
configure.in script uses the macro AC_CHECK_HEADER.  This macro requires
some other ones to work, so it AC_REQUIREs them; for example
AC_PROG_EGREP, which defines $EGREP.  What autoconf then does is expand
these checks right before the expansion of AC_CHECK_HEADER, that is:
inside the shell branching construct.

Then, later on, AC_PROG_LIBTOOL is called.  This macro also needs
AC_PROG_EGREP, so it also AC_REQUIREs it.  Autoconf remembers that
it has already expanded the macro, so it is not expanded again.

Since the actual test for egrep is now hidden inside the shell branch,
it is not run in all cases.  So further tests that stem from
AC_PROG_LIBTOOL fail.
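
A hypothetical, stripped-down configure.ac showing the trap (the test
variable and header are made up; this is not the romio code itself):

  AC_INIT([demo], [1.0])
  AC_PROG_CC
  if test "$build_special" = yes; then
    # AC_CHECK_HEADER AC_REQUIREs AC_PROG_EGREP (among others);
    # autoconf expands those requirements right here, inside the branch.
    AC_CHECK_HEADER([pthread.h])
  fi
  # AC_PROG_LIBTOOL also AC_REQUIREs AC_PROG_EGREP, but autoconf believes
  # it has already been expanded.  If the branch above is not taken at
  # run time, $EGREP is never set and libtool's tests misbehave.
  AC_PROG_LIBTOOL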

Possible ideas to solve this:
- move AC_PROG_LIBTOOL up before the branch: does not work, the branch
  code modifies $CC.
- search for all AC_REQUIREd macros and call them by hand, outside the
  branch.  Tedious and error-prone.
- rewrite major parts of configure.in to solve the logic.  Not an
  option; OpenMPI wants as few changes as possible to these legacy
  external packages.
- try to use the experimental AS_IF() Autoconf macro, which aims at
  solving (or at least mitigating) this issue.  Not too good an idea.
- call a stub AC_CHECK_HEADER once outside any branches _before_ it's
  called inside so that required macros are expanded there.

The patch below implements that last possibility, by checking for a
header name unlikely to be used seriously, and minimizing any
consequences.

Cheers,
Ralf

* ompi/mca/io/romio/romio/configure.in: Insert stub call of
AC_CHECK_HEADER to pull in required macros at top level instead
of later in shell-conditional branch.

Index: ompi/mca/io/romio/romio/configure.in
===================================================================
--- ompi/mca/io/romio/romio/configure.in	(revision 7105)
+++ ompi/mca/io/romio/romio/configure.in	(working copy)
@@ -641,6 +641,8 @@
 # Open MPI: need to actually get the C compiler
 CFLAGS_save="$CFLAGS"
 AC_PROG_CC
+# Open MPI: pull in machinery necessary for AC_CHECK_HEADER at top level.
+AC_CHECK_HEADER([foobar.h], [:], [:])
 CFLAGS="$CFLAGS_save"

 # Open MPI: this stuff is not necessary with modern versions of the



--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] IMPORTANT: Libtool version

2005-09-02 Thread Peter Kjellström
On Friday 02 September 2005 12.23, Jeff Squyres wrote:
> Haven't heard back from any of the other core developers about their
> versions of Libtool.
>
> FAIR WARNING: If I don't hear back from anyone by COB today, I'm going
> to raise the required version of Libtool.
>
> The issue: shared library builds [the default] for OMPI require some
> version of Libtool > 1.5, probably somewhere around 1.5.14 or 1.5.16,
> although 1.5.18 is the most current. 

just fyi, 1.5.20 exists

/Peter

> This is actually causing 
> confusion and build issues for some users, so I want to make autogen.sh
> force the use of a more recent libtool (right now it only checks for >=
> 1.5).

-- 

  Peter Kjellström   |
  National Supercomputer Centre  |
  Sweden | http://www.nsc.liu.se




[O-MPI devel] cleanup

2005-09-02 Thread Tim S. Woodall


Ctrl^C handling in orte seems to be broken.  I now get a core
file for every orted that is spawned on remote nodes when
I attempt to kill a job via Ctrl^C on orterun.  This behaviour
has actually been around for about a week now.  Anyone else
seeing this?

Tim



Re: [O-MPI devel] cleanup

2005-09-02 Thread Jeff Squyres

Josh --

Is this related to the orted signal traps you put in recently, 
perchance?



On Sep 2, 2005, at 9:21 AM, Tim S. Woodall wrote:


Ctrl^C handling in orte seems to be broken.  I now get a core
file for every orted that is spawned on remote nodes when
I attempt to kill a job via Ctrl^C on orterun.  This behaviour
has actually been around for about a week now.  Anyone else
seeing this?


--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] cleanup

2005-09-02 Thread Josh Hursey


It shouldn't be, since orterun should be sending a message (not 
signals) to the orteds to have them shut down.  That mechanism has been 
there since before my addition.  Ralph noticed some odd behavior with 
the orteds living on well after orterun has exited.  It's possible that 
this is a result of the same bug.  I was going to check that out 
yesterday, but became sidetracked and didn't get to it.  I'll take a 
look this morning and see if I can't track this down.


Josh

On Sep 2, 2005, at 8:24 AM, Jeff Squyres wrote:


Josh --

Is this related to the orted signal traps you put in recently,
perchance?


On Sep 2, 2005, at 9:21 AM, Tim S. Woodall wrote:


Ctrl^C handling in orte seems to be broken.  I now get a core
file for every orted that is spawned on remote nodes when
I attempt to kill a job via Ctrl^C on orterun.  This behaviour
has actually been around for about a week now.  Anyone else
seeing this?


--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



--
Josh Hursey
jjhur...@open-mpi.org
http://www.open-mpi.org/




Re: [O-MPI devel] poe PLS component

2005-09-02 Thread Craig Rasmussen


On Sep 2, 2005, at 4:55 AM, Jeff Squyres wrote:


Can you guys convert POE's configure.stub to configure.m4?



4:55 AM, aren't you up kind of early??

Ciao,
Craig



Re: [O-MPI devel] poe PLS component

2005-09-02 Thread Jeff Squyres
Heh.  6:55am in my timezone.  So it's *early*, but not *absurdly* 
early.  :-)




On Sep 2, 2005, at 10:00 AM, Craig Rasmussen wrote:


On Sep 2, 2005, at 4:55 AM, Jeff Squyres wrote:


Can you guys convert POE's configure.stub to configure.m4?


4:55 AM, aren't you up kind of early??


--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



[O-MPI devel] Totalview Support

2005-09-02 Thread Jeff Squyres

FYI --

David recently committed support for Totalview to attach to Open MPI  
jobs.  I added a FAQ entry on the web site about how to use it (in  
particular, we suggest adding a $HOME/.tvdrc file -- see the FAQ):


	http://www.open-mpi.org/faq/?category=supported-systems#parallel-debuggers
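
A quick hedged sketch of one common way to launch under Totalview (the
program name and process count are made up; see the FAQ for the
authoritative instructions):

  # "-a" hands the remaining arguments through to mpirun.
  totalview mpirun -a -np 4 ./my_mpi_app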


--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] Totalview Support

2005-09-02 Thread Tim S. Woodall

Jeff - Do you have totalview on any of your machines at IU?


Jeff Squyres wrote:

FYI --

David recently committed support for Totalview to attach to Open MPI  
jobs.  I added a FAQ entry on the web site about how to use it (in  
particular, we suggest adding a $HOME/.tvdrc file -- see the FAQ):


	http://www.open-mpi.org/faq/?category=supported-systems#parallel-debuggers




Re: [O-MPI devel] Totalview Support

2005-09-02 Thread Jeff Squyres

Yes, but not on the odin or thor clusters.  :-\

It's a small license meant for basic development and testing of our 
Totalview support (a total of 4 concurrent processes).



On Sep 2, 2005, at 2:35 PM, Tim S. Woodall wrote:


Jeff - Do you have totalview on any of your machines at IU?


Jeff Squyres wrote:

FYI --

David recently committed support for Totalview to attach to Open MPI
jobs.  I added a FAQ entry on the web site about how to use it (in
particular, we suggest adding a $HOME/.tvdrc file -- see the FAQ):

http://www.open-mpi.org/faq/?category=supported-systems#parallel-debuggers





--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] Totalview Support

2005-09-02 Thread Paul H. Hargrove
I'd suggest that the short tcl script should live somewhere in the ompi 
and/or orte install directory and that the FAQ should encourage adding 
"source <file>" to the .tvdrc file rather than the full script.


-Paul

Jeff Squyres wrote:


FYI --

David recently committed support for Totalview to attach to Open MPI  
jobs.  I added a FAQ entry on the web site about how to use it (in  
particular, we suggest adding a $HOME/.tvdrc file -- see the FAQ):


	http://www.open-mpi.org/faq/?category=supported-systems#parallel-debuggers





--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [O-MPI devel] Totalview Support

2005-09-02 Thread Jeff Squyres

On Sep 2, 2005, at 3:19 PM, Paul H. Hargrove wrote:


I'd suggest that the short tcl script should live somewhere in the ompi
and/or orte install directory and that the FAQ should encourage adding
"source <file>" to the .tvdrc file rather than the full script.


Excellent idea -- I didn't put 2 and 2 together:

- that TV's scripting language was TCL
- that TCL had "source" capability

But I just tested it and it works, so I'll update the FAQ... (commit 
with the script coming shortly)
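
A hedged sketch of the resulting setup (the install prefix and script
name are hypothetical; the real path will be wherever the commit puts
the script):

  # Append a one-line "source" of the Open MPI Totalview script to
  # your .tvdrc instead of pasting the whole script in.
  echo 'source /opt/openmpi/etc/openmpi-totalview.tcl' >> $HOME/.tvdrc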


--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/



Re: [O-MPI devel] pml vs bml vs btl

2005-09-02 Thread Brad Penoff

hey Jeff/Galen,

Thanks to both of you for helping answer our questions, both on and off 
the list.  Currently, we're doing a lot of writing trying to focus on MPI 
implementation design strategies, so this has certainly helped us; 
hopefully it helps others too.


On our end, generally, we've been trying to push as much functionality
as possible down to the transport (we have some info on our webpage:
http://www.cs.ubc.ca/labs/dsg/mpi-sctp/ or you can hear me talk at SC|05),
whereas your approach is to bring functionality up and manage it within
the middleware (obviously you do a lot of other neat things like thread
safety and countless other things that are really impressive).  With
respect to managing interfaces in the middleware, I understand it buys
you some generality, though, since channel bonding (for TCP) and
concurrent multipath transfer (for SCTP) aren't available for mVAPI,
Open IB, GM, MX, etc.

Already, I think it's cool to read about OpenMPI's design; in the
future, it will be cooler to hear whether pulling so much functionality
up into the middleware has any performance drawbacks from having to do
so much management (comparing, for example, a setup with two NICs using
OpenMPI striping to a thinner middleware with the same setup that uses
channel bonding).  From the looks of it, your Euro PVM/MPI paper is
going to speak to the low cost of software components; I'm just curious
about the cost of even having this management functionality in the
middleware in the first place; time will tell!

Thanks again for all your answers,

brad


On Wed, 31 Aug 2005, Galen M. Shipman wrote:



On Aug 31, 2005, at 1:06 PM, Jeff Squyres wrote:


On Aug 29, 2005, at 9:17 PM, Brad Penoff wrote:



PML: Pretty much the same as it was described in the paper.  Its
interface is basically MPI semantics (i.e., it sits right under
MPI_SEND and the rest).

BTL: Byte Transfer Layer; it's the next generation of PTL.  The BTL is
much simpler than the PTL, and removes all vestiges of any MPI
semantics that still lived in the PTL.  It's a very simple byte-mover
layer, intended to make it quite easy to implement new network
interfaces.



I was curious about what you meant by the removal of MPI semantics.
Do you mean it simply has no notion of tags, ranks, etc.?  In other
words, does it simply put the data into some sort of format that the
PML can operate on with its own state machine?



I don't recall the details (it's been quite a while since I looked at
the PTL), but there was some semblance of MPI semantics that crept
down into the PTL interface itself.  The BTL has none of that -- it's
purely a byte mover.



The old PTLs controlled the short vs. long rendezvous protocol, the
eager transmission of data, as well as the pipelining of RDMA
operations (where appropriate).  In the PML OB1 and the BTLs, this has
all been moved to the OB1 level.  Note that this is simply a logical
separation of control and comes at virtually no cost (well, there is
the very small cost of using a function pointer).





Also, say you had some underlying protocol that allowed unordered
delivery of data (so not fully ordered like TCP); which "layer" would
the notion of "order" be handled in?  I'm guessing the PML would need
some sort of sequence number attached to it; is that right?



Correct.  That was in the PML in the 2nd gen stuff and is still in
the PML in the 3rd gen stuff.



BML: BTL Management Layer; this used to be part of the PML but we
recently split it off into its own framework.  It's mainly the utility
gorp of managing multiple BTL modules in a single process.  This was
done because, when working with the next generation of collectives,
MPI-2 IO, and MPI-2 one-sided operations, we want to have the ability
to use the PML (which the collectives do today, for example) or to be
able to dive right down and directly use the BTLs (i.e., to cut out a
little latency).



In the cases where the BML is required, does it cost extra memcpy's?



Not to my knowledge.  Galen -- can you fill in the details of this
question and the rest of Brad's questions?


The BML layer is simply a management layer for discovering peer
resources.  It does mask the btl send, put, prepare_src, and
prepare_dst operations, but this code is all inlined and very short,
so gcc should inline it appropriately.  In fact, this inlined code
used to be in the PML OB1 before we added the BML, so it is a no-cost
"logical" abstraction.  We don't add any extra memory copies in this
abstraction.


Thanks!

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/


