Re: [OMPI devel] RFC: delete mvapi BTL for v1.3

2007-10-11 Thread Josh Aune
How long will the 1.2 series be maintained?

This has been giving some of our customers a bit of heartburn, but it
can also be used to help push through the OFED upgrades on the
clusters (a good thing).

Josh

On 10/11/07, Jeff Squyres  wrote:
> Reminder -- this RFC expires tonight.
>
> Speak now or forever hold your peace...
>
>
> On Oct 5, 2007, at 7:46 AM, Jeff Squyres wrote:
>
> > WHAT: Remove the mvapi BTL for the v1.3 release.
> >
> > WHY: None of the IB vendors want to maintain it anymore; our future
> > is OFED.  If someone still has mvapi IB drivers, they can use the
> > OMPI v1.2 series.
> >
> > WHERE: svn rm ompi/mca/btl/mvapi
> >
> > WHEN: Before the v1.3 release.
> >
> > TIMEOUT: COB, Thurs, Oct 11, 2007
> >
> > -
> >
> > None of the IB vendors are interested in maintaining the "mvapi" BTL
> > anymore.  Indeed, none of us have updated it with any of the new/
> > interesting/better performance features that went into the openib BTL
> > over the past year (or more).  Additionally, some changes may be
> > coming in the OMPI infrastructure that would *require* some revamping
> > in the mvapi BTL -- and none of Cisco, Voltaire, or Mellanox is
> > willing to do it.
> >
> > So we'd like to ditch the mvapi BTL starting with v1.3 and have the
> > official guidance be that if you have mvapi, you need to use the OMPI
> > v1.2 series (i.e., remove this from the SVN trunk in the Very Near
> > Future).
> >
> > --
> > Jeff Squyres
> > Cisco Systems
> >
>
>
> --
> Jeff Squyres
> Cisco Systems
>
>


Re: [OMPI devel] More memory troubles with vapi

2007-08-24 Thread Josh Aune
On 8/24/07, Jeff Squyres  wrote:
>
> Hmm.  If you compile Open MPI with no memory manager, then it
> *shouldn't* be Open MPI's fault (unless there's a leak in the mvapi
> BTL...?).  Verify that you did not actually compile Open MPI with a
> memory manager by running "ompi_info | grep ptmalloc2" -- it should
> come up empty.

I am sure.  I have multiple builds that I switch between.  One of the
apps doesn't work unless I build with --without-memory-manager (see my
post to -users about realloc(), with sample code).

I noticed that there are a few ./configure --debug type switches, even
some dealing with memory.  Could those be useful for gathering further
data?  What features do those provide and how do I use them?

> The fact that you can run this under TCP without memory leaking would
> seem to indicate that it's not the app that's leaking memory, but
> rather either the MPI or the network stack.

I should clarify here: this is only effectively true.  The app crashes
with a segfault after running over tcp for several hours, but it gets
much farther into the run than it does with the vapi btl.

>
> --
> Jeff Squyres
> Cisco Systems
>
>


[OMPI devel] More memory troubles with vapi

2007-08-24 Thread Josh Aune
We are using Open MPI on several 1000+ node clusters.  We recently
received several new clusters using the Infiniserve 3.X software stack
and are having several problems with the vapi btl (yes, I know, it is
very, very old and shouldn't be used; I couldn't agree with you more,
but those are my marching orders).

I have a new application that is running into swap for an unknown
reason.  If I run and force it to use the tcp btl, I don't seem to run
into swap (the job just takes a very, very long time).  I have tried
restricting the size of the free lists, forcing send mode, and using an
Open MPI build compiled with no memory manager, but nothing seems to
help.  I've profiled with valgrind --tool=massif and the memtrace
capabilities of ptmalloc, but I don't have any smoking guns yet.  It is
a Fortran app and I don't know anything about debugging Fortran memory
problems; can someone point me in the proper direction?
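
For reference, the sort of thing I have been running looks roughly like
this (the application name and process count are placeholders, and the
exact mvapi parameter names are best pulled from ompi_info rather than
from memory):

# list the tunable mvapi parameters (free list sizes, send modes, etc.)
$ ompi_info --param btl mvapi

# force the tcp btl instead of vapi for comparison
$ mpirun -np 64 -mca btl tcp,self ./my_app

# heap profiling under massif
$ mpirun -np 64 valgrind --tool=massif ./my_app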

Thanks,
Josh


Re: [OMPI devel] Best bw/lat performance for microbenchmark/debug utility

2006-07-13 Thread Josh Aune

On 6/29/06, Patrick Geoffray  wrote:

Jeff Squyres (jsquyres) wrote:
>> -Original Message-
>> From: devel-boun...@open-mpi.org
>> [mailto:devel-boun...@open-mpi.org] On Behalf Of Patrick Geoffray
>> Sent: Wednesday, June 28, 2006 1:23 PM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] Best bw/lat performance for
>> microbenchmark/debug utility
>>
>> Josh Aune wrote:
>>> I am writing up some interconnect/network debugging software that is
>>> centered around ompi.  What is the best set of functions to

> I was assuming that you would be testing latency/bandwidth, but Patrick
> is correct in stating that there are many more things to test than just
> those two metrics.

There are a lot of metrics, but most of them require a deep understanding
of the MPI semantics and implementation details to make sense.  The art
of micro-benchmarking is to choose the metrics and explain why they matter.
It's obvious for latency/bandwidth, a bit less so for unexpected-message
and host overhead, and definitely hard for overlap and progress.  And
that's just for point-to-point.

To avoid reinventing the wheel, I would suggest that Josh develop a
micro-benchmark test suite to compute a very detailed set of LogP-derived
parameters, i.e., for all message sizes:
* send overhead (o.s) and recv overhead (o.r).  These overheads will
likely be either constant or linear over various message size ranges; it
would be great to automatically compute the ranges.
Memory registration cost is accounted for here, so it would be useful to
measure with and without the registration cache as well.
* latency (L).
* send gap (g.s) and recv gap (g.r).  For large messages, they will
likely be identical and represent the link bandwidth.  For smaller
messages, the send gap is the gap of a fan-out pattern (1->N) and the
recv gap is the gap of a flat gather (N->1).  It's important not to let
the send or recv overhead hide the send or recv gap; using several
processes is one way to divide the send/recv overhead.
* unexpected overhead (o.u).  Overhead added to (o.r) when the message is
not immediately matched.
* overlap availability (a), i.e., the percentage of communication time
that you can overlap with real host computation.

From these parameters, you can derive pretty much all characteristics
of an interconnect without contention.
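
As a starting point, a pedestrian ping-pong along these lines (message
size and iteration count are arbitrary placeholders) already gives a
rough estimate of the send overhead and of the latency; the gap,
unexpected, and overlap parameters need more elaborate patterns:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    /* placeholder message size and iteration count */
    int rank, i, iters = 1000, size = 8;
    char buf[8];
    double t0, t_send = 0.0, t_rtt = 0.0;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            t0 = MPI_Wtime();
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            t_send += MPI_Wtime() - t0;   /* rough send overhead (o.s) */
            MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
            t_rtt += MPI_Wtime() - t0;    /* round trip ~ 2*(overhead + L) */
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0)
        printf("send overhead ~%g us, half round trip ~%g us\n",
               1e6 * t_send / iters, 1e6 * t_rtt / (2.0 * iters));

    MPI_Finalize();
    return 0;
}

Timing how long the MPI_Send call takes to return is only a reasonable
o.s estimate for eager-sized messages; once the library switches to a
rendezvous protocol, the send call itself waits on the receiver.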

Patrick


Sorry for the long delay in replying.  Thanks for the info.  What I am
trying to do is create a set of standardized, easy-to-use, system-level
debugging utilities (and force myself to learn more MPI :).  Currently
I am shooting for latency/bandwidth but would welcome ideas for
further useful node-level tests.  I am not just testing the
interconnect, but also need to verify memory bandwidth, PCI bandwidth to
the interconnect card (I love -mca btl ^sm :), processor
functionality, system errors (currently only parity and PCI Express
fatal/nonfatal/etc.), and whatnot.

I want the tests to be easy enough to run that all you have to do is
'mpirun -np $ALL ./footest', and each one comes back with any nodes that
look bad for that test, as well as some general data about the cluster's
performance.

I want to get the suite out to the community after I have some seed
tests written, and I hope there will be enough there that others will be
interested in contributing, though I am waiting for release approval
from work at the moment, which may not happen :(

Josh


[OMPI devel] Best bw/lat performance for microbenchmark/debug utility

2006-06-28 Thread Josh Aune

I am writing up some interconnect/network debugging software that is
centered around ompi.  What is the best set of functions to use to get
the best bandwidth and latency numbers out of Open MPI, and why?  I've
been asking around at work and some people say just send/receive, though
some of the micro-benchmarks I have looked at in the past used
isend/irecv.  Can someone shed some light on this (or propose more
methods)?
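
For what it's worth, the isend/irecv benchmarks I have seen keep a
window of messages in flight and wait on them in batches; a rough sketch
of that shape is below (window, message size, repetition count, and file
name are all arbitrary here):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define WINDOW  16          /* messages in flight at once -- arbitrary */
#define MSGSIZE (1 << 20)   /* 1 MB per message -- arbitrary */
#define REPS    100

int main(int argc, char **argv)
{
    int rank, i, w;
    char *buf = malloc((size_t)WINDOW * MSGSIZE);
    MPI_Request req[WINDOW];
    double t0, secs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < REPS; i++) {
        /* post a whole window of nonblocking operations, then wait */
        for (w = 0; w < WINDOW; w++) {
            if (rank == 0)
                MPI_Isend(buf + (size_t)w * MSGSIZE, MSGSIZE, MPI_CHAR,
                          1, 0, MPI_COMM_WORLD, &req[w]);
            else if (rank == 1)
                MPI_Irecv(buf + (size_t)w * MSGSIZE, MSGSIZE, MPI_CHAR,
                          0, 0, MPI_COMM_WORLD, &req[w]);
        }
        if (rank < 2)
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
    }
    secs = MPI_Wtime() - t0;
    if (rank == 0)
        printf("~%.1f MB/s\n", (double)REPS * WINDOW * MSGSIZE / secs / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}

$ mpicc -o bwtest bwtest.c && mpirun -np 2 ./bwtest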

Thanks,
josh


Re: [OMPI devel] process ordering/processes per node

2006-06-05 Thread Josh Aune

On 4/5/06, Jeff Squyres (jsquyres)  wrote:

This is going to be influenced by how many processes bproc tells Open
MPI can be launched on each node.

Check out the FAQ for the -bynode and -byslot arguments to mpirun for
more details:


I have tried these arguments several times (up through 1.0.2a4) and I
always get the same ordering.




http://www.open-mpi.org/faq/?category=running#mpirun-scheduling

This specific entry uses hostfiles as an example, but the issue is the
same for bproc -- the "hostfile" is simply implicitly supplied by bproc
(i.e., the node names and the number of slots available on each).
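
With an explicit hostfile, the byslot case would look something like
this (node names here are made up; under bproc the per-node slot counts
come from whatever bproc reports, so there is no file to edit by hand):

$ cat my_hostfile
n1 slots=2
n2 slots=2
$ mpirun -np 4 -byslot --hostfile my_hostfile ./mpi_hello
n1 hello
n1 hello
n2 hello
n2 hello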



> -Original Message-
> From: devel-boun...@open-mpi.org
> [mailto:devel-boun...@open-mpi.org] On Behalf Of Josh Aune
> Sent: Friday, March 31, 2006 4:43 PM
> To: Open MPI Developers
> Subject: [OMPI devel] process ordering/processes per node
>
> I have a simple hello program where each child prints out the hostname
> of the node it is running on.  When I run this (on a bproc machine)
> with -np 4 and no host file it launches one process per node on each
> of the first 4 available nodes, i.e.:
>
> $ mpirun -np 4 ./mpi_hello
> n1 hello
> n3 hello
> n2 hello
> n4 hello
>
> What I am trying to get is to launch 2 processes per node, or
> this output:
>
> $ mpirun -np 4 $magic_arg ./mpi_hello
> n1 hello
> n1 hello
> n2 hello
> n2 hello
>
>
> tia,
> Josh
>
>




[OMPI devel] Please add explicit test for sysfs/libsysfs.h

2006-06-05 Thread Josh Aune

So far, on every system I have compiled Open MPI on, I have hit this same
non-obvious configure failure.  In each case I have added
--with-openib= and --with-openib-libs=.  configure runs
just fine until it starts looking for OpenIB, then reports that it can't
find most of the header files and whatnot relating to OpenIB, and
eventually bombs during the checks, reporting that OpenIB is not found
(even though it really is there).  After looking through
config.log I find an error about not being able to find
sysfs/libsysfs.h.

After installing sysfsutils-devel (Fedora Core), the compile proceeds
without a hitch.

Having this be an explicit test (at the top of the openib section?)
would be wonderful :)
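
Something along these lines near the top of the OpenIB checks is what I
had in mind (just a sketch; I have not looked at how the m4 for that
section is actually laid out):

AC_CHECK_HEADER([sysfs/libsysfs.h], [],
    [AC_MSG_ERROR([sysfs/libsysfs.h not found; install the libsysfs
                   (sysfsutils) development package or disable OpenIB])])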

Thanks,
Josh

grep -r "sysfs/libsysfs.h" *
ompi/mca/btl/openib/btl_openib_component.c:#include <sysfs/libsysfs.h>
ompi/dynamic-mca/btl/openib/btl_openib_component.c:#include <sysfs/libsysfs.h>


[OMPI devel] process ordering/processes per node

2006-03-31 Thread Josh Aune
I have a simple hello program where each child prints out the hostname
of the node it is running on.  When I run this (on a bproc machine)
with -np 4 and no host file it launches one process per node on each
of the first 4 available nodes, i.e.:

$ mpirun -np 4 ./mpi_hello
n1 hello
n3 hello
n2 hello
n4 hello

What I am trying to get is to launch 2 processes per node, i.e., this output:

$ mpirun -np 4 $magic_arg ./mpi_hello
n1 hello
n1 hello
n2 hello
n2 hello


tia,
Josh