Re: [OMPI devel] RFC: delete mvapi BTL for v1.3
How long will the 1.2 series be maintained? This has been giving some of our customers a bit of heartburn, but it can also be used to help push through the OFED upgrades on the clusters (a good thing).

Josh

On 10/11/07, Jeff Squyres wrote:
> Reminder -- this RFC expires tonight.
>
> Speak now or forever hold your peace...
>
> On Oct 5, 2007, at 7:46 AM, Jeff Squyres wrote:
>
> > WHAT: Remove the mvapi BTL for the v1.3 release.
> >
> > WHY: None of the IB vendors want to maintain it anymore; our future
> > is OFED. If someone still has mvapi IB drivers, they can use the
> > OMPI v1.2 series.
> >
> > WHERE: svn rm ompi/mca/btl/mvapi
> >
> > WHEN: Before the v1.3 release.
> >
> > TIMEOUT: COB, Thurs, Oct 11, 2007
> >
> > None of the IB vendors are interested in maintaining the "mvapi" BTL
> > anymore. Indeed, none of us have updated it with any of the new/
> > interesting/better-performance features that went into the openib BTL
> > over the past year (or more). Additionally, some changes may be
> > coming in the OMPI infrastructure that would *require* some revamping
> > of the mvapi BTL -- and none of Cisco, Voltaire, or Mellanox is
> > willing to do it.
> >
> > So we'd like to ditch the mvapi BTL starting with v1.3 and have the
> > official guidance be that if you have mvapi, you need to use the OMPI
> > v1.2 series (i.e., remove this from the SVN trunk in the Very Near
> > Future).
> >
> > --
> > Jeff Squyres
> > Cisco Systems
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> --
> Jeff Squyres
> Cisco Systems
Re: [OMPI devel] More memory troubles with vapi
On 8/24/07, Jeff Squyres wrote:
> Hmm. If you compile Open MPI with no memory manager, then it
> *shouldn't* be Open MPI's fault (unless there's a leak in the mvapi
> BTL...?). Verify that you did not actually compile Open MPI with a
> memory manager by running "ompi_info | grep ptmalloc2" -- it should
> come up empty.

I am sure. I have multiple builds that I switch between. One of the apps doesn't work unless I build --without-memory-manager (see post to -users about realloc(), with sample code).

I noticed that there are a few debug-type ./configure switches, even some dealing with memory. Could those be useful for gathering further data? What features do those provide, and how do I use them?

> The fact that you can run this under TCP without memory leaking would
> seem to indicate that it's not the app that's leaking memory, but
> rather either the MPI or the network stack.

I should clarify here: this is only effectively true. The app crashes from a segfault after running over tcp for several hours, but it gets much farther into the run than it does with the vapi btl.

> --
> Jeff Squyres
> Cisco Systems
[OMPI devel] More memory troubles with vapi
We are using Open MPI on several 1000+ node clusters. We recently received several new clusters using the Infiniserve 3.x software stack and are having several problems with the vapi btl (yes, I know, it is very very old and shouldn't be used. I couldn't agree with you more, but those are my marching orders).

I have a new application that is running into swap for an unknown reason. If I run and force it to use the tcp btl I don't seem to run into swap (the job just takes a very very long time). I have tried restricting the size of the free lists, forcing send mode, and using an Open MPI compiled with no memory manager, but nothing seems to help. I've profiled with valgrind --tool=massif and the memtrace capabilities of ptmalloc, but I don't have any smoking guns yet.

It is a Fortran app and I don't know anything about debugging Fortran memory problems; can someone point me in the proper direction?

Thanks,
Josh
Re: [OMPI devel] Best bw/lat performance for microbenchmark/debug utility
On 6/29/06, Patrick Geoffray wrote:

Jeff Squyres (jsquyres) wrote:
>> -----Original Message-----
>> From: devel-boun...@open-mpi.org
>> [mailto:devel-boun...@open-mpi.org] On Behalf Of Patrick Geoffray
>> Sent: Wednesday, June 28, 2006 1:23 PM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] Best bw/lat performance for
>> microbenchmark/debug utility
>>
>> Josh Aune wrote:
>>> I am writing up some interconnect/network debugging software that is
>>> centered around ompi. What is the best set of functions to
>
> I was assuming that you would be testing latency/bandwidth, but Patrick
> is correct in stating that there are many more things to test than just
> those two metrics.

There are a lot of metrics, but most of them require a deep understanding of MPI semantics and implementation details to make sense. The art of micro-benchmarking is to choose the metrics and explain why they matter. It's obvious for latency/bandwidth, a bit less so for unexpected-message and host overhead, and definitely hard for overlap and progress. And that's just for point-to-point.

To avoid reinventing the wheel, I would suggest that Josh develop a micro-benchmark suite that computes a very detailed set of LogP-derived parameters, i.e., for all message sizes:

* Send overhead (o.s) and recv overhead (o.r). These overheads will likely be either constant or linear over various message-size ranges; it would be great to automatically compute the ranges. Memory registration cost is accounted for here, so it would be useful to measure with and without the registration cache as well.
* Latency (L).
* Send gap (g.s) and recv gap (g.r). For large messages, they will likely be identical and represent the link bandwidth. For smaller messages, the send gap is the gap of a fan-out pattern (1->N) and the recv gap is the gap of a flat gather (N->1). It's important not to have the send or recv overhead hiding the send or recv gap; several processes could be used to divide the send/recv overhead.
* Unexpected overhead (o.u). Overhead added to (o.r) when the message is not immediately matched.
* Overlap availability (a): the percentage of communication time that you can overlap with real host computation.

From these parameters, you can derive pretty much all characteristics of an interconnect without contention.

Patrick

Sorry for the long delay in replying. Thanks for the info.

What I am trying to do is create a set of standardized, easy-to-use, system-level debugging utilities (and force myself to learn more MPI :). Currently I am shooting for latency/bandwidth but would welcome ideas for further useful node-level tests. I am not just testing the interconnect, but need to verify memory bandwidth, PCI bandwidth to the interconnect card (I love -mca btl ^sm :), processor functionality, system errors (currently only parity and pci-express fatal/nonfatal/etc), and whatnot. I want the tests to be easy enough to run that all you have to do is 'mpirun -np $ALL ./footest' and it comes back with any nodes that look bad for that test, as well as some general data about the cluster's performance.

I want to get the suite out to the community after I have some seed tests written, and I hope there will be enough that others will be interested in contributing, though I am waiting for release approval from work at the moment, which may not happen :(

Josh
[OMPI devel] Best bw/lat performance for microbenchmark/debug utility
I am writing up some interconnect/network debugging software that is centered around ompi. What is the best set of functions to use to get the best bandwidth and latency numbers for Open MPI, and why? I've been asking around at work and some people say just send/receive, though some of the micro-benchmarks I have looked at in the past used isend/irecv. Can someone shed some light on this (or propose more methods)?

Thanks,
josh
Re: [OMPI devel] process ordering/processes per node
On 4/5/06, Jeff Squyres (jsquyres) wrote:

> This is going to be influenced by how many processes bproc tells Open
> MPI can be launched on each node. Check out the FAQ for the -bynode and
> -byslot arguments to mpirun for more details:

I have tried these arguments several times (up through 1.0.2a4) and I always get the same ordering.

> http://www.open-mpi.org/faq/?category=running#mpirun-scheduling
>
> This specific entry uses hostfiles as an example, but the issue is the
> same for bproc -- the "hostfile" is simply implicitly supplied by bproc
> (i.e., the node names and the slots available on each).
>
> > -----Original Message-----
> > From: devel-boun...@open-mpi.org
> > [mailto:devel-boun...@open-mpi.org] On Behalf Of Josh Aune
> > Sent: Friday, March 31, 2006 4:43 PM
> > To: Open MPI Developers
> > Subject: [OMPI devel] process ordering/processes per node
> >
> > I have a simple hello program where each child prints out the hostname
> > of the node it is running on. When I run this (on a bproc machine)
> > with -np 4 and no host file it launches one process per node on each
> > of the first 4 available nodes, i.e.:
> >
> > $ mpirun -np 4 ./mpi_hello
> > n1 hello
> > n3 hello
> > n2 hello
> > n4 hello
> >
> > What I am trying to get is to launch 2 processes per node, or
> > this output:
> >
> > $ mpirun -np 4 $magic_arg ./mpi_hello
> > n1 hello
> > n1 hello
> > n2 hello
> > n2 hello
> >
> > tia,
> > Josh
[OMPI devel] Please add explicit test for sysfs/libsysfs.h
So far, on every system I have compiled Open MPI on I have hit the same non-obvious configure failure. In each case I have added --with-openib= and --with-openib-libs=. configure runs just fine until it starts looking for OpenIB, then reports that it can't find most of the header files and whatnot relating to OpenIB, and eventually bombs during the checks, reporting that OpenIB is not found (even though it really is there).

After looking through config.log I find an error about not being able to find sysfs/libsysfs.h. After installing sysfsutils-devel (Fedora Core) the compile proceeds without a hitch. Having this be an explicit test (at the top of the openib section?) would be wonderful :)

Thanks,
Josh

grep -r "sysfs/libsysfs.h" *
ompi/mca/btl/openib/btl_openib_component.c:#include <sysfs/libsysfs.h>
ompi/dynamic-mca/btl/openib/btl_openib_component.c:#include <sysfs/libsysfs.h>
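The explicit check requested here might look something like the following configure.ac fragment. This is a hypothetical sketch using the standard Autoconf macro, not the actual Open MPI m4 code:

```m4
# Hypothetical fragment for the top of the openib section: fail early,
# with a clear message, if the libsysfs headers are missing.
AC_CHECK_HEADER([sysfs/libsysfs.h], [],
    [AC_MSG_ERROR([sysfs/libsysfs.h not found; install sysfsutils-devel
(or your distribution's libsysfs development package) to build the
openib BTL])])
```

AC_CHECK_HEADER aborts with the given message instead of letting later OpenIB checks fail with a misleading "OpenIB not found".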
[OMPI devel] process ordering/processes per node
I have a simple hello program where each child prints out the hostname of the node it is running on. When I run this (on a bproc machine) with -np 4 and no host file it launches one process per node on each of the first 4 available nodes, i.e.:

$ mpirun -np 4 ./mpi_hello
n1 hello
n3 hello
n2 hello
n4 hello

What I am trying to get is to launch 2 processes per node, or this output:

$ mpirun -np 4 $magic_arg ./mpi_hello
n1 hello
n1 hello
n2 hello
n2 hello

tia,
Josh