Re: [OMPI devel] sm_coll segv
On Aug 6, 2009, at 5:18 PM, Jeff Squyres (jsquyres) wrote:

> I'm therefore going to change the mpool string names that btl/sm and coll/sm are looking for so that they get unique sm mpool modules.

(another case of "I knew what I meant, but that's not what I typed")

I'm going to [try to] change it so that the btl sm and coll sm always have separate mpool modules.

--
Jeff Squyres
jsquy...@cisco.com
[OMPI devel] sm_coll segv
Ok, with Terry's help, I found a segv in the coll sm. If you run without the sm btl, there's an obvious bad parameter that we're passing that results in a segv.

LANL -- can you confirm / deny that these are the segv's that you were seeing?

While fixing this, I noticed that the sm btl and sm coll are sharing an mpool when both are running. This probably used to be a good idea way back when (e.g., when we were using a lot more shmem than we needed and core counts were lower), but it seems like a bad idea now (e.g., the btl/sm is fairly specific about the size of the mpool that is created -- it's just big enough for its data structures).

I'm therefore going to change the mpool string names that btl/sm and coll/sm are looking for so that they get unique sm mpool modules.

--
Jeff Squyres
jsquy...@cisco.com
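[Editorial sketch] The name-based sharing described in this message can be pictured with a toy registry. This is only an illustration under stated assumptions: `Pool`, `lookup_or_create`, and the names "sm", "sm-btl" and "sm-coll" are hypothetical and are not Open MPI's mpool interface. The sketch also assumes that whoever creates a pool first fixes its size, and it does not model the segv itself, which the message attributes to a bad parameter.

```python
class Pool:
    """Hypothetical fixed-size pool; a stand-in for a shared-memory mpool."""

    def __init__(self, size):
        self.size = size
        self.used = 0

    def alloc(self, nbytes):
        """Return the offset of the new block, or None if it does not fit."""
        if self.used + nbytes > self.size:
            return None
        offset = self.used
        self.used += nbytes
        return offset


_pools = {}  # registry: name -> Pool


def lookup_or_create(name, size):
    """Return the pool registered under `name`, creating it on first use."""
    if name not in _pools:
        _pools[name] = Pool(size)
    return _pools[name]


# Same name: both components end up with the same pool object.
btl_pool = lookup_or_create("sm", 1024)    # sized for the first user only
coll_pool = lookup_or_create("sm", 4096)   # size ignored, pool already exists
btl_pool.alloc(1024)                       # first user fills the pool
shared_result = coll_pool.alloc(512)       # nothing left for the second user

# Unique names: each component gets its own pool.
btl_own = lookup_or_create("sm-btl", 1024)
coll_own = lookup_or_create("sm-coll", 4096)
btl_own.alloc(1024)
separate_result = coll_own.alloc(512)      # fits, starting at offset 0

print(coll_pool is btl_pool, shared_result, separate_result)
```

With a shared name, the second requester inherits a pool that was sized for someone else; with unique names, each requester's size is honoured.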
Re: [OMPI devel] Device failover on ob1
Is it time to "svn rm ompi/mca/pml/dr"?

On Aug 4, 2009, at 6:50 AM, Ralph Castain wrote:

> Rolf/Mouhamed
>
> Could you get together off-list to discuss the different approaches and see if/where there is common ground? It would be nice to see an integrated solution - personally, I would rather not see two orthogonal approaches unless they can be cleanly separated. Much better if they could support each other in an intelligent fashion.
>
> On Aug 3, 2009, at 9:49 AM, Pavel Shamis (Pasha) wrote:
>
>>> I have not, but there should be no difference. The failover code only gets triggered when an error happens. Otherwise, there are no differences in the code paths while everything is functioning normally.
>>
>> Sounds good. I still have not had time to review the code. I will try to do it this week.
>>
>> Pasha
>>
>>> Rolf
>>>
>>> On 08/03/09 11:14, Pavel Shamis (Pasha) wrote:
>>>> Rolf,
>>>> Did you compare latency/bw for failover-enabled code VS trunk ?
>>>>
>>>> Pasha.
>>>>
>>>> Rolf Vandevaart wrote:
>>>>> Hi folks:
>>>>>
>>>>> As some of you know, I have also been looking into implementing failover. I took a different approach as I am solving the problem within the openib BTL itself. This of course means that this only works for failing from one openib BTL to another but that was our area of interest. This also means that we do not need to keep track of fragments as we get them back from the completion queue upon failure. We then extract the relevant information and repost on the other working endpoint.
>>>>>
>>>>> My work has been progressing at http://bitbucket.org/rolfv/ompi-failover .
>>>>>
>>>>> This only currently works for send semantics so you have to run with -mca btl_openib_flags 1.
>>>>>
>>>>> Rolf
>>>>>
>>>>> On 07/31/09 05:49, Mouhamed Gueye wrote:
>>>>>> Hi list,
>>>>>>
>>>>>> Here is an update on our work concerning device failover.
>>>>>>
>>>>>> As many of you suggested, we reoriented our work on ob1 rather than dr and we now have a working prototype on top of ob1. The approach is to store btl descriptors sent to peers and delete them when we receive proof of delivery. So far, we rely on completion callback functions, assuming that the message is delivered when the completion function is called, which is the case for openib. When a btl module fails, it is removed from the endpoint's btl list and the next one is used to retransmit stored descriptors. No extra message is transmitted; the scheme only consists of additions to the header. It has been mainly tested with two IB modules, in both multi-rail (two separate networks) and multi-path (a big unique network).
>>>>>>
>>>>>> You can grab and test the patch here (applies on top of the trunk) :
>>>>>> http://bitbucket.org/gueyem/ob1-failover/
>>>>>>
>>>>>> To compile with failover support, just define --enable-device-failover at configure. You can then run a benchmark, disconnect a port and see the failover operate.
>>>>>>
>>>>>> A little latency increase (~ 2%) is induced by the failover layer when no failover occurs. To accelerate the failover process on openib, you can try to lower the btl_openib_ib_timeout openib parameter to 15 for example instead of 20 (default value).
>>>>>>
>>>>>> Mouhamed
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Jeff Squyres
jsquy...@cisco.com
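[Editorial sketch] The ob1 approach quoted in this thread (keep each sent descriptor until its completion callback fires, and on BTL failure drop the module from the endpoint's list and retransmit the stored descriptors on the next one) can be modelled in a few lines. This is a toy model under stated assumptions, not the code from either patch: `FakeBtl`, `Endpoint`, and all method names are hypothetical, the completion callback is taken as proof of delivery as the message states for openib, and cascading failures during retransmission are not handled.

```python
class FakeBtl:
    """Hypothetical stand-in for a BTL module; not Open MPI's real interface."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.wire = []  # (seq, payload) pairs handed to this transport

    def send(self, seq, payload):
        if not self.alive:
            raise IOError(f"{self.name} is down")
        self.wire.append((seq, payload))


class Endpoint:
    """Keeps sent descriptors until completion so they can be retransmitted."""

    def __init__(self, btls):
        self.btls = list(btls)  # ordered list of usable BTLs
        self.pending = {}       # seq -> payload, awaiting completion
        self.next_seq = 0

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = payload  # store before posting
        try:
            self.btls[0].send(seq, payload)
        except IOError:
            # The failure handler retransmits everything pending,
            # including the descriptor stored just above.
            self.on_btl_failure(self.btls[0])
        return seq

    def on_completion(self, seq):
        """Completion callback: treated here as proof of delivery."""
        self.pending.pop(seq, None)

    def on_btl_failure(self, failed):
        """Drop the failed BTL and repost stored descriptors on the next one."""
        self.btls.remove(failed)
        if not self.btls:
            raise RuntimeError("no working BTL left")
        for seq, payload in sorted(self.pending.items()):
            self.btls[0].send(seq, payload)


ib0, ib1 = FakeBtl("ib0"), FakeBtl("ib1")
ep = Endpoint([ib0, ib1])
a = ep.send("frag-a")
b = ep.send("frag-b")
ep.on_completion(a)     # frag-a is confirmed and forgotten
ib0.alive = False       # simulate pulling the cable
ep.on_btl_failure(ib0)  # frag-b is reposted on ib1
print(ib1.wire)         # [(1, 'frag-b')]
```

Only the unconfirmed fragment is retransmitted, which is why the scheme needs no extra messages: the bookkeeping lives entirely on the sender side.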
Re: [OMPI devel] Improvement of openmpi.spec
On Thu, 6 Aug 2009, Jeff Squyres wrote:

> -Source: openmpi-%{version}.tar.$EXTENSION
> +Source: %{name}-%{version}.tar.$EXTENSION

The spec file parser defines some of these variables by default. For example, after encountering at the top of the file:

    Name: fftw
    Version: 2.1.5
    Release: 5.bc

something like %{name}-%{version}-%{release} will be expanded to fftw-2.1.5-5.bc. So there is no need to define any of these variables...

The suggestion for improvement here is only cosmetic: it looks nicer not to have hardcoded names all over the spec file. This makes it easier to change the name of the package later, e.g. to allow installing several packages at the same time by simply changing the tag to:

    Name: fftw2

so that the package called 'fftw' can track the 3.x versions. This was done previously by Red Hat, e.g. for their python packages.

--
Bogdan Costescu
IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.coste...@iwr.uni-heidelberg.de
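[Editorial sketch] The expansion described in this message can be imitated with a small toy. It is not rpm's parser: `parse_preamble` and `expand` are hypothetical helpers that only handle the simple `%{tag}` form, and the ".tar.gz" extension is an assumption standing in for $EXTENSION. The sketch also shows the side effect raised elsewhere in the thread: once Source uses %{name}, renaming the package changes the tarball name the spec file looks for.

```python
import re


def parse_preamble(text):
    """Collect 'Tag: value' lines into a dict with lower-cased tag names."""
    tags = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            tags[key.strip().lower()] = value.strip()
    return tags


def expand(template, tags):
    """Replace each %{tag} with its value; unknown macros are left as-is."""
    def substitute(match):
        return tags.get(match.group(1), match.group(0))
    return re.sub(r"%\{(\w+)\}", substitute, template)


tags = parse_preamble("Name: fftw\nVersion: 2.1.5\nRelease: 5.bc")
print(expand("%{name}-%{version}-%{release}", tags))  # fftw-2.1.5-5.bc

# Renaming the package changes every string built from %{name},
# including a Source line of the form %{name}-%{version}.tar.gz.
renamed = dict(tags, name="fftw2")
print(expand("%{name}-%{version}.tar.gz", renamed))   # fftw2-2.1.5.tar.gz
```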
Re: [OMPI devel] Improvement of openmpi.spec
On Aug 6, 2009, at 9:49 AM, Sylvain Jeaugey wrote:

>> -Source: openmpi-%{version}.tar.$EXTENSION
>> +Source: %{name}-%{version}.tar.$EXTENSION
>>
>> Does this mean that you're looking for a different tarball name? I'm not sure that's good; the tarball should be an openmpi tarball, regardless of what name it gets installed under (e.g., OFED builds an OMPI tarball 3-4 different ways [one for each compiler] and changes %name, but uses the same tarball). How about another param (hey, we've got something like 100, so what's 101? ;-) ) for the tarball that defaults to "openmpi"? Then if you want to have a differently-named tarball, you can.
>
> Well, maybe we could live with an openmpi tarball ... it was just to be consistent. When I build bullmpi-a.b.c.src.rpm, I somehow expect the tar file to be bullmpi-a.b.c.tar.gz.

I'm not opposed to having another variable to set the name of the tarball if you want it.

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI devel] Improvement of openmpi.spec
Hi Jeff,

Thanks for reviewing my changes !

On Thu, 6 Aug 2009, Jeff Squyres wrote:

> -Source: openmpi-%{version}.tar.$EXTENSION
> +Source: %{name}-%{version}.tar.$EXTENSION
>
> Does this mean that you're looking for a different tarball name? I'm not sure that's good; the tarball should be an openmpi tarball, regardless of what name it gets installed under (e.g., OFED builds an OMPI tarball 3-4 different ways [one for each compiler] and changes %name, but uses the same tarball). How about another param (hey, we've got something like 100, so what's 101? ;-) ) for the tarball that defaults to "openmpi"? Then if you want to have a differently-named tarball, you can.

Well, maybe we could live with an openmpi tarball ... it was just to be consistent. When I build bullmpi-a.b.c.src.rpm, I somehow expect the tar file to be bullmpi-a.b.c.tar.gz.

> -%setup -q -n openmpi-%{version}
> +%setup -q -n %{name}-%{version}
>
> Ditto for this.
>
> -%dir %{_libdir}/openmpi
> +%dir %{_libdir}/%{name}
>
> Hmm -- is this right? I thought that the name "openmpi" in this directory path came from OMPI's configure script, not from the RPM spec...? Or is the RPM build command passing --pkgname or somesuch to OMPI's configure to override the built-in name?

Hum, I guess you're right, this is indeed not something to change. Sorry about that.

Sylvain

> On Jul 31, 2009, at 11:51 AM, Sylvain Jeaugey wrote:
>
>> Hi all,
>>
>> We had to apply a little set of modifications to the openmpi.spec file to help us integrate openmpi in our cluster distribution.
>>
>> So here is a patch which, as the changelog suggests, does a couple of "improvements" :
>> - Fix a typo in Summary
>> - Replace openmpi by %{name} in a couple of places
>> - Add an %{opt_prefix} option to be able to install in a specific path (e.g. in /opt//mpi/-/ instead of /opt/-)
>>
>> The patch is done with "hg extract" but should apply on the SVN trunk.
>>
>> Sylvain
>
> --
> Jeff Squyres
> jsquy...@cisco.com
Re: [OMPI devel] Improvement of openmpi.spec
Thanks! A few questions about this patch:

-Source: openmpi-%{version}.tar.$EXTENSION
+Source: %{name}-%{version}.tar.$EXTENSION

Does this mean that you're looking for a different tarball name? I'm not sure that's good; the tarball should be an openmpi tarball, regardless of what name it gets installed under (e.g., OFED builds an OMPI tarball 3-4 different ways [one for each compiler] and changes %name, but uses the same tarball). How about another param (hey, we've got something like 100, so what's 101? ;-) ) for the tarball that defaults to "openmpi"? Then if you want to have a differently-named tarball, you can.

-%setup -q -n openmpi-%{version}
+%setup -q -n %{name}-%{version}

Ditto for this.

-%dir %{_libdir}/openmpi
+%dir %{_libdir}/%{name}

Hmm -- is this right? I thought that the name "openmpi" in this directory path came from OMPI's configure script, not from the RPM spec...? Or is the RPM build command passing --pkgname or somesuch to OMPI's configure to override the built-in name?

On Jul 31, 2009, at 11:51 AM, Sylvain Jeaugey wrote:

> Hi all,
>
> We had to apply a little set of modifications to the openmpi.spec file to help us integrate openmpi in our cluster distribution.
>
> So here is a patch which, as the changelog suggests, does a couple of "improvements" :
> - Fix a typo in Summary
> - Replace openmpi by %{name} in a couple of places
> - Add an %{opt_prefix} option to be able to install in a specific path (e.g. in /opt//mpi/-/ instead of /opt/ -)
>
> The patch is done with "hg extract" but should apply on the SVN trunk.
>
> Sylvain

--
Jeff Squyres
jsquy...@cisco.com
[OMPI devel] Parallel Quicksort
Hello All,

This may not be related to this forum, so sorry in advance for asking :).

I have been working on an implementation of parallel Quicksort using MPI, and I now need some standard parallel Quicksort implementation(s) for a performance evaluation. Can someone recommend an available implementation that I can use?

Thanks all!

Prasad.

--
http://www.codeproject.com/script/Articles/MemberArticles.aspx?amid=3489381