Re: [OMPI devel] sm_coll segv

2009-08-06 Thread Jeff Squyres

On Aug 6, 2009, at 5:18 PM, Jeff Squyres (jsquyres) wrote:


> I'm therefore going to change the mpool string names that btl/sm and
> coll/sm are looking for so that they get unique sm mpool modules.




(another case of "I knew what I meant, but that's not what I typed")

I'm going to [try to] change it so that the btl sm and coll sm always  
have separate mpool modules.


--
Jeff Squyres
jsquy...@cisco.com



[OMPI devel] sm_coll segv

2009-08-06 Thread Jeff Squyres
Ok, with Terry's help, I found a segv in the coll sm.  If you run  
without the sm btl, there's an obvious bad parameter that we're  
passing that results in a segv.


LANL -- can you confirm or deny that these are the segvs that you were
seeing?


While fixing this, I noticed that the sm btl and sm coll are sharing  
an mpool when both are running.  This probably used to be a good idea  
way back when (e.g., when we were using a lot more shmem than we  
needed and core counts were lower), but it seems like a bad idea now  
(e.g., the btl/sm is fairly specific about the size of the mpool that  
is created -- it's just big enough for its data structures).


I'm therefore going to change the mpool string names that btl/sm and  
coll/sm are looking for so that they get unique sm mpool modules.
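
To illustrate why the lookup name matters, here is a minimal sketch in Python (not Open MPI code). It assumes a hypothetical registry that hands out one pool per name string and ignores the requested size when a pool with that name already exists; whether the real mpool lookup behaves exactly like this is an assumption. `Pool`, `PoolRegistry`, `find_or_create`, the sizes, and the names "sm", "btl_sm" and "coll_sm" are all invented for illustration.

```python
class Pool:
    """Hypothetical stand-in for a shared-memory pool module."""

    def __init__(self, name, size):
        self.name = name
        self.size = size  # bytes, fixed by whoever creates the pool first


class PoolRegistry:
    """Hypothetical registry: at most one pool per lookup name."""

    def __init__(self):
        self._pools = {}

    def find_or_create(self, name, size):
        # An existing pool is returned as-is; the requested size is ignored.
        if name not in self._pools:
            self._pools[name] = Pool(name, size)
        return self._pools[name]


# Same name: both components land on one pool, sized by the first caller.
shared = PoolRegistry()
btl_pool = shared.find_or_create("sm", size=4096)
coll_pool = shared.find_or_create("sm", size=65536)
same_pool = btl_pool is coll_pool

# Distinct names: each component gets a pool with its own size.
separate = PoolRegistry()
btl_own = separate.find_or_create("btl_sm", size=4096)
coll_own = separate.find_or_create("coll_sm", size=65536)
```

In the shared case the second caller silently inherits a pool that was sized for someone else's data structures, which is the hazard that distinct names avoid.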


--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] Device failover on ob1

2009-08-06 Thread Jeff Squyres

Is it time to "svn rm ompi/mca/pml/dr"?


On Aug 4, 2009, at 6:50 AM, Ralph Castain wrote:


Rolf/Mouhamed

Could you get together off-list to discuss the different approaches
and see if/where there is common ground? It would be nice to see an
integrated solution -- personally, I would rather not see two
orthogonal approaches unless they can be cleanly separated. Much
better if they could support each other in an intelligent fashion.

On Aug 3, 2009, at 9:49 AM, Pavel Shamis (Pasha) wrote:

>
>
>> I have not, but there should be no difference.  The failover code
>> only gets triggered when an error happens.  Otherwise, there are no
>> differences in the code paths while everything is functioning
>> normally.
> Sounds good. I still did not have time to review the code. I will
> try to do it during this week.
>
> Pasha
>>
>> Rolf
>>
>> On 08/03/09 11:14, Pavel Shamis (Pasha) wrote:
>>> Rolf,
>>> Did you compare latency/bw for failover-enabled code VS trunk ?
>>>
>>> Pasha.
>>>
>>> Rolf Vandevaart wrote:
 Hi folks:

 As some of you know, I have also been looking into implementing
 failover.  I took a different approach, as I am solving the problem
 within the openib BTL itself.  This of course means that it only
 works for failing over from one openib BTL to another, but that was
 our area of interest.  It also means that we do not need to keep
 track of fragments, as we get them back from the completion queue
 upon failure. We then extract the relevant information and repost
 on the other working endpoint.

 My work has been progressing at:
 http://bitbucket.org/rolfv/ompi-failover

 This currently only works for send semantics, so you have to run
 with -mca btl_openib_flags 1.

 Rolf

 On 07/31/09 05:49, Mouhamed Gueye wrote:
> Hi list,
>
> Here is an update on our work concerning device failover.
>
> As many of you suggested, we reoriented our work on ob1 rather
> than dr and we now have a working prototype on top of ob1. The
> approach is to store btl descriptors sent to peers and delete
> them when we receive proof of delivery. So far, we rely on
> completion callback functions, assuming that the message has been
> delivered when the completion function is called, which is the
> case for openib. When a btl module fails, it is removed from the
> endpoint's btl list and the next one is used to retransmit the
> stored descriptors. No extra message is transmitted; the scheme
> only adds fields to the header. It has mainly been tested
> with two IB modules, in both multi-rail (two separate networks)
> and multi-path (one big network) configurations.
>
> You can grab and test the patch here (applies on top of the
> trunk) :
> http://bitbucket.org/gueyem/ob1-failover/
>
> To compile with failover support, pass --enable-device-failover to
> configure. You can then run a benchmark, disconnect a port and see
> the failover operate.
>
> The failover layer induces a small latency increase (~2%) when no
> failover occurs. To speed up the failover process on openib, you
> can try lowering the btl_openib_ib_timeout parameter, for example
> to 15 instead of 20 (the default value).
>
> Mouhamed
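
The store-and-retransmit scheme described in the message above can be sketched roughly as follows. This is a toy Python model, not the ob1 patch: the `Endpoint` class, its method names, the sequence numbers, and the "ib0"/"ib1" labels are all hypothetical, and the completion callback is simply assumed to be proof of delivery, as the message says is the case for openib.

```python
class Endpoint:
    """Hypothetical endpoint holding an ordered list of transports (BTLs)."""

    def __init__(self, btls):
        self.btls = list(btls)   # the first entry is the one in use
        self.pending = {}        # seq -> payload still awaiting completion
        self.next_seq = 0
        self.wire_log = []       # (btl, seq) pairs, kept only for illustration

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = payload   # keep until delivery is confirmed
        self._post(seq)
        return seq

    def _post(self, seq):
        if not self.btls:
            raise RuntimeError("no working BTL left")
        self.wire_log.append((self.btls[0], seq))

    def on_completion(self, seq):
        # The completion callback is treated as proof of delivery.
        self.pending.pop(seq, None)

    def on_btl_failure(self, btl):
        # Drop the failed BTL, then retransmit everything still stored.
        self.btls.remove(btl)
        for seq in sorted(self.pending):
            self._post(seq)


ep = Endpoint(["ib0", "ib1"])
first = ep.send(b"first")
second = ep.send(b"second")
ep.on_completion(first)       # "first" is delivered and forgotten
ep.on_btl_failure("ib0")      # "second" is retransmitted on ib1
```

Only descriptors without a completion are replayed, so the cost in the failure-free path is the bookkeeping in `send` and `on_completion`.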
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
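
For a sense of scale on the timeout suggestion above: the btl_openib_ib_timeout value is an exponent, not a duration. Assuming the usual InfiniBand formula of 4.096 microseconds * 2^value for the nominal ACK timeout, a short calculation shows the difference between 20 and 15. The helper name below is made up, and the real time before an error is reported also depends on retry settings, which this sketch ignores.

```python
def ack_timeout_seconds(ib_timeout):
    """Nominal IB ACK timeout for a given exponent (hypothetical helper)."""
    return 4.096e-6 * (2 ** ib_timeout)


default_s = ack_timeout_seconds(20)   # value quoted as the default
lowered_s = ack_timeout_seconds(15)   # value suggested in the message
ratio = default_s / lowered_s

print(f"{default_s:.3f} s vs {lowered_s:.3f} s, ratio {ratio:.0f}")
```

That is roughly 4.3 s per timeout at 20 versus roughly 0.13 s at 15, a factor of 32.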






--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] Improvement of openmpi.spec

2009-08-06 Thread Bogdan Costescu

On Thu, 6 Aug 2009, Jeff Squyres wrote:


>  -Source: openmpi-%{version}.tar.$EXTENSION
>  +Source: %{name}-%{version}.tar.$EXTENSION


The spec file parser defines some of these variables by default. For
example, after encountering the following at the top of the file:


Name: fftw
Version: 2.1.5
Release: 5.bc

something like %{name}-%{version}-%{release} will be expanded to
fftw-2.1.5-5.bc, so there is no need to define any of these
variables. The suggested improvement is only cosmetic: it looks nicer
not to have hardcoded names all over the spec file. It also makes it
easier to change the name of the package later, e.g. to allow several
versions to be installed at the same time, by simply changing to:


Name: fftw2

so that the package called 'fftw' can track the 3.x versions. Red Hat
has done this before, e.g. for their python packages.
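
The expansion described above can be mimicked with a few lines of Python. This is only a toy model of the behaviour, not rpm's actual parser: the `expand` function is hypothetical, it only understands simple "Tag: value" lines, and the ".tar.gz" extension in the second call is chosen purely for illustration.

```python
import re


def expand(spec_text, template):
    """Expand %{tag} references using 'Tag: value' lines from spec_text."""
    macros = {}
    for line in spec_text.splitlines():
        m = re.match(r"^(\w+):\s*(.+)$", line.strip())
        if m:
            # Tags are matched case-insensitively: "Name:" defines %{name}.
            macros[m.group(1).lower()] = m.group(2).strip()
    # Unknown macros are left untouched.
    return re.sub(
        r"%\{(\w+)\}",
        lambda m: macros.get(m.group(1), m.group(0)),
        template,
    )


preamble = """\
Name: fftw
Version: 2.1.5
Release: 5.bc
"""

full_name = expand(preamble, "%{name}-%{version}-%{release}")

# Renaming the package only requires touching the Name: line.
renamed_preamble = preamble.replace("Name: fftw", "Name: fftw2")
source = expand(renamed_preamble, "%{name}-%{version}.tar.gz")
```

Here `full_name` comes out as fftw-2.1.5-5.bc, and after the rename `source` follows the new name without any other line being edited.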


--
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
E-mail: bogdan.coste...@iwr.uni-heidelberg.de


Re: [OMPI devel] Improvement of openmpi.spec

2009-08-06 Thread Jeff Squyres

On Aug 6, 2009, at 9:49 AM, Sylvain Jeaugey wrote:


> -Source: openmpi-%{version}.tar.$EXTENSION
> +Source: %{name}-%{version}.tar.$EXTENSION
>
> Does this mean that you're looking for a different tarball name?  I'm not
> sure that's good; the tarball should be an openmpi tarball, regardless of
> what name it gets installed under (e.g., OFED builds an OMPI tarball 3-4
> different ways [one for each compiler] and changes %name, but uses the same
> tarball).  How about another param (hey, we've got something like 100, so
> what's 101? ;-) ) for the tarball that defaults to "openmpi"?  Then if you
> want to have a differently-named tarball, you can.

Well, maybe we could live with an openmpi tarball ... it was just to be
consistent. When I build bullmpi-a.b.c.src.rpm, I somehow expect the tar
file to be bullmpi-a.b.c.tar.gz.



I'm not opposed to having another variable to set the name of the  
tarball if you want it.


--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI devel] Improvement of openmpi.spec

2009-08-06 Thread Sylvain Jeaugey

Hi Jeff,

Thanks for reviewing my changes !

On Thu, 6 Aug 2009, Jeff Squyres wrote:


> -Source: openmpi-%{version}.tar.$EXTENSION
> +Source: %{name}-%{version}.tar.$EXTENSION
>
> Does this mean that you're looking for a different tarball name?  I'm not
> sure that's good; the tarball should be an openmpi tarball, regardless of
> what name it gets installed under (e.g., OFED builds an OMPI tarball 3-4
> different ways [one for each compiler] and changes %name, but uses the same
> tarball).  How about another param (hey, we've got something like 100, so
> what's 101? ;-) ) for the tarball that defaults to "openmpi"?  Then if you
> want to have a differently-named tarball, you can.

Well, maybe we could live with an openmpi tarball ... it was just to be
consistent. When I build bullmpi-a.b.c.src.rpm, I somehow expect the tar
file to be bullmpi-a.b.c.tar.gz.



> -%setup -q -n openmpi-%{version}
> +%setup -q -n %{name}-%{version}
>
> Ditto for this.
>
> -%dir %{_libdir}/openmpi
> +%dir %{_libdir}/%{name}
>
> Hmm -- is this right?  I thought that the name "openmpi" in this directory
> path came from OMPI's configure script, not from the RPM spec...?  Or is the
> RPM build command passing --pkgname or somesuch to OMPI's configure to
> override the built-in name?

Hum, I guess you're right, this is indeed not something to change. Sorry
about that.


Sylvain


On Jul 31, 2009, at 11:51 AM, Sylvain Jeaugey wrote:


Hi all,

We had to apply a little set of modifications to the openmpi.spec file to 
help us integrate openmpi in our cluster distribution.


So here is a patch which, as the changelog suggests, does a couple of 
"improvements" :

- Fix a typo in Summary
- Replace openmpi by %{name} in a couple of places
- Add an %{opt_prefix} option to be able to install in a specific path 
(e.g. in /opt//mpi/-/ instead of 
/opt/-)


The patch is done with "hg extract" but should apply on the SVN trunk.

Sylvain



--
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI devel] Improvement of openmpi.spec

2009-08-06 Thread Jeff Squyres

Thanks!  A few questions about this patch:

-Source: openmpi-%{version}.tar.$EXTENSION
+Source: %{name}-%{version}.tar.$EXTENSION

Does this mean that you're looking for a different tarball name?  I'm
not sure that's good; the tarball should be an openmpi tarball,
regardless of what name it gets installed under (e.g., OFED builds an
OMPI tarball 3-4 different ways [one for each compiler] and changes
%name, but uses the same tarball).  How about another param (hey, we've
got something like 100, so what's 101? ;-) ) for the tarball that
defaults to "openmpi"?  Then if you want to have a differently-named
tarball, you can.


-%setup -q -n openmpi-%{version}
+%setup -q -n %{name}-%{version}

Ditto for this.

-%dir %{_libdir}/openmpi
+%dir %{_libdir}/%{name}

Hmm -- is this right?  I thought that the name "openmpi" in this  
directory path came from OMPI's configure script, not from the RPM  
spec...?  Or is the RPM build command passing --pkgname or somesuch to  
OMPI's configure to override the built-in name?





On Jul 31, 2009, at 11:51 AM, Sylvain Jeaugey wrote:


Hi all,

We had to apply a little set of modifications to the openmpi.spec  
file to help us integrate openmpi in our cluster distribution.


So here is a patch which, as the changelog suggests, does a couple  
of "improvements" :

- Fix a typo in Summary
- Replace openmpi by %{name} in a couple of places
- Add an %{opt_prefix} option to be able to install in a specific
path (e.g. in /opt//mpi/-/ instead of /opt/-)


The patch is done with "hg extract" but should apply on the SVN trunk.

Sylvain



--
Jeff Squyres
jsquy...@cisco.com



[OMPI devel] Parallel Quicksort

2009-08-06 Thread Prasadcse Perera
Hello All,
This may not be strictly related to this forum, so apologies in advance
:). I have been working on an implementation of parallel Quicksort using
MPI, and I now need one or more standard parallel Quicksort
implementations for a performance evaluation. Can someone recommend any
available implementation that I could use?

Thanks all!

Prasad.
-- 
http://www.codeproject.com/script/Articles/MemberArticles.aspx?amid=3489381