Re: [OMPI devel] mosix patches

2014-04-24 Thread Ralph Castain
It was never committed to OMPI - it got stuck in a side branch when the 
developer graduated and took a job, and never came across. Given the age, I'd 
suspect that side branch is way out of date and would probably need some 
significant effort before it could be brought into the OMPI trunk, assuming 
someone took on the effort to do so.


On Apr 24, 2014, at 9:07 AM, Pavel V. Kaygorodov  wrote:

> Hi!
> 
> What is the current status of mosix support in OpenMPI?
> I have tried the patches from 
> http://www.cs.huji.ac.il/wikis/MediaWiki/mosix/index.php/Process_migration_for_OpenMPI
> but without any success, even on the 1.6 branch.
> 
> With best regards, 
>  Pavel.
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/04/14593.php



Re: [OMPI devel] RFC: Remove heterogeneous support

2014-04-24 Thread Ralph Castain

On Apr 24, 2014, at 12:05 PM, Andreas Schäfer  wrote:

> Hey,
> 
> On 14:49 Thu 24 Apr , George Bosilca wrote:
>> On Thu, Apr 24, 2014 at 1:06 PM, Jeff Squyres (jsquyres)
>>  wrote:
>>> The code is unused.  It has been unused for a long time.  It is
>>> unlikely to be fixed.
> 
> We'd be using it, probably not in production, but in research and
> teaching -- if it were operational.
> 
> And, as George pointed out, I see a trend towards heterogeneity in
> HPC, so I'd say this feature will be rather more important in the
> future.

We have been hearing about such "trends" for a long time, but have yet to see 
them actually happen. Not saying it couldn't some day - just saying it still 
hasn't happened in production.

> 
>> PS: This code has implications from the datatype engine all the way up
>> into the MPI layer. It also impacts the BTLs, especially the hand-shake
>> for the ones requiring such a protocol. It also has an impact on the
>> external32 support in MPI for some types of architectures. So its
>> removal should be an extremely cautious and surgical operation.
> 
> So, would repairing the code be significantly more complicated than a
> clean extraction?

Unless someone volunteers to fix it, it would seem the question is moot. My 
employer isn't interested, and I'm not sure any of the employers within the 
OMPI community are currently inclined to support such an effort.

I can't speak to what George is referring to regarding how it was broken, as I 
honestly don't recall the circumstances. We know it has been broken for some 
time, and that nobody really has a setup to test it - we can check that it 
compiles, but I don't think any of us actually have a hetero cluster upon which 
we could test it.

And as my production code friends keep pointing out - if you can't test it, 
then you can't "sell" it.

So here's what I suggest: if someone is willing to take the lead in fixing 
hetero operations, and has the hardware upon which to verify it, then please 
step forward. Otherwise, I agree with Jeff that we should remove it and move on.


> 
> Cheers
> -Andreas
> 
> 
> -- 
> ==
> Andreas Schäfer
> HPC and Grid Computing
> Chair of Computer Science 3
> Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
> +49 9131 85-27910
> PGP/GPG key via keyserver
> http://www.libgeodecomp.org
> ==
> 
> (\___/)
> (+'.'+)
> (")_(")
> This is Bunny. Copy and paste Bunny into your
> signature to help him gain world domination!
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/04/14598.php



Re: [OMPI devel] RFC: Well-known mca parameters

2014-04-24 Thread Ralph Castain
Just for clarification: are you proposing that we *require* every component 
that links against an external library to include these parameters? If so, that 
seems like a significant requirement, as quite a few of our components do so.

On the other hand, if you are proposing that those component developers who 
choose to expose such information do so using the suggested syntax, then that 
is a different proposal.

Just want to understand what you are proposing - a requirement on components, 
or a syntax for those who choose to support this capability?

FWIW: we do not (and cannot, for licensing reasons) link against Slurm, so 
please don't include it in such lists to avoid giving anyone even a passing 
thought that we do so.



On Apr 23, 2014, at 10:38 PM, Mike Dubman  wrote:

> 
> WHAT:
> * Formalize well-known MCA parameters that can be used by any component to 
> represent that component's external dependencies. 
> 
> * A component can set these well-known read-only MCA parameters to expose 
> different setup-related traits of the OMPI installation to end users.
> 
> Example:
> 
> ompi_info can print, for every component that depends on an external library:
> - the runtime version of the external library used by the component
> - the compile-time version of the external library used by the component
> 
> slurm: v2.6.6
> mtl/mxm: v2.5
> btl/verbs: v3.2
> btl/usnic: v1.1
> coll/fca: v2.5
> ...
> 
> An end-user, site admin, or OMPI vendor can aggregate this information with 
> some script and generate a report on whether a given installation complies 
> with site/vendor rules.
> 
> * The "well-known" mca parameters can be easily extracted from ALL components 
> by grep-like utilities.
> 
> * Current proposal:
> 
> ** Prefix each well-known MCA param with "print_".
> ** Define two well-known MCA parameters indicating the external library's 
> runtime and compile-time versions, i.e.:
>  
> print_compiletime_version
> print_runtime_version
> 
> The following command will show all exposed well-known mca params from all 
> components:
> ompi_info --parsable -l 9 |grep ":print_"
> 
> 
> WHY:
> 
> * Better supportability: a site/vendor can provide a script which will check 
> whether an OMPI installation complies with the release notes or support matrix.
> 
> 
> WHEN:
> 
> - Next teleconf
> - code can be observed here: https://svn.open-mpi.org/trac/ompi/ticket/4556
>   
> 
> Comments?
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/04/14590.php



Re: [OMPI devel] MPI_Recv_init_null_c from intel test suite fails vs ompi trunk

2014-04-24 Thread George Bosilca
The problem was not in the start but in the wait (hint: the status is
set in the wait). The difference, I guess, is r27880, which does not
seem to be in 1.8.

So, 1.8 is not returning the correct status for inactive persistent
requests, but it does the right thing for MPI_PROC_NULL-bound
requests.

  George.
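
For reference, the two behaviors at stake look like this in a minimal sketch
(illustrative only, based on the MPI specs and the fix described above; the
assertions are not part of the attached test):

#include <assert.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Recv_init(NULL, 0, MPI_INT, MPI_PROC_NULL, MPI_ANY_TAG,
                  MPI_COMM_WORLD, &req);

    MPI_Start(&req);
    MPI_Wait(&req, &status);
    /* The request was active and bound to MPI_PROC_NULL, so the wait
     * should return the "proc-null" status: source = MPI_PROC_NULL,
     * tag = MPI_ANY_TAG, count = 0. */
    assert(MPI_PROC_NULL == status.MPI_SOURCE);

    MPI_Wait(&req, &status);
    /* The persistent request is now inactive, so this wait returns
     * immediately with the "empty" status: source = MPI_ANY_SOURCE,
     * tag = MPI_ANY_TAG, count = 0. */
    assert(MPI_ANY_SOURCE == status.MPI_SOURCE);

    MPI_Request_free(&req);
    MPI_Finalize();
    return 0;
}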


On Thu, Apr 24, 2014 at 6:19 PM, Jeff Squyres (jsquyres)
 wrote:
> George --
>
> Any idea why it isn't failing on the v1.8 branch?  The only major difference 
> I see in mpi/c/start.c between the trunk and v1.8 is your change.
>
>
>
> On Apr 24, 2014, at 2:08 PM, George Bosilca  wrote:
>
>> r31524 fixes this corner case. The problem was that persistent
>> requests bound to MPI_PROC_NULL were never activated, so the wait* function
>> was taking the branch corresponding to inactive requests.
>>
>>  George.
>>
>> On Thu, Apr 24, 2014 at 12:14 AM, Gilles Gouaillardet
>>  wrote:
>>> Folks,
>>>
>>> Attached is an oversimplified version of the MPI_Recv_init_null_c
>>> test from the Intel test suite.
>>>
>>> The test works fine with the v1.6, v1.7 and v1.8 branches but fails
>>> with the trunk.
>>>
>>> I wonder whether the bug is in OpenMPI or in the test itself.
>>>
>>> On one hand, we could consider there to be a bug in OpenMPI:
>>> status.MPI_SOURCE should be MPI_PROC_NULL, since we explicitly posted a
>>> recv request with MPI_PROC_NULL.
>>>
>>> On the other hand (MPI specs, chapter 3.7.3, and
>>> https://svn.open-mpi.org/trac/ompi/ticket/3475),
>>> we could consider the returned value not significant, and hence
>>> MPI_Wait should return an empty status (an empty status has
>>> source=MPI_ANY_SOURCE per the MPI specs).
>>>
>>> For what it's worth, this test succeeds with MPICH (i.e.
>>> status.MPI_SOURCE is MPI_PROC_NULL).
>>>
>>> What is the correct interpretation of the MPI specs, and what should be
>>> done? (i.e. fix OpenMPI, or fix/skip the test?)
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2014/04/14589.php
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/04/14596.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/04/14599.php


Re: [OMPI devel] MPI_Recv_init_null_c from intel test suite fails vs ompi trunk

2014-04-24 Thread Jeff Squyres (jsquyres)
George --

Any idea why it isn't failing on the v1.8 branch?  The only major difference I 
see in mpi/c/start.c between the trunk and v1.8 is your change.



On Apr 24, 2014, at 2:08 PM, George Bosilca  wrote:

> r31524 fixes this corner case. The problem was that persistent
> requests bound to MPI_PROC_NULL were never activated, so the wait* function
> was taking the branch corresponding to inactive requests.
> 
>  George.
> 
> On Thu, Apr 24, 2014 at 12:14 AM, Gilles Gouaillardet
>  wrote:
>> Folks,
>> 
>> Attached is an oversimplified version of the MPI_Recv_init_null_c
>> test from the Intel test suite.
>>
>> The test works fine with the v1.6, v1.7 and v1.8 branches but fails
>> with the trunk.
>>
>> I wonder whether the bug is in OpenMPI or in the test itself.
>>
>> On one hand, we could consider there to be a bug in OpenMPI:
>> status.MPI_SOURCE should be MPI_PROC_NULL, since we explicitly posted a
>> recv request with MPI_PROC_NULL.
>>
>> On the other hand (MPI specs, chapter 3.7.3, and
>> https://svn.open-mpi.org/trac/ompi/ticket/3475),
>> we could consider the returned value not significant, and hence
>> MPI_Wait should return an empty status (an empty status has
>> source=MPI_ANY_SOURCE per the MPI specs).
>>
>> For what it's worth, this test succeeds with MPICH (i.e.
>> status.MPI_SOURCE is MPI_PROC_NULL).
>>
>> What is the correct interpretation of the MPI specs, and what should be
>> done? (i.e. fix OpenMPI, or fix/skip the test?)
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2014/04/14589.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/04/14596.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Remove heterogeneous support

2014-04-24 Thread Andreas Schäfer
Hey,

On 14:49 Thu 24 Apr , George Bosilca wrote:
> On Thu, Apr 24, 2014 at 1:06 PM, Jeff Squyres (jsquyres)
>  wrote:
> > The code is unused.  It has been unused for a long time.  It is
> > unlikely to be fixed.

We'd be using it, probably not in production, but in research and
teaching -- if it were operational.

And, as George pointed out, I see a trend towards heterogeneity in
HPC, so I'd say this feature will be rather more important in the
future.

> PS: This code has implications from the datatype engine all the way up
> into the MPI layer. It also impacts the BTLs, especially the hand-shake
> for the ones requiring such a protocol. It also has an impact on the
> external32 support in MPI for some types of architectures. So its
> removal should be an extremely cautious and surgical operation.

So, would repairing the code be significantly more complicated than a
clean extraction?

Cheers
-Andreas


-- 
==
Andreas Schäfer
HPC and Grid Computing
Chair of Computer Science 3
Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
+49 9131 85-27910
PGP/GPG key via keyserver
http://www.libgeodecomp.org
==

(\___/)
(+'.'+)
(")_(")
This is Bunny. Copy and paste Bunny into your
signature to help him gain world domination!


signature.asc
Description: Digital signature


Re: [OMPI devel] RFC: Remove heterogeneous support

2014-04-24 Thread George Bosilca
On Thu, Apr 24, 2014 at 1:06 PM, Jeff Squyres (jsquyres)
 wrote:
> On Apr 24, 2014, at 12:54 PM, George Bosilca  wrote:
>
>> There seems to be an opportunity to still have heterogeneous environments in 
>> the future.
>> http://www.enterprisetech.com/2014/04/23/ibm-google-show-power8-systems-openpower-efforts/
>
> How so?

As the link I sent highlights, there is a push, a reasonable effort, to
bring another processor family into the mainstream. This opens the
potential for the dawn of heterogeneous data centers, and thus the need
for at least some basic support for heterogeneous environments.

>
>> I don’t think it is fair to shift the burden onto the original developer 
>> instead of the committer who broke the feature.
>
> I don't see how your comment is related to this RFC.

Because I have the feeling the logic behind the RFC is: it is broken
and must be removed because nobody wants to fix it. And I don't agree
with this logic. This particular code was working and was used but
incompetence and carelessness (in any arbitrary order) broke it.

>
> The code is unused.  It has been unused for a long time.  It is unlikely to 
> be fixed.

I wrote a significant portion of the code pinpointed in this RFC, and
maintained it for a reasonable amount of time, despite a number of
careless commits. But today, you are right, I have no intention of
fixing it anymore, and I don't think anybody wants to volunteer for
such a chore.

  George.

PS: This code has implications from the datatype engine all the way up
into the MPI layer. It also impacts the BTLs, especially the hand-shake
for the ones requiring such a protocol. It also has an impact on the
external32 support in MPI for some types of architectures. So its
removal should be an extremely cautious and surgical operation.

>
> Why not remove it?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/04/14595.php


Re: [OMPI devel] MPI_Recv_init_null_c from intel test suite fails vs ompi trunk

2014-04-24 Thread George Bosilca
r31524 fixes this corner case. The problem was that persistent
requests bound to MPI_PROC_NULL were never activated, so the wait* function
was taking the branch corresponding to inactive requests.

  George.

On Thu, Apr 24, 2014 at 12:14 AM, Gilles Gouaillardet
 wrote:
> Folks,
>
> Attached is an oversimplified version of the MPI_Recv_init_null_c
> test from the Intel test suite.
>
> The test works fine with the v1.6, v1.7 and v1.8 branches but fails
> with the trunk.
>
> I wonder whether the bug is in OpenMPI or in the test itself.
>
> On one hand, we could consider there to be a bug in OpenMPI:
> status.MPI_SOURCE should be MPI_PROC_NULL, since we explicitly posted a
> recv request with MPI_PROC_NULL.
>
> On the other hand (MPI specs, chapter 3.7.3, and
> https://svn.open-mpi.org/trac/ompi/ticket/3475),
> we could consider the returned value not significant, and hence
> MPI_Wait should return an empty status (an empty status has
> source=MPI_ANY_SOURCE per the MPI specs).
>
> For what it's worth, this test succeeds with MPICH (i.e.
> status.MPI_SOURCE is MPI_PROC_NULL).
>
> What is the correct interpretation of the MPI specs, and what should be
> done? (i.e. fix OpenMPI, or fix/skip the test?)
>
> Cheers,
>
> Gilles
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/04/14589.php


Re: [OMPI devel] RFC: Remove heterogeneous support

2014-04-24 Thread Jeff Squyres (jsquyres)
On Apr 24, 2014, at 12:54 PM, George Bosilca  wrote:

> There seems to be an opportunity to still have heterogeneous environments in 
> the future.
> http://www.enterprisetech.com/2014/04/23/ibm-google-show-power8-systems-openpower-efforts/

How so?

> I don’t think it is fair to shift the burden onto the original developer 
> instead of the committer who broke the feature. 

I don't see how your comment is related to this RFC.

The code is unused.  It has been unused for a long time.  It is unlikely to be 
fixed.

Why not remove it?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] RFC: Remove heterogeneous support

2014-04-24 Thread George Bosilca
There seems to be an opportunity to still have heterogeneous environments in the 
future.
http://www.enterprisetech.com/2014/04/23/ibm-google-show-power8-systems-openpower-efforts/

I don’t think it is fair to shift the burden onto the original developer instead 
of the committer who broke the feature. 

 George.

On Apr 23, 2014, at 09:49 , Jeff Squyres (jsquyres)  wrote:

> WHAT: Remove data-heterogeneous support from Open MPI
> 
> WHY: No one uses it (it's not the default), it's broken (probably has been 
> for a while)
> 
> WHERE: Datatype engine, some configury, and a few other places
> 
> TIMEOUT: Tuesday teleconf, 6 May 2014 (i.e., 2 weeks from now)
> 
> MORE DETAIL:
> 
> It recently came to my attention that we seem to have some bit rot in the 
> heterogeneous data representation support such that if you configure with 
> --enable-heterogeneous, even if you run on homogeneous machines, you can get 
> segv's with tcp,sm,self.
> 
> The heterogeneous support has never been enabled by default.  AFAIK, only 
> Cisco tests it regularly in its MTT.  I'd be greatly surprised if many (any?) 
> users use it at all.
> 
> So I have to ask myself: why do we keep this functionality around?  It seems 
> like we should delete this code, simplify things a little, and move on.
> 
> Comments?
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/04/14584.php
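
For reference, reproducing the breakage described in the RFC above should only
take something like the following (a sketch: the install prefix, process count,
and test program are placeholders; --enable-heterogeneous and the tcp,sm,self
BTL list come from Jeff's text):

# build a tree with heterogeneous data support compiled in
./configure --enable-heterogeneous --prefix=$HOME/ompi-hetero
make -j 8 install

# even on homogeneous machines, runs like this reportedly can segv
$HOME/ompi-hetero/bin/mpirun -np 2 --mca btl tcp,sm,self ./ring_c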



[OMPI devel] mosix patches

2014-04-24 Thread Pavel V. Kaygorodov
Hi!

What is the current status of mosix support in OpenMPI?
I have tried the patches from 
http://www.cs.huji.ac.il/wikis/MediaWiki/mosix/index.php/Process_migration_for_OpenMPI
but without any success, even on the 1.6 branch.

With best regards, 
  Pavel.



Re: [OMPI devel] Bug report: non-blocking allreduce with user-defined operation gives segfault

2014-04-24 Thread Rupert Nash
Hi George,

Having looked again, you're correct about the two 2buf reductions being wrong. 
For now, I've updated my patch to nbc.c to copy buf1 into buf3 and then do buf3 
OP= buf2 (see below).

Patching ompi_3buff_op_reduce to cope with user-defined operations is certainly 
possible, but I don't really understand the implications of doing that for the 
rest of the codebase (this is the first time I've looked at the internals of 
OpenMPI).

Best,
Rupert

if (ompi_op_is_intrinsic(opargs.op)) {
  /* This does buf3 = buf1 OP buf2 */
  ompi_3buff_op_reduce(opargs.op, buf1, buf2, buf3, opargs.count,
                       opargs.datatype);
} else {
  /* Copy buf1 -> buf3 (if necessary), then do buf3 OP= buf2.
   * If the output is the same as the first input, we don't need to copy;
   * this only applies to the second input if the operator commutes. */
  if (buf1 == buf3) {
    ompi_op_reduce(opargs.op, buf2, buf3, opargs.count, opargs.datatype);
  } else if (buf2 == buf3 && ompi_op_is_commute(opargs.op)) {
    ompi_op_reduce(opargs.op, buf1, buf3, opargs.count, opargs.datatype);
  } else {
    res = NBC_Copy(buf1, opargs.count, opargs.datatype, buf3,
                   opargs.count, opargs.datatype, handle->comm);
    if (res != NBC_OK) {
      printf("NBC_Copy() failed (code: %i)\n", res);
      ret = res;
      goto error;
    }
    ompi_op_reduce(opargs.op, buf2, buf3, opargs.count, opargs.datatype);
  }
}
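
For context, the path above is exercised by a non-blocking reduction with a
user-defined operation. A minimal trigger might look like this (an illustrative
sketch, not the program from the original bug report):

#include <stdio.h>
#include <mpi.h>

/* user-defined reduction: inout[i] = in[i] + inout[i] */
static void my_sum(void *in, void *inout, int *len, MPI_Datatype *dtype)
{
    int i, *a = (int *) in, *b = (int *) inout;
    for (i = 0; i < *len; i++) b[i] += a[i];
}

int main(int argc, char *argv[])
{
    int rank, in = 1, out = 0;
    MPI_Op op;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Op_create(my_sum, 1 /* commutative */, &op);

    /* a non-blocking allreduce with a user-defined op goes through the
     * libnbc reduction code patched above */
    MPI_Iallreduce(&in, &out, 1, MPI_INT, op, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    if (0 == rank) printf("sum = %d\n", out);
    MPI_Op_free(&op);
    MPI_Finalize();
    return 0;
}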


> Rupert, 
> 
> You are right, the code of any non-blocking reduce is not built with 
> user-level ops in mind. However, I'm not sure about your patch. One 
> reason is that ompi_3buff does target = source1 op source2, while 
> ompi_2buf does target op= source (notice the op=). 
> 
> Thus you can't replace one ompi_3buff call with two ompi_2buf calls, because 
> you basically replace target = source1 op source2 with target op= source1 op 
> source2. 
> 
> Moreover, a much nicer solution would be to patch the 
> ompi_3buff_op_reduce function in op.h directly, to fall back to a 
> user-defined function when necessary. 
> 
>   George. 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
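
George's suggested alternative might look roughly like the following (a hedged
sketch against the internal op API used in the patch above; the function name
nbc_3buff_op_reduce_fallback is hypothetical, and the copy assumes target does
not alias source1):

/* Sketch: a 3-buffer reduce that falls back to copy + 2-buffer reduce
 * for user-defined operations, so callers need no special casing. */
static inline void
nbc_3buff_op_reduce_fallback(ompi_op_t *op, void *source1, void *source2,
                             void *target, int count, ompi_datatype_t *dtype)
{
    if (ompi_op_is_intrinsic(op)) {
        /* intrinsic path: target = source1 op source2 */
        ompi_3buff_op_reduce(op, source1, source2, target, count, dtype);
    } else {
        /* fallback: target = source1, then target op= source2 */
        ompi_datatype_copy_content_same_ddt(dtype, count,
                                            (char *) target, (char *) source1);
        ompi_op_reduce(op, source2, target, count, dtype);
    }
}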


[OMPI devel] Patch to fix valgrind warning

2014-04-24 Thread Lisandro Dalcin
Please review the attached patch,

==19533== Conditional jump or move depends on uninitialised value(s)
==19533==at 0x140DAB78: component_select (osc_sm_component.c:352)
==19533==by 0xD9BA0B2: ompi_osc_base_select (osc_base_init.c:73)
==19533==by 0xD9314C1: ompi_win_allocate (win.c:182)
==19533==by 0xD982C4E: PMPI_Win_allocate (pwin_allocate.c:79)
==19533==by 0xD628887: __pyx_pw_6mpi4py_3MPI_3Win_11Allocate
(mpi4py.MPI.c:109170)
==19533==by 0x38442E0BD3: PyEval_EvalFrameEx (in
/usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442E21EC: PyEval_EvalCodeEx (in
/usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442E22F1: PyEval_EvalCode (in
/usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442F20DB: PyImport_ExecCodeModuleEx (in
/usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442F2357: ??? (in /usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442F2FF0: ??? (in /usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442F323C: ??? (in /usr/lib64/libpython2.7.so.1.0)
==19533==
==19533== Conditional jump or move depends on uninitialised value(s)
==19533==at 0x140DAB78: component_select (osc_sm_component.c:352)
==19533==by 0xD9BA0B2: ompi_osc_base_select (osc_base_init.c:73)
==19533==by 0xD93174D: ompi_win_allocate_shared (win.c:213)
==19533==by 0xD982FD0: PMPI_Win_allocate_shared (pwin_allocate_shared.c:80)
==19533==by 0xD62C727:
__pyx_pw_6mpi4py_3MPI_3Win_13Allocate_shared (mpi4py.MPI.c:109409)
==19533==by 0x38442E0BD3: PyEval_EvalFrameEx (in
/usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442E21EC: PyEval_EvalCodeEx (in
/usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442E22F1: PyEval_EvalCode (in
/usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442F20DB: PyImport_ExecCodeModuleEx (in
/usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442F2357: ??? (in /usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442F2FF0: ??? (in /usr/lib64/libpython2.7.so.1.0)
==19533==by 0x38442F323C: ??? (in /usr/lib64/libpython2.7.so.1.0)


-- 
Lisandro Dalcin
---
CIMEC (UNL/CONICET)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1016)
Tel/Fax: +54-342-4511169
diff -up ompi/mca/osc/sm/osc_sm_component.c.orig ompi/mca/osc/sm/osc_sm_component.c
--- ompi/mca/osc/sm/osc_sm_component.c.orig 2014-04-24 10:28:58.790702380 +0300
+++ ompi/mca/osc/sm/osc_sm_component.c  2014-04-24 10:30:15.138137733 +0300
@@ -341,7 +341,7 @@ component_select(struct ompi_win_t *win,
 #if HAVE_PTHREAD_CONDATTR_SETPSHARED && HAVE_PTHREAD_MUTEXATTR_SETPSHARED
 pthread_mutexattr_t mattr;
 pthread_condattr_t cattr;
-bool blocking_fence;
+bool blocking_fence = false;
 int flag;
 
 if (OMPI_SUCCESS != ompi_info_get_bool(info, "blocking_fence",
@@ -349,7 +349,7 @@ component_select(struct ompi_win_t *win,
 goto error;
 }
 
-if (blocking_fence) {
+if (flag && blocking_fence) {
 ret = pthread_mutexattr_init(&mattr);
 ret = pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
 if (ret != 0) {
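
The warning arises because blocking_fence is read even when ompi_info_get_bool()
does not find the "blocking_fence" key and therefore never writes the output
value. The patch both initializes the variable and gates on flag, i.e. (a
sketch of the assumed semantics: flag is set non-zero only when the key is
present in the info object):

bool blocking_fence = false;  /* defensive default */
int flag = 0;

if (OMPI_SUCCESS != ompi_info_get_bool(info, "blocking_fence",
                                       &blocking_fence, &flag)) {
    goto error;
}
if (flag && blocking_fence) {
    /* the key was present and true: set up the process-shared
     * mutex/condvar attributes */
}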


[OMPI devel] RFC: Well-known mca parameters

2014-04-24 Thread Mike Dubman
WHAT:
* Formalize well-known MCA parameters that can be used by any component to
represent that component's external dependencies.

* A component can set these well-known read-only MCA parameters to expose
different setup-related traits of the OMPI installation to end users.

Example:

ompi_info can print, for every component that depends on an external library:
- the runtime version of the external library used by the component
- the compile-time version of the external library used by the component

slurm: v2.6.6
mtl/mxm: v2.5
btl/verbs: v3.2
btl/usnic: v1.1
coll/fca: v2.5
...

An end-user, site admin, or OMPI vendor can aggregate this information with
some script and generate a report on whether a given installation complies
with site/vendor rules.

* The "well-known" mca parameters can be easily extracted from ALL
components by grep-like utilities.

* Current proposal:

** Prefix each well-known MCA param with "print_".
** Define two well-known MCA parameters indicating the external library's
runtime and compile-time versions, i.e.:

print_compiletime_version
print_runtime_version

The following command will show all exposed well-known mca params from all
components:
ompi_info --parsable -l 9 |grep ":print_"


WHY:

* Better supportability: a site/vendor can provide a script which will check
whether an OMPI installation complies with the release notes or support matrix.


WHEN:

- Next teleconf
- code can be observed here: https://svn.open-mpi.org/trac/ompi/ticket/4556


Comments?
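
For a component using the MCA variable system introduced in the 1.7 series,
registering such read-only parameters might look roughly like this (a hedged
sketch: the version strings and helper function are illustrative placeholders,
while mca_base_component_var_register() and the CONSTANT scope are the existing
registration API):

#include "opal/mca/base/mca_base_var.h"

/* illustrative storage; a real component would fill these in from the
 * external library's compile-time macros and runtime query interface */
static char *compiletime_version = "2.5";
static char *runtime_version = "2.5.1";

static void register_well_known_params(const mca_base_component_t *component)
{
    /* MCA_BASE_VAR_SCOPE_CONSTANT marks the value as read-only */
    (void) mca_base_component_var_register(component,
        "print_compiletime_version",
        "Compile-time version of the external library used by this component",
        MCA_BASE_VAR_TYPE_STRING, NULL, 0, 0, OPAL_INFO_LVL_9,
        MCA_BASE_VAR_SCOPE_CONSTANT, &compiletime_version);

    (void) mca_base_component_var_register(component,
        "print_runtime_version",
        "Runtime version of the external library used by this component",
        MCA_BASE_VAR_TYPE_STRING, NULL, 0, 0, OPAL_INFO_LVL_9,
        MCA_BASE_VAR_SCOPE_CONSTANT, &runtime_version);
}

The ompi_info command above would then list both parameters at info level 9.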


[OMPI devel] MPI_Recv_init_null_c from intel test suite fails vs ompi trunk

2014-04-24 Thread Gilles Gouaillardet
Folks,

Attached is an oversimplified version of the MPI_Recv_init_null_c
test from the Intel test suite.

The test works fine with the v1.6, v1.7 and v1.8 branches but fails
with the trunk.

I wonder whether the bug is in OpenMPI or in the test itself.

On one hand, we could consider there to be a bug in OpenMPI:
status.MPI_SOURCE should be MPI_PROC_NULL, since we explicitly posted a
recv request with MPI_PROC_NULL.

On the other hand (MPI specs, chapter 3.7.3, and
https://svn.open-mpi.org/trac/ompi/ticket/3475),
we could consider the returned value not significant, and hence
MPI_Wait should return an empty status (an empty status has
source=MPI_ANY_SOURCE per the MPI specs).

For what it's worth, this test succeeds with MPICH (i.e.
status.MPI_SOURCE is MPI_PROC_NULL).

What is the correct interpretation of the MPI specs, and what should be
done? (i.e. fix OpenMPI, or fix/skip the test?)

Cheers,

Gilles
/*
 *  This test program is an oversimplified version of the
 *  MPI_Recv_init_null_c test from the Intel test suite.
 *
 *  It can be run on one task:
 *  mpirun -np 1 -host localhost ./a.out
 *
 *  When run on the trunk, since r28431, the test fails:
 *  status.MPI_SOURCE is MPI_ANY_SOURCE instead of MPI_PROC_NULL
 *
 * Copyright (c) 2014  Research Organization for Information Science
 * and Technology (RIST). All rights reserved.
 */
#include <mpi.h>
#include <stdio.h>

int main (int argc, char *argv[]) {
MPI_Status status;
MPI_Request req;
int ierr;

MPI_Init(&argc, &argv);

ierr = MPI_Recv_init(NULL, 0, MPI_INT, MPI_PROC_NULL, MPI_ANY_TAG, 
MPI_COMM_WORLD, &req);
if (ierr != MPI_SUCCESS) MPI_Abort(MPI_COMM_WORLD, 1);

ierr = MPI_Start(&req);
if (ierr != MPI_SUCCESS) MPI_Abort(MPI_COMM_WORLD, 2);

ierr = MPI_Wait(&req, &status);
if (ierr != MPI_SUCCESS) MPI_Abort(MPI_COMM_WORLD, 3);

if (MPI_PROC_NULL != status.MPI_SOURCE) {
if (MPI_ANY_SOURCE == status.MPI_SOURCE) {
printf("got MPI_ANY_SOURCE=%d instead of MPI_PROC_NULL=%d\n", 
status.MPI_SOURCE, MPI_PROC_NULL);
} else {
printf("got %d instead of MPI_PROC_NULL=%d\n", status.MPI_SOURCE, 
MPI_PROC_NULL);
}
} else {
printf("OK\n");
}

MPI_Finalize();
return 0;
}