Re: [OMPI devel] SHMEM, "mpp/shmem.fh", CMake and infinite loops

2016-07-13 Thread Paul Kapinos

Hi Gilles,

On 07/13/16 01:10, Gilles Gouaillardet wrote:

Paul,

The two header files in include/mpp simply include the file with the same name
in the upper directory.


Yessir!
(and CMake does not care about the upper directory, and builds an infinite include loop)



A simple workaround is to replace these two files in include/mpp with symbolic
links to files with the same name in the upper directory.

Would you mind giving this a try ?


It works very well, at least for the one test case provided. So yes, patching an 
installation of Open MPI could be a workaround. However, we would really love to 
avoid the need to patch every Open MPI installation


Maybe OpenMPI's developers could think about how to minimize the probability of 
such loops? A symlink is one alternative; another one would be renaming one of the 
headers..

We fully trust the Open MPI developers' expertise in this :-)

Have a nice day,

Paul Kapinos


pk224850@linuxc2:/opt/MPI/openmpi-1.8.1/linux/intel/include[519]$ ls -la 
mpp/shmem.fh

lrwxrwxrwx 1 pk224850 pk224850 11 Jul 13 13:20 mpp/shmem.fh -> ../shmem.fh
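
(For reference, a sketch of the workaround on an installed tree; the prefix is
taken from the listing above, and the second file name is assumed to be
mpp/shmem.h, per the note about the "two header files" in include/mpp:)

$ cd /opt/MPI/openmpi-1.8.1/linux/intel/include
$ ln -sf ../shmem.fh  mpp/shmem.fh
$ ln -sf ../shmem.h   mpp/shmem.h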



Cheers,

Gilles

On Wednesday, July 13, 2016, Paul Kapinos <kapi...@itc.rwth-aachen.de> wrote:

Dear OpenMPI developer,

we have some trouble when using OpenMPI and CMake on codes using 'SHMEM'.

Cf. 'man shmem_swap',
 >   Fortran:
 >   INCLUDE "mpp/shmem.fh"

Yes, here is one such header file:
 > openmpi-1.X.Y/oshmem/include/mpp/shmem.fh
... since version 1.7 at least.


The significant content is this line:
 >  include 'shmem.fh'
whereby OpenMPI means to include not the same file itself (= infinite
loop!) but, I believe, this file:
 > openmpi-1.X.Y/oshmem/include/shmem.fh

(The above paths are in the source code distributions; in the installation
the files are located here:  include/shmem.fh  include/mpp/shmem.fh)


This works. Unless you start using CMake. Because CMake is 'intelligent' and
tries to add the search paths recursively (I believe), gloriously enabling
the infinite loop by including the 'shmem.fh' file from the 'shmem.fh' file.

Steps to reproduce:
$ mkdir build; cd build; cmake ..
$ make

The second command needs some minute(s), getting stuck at the 'Scanning
dependencies of target mpihelloworld' step.
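
(For reference, a minimal project sketch that appears to trigger the loop; the
file contents below are assumptions reconstructed from this description, not the
original test case, and only the target name is taken from the 'Scanning
dependencies' message:)

# CMakeLists.txt
cmake_minimum_required(VERSION 2.8)
project(mpihelloworld Fortran)
# point CMake at the Open MPI / OSHMEM compiler wrapper, e.g.
#   cmake -DCMAKE_Fortran_COMPILER=mpifort ..
# so that <prefix>/include and <prefix>/include/mpp end up in the scanned search paths
add_executable(mpihelloworld hello.f90)

! hello.f90
program hello
  implicit none
  include 'mpp/shmem.fh'   ! the include whose dependency scan never terminates
  print *, 'hello'
end program hello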

If you attach with 'strace -p ' to the 'cmake' process you will see lines
like the ones below, again and again. So I think CMake just includes the 'shmem.fh'
file from itself until the stack is full / a limit is reached / the moon
shines, and thus hangs for a while (seconds/minutes) in the 'Scanning
dependencies...' state.

*Well, maybe having a file include a file of the same name is not that good?*
If the file 'include/mpp/shmem.fh' included not 'shmem.fh' but
'somethingelse.fh' located in 'include/...', this infinite loop would be
impossible at all...

And by the way: is there a way to limit the maximum include depth in CMake
for header files? This would work around this one 'infinite include loop'
issue...

Have a nice day,

Paul Kapinos

..

access("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh", 
R_OK)
= 0
stat("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh",
{st_mode=S_IFREG|0644, st_size=205, ...}) = 0
open("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh",
O_RDONLY) = 5271
fstat(5271, {st_mode=S_IFREG|0644, st_size=205, ...}) = 0
mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x7f08457d2000
read(5271, "!\n!   Copyright (c) 2013  Me"..., 32768) = 205
read(5271, "", 32768)   = 0


access("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh", 
R_OK)
= 0
stat("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh",
{st_mode=S_IFREG|0644, st_size=205, ...}) = 0
open("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh",
O_RDONLY) = 5272
fstat(5272, {st_mode=S_IFREG|0644, st_size=205, ...}) = 0
mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x7f08457ca000
read(5272, "!\n!   Copyright (c) 2013  Me"..., 32768) = 205
read(5272, "", 32768)   = 0
..

--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915



___
devel mailing list
de...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2016/07/19195.php




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, IT Center
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915




Re: [OMPI devel] 2.0.0rc4 Crash in MPI_File_write_all_end

2016-07-13 Thread Howard Pritchard
Hi Eric,

Thanks very much for finding this problem.  We decided that, in order to have a
reasonably timely release, we'd triage issues and turn around a new RC only if
something drastic appeared.  We want to fix this issue (and it will be fixed),
but we've decided to defer the fix for this issue to a 2.0.1 bug fix release.

Howard



2016-07-12 13:51 GMT-06:00 Eric Chamberland <
eric.chamberl...@giref.ulaval.ca>:

> Hi Edgard,
>
> I just saw that your patch got into ompi/master... any chances it goes
> into ompi-release/v2.x before rc5?
>
> thanks,
>
> Eric
>
>
> On 08/07/16 03:14 PM, Edgar Gabriel wrote:
>
>> I think I found the problem, I filed a pr towards master, and if that
>> passes I will file a pr for the 2.x branch.
>>
>> Thanks!
>> Edgar
>>
>>
>> On 7/8/2016 1:14 PM, Eric Chamberland wrote:
>>
>>>
>>> On 08/07/16 01:44 PM, Edgar Gabriel wrote:
>>>
 ok, but just to be able to construct a test case, basically what you are
 doing is

 MPI_File_write_all_begin (fh, NULL, 0, some datatype);

 MPI_File_write_all_end (fh, NULL, &status),

 is this correct?

>>> Yes, but with 2 processes:
>>>
>>> rank 0 writes something, but not rank 1...
>>>
>>> other info: rank 0 didn't wait for rank1 after MPI_File_write_all_end so
>>> it continued to the next MPI_File_write_all_begin with a different
>>> datatype but on the same file...
>>>
>>> thanks!
>>>
>>> Eric
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2016/07/19173.php
>>>
>>
>> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/07/19192.php
>
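
(For context, a minimal sketch of the two-rank scenario described above; this is
not Eric's actual test code, and the file name, datatypes and counts are just
examples:)

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Status status;
    int rank, ival = 42;
    double dval = 3.14;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_File_open(MPI_COMM_WORLD, "testfile.bin",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* first split collective: rank 0 writes one int, rank 1 participates
       with a zero-count write (buffer NULL, count 0) */
    MPI_File_write_all_begin(fh, rank == 0 ? &ival : NULL, rank == 0 ? 1 : 0, MPI_INT);
    MPI_File_write_all_end(fh, rank == 0 ? &ival : NULL, &status);

    /* no synchronization here: rank 0 goes straight to the next split
       collective with a different datatype on the same file */
    MPI_File_write_all_begin(fh, rank == 0 ? &dval : NULL, rank == 0 ? 1 : 0, MPI_DOUBLE);
    MPI_File_write_all_end(fh, rank == 0 ? &dval : NULL, &status);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}

Run with two processes, e.g. 'mpirun -np 2 ./a.out'.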


Re: [OMPI devel] MCA_SPML_CALL call in compiled objects

2016-07-13 Thread Jeff Squyres (jsquyres)
Thanks Ben.  Rainer Keller just filed a PR for this -- we'll get it in v2.0.1:

https://github.com/open-mpi/ompi/pull/1867


> On Jul 12, 2016, at 12:08 AM, Ben Menadue  wrote:
> 
> Hi,
> 
> Looks like there's a #include missing from
> oshmem/shmem/fortran/shmem_put_nb_f.c. It's causing MCA_SPML_CALL to show up
> as an undefined symbol, even though it's a macro (among other things). The
> #include is in shmem_get_nb_f.c but not ..._put_...
> 
> Patch against master (0e433ea):
> 
> $ git diff
> diff --git a/oshmem/shmem/fortran/shmem_put_nb_f.c
> b/oshmem/shmem/fortran/shmem_put_nb_f.c
> index 3acff9c..acfb22d 100644
> --- a/oshmem/shmem/fortran/shmem_put_nb_f.c
> +++ b/oshmem/shmem/fortran/shmem_put_nb_f.c
> @@ -13,6 +13,7 @@
> #include "oshmem/include/shmem.h"
> #include "oshmem/shmem/shmem_api_logger.h"
> #include "oshmem/runtime/runtime.h"
> +#include "oshmem/mca/spml/spml.h"
> #include "ompi/datatype/ompi_datatype.h"
> #include "stdio.h"
> 
> 
> Cheers,
> Ben
> 
> 
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/07/19177.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] SHMEM, "mpp/shmem.fh", CMake and infinite loops

2016-07-13 Thread Jeff Squyres (jsquyres)
Thanks for the report.  I don't know much about OSHMEM, but I'm guessing the 
files were laid out that way for a reason (e.g., maybe the OSHMEM spec calls 
for both of those files to exist?).

I've filed an issue here to track it:

https://github.com/open-mpi/ompi/issues/1868

Additionally, have you reported this issue upstream to cmake?  It seems like 
this is actually a bug in cmake and should be fixed.



> On Jul 13, 2016, at 7:30 AM, Paul Kapinos  wrote:
> 
> Hi Gilles,
> 
> On 07/13/16 01:10, Gilles Gouaillardet wrote:
>> Paul,
>> 
>> The two header files in include/mpp simply include the file with the same 
>> name
>> in the upper directory.
> 
> Yessir!
> (and CMake do not care about the upper directory and build infinite loop)
> 
> 
>> A simple workaround is to replace these two files in include/mpp with 
>> symbolic
>> links to files with the same name in the upper directory.
>> 
>> Would you mind giving this a try ?
> 
> It work very well, at least for the one test case provided. So yes, patching 
> any installation of Open MPI could be a workaround. However we would really 
> love to avoid this need to patch any Open MPI installation
> 
> Maybe OpenMPI's developer could think about how-to minimize the probability 
> of such loops? Symlink is one alternative, another one would be renaming one 
> of the headers..
> we fully trust to Open MPI's developers expertise in this :-)
> 
> Have a nice day,
> 
> Paul Kapinos
> 
> 
> pk224850@linuxc2:/opt/MPI/openmpi-1.8.1/linux/intel/include[519]$ ls -la 
> mpp/shmem.fh
> lrwxrwxrwx 1 pk224850 pk224850 11 Jul 13 13:20 mpp/shmem.fh -> ../shmem.fh
> 
>> 
>> Cheers,
>> 
>> Gilles
>> 
>> On Wednesday, July 13, 2016, Paul Kapinos > > wrote:
>> 
>>Dear OpenMPI developer,
>> 
>>we have some troubles when using OpenMPI and CMake on codes using 'SHMEM'.
>> 
>>Cf. 'man shmem_swap',
>> >   Fortran:
>> >   INCLUDE "mpp/shmem.fh"
>> 
>>Yes here is one such header file:
>> > openmpi-1.X.Y/oshmem/include/mpp/shmem.fh
>>... since version 1.7. at least.
>> 
>> 
>>The significant content is this line:
>> >  include 'shmem.fh'
>>whereby OpenMPI mean to include not the same file by itself (= infinite
>>loop!) but I believe these one file:
>> > openmpi-1.X.Y/oshmem/include/shmem.fh
>> 
>>(The above paths are in the source code distributions; in the installation
>>the files are located here:  include/shmem.fh  include/mpp/shmem.fh)
>> 
>> 
>>This works. Unless you start using CMake. Because CMake is 'intelligent' 
>> and
>>try to add the search paths recursively, (I believe,) gloriously enabling
>>the infinite loop by including the 'shmem.fh' file from the 'shmem.fh' 
>> file.
>> 
>>Steps to reproduce:
>>$ mkdir build; cd build; cmake ..
>>$ make
>> 
>>The second one command need some minute(s), sticking by the 'Scanning
>>dependencies of target mpihelloworld' step.
>> 
>>If connecting by 'strace -p ' to the 'cmake' process you will see 
>> lines
>>like below, again and again. So I think CMake just include the 'shmem.fh'
>>file from itself unless the stack is full / a limit is reached / the moon
>>shines, and thus hangs for a while (seconds/minutes) in the 'Scanning
>>dependencies...' state.
>> 
>>*Well, maybe having a file including the same file is not that good?*
>>If the file 'include/mpp/shmem.fh' would include not 'shmem.fh' but
>>'somethingelse.fh' located in 'include/...' these infinite loop would be
>>impossible at all...
>> 
>>And by the way: is here a way to limit the maximum include depths in CMake
>>for header files? This would workaround this one 'infinite include loop'
>>issue...
>> 
>>Have a nice day,
>> 
>>Paul Kapinos
>> 
>>..
>>
>> access("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh",
>>  R_OK)
>>= 0
>>
>> stat("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh",
>>{st_mode=S_IFREG|0644, st_size=205, ...}) = 0
>>
>> open("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh",
>>O_RDONLY) = 5271
>>fstat(5271, {st_mode=S_IFREG|0644, st_size=205, ...}) = 0
>>mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 
>> =
>>0x7f08457d2000
>>read(5271, "!\n!   Copyright (c) 2013  Me"..., 32768) = 205
>>read(5271, "", 32768)   = 0
>> 
>>
>> access("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh",
>>  R_OK)
>>= 0
>>
>> stat("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh",
>>{st_mode=S_IFREG|0644, st_size=205, ...}) = 0
>>
>> open("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh",
>>O_RDONLY) = 5272
>>fstat(5272, {st_mode=S_IFREG|0644, st_size=205, ...}) = 0
>>mmap(NULL, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS

[OMPI devel] Option to switch from shmem to kmem module

2016-07-13 Thread Abhishek Joshi
Hi,
Is there any option in mpirun which enables us to switch dynamically from
shmem mode to  knem/xpmem mode beyond a specifiable message size?

This is because, according to my tests, knem performs better than shmem
only at large message sizes.

-- 
Abhishek
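
(For reference, shared-memory transport behaviour in Open MPI is tuned through
MCA parameters passed to mpirun; below is a hedged sketch of how to inspect and
set them. The parameter name in the second command is only an example, it varies
by release and component (sm vs. vader), so verify it with ompi_info first, and
the process count and binary name are placeholders:)

$ ompi_info --param btl all --level 9 | grep -E 'sm|vader|knem|xpmem'
$ mpirun --mca btl_vader_single_copy_mechanism knem -np 16 ./my_app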


Re: [OMPI devel] 2.0.0rc4 Crash in MPI_File_write_all_end

2016-07-13 Thread Eric Chamberland

Hi Howard,

ok, I will wait for 2.0.1rcX... ;)

I've put in place a script to download/compile OpenMPI+PETSc(3.7.2) and 
our code from the git repos.


Now I am in a somewhat uncomfortable situation where neither the 
ompi-release.git nor the ompi.git repo is working for me.


The first gives me the errors with MPI_File_write_all_end I reported, 
but the latter gives me errors like these:


[lorien:106919] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in 
file ess_singleton_module.c at line 167

*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[lorien:106919] Local abort before MPI_INIT completed completed 
successfully, but am not able to aggregate error messages, and not able 
to guarantee that all other processes were killed!
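
(The 'ess_singleton' in that message points at singleton startup, i.e. the
program bootstrapping the runtime itself because it was not started through
mpirun; a sketch of the two launch modes, with an example binary name:)

$ ./ci_test                # singleton launch: MPI_Init_thread sets up the runtime itself (the path failing above)
$ mpirun -np 1 ./ci_test   # the same binary started through mpirun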


So, for my continuous integration of OpenMPI I am in a no man's land... :(

Thanks anyway for the follow-up!

Eric

On 13/07/16 07:49 AM, Howard Pritchard wrote:

Hi Eric,

Thanks very much for finding this problem.   We decided in order to have
a reasonably timely
release, that we'd triage issues and turn around a new RC if something
drastic
appeared.  We want to fix this issue (and it will be fixed), but we've
decided to
defer the fix for this issue to a 2.0.1 bug fix release.

Howard



2016-07-12 13:51 GMT-06:00 Eric Chamberland
mailto:eric.chamberl...@giref.ulaval.ca>>:

Hi Edgard,

I just saw that your patch got into ompi/master... any chances it
goes into ompi-release/v2.x before rc5?

thanks,

Eric


On 08/07/16 03:14 PM, Edgar Gabriel wrote:

I think I found the problem, I filed a pr towards master, and if
that
passes I will file a pr for the 2.x branch.

Thanks!
Edgar


On 7/8/2016 1:14 PM, Eric Chamberland wrote:


On 08/07/16 01:44 PM, Edgar Gabriel wrote:

ok, but just to be able to construct a test case,
basically what you are
doing is

MPI_File_write_all_begin (fh, NULL, 0, some datatype);

MPI_File_write_all_end (fh, NULL, &status),

is this correct?

Yes, but with 2 processes:

rank 0 writes something, but not rank 1...

other info: rank 0 didn't wait for rank1 after
MPI_File_write_all_end so
it continued to the next MPI_File_write_all_begin with a
different
datatype but on the same file...

thanks!

Eric
___
devel mailing list
de...@open-mpi.org 
Subscription:
https://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2016/07/19173.php


___
devel mailing list
de...@open-mpi.org 
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2016/07/19192.php




Re: [OMPI devel] 2.0.0rc4 Crash in MPI_File_write_all_end

2016-07-13 Thread Ralph Castain
Hmmm…I see where the singleton on master might be broken - will check later 
today

> On Jul 13, 2016, at 11:37 AM, Eric Chamberland 
>  wrote:
> 
> Hi Howard,
> 
> ok, I will wait for 2.0.1rcX... ;)
> 
> I've put in place a script to download/compile OpenMPI+PETSc(3.7.2) and our 
> code from the git repos.
> 
> Now I am in a somewhat uncomfortable situation where neither the 
> ompi-release.git or ompi.git repos are working for me.
> 
> The first gives me the errors with MPI_File_write_all_end I reported, but the 
> former gives me errors like these:
> 
> [lorien:106919] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file 
> ess_singleton_module.c at line 167
> *** An error occurred in MPI_Init_thread
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> ***and potentially your MPI job)
> [lorien:106919] Local abort before MPI_INIT completed completed successfully, 
> but am not able to aggregate error messages, and not able to guarantee that 
> all other processes were killed!
> 
> So, for my continuous integration of OpenMPI I am in a no man's land... :(
> 
> Thanks anyway for the follow-up!
> 
> Eric
> 
> On 13/07/16 07:49 AM, Howard Pritchard wrote:
>> Hi Eric,
>> 
>> Thanks very much for finding this problem.   We decided in order to have
>> a reasonably timely
>> release, that we'd triage issues and turn around a new RC if something
>> drastic
>> appeared.  We want to fix this issue (and it will be fixed), but we've
>> decided to
>> defer the fix for this issue to a 2.0.1 bug fix release.
>> 
>> Howard
>> 
>> 
>> 
>> 2016-07-12 13:51 GMT-06:00 Eric Chamberland
>> > > >>:
>> 
>>Hi Edgard,
>> 
>>I just saw that your patch got into ompi/master... any chances it
>>goes into ompi-release/v2.x before rc5?
>> 
>>thanks,
>> 
>>Eric
>> 
>> 
>>On 08/07/16 03:14 PM, Edgar Gabriel wrote:
>> 
>>I think I found the problem, I filed a pr towards master, and if
>>that
>>passes I will file a pr for the 2.x branch.
>> 
>>Thanks!
>>Edgar
>> 
>> 
>>On 7/8/2016 1:14 PM, Eric Chamberland wrote:
>> 
>> 
>>On 08/07/16 01:44 PM, Edgar Gabriel wrote:
>> 
>>ok, but just to be able to construct a test case,
>>basically what you are
>>doing is
>> 
>>MPI_File_write_all_begin (fh, NULL, 0, some datatype);
>> 
>>MPI_File_write_all_end (fh, NULL, &status),
>> 
>>is this correct?
>> 
>>Yes, but with 2 processes:
>> 
>>rank 0 writes something, but not rank 1...
>> 
>>other info: rank 0 didn't wait for rank1 after
>>MPI_File_write_all_end so
>>it continued to the next MPI_File_write_all_begin with a
>>different
>>datatype but on the same file...
>> 
>>thanks!
>> 
>>Eric
>>___
>>devel mailing list
>>de...@open-mpi.org  
>> >
>>Subscription:
>>https://www.open-mpi.org/mailman/listinfo.cgi/devel 
>> 
>>Link to this post:
>>http://www.open-mpi.org/community/lists/devel/2016/07/19173.php 
>> 
>> 
>> 
>>___
>>devel mailing list
>>de...@open-mpi.org  > >
>>Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel 
>> 
>>Link to this post:
>>http://www.open-mpi.org/community/lists/devel/2016/07/19192.php 
>> 
>> 
>> 
> ___
> devel mailing list
> de...@open-mpi.org 
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel 
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/07/19201.php 
> 


Re: [OMPI devel] 2.0.0rc4 Crash in MPI_File_write_all_end

2016-07-13 Thread Jeff Squyres (jsquyres)
I literally just noticed that this morning (that singleton was broken on 
master), but hadn't gotten to bisecting / reporting it yet...

I also haven't tested 2.0.0.  I really hope singletons aren't broken then...

/me goes to test 2.0.0...

Whew -- 2.0.0 singletons are fine.  :-)


> On Jul 13, 2016, at 3:01 PM, Ralph Castain  wrote:
> 
> Hmmm…I see where the singleton on master might be broken - will check later 
> today
> 
>> On Jul 13, 2016, at 11:37 AM, Eric Chamberland 
>>  wrote:
>> 
>> Hi Howard,
>> 
>> ok, I will wait for 2.0.1rcX... ;)
>> 
>> I've put in place a script to download/compile OpenMPI+PETSc(3.7.2) and our 
>> code from the git repos.
>> 
>> Now I am in a somewhat uncomfortable situation where neither the 
>> ompi-release.git or ompi.git repos are working for me.
>> 
>> The first gives me the errors with MPI_File_write_all_end I reported, but 
>> the former gives me errors like these:
>> 
>> [lorien:106919] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file 
>> ess_singleton_module.c at line 167
>> *** An error occurred in MPI_Init_thread
>> *** on a NULL communicator
>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>> ***and potentially your MPI job)
>> [lorien:106919] Local abort before MPI_INIT completed completed 
>> successfully, but am not able to aggregate error messages, and not able to 
>> guarantee that all other processes were killed!
>> 
>> So, for my continuous integration of OpenMPI I am in a no man's land... :(
>> 
>> Thanks anyway for the follow-up!
>> 
>> Eric
>> 
>> On 13/07/16 07:49 AM, Howard Pritchard wrote:
>>> Hi Eric,
>>> 
>>> Thanks very much for finding this problem.   We decided in order to have
>>> a reasonably timely
>>> release, that we'd triage issues and turn around a new RC if something
>>> drastic
>>> appeared.  We want to fix this issue (and it will be fixed), but we've
>>> decided to
>>> defer the fix for this issue to a 2.0.1 bug fix release.
>>> 
>>> Howard
>>> 
>>> 
>>> 
>>> 2016-07-12 13:51 GMT-06:00 Eric Chamberland
>>> >> >:
>>> 
>>>Hi Edgard,
>>> 
>>>I just saw that your patch got into ompi/master... any chances it
>>>goes into ompi-release/v2.x before rc5?
>>> 
>>>thanks,
>>> 
>>>Eric
>>> 
>>> 
>>>On 08/07/16 03:14 PM, Edgar Gabriel wrote:
>>> 
>>>I think I found the problem, I filed a pr towards master, and if
>>>that
>>>passes I will file a pr for the 2.x branch.
>>> 
>>>Thanks!
>>>Edgar
>>> 
>>> 
>>>On 7/8/2016 1:14 PM, Eric Chamberland wrote:
>>> 
>>> 
>>>On 08/07/16 01:44 PM, Edgar Gabriel wrote:
>>> 
>>>ok, but just to be able to construct a test case,
>>>basically what you are
>>>doing is
>>> 
>>>MPI_File_write_all_begin (fh, NULL, 0, some datatype);
>>> 
>>>MPI_File_write_all_end (fh, NULL, &status),
>>> 
>>>is this correct?
>>> 
>>>Yes, but with 2 processes:
>>> 
>>>rank 0 writes something, but not rank 1...
>>> 
>>>other info: rank 0 didn't wait for rank1 after
>>>MPI_File_write_all_end so
>>>it continued to the next MPI_File_write_all_begin with a
>>>different
>>>datatype but on the same file...
>>> 
>>>thanks!
>>> 
>>>Eric
>>>___
>>>devel mailing list
>>>de...@open-mpi.org 
>>>Subscription:
>>>https://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>Link to this post:
>>>http://www.open-mpi.org/community/lists/devel/2016/07/19173.php
>>> 
>>> 
>>>___
>>>devel mailing list
>>>de...@open-mpi.org 
>>>Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>Link to this post:
>>>http://www.open-mpi.org/community/lists/devel/2016/07/19192.php
>>> 
>>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2016/07/19201.php
> 
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2016/07/19202.php


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/



Re: [OMPI devel] 2.0.0rc4 Crash in MPI_File_write_all_end

2016-07-13 Thread Pritchard Jr., Howard
Jeff,

I think this was fixed in PR 1227 on v2.x

Howard

-- 
Howard Pritchard

HPC-DES
Los Alamos National Laboratory





On 7/13/16, 1:47 PM, "devel on behalf of Jeff Squyres (jsquyres)"
 wrote:

>I literally just noticed that this morning (that singleton was broken on
>master), but hadn't gotten to bisecting / reporting it yet...
>
>I also haven't tested 2.0.0.  I really hope singletons aren't broken
>then...
>
>/me goes to test 2.0.0...
>
>Whew -- 2.0.0 singletons are fine.  :-)
>
>
>> On Jul 13, 2016, at 3:01 PM, Ralph Castain  wrote:
>> 
>> Hmmm…I see where the singleton on master might be broken - will check
>>later today
>> 
>>> On Jul 13, 2016, at 11:37 AM, Eric Chamberland
>>> wrote:
>>> 
>>> Hi Howard,
>>> 
>>> ok, I will wait for 2.0.1rcX... ;)
>>> 
>>> I've put in place a script to download/compile OpenMPI+PETSc(3.7.2)
>>>and our code from the git repos.
>>> 
>>> Now I am in a somewhat uncomfortable situation where neither the
>>>ompi-release.git or ompi.git repos are working for me.
>>> 
>>> The first gives me the errors with MPI_File_write_all_end I reported,
>>>but the former gives me errors like these:
>>> 
>>> [lorien:106919] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in
>>>file ess_singleton_module.c at line 167
>>> *** An error occurred in MPI_Init_thread
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now
>>>abort,
>>> ***and potentially your MPI job)
>>> [lorien:106919] Local abort before MPI_INIT completed completed
>>>successfully, but am not able to aggregate error messages, and not able
>>>to guarantee that all other processes were killed!
>>> 
>>> So, for my continuous integration of OpenMPI I am in a no man's
>>>land... :(
>>> 
>>> Thanks anyway for the follow-up!
>>> 
>>> Eric
>>> 
>>> On 13/07/16 07:49 AM, Howard Pritchard wrote:
 Hi Eric,
 
 Thanks very much for finding this problem.   We decided in order to
have
 a reasonably timely
 release, that we'd triage issues and turn around a new RC if something
 drastic
 appeared.  We want to fix this issue (and it will be fixed), but we've
 decided to
 defer the fix for this issue to a 2.0.1 bug fix release.
 
 Howard
 
 
 
 2016-07-12 13:51 GMT-06:00 Eric Chamberland
 >>> >:
 
Hi Edgard,
 
I just saw that your patch got into ompi/master... any chances it
goes into ompi-release/v2.x before rc5?
 
thanks,
 
Eric
 
 
On 08/07/16 03:14 PM, Edgar Gabriel wrote:
 
I think I found the problem, I filed a pr towards master, and
if
that
passes I will file a pr for the 2.x branch.
 
Thanks!
Edgar
 
 
On 7/8/2016 1:14 PM, Eric Chamberland wrote:
 
 
On 08/07/16 01:44 PM, Edgar Gabriel wrote:
 
ok, but just to be able to construct a test case,
basically what you are
doing is
 
MPI_File_write_all_begin (fh, NULL, 0, some datatype);
 
MPI_File_write_all_end (fh, NULL, &status),
 
is this correct?
 
Yes, but with 2 processes:
 
rank 0 writes something, but not rank 1...
 
other info: rank 0 didn't wait for rank1 after
MPI_File_write_all_end so
it continued to the next MPI_File_write_all_begin with a
different
datatype but on the same file...
 
thanks!
 
Eric
___
devel mailing list
de...@open-mpi.org 
Subscription:
https://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:

http://www.open-mpi.org/community/lists/devel/2016/07/19173.php
 
 
___
devel mailing list
de...@open-mpi.org 
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2016/07/19192.php
 
 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>>http://www.open-mpi.org/community/lists/devel/2016/07/19201.php
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>>http://www.open-mpi.org/community/lists/devel/2016/07/19202.php
>
>
>-- 
>Jeff Squyres
>jsquy...@cisco

Re: [OMPI devel] 2.0.0rc4 Crash in MPI_File_write_all_end

2016-07-13 Thread Eric Chamberland

Hi,

FYI: I've tested the SHA e28951e

From git clone launched around 01h19:

http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.13.01h19m30s_config.log

Eric

On 13/07/16 04:01 PM, Pritchard Jr., Howard wrote:

Jeff,

I think this was fixed in PR 1227 on v2.x

Howard



Re: [OMPI devel] 2.0.0rc4 Crash in MPI_File_write_all_end

2016-07-13 Thread Gilles Gouaillardet

Eric,


OpenMPI 2.0.0 has been released, so the fix should land in the v2.x 
branch shortly.


If i understand correctly, your script downloads/compiles OpenMPI and then 
downloads/compiles PETSc.


If this is correct then, for the time being, feel free to patch Open MPI v2.x 
before compiling it; the fix can be downloaded at 
https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1263.patch
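
(A sketch of one way to apply it before building; the checkout directory name is
just an example:)

$ cd ompi-release            # the v2.x git checkout
$ wget https://patch-diff.githubusercontent.com/raw/open-mpi/ompi-release/pull/1263.patch
$ git apply 1263.patch       # or: patch -p1 < 1263.patch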



Cheers,


Gilles


On 7/14/2016 3:37 AM, Eric Chamberland wrote:

Hi Howard,

ok, I will wait for 2.0.1rcX... ;)

I've put in place a script to download/compile OpenMPI+PETSc(3.7.2) 
and our code from the git repos.


Now I am in a somewhat uncomfortable situation where neither the 
ompi-release.git or ompi.git repos are working for me.


The first gives me the errors with MPI_File_write_all_end I reported, 
but the former gives me errors like these:


[lorien:106919] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in 
file ess_singleton_module.c at line 167

*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***and potentially your MPI job)
[lorien:106919] Local abort before MPI_INIT completed completed 
successfully, but am not able to aggregate error messages, and not 
able to guarantee that all other processes were killed!


So, for my continuous integration of OpenMPI I am in a no man's 
land... :(


Thanks anyway for the follow-up!

Eric

On 13/07/16 07:49 AM, Howard Pritchard wrote:

Hi Eric,

Thanks very much for finding this problem.   We decided in order to have
a reasonably timely
release, that we'd triage issues and turn around a new RC if something
drastic
appeared.  We want to fix this issue (and it will be fixed), but we've
decided to
defer the fix for this issue to a 2.0.1 bug fix release.

Howard



2016-07-12 13:51 GMT-06:00 Eric Chamberland
mailto:eric.chamberl...@giref.ulaval.ca>>:

Hi Edgard,

I just saw that your patch got into ompi/master... any chances it
goes into ompi-release/v2.x before rc5?

thanks,

Eric


On 08/07/16 03:14 PM, Edgar Gabriel wrote:

I think I found the problem, I filed a pr towards master, and if
that
passes I will file a pr for the 2.x branch.

Thanks!
Edgar


On 7/8/2016 1:14 PM, Eric Chamberland wrote:


On 08/07/16 01:44 PM, Edgar Gabriel wrote:

ok, but just to be able to construct a test case,
basically what you are
doing is

MPI_File_write_all_begin (fh, NULL, 0, some datatype);

MPI_File_write_all_end (fh, NULL, &status),

is this correct?

Yes, but with 2 processes:

rank 0 writes something, but not rank 1...

other info: rank 0 didn't wait for rank1 after
MPI_File_write_all_end so
it continued to the next MPI_File_write_all_begin with a
different
datatype but on the same file...

thanks!

Eric
___
devel mailing list
de...@open-mpi.org 
Subscription:
https://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2016/07/19173.php


___
devel mailing list
de...@open-mpi.org 
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post:
http://www.open-mpi.org/community/lists/devel/2016/07/19192.php



___
devel mailing list
de...@open-mpi.org
Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: 
http://www.open-mpi.org/community/lists/devel/2016/07/19201.php






Re: [OMPI devel] SHMEM, "mpp/shmem.fh", CMake and infinite loops

2016-07-13 Thread Gilles Gouaillardet

Paul,


thanks for testing the workaround

/* i was on a trip and could not do it myself */


At first glance, i agree with Jeff and the root cause seems to be a 
CMake bug.


/* i cannot find any rationale for automatically including some 
directories that were not requested by the user */



note that even if you could limit the recursion depth with an 
appropriate CMake option, that would stop the infinite loop,


but dependencies would be incorrect (e.g. shmem.fh from the upper 
directory would be missing)



That being said, i am more pragmatic than dogmatic, so i am fine 
updating OpenMPI so it avoids a CMake bug,


let's follow up at https://github.com/open-mpi/ompi/issues/1868


Cheers,


Gilles


On 7/13/2016 8:30 PM, Paul Kapinos wrote:

Hi Gilles,

On 07/13/16 01:10, Gilles Gouaillardet wrote:

Paul,

The two header files in include/mpp simply include the file with the 
same name

in the upper directory.


Yessir!
(and CMake do not care about the upper directory and build infinite loop)


A simple workaround is to replace these two files in include/mpp with 
symbolic

links to files with the same name in the upper directory.

Would you mind giving this a try ?


It work very well, at least for the one test case provided. So yes, 
patching any installation of Open MPI could be a workaround. However 
we would really love to avoid this need to patch any Open MPI 
installation


Maybe OpenMPI's developer could think about how-to minimize the 
probability of such loops? Symlink is one alternative, another one 
would be renaming one of the headers..

we fully trust to Open MPI's developers expertise in this :-)

Have a nice day,

Paul Kapinos


pk224850@linuxc2:/opt/MPI/openmpi-1.8.1/linux/intel/include[519]$ ls 
-la mpp/shmem.fh
lrwxrwxrwx 1 pk224850 pk224850 11 Jul 13 13:20 mpp/shmem.fh -> 
../shmem.fh




Cheers,

Gilles

On Wednesday, July 13, 2016, Paul Kapinos <kapi...@itc.rwth-aachen.de> wrote:

Dear OpenMPI developer,

we have some troubles when using OpenMPI and CMake on codes using 
'SHMEM'.


Cf. 'man shmem_swap',
 >   Fortran:
 >   INCLUDE "mpp/shmem.fh"

Yes here is one such header file:
 > openmpi-1.X.Y/oshmem/include/mpp/shmem.fh
... since version 1.7. at least.


The significant content is this line:
 >  include 'shmem.fh'
whereby OpenMPI mean to include not the same file by itself (= 
infinite

loop!) but I believe these one file:
 > openmpi-1.X.Y/oshmem/include/shmem.fh

(The above paths are in the source code distributions; in the 
installation

the files are located here:  include/shmem.fh include/mpp/shmem.fh)


This works. Unless you start using CMake. Because CMake is 
'intelligent' and
try to add the search paths recursively, (I believe,) gloriously 
enabling
the infinite loop by including the 'shmem.fh' file from the 
'shmem.fh' file.


Steps to reproduce:
$ mkdir build; cd build; cmake ..
$ make

The second one command need some minute(s), sticking by the 
'Scanning

dependencies of target mpihelloworld' step.

If connecting by 'strace -p ' to the 'cmake' process you 
will see lines
like below, again and again. So I think CMake just include the 
'shmem.fh'
file from itself unless the stack is full / a limit is reached / 
the moon
shines, and thus hangs for a while (seconds/minutes) in the 
'Scanning

dependencies...' state.

*Well, maybe having a file including the same file is not that 
good?*

If the file 'include/mpp/shmem.fh' would include not 'shmem.fh' but
'somethingelse.fh' located in 'include/...' these infinite loop 
would be

impossible at all...

And by the way: is here a way to limit the maximum include depths 
in CMake
for header files? This would workaround this one 'infinite 
include loop'

issue...

Have a nice day,

Paul Kapinos

..
access("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh", 
R_OK)

= 0
stat("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh",
{st_mode=S_IFREG|0644, st_size=205, ...}) = 0
open("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh",
O_RDONLY) = 5271
fstat(5271, {st_mode=S_IFREG|0644, st_size=205, ...}) = 0
mmap(NULL, 32768, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =

0x7f08457d2000
read(5271, "!\n!   Copyright (c) 2013  Me"..., 32768) = 205
read(5271, "", 32768)   = 0

access("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh", 
R_OK)

= 0
stat("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh",
{st_mode=S_IFREG|0644, st_size=205, ...}) = 0
open("/opt/MPI/openmpi-1.10.2/linux/intel_16.0.2.181/include/mpp/shmem.fh",
O_RDONLY) = 5272
fstat(5272, {st_mode=S_IFREG|0644, st_size=205, ...}) = 0
mmap(NULL, 32768, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =

0x7f

Re: [OMPI devel] 2.0.0rc4 Crash in MPI_File_write_all_end

2016-07-13 Thread Ralph Castain
Fixed on master

> On Jul 13, 2016, at 12:47 PM, Jeff Squyres (jsquyres)  
> wrote:
> 
> I literally just noticed that this morning (that singleton was broken on 
> master), but hadn't gotten to bisecting / reporting it yet...
> 
> I also haven't tested 2.0.0.  I really hope singletons aren't broken then...
> 
> /me goes to test 2.0.0...
> 
> Whew -- 2.0.0 singletons are fine.  :-)
> 
> 
>> On Jul 13, 2016, at 3:01 PM, Ralph Castain  wrote:
>> 
>> Hmmm…I see where the singleton on master might be broken - will check later 
>> today
>> 
>>> On Jul 13, 2016, at 11:37 AM, Eric Chamberland 
>>>  wrote:
>>> 
>>> Hi Howard,
>>> 
>>> ok, I will wait for 2.0.1rcX... ;)
>>> 
>>> I've put in place a script to download/compile OpenMPI+PETSc(3.7.2) and our 
>>> code from the git repos.
>>> 
>>> Now I am in a somewhat uncomfortable situation where neither the 
>>> ompi-release.git or ompi.git repos are working for me.
>>> 
>>> The first gives me the errors with MPI_File_write_all_end I reported, but 
>>> the former gives me errors like these:
>>> 
>>> [lorien:106919] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file 
>>> ess_singleton_module.c at line 167
>>> *** An error occurred in MPI_Init_thread
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> ***and potentially your MPI job)
>>> [lorien:106919] Local abort before MPI_INIT completed completed 
>>> successfully, but am not able to aggregate error messages, and not able to 
>>> guarantee that all other processes were killed!
>>> 
>>> So, for my continuous integration of OpenMPI I am in a no man's land... :(
>>> 
>>> Thanks anyway for the follow-up!
>>> 
>>> Eric
>>> 
>>> On 13/07/16 07:49 AM, Howard Pritchard wrote:
 Hi Eric,
 
 Thanks very much for finding this problem.   We decided in order to have
 a reasonably timely
 release, that we'd triage issues and turn around a new RC if something
 drastic
 appeared.  We want to fix this issue (and it will be fixed), but we've
 decided to
 defer the fix for this issue to a 2.0.1 bug fix release.
 
 Howard
 
 
 
 2016-07-12 13:51 GMT-06:00 Eric Chamberland
 >>> >:
 
  Hi Edgard,
 
  I just saw that your patch got into ompi/master... any chances it
  goes into ompi-release/v2.x before rc5?
 
  thanks,
 
  Eric
 
 
  On 08/07/16 03:14 PM, Edgar Gabriel wrote:
 
  I think I found the problem, I filed a pr towards master, and if
  that
  passes I will file a pr for the 2.x branch.
 
  Thanks!
  Edgar
 
 
  On 7/8/2016 1:14 PM, Eric Chamberland wrote:
 
 
  On 08/07/16 01:44 PM, Edgar Gabriel wrote:
 
  ok, but just to be able to construct a test case,
  basically what you are
  doing is
 
  MPI_File_write_all_begin (fh, NULL, 0, some datatype);
 
  MPI_File_write_all_end (fh, NULL, &status),
 
  is this correct?
 
  Yes, but with 2 processes:
 
  rank 0 writes something, but not rank 1...
 
  other info: rank 0 didn't wait for rank1 after
  MPI_File_write_all_end so
  it continued to the next MPI_File_write_all_begin with a
  different
  datatype but on the same file...
 
  thanks!
 
  Eric
  ___
  devel mailing list
  de...@open-mpi.org 
  Subscription:
  https://www.open-mpi.org/mailman/listinfo.cgi/devel
  Link to this post:
  http://www.open-mpi.org/community/lists/devel/2016/07/19173.php
 
 
  ___
  devel mailing list
  de...@open-mpi.org 
  Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
  Link to this post:
  http://www.open-mpi.org/community/lists/devel/2016/07/19192.php
 
 
>>> ___
>>> devel mailing list
>>> de...@open-mpi.org
>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post: 
>>> http://www.open-mpi.org/community/lists/devel/2016/07/19201.php
>> 
>> ___
>> devel mailing list
>> de...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/devel/2016/07/19202.php
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> ___
> devel mailing lis