Re: [OMPI devel] failure with zero-length Reduce() and both sbuf=rbuf=NULL

2010-02-09 Thread Lisandro Dalcín
BUMP. See http://code.google.com/p/mpi4py/issues/detail?id=14


On 12 December 2009 00:31, Lisandro Dalcin  wrote:
> On Thu, Dec 10, 2009 at 4:26 PM, George Bosilca  wrote:
>> Lisandro,
>>
>> This code is not correct from the MPI standard's perspective. The reason is 
>> independent of the datatype or count; it is solely that MPI_Reduce cannot 
>> accept a sendbuf equal to the recvbuf (one has to use MPI_IN_PLACE instead).
>>
>
> George, I have to disagree. Zero-length buffers are a very special
> case, and the MPI standard is not very explicit about this edge case. Try
> the code pasted at the end.
>
> 1) In Open MPI, the only one of these calls failing for sbuf=rbuf=NULL is 
> MPI_Reduce().
>
> 2) For reference, all of the calls succeed in MPICH2.
>
>
>
> #include <stdio.h>
> #include <mpi.h>
>
> int main( int argc, char ** argv ) {
>  int ierr;
>  MPI_Init(&argc, &argv);
>  ierr = MPI_Scan(
>                  NULL, NULL,
>                  0,
>                  MPI_INT,
>                  MPI_SUM,
>                  MPI_COMM_WORLD);
>  ierr = MPI_Exscan(
>                    NULL, NULL,
>                    0,
>                    MPI_INT,
>                    MPI_SUM,
>                    MPI_COMM_WORLD);
>  ierr = MPI_Allreduce(
>                       NULL, NULL,
>                       0,
>                       MPI_INT,
>                       MPI_SUM,
>                       MPI_COMM_WORLD);
> #if 1
>  ierr = MPI_Reduce(
>                    NULL, NULL,
>                    0,
>                    MPI_INT,
>                    MPI_SUM,
>                    0,
>                    MPI_COMM_WORLD);
> #endif
>  MPI_Finalize();
>  return 0;
> }
>
>
>
> --
> Lisandro Dalcín
> ---
> Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
> Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
> PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> Tel/Fax: +54-(0)342-451.1594
>



-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
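
For reference, here is a minimal sketch (not part of the original report; variable
names are illustrative) of the in-place form George refers to, where the root passes
MPI_IN_PLACE instead of a send buffer equal to the receive buffer:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
  int rank, value = 1, result;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0) {
    /* The root's contribution already sits in the receive buffer. */
    result = value;
    MPI_Reduce(MPI_IN_PLACE, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    printf("sum = %d\n", result);
  } else {
    /* Non-root ranks ignore the receive buffer, so NULL is acceptable here. */
    MPI_Reduce(&value, NULL, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
  }
  MPI_Finalize();
  return 0;
}

With a zero count, the open question in this thread is whether such buffer
restrictions should apply at all.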



Re: [OMPI devel] RFC: s/ENABLE_MPI_THREADS/ENABLE_THREAD_SAFETY/g

2010-02-09 Thread Ralph Castain
I'm sure someone will object to a name, but the logic looks fine to me.


On Feb 9, 2010, at 6:35 AM, Jeff Squyres wrote:

> On Feb 9, 2010, at 4:34 AM, Ralph Castain wrote:
> 
>>> While we're at it, why not call the option giving MPI_THREAD_MULTIPLE 
>>> support --enable-thread-multiple ?
>> 
>> Makes sense to me. I agree with Brian that we need three options here.
> 
> Ok, how about these:
> 
>  --enable-opal-progress-threads: enables progress thread machinery in opal
> 
>  --enable-opal-multi-thread: enables multi-threaded machinery in opal
>    or perhaps --enable-opal-threads?
> 
>  --enable-mpi-thread-multiple: enables the use of MPI_THREAD_MULTIPLE; 
> affects only the MPI layer
>directly implies --enable-opal-multi-thread
> 
>  Deprecated options
>  --enable-mpi-threads: deprecated synonym for --enable-mpi-thread-multiple
>  --enable-progress-threads: deprecated synonym for 
> --enable-opal-progress-threads
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 




Re: [OMPI devel] [patch] return value not updated in ompi_mpi_init()

2010-02-09 Thread Ralph Castain
Oops - yep, that is an oversight! Will fix - thanks!

On Feb 9, 2010, at 7:13 AM, Guillaume Thouvenin wrote:

> Hello,
> 
> It seems that a return value is not updated during the setup of
> process affinity in ompi_mpi_init() (ompi/runtime/ompi_mpi_init.c:459).
> 
> The problem is in the following piece of code:
> 
>    [... here ret == OPAL_SUCCESS ...]
>    phys_cpu = opal_paffinity_base_get_physical_processor_id(nrank);
>    if (0 > phys_cpu) {
>        error = "Could not get physical processor id - cannot set processor affinity";
>        goto error;
>    }
>    [...]
> 
> If opal_paffinity_base_get_physical_processor_id() fails, ret is not
> updated and we reach the "error:" label while ret == OPAL_SUCCESS.
>
> As a result, MPI_Init() will return without having initialized the
> MPI_COMM_WORLD struct, leading to a segmentation fault on calls like
> MPI_Comm_size().
> 
> I hit the bug recently on new Westmere processors, where
> opal_paffinity_base_get_physical_processor_id() fails when the MCA
> parameter "opal_paffinity_alone 1" is used during execution.
>
> I'm not sure it's the right way to fix the problem, but here is a
> patch tested with v1.5. It reports the problem instead of generating
> a segmentation fault.
> 
> With the patch, the output is:
> 
> --
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> 
>  Could not get physical processor id - cannot set processor affinity
>  --> Returned "Not found" (-5) instead of "Success" (0)
> --
> 
> Without the patch, the output was:
> 
> *** Process received signal ***
> Signal: Segmentation fault (11)
> Signal code: Address not mapped (1)
> Failing at address: 0x10
> [ 0] /lib64/libpthread.so.0 [0x3d4e20ee90]
> [ 1] /home_nfs/thouveng/dev/openmpi-v1.5/lib/libmpi.so.0(MPI_Comm_size+0x9c) 
> [0x7fce74468dfc]
> [ 2] ./IMB-MPI1(IMB_init_pointers+0x2f) [0x40629f]
> [ 3] ./IMB-MPI1(main+0x65) [0x4035c5]
> [ 4] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3d4da1ea2d]
> [ 5] ./IMB-MPI1 [0x403499]
> 
> 
> Regards,
> Guillaume
> 
> ---
> diff --git a/ompi/runtime/ompi_mpi_init.c b/ompi/runtime/ompi_mpi_init.c
> --- a/ompi/runtime/ompi_mpi_init.c
> +++ b/ompi/runtime/ompi_mpi_init.c
> @@ -459,6 +459,7 @@ int ompi_mpi_init(int argc, char **argv,
>  OPAL_PAFFINITY_CPU_ZERO(mask);
>  phys_cpu = opal_paffinity_base_get_physical_processor_id(nrank);
>  if (0 > phys_cpu) {
> +    ret = phys_cpu;
>      error = "Could not get physical processor id - cannot set processor affinity";
>      goto error;
>  }




[OMPI devel] [patch] return value not updated in ompi_mpi_init()

2010-02-09 Thread Guillaume Thouvenin
Hello,

 It seems that a return value is not updated during the setup of
process affinity in ompi_mpi_init() (ompi/runtime/ompi_mpi_init.c:459).

 The problem is in the following piece of code:

    [... here ret == OPAL_SUCCESS ...]
    phys_cpu = opal_paffinity_base_get_physical_processor_id(nrank);
    if (0 > phys_cpu) {
        error = "Could not get physical processor id - cannot set processor affinity";
        goto error;
    }
    [...]

 If opal_paffinity_base_get_physical_processor_id() fails, ret is not
updated and we reach the "error:" label while ret == OPAL_SUCCESS.

 As a result, MPI_Init() will return without having initialized the
MPI_COMM_WORLD struct, leading to a segmentation fault on calls like
MPI_Comm_size().

 I hit the bug recently on new Westmere processors, where
opal_paffinity_base_get_physical_processor_id() fails when the MCA
parameter "opal_paffinity_alone 1" is used during execution.

 I'm not sure it's the right way to fix the problem, but here is a
patch tested with v1.5. It reports the problem instead of generating
a segmentation fault.

With the patch, the output is:

--
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  Could not get physical processor id - cannot set processor affinity
  --> Returned "Not found" (-5) instead of "Success" (0)
--

Without the patch, the output was:

 *** Process received signal ***
 Signal: Segmentation fault (11)
 Signal code: Address not mapped (1)
 Failing at address: 0x10
[ 0] /lib64/libpthread.so.0 [0x3d4e20ee90]
[ 1] /home_nfs/thouveng/dev/openmpi-v1.5/lib/libmpi.so.0(MPI_Comm_size+0x9c) 
[0x7fce74468dfc]
[ 2] ./IMB-MPI1(IMB_init_pointers+0x2f) [0x40629f]
[ 3] ./IMB-MPI1(main+0x65) [0x4035c5]
[ 4] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3d4da1ea2d]
[ 5] ./IMB-MPI1 [0x403499]


Regards,
Guillaume

---
diff --git a/ompi/runtime/ompi_mpi_init.c b/ompi/runtime/ompi_mpi_init.c
--- a/ompi/runtime/ompi_mpi_init.c
+++ b/ompi/runtime/ompi_mpi_init.c
@@ -459,6 +459,7 @@ int ompi_mpi_init(int argc, char **argv,
 OPAL_PAFFINITY_CPU_ZERO(mask);
 phys_cpu = opal_paffinity_base_get_physical_processor_id(nrank);
 if (0 > phys_cpu) {
+    ret = phys_cpu;
     error = "Could not get physical processor id - cannot set processor affinity";
     goto error;
 }
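
As a standalone illustration of the failure mode described above, here is a minimal
sketch (hypothetical names and stubs, not Open MPI code) showing why ret must be set
before jumping to the error label:

#include <stdio.h>

#define OPAL_SUCCESS 0
#define OPAL_ERR_NOT_FOUND (-5)

/* Stand-in for opal_paffinity_base_get_physical_processor_id() failing. */
static int get_physical_processor_id(void) { return OPAL_ERR_NOT_FOUND; }

static int fake_init(const char **error_msg) {
    int ret = OPAL_SUCCESS;
    int phys_cpu = get_physical_processor_id();
    if (0 > phys_cpu) {
        ret = phys_cpu;  /* without this line (the one-line patch), the error path returns OPAL_SUCCESS */
        *error_msg = "Could not get physical processor id";
        goto error;
    }
    return OPAL_SUCCESS;
error:
    return ret;  /* the caller only sees the failure if ret was updated */
}

int main(void) {
    const char *msg = "";
    int ret = fake_init(&msg);
    if (ret != OPAL_SUCCESS)
        printf("init failed: %s (%d)\n", msg, ret);
    return 0;
}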


Re: [OMPI devel] RFC: s/ENABLE_MPI_THREADS/ENABLE_THREAD_SAFETY/g

2010-02-09 Thread Jeff Squyres
On Feb 9, 2010, at 4:34 AM, Ralph Castain wrote:

> > While we're at it, why not call the option giving MPI_THREAD_MULTIPLE 
> > support --enable-thread-multiple ?
> 
> Makes sense to me. I agree with Brian that we need three options here.

Ok, how about these:

  --enable-opal-progress-threads: enables progress thread machinery in opal

  --enable-opal-multi-thread: enables multi-threaded machinery in opal
    or perhaps --enable-opal-threads?

  --enable-mpi-thread-multiple: enables the use of MPI_THREAD_MULTIPLE; affects 
only the MPI layer
directly implies --enable-opal-multi-thread

  Deprecated options
  --enable-mpi-threads: deprecated synonym for --enable-mpi-thread-multiple
  --enable-progress-threads: deprecated synonym for 
--enable-opal-progress-threads

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
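
For context on what --enable-mpi-thread-multiple would gate, here is a minimal sketch
(illustrative only, not tied to any particular Open MPI build) of an application
requesting MPI_THREAD_MULTIPLE at startup; the level actually provided depends on how
the library was configured:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
  int provided;
  /* Ask for full multi-threaded access; the library reports what it supports. */
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
  if (provided < MPI_THREAD_MULTIPLE)
    printf("MPI_THREAD_MULTIPLE not available (provided = %d)\n", provided);
  MPI_Finalize();
  return 0;
}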




Re: [OMPI devel] RFC: s/ENABLE_MPI_THREADS/ENABLE_THREAD_SAFETY/g

2010-02-09 Thread Ralph Castain

On Feb 9, 2010, at 1:44 AM, Sylvain Jeaugey wrote:

> While we're at it, why not call the option giving MPI_THREAD_MULTIPLE support 
> --enable-thread-multiple ?

Makes sense to me. I agree with Brian that we need three options here.

> 
> About ORTE and OPAL, if you have --enable-thread-multiple=yes, it may force 
> the usage of --enable-thread-safety to configure OPAL and/or ORTE.

It definitely will, but I don't see that as an issue.

> 
> I know there are other projects using ORTE and OPAL, but the vast majority of 
> users are still using OMPI and were already confused by --enable-mpi-threads. 
> Switching to --enable-multi-threads or --enable-thread-safety will surely 
> confuse them one more time.
> 

Just to clarify: this actually isn't about other projects. Jeff misspoke, IMO. 
The problem is within OMPI itself: it may be necessary/advantageous for ORTE to have 
threads for proper mpirun and orted operation even though the application processes 
don't use them.

Ralph


> Sylvain
> 
> On Mon, 8 Feb 2010, Barrett, Brian W wrote:
> 
>> Well, does --disable-multi-threads disable progress threads?  And do you 
>> want to disable thread support in ORTE because you don't want 
>> MPI_THREAD_MULTIPLE?  Perhaps a third option is a rational way to go?
>> 
>> Brian
>> 
>> On Feb 8, 2010, at 6:54 PM, Jeff Squyres wrote:
>> 
>>> How about
>>> 
>>> --enable-mpi-threads  ==>  --enable-multi-threads
>>>   ENABLE_MPI_THREADS  ==>  ENABLE_MULTI_THREADS
>>> 
>>> Essentially, s/mpi/multi/ig.  This gives us "progress thread" support and 
>>> "multi thread" support.  Similar, but different.
>>> 
>>> Another possibility instead of "mpi" could be "concurrent".
>>> 
>>> 
>>> 
>>> On Jan 28, 2010, at 9:24 PM, Barrett, Brian W wrote:
>>> 
 Jeff -
 
 I think the idea is ok, but I think the name needs some thought.  There are 
 currently two ways to have the lower layers be thread-safe -- enabling MPI 
 threads or progress threads.  The two can be done independently -- you can 
 disable MPI threads and still enable thread support by enabling progress 
 threads.
 
 So either that behavior would need to change or we need a better name (in 
 my opinion...).
 
 Brian
 
 On Jan 28, 2010, at 8:53 PM, Jeff Squyres wrote:
 
> WHAT: Rename --enable-mpi-threads and ENABLE_MPI_THREADS to 
> --enable-thread-safety and ENABLE_THREAD_SAFETY, respectively 
> (--enable-mpi-threads will be maintained as a synonym to 
> --enable-thread-safety for backwards compat, of course).
> 
> WHY: Other projects are starting to use ORTE and OPAL without OMPI.  The 
> fact that thread safety in OPAL and ORTE requires a configure switch with 
> "mpi" in the name is very non-intuitive.
> 
> WHERE: A bunch of places in the code; see attached patch.
> 
> WHEN: Next Friday COB
> 
> TIMEOUT: COB, Friday, Feb 5, 2010
> 
> 
> 
> More details:
> 
> Cisco is starting to investigate using ORTE and OPAL in various threading 
> scenarios -- without the OMPI layer.  The fact that you need to enable 
> thread safety in ORTE/OPAL with a configure switch that has the word 
> "mpi" in it is extremely counter-intuitive (it bit some of our engineers 
> very badly, and they were mighty annoyed!!).
> 
> Since this functionality actually has nothing to do with MPI (it's 
> actually the other way around -- MPI_THREAD_MULTIPLE needs this 
> functionality), we really should rename this switch and the corresponding 
> AC_DEFINE -- I suggest:
> 
> --enable|disable-thread-safety
> ENABLE_THREAD_SAFETY
> 
> Of course, we need to keep the configure switch 
> "--enable|disable-mpi-threads" for backwards compatibility, so that can 
> be a synonym to --enable-thread-safety.
> 
> See the attached patch (the biggest change is in 
> opal/config/opal_config_threads.m4).  If there are no objections, I'll 
> commit this next Friday COB.
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
 
 --
 Brian W. Barrett
 Dept. 1423: Scalable System Software
 Sandia National Laboratories
 
 
 
 
 
 
>>> 
>>> 
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> 
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>> 
>>> 
>>> 
>> 
>> --
>> Brian W. Barrett
>> Dept. 1423: Scalable System Software
>> Sandia National Laboratories

Re: [OMPI devel] RFC: s/ENABLE_MPI_THREADS/ENABLE_THREAD_SAFETY/g

2010-02-09 Thread Sylvain Jeaugey
While we're at it, why not call the option giving MPI_THREAD_MULTIPLE 
support --enable-thread-multiple?


About ORTE and OPAL: if --enable-thread-multiple=yes is given, it may 
force --enable-thread-safety when configuring OPAL and/or ORTE.


I know there are other projects using ORTE and OPAL, but the vast majority 
of users are still using OMPI and were already confused by 
--enable-mpi-threads. Switching to --enable-multi-threads or 
--enable-thread-safety will surely confuse them one more time.


Sylvain

On Mon, 8 Feb 2010, Barrett, Brian W wrote:


Well, does --disable-multi-threads disable progress threads?  And do you want 
to disable thread support in ORTE because you don't want MPI_THREAD_MULTIPLE?  
Perhaps a third option is a rational way to go?

Brian

On Feb 8, 2010, at 6:54 PM, Jeff Squyres wrote:


How about

 --enable-mpi-threads  ==>  --enable-multi-threads
    ENABLE_MPI_THREADS  ==>  ENABLE_MULTI_THREADS

Essentially, s/mpi/multi/ig.  This gives us "progress thread" support and "multi 
thread" support.  Similar, but different.

Another possibility instead of "mpi" could be "concurrent".



On Jan 28, 2010, at 9:24 PM, Barrett, Brian W wrote:


Jeff -

I think the idea is ok, but I think the name needs some thought.  There are 
currently two ways to have the lower layers be thread-safe -- enabling MPI 
threads or progress threads.  The two can be done independently -- you can 
disable MPI threads and still enable thread support by enabling progress 
threads.

So either that behavior would need to change or we need a better name (in my 
opinion...).

Brian

On Jan 28, 2010, at 8:53 PM, Jeff Squyres wrote:


WHAT: Rename --enable-mpi-threads and ENABLE_MPI_THREADS to 
--enable-thread-safety and ENABLE_THREAD_SAFETY, respectively 
(--enable-mpi-threads will be maintained as a synonym to --enable-thread-safety 
for backwards compat, of course).

WHY: Other projects are starting to use ORTE and OPAL without OMPI.  The fact that thread 
safety in OPAL and ORTE requires a configure switch with "mpi" in the name is 
very non-intuitive.

WHERE: A bunch of places in the code; see attached patch.

WHEN: Next Friday COB

TIMEOUT: COB, Friday, Feb 5, 2010



More details:

Cisco is starting to investigate using ORTE and OPAL in various threading scenarios -- 
without the OMPI layer.  The fact that you need to enable thread safety in ORTE/OPAL with 
a configure switch that has the word "mpi" in it is extremely counter-intuitive 
(it bit some of our engineers very badly, and they were mighty annoyed!!).

Since this functionality actually has nothing to do with MPI (it's actually the 
other way around -- MPI_THREAD_MULTIPLE needs this functionality), we really 
should rename this switch and the corresponding AC_DEFINE -- I suggest:

--enable|disable-thread-safety
ENABLE_THREAD_SAFETY

Of course, we need to keep the configure switch "--enable|disable-mpi-threads" 
for backwards compatibility, so that can be a synonym to --enable-thread-safety.

See the attached patch (the biggest change is in 
opal/config/opal_config_threads.m4).  If there are no objections, I'll commit 
this next Friday COB.

--
Jeff Squyres
jsquy...@cisco.com


--
 Brian W. Barrett
 Dept. 1423: Scalable System Software
 Sandia National Laboratories









--
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/





--
 Brian W. Barrett
 Dept. 1423: Scalable System Software
 Sandia National Laboratories




