Re: [OMPI devel] failure with zero-length Reduce() and both sbuf=rbuf=NULL
BUMP. See http://code.google.com/p/mpi4py/issues/detail?id=14

On 12 December 2009 00:31, Lisandro Dalcin wrote:
> On Thu, Dec 10, 2009 at 4:26 PM, George Bosilca wrote:
>> Lisandro,
>>
>> This code is not correct from the MPI standard perspective. The reason is
>> independent of the datatype or count; it is solely related to the fact that
>> MPI_Reduce cannot accept a sendbuf equal to the recvbuf (one has to
>> use MPI_IN_PLACE instead).
>
> George, I have to disagree. Zero-length buffers are a very special
> case, and the MPI standard is not very explicit about this limit case. Try
> the code pasted at the end.
>
> 1) In Open MPI, the only one of these calls failing for sbuf=rbuf=NULL is
> MPI_Reduce().
>
> 2) For reference, all of the calls succeed in MPICH2.
>
> #include <stdio.h>
> #include <mpi.h>
>
> int main( int argc, char ** argv ) {
>     int ierr;
>     MPI_Init(&argc, &argv);
>     ierr = MPI_Scan(NULL, NULL, 0, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
>     ierr = MPI_Exscan(NULL, NULL, 0, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
>     ierr = MPI_Allreduce(NULL, NULL, 0, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
> #if 1
>     ierr = MPI_Reduce(NULL, NULL, 0, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
> #endif
>     MPI_Finalize();
>     return 0;
> }
>
> --
> Lisandro Dalcín
> ---
> Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
> Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
> PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> Tel/Fax: +54-(0)342-451.1594

--
Lisandro Dalcín
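The disagreement above comes down to where the sendbuf/recvbuf aliasing check should apply when count == 0. A minimal pure-C sketch of the caller-visible rule Lisandro argues for (all names are hypothetical stand-ins, not Open MPI's actual validation code; FAKE_MPI_IN_PLACE merely imitates the MPI_IN_PLACE sentinel):

```c
#include <stddef.h>

/* Hypothetical stand-in for the MPI_IN_PLACE sentinel: a unique
 * address that can never alias a real user buffer. */
static int in_place_sentinel;
#define FAKE_MPI_IN_PLACE ((const void *) &in_place_sentinel)

/* Sketch of the argument check under debate: reject aliased buffers,
 * but let the zero-count NULL/NULL case through (per the MPICH2
 * behavior reported above), since no data is ever touched. */
int reduce_args_ok(const void *sendbuf, void *recvbuf, int count)
{
    if (count == 0)
        return 1;                  /* nothing to reduce; buffers unused */
    if (sendbuf == FAKE_MPI_IN_PLACE)
        return 1;                  /* explicit in-place reduction */
    return sendbuf != recvbuf;     /* otherwise aliasing is an error */
}
```

Under this rule the zero-length NULL/NULL call is accepted, matching the MPICH2 behavior reported above, while aliased non-empty buffers are still rejected.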
Re: [OMPI devel] RFC: s/ENABLE_MPI_THREADS/ENABLE_THREAD_SAFETY/g
I'm sure someone will object to a name, but the logic looks fine to me.

On Feb 9, 2010, at 6:35 AM, Jeff Squyres wrote:
> On Feb 9, 2010, at 4:34 AM, Ralph Castain wrote:
>>> While we're at it, why not call the option giving MPI_THREAD_MULTIPLE
>>> support --enable-thread-multiple ?
>>
>> Makes sense to me. I agree with Brian that we need three options here.
>
> Ok, how about these:
>
> --enable-opal-progress-threads: enables the progress thread machinery in opal
>
> --enable-opal-multi-thread: enables the multi-threaded machinery in opal
>     (or perhaps --enable-opal-threads?)
>
> --enable-mpi-thread-multiple: enables the use of MPI_THREAD_MULTIPLE;
>     affects only the MPI layer; directly implies --enable-opal-multi-thread
>
> Deprecated options:
> --enable-mpi-threads: deprecated synonym for --enable-mpi-thread-multiple
> --enable-progress-threads: deprecated synonym for --enable-opal-progress-threads
>
> --
> Jeff Squyres
> jsquy...@cisco.com
>
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] [patch] return value not updated in ompi_mpi_init()
Oops - yep, that is an oversight! Will fix - thanks!

On Feb 9, 2010, at 7:13 AM, Guillaume Thouvenin wrote:
> Hello,
>
> It seems that a return value is not updated during the setup of process
> affinity in ompi_mpi_init() (ompi/runtime/ompi_mpi_init.c:459).
>
> The problem is in the following piece of code:
>
>     [... here ret == OPAL_SUCCESS ...]
>     phys_cpu = opal_paffinity_base_get_physical_processor_id(nrank);
>     if (0 > phys_cpu) {
>         error = "Could not get physical processor id - cannot set processor affinity";
>         goto error;
>     }
>     [...]
>
> If opal_paffinity_base_get_physical_processor_id() fails, ret is not
> updated, and we reach the "error:" label while ret == OPAL_SUCCESS.
> As a result, MPI_Init() returns without having initialized the
> MPI_COMM_WORLD struct, leading to a segmentation fault on calls like
> MPI_Comm_size().
>
> I hit the bug recently on new Westmere processors, for which
> opal_paffinity_base_get_physical_processor_id() fails when the MCA
> parameter "opal_paffinity_alone 1" is used.
>
> I'm not sure this is the right way to fix the problem, but here is a
> patch, tested with v1.5, that reports the problem instead of generating
> a segmentation fault.
>
> With the patch, the output is:
>
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> Could not get physical processor id - cannot set processor affinity
> --> Returned "Not found" (-5) instead of "Success" (0)
> --------------------------------------------------------------------------
>
> Without the patch, the output was:
>
> *** Process received signal ***
> Signal: Segmentation fault (11)
> Signal code: Address not mapped (1)
> Failing at address: 0x10
> [ 0] /lib64/libpthread.so.0 [0x3d4e20ee90]
> [ 1] /home_nfs/thouveng/dev/openmpi-v1.5/lib/libmpi.so.0(MPI_Comm_size+0x9c) [0x7fce74468dfc]
> [ 2] ./IMB-MPI1(IMB_init_pointers+0x2f) [0x40629f]
> [ 3] ./IMB-MPI1(main+0x65) [0x4035c5]
> [ 4] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3d4da1ea2d]
> [ 5] ./IMB-MPI1 [0x403499]
>
> Regards,
> Guillaume
>
> ---
> diff --git a/ompi/runtime/ompi_mpi_init.c b/ompi/runtime/ompi_mpi_init.c
> --- a/ompi/runtime/ompi_mpi_init.c
> +++ b/ompi/runtime/ompi_mpi_init.c
> @@ -459,6 +459,7 @@ int ompi_mpi_init(int argc, char **argv,
>          OPAL_PAFFINITY_CPU_ZERO(mask);
>          phys_cpu = opal_paffinity_base_get_physical_processor_id(nrank);
>          if (0 > phys_cpu) {
> +            ret = phys_cpu;
>              error = "Could not get physical processor id - cannot set processor affinity";
>              goto error;
>          }
[OMPI devel] [patch] return value not updated in ompi_mpi_init()
Hello,

It seems that a return value is not updated during the setup of process
affinity in ompi_mpi_init() (ompi/runtime/ompi_mpi_init.c:459).

The problem is in the following piece of code:

    [... here ret == OPAL_SUCCESS ...]
    phys_cpu = opal_paffinity_base_get_physical_processor_id(nrank);
    if (0 > phys_cpu) {
        error = "Could not get physical processor id - cannot set processor affinity";
        goto error;
    }
    [...]

If opal_paffinity_base_get_physical_processor_id() fails, ret is not
updated, and we reach the "error:" label while ret == OPAL_SUCCESS.
As a result, MPI_Init() returns without having initialized the
MPI_COMM_WORLD struct, leading to a segmentation fault on calls like
MPI_Comm_size().

I hit the bug recently on new Westmere processors, for which
opal_paffinity_base_get_physical_processor_id() fails when the MCA
parameter "opal_paffinity_alone 1" is used.

I'm not sure this is the right way to fix the problem, but here is a
patch, tested with v1.5, that reports the problem instead of generating
a segmentation fault.

With the patch, the output is:

--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

Could not get physical processor id - cannot set processor affinity
--> Returned "Not found" (-5) instead of "Success" (0)
--------------------------------------------------------------------------

Without the patch, the output was:

*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x10
[ 0] /lib64/libpthread.so.0 [0x3d4e20ee90]
[ 1] /home_nfs/thouveng/dev/openmpi-v1.5/lib/libmpi.so.0(MPI_Comm_size+0x9c) [0x7fce74468dfc]
[ 2] ./IMB-MPI1(IMB_init_pointers+0x2f) [0x40629f]
[ 3] ./IMB-MPI1(main+0x65) [0x4035c5]
[ 4] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3d4da1ea2d]
[ 5] ./IMB-MPI1 [0x403499]

Regards,
Guillaume

---
diff --git a/ompi/runtime/ompi_mpi_init.c b/ompi/runtime/ompi_mpi_init.c
--- a/ompi/runtime/ompi_mpi_init.c
+++ b/ompi/runtime/ompi_mpi_init.c
@@ -459,6 +459,7 @@ int ompi_mpi_init(int argc, char **argv,
         OPAL_PAFFINITY_CPU_ZERO(mask);
         phys_cpu = opal_paffinity_base_get_physical_processor_id(nrank);
         if (0 > phys_cpu) {
+            ret = phys_cpu;
             error = "Could not get physical processor id - cannot set processor affinity";
             goto error;
         }
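The underlying pattern is a common C pitfall: an error path that sets a message and jumps to a cleanup label but forgets to update the function's return code. A stripped-down sketch of the pattern and the one-line fix (hypothetical names; this is not the actual Open MPI code):

```c
#include <stddef.h>

#define MY_SUCCESS 0
#define MY_ERR_NOT_FOUND (-5)

/* Hypothetical stand-in for opal_paffinity_base_get_physical_processor_id():
 * returns a non-negative id on success, a negative error code on failure. */
static int get_physical_cpu(int rank)
{
    (void) rank;
    return MY_ERR_NOT_FOUND;   /* simulate the failure seen on Westmere */
}

static int my_init(const char **error_out)
{
    int ret = MY_SUCCESS;
    const char *error = NULL;
    int phys_cpu;

    phys_cpu = get_physical_cpu(0);
    if (0 > phys_cpu) {
        ret = phys_cpu;        /* the one-line fix: propagate the error code */
        error = "Could not get physical processor id";
        goto error;
    }
    return MY_SUCCESS;

error:
    *error_out = error;
    return ret;                /* without the fix this still returned MY_SUCCESS */
}
```

Without the `ret = phys_cpu;` line, the caller sees MY_SUCCESS and proceeds as if initialization worked, which is exactly how the later MPI_Comm_size() segfault arises.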
Re: [OMPI devel] RFC: s/ENABLE_MPI_THREADS/ENABLE_THREAD_SAFETY/g
On Feb 9, 2010, at 4:34 AM, Ralph Castain wrote:
>> While we're at it, why not call the option giving MPI_THREAD_MULTIPLE
>> support --enable-thread-multiple ?
>
> Makes sense to me. I agree with Brian that we need three options here.

Ok, how about these:

--enable-opal-progress-threads: enables the progress thread machinery in opal

--enable-opal-multi-thread: enables the multi-threaded machinery in opal
    (or perhaps --enable-opal-threads?)

--enable-mpi-thread-multiple: enables the use of MPI_THREAD_MULTIPLE;
    affects only the MPI layer; directly implies --enable-opal-multi-thread

Deprecated options:
--enable-mpi-threads: deprecated synonym for --enable-mpi-thread-multiple
--enable-progress-threads: deprecated synonym for --enable-opal-progress-threads

--
Jeff Squyres
jsquy...@cisco.com
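Taken together, the proposal maps to configure invocations like the following. This is only a sketch of the naming scheme being discussed: these flag names were proposals in this thread and may not match any released Open MPI configure script.

```shell
# Progress-thread machinery only (OPAL layer):
./configure --enable-opal-progress-threads

# General multi-thread machinery in OPAL:
./configure --enable-opal-multi-thread

# MPI_THREAD_MULTIPLE support; per the proposal this
# directly implies --enable-opal-multi-thread:
./configure --enable-mpi-thread-multiple

# Deprecated synonym kept for backwards compatibility:
./configure --enable-mpi-threads
```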
Re: [OMPI devel] RFC: s/ENABLE_MPI_THREADS/ENABLE_THREAD_SAFETY/g
On Feb 9, 2010, at 1:44 AM, Sylvain Jeaugey wrote:
> While we're at it, why not call the option giving MPI_THREAD_MULTIPLE
> support --enable-thread-multiple ?

Makes sense to me. I agree with Brian that we need three options here.

> About ORTE and OPAL, if you have --enable-thread-multiple=yes, it may force
> the usage of --enable-thread-safety to configure OPAL and/or ORTE.

It definitely will, but I don't see that as an issue.

> I know there are other projects using ORTE and OPAL, but the vast majority
> of users are still using OMPI and were already confused by
> --enable-mpi-threads. Switching to --enable-multi-threads or
> --enable-thread-safety will surely confuse them one more time.

Just to clarify: this actually isn't about other projects. Jeff misspoke, IMO.
The problem is in OMPI, as it may be necessary/advantageous for ORTE to have
threads for proper mpirun and orted operation even though application
processes don't use them.

Ralph

> Sylvain
>
> On Mon, 8 Feb 2010, Barrett, Brian W wrote:
>
>> Well, does --disable-multi-threads disable progress threads? And do you
>> want to disable thread support in ORTE because you don't want
>> MPI_THREAD_MULTIPLE? Perhaps a third option is a rational way to go?
>>
>> Brian
>>
>> On Feb 8, 2010, at 6:54 PM, Jeff Squyres wrote:
>>
>>> How about
>>>
>>> --enable-mpi-threads ==> --enable-multi-threads
>>> ENABLE_MPI_THREADS   ==> ENABLE_MULTI_THREADS
>>>
>>> Essentially, s/mpi/multi/ig. This gives us "progress thread" support and
>>> "multi thread" support. Similar, but different.
>>>
>>> Another possibility instead of "mpi" could be "concurrent".
>>>
>>> On Jan 28, 2010, at 9:24 PM, Barrett, Brian W wrote:
>>>
>>>> Jeff -
>>>>
>>>> I think the idea is ok, but I think the name needs some thought.
>>>> There's currently two ways to have the lower layers be thread safe --
>>>> enabling MPI threads or progress threads. The two can be done
>>>> independently -- you can disable MPI threads and still enable thread
>>>> support by enabling progress threads. So either that behavior would
>>>> need to change or we need a better name (in my opinion...).
>>>>
>>>> Brian
>>>>
>>>> On Jan 28, 2010, at 8:53 PM, Jeff Squyres wrote:
>>>>
>>>>> WHAT: Rename --enable-mpi-threads and ENABLE_MPI_THREADS to
>>>>> --enable-thread-safety and ENABLE_THREAD_SAFETY, respectively
>>>>> (--enable-mpi-threads will be maintained as a synonym to
>>>>> --enable-thread-safety for backwards compat, of course).
>>>>>
>>>>> WHY: Other projects are starting to use ORTE and OPAL without OMPI.
>>>>> The fact that thread safety in OPAL and ORTE requires a configure
>>>>> switch with "mpi" in the name is very non-intuitive.
>>>>>
>>>>> WHERE: A bunch of places in the code; see attached patch.
>>>>>
>>>>> WHEN: Next Friday COB
>>>>>
>>>>> TIMEOUT: COB, Friday, Feb 5, 2010
>>>>>
>>>>> More details:
>>>>>
>>>>> Cisco is starting to investigate using ORTE and OPAL in various
>>>>> threading scenarios -- without the OMPI layer. The fact that you need
>>>>> to enable thread safety in ORTE/OPAL with a configure switch that has
>>>>> the word "mpi" in it is extremely counter-intuitive (it bit some of
>>>>> our engineers very badly, and they were mighty annoyed!!).
>>>>>
>>>>> Since this functionality actually has nothing to do with MPI (it's
>>>>> actually the other way around -- MPI_THREAD_MULTIPLE needs this
>>>>> functionality), we really should rename this switch and the
>>>>> corresponding AC_DEFINE -- I suggest:
>>>>>
>>>>> --enable|disable-thread-safety
>>>>> ENABLE_THREAD_SAFETY
>>>>>
>>>>> Of course, we need to keep the configure switch
>>>>> "--enable|disable-mpi-threads" for backwards compatibility, so that
>>>>> can be a synonym to --enable-thread-safety.
>>>>>
>>>>> See the attached patch (the biggest change is in
>>>>> opal/config/opal_config_threads.m4). If there are no objections, I'll
>>>>> commit this next Friday COB.
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> jsquy...@cisco.com
>>>>
>>>> --
>>>> Brian W. Barrett
>>>> Dept. 1423: Scalable System Software
>>>> Sandia National Laboratories
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>
>> --
>> Brian W. Barrett
>> Dept. 1423: Scalable System Software
>> Sandia National Laboratories
Re: [OMPI devel] RFC: s/ENABLE_MPI_THREADS/ENABLE_THREAD_SAFETY/g
While we're at it, why not call the option giving MPI_THREAD_MULTIPLE support
--enable-thread-multiple ?

About ORTE and OPAL, if you have --enable-thread-multiple=yes, it may force
the usage of --enable-thread-safety to configure OPAL and/or ORTE.

I know there are other projects using ORTE and OPAL, but the vast majority of
users are still using OMPI and were already confused by --enable-mpi-threads.
Switching to --enable-multi-threads or --enable-thread-safety will surely
confuse them one more time.

Sylvain

On Mon, 8 Feb 2010, Barrett, Brian W wrote:

> Well, does --disable-multi-threads disable progress threads? And do you
> want to disable thread support in ORTE because you don't want
> MPI_THREAD_MULTIPLE? Perhaps a third option is a rational way to go?
>
> Brian
>
> On Feb 8, 2010, at 6:54 PM, Jeff Squyres wrote:
>
>> How about
>>
>> --enable-mpi-threads ==> --enable-multi-threads
>> ENABLE_MPI_THREADS   ==> ENABLE_MULTI_THREADS
>>
>> Essentially, s/mpi/multi/ig. This gives us "progress thread" support and
>> "multi thread" support. Similar, but different.
>>
>> Another possibility instead of "mpi" could be "concurrent".
>>
>> On Jan 28, 2010, at 9:24 PM, Barrett, Brian W wrote:
>>
>>> Jeff -
>>>
>>> I think the idea is ok, but I think the name needs some thought.
>>> There's currently two ways to have the lower layers be thread safe --
>>> enabling MPI threads or progress threads. The two can be done
>>> independently -- you can disable MPI threads and still enable thread
>>> support by enabling progress threads. So either that behavior would
>>> need to change or we need a better name (in my opinion...).
>>>
>>> Brian
>>>
>>> On Jan 28, 2010, at 8:53 PM, Jeff Squyres wrote:
>>>
>>>> WHAT: Rename --enable-mpi-threads and ENABLE_MPI_THREADS to
>>>> --enable-thread-safety and ENABLE_THREAD_SAFETY, respectively
>>>> (--enable-mpi-threads will be maintained as a synonym to
>>>> --enable-thread-safety for backwards compat, of course).
>>>>
>>>> WHY: Other projects are starting to use ORTE and OPAL without OMPI.
>>>> The fact that thread safety in OPAL and ORTE requires a configure
>>>> switch with "mpi" in the name is very non-intuitive.
>>>>
>>>> WHERE: A bunch of places in the code; see attached patch.
>>>>
>>>> WHEN: Next Friday COB
>>>>
>>>> TIMEOUT: COB, Friday, Feb 5, 2010
>>>>
>>>> More details:
>>>>
>>>> Cisco is starting to investigate using ORTE and OPAL in various
>>>> threading scenarios -- without the OMPI layer. The fact that you need
>>>> to enable thread safety in ORTE/OPAL with a configure switch that has
>>>> the word "mpi" in it is extremely counter-intuitive (it bit some of
>>>> our engineers very badly, and they were mighty annoyed!!).
>>>>
>>>> Since this functionality actually has nothing to do with MPI (it's
>>>> actually the other way around -- MPI_THREAD_MULTIPLE needs this
>>>> functionality), we really should rename this switch and the
>>>> corresponding AC_DEFINE -- I suggest:
>>>>
>>>> --enable|disable-thread-safety
>>>> ENABLE_THREAD_SAFETY
>>>>
>>>> Of course, we need to keep the configure switch
>>>> "--enable|disable-mpi-threads" for backwards compatibility, so that
>>>> can be a synonym to --enable-thread-safety.
>>>>
>>>> See the attached patch (the biggest change is in
>>>> opal/config/opal_config_threads.m4). If there are no objections, I'll
>>>> commit this next Friday COB.
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>
>>> --
>>> Brian W. Barrett
>>> Dept. 1423: Scalable System Software
>>> Sandia National Laboratories
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>
> --
> Brian W. Barrett
> Dept. 1423: Scalable System Software
> Sandia National Laboratories