Re: [MTT devel] MTToGDS
On Feb 5, 2010, at 4:56 AM, Igor Ivanov wrote:

> Thank you for starting to play with it; I hope you find it useful. I will
> try to answer the questions you raised.

Thanks! Sorry for the delay in my answering -- got caught up in other stuff... Ugh!

> 1. Yes, you are correct. The implementation uses Google account authorization
> only for access to the web page. Client applications use a separate approach
> to communicate with the datastore.
> From my point of view it is difficult to say which way is better. Either way,
> we need to manage a list of valid accounts to answer "is this username/password
> combo valid?" (Google does not do this task for us) and to send the
> username/password information from a client to the application. There might be
> a visible preference in the web-usage case, but web usage was not the main goal.

Gotcha. FWIW, I think it would be (slightly) easier if we don't have to manage users' passwords on the appspot. If the MTT client can just submit using a regular Google account username+password, that would be one less thing to have to manage. I guess I'm a little burned out from our current MTT setup, where people had to bug me to reset their passwords (in a local .htaccess file) whenever they lost/forgot them. :-)

All things being equal, you're right, of course: a) we still have to maintain a list of Google accounts that are allowed to submit/access/whatever, and b) we still have to ship off a username/password combo and ask if it's valid. But eliminating that password column from our data, IMHO, means pushing all account management off to Google. Is it hard to redirect the appspot lookup to use Google account names + passwords?

> 2. The current implementation of the datastore environment is mostly oriented
> toward client usage and does not give users rich web capabilities. The
> existing web form should be considered an administrator's tool for now.

Gotcha. Someday someone with lots of time can write a glitzy web 2.0 interface. ;-)

> There is a special command line utility, bquery.pl, located in /src/client,
> that communicates with the datastore. It can query data from the datastore
> and display various information on the console using an extended (closer to
> SQL) GQL syntax implemented for users' convenience. More detailed information
> is in the document at
> http://svn.open-mpi.org/svn/mtt/trunk/docs/gds/VBench_bquery.doc
>
> For example, to get information related to an MPI install, the following
> command line can be used:
>
> $ ./bquery.pl --username= --password= \
>       --server=http://.appspot.com \
>       --view --gqls="select description, mpi_path from MpiInstallPhase where
>       duration=1" --format=txt
>
> description                          mpi_path
> ------------------------------------------------------
> Voltaire already installed MPI+OMA   /opt/openmpi/1.3
> ...

Nifty -- I'll go play with this...

> 3. If we can collect all the needed information about a cluster in a
> transparent way, we should do it. ClusterInfo.pm is an attempt to get that
> info in one place in a clear manner.

I ask because many of the assumptions in ClusterInfo.pm are not valid for my cluster.

> 4. You are right, it can be done.

If you don't care, and since I'm the one making such an annoying request, I'll be happy to do the work for this one. :-)

> 5. Results are cached to keep the link information between the "test build"
> -> "mpi install" and "test run" -> "test build" -> "mpi install" phases.

Ah -- I see. In the SQL submitter, when we submit each phase, we get an ID back to use for the next linked phase (e.g., the mpi install submit returns an ID that is used with a corresponding test build submit, etc.). Is that not possible here? I.e., can a submit return an ID to be used with the next submit? I ask for two reasons:

1. When running a huge number of tests in MTT (like I do), it is useful to see the results start appearing in the database gradually over time rather than having to wait (potentially) many hours for all the results to appear at once.

2. I actually run OMPI testing in two phases at Cisco:

   a. (mpi get + mpi install + test get + test build) for ~25 different mpi install sections
   b. as each one of those finishes, launch test run phases for each, with either ~10 or ~25 mpi details variants (depending on the specific mpi install)

Specifically, I execute each of my test_run phases separately from all the other phases (because I have lots of them running in parallel for a given mpi install). Hence, the test run phase needs to be able to run long after all the other phase results were submitted. I believe IU and Sun do similar things (although our MTT setups are quite different from each other, I think we have all separated the get/install/get/build stuff from the test runs).

> 6. Could you send detailed info about the issue (ini file, mtt.log with
> verbose info, and the command line)? We will look into it.

Let me reproduce and simplify; I was using a fairly complex ini file...
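(Back on point 5, for the archives: here's the sort of linked-phase query I
have in mind, in bquery.pl's GQL syntax. This is purely a sketch -- the
"TestRunPhase" kind and the "mpi_install_id" property are hypothetical names
I made up, not anything from the actual schema:

  $ ./bquery.pl --username= --password= \
        --server=http://.appspot.com \
        --view --gqls="select description from TestRunPhase where
        mpi_install_id=42" --format=txt

i.e., each phase submit would return an ID, and later submits and queries
would reference it.)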
Re: [OMPI devel] failure with zero-length Reduce() and both sbuf=rbuf=NULL
BUMP. See http://code.google.com/p/mpi4py/issues/detail?id=14

On 12 December 2009 00:31, Lisandro Dalcin wrote:
> On Thu, Dec 10, 2009 at 4:26 PM, George Bosilca wrote:
>> Lisandro,
>>
>> This code is not correct from the MPI standard perspective. The reason is
>> independent of the datatype or count; it is solely related to the fact that
>> MPI_Reduce cannot accept a sendbuf equal to the recvbuf (or one has to
>> use MPI_IN_PLACE).
>
> George, I have to disagree. Zero-length buffers are a very special
> case, and the MPI std is not very explicit about this limit case. Try
> the code pasted at the end.
>
> 1) In Open MPI, the only one of these failing for sbuf=rbuf=NULL is
> MPI_Reduce()
>
> 2) As a reference, all the calls succeed in MPICH2.
>
> #include <stdio.h>
> #include <mpi.h>
>
> int main( int argc, char ** argv ) {
>   int ierr;
>   MPI_Init(&argc, &argv);
>   ierr = MPI_Scan(NULL, NULL, 0, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
>   ierr = MPI_Exscan(NULL, NULL, 0, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
>   ierr = MPI_Allreduce(NULL, NULL, 0, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
> #if 1
>   ierr = MPI_Reduce(NULL, NULL, 0, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
> #endif
>   MPI_Finalize();
>   return 0;
> }

--
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
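(For reference, the MPI_IN_PLACE form that George mentions looks like the
sketch below -- standard usage for a nonzero count, not code from this thread:

  #include <mpi.h>
  int main(int argc, char **argv) {
      int rank, rbuf[4] = {1, 2, 3, 4};
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      if (rank == 0)   /* root: recvbuf doubles as the input buffer */
          MPI_Reduce(MPI_IN_PLACE, rbuf, 4, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
      else             /* non-roots: send as usual; recvbuf is ignored */
          MPI_Reduce(rbuf, NULL, 4, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
      MPI_Finalize();
      return 0;
  }

i.e., the standard's sanctioned way to reduce "into" the send buffer is
MPI_IN_PLACE on the root, which is why overlapping sbuf and rbuf is normally
disallowed; the open question is whether that restriction is meaningful when
count is zero and both buffers are NULL.)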
Re: [OMPI devel] RFC: s/ENABLE_MPI_THREADS/ENABLE_THREAD_SAFETY/g
I'm sure someone will object to a name, but the logic looks fine to me.

On Feb 9, 2010, at 6:35 AM, Jeff Squyres wrote:

> On Feb 9, 2010, at 4:34 AM, Ralph Castain wrote:
>
>>> While we're at it, why not call the option giving MPI_THREAD_MULTIPLE
>>> support --enable-thread-multiple ?
>>
>> Makes sense to me. I agree with Brian that we need three options here.
>
> Ok, how about these:
>
> --enable-opal-progress-threads: enables progress thread machinery in opal
>
> --enable-opal-multi-thread: enables multi threaded machinery in opal
>     or perhaps --enable-opal-threads ?
>
> --enable-mpi-thread-multiple: enables the use of MPI_THREAD_MULTIPLE;
>     affects only the MPI layer
>     directly implies --enable-opal-multi-thread
>
> Deprecated options:
> --enable-mpi-threads: deprecated synonym for --enable-mpi-thread-multiple
> --enable-progress-threads: deprecated synonym for --enable-opal-progress-threads
>
> --
> Jeff Squyres
> jsquy...@cisco.com
Re: [OMPI devel] [patch] return value not updated in ompi_mpi_init()
Oops - yep, that is an oversight! Will fix - thanks!

On Feb 9, 2010, at 7:13 AM, Guillaume Thouvenin wrote:

> Hello,
>
> It seems that a return value is not updated during the setup of
> process affinity in ompi_mpi_init()
> (ompi/runtime/ompi_mpi_init.c:459).
>
> The problem is in the following piece of code:
>
>     [... here ret == OPAL_SUCCESS ...]
>     phys_cpu = opal_paffinity_base_get_physical_processor_id(nrank);
>     if (0 > phys_cpu) {
>         error = "Could not get physical processor id - cannot set processor affinity";
>         goto error;
>     }
>     [...]
>
> If opal_paffinity_base_get_physical_processor_id() fails, ret is not
> updated and we reach the "error:" label while ret == OPAL_SUCCESS.
>
> As a result, MPI_Init() returns without having initialized the
> MPI_COMM_WORLD struct, leading to a segmentation fault on calls like
> MPI_Comm_size().
>
> I hit the bug recently with new Westmere processors, for which
> opal_paffinity_base_get_physical_processor_id() fails if the
> mca parameter "opal_paffinity_alone 1" is used during the execution.
>
> I'm not sure that it's the right way to fix the problem, but here is a
> patch, tested with v1.5. The patch reports the problem instead
> of generating a segmentation fault.
>
> With the patch, the output is:
>
> ----------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> Could not get physical processor id - cannot set processor affinity
> --> Returned "Not found" (-5) instead of "Success" (0)
> ----------------------------------------------------------------------
>
> Without the patch, the output was:
>
> *** Process received signal ***
> Signal: Segmentation fault (11)
> Signal code: Address not mapped (1)
> Failing at address: 0x10
> [ 0] /lib64/libpthread.so.0 [0x3d4e20ee90]
> [ 1] /home_nfs/thouveng/dev/openmpi-v1.5/lib/libmpi.so.0(MPI_Comm_size+0x9c) [0x7fce74468dfc]
> [ 2] ./IMB-MPI1(IMB_init_pointers+0x2f) [0x40629f]
> [ 3] ./IMB-MPI1(main+0x65) [0x4035c5]
> [ 4] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3d4da1ea2d]
> [ 5] ./IMB-MPI1 [0x403499]
>
> Regards,
> Guillaume
>
> ---
> diff --git a/ompi/runtime/ompi_mpi_init.c b/ompi/runtime/ompi_mpi_init.c
> --- a/ompi/runtime/ompi_mpi_init.c
> +++ b/ompi/runtime/ompi_mpi_init.c
> @@ -459,6 +459,7 @@ int ompi_mpi_init(int argc, char **argv,
>          OPAL_PAFFINITY_CPU_ZERO(mask);
>          phys_cpu = opal_paffinity_base_get_physical_processor_id(nrank);
>          if (0 > phys_cpu) {
> +            ret = phys_cpu;
>              error = "Could not get physical processor id - cannot set processor affinity";
>              goto error;
>          }
[OMPI devel] [patch] return value not updated in ompi_mpi_init()
Hello,

It seems that a return value is not updated during the setup of
process affinity in ompi_mpi_init()
(ompi/runtime/ompi_mpi_init.c:459).

The problem is in the following piece of code:

    [... here ret == OPAL_SUCCESS ...]
    phys_cpu = opal_paffinity_base_get_physical_processor_id(nrank);
    if (0 > phys_cpu) {
        error = "Could not get physical processor id - cannot set processor affinity";
        goto error;
    }
    [...]

If opal_paffinity_base_get_physical_processor_id() fails, ret is not
updated and we reach the "error:" label while ret == OPAL_SUCCESS.

As a result, MPI_Init() returns without having initialized the
MPI_COMM_WORLD struct, leading to a segmentation fault on calls like
MPI_Comm_size().

I hit the bug recently with new Westmere processors, for which
opal_paffinity_base_get_physical_processor_id() fails if the
mca parameter "opal_paffinity_alone 1" is used during the execution.

I'm not sure that it's the right way to fix the problem, but here is a
patch, tested with v1.5. The patch reports the problem instead of
generating a segmentation fault.

With the patch, the output is:

----------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

Could not get physical processor id - cannot set processor affinity
--> Returned "Not found" (-5) instead of "Success" (0)
----------------------------------------------------------------------

Without the patch, the output was:

*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: 0x10
[ 0] /lib64/libpthread.so.0 [0x3d4e20ee90]
[ 1] /home_nfs/thouveng/dev/openmpi-v1.5/lib/libmpi.so.0(MPI_Comm_size+0x9c) [0x7fce74468dfc]
[ 2] ./IMB-MPI1(IMB_init_pointers+0x2f) [0x40629f]
[ 3] ./IMB-MPI1(main+0x65) [0x4035c5]
[ 4] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3d4da1ea2d]
[ 5] ./IMB-MPI1 [0x403499]

Regards,
Guillaume

---
diff --git a/ompi/runtime/ompi_mpi_init.c b/ompi/runtime/ompi_mpi_init.c
--- a/ompi/runtime/ompi_mpi_init.c
+++ b/ompi/runtime/ompi_mpi_init.c
@@ -459,6 +459,7 @@ int ompi_mpi_init(int argc, char **argv,
         OPAL_PAFFINITY_CPU_ZERO(mask);
         phys_cpu = opal_paffinity_base_get_physical_processor_id(nrank);
         if (0 > phys_cpu) {
+            ret = phys_cpu;
             error = "Could not get physical processor id - cannot set processor affinity";
             goto error;
         }
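(To make the failure mode concrete, here is a minimal standalone sketch of the
pattern -- a toy, not the actual OMPI code; the names and error value are
invented for illustration:

  #include <stdio.h>

  #define SUCCESS   0
  #define ERR_FATAL (-5)

  /* Toy version of the ompi_mpi_init() error path: 'ret' must be set
     before jumping to the error label, or the caller sees SUCCESS. */
  static int toy_init(int fail) {
      int ret = SUCCESS;
      const char *error = NULL;
      if (fail) {
          ret = ERR_FATAL;   /* the one-line fix from the patch above */
          error = "could not get physical processor id";
          goto err;
      }
      return SUCCESS;
  err:
      fprintf(stderr, "init failed: %s (ret=%d)\n", error, ret);
      return ret;            /* without the fix, this would still be SUCCESS */
  }

  int main(void) {
      /* exits nonzero, i.e. the failure is actually reported */
      return toy_init(1) == SUCCESS ? 0 : 1;
  }

Without the "ret = ERR_FATAL;" line, toy_init() prints an error message but
returns SUCCESS, which is exactly how MPI_Init() ended up "succeeding" with an
uninitialized MPI_COMM_WORLD.)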
Re: [OMPI devel] RFC: s/ENABLE_MPI_THREADS/ENABLE_THREAD_SAFETY/g
On Feb 9, 2010, at 4:34 AM, Ralph Castain wrote:

>> While we're at it, why not call the option giving MPI_THREAD_MULTIPLE
>> support --enable-thread-multiple ?
>
> Makes sense to me. I agree with Brian that we need three options here.

Ok, how about these:

--enable-opal-progress-threads: enables progress thread machinery in opal

--enable-opal-multi-thread: enables multi threaded machinery in opal
    or perhaps --enable-opal-threads ?

--enable-mpi-thread-multiple: enables the use of MPI_THREAD_MULTIPLE;
    affects only the MPI layer
    directly implies --enable-opal-multi-thread

Deprecated options:
--enable-mpi-threads: deprecated synonym for --enable-mpi-thread-multiple
--enable-progress-threads: deprecated synonym for --enable-opal-progress-threads

--
Jeff Squyres
jsquy...@cisco.com
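(To make the "directly implies" relationship concrete, here's a tiny
compile-time sketch. The macro names below are my guesses derived from the
proposed flag names, not settled spellings:

  #include <stdio.h>

  /* Pretend results of a hypothetical configure run: */
  #define OPAL_ENABLE_PROGRESS_THREADS 0
  #define OPAL_ENABLE_MULTI_THREADS    1   /* pulled in by the option below */
  #define OMPI_ENABLE_THREAD_MULTIPLE  1

  /* The implication the options list describes: */
  #if OMPI_ENABLE_THREAD_MULTIPLE && !OPAL_ENABLE_MULTI_THREADS
  #error "--enable-mpi-thread-multiple requires --enable-opal-multi-thread"
  #endif

  int main(void) {
      printf("progress threads: %d, opal multi-thread: %d, thread multiple: %d\n",
             OPAL_ENABLE_PROGRESS_THREADS, OPAL_ENABLE_MULTI_THREADS,
             OMPI_ENABLE_THREAD_MULTIPLE);
      return 0;
  }

i.e., turning on the MPI-layer option must also turn on the opal-level
multi-thread machinery, while progress threads stay independent.)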
Re: [OMPI devel] RFC: s/ENABLE_MPI_THREADS/ENABLE_THREAD_SAFETY/g
On Feb 9, 2010, at 1:44 AM, Sylvain Jeaugey wrote:

> While we're at it, why not call the option giving MPI_THREAD_MULTIPLE support
> --enable-thread-multiple ?

Makes sense to me. I agree with Brian that we need three options here.

> About ORTE and OPAL, if you have --enable-thread-multiple=yes, it may force
> the usage of --enable-thread-safety to configure OPAL and/or ORTE.

It definitely will, but I don't see that as an issue.

> I know there are other projects using ORTE and OPAL, but the vast majority of
> users are still using OMPI and were already confused by --enable-mpi-threads.
> Switching to --enable-multi-threads or --enable-thread-safety will surely
> confuse them one more time.

Just to clarify: this actually isn't about other projects. Jeff misspoke, IMO. The problem is in OMPI, as it may be necessary/advantageous for ORTE to have threads for proper mpirun and orted operation even though the application processes don't use them.

Ralph

> Sylvain
>
> On Mon, 8 Feb 2010, Barrett, Brian W wrote:
>
>> Well, does --disable-multi-threads disable progress threads? And do you
>> want to disable thread support in ORTE because you don't want
>> MPI_THREAD_MULTIPLE? Perhaps a third option is a rational way to go?
>>
>> Brian
>>
>> On Feb 8, 2010, at 6:54 PM, Jeff Squyres wrote:
>>
>>> How about
>>>
>>> --enable-mpi-threads ==> --enable-multi-threads
>>> ENABLE_MPI_THREADS ==> ENABLE_MULTI_THREADS
>>>
>>> Essentially, s/mpi/multi/ig. This gives us "progress thread" support and
>>> "multi thread" support. Similar, but different.
>>>
>>> Another possibility instead of "mpi" could be "concurrent".
>>>
>>> On Jan 28, 2010, at 9:24 PM, Barrett, Brian W wrote:
>>>
>>>> Jeff -
>>>>
>>>> I think the idea is ok, but I think the name needs some thought. There's
>>>> currently two ways to have the lower layers be thread safe -- enabling MPI
>>>> threads or progress threads. The two can be done independently -- you can
>>>> disable MPI threads and still enable thread support by enabling progress
>>>> threads. So either that behavior would need to change or we need a better
>>>> name (in my opinion...).
>>>>
>>>> Brian
>>>>
>>>> On Jan 28, 2010, at 8:53 PM, Jeff Squyres wrote:
>>>>
>>>>> WHAT: Rename --enable-mpi-threads and ENABLE_MPI_THREADS to
>>>>> --enable-thread-safety and ENABLE_THREAD_SAFETY, respectively
>>>>> (--enable-mpi-threads will be maintained as a synonym to
>>>>> --enable-thread-safety for backwards compat, of course).
>>>>>
>>>>> WHY: Other projects are starting to use ORTE and OPAL without OMPI. The
>>>>> fact that thread safety in OPAL and ORTE requires a configure switch with
>>>>> "mpi" in the name is very non-intuitive.
>>>>>
>>>>> WHERE: A bunch of places in the code; see attached patch.
>>>>>
>>>>> WHEN: Next Friday COB
>>>>>
>>>>> TIMEOUT: COB, Friday, Feb 5, 2010
>>>>>
>>>>> More details:
>>>>>
>>>>> Cisco is starting to investigate using ORTE and OPAL in various threading
>>>>> scenarios -- without the OMPI layer. The fact that you need to enable
>>>>> thread safety in ORTE/OPAL with a configure switch that has the word
>>>>> "mpi" in it is extremely counter-intuitive (it bit some of our engineers
>>>>> very badly, and they were mighty annoyed!!).
>>>>>
>>>>> Since this functionality actually has nothing to do with MPI (it's
>>>>> actually the other way around -- MPI_THREAD_MULTIPLE needs this
>>>>> functionality), we really should rename this switch and the corresponding
>>>>> AC_DEFINE -- I suggest:
>>>>>
>>>>> --enable|disable-thread-safety
>>>>> ENABLE_THREAD_SAFETY
>>>>>
>>>>> Of course, we need to keep the configure switch
>>>>> "--enable|disable-mpi-threads" for backwards compatibility, so that can
>>>>> be a synonym to --enable-thread-safety.
>>>>>
>>>>> See the attached patch (the biggest change is in
>>>>> opal/config/opal_config_threads.m4). If there are no objections, I'll
>>>>> commit this next Friday COB.
>>>>>
>>>>> --
>>>>> Jeff Squyres
>>>>> jsquy...@cisco.com
>>>>
>>>> --
>>>> Brian W. Barrett
>>>> Dept. 1423: Scalable System Software
>>>> Sandia National Laboratories
>>>
>>> --
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>
>> --
>> Brian W. Barrett
>> Dept. 1423: Scalable System Software
>> Sandia National Laboratories
Re: [OMPI devel] RFC: s/ENABLE_MPI_THREADS/ENABLE_THREAD_SAFETY/g
While we're at it, why not call the option giving MPI_THREAD_MULTIPLE support --enable-thread-multiple ?

About ORTE and OPAL, if you have --enable-thread-multiple=yes, it may force the usage of --enable-thread-safety to configure OPAL and/or ORTE.

I know there are other projects using ORTE and OPAL, but the vast majority of users are still using OMPI and were already confused by --enable-mpi-threads. Switching to --enable-multi-threads or --enable-thread-safety will surely confuse them one more time.

Sylvain

On Mon, 8 Feb 2010, Barrett, Brian W wrote:

> Well, does --disable-multi-threads disable progress threads? And do you
> want to disable thread support in ORTE because you don't want
> MPI_THREAD_MULTIPLE? Perhaps a third option is a rational way to go?
>
> Brian
>
> On Feb 8, 2010, at 6:54 PM, Jeff Squyres wrote:
>
>> How about
>>
>> --enable-mpi-threads ==> --enable-multi-threads
>> ENABLE_MPI_THREADS ==> ENABLE_MULTI_THREADS
>>
>> Essentially, s/mpi/multi/ig. This gives us "progress thread" support and
>> "multi thread" support. Similar, but different.
>>
>> Another possibility instead of "mpi" could be "concurrent".
>>
>> On Jan 28, 2010, at 9:24 PM, Barrett, Brian W wrote:
>>
>>> Jeff -
>>>
>>> I think the idea is ok, but I think the name needs some thought. There's
>>> currently two ways to have the lower layers be thread safe -- enabling MPI
>>> threads or progress threads. The two can be done independently -- you can
>>> disable MPI threads and still enable thread support by enabling progress
>>> threads. So either that behavior would need to change or we need a better
>>> name (in my opinion...).
>>>
>>> Brian
>>>
>>> On Jan 28, 2010, at 8:53 PM, Jeff Squyres wrote:
>>>
>>>> WHAT: Rename --enable-mpi-threads and ENABLE_MPI_THREADS to
>>>> --enable-thread-safety and ENABLE_THREAD_SAFETY, respectively
>>>> (--enable-mpi-threads will be maintained as a synonym to
>>>> --enable-thread-safety for backwards compat, of course).
>>>>
>>>> WHY: Other projects are starting to use ORTE and OPAL without OMPI. The
>>>> fact that thread safety in OPAL and ORTE requires a configure switch with
>>>> "mpi" in the name is very non-intuitive.
>>>>
>>>> WHERE: A bunch of places in the code; see attached patch.
>>>>
>>>> WHEN: Next Friday COB
>>>>
>>>> TIMEOUT: COB, Friday, Feb 5, 2010
>>>>
>>>> More details:
>>>>
>>>> Cisco is starting to investigate using ORTE and OPAL in various threading
>>>> scenarios -- without the OMPI layer. The fact that you need to enable
>>>> thread safety in ORTE/OPAL with a configure switch that has the word
>>>> "mpi" in it is extremely counter-intuitive (it bit some of our engineers
>>>> very badly, and they were mighty annoyed!!).
>>>>
>>>> Since this functionality actually has nothing to do with MPI (it's
>>>> actually the other way around -- MPI_THREAD_MULTIPLE needs this
>>>> functionality), we really should rename this switch and the corresponding
>>>> AC_DEFINE -- I suggest:
>>>>
>>>> --enable|disable-thread-safety
>>>> ENABLE_THREAD_SAFETY
>>>>
>>>> Of course, we need to keep the configure switch
>>>> "--enable|disable-mpi-threads" for backwards compatibility, so that can
>>>> be a synonym to --enable-thread-safety.
>>>>
>>>> See the attached patch (the biggest change is in
>>>> opal/config/opal_config_threads.m4). If there are no objections, I'll
>>>> commit this next Friday COB.
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquy...@cisco.com
>>>
>>> --
>>> Brian W. Barrett
>>> Dept. 1423: Scalable System Software
>>> Sandia National Laboratories
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>
> --
> Brian W. Barrett
> Dept. 1423: Scalable System Software
> Sandia National Laboratories