This looks like a new error -- something is potentially going wrong in MPI_Request_free (or perhaps in the underlying progress engine that MPI_Request_free invokes).
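For anyone who wants to poke at this outside of PETSc: VecScatterDestroy is ultimately tearing down persistent requests, so the pattern in play boils down to something like the sketch below. This is a hypothetical reproducer (I have not confirmed it triggers the abort in the backtrace below); the buffer size and the send-to-self layout are my own invention.

    /* request_free.c -- persistent-request pattern similar to what
     * VecScatterDestroy tears down.  Hypothetical reproducer only. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        double sbuf[16] = {0}, rbuf[16];
        MPI_Request sreq, rreq;
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Persistent send/recv to self, started and completed once. */
        MPI_Send_init(sbuf, 16, MPI_DOUBLE, rank, 0, MPI_COMM_WORLD, &sreq);
        MPI_Recv_init(rbuf, 16, MPI_DOUBLE, rank, 0, MPI_COMM_WORLD, &rreq);
        MPI_Start(&rreq);
        MPI_Start(&sreq);
        MPI_Wait(&sreq, MPI_STATUS_IGNORE);
        MPI_Wait(&rreq, MPI_STATUS_IGNORE);

        /* Freeing the now-inactive persistent requests is where the
         * backtrace below lands inside mca_pml_ob1. */
        MPI_Request_free(&sreq);
        MPI_Request_free(&rreq);

        MPI_Finalize();
        return 0;
    }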
I think cloning at that time and running tests is absolutely fine. We tend to track our bugs in GitHub issues, so if you'd like to file future issues there, that would likely save a step. I filed an issue for this one: https://github.com/open-mpi/ompi/issues/1875

> On Jul 14, 2016, at 9:47 AM, Eric Chamberland <eric.chamberl...@giref.ulaval.ca> wrote:
>
> Thanks Ralph,
>
> It is now *much* better: all sequential executions are working... ;)
> but I still have issues with a lot of parallel tests... (but not all)
>
> The SHA tested last night was c3c262b.
>
> http://www.giref.ulaval.ca/~cmpgiref/dernier_ompi/2016.07.14.01h20m32s_config.log
>
> Here is the backtrace for most of these issues:
>
> *** Error in `/pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProblemeGD.opt': free(): invalid pointer: 0x00007f9ab09c6020 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x7277f)[0x7f9ab019b77f]
> /lib64/libc.so.6(+0x78026)[0x7f9ab01a1026]
> /lib64/libc.so.6(+0x78d53)[0x7f9ab01a1d53]
> /opt/openmpi-2.x_opt/lib/openmpi/mca_pml_ob1.so(+0x172a1)[0x7f9aa3df32a1]
> /opt/openmpi-2.x_opt/lib/libmpi.so.0(MPI_Request_free+0x4c)[0x7f9ab0761dac]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4adaf9)[0x7f9ab7fa2af9]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecScatterDestroy+0x68d)[0x7f9ab7f9dc35]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(+0x4574e7)[0x7f9ab7f4c4e7]
> /opt/petsc-3.7.2_debug_openmpi_2.x/lib/libpetsc.so.3.7(VecDestroy+0x648)[0x7f9ab7ef28ca]
> /pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/lib/libgiref_opt_Petsc.so(_Z15GIREFVecDestroyRP6_p_Vec+0xe)[0x7f9abc9746de]
> /pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/lib/libgiref_opt_Petsc.so(_ZN12VecteurPETScD1Ev+0x31)[0x7f9abca8bfa1]
> /pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/lib/libgiref_opt_Petsc.so(_ZN10SolveurGCPD2Ev+0x20c)[0x7f9abc9a013c]
> /pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/lib/libgiref_opt_Petsc.so(_ZN10SolveurGCPD0Ev+0x9)[0x7f9abc9a01f9]
> /pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/lib/libgiref_opt_Formulation.so(_ZN10ProblemeGDD2Ev+0x42)[0x7f9abeeb94e2]
> /pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProblemeGD.opt[0x4159b9]
> /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f9ab014ab25]
> /pmi/cmpbib/compilation_BIB_dernier_ompi/COMPILE_AUTO/GIREF/bin/Test.ProblemeGD.opt[0x4084dc]
>
> The very same code and tests are all working well with openmpi-1.{8.4,10.2} and the same version of PETSc...
>
> And the segfault with MPI_File_write_all_end seems gone... Thanks to Edgar! :)
>
> Btw, I am wondering when I should report a bug or not, since I am "blindly" cloning around 01h20 each day, independently of the "status" of the master... I don't want to bother anyone on this list with annoying bug reports... So tell me what you would like please...
>
> Thanks,
>
> Eric
>
> On 13/07/16 08:36 PM, Ralph Castain wrote:
>> Fixed on master
>>
>>> On Jul 13, 2016, at 12:47 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>>>
>>> I literally just noticed that this morning (that singleton was broken on master), but hadn't gotten to bisecting / reporting it yet...
>>>
>>> I also haven't tested 2.0.0. I really hope singletons aren't broken there...
>>>
>>> /me goes to test 2.0.0...
>>>
>>> Whew -- 2.0.0 singletons are fine. :-)
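(For anyone following along: "singleton" here just means launching an MPI executable directly, without mpirun. The smoke test is trivial; a minimal sketch, assuming nothing beyond MPI_Init, is below. Build it with mpicc and run the binary with no launcher -- on a broken master it presumably dies during init with the ess_singleton error Eric quotes below.)

    /* singleton.c -- minimal singleton smoke test: run the binary
     * directly (./singleton), *not* under mpirun. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int size;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("singleton init OK, world size = %d\n", size); /* expect 1 */
        MPI_Finalize();
        return 0;
    }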
>>>
>>>> On Jul 13, 2016, at 3:01 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>
>>>> Hmmm… I see where the singleton on master might be broken - will check later today
>>>>
>>>>> On Jul 13, 2016, at 11:37 AM, Eric Chamberland <eric.chamberl...@giref.ulaval.ca> wrote:
>>>>>
>>>>> Hi Howard,
>>>>>
>>>>> ok, I will wait for 2.0.1rcX... ;)
>>>>>
>>>>> I've put in place a script to download/compile OpenMPI+PETSc(3.7.2) and our code from the git repos.
>>>>>
>>>>> Now I am in a somewhat uncomfortable situation where neither the ompi-release.git nor the ompi.git repo is working for me.
>>>>>
>>>>> The former gives me the MPI_File_write_all_end errors I reported; the latter gives me errors like these:
>>>>>
>>>>> [lorien:106919] [[INVALID],INVALID] ORTE_ERROR_LOG: Bad parameter in file ess_singleton_module.c at line 167
>>>>> *** An error occurred in MPI_Init_thread
>>>>> *** on a NULL communicator
>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>> *** and potentially your MPI job)
>>>>> [lorien:106919] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
>>>>>
>>>>> So, for my continuous integration of OpenMPI I am in a no man's land... :(
>>>>>
>>>>> Thanks anyway for the follow-up!
>>>>>
>>>>> Eric
>>>>>
>>>>> On 13/07/16 07:49 AM, Howard Pritchard wrote:
>>>>>> Hi Eric,
>>>>>>
>>>>>> Thanks very much for finding this problem. We decided, in order to have a reasonably timely release, that we'd triage issues and turn around a new RC if something drastic appeared. We want to fix this issue (and it will be fixed), but we've decided to defer the fix to a 2.0.1 bug fix release.
>>>>>>
>>>>>> Howard
>>>>>>
>>>>>> 2016-07-12 13:51 GMT-06:00 Eric Chamberland <eric.chamberl...@giref.ulaval.ca>:
>>>>>>
>>>>>> Hi Edgar,
>>>>>>
>>>>>> I just saw that your patch got into ompi/master... any chance it goes into ompi-release/v2.x before rc5?
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> Eric
>>>>>>
>>>>>> On 08/07/16 03:14 PM, Edgar Gabriel wrote:
>>>>>>
>>>>>> I think I found the problem. I filed a PR towards master, and if that passes I will file a PR for the 2.x branch.
>>>>>>
>>>>>> Thanks!
>>>>>> Edgar
>>>>>>
>>>>>> On 7/8/2016 1:14 PM, Eric Chamberland wrote:
>>>>>>
>>>>>> On 08/07/16 01:44 PM, Edgar Gabriel wrote:
>>>>>>
>>>>>> ok, but just to be able to construct a test case, basically what you are doing is
>>>>>>
>>>>>> MPI_File_write_all_begin(fh, NULL, 0, some_datatype);
>>>>>> MPI_File_write_all_end(fh, NULL, &status);
>>>>>>
>>>>>> is this correct?
>>>>>>
>>>>>> Yes, but with 2 processes:
>>>>>>
>>>>>> rank 0 writes something, but not rank 1...
>>>>>>
>>>>>> Other info: rank 0 didn't wait for rank 1 after MPI_File_write_all_end, so it continued to the next MPI_File_write_all_begin with a different datatype but on the same file...
>>>>>>
>>>>>> thanks!
>>>>>> Eric
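To make the test case above fully concrete: my reading of Edgar's and Eric's exchange, fleshed out into a compilable two-rank sketch, is below. The file name, the second datatype, and the buffer contents are my inventions; everything else follows the description above.

    /* write_all.c -- sketch of the two-rank split-collective case:
     * rank 0 writes, rank 1 participates with count 0, and there is
     * no synchronization between the two begin/end pairs.
     * Run with: mpirun -np 2 ./write_all */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_File fh;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_File_open(MPI_COMM_WORLD, "testfile.bin",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);

        /* First split collective: rank 0 writes 4 ints, rank 1 writes nothing. */
        int idata[4] = {1, 2, 3, 4};
        MPI_File_write_all_begin(fh, rank == 0 ? idata : NULL,
                                 rank == 0 ? 4 : 0, MPI_INT);
        MPI_File_write_all_end(fh, rank == 0 ? idata : NULL, &status);

        /* Second pair on the same file with a different datatype, and no
         * barrier in between -- rank 0 races ahead of rank 1. */
        double ddata[2] = {0.5, 1.5};
        MPI_File_write_all_begin(fh, rank == 0 ? ddata : NULL,
                                 rank == 0 ? 2 : 0, MPI_DOUBLE);
        MPI_File_write_all_end(fh, rank == 0 ? ddata : NULL, &status);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }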
-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/