>> 2) I can reproduce the src/mat/tests/ex242.c error (which explicitly uses 
>> ScaLAPACK; none of the above PCs use it explicitly, except PCBDDC/PCHPDDM 
>> when using MUMPS on “big” problems where root nodes are factorized using 
>> ScaLAPACK, see -mat_mumps_icntl_13)
>> 3) I’m seeing that both on your machine and mine, PETSc BuildSystem insists 
>> on linking libmkl_blacs_intelmpi_lp64.so even though we explicitly supply 
>> libmkl_blacs_openmpi_lp64.so
>> This, for example, yields a wrong Makefile.inc for MUMPS:
>> $ cat arch-linux2-c-opt-ompi/externalpackages/MUMPS_5.3.5/Makefile.inc | grep blacs
>> SCALAP  = […] -lmkl_blacs_openmpi_lp64
>> LIBBLAS = […] -lmkl_blacs_intelmpi_lp64 -lgomp -ldl -lpthread -lm […]
>> 
>> Despite what Barry says, I think PETSc is partially to blame as well (why 
>> use libmkl_blacs_intelmpi_lp64.so when BuildSystem is capable of detecting 
>> that we are using OpenMPI?).
>> I’ll try to fix this to see if it solves 2).
> Okay, that's a very nice finding!!!  Hope it will be "fixable" easily!
> 
> 
The knowledge is there, but the information may not be trivially available to 
make the right decisions. Parts of the BLAS/LAPACK checks use the "check 
everything" approach. For example:

      # Look for Multi-Threaded MKL for MKL_C/Pardiso
      useCPardiso=0
      usePardiso=0
      if self.argDB['with-mkl_cpardiso'] or 'with-mkl_cpardiso-dir' in self.argDB or 'with-mkl_cpardiso-lib' in self.argDB:
        useCPardiso=1
        mkl_blacs_64=[['mkl_blacs_intelmpi'+ILP64+''],['mkl_blacs_mpich'+ILP64+''],['mkl_blacs_sgimpt'+ILP64+''],['mkl_blacs_openmpi'+ILP64+'']]
        mkl_blacs_32=[['mkl_blacs_intelmpi'],['mkl_blacs_mpich'],['mkl_blacs_sgimpt'],['mkl_blacs_openmpi']]
      elif self.argDB['with-mkl_pardiso'] or 'with-mkl_pardiso-dir' in self.argDB or 'with-mkl_pardiso-lib' in self.argDB:
        usePardiso=1
        mkl_blacs_64=[[]]
        mkl_blacs_32=[[]]
      if useCPardiso or usePardiso:
        self.logPrintBox('BLASLAPACK: Looking for Multithreaded MKL for C/Pardiso')
        for libdir in [os.path.join('lib','64'),os.path.join('lib','ia64'),os.path.join('lib','em64t'),os.path.join('lib','intel64'),'lib','64','ia64','em64t','intel64',
                       os.path.join('lib','32'),os.path.join('lib','ia32'),'32','ia32','']:
          if not os.path.exists(os.path.join(dir,libdir)):
            self.logPrint('MKL Path not found.. skipping: '+os.path.join(dir,libdir))
          else:
            self.log.write('Files and directories in that directory:\n'+str(os.listdir(os.path.join(dir,libdir)))+'\n')
            #  iomp5 is provided by the Intel compilers on MacOS. Run source /opt/intel/bin/compilervars.sh intel64 to have it added to LIBRARY_PATH
            #  then locate libimp5.dylib in the LIBRARY_PATH and copy it to os.path.join(dir,libdir)
            for i in mkl_blacs_64:
              yield ('User specified MKL-C/Pardiso Intel-Linux64', None, [os.path.join(dir,libdir,'libmkl_intel'+ILP64+'.a'),'mkl_core','mkl_intel_thread']+i+['iomp5','dl','pthread'],known,'yes')
              yield ('User specified MKL-C/Pardiso GNU-Linux64', None, [os.path.join(dir,libdir,'libmkl_intel'+ILP64+'.a'),'mkl_core','mkl_gnu_thread']+i+['gomp','dl','pthread'],known,'yes')
              yield ('User specified MKL-Pardiso Intel-Windows64', None, [os.path.join(dir,libdir,'mkl_core.lib'),'mkl_intel'+ILP64+'.lib','mkl_intel_thread.lib']+i+['libiomp5md.lib'],known,'yes')
            for i in mkl_blacs_32:
              yield ('User specified MKL-C/Pardiso Intel-Linux32', None, [os.path.join(dir,libdir,'libmkl_intel.a'),'mkl_core','mkl_intel_thread']+i+['iomp5','dl','pthread'],'32','yes')
              yield ('User specified MKL-C/Pardiso GNU-Linux32', None, [os.path.join(dir,libdir,'libmkl_intel.a'),'mkl_core','mkl_gnu_thread']+i+['gomp','dl','pthread'],'32','yes')
              yield ('User specified MKL-Pardiso Intel-Windows32', None, [os.path.join(dir,libdir,'mkl_core.lib'),'mkl_intel_c.lib','mkl_intel_thread.lib']+i+['libiomp5md.lib'],'32','yes')
        return

The assumption is that the link will fail unless the correct libraries are in 
the list. But apparently this is not the case: configure returns the first 
candidate that links, yet that candidate does not actually run, which is why it 
appears to be producing "silly" results.

If you set the right MPI and threading library at these locations, instead of 
trying all of them, it might resolve the problems (a sketch of that idea 
follows the excerpts below).

    if self.openmp.found:
      ITHREAD='intel_thread'
      ITHREADGNU='gnu_thread'
      ompthread = 'yes'
    else:
      ITHREAD='sequential'
      ITHREADGNU='sequential'
      ompthread = 'no'

        mkl_blacs_64=[['mkl_blacs_intelmpi'+ILP64+''],['mkl_blacs_mpich'+ILP64+''],['mkl_blacs_sgimpt'+ILP64+''],['mkl_blacs_openmpi'+ILP64+'']]
        mkl_blacs_32=[['mkl_blacs_intelmpi'],['mkl_blacs_mpich'],['mkl_blacs_sgimpt'],['mkl_blacs_openmpi']]
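
Concretely, narrowing the candidate list could look something like the 
following sketch. How configure learns which MPI implementation is in use is 
the open question, so the mpi_flavor argument here is a hypothetical stand-in 
for whatever BuildSystem's MPI detection can provide.

# Sketch only: pick the one matching BLACS flavor instead of trying them all.
# 'mpi_flavor' is a hypothetical value supplied by the MPI detection.
def mkl_blacs_candidates(mpi_flavor, ILP64=''):
    flavor_to_lib = {
        'openmpi' : 'mkl_blacs_openmpi',
        'mpich'   : 'mkl_blacs_mpich',
        'intelmpi': 'mkl_blacs_intelmpi',
        'sgimpt'  : 'mkl_blacs_sgimpt',
    }
    lib = flavor_to_lib.get(mpi_flavor)
    if lib is None:
        # Unknown MPI: fall back to the current "check everything" behaviour.
        return [[l+ILP64] for l in ('mkl_blacs_intelmpi','mkl_blacs_mpich',
                                    'mkl_blacs_sgimpt','mkl_blacs_openmpi')]
    return [[lib+ILP64]]

# With OpenMPI detected, only mkl_blacs_openmpi(_lp64) would ever be tried:
mkl_blacs_64 = mkl_blacs_candidates('openmpi', ILP64='_lp64')   # [['mkl_blacs_openmpi_lp64']]
mkl_blacs_32 = mkl_blacs_candidates('openmpi')                  # [['mkl_blacs_openmpi']]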




> On Mar 3, 2021, at 8:22 AM, Eric Chamberland 
> <eric.chamberl...@giref.ulaval.ca> wrote:
> 
> Hi Pierre,
> 
> On 2021-03-03 2:42 a.m., Pierre Jolivet wrote:
>>> If it turns out that there is a problem combining MKL + OpenMP that comes 
>>> from the linking configuration, for example, would it be a good thing to 
>>> have this (--with-openmp=1) tested in the pipelines (with external packages, 
>>> of course)?
>>> 
>> As Barry said, there is not much (if any) OpenMP in PETSc.
>> There are, however, some workers with MKL (+ Intel compilers) turned on, 
>> but I don’t think we test MKL + GNU compilers (which feels like a very 
>> niche combination, hence not really worth testing, IMHO).
> Ouch, this is almost my personal working configuration, and that of most of our 
> users too... and it worked well until I activated the OpenMP thing...
> 
> We had good reasons to work with g++ or clang++ instead of the Intel compilers:
> 
> - You have to pay to work with an Intel compiler (I haven't looked at OneAPI 
> licensing yet, but that may have changed?)
> 
> - No support for Intel compilers with iceccd (so recompilation is slow)
> 
> - MKL was freely distributed, so it can be used with any compiler
> 
> That doesn't mean we don't want to use the Intel compiler, but maybe we just 
> want to do a specific delivery with it while continuing to develop with g++ or 
> clang++ (my personal choice).
> 
> But I understand it is less straightforward to combine gcc and MKL than to use 
> the native Intel tool-chain...
> 

    I agree that MKL + GNU compilers is a commonly used combination and should 
be tested and maintained in PETSc. 
>>> Are the guys who maintain all these libs reading petsc-dev? ;)
>>> 
>> I don’t think they are, but don’t worry, we do forward the appropriate 
>> messages to them :)
> :)
>> 
>> About yesterday’s failures…
>> 1) I cannot reproduce any of the PCHYPRE/PCBDDC/PCHPDDM errors (sorry I 
>> didn’t bother putting the SuperLU_DIST tarball on my cluster)
> Hmmm, maybe my environment variables play a role in this?
> 
> for comparison purposes, we explicitly set:
> 
> export MKL_CBWR=COMPATIBLE
> export MKL_NUM_THREADS=1
> 
> but it would be surprising if that helped reproduce a problem: these settings 
> usually stabilize results...
> 

> Merci,
> 
> Eric
> 
>> 
>> Thanks,
>> Pierre
>> 
>> http://joliv.et/irene-rome-configure.log 
>> $ /usr/bin/gmake -f gmakefile test test-fail=1
>> Using MAKEFLAGS: test-fail=1
>>         TEST 
>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
>>  ok snes_tutorials-ex12_quad_hpddm_reuse_baij
>>  ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
>>         TEST 
>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
>>  ok ksp_ksp_tutorials-ex50_tut_2 # SKIP PETSC_HAVE_SUPERLU_DIST requirement 
>> not met
>>         TEST 
>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
>>  ok snes_tutorials-ex56_hypre
>>  ok diff-snes_tutorials-ex56_hypre
>>         TEST 
>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
>>  ok snes_tutorials-ex17_3d_q3_trig_elas
>>  ok diff-snes_tutorials-ex17_3d_q3_trig_elas
>>         TEST 
>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
>>  ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>  ok diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>         TEST 
>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
>>  ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>  ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>         TEST 
>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
>>  ok snes_tutorials-ex19_tut_3
>>  ok diff-snes_tutorials-ex19_tut_3
>>         TEST arch-linux2-c-opt-ompi/tests/counts/mat_tests-ex242_3.counts
>> not ok mat_tests-ex242_3 # Error code: 137
>> #    [1]PETSC ERROR: 
>> ------------------------------------------------------------------------
>> #    [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, 
>> probably memory access out of range
>> #    [1]PETSC ERROR: Try option -start_in_debugger or 
>> -on_error_attach_debugger
>> #    [1]PETSC ERROR: or see 
>> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind 
>> #    [1]PETSC ERROR: or try http://valgrind.org on 
>> GNU/linux and Apple Mac OS X to find memory corruption errors
>> #    [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, 
>> and run
>> #    [1]PETSC ERROR: to get more information on the crash.
>> #    [1]PETSC ERROR: --------------------- Error Message 
>> --------------------------------------------------------------
>> #    [1]PETSC ERROR: Signal received
>> #    [1]PETSC ERROR: See 
>> https://www.mcs.anl.gov/petsc/documentation/faq.html 
>> for trouble shooting.
>> #    [1]PETSC ERROR: Petsc Development GIT revision: v3.14.4-733-g7ab9467ef9 
>>  GIT Date: 2021-03-02 16:15:11 +0000
>> #    [2]PETSC ERROR: 
>> ------------------------------------------------------------------------
>> #    [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, 
>> probably memory access out of range
>> #    [2]PETSC ERROR: Try option -start_in_debugger or 
>> -on_error_attach_debugger
>> #    [2]PETSC ERROR: or see 
>> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind 
>> #    [2]PETSC ERROR: or try http://valgrind.org on 
>> GNU/linux and Apple Mac OS X to find memory corruption errors
>> #    [2]PETSC ERROR: configure using --with-debugging=yes, recompile, link, 
>> and run
>> #    [2]PETSC ERROR: to get more information on the crash.
>> #    [2]PETSC ERROR: --------------------- Error Message 
>> --------------------------------------------------------------
>> #    [2]PETSC ERROR: Signal received
>> #    [2]PETSC ERROR: See 
>> https://www.mcs.anl.gov/petsc/documentation/faq.html 
>> for trouble shooting.
>> #    [2]PETSC ERROR: Petsc Development GIT revision: v3.14.4-733-g7ab9467ef9 
>>  GIT Date: 2021-03-02 16:15:11 +0000
>> #    [2]PETSC ERROR: 
>> /ccc/work/cont003/rndm/rndm/petsc/arch-linux2-c-opt-ompi/tests/mat/tests/runex242_3/../ex242
>>  on a arch-linux2-c-opt-ompi named irene4047 by jolivetp Wed Mar  3 08:21:20 
>> 2021
>> #    [2]PETSC ERROR: Configure options --download-hpddm 
>> --download-hpddm-commit=origin/main --download-hypre --download-metis 
>> --download-mumps --download-parmetis --download-ptscotch --download-slepc 
>> --download-slepc-commit=origin/main --download-tetgen 
>> --known-mpi-c-double-complex --known-mpi-int64_t --known-mpi-long-double 
>> --with-avx512-kernels=1 
>> --with-blaslapack-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64
>>  --with-cc=mpicc --with-cxx=mpicxx --with-debugging=0 --with-fc=mpifort 
>> --with-fortran-bindings=0 --with-make-np=40 
>> --with-mkl_cpardiso-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281
>>  --with-mkl_cpardiso=1 
>> --with-mkl_pardiso-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl
>>  --with-mkl_pardiso=1 --with-mpiexec=ccc_mprun --with-openmp=1 
>> --with-packages-download-dir=/ccc/cont003/home/enseeiht/jolivetp/Dude/externalpackages/
>>  
>> --with-scalapack-include=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/include
>>  
>> --with-scalapack-lib="[/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64/libmkl_scalapack_lp64.so,/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64/libmkl_blacs_openmpi_lp64.so]"
>>  --with-scalar-type=real --with-x=0 COPTFLAGS="-O3 -fp-model fast -mavx2" 
>> CXXOPTFLAGS="-O3 -fp-model fast -mavx2" FOPTFLAGS="-O3 -fp-model fast 
>> -mavx2" PETSC_ARCH=arch-linux2-c-opt-ompi
>> #    [2]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>> #    [2]PETSC ERROR: Run with -malloc_debug to check if memory corruption is 
>> causing the crash.
>> #    [1]PETSC ERROR: 
>> /ccc/work/cont003/rndm/rndm/petsc/arch-linux2-c-opt-ompi/tests/mat/tests/runex242_3/../ex242
>>  on a arch-linux2-c-opt-ompi named irene4047 by jolivetp Wed Mar  3 08:21:20 
>> 2021
>> #    [1]PETSC ERROR: Configure options --download-hpddm 
>> --download-hpddm-commit=origin/main --download-hypre --download-metis 
>> --download-mumps --download-parmetis --download-ptscotch --download-slepc 
>> --download-slepc-commit=origin/main --download-tetgen 
>> --known-mpi-c-double-complex --known-mpi-int64_t --known-mpi-long-double 
>> --with-avx512-kernels=1 
>> --with-blaslapack-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64
>>  --with-cc=mpicc --with-cxx=mpicxx --with-debugging=0 --with-fc=mpifort 
>> --with-fortran-bindings=0 --with-make-np=40 
>> --with-mkl_cpardiso-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281
>>  --with-mkl_cpardiso=1 
>> --with-mkl_pardiso-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl
>>  --with-mkl_pardiso=1 --with-mpiexec=ccc_mprun --with-openmp=1 
>> --with-packages-download-dir=/ccc/cont003/home/enseeiht/jolivetp/Dude/externalpackages/
>>  
>> --with-scalapack-include=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/include
>>  
>> --with-scalapack-lib="[/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64/libmkl_scalapack_lp64.so,/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64/libmkl_blacs_openmpi_lp64.so]"
>>  --with-scalar-type=real --with-x=0 COPTFLAGS="-O3 -fp-model fast -mavx2" 
>> CXXOPTFLAGS="-O3 -fp-model fast -mavx2" FOPTFLAGS="-O3 -fp-model fast 
>> -mavx2" PETSC_ARCH=arch-linux2-c-opt-ompi
>> #    [1]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>> #    [1]PETSC ERROR: Run with -malloc_debug to check if memory corruption is 
>> causing the crash.
>> #    
>> --------------------------------------------------------------------------
>> #    MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
>> #    with errorcode 50176059.
>> #
>> #    NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> #    You may or may not see output from other processes, depending on
>> #    exactly when Open MPI kills them.
>> #    
>> --------------------------------------------------------------------------
>> #    
>> --------------------------------------------------------------------------
>> #    MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
>> #    with errorcode 50176059.
>> #
>> #    NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> #    You may or may not see output from other processes, depending on
>> #    exactly when Open MPI kills them.
>> #    
>> --------------------------------------------------------------------------
>> #    srun: Job step aborted: Waiting up to 302 seconds for job step to 
>> finish.
>> #    slurmstepd-irene4047: error: *** STEP 1374176.36 ON irene4047 CANCELLED 
>> AT 2021-03-03T08:21:20 ***
>> #    srun: error: irene4047: task 0: Killed
>> #    srun: error: irene4047: tasks 1-2: Exited with exit code 16
>>  ok mat_tests-ex242_3 # SKIP Command failed so no diff
>>         TEST 
>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts
>>  ok snes_tutorials-ex17_3d_q3_trig_vlap
>>  ok diff-snes_tutorials-ex17_3d_q3_trig_vlap
>>         TEST 
>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts
>>  ok snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>  ok diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>         TEST 
>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts
>>  ok ksp_ksp_tutorials-ex49_hypre_nullspace
>>  ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace
>>         TEST 
>> arch-linux2-c-opt-ompi/tests/counts/ts_tutorials-ex18_p1p1_xper_ref.counts
>>  ok ts_tutorials-ex18_p1p1_xper_ref
>>  ok diff-ts_tutorials-ex18_p1p1_xper_ref
>>         TEST 
>> arch-linux2-c-opt-ompi/tests/counts/ts_tutorials-ex18_p1p1_xyper_ref.counts
>>  ok ts_tutorials-ex18_p1p1_xyper_ref
>>  ok diff-ts_tutorials-ex18_p1p1_xyper_ref
>>         TEST 
>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts
>>  ok snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>  ok diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>         TEST 
>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts
>>  ok ksp_ksp_tutorials-ex64_1 # SKIP PETSC_HAVE_SUPERLU_DIST requirement not 
>> met
>> 
>>> On 3 Mar 2021, at 6:21 AM, Eric Chamberland 
>>> <eric.chamberl...@giref.ulaval.ca> wrote:
>>> 
>>> Just started a discussion on the side:
>>> 
>>> https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-MKL-Link-Line-Advisor-as-external-tool/m-p/1260895#M30974
>>> Eric
>>> 
>>> On 2021-03-02 3:50 p.m., Pierre Jolivet wrote:
>>>> Hello Eric,
>>>> src/mat/tests/ex237.c is a recent test with some code paths that should be 
>>>> disabled for “old” MKL versions. It’s tricky to check directly in the 
>>>> source (we do check in BuildSystem) because there is no such thing as 
>>>> PETSC_PKG_MKL_VERSION_LT, but I guess we can change if 
>>>> defined(PETSC_HAVE_MKL) to if defined(PETSC_HAVE_MKL) && 
>>>> defined(PETSC_HAVE_MKL_SPARSE_OPTIMIZE), I’ll make a MR, thanks for 
>>>> reporting this.
>>>> 
>>>> For the other issues, I’m sensing this is a problem with gomp + 
>>>> intel_gnu_thread, but this is pure speculation… sorry.
>>>> I’ll try to reproduce some of these problems if you are not given a more 
>>>> meaningful answer.
>>>> 
>>>> Thanks,
>>>> Pierre
>>>> 
>>>>> On 2 Mar 2021, at 9:14 PM, Eric Chamberland 
>>>>> <eric.chamberl...@giref.ulaval.ca> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> It all started when I wanted to test PETSc/CUDA compatibility for our 
>>>>> code.
>>>>> 
>>>>> I had to activate --with-openmp to configure with --with-cuda=1 
>>>>> successfully.
>>>>> 
>>>>> I then saw that PETSC_HAVE_OPENMP  is used at least in MUMPS (and some 
>>>>> other places).
>>>>> 
>>>>> So, I configured and tested petsc with openmp activated, without CUDA.
>>>>> 
>>>>> The first thing I see is that our code CI pipelines now fails for many 
>>>>> tests.
>>>>> 
>>>>> After looking deeper, it seems that PETSc itself fails many tests when I 
>>>>> activate openmp!
>>>>> 
>>>>> Here are all the configurations I have results for, after/before 
>>>>> activating OpenMP for PETSc:
>>>>> ==============================================================================
>>>>> 
>>>>> ==============================================================================
>>>>> 
>>>>> For petsc/master + OpenMPI 4.0.4 + MKL 2019.4.243:
>>>>> 
>>>>> With OpenMP=1
>>>>> 
>>>>> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.03.02.02h00m02s_make_test.log
>>>>> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.03.02.02h00m02s_configure.log
>>>>> # -------------
>>>>> #   Summary    
>>>>> # -------------
>>>>> # FAILED snes_tutorials-ex12_quad_hpddm_reuse_baij 
>>>>> diff-ksp_ksp_tests-ex33_superlu_dist_2 
>>>>> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0 
>>>>> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1 
>>>>> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0 
>>>>> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1 
>>>>> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0 
>>>>> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1 
>>>>> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0 
>>>>> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1 
>>>>> ksp_ksp_tutorials-ex50_tut_2 diff-ksp_ksp_tests-ex33_superlu_dist 
>>>>> diff-snes_tutorials-ex56_hypre snes_tutorials-ex17_3d_q3_trig_elas 
>>>>> snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij 
>>>>> ksp_ksp_tutorials-ex5_superlu_dist_3 ksp_ksp_tutorials-ex5f_superlu_dist 
>>>>> snes_tutorials-ex12_tri_parmetis_hpddm_baij 
>>>>> diff-snes_tutorials-ex19_tut_3 mat_tests-ex242_3 
>>>>> snes_tutorials-ex17_3d_q3_trig_vlap ksp_ksp_tutorials-ex5f_superlu_dist_3 
>>>>> snes_tutorials-ex19_superlu_dist 
>>>>> diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre 
>>>>> diff-ksp_ksp_tutorials-ex49_hypre_nullspace 
>>>>> ts_tutorials-ex18_p1p1_xper_ref ts_tutorials-ex18_p1p1_xyper_ref 
>>>>> snes_tutorials-ex19_superlu_dist_2 ksp_ksp_tutorials-ex5_superlu_dist_2 
>>>>> diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre 
>>>>> ksp_ksp_tutorials-ex64_1 ksp_ksp_tutorials-ex5_superlu_dist 
>>>>> ksp_ksp_tutorials-ex5f_superlu_dist_2
>>>>> # success 8275/10003 tests (82.7%)
>>>>> # failed 33/10003 tests (0.3%)
>>>>> With OpenMP=0
>>>>> 
>>>>> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.02.26.02h00m16s_make_test.log
>>>>> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.02.26.02h00m16s_configure.log
>>>>> # -------------
>>>>> #   Summary    
>>>>> # -------------
>>>>> # FAILED tao_constrained_tutorials-tomographyADMM_6 
>>>>> snes_tutorials-ex17_3d_q3_trig_elas mat_tests-ex242_3 
>>>>> snes_tutorials-ex17_3d_q3_trig_vlap 
>>>>> tao_leastsquares_tutorials-tomography_1 
>>>>> tao_constrained_tutorials-tomographyADMM_5
>>>>> # success 8262/9983 tests (82.8%)
>>>>> # failed 6/9983 tests (0.1%)
>>>>> ==============================================================================
>>>>> 
>>>>> ==============================================================================
>>>>> 
>>>>> For OpenMPI 3.1.x/master:
>>>>> 
>>>>> With OpenMP=1:
>>>>> 
>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.03.01.22h00m01s_make_test.log
>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.03.01.22h00m01s_configure.log
>>>>> # -------------
>>>>> #   Summary    
>>>>> # -------------
>>>>> # FAILED mat_tests-ex242_3 mat_tests-ex242_2 diff-mat_tests-ex219f_1 
>>>>> diff-dm_tutorials-ex11f90_1 ksp_ksp_tutorials-ex5_superlu_dist_3 
>>>>> diff-ksp_ksp_tutorials-ex49_hypre_nullspace 
>>>>> ksp_ksp_tutorials-ex5f_superlu_dist_3 snes_tutorials-ex17_3d_q3_trig_vlap 
>>>>> diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre 
>>>>> diff-snes_tutorials-ex19_tut_3 diff-snes_tutorials-ex56_hypre 
>>>>> diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre 
>>>>> tao_leastsquares_tutorials-tomography_1 
>>>>> tao_constrained_tutorials-tomographyADMM_4 
>>>>> tao_constrained_tutorials-tomographyADMM_6 
>>>>> diff-tao_constrained_tutorials-toyf_1
>>>>> # success 8142/9765 tests (83.4%)
>>>>> # failed 16/9765 tests (0.2%)
>>>>> With OpenMP=0:
>>>>> 
>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.02.28.22h00m02s_make_test.log
>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.02.28.22h00m02s_configure.log
>>>>> # -------------
>>>>> #   Summary    
>>>>> # -------------
>>>>> # FAILED mat_tests-ex242_3 mat_tests-ex242_2 diff-mat_tests-ex219f_1 
>>>>> diff-dm_tutorials-ex11f90_1 ksp_ksp_tutorials-ex56_2 
>>>>> snes_tutorials-ex17_3d_q3_trig_vlap 
>>>>> tao_leastsquares_tutorials-tomography_1 
>>>>> tao_constrained_tutorials-tomographyADMM_4 
>>>>> diff-tao_constrained_tutorials-toyf_1
>>>>> # success 8151/9767 tests (83.5%)
>>>>> # failed 9/9767 tests (0.1%)
>>>>> ==============================================================================
>>>>> 
>>>>> ==============================================================================
>>>>> 
>>>>> For OpenMPI 4.0.x/master:
>>>>> 
>>>>> With OpenMP=1:
>>>>> 
>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.03.01.20h00m01s_make_test.log
>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.03.01.20h00m01s_configure.log
>>>>> # FAILED snes_tutorials-ex17_3d_q3_trig_elas snes_tutorials-ex19_hypre 
>>>>> ksp_ksp_tutorials-ex56_2 tao_leastsquares_tutorials-tomography_1 
>>>>> tao_constrained_tutorials-tomographyADMM_5 mat_tests-ex242_3 
>>>>> ksp_ksp_tutorials-ex55_hypre ksp_ksp_tutorials-ex5_superlu_dist_2 
>>>>> tao_constrained_tutorials-tomographyADMM_6 snes_tutorials-ex56_hypre 
>>>>> snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre 
>>>>> ksp_ksp_tutorials-ex5f_superlu_dist_3 ksp_ksp_tutorials-ex34_hyprestruct 
>>>>> diff-ksp_ksp_tutorials-ex49_hypre_nullspace 
>>>>> snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre 
>>>>> ksp_ksp_tutorials-ex5f_superlu_dist ksp_ksp_tutorials-ex5f_superlu_dist_2 
>>>>> ksp_ksp_tutorials-ex5_superlu_dist snes_tutorials-ex19_tut_3 
>>>>> snes_tutorials-ex19_superlu_dist ksp_ksp_tutorials-ex50_tut_2 
>>>>> snes_tutorials-ex17_3d_q3_trig_vlap ksp_ksp_tutorials-ex5_superlu_dist_3 
>>>>> snes_tutorials-ex19_superlu_dist_2 
>>>>> tao_constrained_tutorials-tomographyADMM_4 ts_tutorials-ex26_2
>>>>> # success 8125/9753 tests (83.3%)
>>>>> # failed 26/9753 tests (0.3%)
>>>>> With OpenMP=0
>>>>> 
>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.02.28.20h00m04s_make_test.log
>>>>> https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.02.28.20h00m04s_configure.log
>>>>> # FAILED mat_tests-ex242_3
>>>>> # success 8174/9777 tests (83.6%)
>>>>> # failed 1/9777 tests (0.0%)
>>>>> 
>>>>> ==============================================================================
>>>>> 
>>>>> ==============================================================================
>>>>> 
>>>>> Is that known and normal?
>>>>> 
>>>>> In all cases, I am using MKL and I suspect it  may come from there... :/
>>>>> 
>>>>> I also saw a second problem, "make test" fails to compile petsc examples 
>>>>> on older versions of MKL (but that's less important for me, I just 
>>>>> upgraded to OneAPI to avoid this, but you may want to know):
>>>>> 
>>>>> https://giref.ulaval.ca/~cmpgiref/dernier_ompi/2021.03.02.02h16m01s_make_test.log
>>>>> https://giref.ulaval.ca/~cmpgiref/dernier_ompi/2021.03.02.02h16m01s_configure.log
>>>>> Thanks,
>>>>> 
>>>>> Eric
>>>>> 
>>>>> -- 
>>>>> Eric Chamberland, ing., M. Ing
>>>>> Professionnel de recherche
>>>>> GIREF/Université Laval
>>>>> (418) 656-2131 poste 41 22 42
>>>> 
>>> -- 
>>> Eric Chamberland, ing., M. Ing
>>> Professionnel de recherche
>>> GIREF/Université Laval
>>> (418) 656-2131 poste 41 22 42
>> 
> -- 
> Eric Chamberland, ing., M. Ing
> Professionnel de recherche
> GIREF/Université Laval
> (418) 656-2131 poste 41 22 42
