Re: [petsc-dev] Petsc "make test" have more failures for --with-openmp=1

2021-03-02 Thread Pierre Jolivet
> If it turns out that there is a problem combining MKL + OpenMP that depends 
> on the link configuration, for example, would it be a good thing to have 
> this (--with-openmp=1) tested in the pipelines (with external packages, of 
> course)?
> 
As Barry said, there is not much (if any) OpenMP in PETSc.
There are, however, some workers with the MKL (+ Intel compilers) turned on, but I 
don’t think we test MKL + GNU compilers (which I feel is a very niche 
combination, hence not really worth testing, IMHO).
> Do the guys who maintain all these libs read petsc-dev? ;)
> 
I don’t think they are, but don’t worry, we do forward the appropriate messages 
to them :)

About yesterday’s failures…
1) I cannot reproduce any of the PCHYPRE/PCBDDC/PCHPDDM errors (sorry, I didn’t 
bother putting the SuperLU_DIST tarball on my cluster).
2) I can reproduce the src/mat/tests/ex242.c error (which explicitly uses 
ScaLAPACK; none of the above PCs use it explicitly, except PCBDDC/PCHPDDM when 
using MUMPS on “big” problems where root nodes are factorized using ScaLAPACK, 
see -mat_mumps_icntl_13).
3) I’m seeing that, both on your machine and mine, PETSc BuildSystem insists on 
linking libmkl_blacs_intelmpi_lp64.so even though we explicitly supply 
libmkl_blacs_openmpi_lp64.so.
This, for example, yields a wrong Makefile.inc for MUMPS:
$ cat arch-linux2-c-opt-ompi/externalpackages/MUMPS_5.3.5/Makefile.inc | grep blacs
SCALAP  = […] -lmkl_blacs_openmpi_lp64
LIBBLAS = […] -lmkl_blacs_intelmpi_lp64 -lgomp -ldl -lpthread -lm […]

Despite what Barry says, I think PETSc is partially to blame as well (why use 
libmkl_blacs_intelmpi_lp64.so even though BuildSystem is capable of detecting 
that we are using OpenMPI?).
I’ll try to fix this to see if it solves 2).
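
A quick way to check a generated file for this kind of mix-up, as a minimal sketch only. The helper name blacs_flavors and the sample file are made up for illustration; the sample merely mimics the two Makefile.inc lines quoted above, without their elided parts.

```shell
# Hypothetical helper: list the distinct MKL BLACS flavors referenced in a file.
# More than one line of output means the link lines mix MPI flavors.
blacs_flavors() {
  grep -o 'mkl_blacs_[a-z]*mpi_lp64' "$1" | sort -u
}

# Throwaway sample file (illustrative content, not a real Makefile.inc).
tmp=$(mktemp)
printf '%s\n' \
  'SCALAP  = -lmkl_blacs_openmpi_lp64' \
  'LIBBLAS = -lmkl_blacs_intelmpi_lp64 -lgomp -ldl -lpthread -lm' > "$tmp"

blacs_flavors "$tmp"
```

On the sample this lists both the intelmpi and the openmpi flavor, which is the inconsistency described in 3).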

Thanks,
Pierre

http://joliv.et/irene-rome-configure.log 

$ /usr/bin/gmake -f gmakefile test test-fail=1
Using MAKEFLAGS: test-fail=1
TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
 ok snes_tutorials-ex12_quad_hpddm_reuse_baij
 ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
 ok ksp_ksp_tutorials-ex50_tut_2 # SKIP PETSC_HAVE_SUPERLU_DIST requirement not met
TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
 ok snes_tutorials-ex56_hypre
 ok diff-snes_tutorials-ex56_hypre
TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
 ok snes_tutorials-ex17_3d_q3_trig_elas
 ok diff-snes_tutorials-ex17_3d_q3_trig_elas
TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
 ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
 ok diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
 ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
 ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
 ok snes_tutorials-ex19_tut_3
 ok diff-snes_tutorials-ex19_tut_3
TEST arch-linux2-c-opt-ompi/tests/counts/mat_tests-ex242_3.counts
not ok mat_tests-ex242_3 # Error code: 137
#   [1]PETSC ERROR:
#   [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
#   [1]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
#   [1]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
#   [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
#   [1]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
#   [1]PETSC ERROR: to get more information on the crash.
#   [1]PETSC ERROR: - Error Message --
#   [1]PETSC ERROR: Signal received
#   [1]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
#   [1]PETSC ERROR: Petsc Development GIT revision: v3.14.4-733-g7ab9467ef9  GIT Date: 2021-03-02 16:15:11 +
#   [2]PETSC ERROR:
#   [2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
#   [2]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
#   [2]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
#   [2]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
#   [2]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
#   [2]PETSC ERROR: to get more 

Re: [petsc-dev] Commit squashing in MR

2021-03-02 Thread Patrick Sanan
The whole section on git in the dev manual needs some attention. (It was moved 
there in the consolidation of docs we had scattered in various places, but 
hasn't been expertly updated yet). Ideal, I think, would be to find some good, 
external instructions and link to them, under the idea that we should only 
maintain things in our own docs that aren't adequately documented somewhere 
else. This might not be possible (since we had to create these instructions in 
the first place).

There is a section on squashing but it's currently a bit buried, and the advice 
in this thread is probably more useful/current
https://docs.petsc.org/en/main/developers/integration/#squashing-excessive-commits

If anyone wants to go in there and quickly update those docs, remember that you 
can do so all from web interfaces! This workflow still has some wrinkles, but 
for small changes I still think it's appealing:

- go to the docs page you want to edit on docs.petsc.org
- select the version you want (usually "main") in the black ReadTheDocs box in 
the lower right
- click "edit" in "on GitLab" and make your MR (name the branch with "docs-" to 
maybe get it to auto-build on ReadTheDocs, label with docs and docs-only)
- if you get feedback on your MR and need to update, or notice a typo, I 
*think* this will work:
   - click on the last commit of your new branch
   - find the offending file
   - click on "edit at @deadbeef123"
   - change the branch *back* to your branch in the pulldown
   - click "edit"
- back in your MR, edit to "squash commits"

You can get a partial preview with the usual "preview" button, though not 
everything is interpreted correctly (but for things like links, it works fine).

If you want a full preview, you can

1. Build the Sphinx docs locally from your branch, either with
- "make sphinx-docs-all LOC=$PETSC_DIR"  (you may need to add 
PYTHON=python3, since this relies on Python 3.3+ for venv) 
- install the required Python packages yourself (e.g. pip install -r 
src/docs/sphinx_docs/requirements.txt), go to src/docs/sphinx_docs, run "make 
html", and look in _build/html

2. Build the Sphinx docs for your branch as a version on ReadTheDocs. There is 
currently an automation rule there that if your branch name has "docs-" in it, 
it should build (though I must admit I'm still not completely sure I understand 
exactly when RTD updates its information from GitLab). Or, if you have access, 
you can activate a new version yourself.



> On 03.03.2021 at 05:32, Jed Brown wrote:
> 
> Satish Balay via petsc-dev writes:
> 
>> On Wed, 3 Mar 2021, Blaise A Bourdin wrote:
>> 
>>> Hi,
>>> 
>>> This is not technically a petsc question. 
>>> It would be great to have a short section in the PETSc integration workflow 
>>> document explaining how to squash commits in a MR for git-impaired 
>>> developers like me.
>>> 
>>> Anybody wants to pitch in, or explain me how to do this?
>> 
>> To squash commits - I use the 'squash' action in 'git rebase -i HASH' and 
>> figure out the HASH to use from 'gitk main..branch'
>> 
>> [as git rebase requires the commit prior to the first commit of interest]
>> 
>> git provides many ways of modifying the branch (and the rebase topic is very 
>> generic) so I think it's best to rely on proper git docs/tutorials
>> [and it's not really specific to petsc workflow]
> 
> You can do it in one line, without changing the base:
> 
>  git rebase -i $(git merge-base main HEAD)
> 
> 
> An alternative is
> 
>  git rebase -i main
> 
> which gives you interactive rebase to replay on top of current 'main'. This 
> does two things at once and changing the base for your branch is not always 
> desirable.



Re: [petsc-dev] Petsc "make test" have more failures for --with-openmp=1

2021-03-02 Thread Eric Chamberland

Just started a discussion on the side:

https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-MKL-Link-Line-Advisor-as-external-tool/m-p/1260895#M30974

Eric

On 2021-03-02 3:50 p.m., Pierre Jolivet wrote:

Hello Eric,
src/mat/tests/ex237.c is a recent test with some code paths that 
should be disabled for “old” MKL versions. It’s tricky to check 
directly in the source (we do check in BuildSystem) because there is 
no such thing as PETSC_PKG_MKL_VERSION_LT, but I guess we can change 
if defined(PETSC_HAVE_MKL) to if defined(PETSC_HAVE_MKL) && 
defined(PETSC_HAVE_MKL_SPARSE_OPTIMIZE), I’ll make a MR, thanks for 
reporting this.
For the other issues, I’m sensing this is a problem with gomp + 
intel_gnu_thread, but this is pure speculation… sorry.
I’ll try to reproduce some of these problems if you are not given a 
more meaningful answer.

Thanks,
Pierre
On 2 Mar 2021, at 9:14 PM, Eric Chamberland wrote:


Hi,

It all started when I wanted to test PETSC/CUDA compatibility for our 
code.


I had to activate --with-openmp to configure with --with-cuda=1 
successfully.


I then saw that PETSC_HAVE_OPENMP  is used at least in MUMPS (and 
some other places).


So, I configured and tested petsc with openmp activated, without CUDA.

The first thing I see is that our code's CI pipelines now fail for 
many tests.


After looking deeper, it seems that PETSc itself fails many tests 
when I activate openmp!


Here are all the configurations I have results for, after/before 
activating OpenMP for PETSc:


==

==

For petsc/master + OpenMPI 4.0.4 + MKL 2019.4.243:

With OpenMP=1

https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.03.02.02h00m02s_make_test.log

https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.03.02.02h00m02s_configure.log

# -
#   Summary
# -
# FAILED snes_tutorials-ex12_quad_hpddm_reuse_baij 
diff-ksp_ksp_tests-ex33_superlu_dist_2 
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0 
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1 
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0 
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1 
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0 
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1 
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0 
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1 
ksp_ksp_tutorials-ex50_tut_2 diff-ksp_ksp_tests-ex33_superlu_dist 
diff-snes_tutorials-ex56_hypre snes_tutorials-ex17_3d_q3_trig_elas 
snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij 
ksp_ksp_tutorials-ex5_superlu_dist_3 ksp_ksp_tutorials-ex5f_superlu_dist 
snes_tutorials-ex12_tri_parmetis_hpddm_baij diff-snes_tutorials-ex19_tut_3 
mat_tests-ex242_3 snes_tutorials-ex17_3d_q3_trig_vlap 
ksp_ksp_tutorials-ex5f_superlu_dist_3 snes_tutorials-ex19_superlu_dist 
diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre 
diff-ksp_ksp_tutorials-ex49_hypre_nullspace ts_tutorials-ex18_p1p1_xper_ref 
ts_tutorials-ex18_p1p1_xyper_ref snes_tutorials-ex19_superlu_dist_2 
ksp_ksp_tutorials-ex5_superlu_dist_2 
diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre 
ksp_ksp_tutorials-ex64_1 ksp_ksp_tutorials-ex5_superlu_dist 
ksp_ksp_tutorials-ex5f_superlu_dist_2
# success 8275/10003 tests (82.7%)
#*failed 33/10003*  tests (0.3%)

With OpenMP=0

https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.02.26.02h00m16s_make_test.log

https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.02.26.02h00m16s_configure.log

# -
#   Summary
# -
# FAILED tao_constrained_tutorials-tomographyADMM_6 
snes_tutorials-ex17_3d_q3_trig_elas mat_tests-ex242_3 
snes_tutorials-ex17_3d_q3_trig_vlap tao_leastsquares_tutorials-tomography_1 
tao_constrained_tutorials-tomographyADMM_5
# success 8262/9983 tests (82.8%)
#*failed 6/9983*  tests (0.1%)

==

==

For OpenMPI 3.1.x/master:

With OpenMP=1:

https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.03.01.22h00m01s_make_test.log

https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.03.01.22h00m01s_configure.log

# -
#   Summary
# -
# FAILED mat_tests-ex242_3 mat_tests-ex242_2 diff-mat_tests-ex219f_1 
diff-dm_tutorials-ex11f90_1 ksp_ksp_tutorials-ex5_superlu_dist_3 
diff-ksp_ksp_tutorials-ex49_hypre_nullspace 
ksp_ksp_tutorials-ex5f_superlu_dist_3 snes_tutorials-ex17_3d_q3_trig_vlap 
diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre 
diff-snes_tutorials-ex19_tut_3 diff-snes_tutorials-ex56_hypre 
diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre 
tao_leastsquares_tutorials-tomography_1 

Re: [petsc-dev] Petsc "make test" have more failures for --with-openmp=1

2021-03-02 Thread Eric Chamberland

Hi Pierre,

On 2021-03-02 3:50 p.m., Pierre Jolivet wrote:

Hello Eric,
src/mat/tests/ex237.c is a recent test with some code paths that 
should be disabled for “old” MKL versions. It’s tricky to check 
directly in the source (we do check in BuildSystem) because there is 
no such thing as PETSC_PKG_MKL_VERSION_LT, but I guess we can change 
if defined(PETSC_HAVE_MKL) to if defined(PETSC_HAVE_MKL) && 
defined(PETSC_HAVE_MKL_SPARSE_OPTIMIZE), I’ll make a MR, thanks for 
reporting this.

Just saw your MR, thanks for the fix! :)
For the other issues, I’m sensing this is a problem with gomp + 
intel_gnu_thread, but this is pure speculation… sorry.
I’ll try to reproduce some of these problems if you are not given a 
more meaningful answer.


ok, I feel like I am the first to have problems with this...

If it turns out that there is a problem combining MKL + OpenMP that depends 
on the link configuration, for example, would it be a good thing to have 
this (--with-openmp=1) tested in the pipelines (with external packages, 
of course)?


Thanks,

Eric


Thanks,
Pierre

Re: [petsc-dev] Petsc "make test" have more failures for --with-openmp=1

2021-03-02 Thread Eric Chamberland


On 2021-03-02 10:59 p.m., Barry Smith wrote:


  It could be related to MKL but it could also be due to problems with 
Scalapack when used with OpenMP. Do you need Scalapack? Maybe you want 
to use it since it is used by MUMPS?

Yes, exactly for mumps!



#2: I can deal with that! :)


#3: I am not sure if this output is due to the way I configure 
OpenMPI/3.x:


  $ ./configure --prefix=/opt/openmpi-3.x_debug --enable-debug \
      --enable-picky CXXFLAGS=-std=c++14 --with-wrapper-cxxflags=-std=c++14 \
      --with-cma


or this export:

export OMPI_MCA_plm_base_verbose=5

which I left there to track an intermittent bug at singleton startup 
(https://www.mail-archive.com/devel@lists.open-mpi.org/msg19568.html)..


I will remove this now: 4 years later, it does not happen anymore... 
But I don't think it should be harmful for PETSc tests, should it?


I am guessing that this is just informative information that does not 
indicate a problem. But I am confused why it appears only 
occasionally, presumably it is related to the current state of the 
system.


But the PETSc tests have no way of knowing that this type of output to 
stdout or stderr is "harmless" informative information versus an 
indication of something being seriously broken. One way of checking 
PETSc tests in make test is to process the output and look for 
something that is not "normal" and this is definitely not normal.
I think you need to turn off the verbosity when running PETSc tests 
and then hopefully this particular problem will go away.


  ok tao_constrained_tutorials-toyf_1
not ok diff-tao_constrained_tutorials-toyf_1 # Error code: 1
#   0a1,17
#   > [zorg:09243] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
#   > [zorg:09243] plm:base:set_hnp_name: initial bias 9243 nodename hash 810220270
#   > [zorg:09243] plm:base:set_hnp_name: final jobfam 61119
#   > [zorg:09243] [[61119,0],0] plm:rsh_setup on agent ssh : rsh path NULL
#   > [zorg:09243] [[61119,0],0] plm:base:receive start comm
#   > [zorg:09243] [[61119,0],0] plm:base:setup_job
#   > [zorg:09243] [[61119,0],0] plm:base:setup_vm
#   > [zorg:09243] [[61119,0],0] plm:base:setup_vm creating map
#   > [zorg:09243] [[61119,0],0] setup:vm: working unmanaged allocation
#   > [zorg:09243] [[61119,0],0] using default hostfile /opt/openmpi-3.x_debug/etc/openmpi-default-hostfile
#   > [zorg:09243] [[61119,0],0] plm:base:setup_vm only HNP in allocation
#   > [zorg:09243] [[61119,0],0] plm:base:setting slots for node zorg by cores
#   > [zorg:09243] [[61119,0],0] complete_setup on job [61119,1]
#   > [zorg:09243] [[61119,0],0] plm:base:launch_apps for job [61119,1]
#   > [zorg:09243] [[61119,0],0] plm:base:launch wiring up iof for job [61119,1]
#   > [zorg:09243] [[61119,0],0] plm:base:launch job [61119,1] is not a dynamic spawn
#   > [zorg:09243] [[61119,0],0] plm:base:launch [61119,1] registered
#   58a76,77
#   > [zorg:09243] [[61119,0],0] plm:base:orted_cmd sending orted_exit commands
#   > [zorg:09243] [[61119,0],0] plm:base:receive stop comm

Understood: here we did some workaround to filter these out for 
stdout/stderr comparison we do in nightly tests.
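
The workaround itself is not shown in this thread. The following is only a guess at what such a filter could look like; the function name filter_ompi_verbose and the sample input are invented for illustration, and the pattern assumes every launcher line starts with a "[host:pid] " tag as in the diff above.

```shell
# Hypothetical filter: drop lines that start with an Open MPI "[host:pid] " tag.
# PETSc's own "[1]PETSC ERROR: ..." lines have no ":pid" part, so they are kept.
filter_ompi_verbose() {
  grep -v -E '^\[[A-Za-z0-9_.-]+:[0-9]+\] '
}

# Sample input: two launcher lines taken from the diff above, wrapped around
# one invented line of program output.
printf '%s\n' \
  '[zorg:09243] [[61119,0],0] plm:base:receive start comm' \
  'invented program output' \
  '[zorg:09243] [[61119,0],0] plm:base:receive stop comm' |
  filter_ompi_verbose
```

Running "unset OMPI_MCA_plm_base_verbose" before the tests avoids the extra output at the source, so a filter like this is only needed when the verbosity has to stay on.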




#4: I do have a default choice for L2 projections which uses HYPRE 
BoomerAMG preconditioning...that now ends with 
KSP_DIVERGED_INDEFINITE_PC, so this is definitely a problem...


#5: we do sometimes use superlu-dist


  I'm afraid that for the possible MUMPS, SuperLU_DIST and hypre problems 
you need to debug them one at a time by running the particular 
troublesome example in the debugger to determine the problem.  It 
could also be due to the relationship between the MKL and the OpenMP 
implementation. I don't know exactly how MKL's multi-threaded code 
runs in relation to OpenMP and certainly if the compiler is providing 
a different OpenMP than MKL is using it will not work.


Do the guys who maintain all these libs read petsc-dev? ;)




  Under most circumstances if you are using MKL with threading and 
PETSc you likely only want to use one MKL thread since PETSc already 
handles the maximum parallelism with MPI and there are no "extra" 
processors available to parallelize the BLAS/LAPACK called from PETSc 
for more performance inside the MKL.   This may not be true if you are 
using MUMPS which makes things far more complicated.
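
As a small sketch of the one-thread advice: MKL_NUM_THREADS and OMP_NUM_THREADS are the standard MKL and OpenMP environment variables; they are not named in this thread, and whether 1 is the right value depends on the setup (see the MUMPS caveat above).

```shell
# Assumption: one MPI rank per core, so extra BLAS/LAPACK threads would only
# oversubscribe the machine.
export MKL_NUM_THREADS=1   # thread count used inside MKL
export OMP_NUM_THREADS=1   # default thread count for OpenMP regions in other packages

echo "MKL_NUM_THREADS=$MKL_NUM_THREADS OMP_NUM_THREADS=$OMP_NUM_THREADS"
```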


 OpenMP is complicated in the context of PETSc and several external 
packages because different packages may use it in different ways that 
require different tuning and I won't know the tuning for each.


okay: we do not use OpenMP either... we rely entirely on MPI for parallelism 
too... So if I could just compile for CUDA without it, I would be happy.


But do you think it could be turned on/off only for specific packages 
at configuration time?  Given the bugs encountered, it is not 
worthwhile to activate it for all external packages...


regards,

Eric



  Barry





#6: we do start 

Re: [petsc-dev] Commit squashing in MR

2021-03-02 Thread Jed Brown
Satish Balay via petsc-dev  writes:

> On Wed, 3 Mar 2021, Blaise A Bourdin wrote:
>
>> Hi,
>> 
>> This is not technically a petsc question. 
>> It would be great to have a short section in the PETSc integration workflow 
>> document explaining how to squash commits in a MR for git-impaired 
>> developers like me.
>> 
>> Anybody wants to pitch in, or explain me how to do this?
>
> To squash commits - I use the 'squash' action in 'git rebase -i HASH' and 
> figure out the HASH to use from 'gitk main..branch'
>
> [as git rebase requires the commit prior to the first commit of interest]
>
> git provides many ways of modifying the branch (and the rebase topic is very 
> generic) so I think it's best to rely on proper git docs/tutorials
> [and it's not really specific to petsc workflow]

You can do it in one line, without changing the base:

  git rebase -i $(git merge-base main HEAD)


An alternative is

  git rebase -i main

which gives you interactive rebase to replay on top of current 'main'. This 
does two things at once and changing the base for your branch is not always 
desirable.
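
To try the first form without touching a real branch, here is a sketch in a throwaway repository. The repository, branch name, file names and commit messages are all invented, and the todo list is edited by sed (GNU sed assumed for -i) rather than by hand, using 'fixup' so that no commit-message editor opens.

```shell
# Throwaway repository so the demo touches nothing real.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main        # name the initial branch 'main'
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base on main"

git checkout -q -b feature
for i in 1 2 3; do
  echo "change $i" >> work.txt
  git add work.txt
  git commit -q -m "feature commit $i"
done

# Same rebase command as above; sed turns every 'pick' after the first
# into 'fixup', which melds those commits into the first one.
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/fixup/'" \
  git rebase -i "$(git merge-base main HEAD)" >/dev/null 2>&1

git log --oneline main..HEAD                 # commits on the branch after squashing
```

The branch still starts from the same commit of main as before; only its own commits were rewritten.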


Re: [petsc-dev] Commit squashing in MR

2021-03-02 Thread Barry Smith


> On Mar 2, 2021, at 10:09 PM, Jacob Faibussowitsch  wrote:
> 
>>   I do not get this. I thought that rebasing with main put all the main 
>> changes in before your commits. I have never seen any interspersed, so I do 
>> not understand this.
> 
> Mark, do you rebase your branch over main first or do you merge main into 
> your branch to update it? If you rebase, you can pick an N such that you only 
> get your commits from the branch.

   I just do 

git rebase main   (from my branch) 

 I don't need to provide an N, it automatically puts all the new changes in 
main BEFORE my commits in my branch. I would never merge my branch with main.

I put the N in when I am rebasing my branch "against itself" and trying to 
organize the commits within my branch. By having the N be within the range of 
my commits it means I am just reorganizing my commits and not messing with any 
previous commits that came before my branch.
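
A small sketch of that ordering in a throwaway repository. The helper commit_file, the branch name my-branch, and all file names and commit messages are invented for the demo.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main        # name the initial branch 'main'
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"

commit_file() {   # hypothetical helper: commit_file <file> <message>
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit_file main1.txt "main 1"
git checkout -q -b my-branch
commit_file mine1.txt "mine 1"
commit_file mine2.txt "mine 2"

git checkout -q main
commit_file main2.txt "main 2"     # main moves on while the branch exists

git checkout -q my-branch
git rebase -q main                 # replay my commits on top of current main

git log --format=%s                # newest first
```

In this demo, N=2 in 'git rebase -i HEAD~N' would cover exactly the two branch commits and nothing from main.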




> 
> Best regards,
> 
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: (312) 694-3391
> 
>> On Mar 2, 2021, at 22:02, Barry Smith wrote:
>> 
>> 
>> 
>>> On Mar 2, 2021, at 9:24 PM, Mark Adams wrote:
>>> 
>>> Ah, 'git graph' I will try that next time.
>>> 
>>> I use 'git rebase -i HEAD~N', but you need an N. 
>>> 
>>> After you 'git rebase origin/main' you get other commits interspersed in 
>>> with yours,
>> 
>>I do not get this. I thought that rebasing with main put all the main 
>> changes in before your commits. I have never seen any interspersed, so I do 
>> not understand this.
>> 
>>   Barry
>> 
>>> so I try to rebase -i before rebasing over main. Then rebase over main and 
>>> you have a clean and updated branch. 
>>> 
>>> Pick N to be large enough to cover the commits that you want to clean up. 
>>> Don't touch the ones that are not yours from main, the last time you 
>>> rebased over main.
>>> 
>>> On Tue, Mar 2, 2021 at 10:02 PM Junchao Zhang wrote:
>>> I am a naive git user, so I use interactive git rebase.  Suppose I am on 
>>> the branch I want to modify, 
>>> 
>>> 1) Use git graph to locate an upstream commit to be used as the base
>>> $ git graph
>>> * 0d5433e9 (HEAD -> jczhang/sf-change-api) SF: rename SFCreateEmbeddedSF to 
>>> SFCreateEmbeddedRootSF
>>> * e7314fbb SF: add an MPI_Op argument to SFBcast
>>> * 83df288d Replace MPIU_REPLACE with MPI_REPLACE
>>> *   b434c516 Merge branch 'barry/2021-02-02/petscsf-communication-specific' 
>>> into 'main'
>>> |\
>>> | * 62152ded (barry/2021-02-02/petscsf-communication-specific) 
>>> PetscSFView() never called viewer for the specific type (bug), hence many 
>>> output files were incorrect.
>>> * |   a4f5d9b4 Merge branch 'jose/upgrade-magma' into 'main'
>>> 
>>> 2) Suppose we choose b434c516 as the base. All commits we want to squash 
>>> are after it.  Do interactive git rebase. It shows a screen for you to 
>>> edit.  Read the help, which is helpful for new users
>>>   $ git rebase -i b434c516
>>> pick 83df288d Replace MPIU_REPLACE with MPI_REPLACE
>>> pick e7314fbb SF: add an MPI_Op argument to SFBcast
>>> pick 0d5433e9 SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF
>>> 
>>> # Rebase b434c516..0d5433e9 onto b434c516 (3 commands)
>>> #
>>> # Commands:
>>> # p, pick <commit> = use commit
>>> # r, reword <commit> = use commit, but edit the commit message
>>> # e, edit <commit> = use commit, but stop for amending
>>> # s, squash <commit> = use commit, but meld into previous commit
>>> # f, fixup <commit> = like "squash", but discard this commit's log message
>>> # x, exec <command> = run command (the rest of the line) using shell
>>> # b, break = stop here (continue rebase later with 'git rebase --continue')
>>> # d, drop <commit> = remove commit
>>> # l, label <label> = label current HEAD with a name
>>> # t, reset <label> = reset HEAD to a label
>>> # m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
>>> # .   create a merge commit using the original merge commit's
>>> # .   message (or the oneline, if no original merge commit was
>>> # .   specified). Use -c <commit> to reword the commit message.
>>> #
>>> # These lines can be re-ordered; they are executed from top to bottom.
>>> #
>>> # If you remove a line here THAT COMMIT WILL BE LOST.
>>> #
>>> # However, if you remove everything, the rebase will be aborted.
>>> #
>>> # Note that empty commits are commented out
>>> 
>>> 3) Suppose we want to squash the last two commits to 83df288d, replace 
>>> their pick with s (or f, see the help for difference), save and exit the 
>>> screen
>>> pick 83df288d Replace MPIU_REPLACE with MPI_REPLACE
>>> s e7314fbb SF: add an MPI_Op argument to SFBcast
>>> s 0d5433e9 SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF
>>> 
>>> A new screen shows up
>>> 
>>> # This is a combination of 3 commits.
>>> # This is the 1st commit message:
>>> 
>>> Replace MPIU_REPLACE with MPI_REPLACE
>>> 
>>> Since we believe all MPI implementations support MPI_REPLACE
>>> 
>>> # This is the commit message #2:
>>> 
>>> 

Re: [petsc-dev] Commit squashing in MR

2021-03-02 Thread Jacob Faibussowitsch
>   I do not get this. I thought that rebasing with main put all the main 
> changes in before your commits. I have never seen any interspersed, so I do 
> not understand this.

Mark, do you rebase your branch over main first or do you merge main into your 
branch to update it? If you rebase, you can pick an N such that you only get 
your commits from the branch.

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)
Cell: (312) 694-3391

> On Mar 2, 2021, at 22:02, Barry Smith  wrote:
> 
> 
> 
>> On Mar 2, 2021, at 9:24 PM, Mark Adams wrote:
>> 
>> Ah, 'git graph' I will try that next time.
>> 
>> I use 'git rebase -i HEAD~N', but you need an N. 
>> 
>> After you 'git rebase origin/main' you get other commits interspersed in 
>> with yours,
> 
>I do not get this. I thought that rebasing with main put all the main 
> changes in before your commits. I have never seen any interspersed, so I do 
> not understand this.
> 
>   Barry
> 
>> so I try to rebase -i before rebasing over main. Then rebase over main and 
>> you have a clean and updated branch. 
>> 
>> Pick N to be large enough to cover the commits that you want to clean up. 
>> Don't touch the ones that are not yours from main, the last time you rebased 
>> over main.
>> 
>> On Tue, Mar 2, 2021 at 10:02 PM Junchao Zhang wrote:
>> I am a naive git user, so I use interactive git rebase.  Suppose I am on the 
>> branch I want to modify, 
>> 
>> 1) Use git graph to locate an upstream commit to be used as the base
>> $ git graph
>> * 0d5433e9 (HEAD -> jczhang/sf-change-api) SF: rename SFCreateEmbeddedSF to 
>> SFCreateEmbeddedRootSF
>> * e7314fbb SF: add an MPI_Op argument to SFBcast
>> * 83df288d Replace MPIU_REPLACE with MPI_REPLACE
>> *   b434c516 Merge branch 'barry/2021-02-02/petscsf-communication-specific' 
>> into 'main'
>> |\
>> | * 62152ded (barry/2021-02-02/petscsf-communication-specific) PetscSFView() 
>> never called viewer for the specific type (bug), hence many output files 
>> were incorrect.
>> * |   a4f5d9b4 Merge branch 'jose/upgrade-magma' into 'main'
>> 
>> 2) Suppose we choose b434c516 as the base. All commits we want to squash are 
>> after it.  Do interactive git rebase. It shows a screen for you to edit.  
>> Read the help, which is helpful for new users
>>   $ git rebase -i b434c516
>> pick 83df288d Replace MPIU_REPLACE with MPI_REPLACE
>> pick e7314fbb SF: add an MPI_Op argument to SFBcast
>> pick 0d5433e9 SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF
>> 
>> # Rebase b434c516..0d5433e9 onto b434c516 (3 commands)
>> #
>> # Commands:
>> # p, pick <commit> = use commit
>> # r, reword <commit> = use commit, but edit the commit message
>> # e, edit <commit> = use commit, but stop for amending
>> # s, squash <commit> = use commit, but meld into previous commit
>> # f, fixup <commit> = like "squash", but discard this commit's log message
>> # x, exec <command> = run command (the rest of the line) using shell
>> # b, break = stop here (continue rebase later with 'git rebase --continue')
>> # d, drop <commit> = remove commit
>> # l, label <label> = label current HEAD with a name
>> # t, reset <label> = reset HEAD to a label
>> # m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
>> # .   create a merge commit using the original merge commit's
>> # .   message (or the oneline, if no original merge commit was
>> # .   specified). Use -c <commit> to reword the commit message.
>> #
>> # These lines can be re-ordered; they are executed from top to bottom.
>> #
>> # If you remove a line here THAT COMMIT WILL BE LOST.
>> #
>> # However, if you remove everything, the rebase will be aborted.
>> #
>> # Note that empty commits are commented out
>> 
>> 3) Suppose we want to squash the last two commits to 83df288d, replace their 
>> pick with s (or f, see the help for difference), save and exit the screen
>> pick 83df288d Replace MPIU_REPLACE with MPI_REPLACE
>> s e7314fbb SF: add an MPI_Op argument to SFBcast
>> s 0d5433e9 SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF
>> 
>> A new screen shows up
>> 
>> # This is a combination of 3 commits.
>> # This is the 1st commit message:
>> 
>> Replace MPIU_REPLACE with MPI_REPLACE
>> 
>> Since we believe all MPI implementations support MPI_REPLACE
>> 
>> # This is the commit message #2:
>> 
>> SF: add an MPI_Op argument to SFBcast
>> 
>> # This is the commit message #3:
>> 
>> SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF
>> 
>> # Please enter the commit message for your changes. Lines starting
>> # with '#' will be ignored, and an empty message aborts the commit.
>> 
>> 4) Edit the commit message as you want, save and exit, done!
>> 
>> --Junchao Zhang
>> 
>> 
>> On Tue, Mar 2, 2021 at 6:19 PM Blaise A Bourdin wrote:
>> Hi,
>> 
>> This is not technically a petsc question. 
>> It would be great to have a short section in the PETSc integration workflow 
>> document explaining how to squash commits in a MR for git-impaired 
>> developers like me.
>> 
>> Anybody 

Re: [petsc-dev] Commit squashing in MR

2021-03-02 Thread Barry Smith


> On Mar 2, 2021, at 9:24 PM, Mark Adams  wrote:
> 
> Ah, 'git graph' I will try that next time.
> 
> I use 'git rebase -i HEAD~N', but you need an N. 
> 
> After you 'git rebase origin/main' you get other commits interspersed in with 
> yours,

   I do not get this. I thought that rebasing onto main puts all of main's 
changes before your commits. I have never seen any interspersed, so I do not 
understand this.

  Barry
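
A small experiment can help check the ordering question above. This is only a sketch in a throwaway repository: the branch name "feature", the file names and the commit messages are all made up for the demo, and a local "main" stands in for origin/main.

```shell
#!/bin/sh
set -e
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "main: base"

# Two commits on a feature branch
git checkout -q -b feature
echo one > one.txt; git add one.txt; git commit -q -m "feature: first"
echo two > two.txt; git add two.txt; git commit -q -m "feature: second"

# main moves ahead while the feature branch is in progress
git checkout -q main
echo new > new.txt; git add new.txt; git commit -q -m "main: new work"

# Replay the feature commits on top of the updated main
git checkout -q feature
git rebase -q main

# Print the history, newest first
history=$(git log --format=%s)
printf '%s\n' "$history"
```

In this linear case the log lists the two feature commits first and main's commits after them, i.e. nothing from main lands between the feature commits. Histories that contain merges from main into the branch may look different.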

> so I try to rebase -i before rebasing over main. Then rebase over main and 
> you have a clean and updated branch. 
> 
> Pick N to be large enough to cover the commits that you want to clean up. 
> Don't touch the ones that are not yours from main, the last time you rebased 
> over main.
> 
> On Tue, Mar 2, 2021 at 10:02 PM Junchao Zhang wrote:
> I am a naive git user, so I use interactive git rebase.  Suppose I am on the 
> branch I want to modify, 
> 
> 1) Use git graph to locate an upstream commit to be used as the base
> $ git graph
> * 0d5433e9 (HEAD -> jczhang/sf-change-api) SF: rename SFCreateEmbeddedSF to 
> SFCreateEmbeddedRootSF
> * e7314fbb SF: add an MPI_Op argument to SFBcast
> * 83df288d Replace MPIU_REPLACE with MPI_REPLACE
> *   b434c516 Merge branch 'barry/2021-02-02/petscsf-communication-specific' 
> into 'main'
> |\
> | * 62152ded (barry/2021-02-02/petscsf-communication-specific) PetscSFView() 
> never called viewer for the specific type (bug), hence many output files were 
> incorrect.
> * |   a4f5d9b4 Merge branch 'jose/upgrade-magma' into 'main'
> 
> 2) Suppose we choose b434c516 as the base. All commits we want to squash are 
> after it.  Do interactive git rebase. It shows a screen for you to edit.  
> Read the help, which is helpful for new users
>   $ git rebase -i b434c516
> pick 83df288d Replace MPIU_REPLACE with MPI_REPLACE
> pick e7314fbb SF: add an MPI_Op argument to SFBcast
> pick 0d5433e9 SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF
> 
> # Rebase b434c516..0d5433e9 onto b434c516 (3 commands)
> #
> # Commands:
> # p, pick  = use commit
> # r, reword  = use commit, but edit the commit message
> # e, edit  = use commit, but stop for amending
> # s, squash  = use commit, but meld into previous commit
> # f, fixup  = like "squash", but discard this commit's log message
> # x, exec  = run command (the rest of the line) using shell
> # b, break = stop here (continue rebase later with 'git rebase --continue')
> # d, drop  = remove commit
> # l, label  = label current HEAD with a name
> # t, reset  = reset HEAD to a label
> # m, merge [-C  | -c ]  [# ]
> # .   create a merge commit using the original merge commit's
> # .   message (or the oneline, if no original merge commit was
> # .   specified). Use -c  to reword the commit message.
> #
> # These lines can be re-ordered; they are executed from top to bottom.
> #
> # If you remove a line here THAT COMMIT WILL BE LOST.
> #
> # However, if you remove everything, the rebase will be aborted.
> #
> # Note that empty commits are commented out
> 
> 3) Suppose we want to squash the last two commits to 83df288d, replace their 
> pick with s (or f, see the help for difference), save and exit the screen
> pick 83df288d Replace MPIU_REPLACE with MPI_REPLACE
> s e7314fbb SF: add an MPI_Op argument to SFBcast
> s 0d5433e9 SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF
> 
> A new screen shows up
> 
> # This is a combination of 3 commits.
> # This is the 1st commit message:
> 
> Replace MPIU_REPLACE with MPI_REPLACE
> 
> Since we believe all MPI implementations support MPI_REPLACE
> 
> # This is the commit message #2:
> 
> SF: add an MPI_Op argument to SFBcast
> 
> # This is the commit message #3:
> 
> SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF
> 
> # Please enter the commit message for your changes. Lines starting
> # with '#' will be ignored, and an empty message aborts the commit.
> 
> 4) Edit the commit message as you want, save and exit, done!
> 
> --Junchao Zhang
> 
> 
> On Tue, Mar 2, 2021 at 6:19 PM Blaise A Bourdin wrote:
> Hi,
> 
> This is not technically a petsc question. 
> It would be great to have a short section in the PETSc integration workflow 
> document explaining how to squash commits in a MR for git-impaired developers 
> like me.
> 
> Anybody wants to pitch in, or explain me how to do this?
> 
> Regards,
> Blaise
> 
> -- 
> A.K. & Shirley Barton Professor of  Mathematics
> Adjunct Professor of Mechanical Engineering
> Adjunct of the Center for Computation & Technology
> Louisiana State University, Lockett Hall Room 344, Baton Rouge, LA 70803, USA
> Tel. +1 (225) 578 1612, Fax  +1 (225) 578 4276 Web 
> http://www.math.lsu.edu/~bourdin 
> 



Re: [petsc-dev] Petsc "make test" have more failures for --with-openmp=1

2021-03-02 Thread Barry Smith


> On Mar 2, 2021, at 9:09 PM, Eric Chamberland 
>  wrote:
> 
> Hi Barry,
> 
> My ultimate goal is to compile a single PETSc version that is CUDA aware.  
> That means that I would like a single binary of our FE code (that will be 
> distributed on Compute Canada clusters) so my users can do CPU or GPU 
> computations.
> 
> However, if activating GPU implies activating OpenMP which implies having many 
> bugs with our code usages (with mumps or Hypre for example), it is a no-go 
> for us right now...  or at least, I have to deliver a CUDA only version..
> 
> 
> 
> #1: I always have the feeling that something may go wrong with MKL linking.  
> I know about the link advisor, but...  What do you think this is related to?
> 

  It could be related to MKL but it could also be due to problems with 
ScaLAPACK when used with OpenMP. Do you need ScaLAPACK? Maybe you want to use 
it since it is used by MUMPS?
> 
> #2: I can deal with that! :)
> 
> 
> 
> #3: I am not sure if this output is due to the way I configure OpenMPI/3.x:
> 
>   $ ./configure --prefix=/opt/openmpi-3.x_debug --enable-debug --enable-picky 
> CXXFLAGS=-std=c++14 --with-wrapper-cxxflags=-std=c++14 --with-cma
> 
> or this export:
> 
> export OMPI_MCA_plm_base_verbose=5
> 
> which I left there to track an intermittent bug at singleton startup 
> (https://www.mail-archive.com/devel@lists.open-mpi.org/msg19568.html).. 
> 
> I will remove this now, 4 years later, it does not happen anymore... But I 
> don't think it should be harmful for PETSc tests, is it?
> 
I am guessing that this is just informational output that does not indicate 
a problem. But I am confused why it appears only occasionally; presumably it is 
related to the current state of the system. 

But the PETSc tests have no way of knowing whether this type of output to stdout 
or stderr is "harmless" information or an indication of 
something being seriously broken. One way make test checks PETSc tests 
is to process the output and look for anything that is not "normal", and this 
is definitely not normal. 
I think you need to turn off the verbosity when running PETSc tests and then 
hopefully this particular problem will go away.

 ok tao_constrained_tutorials-toyf_1
not ok diff-tao_constrained_tutorials-toyf_1 # Error code: 1
#   0a1,17
#   > [zorg:09243] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh 
path NULL
#   > [zorg:09243] plm:base:set_hnp_name: initial bias 9243 nodename hash 
810220270
#   > [zorg:09243] plm:base:set_hnp_name: final jobfam 61119
#   > [zorg:09243] [[61119,0],0] plm:rsh_setup on agent ssh : rsh path NULL
#   > [zorg:09243] [[61119,0],0] plm:base:receive start comm
#   > [zorg:09243] [[61119,0],0] plm:base:setup_job
#   > [zorg:09243] [[61119,0],0] plm:base:setup_vm
#   > [zorg:09243] [[61119,0],0] plm:base:setup_vm creating map
#   > [zorg:09243] [[61119,0],0] setup:vm: working unmanaged allocation
#   > [zorg:09243] [[61119,0],0] using default hostfile 
/opt/openmpi-3.x_debug/etc/openmpi-default-hostfile
#   > [zorg:09243] [[61119,0],0] plm:base:setup_vm only HNP in allocation
#   > [zorg:09243] [[61119,0],0] plm:base:setting slots for node zorg by 
cores
#   > [zorg:09243] [[61119,0],0] complete_setup on job [61119,1]
#   > [zorg:09243] [[61119,0],0] plm:base:launch_apps for job [61119,1]
#   > [zorg:09243] [[61119,0],0] plm:base:launch wiring up iof for job 
[61119,1]
#   > [zorg:09243] [[61119,0],0] plm:base:launch job [61119,1] is not a 
dynamic spawn
#   > [zorg:09243] [[61119,0],0] plm:base:launch [61119,1] registered
#   58a76,77
#   > [zorg:09243] [[61119,0],0] plm:base:orted_cmd sending orted_exit 
commands
#   > [zorg:09243] [[61119,0],0] plm:base:receive stop comm
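
For what it is worth, turning the verbosity off could look like the sketch below. The variable name comes from the export quoted earlier in this thread; the first line only simulates that environment, and the check at the end is an illustrative addition rather than part of any PETSc tooling.

```shell
# Simulate the environment described in the thread
export OMPI_MCA_plm_base_verbose=5

# Remove the launcher verbosity from the current shell before running tests
unset OMPI_MCA_plm_base_verbose

# Count how many matching settings are still exported (grep -c prints 0 if none)
remaining=$(env | grep -c '^OMPI_MCA_plm_base_verbose=' || true)
echo "plm verbosity settings left: $remaining"
```

If the export also lives in a shell startup file, it would need to be removed there as well, otherwise new shells will bring it back.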


> 
> #4: I do have a default choice for L2 projections which uses HYPRE BoomerAMG 
> preconditioning...that now ends with KSP_DIVERGED_INDEFINITE_PC, so this is 
> definitely a problem... 
> 
> #5: we do sometimes use superlu-dist
> 
> 
  
  I'm afraid that for the possible MUMPS, SuperLU_DIST and hypre problems you need 
to debug them one at a time by running the particular troublesome example in 
the debugger to determine the problem.  It could also be due to the 
relationship between the MKL and the OpenMP implementation. I don't know 
exactly how MKL's multi-threaded code runs in relation to OpenMP and certainly 
if the compiler is providing a different OpenMP than MKL is using it will not 
work.

  Under most circumstances if you are using MKL with threading and PETSc you 
likely only want to use one MKL thread since PETSc already handles the maximum 
parallelism with MPI and there are no "extra" processors available to 
parallelize the BLAS/LAPACK called from PETSc for more performance inside the 
MKL.   This may not be true if you are using MUMPS which makes things far more 
complicated.

 OpenMP is 

Re: [petsc-dev] Commit squashing in MR

2021-03-02 Thread Satish Balay via petsc-dev
On Wed, 3 Mar 2021, Blaise A Bourdin wrote:

> Hi,
> 
> This is not technically a petsc question. 
> It would be great to have a short section in the PETSc integration workflow 
> document explaining how to squash commits in a MR for git-impaired developers 
> like me.
> 
> Anybody wants to pitch in, or explain me how to do this?

To squash commits - I use the 'squash' action in 'git rebase -i HASH' and 
figure out the HASH to use from 'gitk main..branch'

[as git rebase requires the commit prior to the first commit of interest]

git provides many ways of modifying the branch (and the rebase topic is very 
generic) so I think it's best to rely on proper git docs/tutorials
[and it's not really specific to petsc workflow]
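
As a command-line alternative to reading the HASH off gitk, 'git merge-base' can report the commit just before the first commit of interest. The sketch below runs in a throwaway repository; the branch name "branch", the file names and the commit messages are hypothetical.

```shell
#!/bin/sh
set -e
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "main: base"
git checkout -q -b branch
echo one > one.txt; git add one.txt; git commit -q -m "branch: first"
echo two > two.txt; git add two.txt; git commit -q -m "branch: second"

# The same commit range that 'gitk main..branch' displays
git log --oneline main..branch

# The commit prior to the first branch commit: the HASH for 'git rebase -i HASH'
base=$(git merge-base main branch)
git log -1 --format='rebase -i target: %h %s' "$base"
```

In a real checkout the last step would be followed by 'git rebase -i "$base"', which opens the todo list with only the branch's own commits.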

Satish

> 
> Regards,
> Blaise
>  
> 



Re: [petsc-dev] Commit squashing in MR

2021-03-02 Thread Mark Adams
Ah, 'git graph' I will try that next time.

I use 'git rebase -i HEAD~N', but you need an N.

After you 'git rebase origin/main' you get other commits interspersed in
with yours, so I try to rebase -i before rebasing over main. Then rebase
over main and you have a clean and updated branch.

Pick N to be large enough to cover the commits that you want to clean up.
Don't touch the commits that are not yours, i.e. the ones that came in from
main the last time you rebased over main.
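
One way to get N without counting by hand is 'git rev-list --count'. The sketch below uses a throwaway repository with a local "main" standing in for origin/main; the branch name "topic" and the file names are made up.

```shell
#!/bin/sh
set -e
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "main: base"
git checkout -q -b topic
for i in 1 2 3; do
  echo "$i" > "file$i.txt"
  git add "file$i.txt"
  git commit -q -m "topic: change $i"
done

# Number of commits on this branch that are not on main
n=$(git rev-list --count main..HEAD)
echo "git rebase -i HEAD~$n"
```

With a real remote, the range would be origin/main..HEAD, and the printed command covers exactly the branch's own commits.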

On Tue, Mar 2, 2021 at 10:02 PM Junchao Zhang 
wrote:

> I am a naive git user, so I use interactive git rebase.  Suppose I am on
> the branch I want to modify,
>
> 1) Use git graph to locate an upstream commit to be used as the base
> $ git graph
> * 0d5433e9 (HEAD -> jczhang/sf-change-api) SF: rename SFCreateEmbeddedSF
> to SFCreateEmbeddedRootSF
> * e7314fbb SF: add an MPI_Op argument to SFBcast
> * 83df288d Replace MPIU_REPLACE with MPI_REPLACE
> *   b434c516 Merge branch
> 'barry/2021-02-02/petscsf-communication-specific' into 'main'
> |\
> | * 62152ded (barry/2021-02-02/petscsf-communication-specific)
> PetscSFView() never called viewer for the specific type (bug), hence many
> output files were incorrect.
> * |   a4f5d9b4 Merge branch 'jose/upgrade-magma' into 'main'
>
> 2) Suppose we choose b434c516 as the base. All commits we want to squash
> are after it.  Do interactive git rebase. It shows a screen for you to
> edit.  Read the help, which is helpful for new users
>   $ git rebase -i b434c516
> pick 83df288d Replace MPIU_REPLACE with MPI_REPLACE
> pick e7314fbb SF: add an MPI_Op argument to SFBcast
> pick 0d5433e9 SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF
>
> # Rebase b434c516..0d5433e9 onto b434c516 (3 commands)
> #
> # Commands:
> # p, pick  = use commit
> # r, reword  = use commit, but edit the commit message
> # e, edit  = use commit, but stop for amending
> # s, squash  = use commit, but meld into previous commit
> # f, fixup  = like "squash", but discard this commit's log message
> # x, exec  = run command (the rest of the line) using shell
> # b, break = stop here (continue rebase later with 'git rebase --continue')
> # d, drop  = remove commit
> # l, label  = label current HEAD with a name
> # t, reset  = reset HEAD to a label
> # m, merge [-C  | -c ]  [# ]
> # .   create a merge commit using the original merge commit's
> # .   message (or the oneline, if no original merge commit was
> # .   specified). Use -c  to reword the commit message.
> #
> # These lines can be re-ordered; they are executed from top to bottom.
> #
> # If you remove a line here THAT COMMIT WILL BE LOST.
> #
> # However, if you remove everything, the rebase will be aborted.
> #
> # Note that empty commits are commented out
>
> 3) Suppose we want to squash the last two commits to 83df288d, replace
> their pick with s (or f, see the help for difference), save and exit the
> screen
> pick 83df288d Replace MPIU_REPLACE with MPI_REPLACE
> s e7314fbb SF: add an MPI_Op argument to SFBcast
> s 0d5433e9 SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF
>
> A new screen shows up
>
> # This is a combination of 3 commits.
> # This is the 1st commit message:
>
> Replace MPIU_REPLACE with MPI_REPLACE
>
> Since we believe all MPI implementations support MPI_REPLACE
>
> # This is the commit message #2:
>
> SF: add an MPI_Op argument to SFBcast
>
> # This is the commit message #3:
>
> SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF
>
> # Please enter the commit message for your changes. Lines starting
> # with '#' will be ignored, and an empty message aborts the commit.
>
> 4) Edit the commit message as you want, save and exit, done!
>
> --Junchao Zhang
>
>
> On Tue, Mar 2, 2021 at 6:19 PM Blaise A Bourdin  wrote:
>
>> Hi,
>>
>> This is not technically a petsc question.
>> It would be great to have a short section in the PETSc integration
>> workflow document explaining how to squash commits in a MR for git-impaired
>> developers like me.
>>
>> Anybody wants to pitch in, or explain me how to do this?
>>
>> Regards,
>> Blaise
>>
>> --
>> A.K. & Shirley Barton Professor of  Mathematics
>> Adjunct Professor of Mechanical Engineering
>> Adjunct of the Center for Computation & Technology
>> Louisiana State University, Lockett Hall Room 344, Baton Rouge, LA 70803,
>> USA
>> Tel. +1 (225) 578 1612, Fax  +1 (225) 578 4276 Web
>> http://www.math.lsu.edu/~bourdin
>>
>>


Re: [petsc-dev] Petsc "make test" have more failures for --with-openmp=1

2021-03-02 Thread Eric Chamberland

Hi Barry,

My ultimate goal is to compile a single PETSc version that is CUDA 
aware.  That means that I would like a single binary of our FE code 
(that will be distributed on Compute Canada clusters) so my users can do 
CPU or GPU computations.


However, if activating GPU implies activating OpenMP which implies having 
many bugs with our code usages (with mumps or Hypre for example), it is 
a no-go for us right now...  or at least, I have to deliver a CUDA only 
version..



#1: I always have the feeling that something may go wrong with MKL 
linking.  I know about the link advisor, but...  What do you think this 
is related to?



#2: I can deal with that! :)


#3: I am not sure if this output is due to the way I configure OpenMPI/3.x:

  $ ./configure --prefix=/opt/openmpi-3.x_debug --enable-debug 
--enable-picky CXXFLAGS=-std=c++14 --with-wrapper-cxxflags=-std=c++14 
--with-cma


or this export:

export OMPI_MCA_plm_base_verbose=5

which I left there to track an intermittent bug at singleton startup 
(https://www.mail-archive.com/devel@lists.open-mpi.org/msg19568.html)..


I will remove this now; 4 years later, it does not happen anymore... But 
I don't think it should be harmful for PETSc tests, should it?



#4: I do have a default choice for L2 projections which uses HYPRE 
BoomerAMG preconditioning...that now ends with 
KSP_DIVERGED_INDEFINITE_PC, so this is definitely a problem...


#5: we do sometimes use superlu-dist

#6: we do start looking at DD solvers like hpddm...

So my killer question is: given the amount of work needed to have all 
these external packages fixed, is it possible to activate OpenMP only 
for the CUDA part?


Thanks,

Eric

On 2021-03-02 3:47 p.m., Barry Smith wrote:


  Eric,

    Thanks for the detailed information.

    I have cc:ed Pierre so he can look at the HPDDM failures.


On Mar 2, 2021, at 2:14 PM, Eric Chamberland wrote:


Hi,

It all started when I wanted to test PETSC/CUDA compatibility for our 
code.


I had to activate --with-openmp to configure with --with-cuda=1 
successfully.



Certain packages like SuperLU_DIST require --with-openmp  if using 
--with-cuda=1 but PETSc's own use of CUDA as well as some other 
packages do not require the --with-openmp.


I then saw that PETSC_HAVE_OPENMP  is used at least in MUMPS (and 
some other places).


So, I configured and tested petsc with openmp activated, without CUDA.

The first thing I see is that our code CI pipelines now fails for 
many tests.


After looking deeper, it seems that PETSc itself fails many tests 
when I activate openmp!


Here are all the configurations I have results for, after/before 
activating OpenMP for PETSc:


There seem to be several distinct issues

1) failures inside Scalapack.

2) possibly slightly different convergence rates for some examples 
changing the number of iterations slightly in PETSc.


3) trouble initializing something outside of PETSc, almost for sure 
not related to PETSc


[zorg:08517] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
#   [zorg:08517] plm:base:set_hnp_name: initial bias 8517 nodename hash 
810220270
#   [zorg:08517] plm:base:set_hnp_name: final jobfam 60385
#   [zorg:08517] [[60385,0],0] plm:rsh_setup on agent ssh : rsh path NULL
#   [zorg:08517] [[60385,0],0] plm:base:receive start comm
#   [zorg:08517] [[60385,0],0] plm:base:setup_job
#   [zorg:08517] [[60385,0],0] plm:base:setup_vm
4) problem with a hypre run Linear solve did not converge due to 
DIVERGED_INDEFINITE_PC iterations 3 , again not likely a PETSc issue 
but a hypre and OpenMP issue

5) Different results for inertia inside an external package
#   1c1
#   <  MatInertia: nneg: 17, nzero: 0, npos: 83
#   ---
#   >  MatInertia: nneg: 21, nzero: 0, npos: 79
 TEST 
arch-linux-c-debug/tests/counts/ksp_ksp_tests-ex33_superlu_dist_2.counts
  ok ksp_ksp_tests-ex33_superlu_dist_2
not ok diff-ksp_ksp_tests-ex33_superlu_dist_2 # Error code: 1
#   1c1
#   <  MatInertia: nneg: 17, nzero: 0, npos: 83
#   ---
#   >  MatInertia: nneg: 25, nzero: 0, npos: 75


6) problems with the external package hpddm

not ok snes_tutorials-ex12_quad_hpddm_reuse_baij # Error code: 139
# 0 SNES Function norm 21.3344
#   [0]PETSC ERROR: 

#   [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, 
probably memory access out of range
#   [0]PETSC ERROR: Try option -start_in_debugger or 
-on_error_attach_debugger
#   [0]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
#   [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
#   [0]PETSC ERROR: likely location of problem given in stack below
#   [0]PETSC ERROR: 

Re: [petsc-dev] Commit squashing in MR

2021-03-02 Thread Junchao Zhang
I am a naive git user, so I use interactive git rebase.  Suppose I am on
the branch I want to modify,

1) Use git graph to locate an upstream commit to be used as the base
$ git graph
* 0d5433e9 (HEAD -> jczhang/sf-change-api) SF: rename SFCreateEmbeddedSF to
SFCreateEmbeddedRootSF
* e7314fbb SF: add an MPI_Op argument to SFBcast
* 83df288d Replace MPIU_REPLACE with MPI_REPLACE
*   b434c516 Merge branch 'barry/2021-02-02/petscsf-communication-specific'
into 'main'
|\
| * 62152ded (barry/2021-02-02/petscsf-communication-specific)
PetscSFView() never called viewer for the specific type (bug), hence many
output files were incorrect.
* |   a4f5d9b4 Merge branch 'jose/upgrade-magma' into 'main'

2) Suppose we choose b434c516 as the base. All commits we want to squash
are after it.  Do interactive git rebase. It shows a screen for you to
edit.  Read the help, which is helpful for new users
  $ git rebase -i b434c516
pick 83df288d Replace MPIU_REPLACE with MPI_REPLACE
pick e7314fbb SF: add an MPI_Op argument to SFBcast
pick 0d5433e9 SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF

# Rebase b434c516..0d5433e9 onto b434c516 (3 commands)
#
# Commands:
# p, pick  = use commit
# r, reword  = use commit, but edit the commit message
# e, edit  = use commit, but stop for amending
# s, squash  = use commit, but meld into previous commit
# f, fixup  = like "squash", but discard this commit's log message
# x, exec  = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop  = remove commit
# l, label  = label current HEAD with a name
# t, reset  = reset HEAD to a label
# m, merge [-C  | -c ]  [# ]
# .   create a merge commit using the original merge commit's
# .   message (or the oneline, if no original merge commit was
# .   specified). Use -c  to reword the commit message.
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out

3) Suppose we want to squash the last two commits to 83df288d, replace
their pick with s (or f, see the help for difference), save and exit the
screen
pick 83df288d Replace MPIU_REPLACE with MPI_REPLACE
s e7314fbb SF: add an MPI_Op argument to SFBcast
s 0d5433e9 SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF

A new screen shows up

# This is a combination of 3 commits.
# This is the 1st commit message:

Replace MPIU_REPLACE with MPI_REPLACE

Since we believe all MPI implementations support MPI_REPLACE

# This is the commit message #2:

SF: add an MPI_Op argument to SFBcast

# This is the commit message #3:

SF: rename SFCreateEmbeddedSF to SFCreateEmbeddedRootSF

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.

4) Edit the commit message as you want, save and exit, done!
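
The steps above can also be scripted, which is handy for trying them out safely. This is a sketch in a throwaway repository with hypothetical names throughout. Two assumptions: 'git graph' is not a built-in git command, so it is presumably a personal alias, and 'git log --graph --oneline --decorate' is used here as a stand-in; and the todo list is edited with GNU sed through GIT_SEQUENCE_EDITOR instead of by hand, using fixup rather than squash so that no commit-message screen opens.

```shell
#!/bin/sh
set -e
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "base"
git checkout -q -b topic
for i in 1 2 3; do
  echo "$i" > "file$i.txt"
  git add "file$i.txt"
  git commit -q -m "change $i"
done

# Step 1: inspect the history to pick the base commit
git log --graph --oneline --decorate
base=$(git rev-parse main)

# Steps 2-3: keep the first 'pick' and turn the later ones into 'fixup'.
# Assumes GNU sed for the in-place edit of the todo file.
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/fixup/'" git rebase -q -i "$base"

# One commit remains on top of the base, carrying all three changes
git log --oneline main..HEAD
```

Because fixup discards the later messages, the surviving commit keeps the message of the first one; with s (squash) git would instead open the combined-message screen from step 3.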

--Junchao Zhang


On Tue, Mar 2, 2021 at 6:19 PM Blaise A Bourdin  wrote:

> Hi,
>
> This is not technically a petsc question.
> It would be great to have a short section in the PETSc integration
> workflow document explaining how to squash commits in a MR for git-impaired
> developers like me.
>
> Anybody wants to pitch in, or explain me how to do this?
>
> Regards,
> Blaise
>
> --
> A.K. & Shirley Barton Professor of  Mathematics
> Adjunct Professor of Mechanical Engineering
> Adjunct of the Center for Computation & Technology
> Louisiana State University, Lockett Hall Room 344, Baton Rouge, LA 70803,
> USA
> Tel. +1 (225) 578 1612, Fax  +1 (225) 578 4276 Web
> http://www.math.lsu.edu/~bourdin
>
>


[petsc-dev] Commit squashing in MR

2021-03-02 Thread Blaise A Bourdin
Hi,

This is not technically a petsc question. 
It would be great to have a short section in the PETSc integration workflow 
document explaining how to squash commits in a MR for git-impaired developers 
like me.

Does anybody want to pitch in, or explain to me how to do this?

Regards,
Blaise
 
-- 
A.K. & Shirley Barton Professor of  Mathematics
Adjunct Professor of Mechanical Engineering
Adjunct of the Center for Computation & Technology
Louisiana State University, Lockett Hall Room 344, Baton Rouge, LA 70803, USA
Tel. +1 (225) 578 1612, Fax  +1 (225) 578 4276 Web 
http://www.math.lsu.edu/~bourdin



Re: [petsc-dev] problem registering a new solver

2021-03-02 Thread Matthew Knepley
Maybe there is a bug in the Register() code.

   Matt

On Tue, Mar 2, 2021 at 4:43 PM Mark Adams  wrote:

> I see the problem, but not the solution. I put a print statement
> in MatSolverTypeDestroy and see:
>
> MatSolverTypeDestroy seqaij inext->next=0x7fb4a9114c70 inext=0x7fb4a9113c70
>
> before it fails. This seqaij node seems to be pointing to itself. If I
> remove my registration call and it works and I see this from my print
> statement:
>
> MatSolverTypeDestroy seqaij inext->next=0x7fe811827270 inext=0x7fe811826270
> MatSolverTypeDestroy seqaijperm inext->next=0x7fe811909c70
> inext=0x7fe811827270
> MatSolverTypeDestroy constantdiagonal inext->next=0x7fe81190ac70
> inext=0x7fe811909c70
>  
>
>
> On Tue, Mar 2, 2021 at 3:42 PM Mark Adams  wrote:
>
>> I am trying to add a band solver to PETSc (later to be moved to Cuda and
>> Kokkos) and I have started by adding some types and a copy of the current
>> LU as placeholders. I register with:
>>
>>  ierr = MatSolverTypeRegister(MATSOLVERPETSC, MATSEQAIJ,
>>  MAT_FACTOR_LUBAND,MatGetFactor_seqaij_petsc);CHKERRQ(ierr);
>>
>> And that is about all that I do that can have any effect at this point. I
>> add a switch in  MatGetFactor_seqaij_petsc on (ftype == MAT_FACTOR_LUBAND)
>> to set the symbolic factorization method (*B)->ops->lufactorsymbolic  =
>> MatLUBandFactorSymbolic_SeqAIJ; But this is not called (I verified this)
>> because I don't know how to get MatGetFactor_seqaij_petsc to receive ftype
>> == MAT_FACTOR_LUBAND. (I need help on this too)
>>
>> Anyway, I would hope that these changes would not do anything but I get
>> an error (appended).
>>
>> It is failing in MatSolverTypeDestroy on this second PetscFree:
>>
>> while (inext) {
>>   ierr = PetscFree(inext->mtype);CHKERRQ(ierr);
>>   iprev = inext;
>>   inext = inext->next;
>>   ierr = PetscFree(iprev);CHKERRQ(ierr);
>> }
>>
>> I tried to clone LU here, but I clearly missed something.
>>
>> Any ideas?
>>
>> And I just made  an MR for this if you want to look at the code.
>>
>> Thanks,
>> Mark
>> ...
>> Number of SNES iterations = 2
>> [0]PETSC ERROR: PetscTrFreeDefault() called from MatSolverTypeDestroy()
>> line 4513 in /Users/markadams/Codes/petsc/src/mat/interface/matrix.c
>> [0]PETSC ERROR: Block [id=0(48)] at address 0x7fb7fb8f7620 is corrupted
>> (probably write past end of array)
>> [0]PETSC ERROR: Block allocated in MatSolverTypeRegister() line 4382 in
>> /Users/markadams/Codes/petsc/src/mat/interface/matrix.c
>> [0]PETSC ERROR: - Error Message
>> --
>> [0]PETSC ERROR: Memory corruption:
>> https://www.mcs.anl.gov/petsc/documentation/installation.html#valgrind
>> [0]PETSC ERROR: Corrupted memory
>> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
>> for trouble shooting.
>> [0]PETSC ERROR: Petsc Development GIT revision: v3.14.4-739-g575f931ef8
>>  GIT Date: 2021-03-02 13:38:55 -0500
>> [0]PETSC ERROR: ./ex19 on a arch-macosx-gnu-g named MarksMac-302.local by
>> markadams Tue Mar  2 15:28:41 2021
>> [0]PETSC ERROR: Configure options
>> --with-mpi-dir=/usr/local/Cellar/mpich/3.3.2_1 COPTFLAGS="-g -O0"
>> CXXOPTFLAGS="-g -O0" --download-metis=1 --download-parmetis=1
>> --download-kokkos=1 --download-kokkos-kernels=1 --download-p4est=1
>> --with-zlib=1 --download-superlu_dist --download-superlu --with-make-np=4
>> --download-hdf5=1 -with-cuda=0 --with-x=0 --with-debugging=1
>> PETSC_ARCH=arch-macosx-gnu-g --with-64-bit-indices=0 --with-openmp=0
>> --with-ctable=0
>> [0]PETSC ERROR: #1 PetscTrFreeDefault() line 310 in
>> /Users/markadams/Codes/petsc/src/sys/memory/mtr.c
>> [0]PETSC ERROR: #2 MatSolverTypeDestroy() line 4513 in
>> /Users/markadams/Codes/petsc/src/mat/interface/matrix.c
>> [0]PETSC ERROR: #3 MatFinalizePackage() line 57 in
>> /Users/markadams/Codes/petsc/src/mat/interface/dlregismat.c
>> [0]PETSC ERROR: #4 PetscRegisterFinalizeAll() line 389 in
>> /Users/markadams/Codes/petsc/src/sys/objects/destroy.c
>> [0]PETSC ERROR: #5 PetscFinalize() line 1474 in
>> /Users/markadams/Codes/petsc/src/sys/objects/pinit.c
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-dev] problem registering a new solver

2021-03-02 Thread Mark Adams
I see the problem, but not the solution. I put a print statement
in MatSolverTypeDestroy and see:

MatSolverTypeDestroy seqaij inext->next=0x7fb4a9114c70 inext=0x7fb4a9113c70

before it fails. This seqaij node seems to be pointing to itself. If I
remove my registration call, it works and I see this from my print
statement:

MatSolverTypeDestroy seqaij inext->next=0x7fe811827270 inext=0x7fe811826270
MatSolverTypeDestroy seqaijperm inext->next=0x7fe811909c70
inext=0x7fe811827270
MatSolverTypeDestroy constantdiagonal inext->next=0x7fe81190ac70
inext=0x7fe811909c70
 


On Tue, Mar 2, 2021 at 3:42 PM Mark Adams  wrote:

> I am trying to add a band solver to PETSc (later to be moved to Cuda and
> Kokkos) and I have started by adding some types and a copy of the current
> LU as placeholders. I register with:
>
>  ierr = MatSolverTypeRegister(MATSOLVERPETSC, MATSEQAIJ,
>  MAT_FACTOR_LUBAND,MatGetFactor_seqaij_petsc);CHKERRQ(ierr);
>
> And that is about all that I do that can have any effect at this point. I
> add a switch in  MatGetFactor_seqaij_petsc on (ftype == MAT_FACTOR_LUBAND)
> to set the symbolic factorization method (*B)->ops->lufactorsymbolic  =
> MatLUBandFactorSymbolic_SeqAIJ; But this is not called (I verified this)
> because I don't know how to get MatGetFactor_seqaij_petsc to receive ftype
> == MAT_FACTOR_LUBAND. (I need help on this too)
>
> Anyway, I would hope that these changes would not do anything but I get an
> error (appended).
>
> It is failing in MatSolverTypeDestroy on this second PetscFree:
>
> while (inext) {
>   ierr = PetscFree(inext->mtype);CHKERRQ(ierr);
>   iprev = inext;
>   inext = inext->next;
>   ierr = PetscFree(iprev);CHKERRQ(ierr);
> }
>
> I tried to clone LU here, but I clearly missed something.
>
> Any ideas?
>
> And I just made  an MR for this if you want to look at the code.
>
> Thanks,
> Mark
> ...
> Number of SNES iterations = 2
> [0]PETSC ERROR: PetscTrFreeDefault() called from MatSolverTypeDestroy()
> line 4513 in /Users/markadams/Codes/petsc/src/mat/interface/matrix.c
> [0]PETSC ERROR: Block [id=0(48)] at address 0x7fb7fb8f7620 is corrupted
> (probably write past end of array)
> [0]PETSC ERROR: Block allocated in MatSolverTypeRegister() line 4382 in
> /Users/markadams/Codes/petsc/src/mat/interface/matrix.c
> [0]PETSC ERROR: - Error Message
> --
> [0]PETSC ERROR: Memory corruption:
> https://www.mcs.anl.gov/petsc/documentation/installation.html#valgrind
> [0]PETSC ERROR: Corrupted memory
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.14.4-739-g575f931ef8
>  GIT Date: 2021-03-02 13:38:55 -0500
> [0]PETSC ERROR: ./ex19 on a arch-macosx-gnu-g named MarksMac-302.local by
> markadams Tue Mar  2 15:28:41 2021
> [0]PETSC ERROR: Configure options
> --with-mpi-dir=/usr/local/Cellar/mpich/3.3.2_1 COPTFLAGS="-g -O0"
> CXXOPTFLAGS="-g -O0" --download-metis=1 --download-parmetis=1
> --download-kokkos=1 --download-kokkos-kernels=1 --download-p4est=1
> --with-zlib=1 --download-superlu_dist --download-superlu --with-make-np=4
> --download-hdf5=1 -with-cuda=0 --with-x=0 --with-debugging=1
> PETSC_ARCH=arch-macosx-gnu-g --with-64-bit-indices=0 --with-openmp=0
> --with-ctable=0
> [0]PETSC ERROR: #1 PetscTrFreeDefault() line 310 in
> /Users/markadams/Codes/petsc/src/sys/memory/mtr.c
> [0]PETSC ERROR: #2 MatSolverTypeDestroy() line 4513 in
> /Users/markadams/Codes/petsc/src/mat/interface/matrix.c
> [0]PETSC ERROR: #3 MatFinalizePackage() line 57 in
> /Users/markadams/Codes/petsc/src/mat/interface/dlregismat.c
> [0]PETSC ERROR: #4 PetscRegisterFinalizeAll() line 389 in
> /Users/markadams/Codes/petsc/src/sys/objects/destroy.c
> [0]PETSC ERROR: #5 PetscFinalize() line 1474 in
> /Users/markadams/Codes/petsc/src/sys/objects/pinit.c
>


Re: [petsc-dev] Petsc "make test" have more failures for --with-openmp=1

2021-03-02 Thread Pierre Jolivet
Hello Eric,
src/mat/tests/ex237.c is a recent test with some code paths that should be 
disabled for “old” MKL versions. It’s tricky to check this directly in the source 
(we do check in BuildSystem) because there is no such thing as 
PETSC_PKG_MKL_VERSION_LT, but I guess we can change if defined(PETSC_HAVE_MKL) 
to if defined(PETSC_HAVE_MKL) && defined(PETSC_HAVE_MKL_SPARSE_OPTIMIZE). I’ll 
make an MR, thanks for reporting this.
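
The guard change described above can be illustrated with a small standalone sketch. It is not the code of ex237.c: the two macros are defined by hand here (in a real build they would come from the configure-generated headers), and MklPath, mkl_code_path and mkl_code_path_name are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Assumption: stand-ins for what configure would define. Here we simulate an
   "old" MKL: the library is present but the sparse-optimize feature is not. */
#define PETSC_HAVE_MKL 1
/* #define PETSC_HAVE_MKL_SPARSE_OPTIMIZE 1 */

/* Hypothetical labels for the code path selected at compile time. */
typedef enum { PATH_NO_MKL, PATH_MKL_BASIC, PATH_MKL_SPARSE_OPTIMIZE } MklPath;

/* Reports which branch the preprocessor kept. */
MklPath mkl_code_path(void)
{
#if defined(PETSC_HAVE_MKL) && defined(PETSC_HAVE_MKL_SPARSE_OPTIMIZE)
  return PATH_MKL_SPARSE_OPTIMIZE; /* paths that need the newer MKL feature */
#elif defined(PETSC_HAVE_MKL)
  return PATH_MKL_BASIC;           /* MKL present, newer paths compiled out */
#else
  return PATH_NO_MKL;
#endif
}

/* Human-readable name for a code path. */
const char *mkl_code_path_name(MklPath p)
{
  static const char *const names[] = {
    "no MKL", "MKL without sparse optimize", "MKL with sparse optimize"
  };
  assert(p >= PATH_NO_MKL && p <= PATH_MKL_SPARSE_OPTIMIZE);
  return names[p];
}
```

With only the first macro defined, mkl_code_path() returns PATH_MKL_BASIC, so the code that depends on the newer feature is never compiled. A guard on PETSC_HAVE_MKL alone would have compiled it.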

For the other issues, I’m sensing this is a problem with gomp + 
intel_gnu_thread, but this is pure speculation… sorry.
I’ll try to reproduce some of these problems if you are not given a more 
meaningful answer.

Thanks,
Pierre

> On 2 Mar 2021, at 9:14 PM, Eric Chamberland 
>  wrote:
> 
> Hi,
> 
> It all started when I wanted to test PETSC/CUDA compatibility for our code.
> 
> I had to activate --with-openmp to configure with --with-cuda=1 successfully.
> 
> I then saw that PETSC_HAVE_OPENMP  is used at least in MUMPS (and some other 
> places).
> 
> So, I configured and tested petsc with openmp activated, without CUDA.
> 
> The first thing I see is that our code CI pipelines now fail for many tests.
> 
> After looking deeper, it seems that PETSc itself fails many tests when I 
> activate openmp!
> 
> Here are all the configurations I have results for, after/before activating 
> OpenMP for PETSc:
> ==
> 
> ==
> 
> For petsc/master + OpenMPI 4.0.4 + MKL 2019.4.243:
> 
> With OpenMP=1
> 
> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.03.02.02h00m02s_make_test.log
>  
> 
> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.03.02.02h00m02s_configure.log
>  
> 
> # -
> #   Summary
> # -
> # FAILED snes_tutorials-ex12_quad_hpddm_reuse_baij 
> diff-ksp_ksp_tests-ex33_superlu_dist_2 
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0 
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1 
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0 
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1 
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0 
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1 
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0 
> diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1 
> ksp_ksp_tutorials-ex50_tut_2 diff-ksp_ksp_tests-ex33_superlu_dist 
> diff-snes_tutorials-ex56_hypre snes_tutorials-ex17_3d_q3_trig_elas 
> snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij 
> ksp_ksp_tutorials-ex5_superlu_dist_3 ksp_ksp_tutorials-ex5f_superlu_dist 
> snes_tutorials-ex12_tri_parmetis_hpddm_baij diff-snes_tutorials-ex19_tut_3 
> mat_tests-ex242_3 snes_tutorials-ex17_3d_q3_trig_vlap 
> ksp_ksp_tutorials-ex5f_superlu_dist_3 snes_tutorials-ex19_superlu_dist 
> diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre 
> diff-ksp_ksp_tutorials-ex49_hypre_nullspace ts_tutorials-ex18_p1p1_xper_ref 
> ts_tutorials-ex18_p1p1_xyper_ref snes_tutorials-ex19_superlu_dist_2 
> ksp_ksp_tutorials-ex5_superlu_dist_2 
> diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre 
> ksp_ksp_tutorials-ex64_1 ksp_ksp_tutorials-ex5_superlu_dist 
> ksp_ksp_tutorials-ex5f_superlu_dist_2
> # success 8275/10003 tests (82.7%)
> # failed 33/10003 tests (0.3%)
> With OpenMP=0
> 
> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.02.26.02h00m16s_make_test.log
>  
> 
> https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.02.26.02h00m16s_configure.log
>  
> 
> # -
> #   Summary
> # -
> # FAILED tao_constrained_tutorials-tomographyADMM_6 
> snes_tutorials-ex17_3d_q3_trig_elas mat_tests-ex242_3 
> snes_tutorials-ex17_3d_q3_trig_vlap tao_leastsquares_tutorials-tomography_1 
> tao_constrained_tutorials-tomographyADMM_5
> # success 8262/9983 tests (82.8%)
> # failed 6/9983 tests (0.1%)
> ==
> 
> ==
> 
> For OpenMPI 3.1.x/master:
> 
> With OpenMP=1:
> 
> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.03.01.22h00m01s_make_test.log 
> 
> https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.03.01.22h00m01s_configure.log 
> 
> # -
> #   Summary
> # -
> # FAILED mat_tests-ex242_3 mat_tests-ex242_2 

Re: [petsc-dev] Petsc "make test" have more failures for --with-openmp=1

2021-03-02 Thread Barry Smith

  Eric,

Thanks for the detailed information.   

I have cc:ed Pierre so he can look at the HPDDM failures. 


> On Mar 2, 2021, at 2:14 PM, Eric Chamberland 
>  wrote:
> 
> Hi,
> 
> It all started when I wanted to test PETSC/CUDA compatibility for our code.
> 
> I had to activate --with-openmp to configure with --with-cuda=1 successfully.
> 
> 
Certain packages like SuperLU_DIST require --with-openmp when using 
--with-cuda=1, but PETSc's own use of CUDA, as well as some other packages, does 
not require --with-openmp. 

> I then saw that PETSC_HAVE_OPENMP  is used at least in MUMPS (and some other 
> places).
> 
> So, I configured and tested petsc with openmp activated, without CUDA.
> 
> The first thing I see is that our code CI pipelines now fail for many tests.
> 
> After looking deeper, it seems that PETSc itself fails many tests when I 
> activate openmp!
> 
> Here are all the configurations I have results for, after/before activating 
> OpenMP for PETSc:

There seem to be several distinct issues:

1) failures inside ScaLAPACK.

2) possibly slightly different convergence rates for some examples changing the 
number of iterations slightly in PETSc.

3) trouble initializing something outside of PETSc, almost certainly not related 
to PETSc:

[zorg:08517] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh path NULL
#   [zorg:08517] plm:base:set_hnp_name: initial bias 8517 nodename hash 
810220270
#   [zorg:08517] plm:base:set_hnp_name: final jobfam 60385
#   [zorg:08517] [[60385,0],0] plm:rsh_setup on agent ssh : rsh path NULL
#   [zorg:08517] [[60385,0],0] plm:base:receive start comm
#   [zorg:08517] [[60385,0],0] plm:base:setup_job
#   [zorg:08517] [[60385,0],0] plm:base:setup_vm

4) problem with a hypre run: "Linear solve did not converge due to 
DIVERGED_INDEFINITE_PC iterations 3"; again, not likely a PETSc issue but a 
hypre and OpenMP issue

5) Different results for the inertia computed inside an external package:

#   1c1
#   <  MatInertia: nneg: 17, nzero: 0, npos: 83
#   ---
#   >  MatInertia: nneg: 21, nzero: 0, npos: 79
TEST 
arch-linux-c-debug/tests/counts/ksp_ksp_tests-ex33_superlu_dist_2.counts
 ok ksp_ksp_tests-ex33_superlu_dist_2
not ok diff-ksp_ksp_tests-ex33_superlu_dist_2 # Error code: 1
#   1c1
#   <  MatInertia: nneg: 17, nzero: 0, npos: 83
#   ---
#   >  MatInertia: nneg: 25, nzero: 0, npos: 75


6) problems with the external package hpddm 

not ok snes_tutorials-ex12_quad_hpddm_reuse_baij # Error code: 139
# 0 SNES Function norm 21.3344 
#   [0]PETSC ERROR: 

#   [0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, 
probably memory access out of range
#   [0]PETSC ERROR: Try option -start_in_debugger or 
-on_error_attach_debugger
#   [0]PETSC ERROR: or see 
https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
#   [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac 
OS X to find memory corruption errors
#   [0]PETSC ERROR: likely location of problem given in stack below
#   [0]PETSC ERROR: -  Stack Frames 

#   [0]PETSC ERROR: Note: The EXACT line numbers in the stack are not 
available,
#   [0]PETSC ERROR:   INSTEAD the line number of the start of the 
function
#   [0]PETSC ERROR:   is given.
#   [0]PETSC ERROR: [0] constructionMatrix line 313 
/opt/petsc-main_debug/include/HPDDM_coarse_operator_impl.hpp
#   [0]PETSC ERROR: [0] construction line 256 
/opt/petsc-main_debug/include/HPDDM_coarse_operator_impl.hpp
#   [0]PETSC ERROR: [0] buildTwo line 987 
/opt/petsc-main_debug/include/HPDDM_schwarz.hpp
#   [0]PETSC ERROR: [0] next line 1130 
/opt/petsc-main_debug/include/HPDDM_schwarz.hpp
#   [0]PETSC ERROR: [0] PCSetUp_HPDDM line 746 
/pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-main-debug/src/ksp/pc/impls/hpddm/hpddm.cxx
#   [0]PETSC ERROR: [0] PCSetUp line 974 
/pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-main-debug/src/ksp/pc/interface/precon.c
#   [0]PETSC ERROR: [0] KSPSetUp line 319 
/pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-main-debug/src/ksp/ksp/interface/itfunc.c
#   [0]PETSC ERROR: [0] KSPSolve_Private line 808 
/pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-main-debug/src/ksp/ksp/interface/itfunc.c
#   [0]PETSC ERROR: [0] KSPSolve line 1080 
/pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-main-debug/src/ksp/ksp/interface/itfunc.c
#   [0]PETSC ERROR: [0] SNESSolve_NEWTONLS line 144 
/pmi/cmpbib/compilation_BIB_gcc_redhat_petsc-master_debug/COMPILE_AUTO/petsc-main-debug/src/snes/impls/ls/ls.c
#   [0]PETSC ERROR: [0] SNESSolve line 4533 

[petsc-dev] problem registering a new solver

2021-03-02 Thread Mark Adams
I am trying to add a band solver to PETSc (later to be moved to Cuda and
Kokkos) and I have started by adding some types and a copy of the current
LU as placeholders. I register with:

 ierr = MatSolverTypeRegister(MATSOLVERPETSC, MATSEQAIJ,
 MAT_FACTOR_LUBAND,MatGetFactor_seqaij_petsc);CHKERRQ(ierr);

And that is about all that I do that can have any effect at this point. I
add a switch in  MatGetFactor_seqaij_petsc on (ftype == MAT_FACTOR_LUBAND)
to set the symbolic factorization method (*B)->ops->lufactorsymbolic  =
MatLUBandFactorSymbolic_SeqAIJ; But this is not called (I verified this)
because I don't know how to get MatGetFactor_seqaij_petsc to receive ftype
== MAT_FACTOR_LUBAND. (I need help on this too.)

Anyway, I would hope that these changes would not do anything but I get an
error (appended).

It is failing in MatSolverTypeDestroy on this second PetscFree:

while (inext) {
  ierr = PetscFree(inext->mtype);CHKERRQ(ierr);
  iprev = inext;
  inext = inext->next;
  ierr = PetscFree(iprev);CHKERRQ(ierr);
}

I tried to clone LU here, but I clearly missed something.

Any ideas?

And I just made  an MR for this if you want to look at the code.

Thanks,
Mark
...
Number of SNES iterations = 2
[0]PETSC ERROR: PetscTrFreeDefault() called from MatSolverTypeDestroy()
line 4513 in /Users/markadams/Codes/petsc/src/mat/interface/matrix.c
[0]PETSC ERROR: Block [id=0(48)] at address 0x7fb7fb8f7620 is corrupted
(probably write past end of array)
[0]PETSC ERROR: Block allocated in MatSolverTypeRegister() line 4382 in
/Users/markadams/Codes/petsc/src/mat/interface/matrix.c
[0]PETSC ERROR: - Error Message
--
[0]PETSC ERROR: Memory corruption:
https://www.mcs.anl.gov/petsc/documentation/installation.html#valgrind
[0]PETSC ERROR: Corrupted memory
[0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
for trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.14.4-739-g575f931ef8
 GIT Date: 2021-03-02 13:38:55 -0500
[0]PETSC ERROR: ./ex19 on a arch-macosx-gnu-g named MarksMac-302.local by
markadams Tue Mar  2 15:28:41 2021
[0]PETSC ERROR: Configure options
--with-mpi-dir=/usr/local/Cellar/mpich/3.3.2_1 COPTFLAGS="-g -O0"
CXXOPTFLAGS="-g -O0" --download-metis=1 --download-parmetis=1
--download-kokkos=1 --download-kokkos-kernels=1 --download-p4est=1
--with-zlib=1 --download-superlu_dist --download-superlu --with-make-np=4
--download-hdf5=1 -with-cuda=0 --with-x=0 --with-debugging=1
PETSC_ARCH=arch-macosx-gnu-g --with-64-bit-indices=0 --with-openmp=0
--with-ctable=0
[0]PETSC ERROR: #1 PetscTrFreeDefault() line 310 in
/Users/markadams/Codes/petsc/src/sys/memory/mtr.c
[0]PETSC ERROR: #2 MatSolverTypeDestroy() line 4513 in
/Users/markadams/Codes/petsc/src/mat/interface/matrix.c
[0]PETSC ERROR: #3 MatFinalizePackage() line 57 in
/Users/markadams/Codes/petsc/src/mat/interface/dlregismat.c
[0]PETSC ERROR: #4 PetscRegisterFinalizeAll() line 389 in
/Users/markadams/Codes/petsc/src/sys/objects/destroy.c
[0]PETSC ERROR: #5 PetscFinalize() line 1474 in
/Users/markadams/Codes/petsc/src/sys/objects/pinit.c
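
The trace above reports a corrupted block that was allocated in MatSolverTypeRegister(), with the hint "probably write past end of array". The thread does not establish the cause. One generic way such an overrun can arise is a per-node table indexed by an enum: if a new enum value is used as an index but the table was sized for the old count, the store lands outside the allocation. The sketch below is standalone C, not PETSc code. Node, FactorType, FactorFn, registry_add and registry_destroy are hypothetical names, and the struct layout is an assumption made only to show the bounds check and the save-then-advance free order of the loop quoted in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical factor types. FACTOR_NUM_TYPES stays last so it counts the
   real entries and sizes the table below. */
typedef enum { FACTOR_NONE, FACTOR_LU, FACTOR_CHOLESKY, FACTOR_NUM_TYPES } FactorType;

typedef int (*FactorFn)(void);

/* One list node holding one callback slot per factor type (an assumption). */
typedef struct Node {
  FactorFn     create[FACTOR_NUM_TYPES];
  struct Node *next;
} Node;

/* Pushes a new node that registers fn for ftype.
   Returns 0 on success and 1 on error. An out-of-range ftype is rejected
   instead of being written past the end of create[]. */
int registry_add(Node **head, int ftype, FactorFn fn)
{
  Node *n;
  assert(head != NULL);
  if (ftype < 0 || ftype >= FACTOR_NUM_TYPES) return 1;
  n = calloc(1, sizeof(*n));
  if (!n) return 1;
  n->create[ftype] = fn;
  n->next = *head;
  *head = n;
  return 0;
}

/* Frees every node and returns how many were freed. The next pointer is
   read before the node is released, as in the quoted loop. */
int registry_destroy(Node **head)
{
  int   freed = 0;
  Node *inext, *iprev;
  assert(head != NULL);
  inext = *head;
  while (inext) {
    iprev = inext;
    inext = inext->next;
    free(iprev);
    freed++;
  }
  *head = NULL;
  return freed;
}
```

In this sketch, adding a new factor type means inserting it before FACTOR_NUM_TYPES so the table grows with the enum. Whether that is what went wrong in the MR is not known from this message.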


[petsc-dev] Petsc "make test" have more failures for --with-openmp=1

2021-03-02 Thread Eric Chamberland

Hi,

It all started when I wanted to test PETSC/CUDA compatibility for our code.

I had to activate --with-openmp to configure with --with-cuda=1 
successfully.


I then saw that PETSC_HAVE_OPENMP  is used at least in MUMPS (and some 
other places).


So, I configured and tested petsc with openmp activated, without CUDA.

The first thing I see is that our code CI pipelines now fail for many 
tests.


After looking deeper, it seems that PETSc itself fails many tests when I 
activate openmp!


Here are all the configurations I have results for, after/before 
activating OpenMP for PETSc:


==

==

For petsc/master + OpenMPI 4.0.4 + MKL 2019.4.243:

With OpenMP=1

https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.03.02.02h00m02s_make_test.log

https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.03.02.02h00m02s_configure.log

# -
#   Summary
# -
# FAILED snes_tutorials-ex12_quad_hpddm_reuse_baij 
diff-ksp_ksp_tests-ex33_superlu_dist_2 
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0 
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1 
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0 
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1 
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0 
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1 
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0 
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1 
ksp_ksp_tutorials-ex50_tut_2 diff-ksp_ksp_tests-ex33_superlu_dist 
diff-snes_tutorials-ex56_hypre snes_tutorials-ex17_3d_q3_trig_elas 
snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij 
ksp_ksp_tutorials-ex5_superlu_dist_3 ksp_ksp_tutorials-ex5f_superlu_dist 
snes_tutorials-ex12_tri_parmetis_hpddm_baij diff-snes_tutorials-ex19_tut_3 
mat_tests-ex242_3 snes_tutorials-ex17_3d_q3_trig_vlap 
ksp_ksp_tutorials-ex5f_superlu_dist_3 snes_tutorials-ex19_superlu_dist 
diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre 
diff-ksp_ksp_tutorials-ex49_hypre_nullspace ts_tutorials-ex18_p1p1_xper_ref 
ts_tutorials-ex18_p1p1_xyper_ref snes_tutorials-ex19_superlu_dist_2 
ksp_ksp_tutorials-ex5_superlu_dist_2 
diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre 
ksp_ksp_tutorials-ex64_1 ksp_ksp_tutorials-ex5_superlu_dist 
ksp_ksp_tutorials-ex5f_superlu_dist_2
# success 8275/10003 tests (82.7%)
#*failed 33/10003*  tests (0.3%)

With OpenMP=0

https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.02.26.02h00m16s_make_test.log

https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.02.26.02h00m16s_configure.log

# -
#   Summary
# -
# FAILED tao_constrained_tutorials-tomographyADMM_6 
snes_tutorials-ex17_3d_q3_trig_elas mat_tests-ex242_3 
snes_tutorials-ex17_3d_q3_trig_vlap tao_leastsquares_tutorials-tomography_1 
tao_constrained_tutorials-tomographyADMM_5
# success 8262/9983 tests (82.8%)
#*failed 6/9983*  tests (0.1%)

==

==

For OpenMPI 3.1.x/master:

With OpenMP=1:

https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.03.01.22h00m01s_make_test.log

https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.03.01.22h00m01s_configure.log

# -
#   Summary
# -
# FAILED mat_tests-ex242_3 mat_tests-ex242_2 diff-mat_tests-ex219f_1 
diff-dm_tutorials-ex11f90_1 ksp_ksp_tutorials-ex5_superlu_dist_3 
diff-ksp_ksp_tutorials-ex49_hypre_nullspace 
ksp_ksp_tutorials-ex5f_superlu_dist_3 snes_tutorials-ex17_3d_q3_trig_vlap 
diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre 
diff-snes_tutorials-ex19_tut_3 diff-snes_tutorials-ex56_hypre 
diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre 
tao_leastsquares_tutorials-tomography_1 
tao_constrained_tutorials-tomographyADMM_4 
tao_constrained_tutorials-tomographyADMM_6 diff-tao_constrained_tutorials-toyf_1
# success 8142/9765 tests (83.4%)
#*failed 16/9765*  tests (0.2%)

With OpenMP=0:

https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.02.28.22h00m02s_make_test.log

https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.02.28.22h00m02s_configure.log

# -
#   Summary
# -
# FAILED mat_tests-ex242_3 mat_tests-ex242_2 diff-mat_tests-ex219f_1 
diff-dm_tutorials-ex11f90_1 ksp_ksp_tutorials-ex56_2 
snes_tutorials-ex17_3d_q3_trig_vlap tao_leastsquares_tutorials-tomography_1 
tao_constrained_tutorials-tomographyADMM_4 diff-tao_constrained_tutorials-toyf_1
# success 8151/9767 tests (83.5%)
#*failed 9/9767*  tests (0.1%)
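
To see which failures appear only with OpenMP enabled, the two FAILED lists of a configuration can be compared as sets. The following is a minimal sketch in plain C under that assumption. The helper names contains and new_failures are hypothetical, and the caller supplies the test names copied from the summaries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if name occurs in list[0..n-1], else 0. */
int contains(const char *const *list, size_t n, const char *name)
{
  size_t i;
  for (i = 0; i < n; i++)
    if (strcmp(list[i], name) == 0) return 1;
  return 0;
}

/* Copies into out[] every entry of with_omp[] that is absent from
   without_omp[], keeping the original order, and returns how many were
   copied. out must have room for n_with pointers. */
size_t new_failures(const char *const *with_omp, size_t n_with,
                    const char *const *without_omp, size_t n_without,
                    const char **out)
{
  size_t i, k = 0;
  assert(out != NULL);
  for (i = 0; i < n_with; i++)
    if (!contains(without_omp, n_without, with_omp[i])) out[k++] = with_omp[i];
  return k;
}
```

Calling new_failures with the arguments swapped gives the tests that fail only without OpenMP, for example ksp_ksp_tutorials-ex56_2 in the OpenMPI 3.1.x lists above.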

==

==

For OpenMPI 4.0.x/master:

With OpenMP=1: