Re: [petsc-dev] error with karlrupp/fix-cuda-streams

2019-09-27 Thread Karl Rupp via petsc-dev

Hi Mark,

OK, so now the problem has shifted somewhat: it now manifests itself on 
small cases. In my earlier investigation I was drawn to MatTranspose but 
had a hard time pinning it down. The bug seems more stable now; or 
rather, you probably fixed what look like all the other bugs.


I added print statements with the norms of vectors in mg.c (the V-cycle) 
and found that the diffs between the CPU and GPU runs first appear in 
MatRestrict, which calls MatMultTranspose. I added identical print 
statements in the two versions of MatMultTranspose and see the output 
below. (Pinning to the CPU does not seem to make any difference.) Note 
that the problem comes in the 2nd iteration, where the *output* vector 
is non-zero coming in (this should not matter).
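
(For reference, the diagnostic was along these lines; a minimal sketch, 
where the helper name and the exact call sites in mg.c are illustrative:)

#include <petscvec.h>

/* Print the 2-norm of a vector with a label; used identically in the
   CPU and GPU code paths so the two runs can be diffed stage by stage. */
static PetscErrorCode PrintNorm(const char *label, Vec v)
{
  PetscErrorCode ierr;
  PetscReal      nrm;

  PetscFunctionBegin;
  ierr = VecNorm(v, NORM_2, &nrm);CHKERRQ(ierr);
  ierr = PetscPrintf(PetscObjectComm((PetscObject)v), "%s = %1.15e\n",
                     label, (double)nrm);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}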


Karl, I zeroed out the output vector (yy) on entry to this method and 
that fixed the problem. This is with -n 4; it always works with -n 3. 
See the attached process layouts. It looks like the problem appears when 
the 2nd socket comes into use.
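
(For concreteness, the workaround amounts to this at the top of 
MatMultTranspose_MPIAIJCUSPARSE; a sketch of the idea, not the actual 
patch:)

  /* workaround: clear the output vector on entry so stale values in yy
     cannot leak into the accumulated result */
  ierr = VecSet(yy, 0.0);CHKERRQ(ierr);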


So this looks like an NVIDIA bug. Let me know what you think and I can 
pass it on to ORNL.


Hmm, there were some issues with MatMultTranspose_MPIAIJ at some point. 
I've addressed some of them, but I can't confidently say that all of the 
issues were fixed. Thus, I don't think it's a problem in NVIDIA's 
cuSPARSE, but rather something we need to fix in PETSc. Note that the 
problem shows up with multiple MPI ranks; if it were a problem in 
cuSPARSE, it would show up on a single rank as well.
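
For context, here is the structure of the multi-rank path, simplified 
from MatMultTranspose_MPIAIJ (a sketch; details may differ from the 
exact source):

#include <../src/mat/impls/aij/mpi/mpiaij.h> /* private header, defines Mat_MPIAIJ */

/* yy accumulates the diagonal-block product plus a reverse scatter of
   the off-diagonal contributions, so stale data in yy or a->lvec only
   bites with multiple ranks. */
PetscErrorCode MatMultTranspose_MPIAIJ_sketch(Mat A, Vec xx, Vec yy)
{
  Mat_MPIAIJ     *a = (Mat_MPIAIJ*)A->data;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = (*a->B->ops->multtranspose)(a->B, xx, a->lvec);CHKERRQ(ierr); /* off-diagonal block */
  ierr = (*a->A->ops->multtranspose)(a->A, xx, yy);CHKERRQ(ierr);      /* diagonal block    */
  ierr = VecScatterBegin(a->Mvctx, a->lvec, yy, ADD_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
  ierr = VecScatterEnd(a->Mvctx, a->lvec, yy, ADD_VALUES, SCATTER_REVERSE);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

On a single rank there is no scatter and no a->lvec to go stale, which 
is consistent with the failure only appearing on multiple ranks.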


Best regards,
Karli

06:49  /gpfs/alpine/geo127/scratch/adams$ jsrun -n 4 -a 4 -c 4 -g 1 ./ex56 -cells 8,12,16 -ex56_dm_vec_type cuda -ex56_dm_mat_type aijcusparse

[0] 3465 global equations, 1155 vertices
[0] 3465 equations in vector, 1155 vertices
   0 SNES Function norm 1.725526579328e+01
     0 KSP Residual norm 1.725526579328e+01
         2) call Restrict with |r| = 1.402719214830704e+01
                         MatMultTranspose_MPIAIJCUSPARSE |x in| = 1.40271921483070e+01
                         MatMultTranspose_MPIAIJ |y in| = 0.00e+00
                         MatMultTranspose_MPIAIJCUSPARSE |a->lvec| = 0.00e+00
                         *** MatMultTranspose_MPIAIJCUSPARSE |yy| = 3.43436359545813e+00
                         MatMultTranspose_MPIAIJCUSPARSE final |yy| = 1.29055494844681e+01

                 3) |R| = 1.290554948446808e+01
         2) call Restrict with |r| = 4.109771717986951e+00
                         MatMultTranspose_MPIAIJCUSPARSE |x in| = 4.10977171798695e+00
                         MatMultTranspose_MPIAIJ |y in| = 0.00e+00
                         MatMultTranspose_MPIAIJCUSPARSE |a->lvec| = 0.00e+00
                         *** MatMultTranspose_MPIAIJCUSPARSE |yy| = 1.79415048609144e-01
                         MatMultTranspose_MPIAIJCUSPARSE final |yy| = 9.01083013948788e-01

                 3) |R| = 9.010830139487883e-01
                 4) |X| = 2.864698671963022e+02
                 5) |x| = 9.76328911783e+02
                 6) post smooth |x| = 8.940011621494751e+02
                 4) |X| = 8.940011621494751e+02
                 5) |x| = 1.005081556495388e+03
                 6) post smooth |x| = 1.029043994031627e+03
     1 KSP Residual norm 8.102614049404e+00
         2) call Restrict with |r| = 4.402603749876137e+00
                         MatMultTranspose_MPIAIJCUSPARSE |x in| = 4.40260374987614e+00
                         MatMultTranspose_MPIAIJ |y in| = 1.29055494844681e+01
                         MatMultTranspose_MPIAIJCUSPARSE |a->lvec| = 0.00e+00
                         *** MatMultTranspose_MPIAIJCUSPARSE |yy| = 1.68544559626318e+00
                         MatMultTranspose_MPIAIJCUSPARSE final |yy| = 1.82129824300863e+00

                 3) |R| = 1.821298243008628e+00
         2) call Restrict with |r| = 1.068309793900564e+00
                         MatMultTranspose_MPIAIJCUSPARSE |x in| = 1.06830979390056e+00
                         MatMultTranspose_MPIAIJ |y in| = 9.01083013948788e-01
                         MatMultTranspose_MPIAIJCUSPARSE |a->lvec| = 0.00e+00
                         *** MatMultTranspose_MPIAIJCUSPARSE |yy| = 1.40519177065298e-01
                         MatMultTranspose_MPIAIJCUSPARSE final |yy| = 1.01853904152812e-01

                 3) |R| = 1.018539041528117e-01
                 4) |X| = 4.949616392884510e+01
                 5) |x| = 9.309440014159884e+01
                 6) post smooth |x| = 5.432486021529479e+01
                 4) |X| = 5.432486021529479e+01
                 5) |x| = 8.246142532204632e+01
                 6) post smooth |x| = 7.605703654091440e+01
   Linear solve did not converge due to DIVERGED_ITS iterations 1
Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
06:50  /gpfs/alpine/geo127/scratch/adams$ jsrun -n 4 -a 4 -c 4 -g 1 ./ex56 -cells 8,12,16

[0] 3465 global equations, 1155 vertices
[0] 3465 equations in vector, 1155 vertices

Re: [petsc-dev] getting eigen estimates from GAMG to CHEBY

2019-09-27 Thread Mark Adams via petsc-dev
As I recall, we attached the eigen estimates to the matrix. Is that old
attach mechanism still used/recommended, or is there a better way to do
this now?
Thanks,
Mark
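
(For reference, the generic attach pattern in question is 
PetscObjectCompose/PetscObjectQuery with a PetscContainer; a sketch, 
where the key string "eig_estimates" and the EigEst struct are made up 
for illustration:)

#include <petscmat.h>

typedef struct { PetscReal emin, emax; } EigEst;

static PetscErrorCode EigEstDestroy(void *ptr)
{
  PetscErrorCode ierr = PetscFree(ptr);
  CHKERRQ(ierr);
  return 0;
}

/* Attach eigenvalue estimates to the matrix so a consumer (e.g. the
   Chebyshev smoother setup) can recover them with PetscObjectQuery(). */
static PetscErrorCode AttachEigEst(Mat A, PetscReal emin, PetscReal emax)
{
  PetscErrorCode ierr;
  PetscContainer c;
  EigEst         *e;

  PetscFunctionBegin;
  ierr = PetscNew(&e);CHKERRQ(ierr);
  e->emin = emin; e->emax = emax;
  ierr = PetscContainerCreate(PetscObjectComm((PetscObject)A), &c);CHKERRQ(ierr);
  ierr = PetscContainerSetPointer(c, e);CHKERRQ(ierr);
  ierr = PetscContainerSetUserDestroy(c, EigEstDestroy);CHKERRQ(ierr);
  ierr = PetscObjectCompose((PetscObject)A, "eig_estimates", (PetscObject)c);CHKERRQ(ierr);
  ierr = PetscContainerDestroy(&c);CHKERRQ(ierr); /* the Mat keeps its own reference */
  PetscFunctionReturn(0);
}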

On Thu, Sep 26, 2019 at 7:45 AM Mark Adams  wrote:

>
>
>> Okay, it seems like they should be stored in GAMG.
>>
>
> Before, we stored them in the matrix. When you get to the test in Cheby
> you don't have the caller (GAMG) anymore.
>
>
>> Why would the PC type change anything?
>>
>
> Oh, the eigenvalues are the preconditioned ones, so the PC (Jacobi)
> matters, but the estimate is not too sensitive to the normal PCs you
> would use in a smoother, and that is probably not an understatement
> (see the sketch after this thread).
>
>
>>
>>   Thanks,
>>
>> Matt
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> 
>>
>
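
(And on the Cheby side, once the estimates have been queried back, 
handing them to the smoother would look something like this sketch, 
with emax/emin assumed to come from the attached data above:)

  /* use the stored (preconditioned-operator) estimates rather than
     re-estimating inside the smoother; note the emax-first argument order */
  ierr = KSPChebyshevSetEigenvalues(smoother, emax, emin);CHKERRQ(ierr);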