Thanks, I did not intend to make any (real) changes. The only conflicting part of Barry's branch that I did not intend to use was the help and comment block at the top of ex5cu.cu.
* I ended up with two declarations of PetscSplitCSRDataStructure
* I added some includes to fix errors like this:
/ccs/home/adams/petsc/include/../src/mat/impls/aij/seq/seqcusparse/cusparsematimpl.h(263): error: incomplete type is not allowed
* I ended up not having csr2csc_i in Mat_SeqAIJCUSPARSE, so I get:
/autofs/nccs-svm1_home1/adams/petsc/src/mat/impls/aij/seq/seqcusparse/aijcusparse.cu(1348): error: class "Mat_SeqAIJCUSPARSE" has no member "csr2csc_i"

On Fri, May 28, 2021 at 3:13 PM Stefano Zampini <[email protected]> wrote:

> I can take a quick look at it tomorrow, what are the main changes you made
> since then?
>
> On May 28, 2021, at 9:51 PM, Mark Adams <[email protected]> wrote:
>
> I am getting messed up in trying to resolve conflicts in rebasing over
> main.
> Is there a better way of doing this?
> Can I just tell git to use Barry's version and then test it?
> Or should I just try it again?
>
> On Fri, May 28, 2021 at 2:15 PM Mark Adams <[email protected]> wrote:
>
>> I am rebasing over main and it's a bit of a mess. I must have missed
>> something. I get this. I think the _n_SplitCSRMat must be wrong.
>>
>>
>> In file included from
>> /autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/impls/basic/sfbasic.c:128:0:
>> /ccs/home/adams/petsc/include/petscmat.h:1976:32: error: conflicting
>> types for 'PetscSplitCSRDataStructure'
>> typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure;
>> ^~~~~~~~~~~~~~~~~~~~~~~~~~
>> /ccs/home/adams/petsc/include/petscmat.h:1922:31: note: previous
>> declaration of 'PetscSplitCSRDataStructure' was here
>> typedef struct _p_SplitCSRMat PetscSplitCSRDataStructure;
>> ^~~~~~~~~~~~~~~~~~~~~~~~~~
>> CC arch-summit-opt-gnu-cuda/obj/vec/vec/impls/seq/dvec2.o
>>
>> On Fri, May 28, 2021 at 1:50 PM Stefano Zampini <
>> [email protected]> wrote:
>>
>>> OpenMPI.py depends on cuda.py in that, if cuda is present, configures
>>> using cuda.
>>> MPI.py or MPICH.py do not depend on cuda.py (MPICH, only
>>> weakly, it adds a print if cuda is present)
>>> Since eventually the MPI distro will only need a hint to be configured
>>> with CUDA, why not remove the dependency altogether and add only a flag
>>> --download-openmpi-use-cuda?
>>>
>>> On May 28, 2021, at 8:44 PM, Barry Smith <[email protected]> wrote:
>>>
>>>
>>> Stefano, who has a far better memory than me, wrote
>>>
>>> > Or probably remove --download-openmpi ? Or, just for the moment, why
>>> can’t we just tell configure that mpi is a weak dependence of cuda.py, so
>>> that it will be forced to be configured later?
>>>
>>> MPI.py depends on cuda.py so we cannot also have cuda.py depend on
>>> MPI.py using the generic dependencies of configure/packages
>>>
>>> but perhaps we can just hardwire the rerunning of cuda.py when the MPI
>>> compilers are reset. I will try that now and if I can get it to work we
>>> should be able to move those old fix branches along as MR.
>>>
>>> Barry
>>>
>>>
>>>
>>> On May 28, 2021, at 12:41 PM, Mark Adams <[email protected]> wrote:
>>>
>>> OK, I will try to rebase and test Barry's branch.
>>>
>>> On Fri, May 28, 2021 at 1:26 PM Stefano Zampini <
>>> [email protected]> wrote:
>>>
>>>> Yes, it is the branch I was using before force pushing to
>>>> Barry’s barry/2020-11-11/cleanup-matsetvaluesdevice
>>>> You can use both I guess
>>>>
>>>> On May 28, 2021, at 8:25 PM, Mark Adams <[email protected]> wrote:
>>>>
>>>> Is this the correct branch? It conflicted with ex5cu so I assume it is.
>>>>
>>>>
>>>> stefanozampini/simplify-setvalues-device
>>>> <https://gitlab.com/petsc/petsc/-/tree/stefanozampini/simplify-setvalues-device>
>>>>
>>>> On Fri, May 28, 2021 at 1:24 PM Mark Adams <[email protected]> wrote:
>>>>
>>>>> I am fixing rebasing this branch over main.
>>>>>
>>>>> On Fri, May 28, 2021 at 1:16 PM Stefano Zampini <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Or probably remove --download-openmpi ?
>>>>>> Or, just for the moment, why
>>>>>> can’t we just tell configure that mpi is a weak dependence of cuda.py, so
>>>>>> that it will be forced to be configured later?
>>>>>>
>>>>>> On May 28, 2021, at 8:12 PM, Stefano Zampini <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>> That branch provides a fix for MatSetValuesDevice but it never got
>>>>>> merged because of the CI issues with the --download-openmpi. We can
>>>>>> probably try to skip the test in that specific configuration?
>>>>>>
>>>>>> On May 28, 2021, at 7:45 PM, Barry Smith <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> ~/petsc/src/mat/tutorials*
>>>>>> (barry/2021-05-28/robustify-cuda-gencodearch-check=)*
>>>>>> arch-robustify-cuda-gencodearch-check
>>>>>> $ ./ex5cu
>>>>>> terminate called after throwing an instance of
>>>>>> 'thrust::system::system_error'
>>>>>> what(): fill_n: failed to synchronize: cudaErrorIllegalAddress: an
>>>>>> illegal memory access was encountered
>>>>>> Aborted (core dumped)
>>>>>>
>>>>>> requires: cuda !define(PETSC_USE_CTABLE)
>>>>>>
>>>>>> CI does not test with CUDA and no ctable. The code is still broken
>>>>>> as it was six months ago in the discussion Stefano pointed to. It is
>>>>>> clear why just no one has had the time to clean things up.
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>>
>>>>>> On May 28, 2021, at 11:13 AM, Mark Adams <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> If you are referring to your device set values, I guess it is not
>>>>>>> currently tested
>>>>>>>
>>>>>>
>>>>>> No. There is a test for that (ex5cu).
>>>>>> I have a user that is getting a segv in MatSetValues with
>>>>>> aijcusparse. I suspect there is memory corruption but I'm trying to cover
>>>>>> all the bases.
>>>>>> I have added a cuda test to ksp/ex56 that works. I can do an MR for
>>>>>> it if such a test does not exist.
>>>>>>
>>>>>>
>>>>>>> See the discussions here
>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/3411
>>>>>>> I started cleaning up the code to prepare for testing but we never
>>>>>>> finished it
>>>>>>> https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/
>>>>>>>
>>>>>>>
>>>>>>> On May 28, 2021, at 6:53 PM, Mark Adams <[email protected]> wrote:
>>>>>>>
>>>>>>> Is there a test with MatSetValues and CUDA?
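
For reference, the "conflicting types for 'PetscSplitCSRDataStructure'" error quoted in the thread is what a C compiler reports when one name is typedef'd twice with different types, here once as a struct by value and once as a pointer to a struct. The sketch below is only an illustration of a header that keeps a single opaque-pointer typedef. The thread does not settle which of the two declarations is the intended one, so keeping the pointer form is an assumption, and the `nrows` field and the `split_csr_create` / `split_csr_rows` helpers are hypothetical, not PETSc code.

```c
#include <assert.h>
#include <stdlib.h>

/* Declare the struct tag once and expose exactly one typedef for the handle.
   Adding a second typedef of the same name with a different type, e.g.
     typedef struct _p_SplitCSRMat PetscSplitCSRDataStructure;
   is what makes the compiler report "conflicting types". */
struct _n_SplitCSRMat;
typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure;

/* Hypothetical layout, for illustration only (not PETSc's definition). */
struct _n_SplitCSRMat {
  int nrows;
};

/* Hypothetical helper: allocate a handle and record the row count. */
PetscSplitCSRDataStructure split_csr_create(int nrows)
{
  assert(nrows >= 0);
  PetscSplitCSRDataStructure d = malloc(sizeof(*d));
  if (d) d->nrows = nrows;
  return d;
}

/* Hypothetical helper: read the row count back through the handle. */
int split_csr_rows(PetscSplitCSRDataStructure d)
{
  return d->nrows;
}
```

After a rebase, searching the merged header for every `typedef` of the name and deleting the stale one is usually enough to clear this class of error, provided all users of the type agree on whether it is a pointer or a value.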
