I can take a quick look at it tomorrow. What are the main changes you made since then?
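On the rebase question below: if the conflicts are all ones where you simply want Barry's side, you can take his version file by file during the rebase. A sketch (note that during a rebase the meaning of ours/theirs is inverted, because HEAD is the upstream you are rebasing onto):

    # during 'git rebase main', --theirs is the commit being replayed,
    # i.e. the branch's (Barry's) version; --ours is main
    git checkout --theirs -- include/petscmat.h
    git add include/petscmat.h
    git rebase --continue

    # or, to throw the attempt away and start over
    git rebase --abort

If you want that resolution wholesale, 'git rebase -X theirs main' applies it to every conflicted hunk, but it is safer to inspect each file: a silently mis-merged header is exactly how you end up with the conflicting-typedef error quoted below.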
> On May 28, 2021, at 9:51 PM, Mark Adams <[email protected]> wrote:
>
> I am getting messed up trying to resolve the conflicts in rebasing over main.
> Is there a better way of doing this?
> Can I just tell git to use Barry's version and then test it?
> Or should I just try it again?
>
> On Fri, May 28, 2021 at 2:15 PM Mark Adams <[email protected]> wrote:
>
> I am rebasing over main and it's a bit of a mess. I must have missed
> something. I get this; I think the _n_SplitCSRMat must be wrong.
>
> In file included from
> /autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/impls/basic/sfbasic.c:128:0:
> /ccs/home/adams/petsc/include/petscmat.h:1976:32: error: conflicting types
> for 'PetscSplitCSRDataStructure'
>  typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure;
>                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~
> /ccs/home/adams/petsc/include/petscmat.h:1922:31: note: previous declaration
> of 'PetscSplitCSRDataStructure' was here
>  typedef struct _p_SplitCSRMat PetscSplitCSRDataStructure;
>                                ^~~~~~~~~~~~~~~~~~~~~~~~~~
>   CC arch-summit-opt-gnu-cuda/obj/vec/vec/impls/seq/dvec2.o
>
> On Fri, May 28, 2021 at 1:50 PM Stefano Zampini <[email protected]> wrote:
>
> OpenMPI.py depends on cuda.py in that, if cuda is present, it configures using
> cuda. MPI.py and MPICH.py do not depend on cuda.py (MPICH only weakly: it
> adds a print if cuda is present).
> Since eventually the MPI distro will only need a hint to be configured with
> CUDA, why not remove the dependency altogether and add only a flag
> --download-openmpi-use-cuda?
>
>> On May 28, 2021, at 8:44 PM, Barry Smith <[email protected]> wrote:
>>
>> Stefano, who has a far better memory than me, wrote
>>
>> > Or probably remove --download-openmpi? Or, just for the moment, why can't
>> > we just tell configure that mpi is a weak dependence of cuda.py, so that
>> > it will be forced to be configured later?
>>
>> MPI.py depends on cuda.py, so we cannot also have cuda.py depend on MPI.py
>> using the generic dependencies of configure/packages,
>> but perhaps we can just hardwire the rerunning of cuda.py when the MPI
>> compilers are reset. I will try that now, and if I can get it to work we
>> should be able to move those old fix branches along as MRs.
>>
>> Barry
>>
>>> On May 28, 2021, at 12:41 PM, Mark Adams <[email protected]> wrote:
>>>
>>> OK, I will try to rebase and test Barry's branch.
>>>
>>> On Fri, May 28, 2021 at 1:26 PM Stefano Zampini <[email protected]> wrote:
>>>
>>> Yes, it is the branch I was using before force-pushing to Barry's
>>> barry/2020-11-11/cleanup-matsetvaluesdevice.
>>> You can use both, I guess.
>>>
>>>> On May 28, 2021, at 8:25 PM, Mark Adams <[email protected]> wrote:
>>>>
>>>> Is this the correct branch? It conflicted with ex5cu, so I assume it is.
>>>>
>>>> stefanozampini/simplify-setvalues-device
>>>> <https://gitlab.com/petsc/petsc/-/tree/stefanozampini/simplify-setvalues-device>
>>>>
>>>> On Fri, May 28, 2021 at 1:24 PM Mark Adams <[email protected]> wrote:
>>>>
>>>> I am fixing up the rebase of this branch over main.
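On the conflicting-types error quoted above: both the old and the new declaration of PetscSplitCSRDataStructure survived the rebase in petscmat.h, which is exactly what a mis-resolved conflict looks like. A stripped-down sketch of what the compiler is seeing (only these two lines are from the real header; the rest is illustration):

    /* line 1922: the old declaration, a plain struct type */
    typedef struct _p_SplitCSRMat PetscSplitCSRDataStructure;
    /* line 1976: the new declaration, a pointer to a differently tagged struct */
    typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure;
    /* gcc: error: conflicting types for 'PetscSplitCSRDataStructure' */

The fix is to keep exactly one of the two, whichever declaration the branch being rebased intends to be current, and delete the other during conflict resolution.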
>>>> On Fri, May 28, 2021 at 1:16 PM Stefano Zampini <[email protected]> wrote:
>>>>
>>>> Or probably remove --download-openmpi? Or, just for the moment, why can't
>>>> we just tell configure that mpi is a weak dependence of cuda.py, so that
>>>> it will be forced to be configured later?
>>>>
>>>>> On May 28, 2021, at 8:12 PM, Stefano Zampini <[email protected]> wrote:
>>>>>
>>>>> That branch provides a fix for MatSetValuesDevice, but it never got merged
>>>>> because of the CI issues with --download-openmpi. We can probably try to
>>>>> skip the test in that specific configuration?
>>>>>
>>>>>> On May 28, 2021, at 7:45 PM, Barry Smith <[email protected]> wrote:
>>>>>>
>>>>>> ~/petsc/src/mat/tutorials (barry/2021-05-28/robustify-cuda-gencodearch-check=)
>>>>>> arch-robustify-cuda-gencodearch-check
>>>>>> $ ./ex5cu
>>>>>> terminate called after throwing an instance of 'thrust::system::system_error'
>>>>>>   what():  fill_n: failed to synchronize: cudaErrorIllegalAddress: an
>>>>>> illegal memory access was encountered
>>>>>> Aborted (core dumped)
>>>>>>
>>>>>> requires: cuda !define(PETSC_USE_CTABLE)
>>>>>>
>>>>>> CI does not test with CUDA and no ctable, so the code is still broken, as
>>>>>> it was six months ago in the discussion Stefano pointed to. It is clear
>>>>>> why: no one has had the time to clean things up.
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>>> On May 28, 2021, at 11:13 AM, Mark Adams <[email protected]> wrote:
>>>>>>>
>>>>>>> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini <[email protected]> wrote:
>>>>>>>
>>>>>>> If you are referring to your device set values, I guess it is not
>>>>>>> currently tested.
>>>>>>>
>>>>>>> No, there is a test for that (ex5cu).
>>>>>>> I have a user that is getting a segv in MatSetValues with aijcusparse.
>>>>>>> I suspect there is memory corruption, but I'm trying to cover all the bases.
>>>>>>> I have added a cuda test to ksp/ex56 that works. I can do an MR for it
>>>>>>> if such a test does not exist.
>>>>>>>
>>>>>>> See the discussion here:
>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/3411
>>>>>>> I started cleaning up the code to prepare for testing, but we never
>>>>>>> finished it:
>>>>>>> https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/
>>>>>>>
>>>>>>>> On May 28, 2021, at 6:53 PM, Mark Adams <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Is there a test with MatSetValues and CUDA?
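On the last question (a MatSetValues test with CUDA): a minimal sketch of what such a test could look like, using the MatSetValues-on-aijcusparse pattern under discussion. This is illustrative only, not the actual ex5cu or the ksp/ex56 addition; the TEST block at the bottom shows where Barry's "requires: cuda !define(PETSC_USE_CTABLE)" restriction would live.

    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat            A;
      PetscInt       i = 0, j = 0;
      PetscScalar    v = 1.0;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
      ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
      ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 10, 10);CHKERRQ(ierr);
      ierr = MatSetType(A, MATAIJCUSPARSE);CHKERRQ(ierr); /* the type the user's segv involves */
      ierr = MatSetUp(A);CHKERRQ(ierr);
      /* host-side insertion; assembly then moves the data to the GPU */
      ierr = MatSetValues(A, 1, &i, 1, &j, &v, INSERT_VALUES);CHKERRQ(ierr);
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

    /*TEST
       test:
          requires: cuda !define(PETSC_USE_CTABLE)
    TEST*/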

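On Stefano's --download-openmpi-use-cuda idea: a rough sketch of what the flag could look like in config/BuildSystem/config/packages/OpenMPI.py, assuming the usual GNUPackage hooks (setupHelp/formGNUConfigureArgs); the exact structure of OpenMPI.py may differ, so treat this as a proposal sketch, not current code.

    import config.package

    class Configure(config.package.GNUPackage):
      def setupHelp(self, help):
        config.package.GNUPackage.setupHelp(self, help)
        import nargs
        # explicit opt-in flag instead of a hard dependency on cuda.py
        help.addArgument('OPENMPI', '-download-openmpi-use-cuda=<bool>',
                         nargs.ArgBool(None, 0, 'Build --download-openmpi with CUDA support'))

      def formGNUConfigureArgs(self):
        args = config.package.GNUPackage.formGNUConfigureArgs(self)
        if self.argDB['download-openmpi-use-cuda']:
          args.append('--with-cuda')  # OpenMPI's own configure switch
        return args

With this, configure never needs cuda.py to be resolved before the MPI download; the user states the intent directly, which is the decoupling Stefano describes above.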