I am getting tangled up trying to resolve conflicts while rebasing over main. Is there a better way of doing this? Can I just tell git to use Barry's version and then test it? Or should I just start the rebase over?
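For "use Barry's version" wholesale: during a rebase, `--theirs` refers to the commit being replayed (the branch being rebased), and `--ours` to the upstream you are rebasing onto -- the opposite of their meaning during a plain merge. A minimal sketch in a throwaway repo (branch and file names are invented for illustration):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git checkout -qb main
git config user.email [email protected]
git config user.name demo
echo base > f.txt
git add f.txt
git commit -qm base
git checkout -qb feature                  # the branch being rebased
echo feature-version > f.txt
git commit -qam "feature change"
git checkout -q main
echo main-version > f.txt
git commit -qam "main change"
git checkout -q feature
git rebase main >/dev/null 2>&1 || true   # stops with a conflict in f.txt
git checkout --theirs f.txt               # "theirs" = the commit being replayed (feature)
git add f.txt
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1
cat f.txt                                 # prints: feature-version
```

After this, `f.txt` holds the rebased branch's version; you can then build and test before deciding whether the resolution was right.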
On Fri, May 28, 2021 at 2:15 PM Mark Adams <[email protected]> wrote:

> I am rebasing over main and it's a bit of a mess. I must have missed
> something. I get this. I think the _n_SplitCSRMat must be wrong.
>
> In file included from /autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/impls/basic/sfbasic.c:128:0:
> /ccs/home/adams/petsc/include/petscmat.h:1976:32: error: conflicting types for 'PetscSplitCSRDataStructure'
>   typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure;
>                                  ^~~~~~~~~~~~~~~~~~~~~~~~~~
> /ccs/home/adams/petsc/include/petscmat.h:1922:31: note: previous declaration of 'PetscSplitCSRDataStructure' was here
>   typedef struct _p_SplitCSRMat PetscSplitCSRDataStructure;
>                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~
>   CC arch-summit-opt-gnu-cuda/obj/vec/vec/impls/seq/dvec2.o
>
> On Fri, May 28, 2021 at 1:50 PM Stefano Zampini <[email protected]> wrote:
>
>> OpenMPI.py depends on cuda.py in that, if cuda is present, it configures
>> using cuda. MPI.py and MPICH.py do not depend on cuda.py (MPICH only
>> weakly; it adds a print if cuda is present).
>> Since eventually the MPI distro will only need a hint to be configured
>> with CUDA, why not remove the dependency entirely and add only a flag
>> --download-openmpi-use-cuda?
>>
>> On May 28, 2021, at 8:44 PM, Barry Smith <[email protected]> wrote:
>>
>> Stefano, who has a far better memory than me, wrote:
>>
>>> Or probably remove --download-openmpi? Or, just for the moment, why
>>> can't we just tell configure that mpi is a weak dependence of cuda.py,
>>> so that it will be forced to be configured later?
>>
>> MPI.py depends on cuda.py, so we cannot also have cuda.py depend on
>> MPI.py using the generic dependencies of configure/packages,
>> but perhaps we can just hardwire the rerunning of cuda.py when the MPI
>> compilers are reset. I will try that now, and if I can get it to work we
>> should be able to move those old fix branches along as MRs.
>> Barry
>>
>> On May 28, 2021, at 12:41 PM, Mark Adams <[email protected]> wrote:
>>
>> OK, I will try to rebase and test Barry's branch.
>>
>> On Fri, May 28, 2021 at 1:26 PM Stefano Zampini <[email protected]> wrote:
>>
>>> Yes, it is the branch I was using before force pushing to
>>> Barry's barry/2020-11-11/cleanup-matsetvaluesdevice.
>>> You can use both, I guess.
>>>
>>> On May 28, 2021, at 8:25 PM, Mark Adams <[email protected]> wrote:
>>>
>>> Is this the correct branch? It conflicted with ex5cu, so I assume it is.
>>>
>>> stefanozampini/simplify-setvalues-device
>>> <https://gitlab.com/petsc/petsc/-/tree/stefanozampini/simplify-setvalues-device>
>>>
>>> On Fri, May 28, 2021 at 1:24 PM Mark Adams <[email protected]> wrote:
>>>
>>>> I am rebasing this branch over main.
>>>>
>>>> On Fri, May 28, 2021 at 1:16 PM Stefano Zampini <[email protected]> wrote:
>>>>
>>>>> Or probably remove --download-openmpi? Or, just for the moment, why
>>>>> can't we just tell configure that mpi is a weak dependence of cuda.py,
>>>>> so that it will be forced to be configured later?
>>>>>
>>>>> On May 28, 2021, at 8:12 PM, Stefano Zampini <[email protected]> wrote:
>>>>>
>>>>> That branch provides a fix for MatSetValuesDevice, but it never got
>>>>> merged because of the CI issues with --download-openmpi. We can
>>>>> probably try to skip the test in that specific configuration?
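Why the "weak dependence" suggestion cannot use the generic dependency mechanism: configure packages must form a DAG, and MPI.py already depends on cuda.py, so adding the reverse edge creates a cycle -- which is why Barry proposes an explicit re-run hook instead. A hypothetical sketch (this is not PETSc's actual BuildSystem API, just an illustration of the cycle):

```python
class Package:
    """Toy stand-in for a configure package module such as MPI.py or cuda.py."""
    def __init__(self, name, deps=()):
        self.name, self.deps = name, list(deps)

def configure_order(packages):
    """Topologically sort packages into configure order; raise on a cycle."""
    order, state = [], {}  # state: 1 = in progress, 2 = done
    def visit(p):
        if state.get(p.name) == 1:
            raise ValueError(f"dependency cycle at {p.name}")
        if state.get(p.name) == 2:
            return
        state[p.name] = 1
        for d in p.deps:
            visit(d)
        state[p.name] = 2
        order.append(p.name)
    for p in packages:
        visit(p)
    return order

cuda = Package("cuda")
mpi = Package("MPI", deps=[cuda])        # MPI.py depends on cuda.py
print(configure_order([mpi]))            # ['cuda', 'MPI']

cuda.deps.append(mpi)                    # the reverse edge being discussed
try:
    configure_order([mpi])
except ValueError as e:
    print(e)                             # dependency cycle at MPI
```

Since the reverse edge is impossible, the alternative is out-of-band: when the MPI compiler wrappers are reset, explicitly re-run the cuda configure step rather than declaring a dependency.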
>>>>> On May 28, 2021, at 7:45 PM, Barry Smith <[email protected]> wrote:
>>>>>
>>>>> ~/petsc/src/mat/tutorials (barry/2021-05-28/robustify-cuda-gencodearch-check=)
>>>>> arch-robustify-cuda-gencodearch-check
>>>>> $ ./ex5cu
>>>>> terminate called after throwing an instance of 'thrust::system::system_error'
>>>>>   what(): fill_n: failed to synchronize: cudaErrorIllegalAddress: an
>>>>> illegal memory access was encountered
>>>>> Aborted (core dumped)
>>>>>
>>>>> requires: cuda !define(PETSC_USE_CTABLE)
>>>>>
>>>>> CI does not test with CUDA and no ctable. The code is still broken,
>>>>> as it was six months ago in the discussion Stefano pointed to. It is
>>>>> clear why: no one has had the time to clean things up.
>>>>>
>>>>> Barry
>>>>>
>>>>> On May 28, 2021, at 11:13 AM, Mark Adams <[email protected]> wrote:
>>>>>
>>>>> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini <[email protected]> wrote:
>>>>>
>>>>>> If you are referring to your device set values, I guess it is not
>>>>>> currently tested.
>>>>>
>>>>> No, there is a test for that (ex5cu).
>>>>> I have a user that is getting a segv in MatSetValues with aijcusparse.
>>>>> I suspect there is memory corruption, but I'm trying to cover all the
>>>>> bases.
>>>>> I have added a cuda test to ksp/ex56 that works. I can do an MR for it
>>>>> if such a test does not exist.
>>>>>
>>>>>> See the discussion here:
>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/3411
>>>>>> I started cleaning up the code to prepare for testing, but we never
>>>>>> finished it:
>>>>>> https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/
>>>>>>
>>>>>> On May 28, 2021, at 6:53 PM, Mark Adams <[email protected]> wrote:
>>>>>>
>>>>>> Is there a test with MatSetValues and CUDA?
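The `requires: cuda !define(PETSC_USE_CTABLE)` clause means the harness only runs ex5cu when PETSc was configured with CUDA and with ctable disabled -- precisely the combination CI never builds, which is why the breakage went unnoticed. A hedged sketch of reproducing that configuration locally; the flag spellings are assumptions from memory, so check `./configure --help` before relying on them:

```shell
# Hypothetical reproduction of the untested configuration: CUDA on, ctable off.
# Run from the PETSc source root; flag names are assumptions, verify first.
./configure PETSC_ARCH=arch-cuda-noctable \
    --with-cuda=1 \
    --with-ctable=0
make PETSC_ARCH=arch-cuda-noctable all
cd src/mat/tutorials
make ex5cu && ./ex5cu
```

If this build reproduces the thrust `cudaErrorIllegalAddress` abort above, adding such a job to CI (or skipping the test in that configuration, as Stefano suggests) would at least make the state of the code visible.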
