I can take a quick look at it tomorrow. What are the main changes you made since then?
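On the rebase question below: if the conflicts are all ones where you simply want Barry's side, you can take his version file by file during the rebase. A sketch (note that during a rebase the meaning of ours/theirs is inverted, because HEAD is the upstream you are rebasing onto):

    # during 'git rebase main', --theirs is the commit being replayed,
    # i.e. the branch's (Barry's) version; --ours is main
    git checkout --theirs -- include/petscmat.h
    git add include/petscmat.h
    git rebase --continue

    # or, to throw the attempt away and start over
    git rebase --abort

If you want that resolution wholesale, 'git rebase -X theirs main' applies it to every conflicted hunk, but it is safer to inspect each file: a silently mis-merged header is exactly how you end up with the conflicting-typedef error quoted below.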
> On May 28, 2021, at 9:51 PM, Mark Adams <[email protected]> wrote:
>
> I am getting messed up trying to resolve the conflicts in rebasing over main.
> Is there a better way of doing this?
> Can I just tell git to use Barry's version and then test it?
> Or should I just try it again?
>
> On Fri, May 28, 2021 at 2:15 PM Mark Adams <[email protected]> wrote:
>
> I am rebasing over main and it's a bit of a mess. I must have missed
> something. I get this; I think the _n_SplitCSRMat must be wrong.
>
> In file included from
> /autofs/nccs-svm1_home1/adams/petsc/src/vec/is/sf/impls/basic/sfbasic.c:128:0:
> /ccs/home/adams/petsc/include/petscmat.h:1976:32: error: conflicting types
> for 'PetscSplitCSRDataStructure'
>  typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure;
>                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~
> /ccs/home/adams/petsc/include/petscmat.h:1922:31: note: previous declaration
> of 'PetscSplitCSRDataStructure' was here
>  typedef struct _p_SplitCSRMat PetscSplitCSRDataStructure;
>                                ^~~~~~~~~~~~~~~~~~~~~~~~~~
>   CC arch-summit-opt-gnu-cuda/obj/vec/vec/impls/seq/dvec2.o
>
> On Fri, May 28, 2021 at 1:50 PM Stefano Zampini <[email protected]> wrote:
>
> OpenMPI.py depends on cuda.py in that, if cuda is present, it configures using
> cuda. MPI.py and MPICH.py do not depend on cuda.py (MPICH only weakly: it
> adds a print if cuda is present).
> Since eventually the MPI distro will only need a hint to be configured with
> CUDA, why not remove the dependency altogether and add only a flag
> --download-openmpi-use-cuda?
>
>> On May 28, 2021, at 8:44 PM, Barry Smith <[email protected]> wrote:
>>
>> Stefano, who has a far better memory than me, wrote
>>
>> > Or probably remove --download-openmpi? Or, just for the moment, why can't
>> > we just tell configure that mpi is a weak dependence of cuda.py, so that
>> > it will be forced to be configured later?
>>
>> MPI.py depends on cuda.py, so we cannot also have cuda.py depend on MPI.py
>> using the generic dependencies of configure/packages,
>> but perhaps we can just hardwire the rerunning of cuda.py when the MPI
>> compilers are reset. I will try that now, and if I can get it to work we
>> should be able to move those old fix branches along as MRs.
>>
>> Barry
>>
>>> On May 28, 2021, at 12:41 PM, Mark Adams <[email protected]> wrote:
>>>
>>> OK, I will try to rebase and test Barry's branch.
>>>
>>> On Fri, May 28, 2021 at 1:26 PM Stefano Zampini <[email protected]> wrote:
>>>
>>> Yes, it is the branch I was using before force-pushing to Barry's
>>> barry/2020-11-11/cleanup-matsetvaluesdevice.
>>> You can use both, I guess.
>>>
>>>> On May 28, 2021, at 8:25 PM, Mark Adams <[email protected]> wrote:
>>>>
>>>> Is this the correct branch? It conflicted with ex5cu, so I assume it is.
>>>>
>>>> stefanozampini/simplify-setvalues-device
>>>> <https://gitlab.com/petsc/petsc/-/tree/stefanozampini/simplify-setvalues-device>
>>>>
>>>> On Fri, May 28, 2021 at 1:24 PM Mark Adams <[email protected]> wrote:
>>>>
>>>> I am fixing up the rebase of this branch over main.
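On the conflicting-types error quoted above: both the old and the new declaration of PetscSplitCSRDataStructure survived the rebase in petscmat.h, which is exactly what a mis-resolved conflict looks like. A stripped-down sketch of what the compiler is seeing (only these two lines are from the real header; the rest is illustration):

    /* line 1922: the old declaration, a plain struct type */
    typedef struct _p_SplitCSRMat PetscSplitCSRDataStructure;
    /* line 1976: the new declaration, a pointer to a differently tagged struct */
    typedef struct _n_SplitCSRMat *PetscSplitCSRDataStructure;
    /* gcc: error: conflicting types for 'PetscSplitCSRDataStructure' */

The fix is to keep exactly one of the two, whichever declaration the branch being rebased intends to be current, and delete the other during conflict resolution.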
>>>> On Fri, May 28, 2021 at 1:16 PM Stefano Zampini <[email protected]> wrote:
>>>>
>>>> Or probably remove --download-openmpi? Or, just for the moment, why can't
>>>> we just tell configure that mpi is a weak dependence of cuda.py, so that
>>>> it will be forced to be configured later?
>>>>
>>>>> On May 28, 2021, at 8:12 PM, Stefano Zampini <[email protected]> wrote:
>>>>>
>>>>> That branch provides a fix for MatSetValuesDevice, but it never got merged
>>>>> because of the CI issues with --download-openmpi. We can probably try to
>>>>> skip the test in that specific configuration?
>>>>>
>>>>>> On May 28, 2021, at 7:45 PM, Barry Smith <[email protected]> wrote:
>>>>>>
>>>>>> ~/petsc/src/mat/tutorials (barry/2021-05-28/robustify-cuda-gencodearch-check=)
>>>>>> arch-robustify-cuda-gencodearch-check
>>>>>> $ ./ex5cu
>>>>>> terminate called after throwing an instance of 'thrust::system::system_error'
>>>>>>   what():  fill_n: failed to synchronize: cudaErrorIllegalAddress: an
>>>>>> illegal memory access was encountered
>>>>>> Aborted (core dumped)
>>>>>>
>>>>>> requires: cuda !define(PETSC_USE_CTABLE)
>>>>>>
>>>>>> CI does not test with CUDA and no ctable, so the code is still broken, as
>>>>>> it was six months ago in the discussion Stefano pointed to. It is clear
>>>>>> why: no one has had the time to clean things up.
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>>> On May 28, 2021, at 11:13 AM, Mark Adams <[email protected]> wrote:
>>>>>>>
>>>>>>> On Fri, May 28, 2021 at 11:57 AM Stefano Zampini <[email protected]> wrote:
>>>>>>>
>>>>>>> If you are referring to your device set values, I guess it is not
>>>>>>> currently tested.
>>>>>>>
>>>>>>> No, there is a test for that (ex5cu).
>>>>>>> I have a user that is getting a segv in MatSetValues with aijcusparse.
>>>>>>> I suspect there is memory corruption, but I'm trying to cover all the bases.
>>>>>>> I have added a cuda test to ksp/ex56 that works. I can do an MR for it
>>>>>>> if such a test does not exist.
>>>>>>>
>>>>>>> See the discussion here:
>>>>>>> https://gitlab.com/petsc/petsc/-/merge_requests/3411
>>>>>>> I started cleaning up the code to prepare for testing, but we never
>>>>>>> finished it:
>>>>>>> https://gitlab.com/petsc/petsc/-/commits/stefanozampini/simplify-setvalues-device/
>>>>>>>
>>>>>>>> On May 28, 2021, at 6:53 PM, Mark Adams <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Is there a test with MatSetValues and CUDA?
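On the last question (a MatSetValues test with CUDA): a minimal sketch of what such a test could look like, using the MatSetValues-on-aijcusparse pattern under discussion. This is illustrative only, not the actual ex5cu or the ksp/ex56 addition; the TEST block at the bottom shows where Barry's "requires: cuda !define(PETSC_USE_CTABLE)" restriction would live.

    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat            A;
      PetscInt       i = 0, j = 0;
      PetscScalar    v = 1.0;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
      ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
      ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 10, 10);CHKERRQ(ierr);
      ierr = MatSetType(A, MATAIJCUSPARSE);CHKERRQ(ierr); /* the type the user's segv involves */
      ierr = MatSetUp(A);CHKERRQ(ierr);
      /* host-side insertion; assembly then moves the data to the GPU */
      ierr = MatSetValues(A, 1, &i, 1, &j, &v, INSERT_VALUES);CHKERRQ(ierr);
      ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatDestroy(&A);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return ierr;
    }

    /*TEST
       test:
          requires: cuda !define(PETSC_USE_CTABLE)
    TEST*/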

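On Stefano's --download-openmpi-use-cuda idea: a rough sketch of what the flag could look like in config/BuildSystem/config/packages/OpenMPI.py, assuming the usual GNUPackage hooks (setupHelp/formGNUConfigureArgs); the exact structure of OpenMPI.py may differ, so treat this as a proposal sketch, not current code.

    import config.package

    class Configure(config.package.GNUPackage):
      def setupHelp(self, help):
        config.package.GNUPackage.setupHelp(self, help)
        import nargs
        # explicit opt-in flag instead of a hard dependency on cuda.py
        help.addArgument('OPENMPI', '-download-openmpi-use-cuda=<bool>',
                         nargs.ArgBool(None, 0, 'Build --download-openmpi with CUDA support'))

      def formGNUConfigureArgs(self):
        args = config.package.GNUPackage.formGNUConfigureArgs(self)
        if self.argDB['download-openmpi-use-cuda']:
          args.append('--with-cuda')  # OpenMPI's own configure switch
        return args

With this, configure never needs cuda.py to be resolved before the MPI download; the user states the intent directly, which is the decoupling Stefano describes above.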