Does AMGX require sorted column indices? (Python indentation notation below)

  If not 
     just use MatMPIAIJGetLocalMatMerge instead of MatMPIAIJGetLocalMat.

  If yes, 
     on the first call

     Mat tmplocal;
     PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &tmplocal));
     PetscCall(MatConvert(tmplocal,MATSEQAIJCUSPARSE,&amgx->localA));
     PetscCall(MatDestroy(&tmplocal));

     leave the later calls as is with

     PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA);

Eventually, someone will need to buckle down and write 
MatMPIAIJGetLocalMat_SeqAIJCUSPARSE, but that can be done later.

Barry


> On Jun 25, 2022, at 9:13 AM, Mark Adams <mfad...@lbl.gov> wrote:
> 
> 
> 
> On Fri, Jun 24, 2022 at 1:54 PM Barry Smith <bsm...@petsc.dev 
> <mailto:bsm...@petsc.dev>> wrote:
> 
> 
>> On Jun 24, 2022, at 1:38 PM, Mark Adams <mfad...@lbl.gov 
>> <mailto:mfad...@lbl.gov>> wrote:
>> 
>> I am rearranging the code for clarity from the repo but I have:
>> 
>>   PetscBool is_dev_ptrs;
>>   PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA));
>>   PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, 
>> &is_dev_ptrs, MATAIJCUSPARSE, MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, ""));
>>   PetscPrintf(PETSC_COMM_SELF,"checking against mataijcusparse amgx->localA 
>> = %d\n",is_dev_ptrs);
>>   PetscCall(PetscObjectTypeCompareAny((PetscObject)Pmat, &is_dev_ptrs, 
>> MATAIJCUSPARSE, MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, ""));
>>   PetscPrintf(PETSC_COMM_SELF,"checking against mataijcusparse Pmat = 
>> %d\n",is_dev_ptrs);
>> 
>> And it seems to show that MatMPIAIJGetLocalMat takes a mpiaijcusparse Mat 
>> and returns an seqaij mat (see below):
> 
>  Yes, this is how it currently behaves as Stefano has indicated. Thus it is 
> not currently directly suitable for use with GPUs. As Stefano has indicated 
> it has be revised to handle mpiaijcusparse matrices correctly in the same way 
> that MatMPIAIJGetLocalMatMerge has been revised for GPUs.
> 
> 
> OK, sorry, I did not understand that this is not supported. We need a 
> MatMPIAIJCusparseGetLocalMatMerge (I read this as supported with "hstack" 
> format, unsorted?, by Stefano)
> 
> What is the best way to proceed? Should we just convert to amgx->localA to 
> mpiaijcusparse if Pmat is a cusparse matrix?
> If so, should this code go in amgx or 
> MatMPIAIJGetLocalMat(MAT_INITIAL_MATRIX) ?
> Or should I add a MatMPIAIJCusparseGetLocalMatMerge that simply wraps these 
> two calls for now?
> 
> Thanks,
> Mark
> 
>  
> 
>> 
>> AMGX version 2.2.0.132-opensource
>> Built on Jun 24 2022, 09:21:43
>> Compiled with CUDA Runtime 11.5, using CUDA driver 11.5
>> checking against mataijcusparse amgx->localA = 0
>> checking against mataijcusparse Pmat = 1
>> localA_name seqaij
>> Pmat_name mpiaijcusparse
>> 
>> Matt's existing testing code (below) then shows the types that conform with 
>> these tests and prints that I added.
>> 
>>   // XXX DEBUG REMOVE
>>   const char* localA_name;
>>   PetscObjectGetType((PetscObject)amgx->localA, &localA_name);
>>   PetscPrintf(PETSC_COMM_SELF,"localA_name %s\n", localA_name);
>>   const char* Pmat_name;
>>   PetscObjectGetType((PetscObject)Pmat, &Pmat_name);
>>   PetscPrintf(PETSC_COMM_SELF,"Pmat_name %s\n", Pmat_name);
>> 
>> 
>> 
>> 
>> 
>> 
>> On Fri, Jun 24, 2022 at 10:00 AM Barry Smith <bsm...@petsc.dev 
>> <mailto:bsm...@petsc.dev>> wrote:
>> 
>> 
>>> On Jun 24, 2022, at 8:58 AM, Mark Adams <mfad...@lbl.gov 
>>> <mailto:mfad...@lbl.gov>> wrote:
>>> 
>>> And before we move to the MR, I think Matt found a clear problem:
>>> 
>>> * PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
>>> returns "localA seqaij"
>>> 
>>> * And, oddly, PetscCall(MatSeqAIJGetArrayRead(amgx->localA, 
>>> &amgx->values)); returns:
>>> "it seems to detect that the pointer is a device mapped pointer but that it 
>>> is invalid"
>> 
>>   It does not return a device mapped pointer, it returns a valid host 
>> pointer only. MatSeqAIJGetArrayRead() is intended to only return a host 
>> pointer, it cannot return a device pointer. MatSeqAIJCusparseGetArrayRead() 
>> returns device pointers and should be used for this purpose.
>> 
>>> 
>>> Matt, lets just comment out the REUSE line and add another INITIAL line 
>>> (destroying the old Mat of course), and lets press on.
>> 
>>   Looking at the code there is no way that simply using INITIAL instead of 
>> REUSE will make a code that does not work on the GPU run on the GPU. The 
>> MatMPIAIJGetLocalMat() returns only a MATSEQAIJ matrix regardless of the 
>> INITIAL versus REUSE and one can never get a device pointer from a non-GPU 
>> matrix.
>> 
>>    As noted by Stefano, the code either needs to use 
>> MatMPIAIJGetLocalMatMerge () which does return a CUSPARSE matrix (but the 
>> columns are not supported) or MatMPIAIJGetLocalMat()
>> needs to be updated to return a CUSPARSE matrix when the input MPI matrix is 
>> a CUSPARSE matrix.
>> 
>> 
>> 
>> 
>>> We can keep the debugging code for now.
>>> 
>>> We (PETSc) can work on this independently,
>>> 
>>> Thanks,
>>> Mark
>>> 
>>> On Fri, Jun 24, 2022 at 8:51 AM Mark Adams <mfad...@lbl.gov 
>>> <mailto:mfad...@lbl.gov>> wrote:
>>> I am not seeing this response, I see my "hstack" comment last.
>>> https://gitlab.com/petsc/petsc/-/merge_requests/4323 
>>> <https://gitlab.com/petsc/petsc/-/merge_requests/4323>
>>> 
>>> On Thu, Jun 23, 2022 at 4:37 PM Barry Smith <bsm...@petsc.dev 
>>> <mailto:bsm...@petsc.dev>> wrote:
>>> 
>>>  I have responded in the MR, which has all the context and the code. Please 
>>> move this conversation from petsc-dev to the MR. Note you can use the 
>>> little cartoon cloud symbol (upper write of the sub window with my text)  
>>> to reply to my post and keep everything in a thread for clarity.
>>> 
>>>   We are confused because it seems you are trying a variety of things and 
>>> we don't know how the different things you tried resulted in the multiple 
>>> errors you reported.
>>> 
>>> 
>>> 
>>>> On Jun 23, 2022, at 3:59 PM, Matthew Martineau <mmartin...@nvidia.com 
>>>> <mailto:mmartin...@nvidia.com>> wrote:
>>>> 
>>>> I checked in the changes and some debugging statements.
>>>>  
>>>> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
>>>> PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, 
>>>> &is_dev_ptrs, MATAIJCUSPARSE, MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, ""));
>>>>  
>>>> Then the call returns false. If we instead call PetscObjectTypeCompareAny 
>>>> on Pmat then it returns true. If you print the type of the matrices:
>>>>  
>>>> localA seqaij
>>>> Pmat mpiaijcusparse
>>>>  
>>>> If you subsequently call MatSeqAIJCUSPARSEGetArrayRead on localA then it 
>>>> errors (presumably because of the type mismatch).
>>>>  
>>>> If we call MatSeqAIJGetArrayRead on localA and then pass the `values` to 
>>>> AmgX it seems to detect that the pointer is a device mapped pointer but 
>>>> that it is invalid.
>>>>  
>>>> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
>>>> PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // Seems to 
>>>> return invalid pointer, but I’ll investigate more
>>>>  
>>>> This doesn’t reproduce if we call:
>>>>  
>>>> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA));
>>>> PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // Pointer 
>>>> appears to be valid and we converge
>>>>  
>>>> Essentially all I want to achieve is that when we are parallel, we fetch 
>>>> the local part of A and the device pointer to the matrix values from that 
>>>> structure so that we can pass to AmgX. Preferring whichever API calls are 
>>>> the most efficient. 
>>>>  
>>>>  
>>>> From: Stefano Zampini <stefano.zamp...@gmail.com 
>>>> <mailto:stefano.zamp...@gmail.com>> 
>>>> Sent: 23 June 2022 20:55
>>>> To: Mark Adams <mfad...@lbl.gov <mailto:mfad...@lbl.gov>>
>>>> Cc: Barry Smith <bsm...@petsc.dev <mailto:bsm...@petsc.dev>>; For users of 
>>>> the development version of PETSc <petsc-dev@mcs.anl.gov 
>>>> <mailto:petsc-dev@mcs.anl.gov>>; Matthew Martineau <mmartin...@nvidia.com 
>>>> <mailto:mmartin...@nvidia.com>>
>>>> Subject: Re: [petsc-dev] MatMPIAIJGetLocalMat problem with GPUs
>>>>  
>>>> External email: Use caution opening links or attachments
>>>>  
>>>> The logic is wrong. It should check for MATSEQAIJCUSPARSE. 
>>>>  
>>>> On Thu, Jun 23, 2022, 21:36 Mark Adams <mfad...@lbl.gov 
>>>> <mailto:mfad...@lbl.gov>> wrote:
>>>>  
>>>>  
>>>> On Thu, Jun 23, 2022 at 3:02 PM Barry Smith <bsm...@petsc.dev 
>>>> <mailto:bsm...@petsc.dev>> wrote:
>>>>  
>>>>   It looks like the current code copies the nonzero values to the CPU from 
>>>> the MPI matrix (with the calls 
>>>> PetscCall(MatSeqAIJGetArrayRead(mpimat->A,&aav));
>>>>   PetscCall(MatSeqAIJGetArrayRead(mpimat->B,&bav));, then copies them into 
>>>> the CPU memory of the Seq matrix. When the matrix entries are next 
>>>> accessed on the GPU it should automatically copy them down to the GPU.  So 
>>>> the code looks ok even for GPUs. We'll need to see the full error message 
>>>> with what the "invalid pointer" is.
>>>>  
>>>> I showed Matt how to peek into offloadmask and he found that it is a host 
>>>> state, but this is not the issue. The access method should do the copy to 
>>>> the device.
>>>>  
>>>> I am thinking the logic here might be wrong. (Matt fixed "VEC" --> "MAT" 
>>>> in the comparison below).
>>>>  
>>>> Matt, is the issue that you are calling  MatSeqAIJCUSPARSEGetArrayRead and 
>>>> getting a host pointer?
>>>>  
>>>> I think the state of amgx->localA after the call to 
>>>> MatSeqAIJCUSPARSEGetArrayRead should be "BOTH" because this copied the 
>>>> data to the device so they are both valid and you should have device data.
>>>>  
>>>> 211   PetscBool is_dev_ptrs;
>>>> 212   PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, 
>>>> &is_dev_ptrs, VECCUDA, VECMPICUDA, VECSEQCUDA, ""));
>>>> 213
>>>> 214   if (is_dev_ptrs) {
>>>> 216     PetscCall(MatSeqAIJCUSPARSEGetArrayRead(amgx->localA, 
>>>> &amgx->values));
>>>> 217   } else {
>>>> 219     PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); 
>>>> 220   }
>>>>  
>>>>  
>>>>   Barry
>>>>  
>>>>  
>>>>  Yes this routine is terribly inefficient for GPU matrices, it needs to be 
>>>> specialized to not use the GPU memory but that is a separate issue from 
>>>> there being bugs in the current code.
>>>>  
>>>>   The code also seems to implicitly assume the parallel matrix has the 
>>>> same nonzero pattern with a reuse. This should be checked with each use by 
>>>> stashing the nonzero state of the matrix into the sequential matrix and 
>>>> making sure the parallel matrix has that same stashed value each time. 
>>>> Currently if one changes the nonzero matrix of the parallel matrix one is 
>>>> likely to get random confusing crashes due to memory corruption. But 
>>>> likely not the problem here.
>>>> 
>>>> 
>>>> On Jun 23, 2022, at 2:23 PM, Mark Adams <mfad...@lbl.gov 
>>>> <mailto:mfad...@lbl.gov>> wrote:
>>>>  
>>>> We have a bug in the AMGx test snes_tests-ex13_amgx in parallel.
>>>> Matt Martineau found that MatMPIAIJGetLocalMat worked in the first pass in 
>>>> the code below, where the local matrix is created (INITIAL), but in the 
>>>> next pass, when "REUSE" is used, he sees an invalid pointer.
>>>> Matt found that it does have offloadmask == CPU. 
>>>> Maybe it is missing logic to put the output in same state as the input?
>>>>  
>>>> Any ideas on this or should I just dig into it?
>>>>  
>>>> Thanks,
>>>> bool partial_setup_allowed = (pc->setupcalled && pc->flag != 
>>>> DIFFERENT_NONZERO_PATTERN);
>>>> 199   if (amgx->nranks > 1) {
>>>> 200     if (partial_setup_allowed) {
>>>> 202       PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, 
>>>> &amgx->localA)); // This path seems doesn't work by the time we reach AmgX 
>>>> API
>>>> 203     } else {
>>>> 205       PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, 
>>>> &amgx->localA)); // This path works
>>>> 206     }
>>>> 207   } else {
>>>> 208     amgx->localA = Pmat;
>>>> 209   }
>>>> 210
>>> 
>> 
> 

Reply via email to