On Fri, Jun 24, 2022 at 1:54 PM Barry Smith <bsm...@petsc.dev> wrote:
> > > On Jun 24, 2022, at 1:38 PM, Mark Adams <mfad...@lbl.gov> wrote: > > I am rearranging the code for clarity from the repo but I have: > > PetscBool is_dev_ptrs; > PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA)); > PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, > &is_dev_ptrs, MATAIJCUSPARSE, MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, "")); > PetscPrintf(PETSC_COMM_SELF,"*checking against mataijcusparse > amgx->localA = %d*\n",is_dev_ptrs); > PetscCall(PetscObjectTypeCompareAny((PetscObject)Pmat, &is_dev_ptrs, > MATAIJCUSPARSE, MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, "")); > PetscPrintf(PETSC_COMM_SELF,"*checking against mataijcusparse Pmat = %d* > \n",is_dev_ptrs); > > And it seems to show that MatMPIAIJGetLocalMat takes a mpiaijcusparse Mat > and returns an seqaij mat (see below): > > > Yes, this is how it currently behaves as Stefano has indicated. Thus it > is not currently directly suitable for use with GPUs. As Stefano has > indicated it has be revised to handle mpiaijcusparse matrices correctly > in the same way that MatMPIAIJGetLocalMatMerge has been revised for GPUs. > > OK, sorry, I did not understand that this is not supported. We need a MatMPIAIJCusparseGetLocalMatMerge (I read this as supported with "hstack" format, unsorted?, by Stefano) What is the best way to proceed? Should we just convert to amgx->localA to mpiaijcusparse if Pmat is a cusparse matrix? If so, should this code go in amgx or MatMPIAIJGetLocalMat(MAT_INITIAL_MATRIX) ? Or should I add a MatMPIAIJCusparseGetLocalMatMerge that simply wraps these two calls for now? Thanks, Mark > > > AMGX version 2.2.0.132-opensource > Built on Jun 24 2022, 09:21:43 > Compiled with CUDA Runtime 11.5, using CUDA driver 11.5 > > *checking against mataijcusparse amgx->localA = 0checking against > mataijcusparse Pmat = 1* > localA_name seqaij > Pmat_name mpiaijcusparse > > Matt's existing testing code (below) then shows the types that conform > with these tests and prints that I added. > > // XXX DEBUG REMOVE > const char* localA_name; > PetscObjectGetType((PetscObject)amgx->localA, &localA_name); > PetscPrintf(PETSC_COMM_SELF,"localA_name %s\n", localA_name); > const char* Pmat_name; > PetscObjectGetType((PetscObject)Pmat, &Pmat_name); > PetscPrintf(PETSC_COMM_SELF,"Pmat_name %s\n", Pmat_name); > > > > > > > On Fri, Jun 24, 2022 at 10:00 AM Barry Smith <bsm...@petsc.dev> wrote: > >> >> >> On Jun 24, 2022, at 8:58 AM, Mark Adams <mfad...@lbl.gov> wrote: >> >> And before we move to the MR, I think Matt found a clear problem: >> >> * PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA)); >> returns "localA seqaij" >> >> * And, oddly, PetscCall(MatSeqAIJGetArrayRead(amgx->localA, >> &amgx->values)); returns: >> "it seems to detect that the pointer is a device mapped pointer but that >> it is invalid" >> >> >> It does not return a device mapped pointer, it returns a valid host >> pointer only. MatSeqAIJGetArrayRead() is intended to only return a host >> pointer, it cannot return a device pointer. MatSeqAIJCusparseGetArrayRead() >> returns device pointers and should be used for this purpose. >> >> >> Matt, lets just comment out the REUSE line and add another INITIAL line >> (destroying the old Mat of course), and lets press on. >> >> >> Looking at the code there is no way that simply using INITIAL instead >> of REUSE will make a code that does not work on the GPU run on the GPU. The >> MatMPIAIJGetLocalMat() returns only a MATSEQAIJ matrix regardless of >> the INITIAL versus REUSE and one can never get a device pointer from a >> non-GPU matrix. >> >> As noted by Stefano, the code either needs to use >> MatMPIAIJGetLocalMatMerge () which does return a CUSPARSE matrix (but >> the columns are not supported) or MatMPIAIJGetLocalMat() >> needs to be updated to return a CUSPARSE matrix when the input MPI matrix >> is a CUSPARSE matrix. >> >> >> >> >> We can keep the debugging code for now. >> >> We (PETSc) can work on this independently, >> >> Thanks, >> Mark >> >> On Fri, Jun 24, 2022 at 8:51 AM Mark Adams <mfad...@lbl.gov> wrote: >> >>> I am not seeing this response, I see my "hstack" comment last. >>> https://gitlab.com/petsc/petsc/-/merge_requests/4323 >>> >>> On Thu, Jun 23, 2022 at 4:37 PM Barry Smith <bsm...@petsc.dev> wrote: >>> >>>> >>>> I have responded in the MR, which has all the context and the code. >>>> Please move this conversation from petsc-dev to the MR. Note you can use >>>> the little cartoon cloud symbol (upper write of the sub window with my >>>> text) to reply to my post and keep everything in a thread for clarity. >>>> >>>> We are confused because it seems you are trying a variety of things >>>> and we don't know how the different things you tried resulted in the >>>> multiple errors you reported. >>>> >>>> >>>> >>>> On Jun 23, 2022, at 3:59 PM, Matthew Martineau <mmartin...@nvidia.com> >>>> wrote: >>>> >>>> I checked in the changes and some debugging statements. >>>> >>>> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA)); >>>> PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, >>>> &is_dev_ptrs, MATAIJCUSPARSE, MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, "")); >>>> >>>> Then the call returns false. If we instead call >>>> PetscObjectTypeCompareAny on Pmat then it returns true. If you print the >>>> type of the matrices: >>>> >>>> >>>> localA seqaij >>>> >>>> Pmat mpiaijcusparse >>>> >>>> If you subsequently call MatSeqAIJCUSPARSEGetArrayRead on localA then >>>> it errors (presumably because of the type mismatch). >>>> >>>> If we call MatSeqAIJGetArrayRead on localA and then pass the `values` >>>> to AmgX it seems to detect that the pointer is a device mapped pointer but >>>> that it is invalid. >>>> >>>> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA)); >>>> PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // Seems >>>> to return invalid pointer, but I’ll investigate more >>>> >>>> This doesn’t reproduce if we call: >>>> >>>> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, >>>> &amgx->localA)); >>>> PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // >>>> Pointer appears to be valid and we converge >>>> >>>> Essentially all I want to achieve is that when we are parallel, we >>>> fetch the local part of A and the device pointer to the matrix values from >>>> that structure so that we can pass to AmgX. Preferring whichever API calls >>>> are the most efficient. >>>> >>>> >>>> *From:* Stefano Zampini <stefano.zamp...@gmail.com> >>>> *Sent:* 23 June 2022 20:55 >>>> *To:* Mark Adams <mfad...@lbl.gov> >>>> *Cc:* Barry Smith <bsm...@petsc.dev>; For users of the development >>>> version of PETSc <petsc-dev@mcs.anl.gov>; Matthew Martineau < >>>> mmartin...@nvidia.com> >>>> *Subject:* Re: [petsc-dev] MatMPIAIJGetLocalMat problem with GPUs >>>> >>>> *External email: Use caution opening links or attachments* >>>> >>>> The logic is wrong. It should check for MATSEQAIJCUSPARSE. >>>> >>>> On Thu, Jun 23, 2022, 21:36 Mark Adams <mfad...@lbl.gov> wrote: >>>> >>>> >>>> >>>> On Thu, Jun 23, 2022 at 3:02 PM Barry Smith <bsm...@petsc.dev> wrote: >>>> >>>> >>>> It looks like the current code copies the nonzero values to the CPU >>>> from the MPI matrix (with the calls >>>> PetscCall(MatSeqAIJGetArrayRead(mpimat->A,&aav)); >>>> PetscCall(MatSeqAIJGetArrayRead(mpimat->B,&bav));, then copies them >>>> into the CPU memory of the Seq matrix. When the matrix entries are next >>>> accessed on the GPU it should automatically copy them down to the GPU. So >>>> the code looks ok even for GPUs. We'll need to see the full error message >>>> with what the "invalid pointer" is. >>>> >>>> >>>> I showed Matt how to peek into offloadmask and he found that it is a >>>> host state, but this is not the issue. The access method should do the copy >>>> to the device. >>>> >>>> I am thinking the logic here might be wrong. (Matt fixed "VEC" --> >>>> "MAT" in the comparison below). >>>> >>>> Matt, is the issue that you are calling >>>> *MatSeqAIJCUSPARSEGetArrayRead* and getting a host pointer? >>>> >>>> I think the state of amgx->localA after the call to >>>> MatSeqAIJCUSPARSEGetArrayRead should be "BOTH" because this copied the data >>>> to the device so they are both valid and you should have device data. >>>> >>>> 211 PetscBool is_dev_ptrs; >>>> 212 PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, >>>> &is_dev_ptrs, VECCUDA, VECMPICUDA, VECSEQCUDA, "")); >>>> 213 >>>> 214 if (is_dev_ptrs) { >>>> >>>> *216 PetscCall(MatSeqAIJCUSPARSEGetArrayRead(amgx->localA, >>>> &amgx->values));*217 } else { >>>> 219 PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); >>>> 220 } >>>> >>>> >>>> >>>> Barry >>>> >>>> >>>> Yes this routine is terribly inefficient for GPU matrices, it needs to >>>> be specialized to not use the GPU memory but that is a separate issue from >>>> there being bugs in the current code. >>>> >>>> The code also seems to implicitly assume the parallel matrix has the >>>> same nonzero pattern with a reuse. This should be checked with each use by >>>> stashing the nonzero state of the matrix into the sequential matrix and >>>> making sure the parallel matrix has that same stashed value each time. >>>> Currently if one changes the nonzero matrix of the parallel matrix one is >>>> likely to get random confusing crashes due to memory corruption. But likely >>>> not the problem here. >>>> >>>> >>>> On Jun 23, 2022, at 2:23 PM, Mark Adams <mfad...@lbl.gov> wrote: >>>> >>>> We have a bug in the AMGx test snes_tests-ex13_amgx in parallel. >>>> Matt Martineau found that MatMPIAIJGetLocalMat worked in the first pass >>>> in the code below, where the local matrix is created (INITIAL), but in the >>>> next pass, when "REUSE" is used, he sees an invalid pointer. >>>> Matt found that it does have offloadmask == CPU. >>>> Maybe it is missing logic to put the output in same state as the input? >>>> >>>> Any ideas on this or should I just dig into it? >>>> >>>> Thanks, >>>> >>>> bool partial_setup_allowed = (pc->setupcalled && pc->flag != >>>> DIFFERENT_NONZERO_PATTERN); >>>> >>>> 199 if (amgx->nranks > 1) { >>>> >>>> 200 if (partial_setup_allowed) { >>>> >>>> 202 PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, >>>> &amgx->localA)); // This path seems doesn't work by the time we reach AmgX >>>> API >>>> >>>> 203 } else { >>>> >>>> 205 PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, >>>> &amgx->localA)); // This path works >>>> >>>> 206 } >>>> >>>> 207 } else { >>>> >>>> 208 amgx->localA = Pmat; >>>> >>>> 209 } >>>> >>>> 210 >>>> >>>> >>>> >> >