Does AMGX require sorted column indices? (Python indentation notation below)

If not, just use MatMPIAIJGetLocalMatMerge() instead of MatMPIAIJGetLocalMat(). If yes, on the first call do

  Mat tmplocal;
  PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &tmplocal));
  PetscCall(MatConvert(tmplocal, MATSEQAIJCUSPARSE, MAT_INITIAL_MATRIX, &amgx->localA));
  PetscCall(MatDestroy(&tmplocal));

and leave the later calls as is with

  PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));

Eventually, someone will need to buckle down and write MatMPIAIJGetLocalMat_SeqAIJCUSPARSE, but that can be done later.

  Barry

> On Jun 25, 2022, at 9:13 AM, Mark Adams <mfad...@lbl.gov> wrote:
>
> On Fri, Jun 24, 2022 at 1:54 PM Barry Smith <bsm...@petsc.dev> wrote:
>
>> On Jun 24, 2022, at 1:38 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>
>> I am rearranging the code for clarity from the repo, but I have:
>>
>>   PetscBool is_dev_ptrs;
>>   PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA));
>>   PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, &is_dev_ptrs, MATAIJCUSPARSE, MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, ""));
>>   PetscPrintf(PETSC_COMM_SELF, "checking against mataijcusparse amgx->localA = %d\n", is_dev_ptrs);
>>   PetscCall(PetscObjectTypeCompareAny((PetscObject)Pmat, &is_dev_ptrs, MATAIJCUSPARSE, MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, ""));
>>   PetscPrintf(PETSC_COMM_SELF, "checking against mataijcusparse Pmat = %d\n", is_dev_ptrs);
>>
>> And it seems to show that MatMPIAIJGetLocalMat takes an mpiaijcusparse Mat and returns a seqaij Mat (see below):
>
> Yes, this is how it currently behaves, as Stefano has indicated. Thus it is not currently directly suitable for use with GPUs; it has to be revised to handle mpiaijcusparse matrices correctly, in the same way that MatMPIAIJGetLocalMatMerge has been revised for GPUs.
>
> OK, sorry, I did not understand that this is not supported.
> We need a MatMPIAIJCusparseGetLocalMatMerge (I read this as supported with "hstack" format, unsorted?, by Stefano).
>
> What is the best way to proceed? Should we just convert amgx->localA to mpiaijcusparse if Pmat is a cusparse matrix?
> If so, should this code go in amgx or in MatMPIAIJGetLocalMat(MAT_INITIAL_MATRIX)?
> Or should I add a MatMPIAIJCusparseGetLocalMatMerge that simply wraps these two calls for now?
>
> Thanks,
> Mark
>
>> AMGX version 2.2.0.132-opensource
>> Built on Jun 24 2022, 09:21:43
>> Compiled with CUDA Runtime 11.5, using CUDA driver 11.5
>> checking against mataijcusparse amgx->localA = 0
>> checking against mataijcusparse Pmat = 1
>> localA_name seqaij
>> Pmat_name mpiaijcusparse
>>
>> Matt's existing testing code (below) then shows the types that conform with these tests and the prints that I added.
>>
>>   // XXX DEBUG REMOVE
>>   const char *localA_name;
>>   PetscObjectGetType((PetscObject)amgx->localA, &localA_name);
>>   PetscPrintf(PETSC_COMM_SELF, "localA_name %s\n", localA_name);
>>   const char *Pmat_name;
>>   PetscObjectGetType((PetscObject)Pmat, &Pmat_name);
>>   PetscPrintf(PETSC_COMM_SELF, "Pmat_name %s\n", Pmat_name);
>>
>> On Fri, Jun 24, 2022 at 10:00 AM Barry Smith <bsm...@petsc.dev> wrote:
>>
>>> On Jun 24, 2022, at 8:58 AM, Mark Adams <mfad...@lbl.gov> wrote:
>>>
>>> And before we move to the MR, I think Matt found a clear problem:
>>>
>>> * PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA)); returns "localA seqaij"
>>>
>>> * And, oddly, PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); returns: "it seems to detect that the pointer is a device mapped pointer but that it is invalid"
>>
>> It does not return a device mapped pointer; it returns a valid host pointer only. MatSeqAIJGetArrayRead() is intended to only return a host pointer; it cannot return a device pointer.
>> MatSeqAIJCUSPARSEGetArrayRead() returns device pointers and should be used for this purpose.
>>
>>> Matt, let's just comment out the REUSE line and add another INITIAL line (destroying the old Mat of course), and let's press on.
>>
>> Looking at the code, there is no way that simply using INITIAL instead of REUSE will make code that does not work on the GPU run on the GPU. MatMPIAIJGetLocalMat() returns only a MATSEQAIJ matrix regardless of INITIAL versus REUSE, and one can never get a device pointer from a non-GPU matrix.
>>
>> As noted by Stefano, the code either needs to use MatMPIAIJGetLocalMatMerge(), which does return a CUSPARSE matrix (but the columns are not sorted), or MatMPIAIJGetLocalMat() needs to be updated to return a CUSPARSE matrix when the input MPI matrix is a CUSPARSE matrix.
>>
>>> We can keep the debugging code for now.
>>>
>>> We (PETSc) can work on this independently.
>>>
>>> Thanks,
>>> Mark
>>>
>>> On Fri, Jun 24, 2022 at 8:51 AM Mark Adams <mfad...@lbl.gov> wrote:
>>> I am not seeing this response; I see my "hstack" comment last.
>>> https://gitlab.com/petsc/petsc/-/merge_requests/4323
>>>
>>> On Thu, Jun 23, 2022 at 4:37 PM Barry Smith <bsm...@petsc.dev> wrote:
>>>
>>> I have responded in the MR, which has all the context and the code. Please move this conversation from petsc-dev to the MR. Note you can use the little cartoon cloud symbol (upper right of the sub-window with my text) to reply to my post and keep everything in a thread for clarity.
>>>
>>> We are confused because it seems you are trying a variety of things, and we don't know how the different things you tried resulted in the multiple errors you reported.
>>>> On Jun 23, 2022, at 3:59 PM, Matthew Martineau <mmartin...@nvidia.com> wrote:
>>>>
>>>> I checked in the changes and some debugging statements.
>>>>
>>>>   PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
>>>>   PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, &is_dev_ptrs, MATAIJCUSPARSE, MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, ""));
>>>>
>>>> Then the call returns false. If we instead call PetscObjectTypeCompareAny on Pmat, then it returns true. If you print the types of the matrices:
>>>>
>>>>   localA seqaij
>>>>   Pmat mpiaijcusparse
>>>>
>>>> If you subsequently call MatSeqAIJCUSPARSEGetArrayRead on localA, then it errors (presumably because of the type mismatch).
>>>>
>>>> If we call MatSeqAIJGetArrayRead on localA and then pass the `values` to AmgX, it seems to detect that the pointer is a device mapped pointer but that it is invalid.
>>>>
>>>>   PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
>>>>   PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // Seems to return invalid pointer, but I'll investigate more
>>>>
>>>> This doesn't reproduce if we call:
>>>>
>>>>   PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA));
>>>>   PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // Pointer appears to be valid and we converge
>>>>
>>>> Essentially, all I want to achieve is that when we are parallel, we fetch the local part of A and the device pointer to the matrix values from that structure so that we can pass them to AmgX, preferring whichever API calls are the most efficient.
>>>> From: Stefano Zampini <stefano.zamp...@gmail.com>
>>>> Sent: 23 June 2022 20:55
>>>> To: Mark Adams <mfad...@lbl.gov>
>>>> Cc: Barry Smith <bsm...@petsc.dev>; For users of the development version of PETSc <petsc-dev@mcs.anl.gov>; Matthew Martineau <mmartin...@nvidia.com>
>>>> Subject: Re: [petsc-dev] MatMPIAIJGetLocalMat problem with GPUs
>>>>
>>>> The logic is wrong. It should check for MATSEQAIJCUSPARSE.
>>>>
>>>> On Thu, Jun 23, 2022, 21:36 Mark Adams <mfad...@lbl.gov> wrote:
>>>>
>>>> On Thu, Jun 23, 2022 at 3:02 PM Barry Smith <bsm...@petsc.dev> wrote:
>>>>
>>>> It looks like the current code copies the nonzero values to the CPU from the MPI matrix (with the calls
>>>>
>>>>   PetscCall(MatSeqAIJGetArrayRead(mpimat->A, &aav));
>>>>   PetscCall(MatSeqAIJGetArrayRead(mpimat->B, &bav));
>>>>
>>>> ), then copies them into the CPU memory of the Seq matrix. When the matrix entries are next accessed on the GPU, it should automatically copy them down to the GPU. So the code looks OK even for GPUs. We'll need to see the full error message with what the "invalid pointer" is.
>>>>
>>>> I showed Matt how to peek into offloadmask, and he found that it is a host state, but this is not the issue. The access method should do the copy to the device.
>>>>
>>>> I am thinking the logic here might be wrong. (Matt fixed "VEC" --> "MAT" in the comparison below.)
>>>>
>>>> Matt, is the issue that you are calling MatSeqAIJCUSPARSEGetArrayRead and getting a host pointer?
>>>> I think the state of amgx->localA after the call to MatSeqAIJCUSPARSEGetArrayRead should be "BOTH", because this copied the data to the device, so both copies are valid and you should have device data.
>>>>
>>>>   PetscBool is_dev_ptrs;
>>>>   PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, &is_dev_ptrs, VECCUDA, VECMPICUDA, VECSEQCUDA, ""));
>>>>
>>>>   if (is_dev_ptrs) {
>>>>     PetscCall(MatSeqAIJCUSPARSEGetArrayRead(amgx->localA, &amgx->values));
>>>>   } else {
>>>>     PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values));
>>>>   }
>>>>
>>>> Barry
>>>>
>>>> Yes, this routine is terribly inefficient for GPU matrices; it needs to be specialized to not use the GPU memory, but that is a separate issue from there being bugs in the current code.
>>>>
>>>> The code also seems to implicitly assume the parallel matrix has the same nonzero pattern with a reuse. This should be checked with each use by stashing the nonzero state of the parallel matrix into the sequential matrix and making sure the parallel matrix has that same stashed value each time. Currently, if one changes the nonzero pattern of the parallel matrix, one is likely to get random confusing crashes due to memory corruption. But that is likely not the problem here.
>>>>
>>>> On Jun 23, 2022, at 2:23 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>>>
>>>> We have a bug in the AmgX test snes_tests-ex13_amgx in parallel.
>>>> Matt Martineau found that MatMPIAIJGetLocalMat worked in the first pass through the code below, where the local matrix is created (INITIAL), but in the next pass, when "REUSE" is used, he sees an invalid pointer.
>>>> Matt found that it does have offloadmask == CPU.
>>>> Maybe it is missing logic to put the output in the same state as the input?
>>>>
>>>> Any ideas on this, or should I just dig into it?
>>>> Thanks,
>>>>
>>>>   bool partial_setup_allowed = (pc->setupcalled && pc->flag != DIFFERENT_NONZERO_PATTERN);
>>>>   if (amgx->nranks > 1) {
>>>>     if (partial_setup_allowed) {
>>>>       PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA)); // This path doesn't seem to work by the time we reach the AmgX API
>>>>     } else {
>>>>       PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA)); // This path works
>>>>     }
>>>>   } else {
>>>>     amgx->localA = Pmat;
>>>>   }