I am not seeing this response, I see my "hstack" comment last. https://gitlab.com/petsc/petsc/-/merge_requests/4323
On Thu, Jun 23, 2022 at 4:37 PM Barry Smith <bsm...@petsc.dev> wrote:

>    I have responded in the MR, which has all the context and the code.
> Please move this conversation from petsc-dev to the MR. Note you can use
> the little cartoon cloud symbol (upper right of the subwindow with my
> text) to reply to my post and keep everything in a thread for clarity.
>
>    We are confused because it seems you are trying a variety of things, and
> we don't know how the different things you tried resulted in the multiple
> errors you reported.
>
> On Jun 23, 2022, at 3:59 PM, Matthew Martineau <mmartin...@nvidia.com> wrote:
>
> I checked in the changes and some debugging statements.
>
> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
> PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, &is_dev_ptrs, MATAIJCUSPARSE, MATSEQAIJCUSPARSE, MATMPIAIJCUSPARSE, ""));
>
> The call returns false. If we instead call PetscObjectTypeCompareAny on Pmat, it returns true. Printing the type of each matrix gives:
>
> localA  seqaij
> Pmat    mpiaijcusparse
>
> If you subsequently call MatSeqAIJCUSPARSEGetArrayRead on localA, it errors (presumably because of the type mismatch).
>
> If we call MatSeqAIJGetArrayRead on localA and then pass the `values` to AmgX, it seems to detect that the pointer is a device-mapped pointer but that it is invalid.
> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA));
> PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // Seems to return an invalid pointer, but I'll investigate more
>
> This doesn't reproduce if we instead call:
>
> PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA));
> PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values)); // Pointer appears to be valid and we converge
>
> Essentially, all I want to achieve is that when we are parallel, we fetch the local part of A and the device pointer to the matrix values from that structure so that we can pass them to AmgX, preferring whichever API calls are the most efficient.
>
> From: Stefano Zampini <stefano.zamp...@gmail.com>
> Sent: 23 June 2022 20:55
> To: Mark Adams <mfad...@lbl.gov>
> Cc: Barry Smith <bsm...@petsc.dev>; For users of the development version of PETSc <petsc-dev@mcs.anl.gov>; Matthew Martineau <mmartin...@nvidia.com>
> Subject: Re: [petsc-dev] MatMPIAIJGetLocalMat problem with GPUs
>
> The logic is wrong. It should check for MATSEQAIJCUSPARSE.
>
> On Thu, Jun 23, 2022, 21:36 Mark Adams <mfad...@lbl.gov> wrote:
>
> On Thu, Jun 23, 2022 at 3:02 PM Barry Smith <bsm...@petsc.dev> wrote:
>
> It looks like the current code copies the nonzero values to the CPU from the MPI matrix (with the calls PetscCall(MatSeqAIJGetArrayRead(mpimat->A,&aav)); and PetscCall(MatSeqAIJGetArrayRead(mpimat->B,&bav));), then copies them into the CPU memory of the Seq matrix. When the matrix entries are next accessed on the GPU, they should automatically be copied down to the GPU. So the code looks OK even for GPUs. We'll need to see the full error message with what the "invalid pointer" is.
>
> I showed Matt how to peek into offloadmask and he found that it is a host state, but this is not the issue. The access method should do the copy to the device.
> I am thinking the logic here might be wrong. (Matt fixed "VEC" --> "MAT" in the comparison below.)
>
> Matt, is the issue that you are calling MatSeqAIJCUSPARSEGetArrayRead and getting a host pointer?
>
> I think the state of amgx->localA after the call to MatSeqAIJCUSPARSEGetArrayRead should be "BOTH", because that call copied the data to the device, so both copies are valid and you should have device data.
>
> PetscBool is_dev_ptrs;
> PetscCall(PetscObjectTypeCompareAny((PetscObject)amgx->localA, &is_dev_ptrs, VECCUDA, VECMPICUDA, VECSEQCUDA, ""));
>
> if (is_dev_ptrs) {
>   PetscCall(MatSeqAIJCUSPARSEGetArrayRead(amgx->localA, &amgx->values));
> } else {
>   PetscCall(MatSeqAIJGetArrayRead(amgx->localA, &amgx->values));
> }
>
> Barry
>
> Yes, this routine is terribly inefficient for GPU matrices; it needs to be specialized to not use the GPU memory, but that is a separate issue from there being bugs in the current code.
>
> The code also seems to implicitly assume the parallel matrix has the same nonzero pattern with a reuse. This should be checked with each use by stashing the nonzero state of the matrix into the sequential matrix and making sure the parallel matrix has that same stashed value each time. Currently, if one changes the nonzero pattern of the parallel matrix, one is likely to get random confusing crashes due to memory corruption. But that is likely not the problem here.
>
> On Jun 23, 2022, at 2:23 PM, Mark Adams <mfad...@lbl.gov> wrote:
>
> We have a bug in the AMGx test snes_tests-ex13_amgx in parallel. Matt Martineau found that MatMPIAIJGetLocalMat worked in the first pass in the code below, where the local matrix is created (INITIAL), but in the next pass, when "REUSE" is used, he sees an invalid pointer. Matt found that it does have offloadmask == CPU. Maybe it is missing logic to put the output in the same state as the input?
>
> Any ideas on this, or should I just dig into it?
> Thanks,
>
> bool partial_setup_allowed = (pc->setupcalled && pc->flag != DIFFERENT_NONZERO_PATTERN);
>
> if (amgx->nranks > 1) {
>   if (partial_setup_allowed) {
>     PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_REUSE_MATRIX, &amgx->localA)); // This path doesn't seem to work by the time we reach the AmgX API
>   } else {
>     PetscCall(MatMPIAIJGetLocalMat(Pmat, MAT_INITIAL_MATRIX, &amgx->localA)); // This path works
>   }
> } else {
>   amgx->localA = Pmat;
> }