This is looking good. I'm not seeing the numerical problems, but I've just hid them by avoiding the GPU on coarse grids.
Should I submit a pull request now or test more or wait for Karl? On Sat, Jul 27, 2019 at 7:37 PM Mark Adams <mfad...@lbl.gov> wrote: > Barry, I fixed CUDA to pin to CPUs correctly for GAMG at least. There are > some hacks here that we can work on. > > I will start testing it tomorrow, but I am pretty sure that I have not > regressed. I am hoping that this will fix the numerical problems, which > seem to be associated with empty processors. > > I did need to touch code outside of GAMG and CUDA. It might be nice to > test this in a next. > > GAMG now puts all reduced processorg grids on the CPU. This could be > looked at in the future. > > > On Sat, Jul 27, 2019 at 1:00 PM Smith, Barry F. <bsm...@mcs.anl.gov> > wrote: > >> >> >> > On Jul 27, 2019, at 11:53 AM, Mark Adams <mfad...@lbl.gov> wrote: >> > >> > >> > On Sat, Jul 27, 2019 at 11:39 AM Smith, Barry F. <bsm...@mcs.anl.gov> >> wrote: >> > >> > Good catch. Thanks. Maybe the SeqCUDA has the same problem? >> > >> > THis is done (I may have done it). >> > >> > Now it seems to me that when you call VecPinToCPU you are setting up >> and don't have data, so this copy does not seem necessary. Maybe remove the >> copy here: >> > >> > PetscErrorCode VecPinToCPU_MPICUDA(Vec V,PetscBool pin) >> > { >> > PetscErrorCode ierr; >> > >> > PetscFunctionBegin; >> > V->pinnedtocpu = pin; >> > if (pin) { >> > ierr = VecCUDACopyFromGPU(V);CHKERRQ(ierr); ???? >> >> The copy from GPU should actually only do anything if the GPU already >> has data and PETSC_OFFLOAD_GPU. If the GPU does not have data >> the copy doesn't do anything. When one calls VecPinToCPU() one doesn't >> know where the data is so the call must be made, but it may do nothing >> >> Note that VecCUDACopyFromGPU() calls VecCUDAAllocateCheckHost() not >> VecCUDAAllocateCheck() so the GPU will not allocate space, >> VecCUDAAllocateCheck() is called from VecCUDACopyToGPU(). >> >> Yes, perhaps the naming could be more consistent: >> >> 1) in one place it is Host in an other place it is nothing >> 2) some places it is Host, Device, some places GPU,CPU >> >> Perhaps Karl can make these all consistent and simpler in his >> refactorization >> >> >> Barry >> >> >> > >> > or >> > >> > Not allocate the GPU if it is pinned by added in a check here: >> > >> > PetscErrorCode VecCUDAAllocateCheck(Vec v) >> > { >> > PetscErrorCode ierr; >> > cudaError_t err; >> > cudaStream_t stream; >> > Vec_CUDA *veccuda; >> > >> > PetscFunctionBegin; >> > if (!v->spptr) { >> > ierr = PetscMalloc(sizeof(Vec_CUDA),&v->spptr);CHKERRQ(ierr); >> > veccuda = (Vec_CUDA*)v->spptr; >> > if (v->valid_GPU_array != PETSC_OFFLOAD_CPU) { >> > err = >> cudaMalloc((void**)&veccuda->GPUarray_allocated,sizeof(PetscScalar)*((PetscBLASInt)v->map->n));CHKERRCUDA(err); >> > veccuda->GPUarray = veccuda->GPUarray_allocated; >> > err = cudaStreamCreate(&stream);CHKERRCUDA(err); >> > veccuda->stream = stream; >> > veccuda->hostDataRegisteredAsPageLocked = PETSC_FALSE; >> > if (v->valid_GPU_array == PETSC_OFFLOAD_UNALLOCATED) { >> > if (v->data && ((Vec_Seq*)v->data)->array) { >> > v->valid_GPU_array = PETSC_OFFLOAD_CPU; >> > } else { >> > v->valid_GPU_array = PETSC_OFFLOAD_GPU; >> > } >> > } >> > } >> > } >> > PetscFunctionReturn(0); >> > } >> > >> > >> > >> > >> > >> > > On Jul 27, 2019, at 10:40 AM, Mark Adams <mfad...@lbl.gov> wrote: >> > > >> > > Yea, I just figured out the problem. VecDuplicate_MPICUDA did not >> call PinToCPU or even copy pinnedtocpu. It just copied ops, so I added and >> am testing: >> > > >> > > ierr = >> VecCreate_MPICUDA_Private(*v,PETSC_TRUE,w->nghost,0);CHKERRQ(ierr); >> > > vw = (Vec_MPI*)(*v)->data; >> > > ierr = PetscMemcpy((*v)->ops,win->ops,sizeof(struct >> _VecOps));CHKERRQ(ierr); >> > > ierr = VecPinToCPU(*v,win->pinnedtocpu);CHKERRQ(ierr); >> > > >> > > Thanks, >> > > >> > > On Sat, Jul 27, 2019 at 11:33 AM Smith, Barry F. <bsm...@mcs.anl.gov> >> wrote: >> > > >> > > I don't understand the context. Once a vector is pinned to the CPU >> the flag should be PETSC_OFFLOAD_CPU permanently until the pin to cpu is >> turned off. Do you have a pinned vector that has the value >> PETSC_OFFLOAD_GPU? For example here it is set to PETSC_OFFLOAD_CPU >> > > >> > > PetscErrorCode VecPinToCPU_MPICUDA(Vec V,PetscBool pin) >> > > { >> > > .... >> > > if (pin) { >> > > ierr = VecCUDACopyFromGPU(V);CHKERRQ(ierr); >> > > V->valid_GPU_array = PETSC_OFFLOAD_CPU; /* since the CPU code >> will likely change values in the vector */ >> > > >> > > >> > > Is there any way to reproduce the problem? >> > > >> > > Barry >> > > >> > > >> > > >> > > >> > > > On Jul 27, 2019, at 10:28 AM, Mark Adams <mfad...@lbl.gov> wrote: >> > > > >> > > > I'm not sure what to do here. The problem is that pinned-to-cpu >> vectors are calling VecCUDACopyFromGPU here. >> > > > >> > > > Should I set x->valid_GPU_array to something else, like >> PETSC_OFFLOAD_CPU, in PinToCPU so this block of code i s not executed? >> > > > >> > > > PetscErrorCode VecGetArray(Vec x,PetscScalar **a) >> > > > { >> > > > PetscErrorCode ierr; >> > > > #if defined(PETSC_HAVE_VIENNACL) >> > > > PetscBool is_viennacltype = PETSC_FALSE; >> > > > #endif >> > > > >> > > > PetscFunctionBegin; >> > > > PetscValidHeaderSpecific(x,VEC_CLASSID,1); >> > > > ierr = VecSetErrorIfLocked(x,1);CHKERRQ(ierr); >> > > > if (x->petscnative) { >> > > > #if defined(PETSC_HAVE_VIENNACL) || defined(PETSC_HAVE_CUDA) >> > > > if (x->valid_GPU_array == PETSC_OFFLOAD_GPU) { >> > > > #if defined(PETSC_HAVE_VIENNACL) >> > > > ierr = >> PetscObjectTypeCompareAny((PetscObject)x,&is_viennacltype,VECSEQVIENNACL,VECMPIVIENNACL,VECVIENNACL,"");CHKERRQ(ierr); >> > > > if (is_viennacltype) { >> > > > ierr = VecViennaCLCopyFromGPU(x);CHKERRQ(ierr); >> > > > } else >> > > > #endif >> > > > { >> > > > #if defined(PETSC_HAVE_CUDA) >> > > > ierr = VecCUDACopyFromGPU(x);CHKERRQ(ierr); >> > > > #endif >> > > > } >> > > > } else if (x->valid_GPU_array == PETSC_OFFLOAD_UNALLOCATED) { >> > > > #if defined(PETSC_HAVE_VIENNACL) >> > > > ierr = >> PetscObjectTypeCompareAny((PetscObject)x,&is_viennacltype,VECSEQVIENNACL,VECMPIVIENNACL,VECVIENNACL,"");CHKERRQ(ierr); >> > > > if (is_viennacltype) { >> > > > ierr = VecViennaCLAllocateCheckHost(x);CHKERRQ(ierr); >> > > > } else >> > > > #endif >> > > > { >> > > > #if defined(PETSC_HAVE_CUDA) >> > > > ierr = VecCUDAAllocateCheckHost(x);CHKERRQ(ierr); >> > > > #endif >> > > > } >> > > > } >> > > > #endif >> > > > *a = *((PetscScalar**)x->data); >> > > > } else { >> > > > >> > > > >> > > > On Tue, Jul 23, 2019 at 9:18 PM Smith, Barry F. <bsm...@mcs.anl.gov> >> wrote: >> > > > >> > > > Yes, it needs to be able to switch back and forth between the CPU >> and GPU methods so you need to move into it the setting of the methods that >> is currently directly in the create method. See how >> MatConvert_SeqAIJ_SeqAIJViennaCL() calls ierr = >> MatPinToCPU_SeqAIJViennaCL(A,PETSC_FALSE);CHKERRQ(ierr); to set the methods >> for the GPU initially. >> > > > >> > > > Barry >> > > > >> > > > >> > > > > On Jul 23, 2019, at 7:32 PM, Mark Adams <mfad...@lbl.gov> wrote: >> > > > > >> > > > > >> > > > > What are the symptoms of it not working? Does it appear to be >> still copying the matrices to the GPU? then running the functions on the >> GPU? >> > > > > >> > > > > >> > > > > The object is dispatching the CUDA mat-vec etc. >> > > > > >> > > > > I suspect the pinning is incompletely done for CUDA (and >> MPIOpenCL) matrices. >> > > > > >> > > > > >> > > > > Yes, git grep MatPinToCPU shows stuff for ViennaCL but not CUDA. >> > > > > >> > > > > I guess I can add something like this below. Do we need to set >> the device methods? They are already set when this method is set, right? >> > > > > >> > > > > We need the equivalent of >> > > > > >> > > > > static PetscErrorCode MatPinToCPU_SeqAIJViennaCL(Mat A,PetscBool >> flg) >> > > > > { >> > > > > PetscFunctionBegin; >> > > > > A->pinnedtocpu = flg; >> > > > > if (flg) { >> > > > > A->ops->mult = MatMult_SeqAIJ; >> > > > > A->ops->multadd = MatMultAdd_SeqAIJ; >> > > > > A->ops->assemblyend = MatAssemblyEnd_SeqAIJ; >> > > > > A->ops->duplicate = MatDuplicate_SeqAIJ; >> > > > > } else { >> > > > > A->ops->mult = MatMult_SeqAIJViennaCL; >> > > > > A->ops->multadd = MatMultAdd_SeqAIJViennaCL; >> > > > > A->ops->assemblyend = MatAssemblyEnd_SeqAIJViennaCL; >> > > > > A->ops->destroy = MatDestroy_SeqAIJViennaCL; >> > > > > A->ops->duplicate = MatDuplicate_SeqAIJViennaCL; >> > > > > } >> > > > > PetscFunctionReturn(0); >> > > > > } >> > > > > >> > > > > for MPIViennaCL and MPISeqAIJ Cusparse but it doesn't look like >> it has been written yet. >> > > > > >> > > > > >> > > > > > >> > > > > > It does not seem to work. It does not look like CUDA has an >> MatCreateVecs. Should I add one and copy this flag over? >> > > > > >> > > > > We do need this function. But I don't see how it relates to >> pinning. When the matrix is pinned to the CPU we want it to create CPU >> vectors which I assume it does. >> > > > > >> > > > > >> > > > > > >> > > > > > Mark >> > > > > >> > > > >> > > >> > >> >>