Re: [petsc-dev] PetscMallocAlign for Cuda
If you use cudaMallocManaged with host affinity, you can drop that into PETSc malloc and it should “just work”, including migrating to the GPU when touched. Or you can give it device affinity and it will migrate the other way when the CPU touches it.

This is way more performance portable than system-managed memory on the Summit/Lassen systems, which can do unpleasant things unless you disable NUMA balancing and use CUDA prefetch.

Jeff

On Wed, Sep 2, 2020 at 10:49 AM Mark Adams wrote:
> OK good to know. I will now worry even less about making this very
> complete.
>
> On Wed, Sep 2, 2020 at 1:33 PM Barry Smith wrote:
>
>> Mark,
>>
>> Currently you directly use the NVIDIA-provided allocator, cudaMalloc,
>> for all mallocs on the GPU. See, for example, aijcusparse.cu.
>>
>> I will be using Stefano's work to start developing a unified
>> PETSc-based system for all memory management, but don't wait for that.
>>
>> Barry
>>
>> > On Sep 2, 2020, at 8:58 AM, Mark Adams wrote:
>> >
>> > PETSc mallocs seem to boil down to PetscMallocAlign. There are switches
>> > in here, but I don't see a CUDA malloc. This would seem to be convenient
>> > if I want to create an Object entirely on CUDA or any device.
>> >
>> > Are there any thoughts along these lines, or should I just duplicate Mat
>> > creation, for instance, by hand?

--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/
Re: [petsc-dev] PetscMallocAlign for Cuda
OK good to know. I will now worry even less about making this very complete.

On Wed, Sep 2, 2020 at 1:33 PM Barry Smith wrote:
> Mark,
>
> Currently you directly use the NVIDIA-provided allocator, cudaMalloc,
> for all mallocs on the GPU. See, for example, aijcusparse.cu.
>
> I will be using Stefano's work to start developing a unified
> PETSc-based system for all memory management, but don't wait for that.
>
> Barry
>
> > On Sep 2, 2020, at 8:58 AM, Mark Adams wrote:
> >
> > PETSc mallocs seem to boil down to PetscMallocAlign. There are switches
> > in here, but I don't see a CUDA malloc. This would seem to be convenient
> > if I want to create an Object entirely on CUDA or any device.
> >
> > Are there any thoughts along these lines, or should I just duplicate Mat
> > creation, for instance, by hand?
Re: [petsc-dev] PetscMallocAlign for Cuda
Mark,

Currently you directly use the NVIDIA-provided allocator, cudaMalloc, for all mallocs on the GPU. See, for example, aijcusparse.cu.

I will be using Stefano's work to start developing a unified PETSc-based system for all memory management, but don't wait for that.

Barry

> On Sep 2, 2020, at 8:58 AM, Mark Adams wrote:
>
> PETSc mallocs seem to boil down to PetscMallocAlign. There are switches
> in here, but I don't see a CUDA malloc. This would seem to be convenient
> if I want to create an Object entirely on CUDA or any device.
>
> Are there any thoughts along these lines, or should I just duplicate Mat
> creation, for instance, by hand?
Re: [petsc-dev] PetscMallocAlign for Cuda
I believe there are a few PetscMallocCuda implementations in src/sys/memory/cuda/mcudahost.cu that seem to do what you are describing. If you are creating Mats, you can also consider cudaMallocPitch, but I'm not sure how that plays with the sparse storage implementations that PETSc Mat uses. It seems more useful for dense matrices.

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

> On Sep 2, 2020, at 09:58, Mark Adams wrote:
>
> PETSc mallocs seem to boil down to PetscMallocAlign. There are switches
> in here, but I don't see a CUDA malloc. This would seem to be convenient
> if I want to create an Object entirely on CUDA or any device.
>
> Are there any thoughts along these lines, or should I just duplicate Mat
> creation, for instance, by hand?
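A note on the cudaMallocPitch suggestion above, as a minimal sketch rather than PETSc or CUDA code: a pitched allocation pads each row so that rows start on aligned boundaries, and element (row, col) is then located at the base address plus row * pitch bytes, plus col elements. The plain C below only mimics that arithmetic on the host. The names sketch_pitch and sketch_elem are hypothetical, and the alignment passed in is an assumption, since the real pitch is chosen by the CUDA runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round a row width (in bytes) up to a multiple of
 * `align`, which must be a power of two. This stands in for the pitch that
 * cudaMallocPitch would report; the real value is chosen by the runtime. */
size_t sketch_pitch(size_t width_bytes, size_t align)
{
  assert(align != 0 && (align & (align - 1)) == 0);
  return (width_bytes + align - 1) & ~(align - 1);
}

/* Address of element (row, col) in a pitched array of doubles:
 * step `row` rows in bytes, then `col` elements within that row. */
double *sketch_elem(void *base, size_t pitch, size_t row, size_t col)
{
  return (double *)((char *)base + row * pitch) + col;
}
```

This kind of row padding maps naturally onto dense two-dimensional storage. CSR-style sparse formats keep their data in one-dimensional arrays, which may be why a pitched allocation looks less relevant there.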
[petsc-dev] PetscMallocAlign for Cuda
PETSc mallocs seem to boil down to PetscMallocAlign. There are switches in here, but I don't see a CUDA malloc. This would seem to be convenient if I want to create an Object entirely on CUDA or any device.

Are there any thoughts along these lines, or should I just duplicate Mat creation, for instance, by hand?
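One way to picture the idea in this question, as a hedged sketch in plain C: a single malloc entry point that dispatches through a table of function pointers, so a device allocator could be swapped in as one more backend. Every name here (MemBackend, sketch_malloc, sketch_set_backend, and so on) is hypothetical and not PETSc API. The "device" backend is a host stand-in so the sketch runs without a GPU; in a CUDA build it would presumably wrap cudaMalloc and cudaFree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_ALIGN 64 /* assumed alignment; must be a power of two */

/* Hypothetical backend table, not a PETSc type. */
typedef struct {
  void *(*alloc)(size_t bytes); /* returns NULL on failure */
  void  (*release)(void *ptr);
  const char *name;
} MemBackend;

/* Host backend: over-allocate, align the result, and stash the raw pointer
 * just below the aligned block so it can be recovered on release. */
static void *host_alloc(size_t bytes)
{
  assert((SKETCH_ALIGN & (SKETCH_ALIGN - 1)) == 0);
  void *raw = malloc(bytes + SKETCH_ALIGN + sizeof(void *));
  if (!raw) return NULL;
  uintptr_t start   = (uintptr_t)raw + sizeof(void *);
  uintptr_t aligned = (start + SKETCH_ALIGN - 1) & ~(uintptr_t)(SKETCH_ALIGN - 1);
  ((void **)aligned)[-1] = raw;
  return (void *)aligned;
}

static void host_release(void *ptr)
{
  if (ptr) free(((void **)ptr)[-1]);
}

/* Stand-in for a device backend (assumption: a real one would call
 * cudaMalloc/cudaFree here). It uses host malloc and counts calls. */
size_t fake_device_allocs = 0;

static void *fake_device_alloc(size_t bytes)
{
  fake_device_allocs++;
  return malloc(bytes);
}

static void fake_device_release(void *ptr)
{
  free(ptr);
}

const MemBackend host_backend        = { host_alloc, host_release, "host" };
const MemBackend fake_device_backend = { fake_device_alloc, fake_device_release, "fake-device" };

static const MemBackend *current = &host_backend;

/* Select the active backend; NULL restores the host backend. */
void sketch_set_backend(const MemBackend *b) { current = b ? b : &host_backend; }

/* Single entry point. Memory must be freed while the backend that
 * allocated it is still the active one. */
void *sketch_malloc(size_t bytes) { return current->alloc(bytes); }
void  sketch_free(void *ptr)      { current->release(ptr); }
```

A caller would select a backend with sketch_set_backend and then use sketch_malloc and sketch_free as usual. One caveat for a real device backend: memory returned by cudaMalloc cannot be dereferenced on the host, so any object-creation code that touches freshly allocated memory from the CPU would need separate handling.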