Re: [petsc-dev] PetscMallocAlign for Cuda

2020-09-03 Thread Jeff Hammond
If you use cudaMallocManaged with host affinity, you can drop that into
PETSc malloc and it should “just work” including migrating to GPU when
touched. Or you can give it device affinity and it will migrate the other
way when the CPU touches it.

This is way more performance portable than system-managed memory on the
Summit/Lassen systems, which can do unpleasant things unless you disable
NUMA balancing and use CUDA prefetch.
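
A minimal C sketch of the idea above, reading "host affinity" as a preferred-location hint set with cudaMemAdvise (that reading is an assumption). The helper names and the SKETCH_HAVE_CUDA macro are hypothetical and not part of PETSc. The CUDA calls use the runtime signatures from the CUDA 10/11 era of this thread; without the macro the code falls back to plain malloc so it builds anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#ifdef SKETCH_HAVE_CUDA
#include <cuda_runtime_api.h>
#endif

/* Hypothetical helper: allocate memory that prefers to live on the host
   but can migrate to the GPU when a kernel touches it. Returns 0 on success. */
static int sketch_malloc_managed_host(size_t bytes, void **out)
{
  assert(out != NULL);
  *out = NULL;
#ifdef SKETCH_HAVE_CUDA
  void *p = NULL;
  if (cudaMallocManaged(&p, bytes, cudaMemAttachGlobal) != cudaSuccess) return 1;
  /* Assumed meaning of "host affinity": prefer CPU residency for these pages. */
  if (cudaMemAdvise(p, bytes, cudaMemAdviseSetPreferredLocation,
                    cudaCpuDeviceId) != cudaSuccess) {
    cudaFree(p);
    return 1;
  }
  *out = p;
  return 0;
#else
  /* Fallback without CUDA: ordinary host memory, no migration. */
  *out = malloc(bytes);
  return *out ? 0 : 1;
#endif
}

/* Hypothetical helper: explicitly move the pages to `device` ahead of use,
   rather than relying on page faults. A no-op in the fallback build. */
static int sketch_prefetch_to_device(void *p, size_t bytes, int device)
{
#ifdef SKETCH_HAVE_CUDA
  return cudaMemPrefetchAsync(p, bytes, device, 0) == cudaSuccess ? 0 : 1;
#else
  (void)p; (void)bytes; (void)device;
  return 0;
#endif
}

/* Release memory obtained from sketch_malloc_managed_host. */
static void sketch_free_managed(void *p)
{
#ifdef SKETCH_HAVE_CUDA
  cudaFree(p);
#else
  free(p);
#endif
}
```

For device affinity, the same advice call would take a GPU ordinal in place of cudaCpuDeviceId.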

Jeff

On Wed, Sep 2, 2020 at 10:49 AM Mark Adams  wrote:

> OK good to know. I will now worry even less about making this very
> complete.
>
> On Wed, Sep 2, 2020 at 1:33 PM Barry Smith  wrote:
>
>>
>>   Mark,
>>
>>   Currently you directly use the NVIDIA-provided cudaMalloc for all
>> mallocs on the GPU; see, for example, aijcusparse.cu.
>>
>>   I will be using Stefano's work to start developing a unified
>> PETSc-based system for all memory management, but don't wait for that.
>>
>>   Barry
>>
>> > On Sep 2, 2020, at 8:58 AM, Mark Adams  wrote:
>> >
>> > PETSc mallocs seem to boil down to PetscMallocAlign. There are switches
>> > in here but I don't see a Cuda malloc. This would seem to be convenient if
>> > I want to create an Object entirely on Cuda or any device.
>> >
>> > Are there any thoughts along these lines or should I just duplicate Mat
>> > creation, for instance, by hand?
>>
>
> --
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/


Re: [petsc-dev] PetscMallocAlign for Cuda

2020-09-02 Thread Mark Adams
OK good to know. I will now worry even less about making this very complete.

On Wed, Sep 2, 2020 at 1:33 PM Barry Smith  wrote:

>
>   Mark,
>
>   Currently you directly use the NVIDIA-provided cudaMalloc for all
> mallocs on the GPU; see, for example, aijcusparse.cu.
>
>   I will be using Stefano's work to start developing a unified
> PETSc-based system for all memory management, but don't wait for that.
>
>   Barry
>
>
> > On Sep 2, 2020, at 8:58 AM, Mark Adams  wrote:
> >
> > PETSc mallocs seem to boil down to PetscMallocAlign. There are switches
> > in here but I don't see a Cuda malloc. This would seem to be convenient if
> > I want to create an Object entirely on Cuda or any device.
> >
> > Are there any thoughts along these lines or should I just duplicate Mat
> > creation, for instance, by hand?
>
>


Re: [petsc-dev] PetscMallocAlign for Cuda

2020-09-02 Thread Barry Smith


  Mark,

   Currently you directly use the NVIDIA-provided cudaMalloc for all mallocs
on the GPU; see, for example, aijcusparse.cu.

   I will be using Stefano's work to start developing a unified PETSc-based
system for all memory management, but don't wait for that.

   Barry


> On Sep 2, 2020, at 8:58 AM, Mark Adams  wrote:
> 
> PETSc mallocs seem to boil down to PetscMallocAlign. There are switches in 
> here but I don't see a Cuda malloc. This would seem to be convenient if I 
> want to create an Object entirely on Cuda or any device. 
> 
> Are there any thoughts along these lines or should I just duplicate Mat 
> creation, for instance, by hand?



Re: [petsc-dev] PetscMallocAlign for Cuda

2020-09-02 Thread Jacob Faibussowitsch
I believe there are a few PetscMallocCuda implementations in
src/sys/memory/cuda/mcudahost.cu that seem to do what you are describing. If
you are creating Mats you can also consider cudaMallocPitch, but I'm not sure
how that plays with the sparse storage formats that PETSc Mat uses. It seems
more useful for dense matrices.
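
To illustrate why a pitched allocation fits dense storage, here is a small C sketch of the addressing it implies. It runs on the host and only mimics the layout. The 512-byte alignment in the usage note is an assumption for illustration, since the real pitch is chosen by cudaMallocPitch. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round a row width in bytes up to a multiple of `align` bytes.
   This stands in for the pitch that cudaMallocPitch would report. */
static size_t sketch_pitch(size_t row_bytes, size_t align)
{
  assert(align > 0);
  return (row_bytes + align - 1) / align * align;
}

/* Address of element (row, col) in a pitched array of doubles:
   every row starts `pitch` bytes after the previous one. */
static double *sketch_pitched_elem(void *base, size_t pitch,
                                   size_t row, size_t col)
{
  return (double *)((char *)base + row * pitch) + col;
}
```

With rows of 100 doubles (800 bytes) and an assumed 512-byte alignment, sketch_pitch gives 1024 bytes, so each row carries 224 bytes of padding. Fixed-length rows make that padding predictable, whereas compressed sparse formats store rows of varying length in flat arrays, which is presumably why the fit is unclear there.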

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)
Cell: (312) 694-3391

> On Sep 2, 2020, at 09:58, Mark Adams  wrote:
> 
> PETSc mallocs seem to boil down to PetscMallocAlign. There are switches in 
> here but I don't see a Cuda malloc. This would seem to be convenient if I 
> want to create an Object entirely on Cuda or any device. 
> 
> Are there any thoughts along these lines or should I just duplicate Mat 
> creation, for instance, by hand?



[petsc-dev] PetscMallocAlign for Cuda

2020-09-02 Thread Mark Adams
PETSc mallocs seem to boil down to PetscMallocAlign. There are switches in
here but I don't see a Cuda malloc. This would seem to be convenient if I
want to create an Object entirely on Cuda or any device.

Are there any thoughts along these lines or should I just duplicate Mat
creation, for instance, by hand?
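
One way to picture the question is an allocator that is chosen through a small table of function pointers, so object creation does not care where the memory lives. The C sketch below is not PETSc's implementation. All names are hypothetical, the 64-byte alignment is an assumed value, and only a host backend is shown. A device backend would supply its own pair of functions, for example wrappers around cudaMalloc and cudaFree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_ALIGN 64 /* assumed alignment in bytes; must be a power of two */

/* A swappable allocation backend. */
typedef struct {
  int  (*alloc)(size_t bytes, void **out); /* returns 0 on success */
  void (*release)(void *ptr);
} SketchAllocator;

/* Host backend: over-allocate, align the returned address, and stash the
   raw malloc pointer just below it so release can find it again. */
static int host_alloc(size_t bytes, void **out)
{
  void *raw = malloc(bytes + SKETCH_ALIGN + sizeof(void *));
  if (!raw) { *out = NULL; return 1; }
  uintptr_t start   = (uintptr_t)raw + sizeof(void *);
  uintptr_t aligned = (start + SKETCH_ALIGN - 1) & ~(uintptr_t)(SKETCH_ALIGN - 1);
  ((void **)aligned)[-1] = raw;
  *out = (void *)aligned;
  return 0;
}

static void host_release(void *ptr)
{
  if (ptr) free(((void **)ptr)[-1]);
}

static const SketchAllocator sketch_host = { host_alloc, host_release };

/* Creation code takes the backend as a parameter instead of calling
   malloc directly, so the same routine could serve host or device. */
static int sketch_create_array(const SketchAllocator *a, size_t n, double **out)
{
  void *p = NULL;
  assert(a != NULL && out != NULL);
  if (a->alloc(n * sizeof(double), &p)) { *out = NULL; return 1; }
  *out = (double *)p;
  return 0;
}
```

A caller would write sketch_create_array(&sketch_host, n, &x) and later sketch_host.release(x). Memory from a device backend could not be dereferenced on the host, so any initialization inside creation routines would need the same kind of dispatch.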