Re: [petsc-users] Using PETSc with GPU

2019-03-15 Thread Yuyun Yang via petsc-users
Good point, thank you so much for the advice! I'll take that into consideration.

Best regards,
Yuyun



Re: [petsc-users] Using PETSc with GPU

2019-03-15 Thread Jed Brown via petsc-users
Yuyun Yang via petsc-users writes:

> Currently we are forming the sparse matrices explicitly, but I think the goal
> is to move towards matrix-free methods and use a stencil, which I suppose is
> well suited to GPUs and more efficient. On the other hand, I've also read
> about matrix-free operations in the manual just on the CPUs. Would there be
> any benefit then to switching to the GPU? (It looks like matrix-free in PETSc
> is rather straightforward to use, whereas writing the kernel function for a
> GPU stencil would require quite a lot of work.)

It all depends on what kind of computation happens in there and how well
you can implement it for the GPU.  It's important to have a clear idea
of what you expect to achieve.  For example, if you write an excellent
GPU implementation of your SNES residual/matrix-free Jacobian, it might
be 2-3x faster than a good CPU implementation on hardware of similar
cost ($ or Watt).  But you still need preconditioning, which is usually
at least half the work, and perhaps a preconditioner runs the same speed
on GPU and CPU (CPU version often converges a bit faster;
preconditioning operations are often less amenable to GPUs).  So after
all that effort, and now with code that is likely harder to maintain,
you go from 4 seconds per solve to 3 seconds per solve on hardware of
the same cost.  Is that worth it?

Maybe, but you probably want that to be in the critical path for your
research and/or customers.
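
For concreteness, the matrix-free route discussed above amounts to wrapping a
user mat-vec routine in a PETSc shell matrix. Below is a minimal serial sketch
(the 1D stencil, routine name, and sizes are invented for illustration; a
parallel version would also need ghost-point communication). The loop inside
MyStencilMult is precisely the part that would have to be rewritten as a CUDA
kernel to move the mat-vec onto the GPU:

  #include <petscmat.h>

  /* Hypothetical mat-vec: apply a 1D Laplacian stencil, y = A*x,
     without ever assembling A. */
  static PetscErrorCode MyStencilMult(Mat A, Vec x, Vec y)
  {
    PetscErrorCode     ierr;
    PetscInt           i, n;
    const PetscScalar *xa;
    PetscScalar       *ya;

    ierr = VecGetLocalSize(x, &n);CHKERRQ(ierr);
    ierr = VecGetArrayRead(x, &xa);CHKERRQ(ierr);
    ierr = VecGetArray(y, &ya);CHKERRQ(ierr);
    for (i = 0; i < n; i++) {
      const PetscScalar left  = (i > 0)   ? xa[i-1] : 0.0;
      const PetscScalar right = (i < n-1) ? xa[i+1] : 0.0;
      ya[i] = 2.0*xa[i] - left - right;   /* the stencil application */
    }
    ierr = VecRestoreArray(y, &ya);CHKERRQ(ierr);
    ierr = VecRestoreArrayRead(x, &xa);CHKERRQ(ierr);
    return 0;
  }

  int main(int argc, char **argv)
  {
    Mat            A;
    PetscInt       n = 100;   /* illustrative size; one process assumed */
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;
    /* The shell matrix behaves like any Mat: hand it to KSPSetOperators()
       or SNES, and the Krylov method only ever calls MyStencilMult. */
    ierr = MatCreateShell(PETSC_COMM_WORLD, n, n, n, n, NULL, &A);CHKERRQ(ierr);
    ierr = MatShellSetOperation(A, MATOP_MULT, (void (*)(void))MyStencilMult);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }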


Re: [petsc-users] Using PETSc with GPU

2019-03-15 Thread Yuyun Yang via petsc-users
Currently we are forming the sparse matrices explicitly, but I think the goal
is to move towards matrix-free methods and use a stencil, which I suppose is
well suited to GPUs and more efficient. On the other hand, I've also read
about matrix-free operations in the manual just on the CPUs. Would there be
any benefit then to switching to the GPU? (It looks like matrix-free in PETSc
is rather straightforward to use, whereas writing the kernel function for a
GPU stencil would require quite a lot of work.)

Thanks!
Yuyun




Re: [petsc-users] Using PETSc with GPU

2019-03-15 Thread Smith, Barry F. via petsc-users



> On Mar 15, 2019, at 7:33 PM, Yuyun Yang via petsc-users wrote:
>
> Thanks Matt, I've seen that page, but there isn't that much documentation,
> and there is only one CUDA example, so I wanted to check if there may be more
> references or examples somewhere else. We have very large linear systems that
> need to be solved every time step and which involve matrix-matrix
> multiplications,

Where do these matrix-matrix multiplications appear? Are you providing a
"matrix-free" operator for your linear system, where you apply matrix-vector
operations via a subroutine call? Or are you explicitly forming sparse
matrices and using them to define the operator?



> so we thought the GPU could have some benefits, but we are unsure how
> difficult it is to migrate parts of the code to the GPU with PETSc. From that
> webpage it seems like we only need to specify the Vec / Mat option on the
> command line and maybe change a few functions to have CUDA? The CUDA example,
> however, also involves using Thrust and programming a kernel function, so I
> want to make sure I know how this works before trying to implement it.

   How much, if any, CUDA/GPU code you have to write depends on what you want
to have done on the GPU. If you provide a sparse matrix and only want the
system solve to take place on the GPU, then you don't need to write any
CUDA/GPU code; you just use the "CUDA" vector and matrix classes. If you are
doing "matrix-free" solves and you provide the routine that performs the
matrix-vector product, then you need to write/optimize that routine for
CUDA/GPU.

   Barry
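
For concreteness, the first path Barry describes (assembled matrix, solve on
the GPU, no GPU code written) looks roughly like the sketch below. The 1D
Laplacian is a stand-in problem, and the type names are those available in
PETSc releases of this era (3.9 and later); the key is creating objects with
the *SetFromOptions calls so the CUDA types can be selected at runtime:

  #include <petscksp.h>

  int main(int argc, char **argv)
  {
    Mat            A;
    Vec            x, b;
    KSP            ksp;
    PetscInt       i, n = 100, col[3];
    PetscScalar    v[3];
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

    /* Types come from the options database, so -vec_type cuda and
       -mat_type aijcusparse move storage to the GPU with no code change. */
    ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
    ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);CHKERRQ(ierr);
    ierr = MatSetFromOptions(A);CHKERRQ(ierr);
    ierr = MatSetUp(A);CHKERRQ(ierr);
    for (i = 0; i < n; i++) {           /* 1D Laplacian as a stand-in */
      PetscInt ncols = 0;
      if (i > 0)   { col[ncols] = i-1; v[ncols++] = -1.0; }
      col[ncols] = i; v[ncols++] = 2.0;
      if (i < n-1) { col[ncols] = i+1; v[ncols++] = -1.0; }
      ierr = MatSetValues(A, 1, &i, ncols, col, v, INSERT_VALUES);CHKERRQ(ierr);
    }
    ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    ierr = MatCreateVecs(A, &x, &b);CHKERRQ(ierr);
    ierr = VecSet(b, 1.0);CHKERRQ(ierr);

    ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
    ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
    ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

    ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
    ierr = VecDestroy(&x);CHKERRQ(ierr);
    ierr = VecDestroy(&b);CHKERRQ(ierr);
    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

Run on the CPU as usual, or on the GPU with no source change, e.g. (the
executable name is hypothetical):

  ./app -ksp_type cg -pc_type jacobi -vec_type cuda -mat_type aijcusparse

MatCreateVecs() gives the vectors the matrix's preferred vector type, so
-vec_type cuda should be redundant here, but it is harmless to be explicit.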




Re: [petsc-users] Using PETSc with GPU

2019-03-15 Thread Yuyun Yang via petsc-users
Thanks Matt, I've seen that page, but there isn't that much documentation, and
there is only one CUDA example, so I wanted to check if there may be more
references or examples somewhere else. We have very large linear systems that
need to be solved every time step and which involve matrix-matrix
multiplications, so we thought the GPU could have some benefits, but we are
unsure how difficult it is to migrate parts of the code to the GPU with PETSc.
From that webpage it seems like we only need to specify the Vec / Mat option
on the command line and maybe change a few functions to have CUDA? The CUDA
example, however, also involves using Thrust and programming a kernel
function, so I want to make sure I know how this works before trying to
implement it.

Thanks a lot,
Yuyun


From: Matthew Knepley 
Sent: Friday, March 15, 2019 2:54:02 PM
To: Yuyun Yang
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] Using PETSc with GPU

On Fri, Mar 15, 2019 at 5:30 PM Yuyun Yang via petsc-users
<petsc-users@mcs.anl.gov> wrote:
Hello team,

Our group is thinking of using GPUs for the linear solves in our code, which
is written in PETSc. I was reading the 2013 book chapter on the implementation
of PETSc using GPUs, but wonder if there is a more recent reference I can
check out? I also saw one example CUDA code online (using Thrust), but would
like to check with you whether there is more complete documentation of how the
GPU implementation is done.

Have you seen this page? https://www.mcs.anl.gov/petsc/features/gpus.html

Also, before using GPUs, I would take some time to understand what you think
the possible benefit can be. For example, there is almost no benefit if you
use BLAS1, and you would have a huge maintenance burden with a different
toolchain. This is also largely true for SpMV, since the bandwidth difference
between CPUs and GPUs is now not that large. So you really should have some
kind of flop-intensive (BLAS3-like) work in there somewhere, or it's hard to
see the motivation.

  Thanks,

 Matt


Thanks very much!

Best regards,
Yuyun


--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
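
A rough back-of-envelope behind Matt's SpMV remark (illustrative numbers, not
measurements): in AIJ/CSR storage each nonzero moves about 12 bytes, an 8-byte
value plus a 4-byte column index, even before counting vector traffic, and
performs 2 flops (one multiply, one add), so

  arithmetic intensity of SpMV ≈ 2 flops / 12 bytes ≈ 0.17 flop/byte

At that intensity both CPU and GPU run far below their peak flop rates, and
runtime is essentially bytes moved divided by memory bandwidth, so the
best-case GPU speedup for SpMV is the bandwidth ratio of the two devices, not
their much larger flop-rate ratio. BLAS3-like kernels reuse each byte many
times, which is why flop-intensive work is where a GPU clearly pays off.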


[petsc-users] Using PETSc with GPU

2019-03-15 Thread Yuyun Yang via petsc-users
Hello team,

Our group is thinking of using GPUs for the linear solves in our code, which
is written in PETSc. I was reading the 2013 book chapter on the implementation
of PETSc using GPUs, but wonder if there is a more recent reference I can
check out? I also saw one example CUDA code online (using Thrust), but would
like to check with you whether there is more complete documentation of how the
GPU implementation is done.

Thanks very much!

Best regards,
Yuyun