Thanks Mark and Matt for the comments.
________________________________
From: Mark Adams <mfad...@lbl.gov>
Sent: Saturday, April 13, 2024 12:53 PM
To: Matthew Knepley <knep...@gmail.com>
Cc: Ng, Cho-Kuen <c...@slac.stanford.edu>; Barry Smith <bsm...@petsc.dev>; 
petsc-users <petsc-users@mcs.anl.gov>; Jacob Faibussowitsch 
<jacob....@gmail.com>
Subject: Re: [petsc-users] Using PETSc GPU backend

As Matt said, high-frequency Helmholtz is very hard; low-frequency is doable 
by using a larger coarse grid (you have a tiny coarse grid).
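
For the low-frequency case, the kind of thing I mean is enlarging GAMG's 
coarse grid, a sketch (the limit value is just an example to tune):

    -pc_type gamg -pc_gamg_coarse_eq_limit 2000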

On Sat, Apr 13, 2024 at 3:04 PM Matthew Knepley 
<knep...@gmail.com<mailto:knep...@gmail.com>> wrote:
On Fri, Apr 12, 2024 at 8:19 PM Ng, Cho-Kuen via petsc-users 
<petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>> wrote:
Mark.

The FEM problem is of high-frequency Helmholtz type for the Maxwell equation. 
The combination "-ksp_type groppcg -pc_type gamg" worked for this small problem 
and the results agreed well with those obtained from the direct solver MUMPS. 
However, this ksp-pc combination failed to converge for much larger problem 
sizes. What would be a good combination for solving this kind of large-scale 
problem, where direct solvers are not feasible?

Solving high-frequency Helmholtz is a hard research problem. There are no easy 
answers. I know of no real solutions for C0 FEM; it seems to be a bad 
discretization for this from the start. I have not seen anything reliable that 
is not mostly direct, but this is not what I do for a living, so something 
might exist, although not in any publicly available software.

  Thanks,

     Matt

Thanks,
Cho
________________________________
From: Mark Adams <mfad...@lbl.gov<mailto:mfad...@lbl.gov>>
Sent: Friday, April 12, 2024 4:52 PM
To: Barry Smith <bsm...@petsc.dev<mailto:bsm...@petsc.dev>>
Cc: Ng, Cho-Kuen <c...@slac.stanford.edu<mailto:c...@slac.stanford.edu>>; 
petsc-users <petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>; Jacob 
Faibussowitsch <jacob....@gmail.com<mailto:jacob....@gmail.com>>
Subject: Re: [petsc-users] Using PETSc GPU backend

As Barry said, this is a bit small, but the performance looks reasonable.
The solver does very badly, mathematically.
I would try hypre to get another data point.
You could also try 'cg' to check that the pipelined version is not a problem.
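
For example (assuming your PETSc build includes hypre):

    -ksp_type cg -pc_type hypre

and, to isolate the pipelining, plain cg with your current preconditioner:

    -ksp_type cg -pc_type gamg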
Mark

On Fri, Apr 12, 2024 at 3:54 PM Barry Smith 
<bsm...@petsc.dev<mailto:bsm...@petsc.dev>> wrote:
  800k is a pretty small problem for GPUs.

  We would need to see the runs with output from -ksp_view -log_view to see if 
the timing results are reasonable.

On Apr 12, 2024, at 1:48 PM, Ng, Cho-Kuen 
<c...@slac.stanford.edu<mailto:c...@slac.stanford.edu>> wrote:

I performed comparison tests using KSP with and without the CUDA backend on 
NERSC's Perlmutter. For a finite element solve with 800k degrees of freedom, 
the best times obtained using MPI and MPI+GPU were

o MPI - 128 MPI tasks, 27 s

o MPI+GPU - 4 MPI tasks, 4 GPUs, 32 s

Is that the performance one would expect using the hybrid mode of computation? 
The attached image shows the scaling on a single node.

Thanks,
Cho
________________________________
From: Ng, Cho-Kuen <c...@slac.stanford.edu<mailto:c...@slac.stanford.edu>>
Sent: Saturday, August 12, 2023 8:08 AM
To: Jacob Faibussowitsch <jacob....@gmail.com<mailto:jacob....@gmail.com>>
Cc: Barry Smith <bsm...@petsc.dev<mailto:bsm...@petsc.dev>>; petsc-users 
<petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>
Subject: Re: [petsc-users] Using PETSc GPU backend

Thanks Jacob.
________________________________
From: Jacob Faibussowitsch <jacob....@gmail.com<mailto:jacob....@gmail.com>>
Sent: Saturday, August 12, 2023 5:02 AM
To: Ng, Cho-Kuen <c...@slac.stanford.edu<mailto:c...@slac.stanford.edu>>
Cc: Barry Smith <bsm...@petsc.dev<mailto:bsm...@petsc.dev>>; petsc-users 
<petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>
Subject: Re: [petsc-users] Using PETSc GPU backend

> Can petsc show the number of GPUs used?

-device_view
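
For example, appended to however you launch the job (a sketch using srun):

    srun -n 4 ./a.out -device_view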

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)

> On Aug 12, 2023, at 00:53, Ng, Cho-Kuen via petsc-users 
> <petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>> wrote:
>
> Barry,
>
> I tried again today on Perlmutter and running on multiple GPU nodes worked. 
> Likely, I had messed up something the other day. Also, I was able to have 
> multiple MPI tasks on a GPU using Nvidia MPS. The petsc output shows the 
> number of MPI tasks:
>
> KSP Object: 32 MPI processes
>
> Can petsc show the number of GPUs used?
>
> Thanks,
> Cho
>
> From: Barry Smith <bsm...@petsc.dev<mailto:bsm...@petsc.dev>>
> Sent: Wednesday, August 9, 2023 4:09 PM
> To: Ng, Cho-Kuen <c...@slac.stanford.edu<mailto:c...@slac.stanford.edu>>
> Cc: petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov> 
> <petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>
> Subject: Re: [petsc-users] Using PETSc GPU backend
>
>   We would need more information about "hanging". Do PETSc examples and tiny 
> problems "hang" on multiple nodes? If you run with -info what are the last 
> messages printed? Can you run with a debugger to see where it is "hanging"?
>
>
>
>> On Aug 9, 2023, at 5:59 PM, Ng, Cho-Kuen 
>> <c...@slac.stanford.edu<mailto:c...@slac.stanford.edu>> wrote:
>>
>> Barry and Matt,
>>
>> Thanks for your help. Now I can use petsc GPU backend on Perlmutter: 1 node, 
>> 4 MPI tasks and 4 GPUs. However, I ran into problems with multiple nodes: 2 
>> nodes, 8 MPI tasks and 8 GPUs. The run hung on KSPSolve. How can I fix this?
>>
>> Best,
>> Cho
>>
>>  From: Barry Smith <bsm...@petsc.dev<mailto:bsm...@petsc.dev>>
>> Sent: Monday, July 17, 2023 6:58 AM
>> To: Ng, Cho-Kuen <c...@slac.stanford.edu<mailto:c...@slac.stanford.edu>>
>> Cc: petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov> 
>> <petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>
>> Subject: Re: [petsc-users] Using PETSc GPU backend
>>
>>  The examples that use DM, in particular DMDA, all trivially support using 
>> the GPU with -dm_mat_type aijcusparse -dm_vec_type cuda.
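>>
>>  A minimal sketch of that pattern with a 2D DMDA (sizes are illustrative; 
>> assumes a recent PETSc with PetscCall):
>>
>>   DM  da;
>>   Mat A;
>>   Vec x;
>>   PetscCall(DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
>>                          DMDA_STENCIL_STAR, 64, 64, PETSC_DECIDE, PETSC_DECIDE,
>>                          1, 1, NULL, NULL, &da));
>>   PetscCall(DMSetFromOptions(da));         /* picks up -dm_mat_type / -dm_vec_type */
>>   PetscCall(DMSetUp(da));
>>   PetscCall(DMCreateMatrix(da, &A));       /* aijcusparse when requested */
>>   PetscCall(DMCreateGlobalVector(da, &x)); /* cuda when requested */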
>>
>>
>>
>>> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen 
>>> <c...@slac.stanford.edu<mailto:c...@slac.stanford.edu>> wrote:
>>>
>>> Barry,
>>>
>>> Thank you so much for the clarification.
>>>
>>> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there 
>>> other tutorials available?
>>>
>>> Cho
>>>  From: Barry Smith <bsm...@petsc.dev<mailto:bsm...@petsc.dev>>
>>> Sent: Saturday, July 15, 2023 8:36 AM
>>> To: Ng, Cho-Kuen <c...@slac.stanford.edu<mailto:c...@slac.stanford.edu>>
>>> Cc: petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov> 
>>> <petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>
>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>>>
>>>      Cho,
>>>
>>>     We currently have a crappy API for turning on GPU support, and our 
>>> documentation is misleading in places.
>>>
>>>     People constantly say "to use GPUs with PETSc you only need to use 
>>> -mat_type aijcusparse (for example)". This is incorrect.
>>>
>>>  This does not work with code that uses the convenience Mat constructors 
>>> such as MatCreateAIJ(), MatCreateAIJWithArrays(), etc. It only works if you 
>>> use the constructor approach of MatCreate(), MatSetSizes(), 
>>> MatSetFromOptions(), MatXXXSetPreallocation(), ... Similarly, you need to 
>>> use VecCreate(), VecSetSizes(), VecSetFromOptions(), and -vec_type cuda.
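>>>
>>>    Spelled out, the options-friendly sequence looks roughly like this 
>>> (mlocal, m, n, dnnz, onnz stand in for your sizes; error checking 
>>> abbreviated):
>>>
>>>   Mat A;
>>>   PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
>>>   PetscCall(MatSetSizes(A, mlocal, mlocal, m, n));
>>>   PetscCall(MatSetFromOptions(A));   /* now -mat_type aijcusparse is honored */
>>>   PetscCall(MatXAIJSetPreallocation(A, 1, dnnz, onnz, NULL, NULL));
>>>
>>>   Vec x;
>>>   PetscCall(VecCreate(PETSC_COMM_WORLD, &x));
>>>   PetscCall(VecSetSizes(x, mlocal, m));
>>>   PetscCall(VecSetFromOptions(x));   /* now -vec_type cuda is honored */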
>>>
>>>    If you use DM to create the matrices and vectors then you can use 
>>> -dm_mat_type aijcusparse -dm_vec_type cuda
>>>
>>>    Sorry for the confusion.
>>>
>>>    Barry
>>>
>>>
>>>
>>>
>>>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley 
>>>> <knep...@gmail.com<mailto:knep...@gmail.com>> wrote:
>>>>
>>>> On Sat, Jul 15, 2023 at 1:44 AM Ng, Cho-Kuen 
>>>> <c...@slac.stanford.edu<mailto:c...@slac.stanford.edu>> wrote:
>>>> Matt,
>>>>
>>>> After inserting 2 lines in the code:
>>>>
>>>>   ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
>>>>   ierr = MatSetFromOptions(A);CHKERRQ(ierr);
>>>>   ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n,
>>>>                       d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);CHKERRQ(ierr);
>>>>
>>>> "There are no unused options." However, there is no improvement on the GPU 
>>>> performance.
>>>>
>>>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you 
>>>> created in steps 1 and 2. This is detailed in the manual.
>>>>
>>>> 2. You should replace MatCreateAIJ() with MatSetSizes(), placed before 
>>>> MatSetFromOptions().
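>>>>
>>>> In your code that would look something like this (a sketch, keeping your 
>>>> existing d_nz/o_nz):
>>>>
>>>>   ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
>>>>   ierr = MatSetSizes(A,mlocal,mlocal,m,n);CHKERRQ(ierr);
>>>>   ierr = MatSetFromOptions(A);CHKERRQ(ierr);
>>>>   ierr = MatSeqAIJSetPreallocation(A,d_nz,PETSC_NULL);CHKERRQ(ierr);
>>>>   ierr = MatMPIAIJSetPreallocation(A,d_nz,PETSC_NULL,o_nz,PETSC_NULL);CHKERRQ(ierr);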
>>>>
>>>>   Thanks,
>>>>
>>>>     Matt
>>>>  Thanks,
>>>> Cho
>>>>
>>>>  From: Matthew Knepley <knep...@gmail.com<mailto:knep...@gmail.com>>
>>>> Sent: Friday, July 14, 2023 5:57 PM
>>>> To: Ng, Cho-Kuen <c...@slac.stanford.edu<mailto:c...@slac.stanford.edu>>
>>>> Cc: Barry Smith <bsm...@petsc.dev<mailto:bsm...@petsc.dev>>; Mark Adams 
>>>> <mfad...@lbl.gov<mailto:mfad...@lbl.gov>>; 
>>>> petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov> 
>>>> <petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>
>>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>>>>  On Fri, Jul 14, 2023 at 7:57 PM Ng, Cho-Kuen 
>>>> <c...@slac.stanford.edu<mailto:c...@slac.stanford.edu>> wrote:
>>>> I managed to pass the following options to PETSc using a GPU node on 
>>>> Perlmutter.
>>>>
>>>>     -mat_type aijcusparse -vec_type cuda -log_view -options_left
>>>>
>>>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task.
>>>>
>>>> o #PETSc Option Table entries:
>>>>    -log_view
>>>>    -mat_type aijcusparse
>>>>    -options_left
>>>>    -vec_type cuda
>>>>    #End of PETSc Option Table entries
>>>>    WARNING! There are options you set that were not used!
>>>>    WARNING! could be spelling mistake, etc!
>>>>    There is one unused database option. It is:
>>>>    Option left: name:-mat_type value: aijcusparse
>>>>
>>>> The -mat_type option has not been used. In the application code, we use
>>>>
>>>>     ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n,
>>>>              d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);CHKERRQ(ierr);
>>>>
>>>>
>>>> If you create the Mat this way, then you need MatSetFromOptions() in order 
>>>> to set the type from the command line.
>>>>
>>>>   Thanks,
>>>>
>>>>      Matt
>>>>  o The percent flops on the GPU for KSPSolve is 17%.
>>>>
>>>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an order 
>>>> of magnitude slower. How can I improve the GPU performance?
>>>>
>>>> Thanks,
>>>> Cho
>>>>  From: Ng, Cho-Kuen <c...@slac.stanford.edu<mailto:c...@slac.stanford.edu>>
>>>> Sent: Friday, June 30, 2023 7:57 AM
>>>> To: Barry Smith <bsm...@petsc.dev<mailto:bsm...@petsc.dev>>; Mark Adams 
>>>> <mfad...@lbl.gov<mailto:mfad...@lbl.gov>>
>>>> Cc: Matthew Knepley <knep...@gmail.com<mailto:knep...@gmail.com>>; 
>>>> petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov> 
>>>> <petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>
>>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>>>>  Barry, Mark and Matt,
>>>>
>>>> Thank you all for the suggestions. I will modify the code so we can pass 
>>>> runtime options.
>>>>
>>>> Cho
>>>>  From: Barry Smith <bsm...@petsc.dev<mailto:bsm...@petsc.dev>>
>>>> Sent: Friday, June 30, 2023 7:01 AM
>>>> To: Mark Adams <mfad...@lbl.gov<mailto:mfad...@lbl.gov>>
>>>> Cc: Matthew Knepley <knep...@gmail.com<mailto:knep...@gmail.com>>; Ng, 
>>>> Cho-Kuen <c...@slac.stanford.edu<mailto:c...@slac.stanford.edu>>; 
>>>> petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov> 
>>>> <petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>
>>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>>>>
>>>>   Note that options like -mat_type aijcusparse -vec_type cuda only work 
>>>> if the program is set up to allow runtime swapping of matrix and vector 
>>>> types. If you have a call to MatCreateMPIAIJ() or other specific types, 
>>>> then these options do nothing; but because Mark had you use 
>>>> -options_left, the program will tell you at the end that it did not use 
>>>> the option, so you will know.
>>>>
>>>>> On Jun 30, 2023, at 9:30 AM, Mark Adams 
>>>>> <mfad...@lbl.gov<mailto:mfad...@lbl.gov>> wrote:
>>>>>
>>>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args 
>>>>> and you run:
>>>>>
>>>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left
>>>>>
>>>>> Mark
>>>>>
>>>>> On Fri, Jun 30, 2023 at 6:16 AM Matthew Knepley 
>>>>> <knep...@gmail.com<mailto:knep...@gmail.com>> wrote:
>>>>> On Fri, Jun 30, 2023 at 1:13 AM Ng, Cho-Kuen via petsc-users 
>>>>> <petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>> wrote:
>>>>> Mark,
>>>>>
>>>>> The application code reads in parameters from an input file, where we can 
>>>>> put the PETSc runtime options. Then we pass the options to 
>>>>> PetscInitialize(...). Does that sound right?
>>>>>
>>>>> PETSc will read command line arguments automatically in PetscInitialize() 
>>>>> unless you shut it off.
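>>>>>
>>>>> If you do want to read them from a file, one way (a sketch; "petsc.opts" 
>>>>> is a made-up name) is to pass the file name as the third argument of 
>>>>> PetscInitialize(), or insert it after initialization:
>>>>>
>>>>>   PetscCall(PetscInitialize(&argc, &argv, "petsc.opts", help));
>>>>>   /* or, after a plain PetscInitialize(): */
>>>>>   PetscCall(PetscOptionsInsertFile(PETSC_COMM_WORLD, NULL, "petsc.opts", PETSC_FALSE));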
>>>>>
>>>>>   Thanks,
>>>>>
>>>>>     Matt
>>>>>  Cho
>>>>>  From: Ng, Cho-Kuen 
>>>>> <c...@slac.stanford.edu<mailto:c...@slac.stanford.edu>>
>>>>> Sent: Thursday, June 29, 2023 8:32 PM
>>>>> To: Mark Adams <mfad...@lbl.gov<mailto:mfad...@lbl.gov>>
>>>>> Cc: petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov> 
>>>>> <petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>
>>>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>>>>>  Mark,
>>>>>
>>>>> Thanks for the information. How do I pass runtime options to the 
>>>>> executable, say a.out, when there is no provision to append 
>>>>> arguments? Do I need to change the C++ main to read in the options?
>>>>>
>>>>> Cho
>>>>>  From: Mark Adams <mfad...@lbl.gov<mailto:mfad...@lbl.gov>>
>>>>> Sent: Thursday, June 29, 2023 5:55 PM
>>>>> To: Ng, Cho-Kuen <c...@slac.stanford.edu<mailto:c...@slac.stanford.edu>>
>>>>> Cc: petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov> 
>>>>> <petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>>
>>>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>>>>>  Run with options: -mat_type aijcusparse -vec_type cuda -log_view 
>>>>> -options_left
>>>>> The last column of the performance data (from -log_view) will be the 
>>>>> percent flops on the GPU. Check that it is > 0.
>>>>>
>>>>> The end of the output will list the options that were used and options 
>>>>> that were _not_ used (if any). Check that there are no options left.
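>>>>>
>>>>> If you cannot edit the launch line at all, PETSc also reads the 
>>>>> PETSC_OPTIONS environment variable, e.g.:
>>>>>
>>>>>   export PETSC_OPTIONS='-mat_type aijcusparse -vec_type cuda -log_view -options_left'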
>>>>>
>>>>> Mark
>>>>>
>>>>> On Thu, Jun 29, 2023 at 7:50 PM Ng, Cho-Kuen via petsc-users 
>>>>> <petsc-users@mcs.anl.gov<mailto:petsc-users@mcs.anl.gov>> wrote:
>>>>> I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" 
>>>>> and used it via "spack load petsc/fwge6pf". Then I compiled the 
>>>>> application code (purely CPU code) linking to the petsc package, hoping 
>>>>> to get a performance improvement from the petsc GPU backend. 
>>>>> However, the timing was the same using the same number of MPI tasks with 
>>>>> and without GPU accelerators. Have I missed something in the process, for 
>>>>> example, setting up PETSc options at runtime to use the GPU backend?
>>>>>
>>>>> Thanks,
>>>>> Cho
>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their 
>>>>> experiments is infinitely more interesting than any results to which 
>>>>> their experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their 
>>>> experiments is infinitely more interesting than any results to which their 
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their 
>>>> experiments is infinitely more interesting than any results to which their 
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/


<Untitled.png>



--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
