Dear Matthew and Junchao,

I finally found my error, and now everything works fine. I was a bit stuck at
one point, and your brief comments were very helpful.

Thanks!!!

Herbert Owen
Senior Researcher, Dpt. Computer Applications in Science and Engineering
Barcelona Supercomputing Center (BSC-CNS)
Tel: +34 93 413 4038
Skype: herbert.owen

https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en

> On 14 Nov 2025, at 16:08, howen <[email protected]> wrote:
> 
> Thank you very much, Matthew,
> 
> I did what you suggested, and I also added
> 
> ierr = MatView(*amat, PETSC_VIEWER_STDOUT_WORLD); CHKERRQ(ierr);
> 
> Now that I can see the matrices, I notice that some values differ. I will
> debug and simplify my code to try to understand where the difference comes
> from.
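> 
> For a quantitative check, rather than diffing stdout, one can also write each
> run's matrix to a binary file and compare the two files offline; a minimal
> sketch (the file name "A_run.bin" is just a placeholder):
> 
> PetscViewer viewer;
> ierr = PetscViewerBinaryOpen(PETSC_COMM_WORLD, "A_run.bin", FILE_MODE_WRITE, &viewer); CHKERRQ(ierr);
> ierr = MatView(*amat, viewer); CHKERRQ(ierr);  /* matrix dumped in PETSc binary format */
> ierr = PetscViewerDestroy(&viewer); CHKERRQ(ierr);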
> 
> As soon as I have a clearer picture, I will get back to you.
> 
> Best, 
> 
> 
> Herbert Owen
> Senior Researcher, Dpt. Computer Applications in Science and Engineering
> Barcelona Supercomputing Center (BSC-CNS)
> Tel: +34 93 413 4038
> Skype: herbert.owen
> 
> https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en
> 
>> On 13 Nov 2025, at 18:23, Matthew Knepley <[email protected]> wrote:
>> 
>> On Thu, Nov 13, 2025 at 12:11 PM howen via petsc-users <[email protected]> wrote:
>>> Dear Junchao,
>>> 
>>> Thank you for your response, and sorry for taking so long to answer.
>>> I cannot avoid using the NVIDIA tools: gfortran is not mature for OpenACC
>>> and gives us problems when compiling our code.
>>> What I have done to enable using the latest PETSc is to write my own C
>>> code to call PETSc.
>>> I have little experience with C and it took me some time, but I can now use
>>> PETSc 3.24.1  ;)
>>> 
>>> The behaviour remains the same as in my original email.
>>> Parallel+GPU gives bad results. CPU (serial and parallel) and serial GPU all
>>> work correctly and give the same result.
>>> 
>>> I have gone a bit into PETSc, comparing the CPU and GPU versions with 2 MPI ranks.
>>> I see that the difference starts at src/ksp/ksp/impls/cg/cg.c, line 170:
>>>     PetscCall(KSP_PCApply(ksp, R, Z)); /* z <- Br */
>>> I have printed the vectors R and Z and the norm dp.
>>> R is identical on both CPU and GPU, but Z differs.
>>> The correct value of dp (the first time it enters) is 14.3014, while running
>>> on the GPU with 2 MPI ranks it gives 14.7493.
>>> If you wish, I can send you the prints I introduced in cg.c.
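>>> 
>>> The prints were essentially of this form (a sketch of the instrumentation,
>>> not the exact lines I added; the label string is mine):
>>> 
>>> PetscReal nr, nz;
>>> PetscCall(VecNorm(R, NORM_2, &nr));
>>> PetscCall(VecNorm(Z, NORM_2, &nz));
>>> PetscCall(PetscPrintf(PETSC_COMM_WORLD, "CG: ||r|| = %g  ||z|| = %g  dp = %g\n", (double)nr, (double)nz, (double)dp));
>>> PetscCall(VecView(Z, PETSC_VIEWER_STDOUT_WORLD));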
>> 
>> Thank you for all the detail in this report. However, since you see a 
>> problem in KSPCG, I believe we can reduce the complexity. You can use
>> 
>>   -ksp_view_mat binary:A.bin -ksp_view_rhs binary:b.bin
>> 
>> and send us those files. Then we can run your system directly using KSP ex10 
>> (and so can you).
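>> 
>> (If you prefer a standalone driver over ex10, loading the two files back is
>> only a few lines; a minimal sketch, with error handling and cleanup omitted:)
>> 
>> Mat A; Vec b, x; KSP ksp; PetscViewer v;
>> PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "A.bin", FILE_MODE_READ, &v));
>> PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
>> PetscCall(MatLoad(A, v));          /* matrix written by -ksp_view_mat */
>> PetscCall(PetscViewerDestroy(&v));
>> PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "b.bin", FILE_MODE_READ, &v));
>> PetscCall(VecCreate(PETSC_COMM_WORLD, &b));
>> PetscCall(VecLoad(b, v));          /* rhs written by -ksp_view_rhs */
>> PetscCall(PetscViewerDestroy(&v));
>> PetscCall(VecDuplicate(b, &x));
>> PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
>> PetscCall(KSPSetOperators(ksp, A, A));
>> PetscCall(KSPSetFromOptions(ksp)); /* e.g. -ksp_type cg */
>> PetscCall(KSPSolve(ksp, b, x));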
>> 
>>   Thanks,
>> 
>>       Matt
>>  
>>> The folder with the input files to run the case can be downloaded from 
>>> https://b2drop.eudat.eu/s/wKRQ4LK7RTKz2iQ
>>> 
>>> To submit the GPU run I use
>>> mpirun -np 2 --map-by ppr:4:node:PE=20 --report-bindings ./mn5_bind.sh 
>>> /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_gpu/src/app_sod2d/sod2d
>>>  ChannelFlowSolverIncomp.json
>>> 
>>> For the CPU run
>>> mpirun -np 2 
>>> /gpfs/scratch/bsc21/bsc021257/git/140-add-petsc/sod2d_gitlab/build_cpu/src/app_sod2d/sod2d
>>>  ChannelFlowSolverIncomp.json
>>> 
>>> Our code can be downloaded with:
>>> git clone --recursive https://gitlab.com/bsc_sod2d/sod2d_gitlab.git
>>> 
>>> and the branch I am using with
>>> git checkout 140-add-petsc
>>> 
>>> To check out exactly the same commit I am using:
>>> git checkout 09a923c9b57e46b14ae54b935845d50272691ace
>>> 
>>> 
>>> I am currently using the following modules:
>>>   1) nvidia-hpc-sdk/25.1   2) hdf5/1.14.1-2-nvidia-nvhpcx   3) cmake/3.25.1
>>> I guess/hope similar modules should be available on any supercomputer.
>>> 
>>> To build the CPU version:
>>> mkdir build_cpu
>>> cd build_cpu
>>> 
>>> export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241_cpu/hhinstal
>>> export LD_LIBRARY_PATH=$PETSC_INSTALL/lib:$LD_LIBRARY_PATH
>>> export LIBRARY_PATH=$PETSC_INSTALL/lib:$LIBRARY_PATH
>>> export C_INCLUDE_PATH=$PETSC_INSTALL/include:$C_INCLUDE_PATH
>>> export CPLUS_INCLUDE_PATH=$PETSC_INSTALL/include:$CPLUS_INCLUDE_PATH
>>> export PKG_CONFIG_PATH=$PETSC_INSTALL/lib/pkgconfig:$PKG_CONFIG_PATH
>>> 
>>> cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=OFF 
>>> -DDEBUG_MODE=OFF ..
>>> make -j 80
>>> 
>>> I have built PETSc myself as follows:
>>> 
>>> git clone -b release https://gitlab.com/petsc/petsc.git petsc
>>> cd petsc
>>> git checkout v3.24.1     
>>> module purge
>>> module load nvidia-hpc-sdk/25.1   hdf5/1.14.1-2-nvidia-nvhpcx cmake/3.25.1 
>>> ./configure 
>>> --PETSC_DIR=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/petsc 
>>> --prefix=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal 
>>> --with-fortran-bindings=0  --with-fc=0 --with-petsc-arch=linux-x86_64-opt 
>>> --with-scalar-type=real --with-debugging=yes --with-64-bit-indices=1 
>>> --with-precision=single --download-hypre 
>>> CFLAGS=-I/apps/ACC/HDF5/1.14.1-2/NVIDIA/NVHPCX/include CXXFLAGS= FCFLAGS= 
>>> --with-shared-libraries=1 --with-mpi=1 
>>> --with-blacs-lib=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/lib/intel64/libmkl_blacs_openmpi_lp64.a
>>>  --with-blacs-include=/gpfs/apps/MN5/ACC/ONEAPI/2025.1/mkl/2025.1/include 
>>> --with-mpi-dir=/apps/ACC/NVIDIA-HPC-SDK/25.1/Linux_x86_64/25.1/comm_libs/12.6/hpcx/latest/ompi/
>>>  --download-ptscotch=yes --download-metis --download-parmetis
>>> make all check
>>> make install
>>> 
>>> -------------------
>>> For the GPU version, when configuring PETSc I add: --with-cuda
>>> 
>>> I then change PETSC_INSTALL to
>>> export PETSC_INSTALL=/gpfs/scratch/bsc21/bsc021257/git/petsc_oct25/3241/hhinstal
>>> and repeat all the other exports.
>>> 
>>> mkdir build_gpu
>>> cd build_gpu
>>> cmake -DUSE_RP=8 -DUSE_PORDER=3 -DUSE_PETSC=ON -DUSE_GPU=ON 
>>> -DDEBUG_MODE=OFF ..
>>> make -j 80
>>> 
>>> As you can see from the submission instructions, the executable is found in
>>> sod2d_gitlab/build_gpu/src/app_sod2d/sod2d
>>> 
>>> I hope I have not forgotten anything and that my instructions are 'easy' to
>>> follow. If you have any issues, do not hesitate to contact me.
>>> The wiki for our code can be found at
>>> https://gitlab.com/bsc_sod2d/sod2d_gitlab/-/wikis/home
>>> 
>>> Best, 
>>> 
>>> Herbert Owen
>>> Senior Researcher, Dpt. Computer Applications in Science and Engineering
>>> Barcelona Supercomputing Center (BSC-CNS)
>>> Tel: +34 93 413 4038
>>> Skype: herbert.owen
>>> 
>>> https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en
>>> 
>>>> On 16 Oct 2025, at 18:30, Junchao Zhang <[email protected]> wrote:
>>>> 
>>>> Hi, Herbert,
>>>>    I don't have much experience with OpenACC, and the PETSc CI doesn't have
>>>> such tests.  Could you avoid nvfortran and instead use gfortran to compile
>>>> your Fortran + OpenACC code?  If you can, then you can use the latest
>>>> PETSc code, which would make our debugging easier.
>>>>    Also, could you provide us with a test and instructions to reproduce 
>>>> the problem?
>>>>    
>>>>    Thanks!
>>>> --Junchao Zhang
>>>> 
>>>> 
>>>> On Thu, Oct 16, 2025 at 5:07 AM howen via petsc-users <[email protected]> wrote:
>>>>> Dear All,
>>>>> 
>>>>> I am interfacing our CFD code (Fortran + OpenACC) with PETSc.
>>>>> Since we use OpenACC, the natural choice for us is NVIDIA's nvhpc
>>>>> compiler. The GNU compiler does not work well, and we do not have access
>>>>> to the Cray compiler.
>>>>> 
>>>>> I already know that the latest version of PETSc does not compile with
>>>>> nvhpc, so I am using version 3.21.
>>>>> I get good results on the CPU, both in serial and parallel (MPI). However,
>>>>> the GPU implementation, which is what we are interested in, only works
>>>>> correctly in serial. In parallel, the results are different, even for a
>>>>> CG solve.
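>>>>> 
>>>>> (To be concrete: the failing configuration is nothing exotic; a minimal
>>>>> sketch of the solve, assuming A, b, and x are already assembled and the
>>>>> GPU backends were selected at creation time, e.g. via -mat_type
>>>>> aijcusparse and -vec_type cuda:)
>>>>> 
>>>>> KSP ksp;
>>>>> PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
>>>>> PetscCall(KSPSetOperators(ksp, A, A));
>>>>> PetscCall(KSPSetType(ksp, KSPCG));
>>>>> PetscCall(KSPSetFromOptions(ksp));
>>>>> PetscCall(KSPSolve(ksp, b, x));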
>>>>> 
>>>>> I would like to know if you have experience with the NVIDIA compiler. I am
>>>>> particularly interested in whether you have already observed issues with it.
>>>>> Your opinion on whether to put further effort into trying to find a bug I
>>>>> may have introduced during the interfacing would be highly appreciated.
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Herbert Owen
>>>>> Senior Researcher, Dpt. Computer Applications in Science and Engineering
>>>>> Barcelona Supercomputing Center (BSC-CNS)
>>>>> Tel: +34 93 413 4038
>>>>> Skype: herbert.owen
>>>>> 
>>>>> https://scholar.google.es/citations?user=qe5O2IYAAAAJ&hl=en
>>>>> 
>>> 
>> 
>> 
>> 
>> --
>> What most experimenters take for granted before they begin their experiments 
>> is infinitely more interesting than any results to which their experiments 
>> lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/
> 
