The cost of the initial matrix creation to get the row starts and ends is 
trivial compared to the later computations, so is fine to retain.

> On Aug 19, 2023, at 10:51 AM, Erik Kneller via petsc-users 
> <petsc-users@mcs.anl.gov> wrote:
> 
> Hi All,
> 
> Thank you for the recommendations. The first option provided by Matthew works 
> in petsc4py but required some additional computations to re-organize 
> information. I developed an approach that was easier to build within my 
> current system using the option provided by Hong along with some other posts 
> I found on the user forum. See below for an example of how I integrated Numba 
> and petcs4py so the filling of matrix elements on each processor is done 
> using the optimized machine code produced by Numba (you can eliminate the 
> cleaning step if the number of non-zero elements for each row is well 
> defined). Scaling and overall performance is now satisfactory. 
> 
> I do have another question. In order to make this work I create the Petsc 
> matrix 'A' twice, first to get local ownership and second to define the local 
> elements for a particular processor:
> 
> Algorithm
> ------------
> (1) Create a Petsc matrix 'A' and set size and type
> (2) Get the local row start and end for matrix 'A'
> (3) Define the local non-zero coefficients for the rows owned by processor 
> using a Numba JIT-compiled loop and store result in a csr matrix defined 
> using Scipy. 
> (4) Redefine Petsc matix 'A' using the the local csr elements define in step 
> 3.
> (5) Begin and end assembly
> (6) Define RHS and initial solution vectors
> (7) Solve system 
>  (see code example below for details)
> 
> Steps (1) and (4) appear redundant and potentially sub-optimal from a 
> performance perspective (perhaps due to my limited experience with 
> petscy4py). Is there a better approach in terms of elegance and performance? 
> Also, if this is the optimal syntax could someone elaborate on what exactly 
> is occurring behind the seen in terms of memory allocation?
> 
> Thank you all again for your guidance and insights.
> 
> Modified Version of Dalcin's 2D Poisson Example with Numba
> ----------------------------------------------------------------------------------
> import sys
> from typing import Tuple
> import numpy.typing as npt
> import numpy as np
> from numba import njit
> import petsc4py
> petsc4py.init(sys.argv)
> from petsc4py import PETSc
> import scipy.sparse as sps
> 
> 
> def main() -> None:
>     xnodes = 8000
>     nnz_max = 10
>     
>     ynodes = xnodes
>     N = xnodes*ynodes
>     dx = 1.0/(xnodes + 1)
>     
>     A = PETSc.Mat()
>     A.create(comm=PETSc.COMM_WORLD)
>     A.setSizes((xnodes*ynodes, xnodes*ynodes))
>     A.setType(PETSc.Mat.Type.AIJ)
>     
>     rstart, rend = A.getOwnershipRange()
>     Ascipy = build_csr_matrix(
>         rstart, rend, nnz_max, dx, xnodes, ynodes)
>     csr=(
>         Ascipy.indptr[rstart:rend+1] - Ascipy.indptr[rstart],
>         Ascipy.indices[Ascipy.indptr[rstart]:Ascipy.indptr[rend]],
>         Ascipy.data[Ascipy.indptr[rstart]:Ascipy.indptr[rend]]
>         )
>     A = PETSc.Mat().createAIJ(size=(N,N), csr=csr)
>     
>     A.assemblyBegin()
>     A.assemblyEnd()
>     
>     ksp = PETSc.KSP()
>     ksp.create(comm=A.getComm())
>     ksp.setType(PETSc.KSP.Type.CG)
>     ksp.getPC().setType(PETSc.PC.Type.GAMG)
>     
>     ksp.setOperators(A)
>     ksp.setFromOptions()
>     
>     x, b = A.createVecs()
>     b.set(1.0)
>     
>     ksp.solve(b, x)
>     
>     
> def build_csr_matrix(
>         rstart:int, 
>         rend:int, 
>         nnz_max:int, 
>         dx:float, 
>         xnodes:int, 
>         ynodes:int
> ):
>     Anz, Arow, Acol = build_nonzero_arrays(
>         rstart, rend, nnz_max, dx, xnodes, ynodes
>         )
>     N = xnodes*ynodes
>     Ls = sps.csr_matrix((Anz, (Arow,Acol)), shape=(N,N), dtype=np.float64)
>     return Ls
>   
>   
> def build_nonzero_arrays(
>         rstart:int, 
>         rend:int, 
>         nnz_max:int, 
>         dx:float, 
>         xnodes:int, 
>         ynodes:int
> ) -> Tuple[
>     npt.NDArray[np.float64], 
>     npt.NDArray[np.float64], 
>     npt.NDArray[np.float64]
>     ]:
>     nrows_local = (rend - rstart) + 1
>     Anz_ini = np.zeros((nnz_max*nrows_local), dtype=np.float64)
>     Arow_ini = np.zeros((nnz_max*nrows_local), dtype=np.int32)
>     Acol_ini = np.zeros((nnz_max*nrows_local), dtype=np.int32)
>     icount_nz = define_nonzero_values(
>         rstart, rend, dx, xnodes, ynodes, Anz_ini, Arow_ini, Acol_ini
>         )
>     (
>         Anz, Arow, Acol
>     ) = clean_nonzero_arrays(icount_nz, Anz_ini, Arow_ini, Acol_ini)
>     return Anz, Arow, Acol
> 
> 
> @njit
> def define_nonzero_values(
>         rstart:int, 
>         rend:int,
>         dx:float,
>         xnodes:int,
>         ynodes:int,
>         Anz:npt.NDArray[np.float64],
>         Arow:npt.NDArray[np.int64],
>         Acol:npt.NDArray[np.int64]
> ) -> int:
>     """ Fill matrix A
>     """
>     icount_nz = 0
>     for row in range(rstart, rend):
>         i, j = index_to_grid(row, xnodes)
>         #A[row, row] = 4.0/dx**2
>         Anz[icount_nz] = 4.0/dx**2
>         Arow[icount_nz] = row
>         Acol[icount_nz] = row
>         icount_nz += 1
>         if i > 0:
>             column = row - xnodes
>             #A[row, column] = -1.0/dx**2
>             Anz[icount_nz] = -1.0/dx**2
>             Arow[icount_nz] = row
>             Acol[icount_nz] = column
>             icount_nz += 1
>         if i < xnodes - 1:
>             column = row + xnodes
>             #A[row, column] = -1.0/dx**2
>             Anz[icount_nz] = -1.0/dx**2
>             Arow[icount_nz] = row
>             Acol[icount_nz] = column
>             icount_nz += 1
>         if j > 0:
>             column = row - 1
>             #A[row, column] = -1.0/dx**2
>             Anz[icount_nz] = -1.0/dx**2
>             Arow[icount_nz] = row
>             Acol[icount_nz] = column
>             icount_nz += 1
>         if j < xnodes - 1:
>             column = row + 1
>             #A[row, column] = -1.0/dx**2
>             Anz[icount_nz] = -1.0/dx**2
>             Arow[icount_nz] = row
>             Acol[icount_nz] = column
>             icount_nz += 1
>     return icount_nz
> 
> 
> @njit
> def clean_nonzero_arrays(
>         icount_nz:int, 
>         Anz_ini:npt.NDArray[np.float64], 
>         Arow_ini:npt.NDArray[np.float64], 
>         Acol_ini:npt.NDArray[np.float64]
> ) -> Tuple[
>     npt.NDArray[np.float64], 
>     npt.NDArray[np.float64], 
>     npt.NDArray[np.float64]
>     ]:
>     Anz = np.zeros((icount_nz), dtype=np.float64)
>     Arow = np.zeros((icount_nz), dtype=np.int32)
>     Acol = np.zeros((icount_nz), dtype=np.int32)
>     for i in range(icount_nz):
>         Anz[i] = Anz_ini[i]
>         Arow[i] = Arow_ini[i]
>         Acol[i] = Acol_ini[i]
>     return Anz, Arow, Acol
> 
> 
> @njit
> def index_to_grid(r:int, n:int) -> Tuple[int,int]:
>     """Convert a row number into a grid point."""
>     return (r//n, r%n)
> 
> 
> if __name__ == "__main__":
>     main()
> 
> 
> On Friday, August 18, 2023 at 11:24:36 AM CDT, Zhang, Hong 
> <hongzh...@anl.gov> wrote:
> 
> 
> You can use this to build a PETSc matrix with the index arrays ai,aj and the 
> value array aa:
>     PETSc.Mat().createAIJ(size=(nrows,ncols), csr=(ai,aj,aa))
> 
> Hong (Mr.)
> 
> > On Aug 17, 2023, at 7:37 AM, Erik Kneller via petsc-users 
> > <petsc-users@mcs.anl.gov <mailto:petsc-users@mcs.anl.gov>> wrote:
> > 
> > Hi All,
> > 
> > I need to fill non-zero values of a Petsc matrix via petsc4py for the 
> > domain defined by A.getOwnershipRange() using three Numpy arrays: (1) array 
> > containing row indices of non-zero value, (2) array containing column 
> > indices of non-zero values and (3) array containing the non-zero matrix 
> > values. How can one perform this type of filling operation in petsc4py? The 
> > method A.setValues does not appear to allow this since it only works on an 
> > individual matrix element or a block of matrix elements.
> > 
> > I am using Numpy arrays since they can be computed in loops optimized using 
> > Numba on each processor. I also cannot pass the Petsc matrix to a Numba 
> > compiled function since type information cannot be inferred. I absolutely 
> > need to avoid looping in standard Python to define Petsc matrix elements 
> > due to performance issues. I also need to use a standard petscy4py method 
> > and avoid writing new C or Fortran wrappers to minimize language complexity.
> > 
> > Example Algorithm Building on Lisandro Dalcin's 2D Poisson Example:
> > ----------------------------------------------------------------------------------------------
> > comm = PETSc.COMM_WORLD
> > rank = comm.getRank()
> > 
> > dx = 1.0/(xnodes + 1) # xnodes is the number of nodes in the x and 
> > y-directions of the grid
> > nnz_max = 5 # max number of non-zero values per row
> > 
> > A = PETSc.Mat()
> > A.create(comm=PETSc.COMM_WORLD)
> > A.setSizes((xnodes*ynodes, xnodes*ynodes))
> > A.setType(PETSc.Mat.Type.AIJ)
> > A.setPreallocationNNZ(nnz_max)
> > 
> > rstart, rend = A.getOwnershipRange()
> > 
> > # Here Anz, Arow and Acol are vectors with size equal to the number of 
> > non-zero values 
> > Anz, Arow, Acol = build_nonzero_numpy_arrays_using_numba(rstart, rend, 
> > nnz_max, dx, xnodes, ynodes)
> > 
> > A.setValues(Arow, Acol, Anz) # <--- This does not work.
> > 
> > A.assemblyBegin()
> > A.assemblyEnd()
> > ksp = PETSc.KSP()
> > ksp.create(comm=A.getComm())
> > ksp.setType(PETSc.KSP.Type.CG)
> > ksp.getPC().setType(PETSc.PC.Type.GAMG)
> > ksp.setOperators(A)
> > ksp.setFromOptions()
> > x, b = A.createVecs()
> > b.set(1.0)
> > 
> > ksp.solve(b, x)
> > 
> > Regards,
> > Erik Kneller, Ph.D.
> > 
> 

Reply via email to