Should add: to make sure your matrix assembly is efficient, run the code with the option -info and check that there are no mallocs during assembly.
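A minimal editorial sketch of one way to do this from a petsc4py script (see also the PETSC_OPTIONS suggestion below). It assumes petsc4py.init() accepts an extra argument list and that Mat.Option.NEW_NONZERO_ALLOCATION_ERR is exposed; in the -info output, the lines reporting the "number of mallocs during MatSetValues()" are the ones to check.

import sys
import petsc4py
petsc4py.init(sys.argv + ['-info'])        # same effect as running the script with -info
from petsc4py import PETSc

A = PETSc.Mat().createAIJ(size=(100, 100), nnz=5)   # preallocate ~5 nonzeros per row
# Optionally turn any preallocation miss into a hard error instead of a silent malloc
A.setOption(PETSc.Mat.Option.NEW_NONZERO_ALLOCATION_ERR, True)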
I'm not sure how petsc4py processes options. One way is to set such options with the PETSC_OPTIONS env variable and then run the code.

Satish

On Wed, 1 Nov 2017, Matthew Knepley wrote:

> On Tue, Oct 31, 2017 at 10:00 PM, Cetinbas, Cankur Firat <ccetin...@anl.gov> wrote:
> >
> > Hi,
> >
> > Thanks a lot. Based on both of your suggestions I modified the code using
> > Mat.createAIJ() and the csr option. The computation time decreased
> > significantly after using this method. Still, if there is a better option,
> > please let me know after seeing the modified code below.
> >
> > At the first trial with a 1000x1000 matrix with 96019 non-zeros in the
> > matrix, the computation time did not scale with the number of cores:
> > single-core python @ 0.0035s, single-core petsc @ 0.0024s, 2 cores petsc
> > @ 0.0036s, 4 cores petsc @ 0.0032s, 8 cores petsc @ 0.0030s.
> >
> > Then I tried a larger 181797x181797 matrix with more non-zeros and got
> > the following results: single-core python @ 0.021s, single-core petsc @
> > 0.031s, 2 cores petsc @ 0.024s, 4 cores petsc @ 0.014s, 8 cores petsc @
> > 0.009s, 16 cores petsc @ 0.0087s.
> >
> > I think the optimum number of processes is highly dependent on the matrix
> > size and the number of non-zeros. In the real code the matrix size (and so
> > the number of non-zero elements) will grow at every iteration, starting
> > with very small matrices and growing to very big ones. Is it possible to
> > set the number of processes from the code dynamically?
>
> I am not sure how you are interpreting these measurements. Normally, I would say
>
> 1) Everything timed below is "parallel overhead". This is not intended to
> scale with P; instead it will look like a constant, as you observe.
>
> 2) The time to compute the matrix entries should far outstrip the time
> below to figure out the nonzero structure. Is this true?
>
> 3) Solve time is often larger than matrix calculation. Is it?
>
> Thus, when deciding on parallelism, you need to look at the largest costs
> and how they scale with P.
>
>   Thanks,
>
>      Matt
>
> > Another question is about the data types: mpi4py only lets me transfer
> > float type data, and petsc4py only lets me use int32 type indices. Besides
> > converting the data back and forth, is there any solution for this?
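On the data-type question, a hedged editorial sketch (not from the original messages): mpi4py's Scatterv can move integer buffers directly when the numpy dtypes on the sending and receiving sides agree, and petsc4py's PETSc.IntType / PETSc.ScalarType attributes give the index and value dtypes the local PETSc build expects, so the float round trip in the code below may not be needed.

import numpy as np
from mpi4py import MPI
from petsc4py import PETSc

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    row = np.arange(4 * size, dtype=PETSc.IntType)   # toy index data in PETSc's index dtype
    counts = np.full(size, 4, dtype='i')             # 4 entries per rank
else:
    row, counts = None, None

counts = comm.bcast(counts, root=0)
row_lcl = np.empty(counts[rank], dtype=PETSc.IntType)   # receive buffer with matching dtype
comm.Scatterv([row, counts], row_lcl, root=0)           # no conversion to float and back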
> > The modified code for the matrix creation part:
> >
> > comm = MPI.COMM_WORLD
> > rank = comm.Get_rank()
> > size = comm.Get_size()
> >
> > if rank==0:
> >     row = np.loadtxt('row1000.out').astype(dtype='int32')
> >     col = np.loadtxt('col1000.out').astype(dtype='int32')
> >     val = np.loadtxt('val1000.out').astype(dtype='int32')
> >     m = 1000   # 1000 x 1000 matrix
> >     if size>1:
> >         rbc = np.bincount(row)*1.0
> >         ieq = int(np.floor(m/size))
> >         a = [ieq]*size
> >         ix = int(np.mod(m,size))
> >         if ix>0:
> >             for i in range(0,ix):
> >                 a[i] = a[i]+1
> >         a = np.array([0]+a).cumsum()
> >         b = np.zeros(a.shape[0]-1)
> >         for i in range(0,a.shape[0]-1):
> >             b[i] = rbc[a[i]:a[i+1]].sum()   # b is the send counts for Scatterv
> >         row = row.astype(dtype=float)
> >         col = col.astype(dtype=float)
> >         val = val.astype(dtype=float)
> > else:
> >     row = None
> >     col = None
> >     val = None
> >     indptr = None
> >     b = None
> >     m = None
> >
> > if size>1:
> >     ml = comm.bcast(m,root=0)
> >     bl = comm.bcast(b,root=0)
> >     row_lcl = np.zeros(bl[rank])
> >     col_lcl = row_lcl.copy()
> >     val_lcl = row_lcl.copy()
> >
> >     comm.Scatterv([row,b],row_lcl)
> >     comm.Scatterv([col,b],col_lcl)
> >     comm.Scatterv([val,b],val_lcl)
> >     comm.Barrier()
> >
> >     row_lcl = row_lcl.astype(dtype='int32')
> >     col_lcl = col_lcl.astype(dtype='int32')
> >     val_lcl = val_lcl.astype(dtype='int32')
> >
> >     indptr = np.bincount(row_lcl)
> >     indptr = indptr[indptr>0]
> >     indptr = np.insert(indptr,0,0).cumsum()
> >     indptr = indptr.astype(dtype='int32')
> >     comm.Barrier()
> >
> >     pA = PETSc.Mat().createAIJ(size=(ml,ml), csr=(indptr, col_lcl, val_lcl))   # Matrix generation
> >
> > else:
> >     indptr = np.bincount(row)
> >     indptr = np.insert(indptr,0,0).cumsum()
> >     indptr = indptr.astype(dtype='int32')
> >     st = time.time()
> >     pA = PETSc.Mat().createAIJ(size=(m,m), csr=(indptr, col, val))
> >     print('dt:',time.time()-st)
> >
> >
> > Regards,
> >
> > Firat
> >
> >
> > -----Original Message-----
> > From: Smith, Barry F.
> > Sent: Tuesday, October 31, 2017 10:18 AM
> > To: Matthew Knepley
> > Cc: Cetinbas, Cankur Firat; petsc-users@mcs.anl.gov; Ahluwalia, Rajesh K.
> > Subject: Re: [petsc-users] petsc4py sparse matrix construction time
> >
> >    You also need to make sure that most matrix entries are generated on
> > the process that they will belong on.
> >
> >    Barry
> >
> > > On Oct 30, 2017, at 8:01 PM, Matthew Knepley <knep...@gmail.com> wrote:
> > >
> > > On Mon, Oct 30, 2017 at 8:06 PM, Cetinbas, Cankur Firat <ccetin...@anl.gov> wrote:
> > >
> > > Hello,
> > >
> > > I am a beginner both in PETSc and mpi4py. I have been working on
> > > parallelizing our water transport code (where we solve a linear system
> > > of equations) and I started with the toy code below.
> > >
> > > The toy code reads the right-hand-side (rh), row, column, and value
> > > vectors to construct the sparse coefficient matrix and then scatters
> > > them to construct the parallel PETSc coefficient matrix and
> > > right-hand-side vector.
> > >
> > > The sparse matrix generation time is extremely high in comparison to
> > > sps.csr_matrix((val, (row, col)), shape=(n,n)) in python. For instance,
> > > python generates the 181197x181197 sparse matrix in 0.06 seconds, while
> > > this code takes 1.19 s on 32 cores, 6.98 s on 16 cores, and 29.5 s on
> > > 8 cores. Am I making a mistake in generating the sparse matrix? Is
> > > there a more efficient way?
> > >
> > > It looks like you do not preallocate the matrix. There is a chapter on
> > > this in the manual.
> > >
> > >    Matt
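For reference, a minimal sketch of explicit preallocation in petsc4py, via the nnz argument of Mat.createAIJ or Mat.setPreallocationNNZ; the per-row counts here are made up, and in a real code they would come from the row indices. The csr= form used in the modified code above should preallocate implicitly from the CSR structure.

import numpy as np
from petsc4py import PETSc

m = 1000
# assumed per-row nonzero counts; in practice something like np.bincount(row, minlength=m)
nnz_per_row = np.full(m, 5, dtype=PETSc.IntType)

A = PETSc.Mat().createAIJ(size=(m, m), nnz=nnz_per_row, comm=PETSc.COMM_SELF)
# for a parallel AIJ matrix, nnz can instead be a (d_nnz, o_nnz) pair of local-row arrays
# equivalent two-step form:
# A = PETSc.Mat().create(comm=PETSc.COMM_SELF)
# A.setSizes([m, m]); A.setType('aij')
# A.setPreallocationNNZ(nnz_per_row)
A.setUp()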
> > > Thanks for your help in advance.
> > >
> > > Regards,
> > >
> > > Firat
> > >
> > > from petsc4py import PETSc
> > > from mpi4py import MPI
> > > import numpy as np
> > > import time
> > >
> > > comm = MPI.COMM_WORLD
> > > rank = comm.Get_rank()
> > > size = comm.Get_size()
> > >
> > > if rank==0:
> > >     # proc 0 loads the tomo image and does fast calculations to append the row, col, val, rh lists
> > >     # in the real code these vectors will be available on proc 0; no txt files are read
> > >     row = np.loadtxt('row.out')   # indices of non-zero rows
> > >     col = np.loadtxt('col.out')   # indices of non-zero columns
> > >     val = np.loadtxt('vs.out')    # values in the sparse matrix
> > >     rh = np.loadtxt('RHS.out')    # right hand side vector
> > >     n = row.shape[0]   # 1045699
> > >     m = rh.shape[0]    # 181197 square sparse matrix size
> > > else:
> > >     n = None
> > >     m = None
> > >     row = None
> > >     col = None
> > >     val = None
> > >     rh = None
> > >     rh_ind = None
> > >
> > > m_lcl = comm.bcast(m,root=0)
> > > n_lcl = comm.bcast(n,root=0)
> > > neq = n_lcl//size
> > > meq = m_lcl//size
> > > nx = np.mod(n_lcl,size)
> > > mx = np.mod(m_lcl,size)
> > > row_lcl = np.zeros(neq)
> > > col_lcl = np.zeros(neq)
> > > val_lcl = np.zeros(neq)
> > > rh_lcl = np.zeros(meq)
> > > a = [neq]*size    # send counts for Scatterv
> > > am = [meq]*size   # send counts for Scatterv
> > >
> > > if nx>0:
> > >     for i in range(0,nx):
> > >         if rank==i:
> > >             row_lcl = np.zeros(neq+1)
> > >             col_lcl = np.zeros(neq+1)
> > >             val_lcl = np.zeros(neq+1)
> > >         a[i] = a[i]+1
> > > if mx>0:
> > >     for ii in range(0,mx):
> > >         if rank==ii:
> > >             rh_lcl = np.zeros(meq+1)
> > >         am[ii] = am[ii]+1
> > >
> > > comm.Scatterv([row,a],row_lcl)
> > > comm.Scatterv([col,a],col_lcl)
> > > comm.Scatterv([val,a],val_lcl)
> > > comm.Scatterv([rh,am],rh_lcl)
> > > comm.Barrier()
> > >
> > > A = PETSc.Mat()
> > > A.create()
> > > A.setSizes([m_lcl,m_lcl])
> > > A.setType('aij')
> > > A.setUp()
> > > lr = row_lcl.shape[0]
> > > for i in range(0,lr):
> > >     A[row_lcl[i],col_lcl[i]] = val_lcl[i]
> > > A.assemblyBegin()
> > > A.assemblyEnd()
> > >
> > > if size>1:   # to get the range for scattered vectors
> > >     ami = [0]
> > >     ami = np.array([0]+am).cumsum()
> > >     for kk in range(0,size):
> > >         if rank==kk:
> > >             Is = ami[kk]
> > >             Ie = ami[kk+1]
> > > else:
> > >     Is=0; Ie=m_lcl
> > >
> > > b = PETSc.Vec()
> > > b.create()
> > > b.setSizes(m_lcl)
> > > b.setFromOptions()
> > > b.setUp()
> > > b.setValues(list(range(Is,Ie)),rh_lcl)
> > > b.assemblyBegin()
> > > b.assemblyEnd()
> > >
> > > # solution vector
> > > x = b.duplicate()
> > > x.assemblyBegin()
> > > x.assemblyEnd()
> > >
> > > # create linear solver
> > > ksp = PETSc.KSP()
> > > ksp.create()
> > > ksp.setOperators(A)
> > > ksp.setType('cg')
> > > #ksp.getPC().setType('icc')   # only sequential
> > > ksp.getPC().setType('jacobi')
> > > print('solving with:', ksp.getType())
> > >
> > > # solve
> > > st = time.time()
> > > ksp.solve(b,x)
> > > et = time.time()
> > > print(et-st)
> > >
> > > if size>1:
> > >     # gather
> > >     if rank==0:
> > >         xGthr = np.zeros(m)
> > >     else:
> > >         xGthr = None
> > >     comm.Gatherv(x,[xGthr,am])
> > >
> > >
> > > --
> > > What most experimenters take for granted before they begin their
> > > experiments is infinitely more interesting than any results to which
> > > their experiments lead.
> > > -- Norbert Wiener
> > >
> > > https://www.cse.buffalo.edu/~knepley/
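As a closing editorial sketch (not from the thread): one way to collect the distributed solution on rank 0 is to gather the local numpy view returned by Vec.getArray() rather than the Vec object itself; only Vec.getArray() and the mpi4py Gatherv pattern below are assumed, and the Vec created here is just a stand-in for the solution vector x above.

import numpy as np
from mpi4py import MPI
from petsc4py import PETSc

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

x = PETSc.Vec().createMPI(20, comm=comm)     # stand-in for the solution vector
x.set(float(rank))
x.assemble()

xl = x.getArray()                            # local portion of the Vec as a numpy view
counts = np.array(comm.allgather(xl.size))   # local sizes from every rank, in rank order
xGthr = np.empty(counts.sum(), dtype=xl.dtype) if rank == 0 else None
comm.Gatherv(xl, [xGthr, counts] if rank == 0 else None, root=0)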