On Thu, Oct 29, 2009 at 1:46 PM, Chris Kees <christopher.e.kees at usace.army.mil> wrote:
> Mark, Barry,
>
> Thanks for the help. For now I'm sticking with the approach of copying
> the CSR matrix and using the CSR data structures to do the
> preallocations. I'll eventually get around to writing the code for
> assembling directly into the PETSc matrix. I have two more questions.
>
> 1) On 1 processor, the linear solver is not converging in 1 iteration
> using -pc_type lu -pc_factor_mat_solver_package spooles. It converges in
> one iteration in parallel, or on a single processor if I set -mat_type
> seqaij. Also, if I go back to the old way of constructing the matrix, it
> DOES converge in 1 iteration on 1 processor. That is, if I replace the
> MatCreate/MatSetSizes/... calls from the earlier email with what I had
> originally:
>
> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>
> then the solver converges in 1 iteration as expected. This wouldn't be a
> big deal except that we're trying to verify that our quadratic finite
> elements are working, and the behavior for p1 and p2 elements is
> different. The above observations are for p2, while all the cases work
> as expected for p1. Should we just not be using mpiaij at all on 1
> processor?

No, you have a problem.

> 2) In the code snippet I took from the PETSc example there are two
> preallocation calls (see below). Is that correct, or should I be
> checking the matrix type and calling only the preallocation function
> for the actual matrix type? i.e. something like
>
> MatCreate(...
> MatSetSizes(...
> MatSetFromOptions(...
> MatGetType(...
> if type==mpiaij: MatMPIAIJSetPreallocationCSR(...

It does not hurt to call them all.

  Matt

> Chris
>
> On Oct 21, 2009, at 11:50 AM, Mark F. Adams wrote:
>
>> Chris, just a note,
>>
>> Perhaps I missed something here, but I do something similar to you,
>> e.g. overlapping partitions, and PETSc is set up very well to be
>> intrusive in your code (I sometimes write a little 'addvalue' wrapper)
>> and assemble your FE matrices directly into a global matrix. You use
>> the (global) MatSetValues(...) and set the matrix option
>> MAT_IGNORE_OFF_PROC_ENTRIES (or something like that) so that these
>> off-processor matrix entries, which are computed redundantly, are
>> thrown away and you get the correct matrix.
>>
>> As Barry said, you don't need exact non-zero counts to allocate
>> storage in PETSc matrices, just don't underestimate.
>>
>> Mark
>>
>> On Oct 21, 2009, at 9:16 AM, Barry Smith wrote:
>>
>>> On Oct 20, 2009, at 10:13 PM, Chris Kees wrote:
>>>
>>>> Thanks. Just to make sure I'm following, when I create the matrix I
>>>> do:
>>>>
>>>> MatCreate(PETSC_COMM_WORLD,&self->m);
>>>> MatSetSizes(self->m,n,n,N,N);
>>>> MatSetType(self->m,MATMPIAIJ);
>>>> MatSetFromOptions(self->m);
>>>> MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
>>>> MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
>>>>
>>>> and then I copy the matrix at each Newton iteration using the code
>>>> in the first message below. You said "...PreallocationCSR() does the
>>>> copy very efficiently", but I don't think I'm supposed to repeatedly
>>>> call it, right? Looking at the output some more, it does seem like
>>>> building the preconditioner initially is taking much more time than
>>>> the initial pass through MatSetValues.
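As an aside, here is a minimal sketch of the direct-assembly pattern Mark
describes above, assuming a hypothetical element loop; the names n, N,
maxNonzerosPerRow, nElements, nDOF, elementDofs, and Ke are placeholders
rather than anything from this thread, and exact argument types vary
slightly across PETSc versions:

    Mat      A;
    PetscInt e;

    MatCreate(PETSC_COMM_WORLD,&A);
    MatSetSizes(A,n,n,N,N);
    MatSetType(A,MATMPIAIJ);
    /* a safe upper bound on nonzeros per row is enough; don't underestimate */
    MatMPIAIJSetPreallocation(A,maxNonzerosPerRow,PETSC_NULL,maxNonzerosPerRow,PETSC_NULL);
    /* discard the redundantly computed entries for rows owned by other processes */
    MatSetOption(A,MAT_IGNORE_OFF_PROC_ENTRIES,PETSC_TRUE);
    for (e = 0; e < nElements; e++) {
      /* global row/column indices and the dense element matrix for element e */
      MatSetValues(A,nDOF,elementDofs[e],nDOF,elementDofs[e],Ke[e],ADD_VALUES);
    }
    MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);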
>>>
>>> If the matrix nonzero structure stays the same, then yes, you need to
>>> call MatMPIAIJSetPreallocationCSR() only once and use the
>>> MatSetValues() calls each Newton step to update the nonzeros.
>>>
>>> Barry
>>>
>>>> Chris
>>>>
>>>> On Oct 20, 2009, at 8:32 PM, Barry Smith wrote:
>>>>
>>>>> Here's the deal. In parallel PETSc does not use a single CSR matrix
>>>>> on each process to hold the MPIAIJ matrix. Hence if you store the
>>>>> matrix on a process as a single CSR matrix, then it has to be
>>>>> copied into the PETSc data structure. MatMPIAIJSetPreallocationCSR()
>>>>> does the copy very efficiently. But you are right, until you free
>>>>> YOUR rowptr[], colind[] and a[] there are two copies of the matrix.
>>>>>
>>>>> One could argue that this is ok, because any preconditioner worth
>>>>> anything will take up at least that much memory later anyway. For
>>>>> example, if you use the PETSc default preconditioner, block Jacobi
>>>>> with ILU(0) on each block, it will take pretty much the same amount
>>>>> of memory.
>>>>>
>>>>> For people who use SOR or some other preconditioner that requires
>>>>> very little memory, this is not satisfactory because they feel they
>>>>> could run larger problems without the copy.
>>>>>
>>>>> I strongly recommend you use MatMPIAIJSetPreallocationCSR() and
>>>>> live with any memory limitation. The reason is that doing more than
>>>>> that has orders of magnitude more pain with generally little
>>>>> payoff. If you like pain, then you can
>>>>> 1) change your code to not build directly into CSR; instead call
>>>>> MatSetValuesLocal() as you compute the entries. With this you need
>>>>> to figure out the preallocation, which can be painful code.
>>>>> 2) write an entire matrix class that stores the matrix the way you
>>>>> store it, not the way PETSc stores it. This is a huge project, since
>>>>> you have to write your own matrix-vector product, ILU factorization,
>>>>> etc.
>>>>>
>>>>> Barry
>>>>>
>>>>> On Oct 20, 2009, at 8:04 PM, Chris Kees wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Our unstructured finite element code does a fairly standard
>>>>>> overlapping decomposition that allows it to calculate all the
>>>>>> non-zero column entries for the rows owned by the processor (rows
>>>>>> 0...ldim-1 in the local numbering). We assemble them into a local
>>>>>> CSR matrix and then copy them into a PETSc matrix like this:
>>>>>>
>>>>>> for (int i=0;i<ldim;i++)
>>>>>> {
>>>>>>   irow[0] = i;
>>>>>>   MatSetValuesLocal(PETSCMAT(par_L),1,irow,rowptr[i+1]-rowptr[i],&colind[rowptr[i]],&a[rowptr[i]],INSERT_VALUES);
>>>>>> }
>>>>>>
>>>>>> where rowptr and colind are the CSR indexing arrays in the local
>>>>>> numbering.
>>>>>>
>>>>>> I would like to eliminate the duplicate memory (and the copy
>>>>>> above). Is there a recommended way to let PETSc reference array
>>>>>> 'a' directly, or a way to get rid of 'a' and translate our CSR
>>>>>> assembly routines to use low-level PETSc data structures?
>>>>>>
>>>>>> So far I've added MatMPIAIJSetPreallocationCSR to try to speed up
>>>>>> the first solve, but the documentation is clear that 'a' is not a
>>>>>> pointer to the AIJ storage, so I still need the copy. I tried
>>>>>> using MatCreateMPIAIJWithSplitArrays with the 'o' arrays set to
>>>>>> zero, but I got indexing errors, and the documentation really
>>>>>> seems to imply that the local matrix is really not a standard CSR
>>>>>> anyway. If that's true, then it seems like the options I have are
>>>>>> to write a shell matrix for parallel CSR storage or write some new
>>>>>> assembly routines that use MatSetValuesLocal. I'm more concerned
>>>>>> about the memory than the overhead of MatSetValuesLocal, but it
>>>>>> would certainly be easier on me to use the CSR-based assembly
>>>>>> routines we already have.
>>>>>>
>>>>>> Thanks,
>>>>>> Chris
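A minimal sketch of the flow Barry recommends above: copy the local CSR
arrays into the MPIAIJ matrix once, then on every later Newton step refill
the unchanged nonzero pattern and reassemble. It reuses the names from
Chris's snippets (self->m, ldim, rowptr, a); the firstAssembly flag and the
colind_global/colind_local names are placeholders (the preallocation
routine wants global column indices, while MatSetValuesLocal wants local
ones), and the local-to-global mapping is assumed to be attached already,
as in the MatSetValuesLocal call above:

    PetscInt i, row;

    if (firstAssembly) {
      /* one-time preallocation and copy into the PETSc data structure;
         the column indices must be in the global numbering here */
      MatMPIAIJSetPreallocationCSR(self->m,rowptr,colind_global,a);
    } else {
      /* same nonzero structure, so just overwrite the values;
         MatSetValuesLocal takes column indices in the local numbering */
      for (i = 0; i < ldim; i++) {
        row = i;
        MatSetValuesLocal(self->m,1,&row,rowptr[i+1]-rowptr[i],
                          &colind_local[rowptr[i]],&a[rowptr[i]],INSERT_VALUES);
      }
    }
    MatAssemblyBegin(self->m,MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(self->m,MAT_FINAL_ASSEMBLY);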
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead. -- Norbert Wiener
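Finally, a minimal sketch of the other assembly route Chris mentions
(Barry's option 1, combined with Mark's MAT_IGNORE_OFF_PROC_ENTRIES
suggestion): skip the intermediate CSR arrays and add each element matrix
straight into the matrix with MatSetValuesLocal. The element loop and the
names nElements, nDOF, localDofs, and Ke are hypothetical, and the
preallocation still has to be worked out separately, which is the painful
part Barry warns about:

    PetscInt e;

    /* redundantly computed contributions to rows owned elsewhere are dropped */
    MatSetOption(PETSCMAT(par_L),MAT_IGNORE_OFF_PROC_ENTRIES,PETSC_TRUE);
    for (e = 0; e < nElements; e++) {
      /* local DOF indices and the dense element matrix for element e */
      MatSetValuesLocal(PETSCMAT(par_L),nDOF,localDofs[e],nDOF,localDofs[e],Ke[e],ADD_VALUES);
    }
    MatAssemblyBegin(PETSCMAT(par_L),MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(PETSCMAT(par_L),MAT_FINAL_ASSEMBLY);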