Have you run this with the debug version of PETSc? It does additional argument testing that might catch a funny value passed in.
   Barry

On Oct 29, 2009, at 5:02 PM, Chris Kees wrote:

> I think I added the correct test code and got a difference of 0, but
> spooles is still not returning the correct result, at least not
> until the 2nd iteration. I'm attaching some of the code and output.
>
> Chris
>
> code for mat creates
> ---------------------------
> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,
>                 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m2);
>
> MatCreate(Py_PETSC_COMM_WORLD,&self->m);
> MatSetSizes(self->m,n,n,N,N);
> MatSetFromOptions(self->m);
> MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
> MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
>
> code for difference in mats
> -----------------------------------
> MatAXPY(PETSCMAT2(par_L),-1.0,PETSCMAT(par_L),DIFFERENT_NONZERO_PATTERN);
> PetscReal norm=-1.0;
> MatNorm(PETSCMAT2(par_L),NORM_INFINITY,&norm);
> std::cout<<"inf norm of L - L2 = "<<norm<<std::endl;
>
> results for -pc_type lu -pc_factor_mat_solver_package spooles -ksp_monitor_true_residual -mat_type mpiaij
> -----------------------------------
> inf norm of L - L2 = 0
>   0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
>   1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
>   2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14
>
> On Oct 29, 2009, at 3:56 PM, Barry Smith wrote:
>
>> Hmm, this should not happen. The matrix should be identical in
>> both cases (note the initial residual is the same in both cases, so
>> they may be identical).
>>
>> Here's one more thing you can try. In the same routine create TWO
>> matrices, one each way, then use MatAXPY() to take the difference
>> between them. Is it zero? If not, that will help lead to where the
>> problem might be.
>>
>> If that does not help the situation, can you send the code that
>> demonstrates this problem to petsc-maint at mcs.anl.gov?
>>
>>    Barry
>>
>> On Oct 29, 2009, at 3:34 PM, Chris Kees wrote:
>>
>>> I pulled the most recent version of petsc-dev and added
>>> superlu_dist. The only case that fails is spooles on 1 processor
>>> with an mpiaij matrix created with the Create/SetSizes/...
>>> paradigm. I would suspect something in the spooles wrapper code.
>>>
>>> code allocating mat:
>>> MatCreate(Py_PETSC_COMM_WORLD,&self->m);
>>> MatSetSizes(self->m,n,n,N,N);
>>> MatSetType(self->m,MATMPIAIJ);
>>>
>>> results:
>>> -pc_type lu -pc_factor_mat_solver_package spooles -ksp_monitor_true_residual (spooles on 1 proc)
>>>   0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>>   1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
>>>   2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14
>>>
>>> -pc_type lu -pc_factor_mat_solver_package spooles -ksp_monitor_true_residual (spooles on 8 procs)
>>>   0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>>   1 KSP preconditioned resid norm 2.898588418444e-14 true resid norm 1.045214942604e-13 ||Ae||/||Ax|| 9.834526306678e-15
>>>
>>> -pc_type lu -pc_factor_mat_solver_package superlu_dist -ksp_monitor_true_residual
>>>   0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>>   1 KSP preconditioned resid norm 2.917052183641e-14 true resid norm 1.299023464819e-13 ||Ae||/||Ax|| 1.222263471084e-14
>>>
>>> code allocating mat:
>>> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,
>>>                 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>>>
>>> results:
>>> -pc_type lu -pc_factor_mat_solver_package spooles -ksp_monitor_true_residual (spooles on 1 proc)
>>>   0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>>   1 KSP preconditioned resid norm 3.093811333624e-14 true resid norm 1.046870732638e-13 ||Ae||/||Ax|| 9.850105791801e-15
>>>
>>> On Oct 29, 2009, at 1:50 PM, Matthew Knepley wrote:
>>>
>>>> On Thu, Oct 29, 2009 at 1:46 PM, Chris Kees <christopher.e.kees at usace.army.mil> wrote:
>>>>
>>>> Mark, Barry,
>>>>
>>>> Thanks for the help. For now I'm sticking with the approach of
>>>> copying the CSR matrix and using the CSR data structures to do
>>>> the preallocations. I'll eventually get around to writing the
>>>> code for assembling directly into the PETSc matrix. I have two
>>>> more questions.
>>>>
>>>> 1) On 1 processor, the linear solver is not converging in 1
>>>> iteration using -pc_type lu -pc_factor_mat_solver_package
>>>> spooles. It converges in one iteration in parallel, or on a single
>>>> processor if I set -mat_type seqaij. Also, if I go back to the
>>>> old way of constructing the matrix it DOES converge in 1
>>>> iteration on 1 processor. That is, if I replace the MatCreate/
>>>> MatSetSizes/... calls from the earlier email with what I had
>>>> originally:
>>>>
>>>> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,
>>>>                 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>>>>
>>>> then the solver converges in 1 iteration as expected. This
>>>> wouldn't be a big deal except that we're trying to verify that
>>>> our quadratic finite elements are working, and the behavior for
>>>> p1 and p2 elements is different. The above observations are for
>>>> p2, while all the cases work as expected for p1. Should we just
>>>> not be using mpiaij at all on 1 processor?
>>>>
>>>> No, you have a problem.
>>>>
>>>> 2) In the code snippet I took from the petsc example there are
>>>> two preallocation calls (see below).
>>>> Is that correct, or should I
>>>> be checking the matrix type and calling only the preallocation
>>>> function for the actual matrix type? i.e. something like
>>>>
>>>> MatCreate(...
>>>> MatSetSizes(...
>>>> MatSetFromOptions(...
>>>> MatGetType(...
>>>> if type==mpiaij: MatMPIAIJSetPreallocationCSR(...
>>>>
>>>> It does not hurt to call them all.
>>>>
>>>>    Matt
>>>>
>>>> Chris
>>>>
>>>> On Oct 21, 2009, at 11:50 AM, Mark F. Adams wrote:
>>>>
>>>> Chris, just a note,
>>>>
>>>> Perhaps I missed something here, but I do something similar to you,
>>>> e.g. overlapping partitions, and PETSc is set up very well to be
>>>> intrusive in your code (I sometimes write a little 'addvalue'
>>>> wrapper) and assemble your FE matrices directly into a global
>>>> matrix. You use the (global) MatSetValues(...) and set the
>>>> matrix option MAT_IGNORE_OFF_PROC_ENTRIES (or something like
>>>> that) so that these off-processor matrix entries, which are
>>>> computed redundantly, are thrown away and you get the correct
>>>> matrix.
>>>>
>>>> As Barry said, you don't need exact non-zero counts to allocate
>>>> storage in PETSc matrices, just don't underestimate.
>>>>
>>>> Mark
>>>>
>>>> On Oct 21, 2009, at 9:16 AM, Barry Smith wrote:
>>>>
>>>> On Oct 20, 2009, at 10:13 PM, Chris Kees wrote:
>>>>
>>>> Thanks. Just to make sure I'm following, when I create the matrix
>>>> I do:
>>>>
>>>> MatCreate(PETSC_COMM_WORLD,&self->m);
>>>> MatSetSizes(self->m,n,n,N,N);
>>>> MatSetType(self->m,MATMPIAIJ);
>>>> MatSetFromOptions(self->m);
>>>> MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
>>>> MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
>>>>
>>>> and then I copy the matrix at each Newton iteration using the
>>>> code in the first message below. You said "...PreallocationCSR()
>>>> does the copy very efficiently", but I don't think I'm supposed
>>>> to repeatedly call it, right? Looking at the output some more, it
>>>> does seem like building the preconditioner initially is taking
>>>> much more time than the initial pass through MatSetValues.
>>>>
>>>> If the matrix nonzero structure stays the same then yes, you need to
>>>> call MatMPIAIJSetPreallocationCSR() only once and use the
>>>> MatSetValues() calls each Newton step to update the nonzeros.
>>>>
>>>>    Barry
>>>>
>>>> Chris
>>>>
>>>> On Oct 20, 2009, at 8:32 PM, Barry Smith wrote:
>>>>
>>>> Here's the deal. In parallel, PETSc does not use a single CSR
>>>> matrix on each process to hold the MPIAIJ matrix. Hence if you
>>>> store the matrix on a process as a single CSR matrix, then it has
>>>> to be copied into the PETSc data structure.
>>>> MatMPIAIJSetPreallocationCSR() does the copy very efficiently.
>>>> But you are right, until you free YOUR rowptr[], colind[] and a[]
>>>> there are two copies of the matrix.
>>>> One could argue that this is ok, because any
>>>> preconditioner worth anything will take up at least that much
>>>> memory later anyway. For example, if you use the PETSc default
>>>> preconditioner, block Jacobi with ILU(0) on each block, it will
>>>> take pretty much the same amount of memory.
>>>>
>>>> For people who use SOR or some other preconditioner that requires
>>>> very little memory this is not satisfactory, because they feel
>>>> they could run larger problems without the copy.
>>>>
>>>> I strongly recommend you use MatMPIAIJSetPreallocationCSR() and
>>>> live with any memory limitation.
>>>> The reason is that doing more
>>>> than that has orders of magnitude more pain with generally little
>>>> payoff. If you like pain then you can
>>>> 1) change your code to not build directly into CSR; instead call
>>>> MatSetValuesLocal() as you compute the entries. With this you
>>>> need to figure out the preallocation, which can be painful code.
>>>> 2) write an entire matrix class that stores the matrix the way you
>>>> store it, not the way PETSc stores it. This is a huge project, since
>>>> you have to write your own matrix-vector product, ILU factorization, etc.
>>>>
>>>>    Barry
>>>>
>>>> On Oct 20, 2009, at 8:04 PM, Chris Kees wrote:
>>>>
>>>> Hi,
>>>>
>>>> Our unstructured finite element code does a fairly standard
>>>> overlapping decomposition that allows it to calculate all the
>>>> non-zero column entries for the rows owned by the processor (rows
>>>> 0...ldim-1 in the local numbering). We assemble them into a
>>>> local CSR matrix and then copy them into a PETSc matrix like this:
>>>>
>>>> for (int i=0;i<ldim;i++)
>>>>   {
>>>>     irow[0] = i;
>>>>     MatSetValuesLocal(PETSCMAT(par_L),1,irow,rowptr[i+1]-rowptr[i],
>>>>                       &colind[rowptr[i]],&a[rowptr[i]],INSERT_VALUES);
>>>>   }
>>>>
>>>> where rowptr and colind are the CSR indexing arrays in the local
>>>> numbering.
>>>>
>>>> I would like to eliminate the duplicate memory (and the copy
>>>> above). Is there a recommended way to let PETSc reference array
>>>> 'a' directly, or a way to get rid of 'a' and translate our CSR
>>>> assembly routines to use low-level PETSc data structures?
>>>>
>>>> So far I've added MatMPIAIJSetPreallocationCSR to try to speed up
>>>> the first solve, but the documentation is clear that 'a' is not a
>>>> pointer to the AIJ storage, so I still need the copy. I tried
>>>> using MatCreateMPIAIJWithSplitArrays with the 'o' arrays set to
>>>> zero but I got indexing errors, and the documentation really
>>>> seems to imply that the local matrix is really not a standard CSR
>>>> anyway. If that's true then it seems like the options I have are
>>>> to write a shell matrix for parallel CSR storage or write some
>>>> new assembly routines that use MatSetValuesLocal. I'm more
>>>> concerned about the memory than the overhead of
>>>> MatSetValuesLocal, but it would certainly be easier on me to use
>>>> the CSR-based assembly routines we already have.
>>>>
>>>> Thanks,
>>>> Chris
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to
>>>> which their experiments lead.
>>>> -- Norbert Wiener
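
Putting Barry's and Matt's advice from the thread together, a minimal sketch of the recommended pattern might look like the code below: create the matrix once, preallocate it from the application's local CSR arrays (calling both the Seq and MPI preallocation routines is harmless, since the one that does not match the runtime type is ignored), then only update the values each Newton step. The function names and the rowptr/colind/nzval/ldim arguments are placeholders for the application's own data, error handling is abbreviated, and a local-to-global mapping is assumed to have been attached to the matrix elsewhere (e.g. with MatSetLocalToGlobalMapping()); this is untested and the details may need adjusting for a given PETSc version.

#include <petscmat.h>

/* One-time setup: the matrix type comes from -mat_type, and whichever
   preallocation routine does not match the runtime type is a no-op,
   so it is safe to call both. */
PetscErrorCode SetupMatrixFromCSR(MPI_Comm comm,PetscInt n,PetscInt N,
                                  const PetscInt rowptr[],const PetscInt colind[],
                                  const PetscScalar nzval[],Mat *A)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatCreate(comm,A);CHKERRQ(ierr);
  ierr = MatSetSizes(*A,n,n,N,N);CHKERRQ(ierr);
  ierr = MatSetFromOptions(*A);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocationCSR(*A,rowptr,colind,nzval);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocationCSR(*A,rowptr,colind,nzval);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Each Newton step after that (same nonzero pattern, new values):
   copy the freshly assembled local CSR values back in with
   MatSetValuesLocal(), as in Chris's loop above. */
PetscErrorCode UpdateMatrixValues(Mat A,PetscInt ldim,
                                  const PetscInt rowptr[],const PetscInt colind[],
                                  const PetscScalar a[])
{
  PetscErrorCode ierr;
  PetscInt       i;

  PetscFunctionBegin;
  for (i=0; i<ldim; i++) {
    ierr = MatSetValuesLocal(A,1,&i,rowptr[i+1]-rowptr[i],
                             &colind[rowptr[i]],&a[rowptr[i]],INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}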
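
For the alternative Mark describes (skipping the intermediate CSR copy and assembling element matrices straight into the global Mat, with redundantly computed off-process rows simply dropped), a rough sketch under the same caveats might look like the following. The nen/nodes[]/Ae[] names are hypothetical placeholders for the application's element data, and the exact MatSetOption() signature depends on the PETSc version.

#include <petscmat.h>

/* Add one dense element matrix into the global matrix.  nodes[] holds the
   global equation numbers of the element's dofs and Ae[] is the nen x nen
   element matrix in row-major order. */
PetscErrorCode AddElementMatrix(Mat A,PetscInt nen,const PetscInt nodes[],
                                const PetscScalar Ae[])
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatSetValues(A,nen,nodes,nen,nodes,Ae,ADD_VALUES);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Before the element loop, tell PETSc to discard the redundantly computed
   off-process entries instead of communicating them:

     MatSetOption(A,MAT_IGNORE_OFF_PROC_ENTRIES,PETSC_TRUE);

   and after the loop finish the assembly with
   MatAssemblyBegin/MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY). */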