Have you run this with the debug version of PETSc? It does additional argument testing that might catch a funny value passed in.
   Barry

On Oct 29, 2009, at 5:02 PM, Chris Kees wrote:

> I think I added the correct test code and got a difference of 0, but
> spooles is still not returning the correct result, at least not
> until the 2nd iteration. I'm attaching some of the code and output.
>
> Chris
>
> code for mat creates
> ---------------------------
> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,
>                 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m2);
>
> MatCreate(Py_PETSC_COMM_WORLD,&self->m);
> MatSetSizes(self->m,n,n,N,N);
> MatSetFromOptions(self->m);
> MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
> MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
>
> code for difference in mats
> -----------------------------------
> MatAXPY(PETSCMAT2(par_L),-1.0,PETSCMAT(par_L),DIFFERENT_NONZERO_PATTERN);
> PetscReal norm=-1.0;
> MatNorm(PETSCMAT2(par_L),NORM_INFINITY,&norm);
> std::cout<<"inf norm of L - L2 = "<<norm<<std::endl;
>
> results for -pc_type lu -pc_factor_mat_solver_package spooles -ksp_monitor_true_residual -mat_type mpiaij
> -----------------------------------
> inf norm of L - L2 = 0
>   0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
>   1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
>   2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14
>
> On Oct 29, 2009, at 3:56 PM, Barry Smith wrote:
>
>> Hmm, this should not happen. The matrix should be identical in
>> both cases (note the initial residual is the same in both cases, so
>> they may be identical).
>>
>> Here's one more thing you can try. In the same routine create TWO
>> matrices, one each way, then use MatAXPY() to take the difference
>> between them. Is it zero? If not, that will help lead to where the
>> problem might be.
>>
>> If that does not help the situation, can you send the code that
>> demonstrates this problem to petsc-maint at mcs.anl.gov?
>>
>>    Barry
>>
>> On Oct 29, 2009, at 3:34 PM, Chris Kees wrote:
>>
>>> I pulled the most recent version of petsc-dev and added
>>> superlu_dist. The only case that fails is spooles on 1 processor
>>> with an mpiaij matrix created with the Create/SetSizes/...
>>> paradigm. I would suspect something in the spooles wrapper code.
>>>
>>> code allocating mat:
>>> MatCreate(Py_PETSC_COMM_WORLD,&self->m);
>>> MatSetSizes(self->m,n,n,N,N);
>>> MatSetType(self->m,MATMPIAIJ);
>>>
>>> results:
>>> -pc_type lu -pc_factor_mat_solver_package spooles -ksp_monitor_true_residual (spooles on 1 proc)
>>>   0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>>   1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
>>>   2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14
>>>
>>> -pc_type lu -pc_factor_mat_solver_package spooles -ksp_monitor_true_residual (spooles on 8 procs)
>>>   0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>>   1 KSP preconditioned resid norm 2.898588418444e-14 true resid norm 1.045214942604e-13 ||Ae||/||Ax|| 9.834526306678e-15
>>>
>>> -pc_type lu -pc_factor_mat_solver_package superlu_dist -ksp_monitor_true_residual
>>>   0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>>   1 KSP preconditioned resid norm 2.917052183641e-14 true resid norm 1.299023464819e-13 ||Ae||/||Ax|| 1.222263471084e-14
>>>
>>> code allocating mat:
>>> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,
>>>                 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>>>
>>> results:
>>> -pc_type lu -pc_factor_mat_solver_package spooles -ksp_monitor_true_residual (spooles on 1 proc)
>>>   0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>>   1 KSP preconditioned resid norm 3.093811333624e-14 true resid norm 1.046870732638e-13 ||Ae||/||Ax|| 9.850105791801e-15
>>>
>>> On Oct 29, 2009, at 1:50 PM, Matthew Knepley wrote:
>>>
>>>> On Thu, Oct 29, 2009 at 1:46 PM, Chris Kees <christopher.e.kees at usace.army.mil> wrote:
>>>>
>>>> Mark, Barry,
>>>>
>>>> Thanks for the help. For now I'm sticking with the approach of
>>>> copying the CSR matrix and using the CSR data structures to do
>>>> the preallocations. I'll eventually get around to writing the
>>>> code for assembling directly into the PETSc matrix. I have two
>>>> more questions.
>>>>
>>>> 1) On 1 processor, the linear solver is not converging in 1
>>>> iteration using -pc_type lu -pc_factor_mat_solver_package
>>>> spooles. It converges in one iteration in parallel, or on a single
>>>> processor if I set -mat_type seqaij. Also, if I go back to the
>>>> old way of constructing the matrix it DOES converge in 1
>>>> iteration on 1 processor. That is, if I replace the MatCreate/
>>>> MatSetSizes/... calls from the earlier email with what I had
>>>> originally:
>>>>
>>>> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,
>>>>                 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>>>>
>>>> then the solver converges in 1 iteration as expected. This
>>>> wouldn't be a big deal except that we're trying to verify that
>>>> our quadratic finite elements are working, and the behavior for
>>>> p1 and p2 elements is different. The above observations are for
>>>> p2, while all the cases work as expected for p1. Should we just
>>>> not be using mpiaij at all on 1 processor?
>>>>
>>>> No, you have a problem.
>>>>
>>>> 2) In the code snippet I took from the petsc example there are
>>>> two preallocation calls (see below).
>>>> Is that correct, or should I
>>>> be checking the matrix type and calling only the preallocation
>>>> function for the actual matrix type? i.e. something like
>>>>
>>>> MatCreate(...
>>>> MatSetSizes(...
>>>> MatSetFromOptions(...
>>>> MatGetType(...
>>>> if type==mpiaij: MatMPIAIJSetPreallocationCSR(...
>>>>
>>>> It does not hurt to call them all.
>>>>
>>>>    Matt
>>>>
>>>> Chris
>>>>
>>>> On Oct 21, 2009, at 11:50 AM, Mark F. Adams wrote:
>>>>
>>>> Chris, just a note,
>>>>
>>>> Perhaps I missed something here, but I do something similar to you,
>>>> e.g. overlapping partitions, and PETSc is set up very well to be
>>>> intrusive in your code (I sometimes write a little 'addvalue'
>>>> wrapper) and assemble your FE matrices directly into a global
>>>> matrix. You use the (global) MatSetValues(...) and set the
>>>> matrix option MAT_IGNORE_OFF_PROC_ENTRIES (or something like
>>>> that) so that these off-processor matrix entries, which are
>>>> computed redundantly, are thrown away and you get the correct
>>>> matrix.
>>>>
>>>> As Barry said, you don't need exact non-zero counts to allocate
>>>> storage in PETSc matrices, just don't underestimate.
>>>>
>>>> Mark
>>>>
>>>> On Oct 21, 2009, at 9:16 AM, Barry Smith wrote:
>>>>
>>>> On Oct 20, 2009, at 10:13 PM, Chris Kees wrote:
>>>>
>>>> Thanks. Just to make sure I'm following, when I create the matrix
>>>> I do:
>>>>
>>>> MatCreate(PETSC_COMM_WORLD,&self->m);
>>>> MatSetSizes(self->m,n,n,N,N);
>>>> MatSetType(self->m,MATMPIAIJ);
>>>> MatSetFromOptions(self->m);
>>>> MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
>>>> MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
>>>>
>>>> and then I copy the matrix at each Newton iteration using the
>>>> code in the first message below. You said "...PreallocationCSR()
>>>> does the copy very efficiently", but I don't think I'm supposed
>>>> to repeatedly call it, right? Looking at the output some more, it
>>>> does seem like building the preconditioner initially is taking
>>>> much more time than the initial pass through MatSetValues.
>>>>
>>>> If the matrix nonzero structure stays the same then yes, you need to
>>>> call MatMPIAIJSetPreallocationCSR() only once and use the
>>>> MatSetValues() calls each Newton step to update the nonzeros.
>>>>
>>>>    Barry
>>>>
>>>> Chris
>>>>
>>>> On Oct 20, 2009, at 8:32 PM, Barry Smith wrote:
>>>>
>>>> Here's the deal. In parallel, PETSc does not use a single CSR
>>>> matrix on each process to hold the MPIAIJ matrix. Hence if you
>>>> store the matrix on a process as a single CSR matrix, then it has
>>>> to be copied into the PETSc data structure.
>>>> MatMPIAIJSetPreallocationCSR() does the copy very efficiently.
>>>> But you are right, until you free YOUR rowptr[], colind[] and a[]
>>>> there are two copies of the matrix.
>>>> One could argue that this is ok, because any
>>>> preconditioner worth anything will take up at least that much
>>>> memory later anyway. For example, if you use the PETSc default
>>>> preconditioner, block Jacobi with ILU(0) on each block, it will
>>>> take pretty much the same amount of memory.
>>>>
>>>> For people who use SOR or some other preconditioner that requires
>>>> very little memory this is not satisfactory, because they feel
>>>> they could run larger problems without the copy.
>>>>
>>>> I strongly recommend you use MatMPIAIJSetPreallocationCSR() and
>>>> live with any memory limitation.
>>>> The reason is that doing more
>>>> than that has orders of magnitude more pain with generally little
>>>> payoff. If you like pain then you can
>>>> 1) change your code to not build directly into CSR; instead call
>>>> MatSetValuesLocal() as you compute the entries. With this you
>>>> need to figure out the preallocation, which can be painful code.
>>>> 2) write an entire matrix class that stores the matrix the way you
>>>> store it, not the way PETSc stores it. This is a huge project, since
>>>> you have to write your own matrix-vector product, ILU factorization, etc.
>>>>
>>>>    Barry
>>>>
>>>> On Oct 20, 2009, at 8:04 PM, Chris Kees wrote:
>>>>
>>>> Hi,
>>>>
>>>> Our unstructured finite element code does a fairly standard
>>>> overlapping decomposition that allows it to calculate all the
>>>> non-zero column entries for the rows owned by the processor (rows
>>>> 0...ldim-1 in the local numbering). We assemble them into a
>>>> local CSR matrix and then copy them into a PETSc matrix like this:
>>>>
>>>> for (int i=0;i<ldim;i++)
>>>>   {
>>>>     irow[0] = i;
>>>>     MatSetValuesLocal(PETSCMAT(par_L),1,irow,rowptr[i+1]-rowptr[i],
>>>>                       &colind[rowptr[i]],&a[rowptr[i]],INSERT_VALUES);
>>>>   }
>>>>
>>>> where rowptr and colind are the CSR indexing arrays in the local
>>>> numbering.
>>>>
>>>> I would like to eliminate the duplicate memory (and the copy
>>>> above). Is there a recommended way to let PETSc reference array
>>>> 'a' directly, or a way to get rid of 'a' and translate our CSR
>>>> assembly routines to use low-level PETSc data structures?
>>>>
>>>> So far I've added MatMPIAIJSetPreallocationCSR to try to speed up
>>>> the first solve, but the documentation is clear that 'a' is not a
>>>> pointer to the AIJ storage, so I still need the copy. I tried
>>>> using MatCreateMPIAIJWithSplitArrays with the 'o' arrays set to
>>>> zero but I got indexing errors, and the documentation really
>>>> seems to imply that the local matrix is really not a standard CSR
>>>> anyway. If that's true then it seems like the options I have are
>>>> to write a shell matrix for parallel CSR storage or write some
>>>> new assembly routines that use MatSetValuesLocal. I'm more
>>>> concerned about the memory than the overhead of
>>>> MatSetValuesLocal, but it would certainly be easier on me to use
>>>> the CSR-based assembly routines we already have.
>>>>
>>>> Thanks,
>>>> Chris
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to
>>>> which their experiments lead.
>>>> -- Norbert Wiener
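
Putting Barry's and Matt's advice from the thread together, a minimal sketch of the recommended pattern might look like the code below: create the matrix once, preallocate it from the application's local CSR arrays (calling both the Seq and MPI preallocation routines is harmless, since the one that does not match the runtime type is ignored), then only update the values each Newton step. The function names and the rowptr/colind/nzval/ldim arguments are placeholders for the application's own data, error handling is abbreviated, and a local-to-global mapping is assumed to have been attached to the matrix elsewhere (e.g. with MatSetLocalToGlobalMapping()); this is untested and the details may need adjusting for a given PETSc version.

#include <petscmat.h>

/* One-time setup: the matrix type comes from -mat_type, and whichever
   preallocation routine does not match the runtime type is a no-op,
   so it is safe to call both. */
PetscErrorCode SetupMatrixFromCSR(MPI_Comm comm,PetscInt n,PetscInt N,
                                  const PetscInt rowptr[],const PetscInt colind[],
                                  const PetscScalar nzval[],Mat *A)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatCreate(comm,A);CHKERRQ(ierr);
  ierr = MatSetSizes(*A,n,n,N,N);CHKERRQ(ierr);
  ierr = MatSetFromOptions(*A);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocationCSR(*A,rowptr,colind,nzval);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocationCSR(*A,rowptr,colind,nzval);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Each Newton step after that (same nonzero pattern, new values):
   copy the freshly assembled local CSR values back in with
   MatSetValuesLocal(), as in Chris's loop above. */
PetscErrorCode UpdateMatrixValues(Mat A,PetscInt ldim,
                                  const PetscInt rowptr[],const PetscInt colind[],
                                  const PetscScalar a[])
{
  PetscErrorCode ierr;
  PetscInt       i;

  PetscFunctionBegin;
  for (i=0; i<ldim; i++) {
    ierr = MatSetValuesLocal(A,1,&i,rowptr[i+1]-rowptr[i],
                             &colind[rowptr[i]],&a[rowptr[i]],INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}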
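
For the alternative Mark describes (skipping the intermediate CSR copy and assembling element matrices straight into the global Mat, with redundantly computed off-process rows simply dropped), a rough sketch under the same caveats might look like the following. The nen/nodes[]/Ae[] names are hypothetical placeholders for the application's element data, and the exact MatSetOption() signature depends on the PETSc version.

#include <petscmat.h>

/* Add one dense element matrix into the global matrix.  nodes[] holds the
   global equation numbers of the element's dofs and Ae[] is the nen x nen
   element matrix in row-major order. */
PetscErrorCode AddElementMatrix(Mat A,PetscInt nen,const PetscInt nodes[],
                                const PetscScalar Ae[])
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = MatSetValues(A,nen,nodes,nen,nodes,Ae,ADD_VALUES);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* Before the element loop, tell PETSc to discard the redundantly computed
   off-process entries instead of communicating them:

     MatSetOption(A,MAT_IGNORE_OFF_PROC_ENTRIES,PETSC_TRUE);

   and after the loop finish the assembly with
   MatAssemblyBegin/MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY). */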