most efficient way to use local CSR matrix?

2009-10-29 Thread Chris Kees
I'm just using a simple configuration:
./config/configure.py --with-clanguage=C --with-cc='/usr/bin/mpicc -arch x86_64' --with-cxx='/usr/bin/mpicxx -arch x86_64' --without-fortran --with-mpi-compilers --without-shared --without-dynamic --download-parmetis=ifneeded --download-spooles=ifneeded

At this point we're not really hung up on this; I was mainly
continuing to provide information in case it was helpful to you guys.
The way the matrix is assembled, there is no reason to explicitly set
the type, so the PETSc default on one processor behaves correctly.
You have to force it to use mpiaij and spooles on one processor to
produce the error.

Chris

On Oct 29, 2009, at 9:00 PM, Barry Smith wrote:

>
>  Have you run this with the debug version of PETSc? It does  
> additional argument testing that might catch a funny value passed in.
>
>
>   Barry
>
> On Oct 29, 2009, at 5:02 PM, Chris Kees wrote:
>
>> I think I added the correct test code and got a difference of 0,  
>> but spooles is still not returning the correct result, at least not  
>> until the 2nd iteration.  I'm attaching some of the code and output.
>>
>> Chris
>>
>> code for mat creates
>> ---
>>  MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N, 
>> 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m2);
>>
>>  MatCreate(Py_PETSC_COMM_WORLD,&self->m);
>>  MatSetSizes(self->m,n,n,N,N);
>>  MatSetFromOptions(self->m);
>>  MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)- 
>> >A.colind,(double*)(SMP(L)->A.nzval));
>>  MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)- 
>> >A.colind,(double*)(SMP(L)->A.nzval));
>>
>>
>> code for difference in mats
>> ---
>>
>> MatAXPY 
>> (PETSCMAT2(par_L),-1.0,PETSCMAT(par_L),DIFFERENT_NONZERO_PATTERN);
>> PetscReal norm=-1.0;
>> MatNorm(PETSCMAT2(par_L),NORM_INFINITY,&norm);
>> std::cout<<"inf norm of L - L2 = "<<norm<<std::endl;
>>
>> results for -pc_type lu -pc_factor_mat_solver_package spooles - 
>> ksp_monitor_true_residual -mat_type mpiaij
>> ---
>>
>> inf norm of L - L2 = 0
>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
>> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>> 1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm  
>> 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
>> 2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm  
>> 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14
>> On Oct 29, 2009, at 3:56 PM, Barry Smith wrote:
>>
>>>
>>> Hmm, this should not happen. The matrix should be identical in  
>>> both cases (note the initial residual is the same in both cases so  
>>> they may be identical).
>>>
>>> Here's one more thing you can try. In the same routine create TWO  
>>> matrices; one each way, then use MatAXPY() to take the difference  
>>> between them.
>>> Is it zero? If not, that will help lead to where the problem might  
>>> be.
>>>
>>> If that does not help the situation, can you send the code the  
>>> demonstrates this problem to petsc-maint at mcs.anl.gov
>>>
>>>
>>>  Barry
>>>
>>> On Oct 29, 2009, at 3:34 PM, Chris Kees wrote:
>>>
 I pulled the most recent version of petsc-dev and added  
 superlu_dist.  The only case that fails is spooles on 1 processor  
 with an mpiaij matrix created with the Create/SetSizes/...  
 paradigm.  I would suspect something in spooles wrapper code.

 code allocating mat:
 MatCreate(Py_PETSC_COMM_WORLD,&self->m);
 MatSetSizes(self->m,n,n,N,N);
 MatSetType(self->m,MATMPIAIJ);

 results:
 -pc_type lu -pc_factor_mat_solver_packages spooles - 
 ksp_monitor_true_residual (spooles on 1 proc)
 0 KSP preconditioned resid norm 2.284405915285e+01 true resid  
 norm 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
 1 KSP preconditioned resid norm 3.936838941525e-01 true resid  
 norm 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
 2 KSP preconditioned resid norm 7.332823321496e-14 true resid  
 norm 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14

 -pc_type lu -pc_factor_mat_solver_packages spooles - 
 ksp_monitor_true_residual (spooleson 8 procs)
 0 KSP preconditioned resid norm 2.284405915285e+01 true resid  
 norm 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
 1 KSP preconditioned resid norm 2.898588418444e-14 true resid  
 norm 1.045214942604e-13 ||Ae||/||Ax|| 9.834526306678e-15

 -pc_type lu -pc_factor_mat_solver_packages superlu_dist - 
 ksp_monitor_true_residual
 0 KSP preconditioned resid norm 2.284405915285e+01 true resid  
 norm 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
 1 KSP preconditioned resid norm 2.917052183641e-14 true resid  
 norm 1.299023464819e-13 ||Ae||/||Ax|| 1.63471084e-14

 code allocating mat:

 MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N, 
 1,PETSC_NULL,max_nNeighbors,P

most efficient way to use local CSR matrix?

2009-10-29 Thread Chris Kees

On Oct 29, 2009, at 8:43 PM, Hong Zhang wrote:

>
> On Oct 29, 2009, at 5:02 PM, Chris Kees wrote:
>
>> I think I added the correct test code and got a difference of 0,  
>> but spooles is still not returning the correct result, at least not  
>> until the 2nd iteration.  I'm attaching some of the code and output.
>
> We have not been actively updating spooles interface since
> Spooles has been out of support by its developers for 10+ years.
> For direct parallel solvers, mumps and superlu_dist have been  
> constantly supported and
> outperform spooles in general.
>
> Why do you use spooles?
>

Inertia, mainly.  It doesn't require a Fortran compiler, which comes in
handy on some systems, and it has been a reliable way to make sure
everything else in the code is working before trying to loosen the
linear solver tolerances on larger parallel runs. There were also some
problems about 10 years ago when superlu_dist was not giving reliable
results on Jacobians from degenerate nonlinear PDEs.  If you guys
aren't going to support the spooles wrapper, it would be nice to
provide some kind of robust C sparse direct solver in its place for
building a minimal PETSc for debugging/development/porting.

Chris

> Hong
>>
>> Chris
>>
>> code for mat creates
>> ---
>>  MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N, 
>> 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m2);
>>
>>  MatCreate(Py_PETSC_COMM_WORLD,&self->m);
>>  MatSetSizes(self->m,n,n,N,N);
>>  MatSetFromOptions(self->m);
>>  MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)- 
>> >A.colind,(double*)(SMP(L)->A.nzval));
>>  MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)- 
>> >A.colind,(double*)(SMP(L)->A.nzval));
>>
>>
>> code for difference in mats
>> ---
>>
>> MatAXPY 
>> (PETSCMAT2(par_L),-1.0,PETSCMAT(par_L),DIFFERENT_NONZERO_PATTERN);
>> PetscReal norm=-1.0;
>> MatNorm(PETSCMAT2(par_L),NORM_INFINITY,&norm);
>> std::cout<<"inf norm of L - L2 = "<<norm<<std::endl;
>>
>> results for -pc_type lu -pc_factor_mat_solver_package spooles - 
>> ksp_monitor_true_residual -mat_type mpiaij
>> ---
>>
>> inf norm of L - L2 = 0
>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
>> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>> 1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm  
>> 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
>> 2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm  
>> 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14
>> On Oct 29, 2009, at 3:56 PM, Barry Smith wrote:
>>
>>>
>>> Hmm, this should not happen. The matrix should be identical in  
>>> both cases (note the initial residual is the same in both cases so  
>>> they may be identical).
>>>
>>> Here's one more thing you can try. In the same routine create TWO  
>>> matrices; one each way, then use MatAXPY() to take the difference  
>>> between them.
>>> Is it zero? If not, that will help lead to where the problem might  
>>> be.
>>>
>>> If that does not help the situation, can you send the code the  
>>> demonstrates this problem to petsc-maint at mcs.anl.gov
>>>
>>>
>>>  Barry
>>>
>>> On Oct 29, 2009, at 3:34 PM, Chris Kees wrote:
>>>
 I pulled the most recent version of petsc-dev and added  
 superlu_dist.  The only case that fails is spooles on 1 processor  
 with an mpiaij matrix created with the Create/SetSizes/...  
 paradigm.  I would suspect something in spooles wrapper code.

 code allocating mat:
 MatCreate(Py_PETSC_COMM_WORLD,&self->m);
 MatSetSizes(self->m,n,n,N,N);
 MatSetType(self->m,MATMPIAIJ);

 results:
 -pc_type lu -pc_factor_mat_solver_packages spooles - 
 ksp_monitor_true_residual (spooles on 1 proc)
 0 KSP preconditioned resid norm 2.284405915285e+01 true resid  
 norm 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
 1 KSP preconditioned resid norm 3.936838941525e-01 true resid  
 norm 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
 2 KSP preconditioned resid norm 7.332823321496e-14 true resid  
 norm 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14

 -pc_type lu -pc_factor_mat_solver_packages spooles - 
 ksp_monitor_true_residual (spooleson 8 procs)
 0 KSP preconditioned resid norm 2.284405915285e+01 true resid  
 norm 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
 1 KSP preconditioned resid norm 2.898588418444e-14 true resid  
 norm 1.045214942604e-13 ||Ae||/||Ax|| 9.834526306678e-15

 -pc_type lu -pc_factor_mat_solver_packages superlu_dist - 
 ksp_monitor_true_residual
 0 KSP preconditioned resid norm 2.284405915285e+01 true resid  
 norm 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
 1 KSP preconditioned resid norm 2.917052183641e-14 true resid  
 norm 1.299023464819e-13 ||Ae||/||Ax|| 1.63471084e-14

 

most efficient way to use local CSR matrix?

2009-10-29 Thread Barry Smith

   Have you run this with the debug version of PETSc? It does  
additional argument testing that might catch a funny value passed in.


Barry

On Oct 29, 2009, at 5:02 PM, Chris Kees wrote:

> I think I added the correct test code and got a difference of 0, but  
> spooles is still not returning the correct result, at least not  
> until the 2nd iteration.  I'm attaching some of the code and output.
>
> Chris
>
> code for mat creates
> ---
>   MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N, 
> 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m2);
>
>   MatCreate(Py_PETSC_COMM_WORLD,&self->m);
>   MatSetSizes(self->m,n,n,N,N);
>   MatSetFromOptions(self->m);
>   MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)- 
> >A.colind,(double*)(SMP(L)->A.nzval));
>   MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)- 
> >A.colind,(double*)(SMP(L)->A.nzval));
>
>
> code for difference in mats
> ---
>
> MatAXPY 
> (PETSCMAT2(par_L),-1.0,PETSCMAT(par_L),DIFFERENT_NONZERO_PATTERN);
> PetscReal norm=-1.0;
> MatNorm(PETSCMAT2(par_L),NORM_INFINITY,&norm);
> std::cout<<"inf norm of L - L2 = "<<norm<<std::endl;
>
> results for -pc_type lu -pc_factor_mat_solver_package spooles - 
> ksp_monitor_true_residual -mat_type mpiaij
> ---
>
> inf norm of L - L2 = 0
> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
> 1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm  
> 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
> 2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm  
> 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14
> On Oct 29, 2009, at 3:56 PM, Barry Smith wrote:
>
>>
>>  Hmm, this should not happen. The matrix should be identical in  
>> both cases (note the initial residual is the same in both cases so  
>> they may be identical).
>>
>>  Here's one more thing you can try. In the same routine create TWO  
>> matrices; one each way, then use MatAXPY() to take the difference  
>> between them.
>> Is it zero? If not, that will help lead to where the problem might  
>> be.
>>
>>  If that does not help the situation, can you send the code the  
>> demonstrates this problem to petsc-maint at mcs.anl.gov
>>
>>
>>   Barry
>>
>> On Oct 29, 2009, at 3:34 PM, Chris Kees wrote:
>>
>>> I pulled the most recent version of petsc-dev and added  
>>> superlu_dist.  The only case that fails is spooles on 1 processor  
>>> with an mpiaij matrix created with the Create/SetSizes/...  
>>> paradigm.  I would suspect something in spooles wrapper code.
>>>
>>> code allocating mat:
>>> MatCreate(Py_PETSC_COMM_WORLD,&self->m);
>>> MatSetSizes(self->m,n,n,N,N);
>>> MatSetType(self->m,MATMPIAIJ);
>>>
>>> results:
>>> -pc_type lu -pc_factor_mat_solver_packages spooles - 
>>> ksp_monitor_true_residual (spooles on 1 proc)
>>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
>>> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>>> 1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm  
>>> 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
>>> 2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm  
>>> 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14
>>>
>>> -pc_type lu -pc_factor_mat_solver_packages spooles - 
>>> ksp_monitor_true_residual (spooleson 8 procs)
>>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
>>> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>>> 1 KSP preconditioned resid norm 2.898588418444e-14 true resid norm  
>>> 1.045214942604e-13 ||Ae||/||Ax|| 9.834526306678e-15
>>>
>>> -pc_type lu -pc_factor_mat_solver_packages superlu_dist - 
>>> ksp_monitor_true_residual
>>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
>>> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>>> 1 KSP preconditioned resid norm 2.917052183641e-14 true resid norm  
>>> 1.299023464819e-13 ||Ae||/||Ax|| 1.63471084e-14
>>>
>>> code allocating mat:
>>>
>>> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N, 
>>> 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>>>
>>> results:
>>> -pc_type lu -pc_factor_mat_solver_packages spooles - 
>>> ksp_monitor_true_residual (spooles on 1 proc)
>>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
>>> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>>> 1 KSP preconditioned resid norm 3.093811333624e-14 true resid norm  
>>> 1.046870732638e-13 ||Ae||/||Ax|| 9.850105791801e-15
>>>
>>> On Oct 29, 2009, at 1:50 PM, Matthew Knepley wrote:
>>>
 On Thu, Oct 29, 2009 at 1:46 PM, Chris Kees <christopher.e.kees at usace.army.mil> wrote:
 Mark, Barry,

 Thanks for the help.  For now I'm sticking with the approach of  
 copying the csr matrix and using the csr data structures to do  
 the preallocations. I'll eventually get around to writing the  
 code for assembling directly into t

most efficient way to use local CSR matrix?

2009-10-29 Thread Hong Zhang

On Oct 29, 2009, at 5:02 PM, Chris Kees wrote:

> I think I added the correct test code and got a difference of 0, but  
> spooles is still not returning the correct result, at least not  
> until the 2nd iteration.  I'm attaching some of the code and output.

We have not been actively updating the spooles interface, since
Spooles has been unsupported by its developers for 10+ years.
For direct parallel solvers, mumps and superlu_dist have been
continuously supported and generally outperform spooles.

Why do you use spooles?

Hong
>
> Chris
>
> code for mat creates
> ---
>   MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N, 
> 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m2);
>
>   MatCreate(Py_PETSC_COMM_WORLD,&self->m);
>   MatSetSizes(self->m,n,n,N,N);
>   MatSetFromOptions(self->m);
>   MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)- 
> >A.colind,(double*)(SMP(L)->A.nzval));
>   MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)- 
> >A.colind,(double*)(SMP(L)->A.nzval));
>
>
> code for difference in mats
> ---
>
> MatAXPY 
> (PETSCMAT2(par_L),-1.0,PETSCMAT(par_L),DIFFERENT_NONZERO_PATTERN);
> PetscReal norm=-1.0;
> MatNorm(PETSCMAT2(par_L),NORM_INFINITY,&norm);
> std::cout<<"inf norm of L - L2 = "<<norm<<std::endl;
>
> results for -pc_type lu -pc_factor_mat_solver_package spooles - 
> ksp_monitor_true_residual -mat_type mpiaij
> ---
>
> inf norm of L - L2 = 0
> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
> 1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm  
> 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
> 2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm  
> 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14
> On Oct 29, 2009, at 3:56 PM, Barry Smith wrote:
>
>>
>>  Hmm, this should not happen. The matrix should be identical in  
>> both cases (note the initial residual is the same in both cases so  
>> they may be identical).
>>
>>  Here's one more thing you can try. In the same routine create TWO  
>> matrices; one each way, then use MatAXPY() to take the difference  
>> between them.
>> Is it zero? If not, that will help lead to where the problem might  
>> be.
>>
>>  If that does not help the situation, can you send the code the  
>> demonstrates this problem to petsc-maint at mcs.anl.gov
>>
>>
>>   Barry
>>
>> On Oct 29, 2009, at 3:34 PM, Chris Kees wrote:
>>
>>> I pulled the most recent version of petsc-dev and added  
>>> superlu_dist.  The only case that fails is spooles on 1 processor  
>>> with an mpiaij matrix created with the Create/SetSizes/...  
>>> paradigm.  I would suspect something in spooles wrapper code.
>>>
>>> code allocating mat:
>>> MatCreate(Py_PETSC_COMM_WORLD,&self->m);
>>> MatSetSizes(self->m,n,n,N,N);
>>> MatSetType(self->m,MATMPIAIJ);
>>>
>>> results:
>>> -pc_type lu -pc_factor_mat_solver_packages spooles - 
>>> ksp_monitor_true_residual (spooles on 1 proc)
>>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
>>> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>>> 1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm  
>>> 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
>>> 2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm  
>>> 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14
>>>
>>> -pc_type lu -pc_factor_mat_solver_packages spooles - 
>>> ksp_monitor_true_residual (spooleson 8 procs)
>>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
>>> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>>> 1 KSP preconditioned resid norm 2.898588418444e-14 true resid norm  
>>> 1.045214942604e-13 ||Ae||/||Ax|| 9.834526306678e-15
>>>
>>> -pc_type lu -pc_factor_mat_solver_packages superlu_dist - 
>>> ksp_monitor_true_residual
>>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
>>> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>>> 1 KSP preconditioned resid norm 2.917052183641e-14 true resid norm  
>>> 1.299023464819e-13 ||Ae||/||Ax|| 1.63471084e-14
>>>
>>> code allocating mat:
>>>
>>> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N, 
>>> 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>>>
>>> results:
>>> -pc_type lu -pc_factor_mat_solver_packages spooles - 
>>> ksp_monitor_true_residual (spooles on 1 proc)
>>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
>>> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>>> 1 KSP preconditioned resid norm 3.093811333624e-14 true resid norm  
>>> 1.046870732638e-13 ||Ae||/||Ax|| 9.850105791801e-15
>>>
>>> On Oct 29, 2009, at 1:50 PM, Matthew Knepley wrote:
>>>
 On Thu, Oct 29, 2009 at 1:46 PM, Chris Kees <christopher.e.kees at usace.army.mil> wrote:
 Mark, Barry,

 Thanks for the help.  For now I'm sticking with the approach of  
 copying the csr matrix and using the csr data

most efficient way to use local CSR matrix?

2009-10-29 Thread Chris Kees
I think I added the correct test code and got a difference of 0, but  
spooles is still not returning the correct result, at least not until  
the 2nd iteration.  I'm attaching some of the code and output.

Chris

code for mat creates
---
MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m2);

MatCreate(Py_PETSC_COMM_WORLD,&self->m);
MatSetSizes(self->m,n,n,N,N);
MatSetFromOptions(self->m);
MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));


code for difference in mats
---

   
MatAXPY(PETSCMAT2(par_L),-1.0,PETSCMAT(par_L),DIFFERENT_NONZERO_PATTERN);
  PetscReal norm=-1.0;
  MatNorm(PETSCMAT2(par_L),NORM_INFINITY,&norm);
  std::cout<<"inf norm of L - L2 = "<<norm<<std::endl;

On Oct 29, 2009, at 3:56 PM, Barry Smith wrote:

>   Hmm, this should not happen. The matrix should be identical in  
> both cases (note the initial residual is the same in both cases so  
> they may be identical).
>
>   Here's one more thing you can try. In the same routine create TWO  
> matrices; one each way, then use MatAXPY() to take the difference  
> between them.
> Is it zero? If not, that will help lead to where the problem might be.
>
>   If that does not help the situation, can you send the code the  
> demonstrates this problem to petsc-maint at mcs.anl.gov
>
>
>Barry
>
> On Oct 29, 2009, at 3:34 PM, Chris Kees wrote:
>
>> I pulled the most recent version of petsc-dev and added  
>> superlu_dist.  The only case that fails is spooles on 1 processor  
>> with an mpiaij matrix created with the Create/SetSizes/...  
>> paradigm.  I would suspect something in spooles wrapper code.
>>
>> code allocating mat:
>> MatCreate(Py_PETSC_COMM_WORLD,&self->m);
>> MatSetSizes(self->m,n,n,N,N);
>> MatSetType(self->m,MATMPIAIJ);
>>
>> results:
>> -pc_type lu -pc_factor_mat_solver_packages spooles - 
>> ksp_monitor_true_residual (spooles on 1 proc)
>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
>> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>> 1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm  
>> 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
>> 2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm  
>> 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14
>>
>> -pc_type lu -pc_factor_mat_solver_packages spooles - 
>> ksp_monitor_true_residual (spooleson 8 procs)
>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
>> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>> 1 KSP preconditioned resid norm 2.898588418444e-14 true resid norm  
>> 1.045214942604e-13 ||Ae||/||Ax|| 9.834526306678e-15
>>
>> -pc_type lu -pc_factor_mat_solver_packages superlu_dist - 
>> ksp_monitor_true_residual
>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
>> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>> 1 KSP preconditioned resid norm 2.917052183641e-14 true resid norm  
>> 1.299023464819e-13 ||Ae||/||Ax|| 1.63471084e-14
>>
>> code allocating mat:
>>
>> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N, 
>> 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>>
>> results:
>> -pc_type lu -pc_factor_mat_solver_packages spooles - 
>> ksp_monitor_true_residual (spooles on 1 proc)
>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
>> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>> 1 KSP preconditioned resid norm 3.093811333624e-14 true resid norm  
>> 1.046870732638e-13 ||Ae||/||Ax|| 9.850105791801e-15
>>
>> On Oct 29, 2009, at 1:50 PM, Matthew Knepley wrote:
>>
>>> On Thu, Oct 29, 2009 at 1:46 PM, Chris Kees <christopher.e.kees at usace.army.mil> wrote:
>>> Mark, Barry,
>>>
>>> Thanks for the help.  For now I'm sticking with the approach of  
>>> copying the csr matrix and using the csr data structures to do the  
>>> preallocations. I'll eventually get around to writing the code for  
>>> assembling directly into the petsc matrix.  I have two more  
>>> questions.
>>>
>>> 1) On 1 processor, the linear solver is not converging in 1  
>>> iteration using -pc_type lu -pc_factor_mat_solver_package spooles.  
>>> It converges in one iteration in parallel or on a single processor  
>>> if I set -mat_type seqaij. Also, if I go back to the old way of  
>>> constructing the matrix it DOES converge in 1 iteration on 1  
>>> processor. That is, if I replace the MatCreate/MatSetSizes/...   
>>> calls from the earlier email, with what I had originally:
>>>
>>> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N, 
>>> 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>>>
>>> then the solver converges in 1 iteration as expected.  This  
>>> wouldn't be a big deal except that we're trying to verify that our  
>>> quadratic finite elements are working, and the behavior for p1 and  
>>> p2 elements is different. The above obser

most efficient way to use local CSR matrix?

2009-10-29 Thread Barry Smith

Hmm, this should not happen. The matrix should be identical in  
both cases (note the initial residual is the same in both cases so  
they may be identical).

Here's one more thing you can try. In the same routine create TWO  
matrices; one each way, then use MatAXPY() to take the difference  
between them.
Is it zero? If not, that will help lead to where the problem might be.
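
   A minimal sketch of that check (placeholder names, error checking
omitted; not your actual code):

  Mat       A1,A2;       /* the same operator assembled the two ways */
  PetscReal norm;
  /* ... fill A1 and A2 from the same CSR data ... */
  MatAXPY(A1,-1.0,A2,DIFFERENT_NONZERO_PATTERN);   /* A1 <- A1 - A2 */
  MatNorm(A1,NORM_INFINITY,&norm);
  PetscPrintf(PETSC_COMM_WORLD,"inf norm of difference = %g\n",(double)norm);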

If that does not help the situation, can you send the code the  
demonstrates this problem to petsc-maint at mcs.anl.gov


 Barry

On Oct 29, 2009, at 3:34 PM, Chris Kees wrote:

> I pulled the most recent version of petsc-dev and added  
> superlu_dist.  The only case that fails is spooles on 1 processor  
> with an mpiaij matrix created with the Create/SetSizes/...  
> paradigm.  I would suspect something in spooles wrapper code.
>
> code allocating mat:
> MatCreate(Py_PETSC_COMM_WORLD,&self->m);
> MatSetSizes(self->m,n,n,N,N);
> MatSetType(self->m,MATMPIAIJ);
>
> results:
> -pc_type lu -pc_factor_mat_solver_packages spooles - 
> ksp_monitor_true_residual (spooles on 1 proc)
>  0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>  1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm  
> 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
>  2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm  
> 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14
>
> -pc_type lu -pc_factor_mat_solver_packages spooles - 
> ksp_monitor_true_residual (spooleson 8 procs)
> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>  1 KSP preconditioned resid norm 2.898588418444e-14 true resid norm  
> 1.045214942604e-13 ||Ae||/||Ax|| 9.834526306678e-15
>
> -pc_type lu -pc_factor_mat_solver_packages superlu_dist - 
> ksp_monitor_true_residual
> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>  1 KSP preconditioned resid norm 2.917052183641e-14 true resid norm  
> 1.299023464819e-13 ||Ae||/||Ax|| 1.63471084e-14
>
> code allocating mat:
>
> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N, 
> 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>
> results:
> -pc_type lu -pc_factor_mat_solver_packages spooles - 
> ksp_monitor_true_residual (spooles on 1 proc)
>  0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
> 1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
>  1 KSP preconditioned resid norm 3.093811333624e-14 true resid norm  
> 1.046870732638e-13 ||Ae||/||Ax|| 9.850105791801e-15
>
> On Oct 29, 2009, at 1:50 PM, Matthew Knepley wrote:
>
>> On Thu, Oct 29, 2009 at 1:46 PM, Chris Kees <christopher.e.kees at usace.army.mil> wrote:
>> Mark, Barry,
>>
>> Thanks for the help.  For now I'm sticking with the approach of  
>> copying the csr matrix and using the csr data structures to do the  
>> preallocations. I'll eventually get around to writing the code for  
>> assembling directly into the petsc matrix.  I have two more  
>> questions.
>>
>> 1) On 1 processor, the linear solver is not converging in 1  
>> iteration using -pc_type lu -pc_factor_mat_solver_package spooles.  
>> It converges in one iteration in parallel or on a single processor  
>> if I set -mat_type seqaij. Also, if I go back to the old way of  
>> constructing the matrix it DOES converge in 1 iteration on 1  
>> processor. That is, if I replace the MatCreate/MatSetSizes/...   
>> calls from the earlier email, with what I had originally:
>>
>> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N, 
>> 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>>
>> then the solver converges in 1 iteration as expected.  This  
>> wouldn't be a big deal except that we're trying to verify that our  
>> quadratic finite elements are working, and the behavior for p1 and  
>> p2 elements is different. The above observations are for p2, while  
>> all the cases work as expected for p1. Should we just not be using  
>> mpiaij at all on 1 processor?
>>
>> No, you have a problem.
>>
>> 2) In the code snippet I took from the petsc example there are two  
>> preallocation calls (see below). Is that correct, or should I be  
>> checking the matrix type and calling only the preallocation  
>> function for the actual matrix type? i.e. something like
>>
>> MatCreate(...
>> MatSetSizes(...
>> MatSetFromOptions(...
>> MatGetType(...
>> if type==mpiaij: MatMPIAIJSetPreallocationCSR(...
>>
>> It does not hurt to call them all.
>>
>>   Matt
>>
>> Chris
>>
>> On Oct 21, 2009, at 11:50 AM, Mark F. Adams wrote:
>>
>> Chris, just a note,
>>
>> Perhaps I missed something here but do something similar to you, eg  
>> overlapping partitions, and PETSc is setup very well to be  
>> intrusive in your code (I sometimes write a little 'addvalue'  
>> wrapper) and assemble your FE matrices directly into a global  
>> matrix.  You use the (global) MatSetValues(...) and set the matr

most efficient way to use local CSR matrix?

2009-10-29 Thread Chris Kees
I pulled the most recent version of petsc-dev and added superlu_dist.   
The only case that fails is spooles on 1 processor with an mpiaij  
matrix created with the Create/SetSizes/... paradigm.  I would suspect  
something in spooles wrapper code.

code allocating mat:
MatCreate(Py_PETSC_COMM_WORLD,&self->m);
MatSetSizes(self->m,n,n,N,N);
MatSetType(self->m,MATMPIAIJ);

results:
-pc_type lu -pc_factor_mat_solver_packages spooles -ksp_monitor_true_residual (spooles on 1 proc)
  0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
  1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm  
1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
  2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm  
2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14

-pc_type lu -pc_factor_mat_solver_packages spooles -ksp_monitor_true_residual (spooles on 8 procs)
0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
  1 KSP preconditioned resid norm 2.898588418444e-14 true resid norm  
1.045214942604e-13 ||Ae||/||Ax|| 9.834526306678e-15

-pc_type lu -pc_factor_mat_solver_packages superlu_dist -ksp_monitor_true_residual
0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
  1 KSP preconditioned resid norm 2.917052183641e-14 true resid norm  
1.299023464819e-13 ||Ae||/||Ax|| 1.63471084e-14

code allocating mat:

MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);

results:
-pc_type lu -pc_factor_mat_solver_packages spooles - 
ksp_monitor_true_residual (spooles on 1 proc)
  0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm  
1.062801511746e+01 ||Ae||/||Ax|| 1.e+00
  1 KSP preconditioned resid norm 3.093811333624e-14 true resid norm  
1.046870732638e-13 ||Ae||/||Ax|| 9.850105791801e-15

On Oct 29, 2009, at 1:50 PM, Matthew Knepley wrote:

> On Thu, Oct 29, 2009 at 1:46 PM, Chris Kees <christopher.e.kees at usace.army.mil> wrote:
> Mark, Barry,
>
> Thanks for the help.  For now I'm sticking with the approach of  
> copying the csr matrix and using the csr data structures to do the  
> preallocations. I'll eventually get around to writing the code for  
> assembling directly into the petsc matrix.  I have two more questions.
>
> 1) On 1 processor, the linear solver is not converging in 1  
> iteration using -pc_type lu -pc_factor_mat_solver_package spooles.  
> It converges in one iteration in parallel or on a single processor  
> if I set -mat_type seqaij. Also, if I go back to the old way of  
> constructing the matrix it DOES converge in 1 iteration on 1  
> processor. That is, if I replace the MatCreate/MatSetSizes/...   
> calls from the earlier email, with what I had originally:
>
> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N, 
> 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>
> then the solver converges in 1 iteration as expected.  This wouldn't  
> be a big deal except that we're trying to verify that our quadratic  
> finite elements are working, and the behavior for p1 and p2 elements  
> is different. The above observations are for p2, while all the cases  
> work as expected for p1. Should we just not be using mpiaij at all  
> on 1 processor?
>
> No, you have a problem.
>
> 2) In the code snippet I took from the petsc example there are two  
> preallocation calls (see below). Is that correct, or should I be  
> checking the matrix type and calling only the preallocation function  
> for the actual matrix type? i.e. something like
>
> MatCreate(...
> MatSetSizes(...
> MatSetFromOptions(...
> MatGetType(...
> if type==mpiaij: MatMPIAIJSetPreallocationCSR(...
>
> It does not hurt to call them all.
>
>   Matt
>
> Chris
>
> On Oct 21, 2009, at 11:50 AM, Mark F. Adams wrote:
>
> Chris, just a note,
>
> Perhaps I missed something here but do something similar to you, eg  
> overlapping partitions, and PETSc is setup very well to be intrusive  
> in your code (I sometimes write a little 'addvalue' wrapper) and  
> assemble your FE matrices directly into a global matrix.  You use  
> the (global) MatSetValues(...) and set the matrix option  
> MAT_IGNORE_OFF_PROC_ENTRIES (or something like that) so that these  
> off processor matrix entries, which are computed redundantly, are  
> thrown away and you get the correct matrix.
>
> As Barry said you don't need exact non-zero counts to allocate  
> storage in PETSC matrices, just don't underestimate.
>
> Mark
>
> On Oct 21, 2009, at 9:16 AM, Barry Smith wrote:
>
>
> On Oct 20, 2009, at 10:13 PM, Chris Kees wrote:
>
> Thanks. Just to make sure I'm following, when I create the matrix I  
> do:
>
>   MatCreate(PETSC_COMM_WORLD,&self->m);
>   MatSetSizes(self->m,n,n,N,N);
>   MatSetType(self->m,MATMPIAIJ);
>   MatSetFromOptions(self->m);
>   MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr

most efficient way to use local CSR matrix?

2009-10-29 Thread Matthew Knepley
On Thu, Oct 29, 2009 at 1:46 PM, Chris Kees <christopher.e.kees at usace.army.mil> wrote:

> Mark, Barry,
>
> Thanks for the help.  For now I'm sticking with the approach of copying the
> csr matrix and using the csr data structures to do the preallocations. I'll
> eventually get around to writing the code for assembling directly into the
> petsc matrix.  I have two more questions.
>
> 1) On 1 processor, the linear solver is not converging in 1 iteration using
> -pc_type lu -pc_factor_mat_solver_package spooles. It converges in one
> iteration in parallel or on a single processor if I set -mat_type seqaij.
> Also, if I go back to the old way of constructing the matrix it DOES
> converge in 1 iteration on 1 processor. That is, if I replace the
> MatCreate/MatSetSizes/...  calls from the earlier email, with what I had
> originally:
>
>
> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>
> then the solver converges in 1 iteration as expected.  This wouldn't be a
> big deal except that we're trying to verify that our quadratic finite
> elements are working, and the behavior for p1 and p2 elements is different.
> The above observations are for p2, while all the cases work as expected for
> p1. Should we just not be using mpiaij at all on 1 processor?
>

No, you have a problem.


> 2) In the code snippet I took from the petsc example there are two
> preallocation calls (see below). Is that correct, or should I be checking
> the matrix type and calling only the preallocation function for the actual
> matrix type? i.e. something like
>
> MatCreate(...
> MatSetSizes(...
> MatSetFromOptions(...
> MatGetType(...
> if type==mpiaij: MatMPIAIJSetPreallocationCSR(...
>

It does not hurt to call them all.
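
For example, something along these lines (a sketch with placeholder
names; rowptr/colind/nzval stand for your local CSR arrays, and error
checking is omitted):

  Mat A;
  MatCreate(PETSC_COMM_WORLD,&A);
  MatSetSizes(A,n,n,N,N);
  MatSetFromOptions(A);
  /* whichever preallocation routine matches the runtime type takes
     effect; the other one is a no-op, so no type check is needed */
  MatSeqAIJSetPreallocationCSR(A,rowptr,colind,nzval);
  MatMPIAIJSetPreallocationCSR(A,rowptr,colind,nzval);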

  Matt


> Chris
>
> On Oct 21, 2009, at 11:50 AM, Mark F. Adams wrote:
>
>  Chris, just a note,
>>
>> Perhaps I missed something here but do something similar to you, eg
>> overlapping partitions, and PETSc is setup very well to be intrusive in your
>> code (I sometimes write a little 'addvalue' wrapper) and assemble your FE
>> matrices directly into a global matrix.  You use the (global)
>> MatSetValues(...) and set the matrix option MAT_IGNORE_OFF_PROC_ENTRIES (or
>> something like that) so that these off processor matrix entries, which are
>> computed redundantly, are thrown away and you get the correct matrix.
>>
>> As Barry said you don't need exact non-zero counts to allocate storage in
>> PETSC matrices, just don't underestimate.
>>
>> Mark
>>
>> On Oct 21, 2009, at 9:16 AM, Barry Smith wrote:
>>
>>
>>> On Oct 20, 2009, at 10:13 PM, Chris Kees wrote:
>>>
>>>  Thanks. Just to make sure I'm following, when I create the matrix I do:

   MatCreate(PETSC_COMM_WORLD,&self->m);
   MatSetSizes(self->m,n,n,N,N);
   MatSetType(self->m,MATMPIAIJ);
   MatSetFromOptions(self->m);

 MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));

 MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));

 and then I copy the matrix at each Newton iteration using the code in
 the first message below.  You said "...PreallocationCSR() does the copy 
 very
 efficiently", but I don't think I'm supposed to repeatedly call it, right?
  Looking at the output some more it does seem like building the
 preconditioner initially is taking much more time than the initial pass
 through MatSetValues.

>>>
>>>  If the matrix nonzero structure stays the same then yes you need call
>>> MatMPIAIJSetPreallocationCSR() only once and use the MatSetValues() calls
>>> each Newton step to update the nonzeros.
>>>
>>>  Barry
>>>
>>>
 Chris

 On Oct 20, 2009, at 8:32 PM, Barry Smith wrote:


> Here's the deal. In parallel PETSc does not use a single CSR matrix on
> each process to hold the MPIAIJ matrix. Hence if you store the matrix on a
> process as a single CSR matrix then it has to be copied into the PETSc
> datastructure. MatMPIAIJSetPreallocationCSR() does the copy very
> efficiently. But you are right, until you free YOUR rowpt[], colind[] and
> a[] there are two copies of the matrix.
> One could argue that this is ok, because anyways any preconditioner
> worth anything will take up at least that much memory later anyways.  For
> example, if you use the PETSc default preconditioner, block Jacobi with
> ILU(0) on each block it will take pretty much the same amount of memory.
>
> For people who use SOR or some other preconditioner that requires very
> little memory this is not satisfactory because they feel they could run
> larger problems without the copy.
>
> I strongly recommend you use MatMPIAIJSetPreallocationCSR() and live
> with any memory limitation. The reason is that doing more than that has
> orders of magnitude more pain with generally little payoff.
>

most efficient way to use local CSR matrix?

2009-10-29 Thread Chris Kees
Mark, Barry,

Thanks for the help.  For now I'm sticking with the approach of  
copying the csr matrix and using the csr data structures to do the  
preallocations. I'll eventually get around to writing the code for  
assembling directly into the petsc matrix.  I have two more questions.

1) On 1 processor, the linear solver is not converging in 1 iteration  
using -pc_type lu -pc_factor_mat_solver_package spooles. It converges  
in one iteration in parallel or on a single processor if I set - 
mat_type seqaij. Also, if I go back to the old way of constructing the  
matrix it DOES converge in 1 iteration on 1 processor. That is, if I  
replace the MatCreate/MatSetSizes/...  calls from the earlier email,  
with what I had originally:

MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);

then the solver converges in 1 iteration as expected.  This wouldn't  
be a big deal except that we're trying to verify that our quadratic  
finite elements are working, and the behavior for p1 and p2 elements  
is different. The above observations are for p2, while all the cases  
work as expected for p1. Should we just not be using mpiaij at all on  
1 processor?

2) In the code snippet I took from the petsc example there are two  
preallocation calls (see below). Is that correct, or should I be  
checking the matrix type and calling only the preallocation function  
for the actual matrix type? i.e. something like

MatCreate(...
MatSetSizes(...
MatSetFromOptions(...
MatGetType(...
if type==mpiaij: MatMPIAIJSetPreallocationCSR(...
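
(For reference, a concrete version of that check might look like the
fragment below; it is just a sketch, with rowptr/colind/nzval standing
in for our SMP(L)->A arrays and error checking omitted.)

MatType type;
MatGetType(self->m,&type);
if (!strcmp(type,MATMPIAIJ)) {   /* strcmp from <string.h>; PetscStrcmp works too */
  MatMPIAIJSetPreallocationCSR(self->m,rowptr,colind,nzval);
} else if (!strcmp(type,MATSEQAIJ)) {
  MatSeqAIJSetPreallocationCSR(self->m,rowptr,colind,nzval);
}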

Chris

On Oct 21, 2009, at 11:50 AM, Mark F. Adams wrote:

> Chris, just a note,
>
> Perhaps I missed something here but do something similar to you, eg  
> overlapping partitions, and PETSc is setup very well to be intrusive  
> in your code (I sometimes write a little 'addvalue' wrapper) and  
> assemble your FE matrices directly into a global matrix.  You use  
> the (global) MatSetValues(...) and set the matrix option  
> MAT_IGNORE_OFF_PROC_ENTRIES (or something like that) so that these  
> off processor matrix entries, which are computed redundantly, are  
> thrown away and you get the correct matrix.
>
> As Barry said you don't need exact non-zero counts to allocate  
> storage in PETSC matrices, just don't underestimate.
>
> Mark
>
> On Oct 21, 2009, at 9:16 AM, Barry Smith wrote:
>
>>
>> On Oct 20, 2009, at 10:13 PM, Chris Kees wrote:
>>
>>> Thanks. Just to make sure I'm following, when I create the matrix  
>>> I do:
>>>
>>>MatCreate(PETSC_COMM_WORLD,&self->m);
>>>MatSetSizes(self->m,n,n,N,N);
>>>MatSetType(self->m,MATMPIAIJ);
>>>MatSetFromOptions(self->m);
>>>MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)- 
>>> >A.colind,(double*)(SMP(L)->A.nzval));
>>>MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)- 
>>> >A.colind,(double*)(SMP(L)->A.nzval));
>>>
>>> and then I copy the matrix at each Newton iteration using the code  
>>> in the first message below.  You said "...PreallocationCSR() does  
>>> the copy very efficiently", but I don't think I'm supposed to  
>>> repeatedly call it, right?  Looking at the output some more it  
>>> does seem like building the preconditioner initially is taking  
>>> much more time than the initial pass through MatSetValues.
>>
>>  If the matrix nonzero structure stays the same then yes you need  
>> call MatMPIAIJSetPreallocationCSR() only once and use the  
>> MatSetValues() calls each Newton step to update the nonzeros.
>>
>>  Barry
>>
>>>
>>> Chris
>>>
>>> On Oct 20, 2009, at 8:32 PM, Barry Smith wrote:
>>>

 Here's the deal. In parallel PETSc does not use a single CSR  
 matrix on each process to hold the MPIAIJ matrix. Hence if you  
 store the matrix on a process as a single CSR matrix then it has  
 to be copied into the PETSc datastructure.  
 MatMPIAIJSetPreallocationCSR() does the copy very efficiently.  
 But you are right, until you free YOUR rowpt[], colind[] and a[]  
 there are two copies of the matrix.
 One could argue that this is ok, because anyways any  
 preconditioner worth anything will take up at least that much  
 memory later anyways.  For example, if you use the PETSc default  
 preconditioner, block Jacobi with ILU(0) on each block it will  
 take pretty much the same amount of memory.

 For people who use SOR or some other preconditioner that requires  
 very little memory this is not satisfactory because they feel  
 they could run larger problems without the copy.

 I strongly recommend you use MatMPIAIJSetPreallocationCSR() and  
 live with any memory limitation. The reason is that doing more  
 than that has orders of magnitude more pain with generally little  
 payoff.
 If you like pain then you can
 1) change your code to not build directly into CSR, instead call  
 MatSetValuesLocal() as you compute the entries. With this you  
>>

most efficient way to use local CSR matrix?

2009-10-21 Thread Mark F. Adams
Chris, just a note,

Perhaps I missed something here but do something similar to you, eg  
overlapping partitions, and PETSc is setup very well to be intrusive  
in your code (I sometimes write a little 'addvalue' wrapper) and  
assemble your FE matrices directly into a global matrix.  You use the  
(global) MatSetValues(...) and set the matrix option  
MAT_IGNORE_OFF_PROC_ENTRIES (or something like that) so that these off  
processor matrix entries, which are computed redundantly, are thrown  
away and you get the correct matrix.

As Barry said you don't need exact non-zero counts to allocate storage  
in PETSC matrices, just don't underestimate.
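
A rough sketch of that pattern (placeholder names, error checking
omitted; max_nnz_row is just some safe per-row upper bound):

Mat A;
MatCreate(PETSC_COMM_WORLD,&A);
MatSetSizes(A,n,n,N,N);
MatSetType(A,MATMPIAIJ);
MatMPIAIJSetPreallocation(A,max_nnz_row,PETSC_NULL,max_nnz_row,PETSC_NULL); /* overestimate is fine */
MatSetOption(A,MAT_IGNORE_OFF_PROC_ENTRIES,PETSC_TRUE);
for (e=0; e<num_elements; e++) {
  /* nen global row/column indices per element and the dense element matrix Ke */
  MatSetValues(A,nen,elem_rows,nen,elem_cols,Ke,ADD_VALUES);
}
MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);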

Mark

On Oct 21, 2009, at 9:16 AM, Barry Smith wrote:

>
> On Oct 20, 2009, at 10:13 PM, Chris Kees wrote:
>
>> Thanks. Just to make sure I'm following, when I create the matrix I  
>> do:
>>
>> MatCreate(PETSC_COMM_WORLD,&self->m);
>> MatSetSizes(self->m,n,n,N,N);
>> MatSetType(self->m,MATMPIAIJ);
>> MatSetFromOptions(self->m);
>> MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)- 
>> >A.colind,(double*)(SMP(L)->A.nzval));
>> MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)- 
>> >A.colind,(double*)(SMP(L)->A.nzval));
>>
>> and then I copy the matrix at each Newton iteration using the code  
>> in the first message below.  You said "...PreallocationCSR() does  
>> the copy very efficiently", but I don't think I'm supposed to  
>> repeatedly call it, right?  Looking at the output some more it does  
>> seem like building the preconditioner initially is taking much more  
>> time than the initial pass through MatSetValues.
>
>   If the matrix nonzero structure stays the same then yes you need  
> call MatMPIAIJSetPreallocationCSR() only once and use the  
> MatSetValues() calls each Newton step to update the nonzeros.
>
>   Barry
>
>>
>> Chris
>>
>> On Oct 20, 2009, at 8:32 PM, Barry Smith wrote:
>>
>>>
>>> Here's the deal. In parallel PETSc does not use a single CSR  
>>> matrix on each process to hold the MPIAIJ matrix. Hence if you  
>>> store the matrix on a process as a single CSR matrix then it has  
>>> to be copied into the PETSc datastructure.  
>>> MatMPIAIJSetPreallocationCSR() does the copy very efficiently. But  
>>> you are right, until you free YOUR rowpt[], colind[] and a[] there  
>>> are two copies of the matrix.
>>> One could argue that this is ok, because anyways any  
>>> preconditioner worth anything will take up at least that much  
>>> memory later anyways.  For example, if you use the PETSc default  
>>> preconditioner, block Jacobi with ILU(0) on each block it will  
>>> take pretty much the same amount of memory.
>>>
>>> For people who use SOR or some other preconditioner that requires  
>>> very little memory this is not satisfactory because they feel they  
>>> could run larger problems without the copy.
>>>
>>> I strongly recommend you use MatMPIAIJSetPreallocationCSR() and  
>>> live with any memory limitation. The reason is that doing more  
>>> than that has orders of magnitude more pain with generally little  
>>> payoff.
>>> If you like pain then you can
>>> 1) change your code to not build directly into CSR, instead call  
>>> MatSetValuesLocal() as you compute the entries. With this you need  
>>> to figure out the preallocation which can be painful code.
>>> 2) write an entire matrix class that stores the matrix like you  
>>> store it, not like you store it. This is a huge project, since you  
>>> have to write your own matrix-vector, ILU factorization etc.
>>>
>>> Barry
>>>
>>>
>>>
>>> On Oct 20, 2009, at 8:04 PM, Chris Kees wrote:
>>>
 Hi,

 Our unstructured finite element code does a fairly standard  
 overlapping decomposition that allows it to calculate all the non- 
 zero column entries for the rows owned by the processor (rows  
 0...ldim-1 in the local numbering).  We assemble them into a  
 local CSR matrix and then copy them into a PETSc matrix like this:

 for (int i=0;i<ldim;i++)
 {
 irow[0] = i;
 MatSetValuesLocal(PETSCMAT(par_L),1,irow,rowptr[i+1]- 
 rowptr[i],&colind[rowptr[i]],&a[rowptr[i]],INSERT_VALUES);
 }

 where rowptr and colind are the CSR indexing arrays in the local  
 numbering.

 I would like to eliminate the duplicate memory (and the copy  
 above). Is there a recommended way to let PETSc reference array  
 'a' directly or a way to get rid of 'a' and translate our CSR  
 assembly routines to use low level PETSc data structures?

 So far I've added MatMPIAIJSetPreallocationCSR to try to speed up  
 the first solve, but the documentation is clear that 'a' is not a  
 pointer to the AIJ storage so I still need the copy. I tried  
 using MatCreateMPIAIJWithSplitArrays with the 'o' arrays set to  
 zero but I got indexing errors, and the documentation really  
 seems to imply that the local matrix is really not a standard CSR  
 anyway. If that's true

most efficient way to use local CSR matrix?

2009-10-21 Thread Barry Smith

On Oct 20, 2009, at 10:13 PM, Chris Kees wrote:

> Thanks. Just to make sure I'm following, when I create the matrix I  
> do:
>
>  MatCreate(PETSC_COMM_WORLD,&self->m);
>  MatSetSizes(self->m,n,n,N,N);
>  MatSetType(self->m,MATMPIAIJ);
>  MatSetFromOptions(self->m);
>  MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)- 
> >A.colind,(double*)(SMP(L)->A.nzval));
>  MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)- 
> >A.colind,(double*)(SMP(L)->A.nzval));
>
> and then I copy the matrix at each Newton iteration using the code  
> in the first message below.  You said "...PreallocationCSR() does  
> the copy very efficiently", but I don't think I'm supposed to  
> repeatedly call it, right?  Looking at the output some more it does  
> seem like building the preconditioner initially is taking much more  
> time than the initial pass through MatSetValues.

If the matrix nonzero structure stays the same then yes you need  
call MatMPIAIJSetPreallocationCSR() only once and use the  
MatSetValues() calls each Newton step to update the nonzeros.
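
  In other words, roughly this pattern (a sketch with placeholder
names and no error checking; it assumes, as in your code, that a
local-to-global mapping has already been attached to the matrix so
MatSetValuesLocal() works):

  /* setup, done once */
  MatMPIAIJSetPreallocationCSR(self->m,rowptr,colind,nzval);

  /* each Newton step: refresh the values only */
  for (i=0; i<ldim; i++) {
    irow[0] = i;
    MatSetValuesLocal(self->m,1,irow,rowptr[i+1]-rowptr[i],
                      &colind[rowptr[i]],&nzval[rowptr[i]],INSERT_VALUES);
  }
  MatAssemblyBegin(self->m,MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(self->m,MAT_FINAL_ASSEMBLY);
  /* ... solve with the updated matrix ... */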

Barry

>
> Chris
>
> On Oct 20, 2009, at 8:32 PM, Barry Smith wrote:
>
>>
>>  Here's the deal. In parallel PETSc does not use a single CSR  
>> matrix on each process to hold the MPIAIJ matrix. Hence if you  
>> store the matrix on a process as a single CSR matrix then it has to  
>> be copied into the PETSc datastructure.  
>> MatMPIAIJSetPreallocationCSR() does the copy very efficiently. But  
>> you are right, until you free YOUR rowpt[], colind[] and a[] there  
>> are two copies of the matrix.
>> One could argue that this is ok, because anyways any preconditioner  
>> worth anything will take up at least that much memory later  
>> anyways.  For example, if you use the PETSc default preconditioner,  
>> block Jacobi with ILU(0) on each block it will take pretty much the  
>> same amount of memory.
>>
>>  For people who use SOR or some other preconditioner that requires  
>> very little memory this is not satisfactory because they feel they  
>> could run larger problems without the copy.
>>
>>  I strongly recommend you use MatMPIAIJSetPreallocationCSR() and  
>> live with any memory limitation. The reason is that doing more than  
>> that has orders of magnitude more pain with generally little payoff.
>> If you like pain then you can
>> 1) change your code to not build directly into CSR, instead call  
>> MatSetValuesLocal() as you compute the entries. With this you need  
>> to figure out the preallocation which can be painful code.
>> 2) write an entire matrix class that stores the matrix like you  
>> store it, not like you store it. This is a huge project, since you  
>> have to write your own matrix-vector, ILU factorization etc.
>>
>>  Barry
>>
>>
>>
>> On Oct 20, 2009, at 8:04 PM, Chris Kees wrote:
>>
>>> Hi,
>>>
>>> Our unstructured finite element code does a fairly standard  
>>> overlapping decomposition that allows it to calculate all the non- 
>>> zero column entries for the rows owned by the processor (rows  
>>> 0...ldim-1 in the local numbering).  We assemble them into a local  
>>> CSR matrix and then copy them into a PETSc matrix like this:
>>>
>>> for (int i=0;i<ldim;i++)
>>> {
>>> irow[0] = i;
>>> MatSetValuesLocal(PETSCMAT(par_L),1,irow,rowptr[i+1]- 
>>> rowptr[i],&colind[rowptr[i]],&a[rowptr[i]],INSERT_VALUES);
>>> }
>>>
>>> where rowptr and colind are the CSR indexing arrays in the local  
>>> numbering.
>>>
>>> I would like to eliminate the duplicate memory (and the copy  
>>> above). Is there a recommended way to let PETSc reference array  
>>> 'a' directly or a way to get rid of 'a' and translate our CSR  
>>> assembly routines to use low level PETSc data structures?
>>>
>>> So far I've added MatMPIAIJSetPreallocationCSR to try to speed up  
>>> the first solve, but the documentation is clear that 'a' is not a  
>>> pointer to the AIJ storage so I still need the copy. I tried using  
>>> MatCreateMPIAIJWithSplitArrays with the 'o' arrays set to zero but  
>>> I got indexing errors, and the documentation really seems to imply  
>>> that the local matrix is really not a standard CSR anyway. If  
>>> that's true then it seems like the options I have are to write a  
>>> shell matrix for parallel CSR storage or write some new assembly  
>>> routines that use MatSetValuesLocal. I'm more concerned about the  
>>> memory than the overhead of MatSetValuesLocal, but it would  
>>> certainly be easier on me to use the CSR-based assembly routines  
>>> we already have.
>>>
>>> Thanks,
>>> Chris
>>>
>>>
>>>
>>
>




most efficient way to use local CSR matrix?

2009-10-20 Thread Chris Kees
Thanks. Just to make sure I'm following, when I create the matrix I do:

   MatCreate(PETSC_COMM_WORLD,&self->m);
   MatSetSizes(self->m,n,n,N,N);
   MatSetType(self->m,MATMPIAIJ);
   MatSetFromOptions(self->m);
   MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
   MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));

and then I copy the matrix at each Newton iteration using the code in  
the first message below.  You said "...PreallocationCSR() does the  
copy very efficiently", but I don't think I'm supposed to repeatedly  
call it, right?  Looking at the output some more it does seem like  
building the preconditioner initially is taking much more time than  
the initial pass through MatSetValues.

Chris

On Oct 20, 2009, at 8:32 PM, Barry Smith wrote:

>
>   Here's the deal. In parallel PETSc does not use a single CSR  
> matrix on each process to hold the MPIAIJ matrix. Hence if you store  
> the matrix on a process as a single CSR matrix then it has to be  
> copied into the PETSc datastructure. MatMPIAIJSetPreallocationCSR()  
> does the copy very efficiently. But you are right, until you free  
> YOUR rowpt[], colind[] and a[] there are two copies of the matrix.
> One could argue that this is ok, because anyways any preconditioner  
> worth anything will take up at least that much memory later  
> anyways.  For example, if you use the PETSc default preconditioner,  
> block Jacobi with ILU(0) on each block it will take pretty much the  
> same amount of memory.
>
>   For people who use SOR or some other preconditioner that requires  
> very little memory this is not satisfactory because they feel they  
> could run larger problems without the copy.
>
>   I strongly recommend you use MatMPIAIJSetPreallocationCSR() and  
> live with any memory limitation. The reason is that doing more than  
> that has orders of magnitude more pain with generally little payoff.
> If you like pain then you can
> 1) change your code to not build directly into CSR, instead call  
> MatSetValuesLocal() as you compute the entries. With this you need  
> to figure out the preallocation which can be painful code.
> 2) write an entire matrix class that stores the matrix like you  
> store it, not like you store it. This is a huge project, since you  
> have to write your own matrix-vector, ILU factorization etc.
>
>   Barry
>
>
>
> On Oct 20, 2009, at 8:04 PM, Chris Kees wrote:
>
>> Hi,
>>
>> Our unstructured finite element code does a fairly standard  
>> overlapping decomposition that allows it to calculate all the non- 
>> zero column entries for the rows owned by the processor (rows  
>> 0...ldim-1 in the local numbering).  We assemble them into a local  
>> CSR matrix and then copy them into a PETSc matrix like this:
>>
>> for (int i=0;i<ldim;i++)
>> {
>>  irow[0] = i;
>>  MatSetValuesLocal(PETSCMAT(par_L),1,irow,rowptr[i+1]- 
>> rowptr[i],&colind[rowptr[i]],&a[rowptr[i]],INSERT_VALUES);
>> }
>>
>> where rowptr and colind are the CSR indexing arrays in the local  
>> numbering.
>>
>> I would like to eliminate the duplicate memory (and the copy  
>> above). Is there a recommended way to let PETSc reference array 'a'  
>> directly or a way to get rid of 'a' and translate our CSR assembly  
>> routines to use low level PETSc data structures?
>>
>> So far I've added MatMPIAIJSetPreallocationCSR to try to speed up  
>> the first solve, but the documentation is clear that 'a' is not a  
>> pointer to the AIJ storage so I still need the copy. I tried using  
>> MatCreateMPIAIJWithSplitArrays with the 'o' arrays set to zero but  
>> I got indexing errors, and the documentation really seems to imply  
>> that the local matrix is really not a standard CSR anyway. If  
>> that's true then it seems like the options I have are to write a  
>> shell matrix for parallel CSR storage or write some new assembly  
>> routines that use MatSetValuesLocal. I'm more concerned about the  
>> memory than the overhead of MatSetValuesLocal, but it would  
>> certainly be easier on me to use the CSR-based assembly routines we  
>> already have.
>>
>> Thanks,
>> Chris
>>
>>
>>
>




most efficient way to use local CSR matrix?

2009-10-20 Thread Barry Smith

Here's the deal. In parallel PETSc does not use a single CSR  
matrix on each process to hold the MPIAIJ matrix. Hence if you store  
the matrix on a process as a single CSR matrix then it has to be  
copied into the PETSc datastructure. MatMPIAIJSetPreallocationCSR()  
does the copy very efficiently. But you are right, until you free YOUR  
rowpt[], colind[] and a[] there are two copies of the matrix.
One could argue that this is ok, because anyways any preconditioner  
worth anything will take up at least that much memory later anyways.   
For example, if you use the PETSc default preconditioner, block Jacobi  
with ILU(0) on each block it will take pretty much the same amount of  
memory.

For people who use SOR or some other preconditioner that requires  
very little memory this is not satisfactory because they feel they  
could run larger problems without the copy.

I strongly recommend you use MatMPIAIJSetPreallocationCSR() and  
live with any memory limitation. The reason is that doing more than  
that has orders of magnitude more pain with generally little payoff.
If you like pain then you can
1) change your code to not build directly into CSR, instead call  
MatSetValuesLocal() as you compute the entries. With this you need to  
figure out the preallocation which can be painful code.
2) write an entire matrix class that stores the matrix the way you store
it, not the way PETSc stores it. This is a huge project, since you have to
write your own matrix-vector, ILU factorization etc.
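
For option 1 the shape of the code would be roughly the sketch below
(placeholder names, no error checking; d_nnz[]/o_nnz[] are the per-row
upper bounds you would have to compute, and MatSetValuesLocal() also
needs a local-to-global mapping attached to the matrix first):

Mat A;
MatCreate(PETSC_COMM_WORLD,&A);
MatSetSizes(A,n,n,N,N);
MatSetType(A,MATMPIAIJ);
MatMPIAIJSetPreallocation(A,0,d_nnz,0,o_nnz);  /* the painful part: counting nonzeros */
for (e=0; e<num_elements; e++) {
  /* local (ghosted) element indices and the dense element matrix */
  MatSetValuesLocal(A,nen,elem_rows_local,nen,elem_cols_local,Ke,ADD_VALUES);
}
MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);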

Barry



On Oct 20, 2009, at 8:04 PM, Chris Kees wrote:

> Hi,
>
> Our unstructured finite element code does a fairly standard  
> overlapping decomposition that allows it to calculate all the non- 
> zero column entries for the rows owned by the processor (rows  
> 0...ldim-1 in the local numbering).  We assemble them into a local  
> CSR matrix and then copy them into a PETSc matrix like this:
>
> for (int i=0;i<ldim;i++)
> {
>   irow[0] = i;
>   MatSetValuesLocal(PETSCMAT(par_L),1,irow,rowptr[i+1]- 
> rowptr[i],&colind[rowptr[i]],&a[rowptr[i]],INSERT_VALUES);
> }
>
> where rowptr and colind are the CSR indexing arrays in the local  
> numbering.
>
> I would like to eliminate the duplicate memory (and the copy above).  
> Is there a recommended way to let PETSc reference array 'a' directly  
> or a way to get rid of 'a' and translate our CSR assembly routines  
> to use low level PETSc data structures?
>
> So far I've added MatMPIAIJSetPreallocationCSR to try to speed up  
> the first solve, but the documentation is clear that 'a' is not a  
> pointer to the AIJ storage so I still need the copy. I tried using  
> MatCreateMPIAIJWithSplitArrays with the 'o' arrays set to zero but I  
> got indexing errors, and the documentation really seems to imply  
> that the local matrix is really not a standard CSR anyway. If that's  
> true then it seems like the options I have are to write a shell  
> matrix for parallel CSR storage or write some new assembly routines  
> that use MatSetValuesLocal. I'm more concerned about the memory than  
> the overhead of MatSetValuesLocal, but it would certainly be easier  
> on me to use the CSR-based assembly routines we already have.
>
> Thanks,
> Chris
>
>
>




most efficient way to use local CSR matrix?

2009-10-20 Thread Chris Kees
Hi,

Our unstructured finite element code does a fairly standard  
overlapping decomposition that allows it to calculate all the non-zero  
column entries for the rows owned by the processor (rows 0...ldim-1 in  
the local numbering).  We assemble them into a local CSR matrix and  
then copy them into a PETSc matrix like this:

for (int i=0;i<ldim;i++)
{
  irow[0] = i;
  MatSetValuesLocal(PETSCMAT(par_L),1,irow,rowptr[i+1]-rowptr[i],&colind[rowptr[i]],&a[rowptr[i]],INSERT_VALUES);
}

where rowptr and colind are the CSR indexing arrays in the local numbering.