Fwd: People who need to solve large systems?

2009-10-21 Thread Matthew Knepley
This stuff is very cool, and based on very interesting mathematics. Anyone
have matrices
of this type?

  Thanks,

Matt

-- Forwarded message --
From: Gary Miller glmil...@cs.cmu.edu
Date: Tue, Oct 20, 2009 at 2:46 PM
Subject: People who need to solve large systems?
To: Matthew Knepley knepley at gmail.com
Cc: Yiannis Koutis jkoutis at cs.cmu.edu, Dave Tolliver tolliver at 
cs.cmu.edu


Hi Matt,

As you may know we have been developing a fast solver of symmetric
diagonally dominate systems.
We recent got some money for Microsoft to release a public version of the
solver and are looking for
users.  The soft will be free, for now, and we can help people come up to
speed on using it.

We are looking for people who are solving problems  with at lease 400,000
variables in 2D or  100,000 in 3D.
For smaller systems  other soft seems to work OK.

Given that you worked on Petsc you may have a better idea who we could help.


Thanks

Gary Miller
glmiller at cs.cmu.edu


-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener
-- next part --
An HTML attachment was scrubbed...
URL: 
http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20091021/770a2325/attachment.html


most efficient way to use local CSR matrix?

2009-10-21 Thread Barry Smith

On Oct 20, 2009, at 10:13 PM, Chris Kees wrote:

 Thanks. Just to make sure I'm following, when I create the matrix I  
 do:

  MatCreate(PETSC_COMM_WORLD,self-m);
  MatSetSizes(self-m,n,n,N,N);
  MatSetType(self-m,MATMPIAIJ);
  MatSetFromOptions(self-m);
  MatSeqAIJSetPreallocationCSR(self-m,SMP(L)-A.rowptr,SMP(L)- 
 A.colind,(double*)(SMP(L)-A.nzval));
  MatMPIAIJSetPreallocationCSR(self-m,SMP(L)-A.rowptr,SMP(L)- 
 A.colind,(double*)(SMP(L)-A.nzval));

 and then I copy the matrix at each Newton iteration using the code  
 in the first message below.  You said ...PreallocationCSR() does  
 the copy very efficiently, but I don't think I'm supposed to  
 repeatedly call it, right?  Looking at the output some more it does  
 seem like building the preconditioner initially is taking much more  
 time than the initial pass through MatSetValues.

If the matrix nonzero structure stays the same then yes you need  
call MatMPIAIJSetPreallocationCSR() only once and use the  
MatSetValues() calls each Newton step to update the nonzeros.

Barry


 Chris

 On Oct 20, 2009, at 8:32 PM, Barry Smith wrote:


  Here's the deal. In parallel PETSc does not use a single CSR  
 matrix on each process to hold the MPIAIJ matrix. Hence if you  
 store the matrix on a process as a single CSR matrix then it has to  
 be copied into the PETSc datastructure.  
 MatMPIAIJSetPreallocationCSR() does the copy very efficiently. But  
 you are right, until you free YOUR rowpt[], colind[] and a[] there  
 are two copies of the matrix.
 One could argue that this is ok, because anyways any preconditioner  
 worth anything will take up at least that much memory later  
 anyways.  For example, if you use the PETSc default preconditioner,  
 block Jacobi with ILU(0) on each block it will take pretty much the  
 same amount of memory.

  For people who use SOR or some other preconditioner that requires  
 very little memory this is not satisfactory because they feel they  
 could run larger problems without the copy.

  I strongly recommend you use MatMPIAIJSetPreallocationCSR() and  
 live with any memory limitation. The reason is that doing more than  
 that has orders of magnitude more pain with generally little payoff.
 If you like pain then you can
 1) change your code to not build directly into CSR, instead call  
 MatSetValuesLocal() as you compute the entries. With this you need  
 to figure out the preallocation which can be painful code.
 2) write an entire matrix class that stores the matrix like you  
 store it, not like you store it. This is a huge project, since you  
 have to write your own matrix-vector, ILU factorization etc.

  Barry



 On Oct 20, 2009, at 8:04 PM, Chris Kees wrote:

 Hi,

 Our unstructured finite element code does a fairly standard  
 overlapping decomposition that allows it to calculate all the non- 
 zero column entries for the rows owned by the processor (rows  
 0...ldim-1 in the local numbering).  We assemble them into a local  
 CSR matrix and then copy them into a PETSc matrix like this:

 for (int i=0;ildim;i++)
 {
 irow[0] = i;
 MatSetValuesLocal(PETSCMAT(par_L),1,irow,rowptr[i+1]- 
 rowptr[i],colind[rowptr[i]],a[rowptr[i]],INSERT_VALUES);
 }

 where rowptr and colind are the CSR indexing arrays in the local  
 numbering.

 I would like to eliminate the duplicate memory (and the copy  
 above). Is there a recommended way to let PETSc reference array  
 'a' directly or a way to get rid of 'a' and translate our CSR  
 assembly routines to use low level PETSc data structures?

 So far I've added MatMPIAIJSetPreallocationCSR to try to speed up  
 the first solve, but the documentation is clear that 'a' is not a  
 pointer to the AIJ storage so I still need the copy. I tried using  
 MatCreateMPIAIJWithSplitArrays with the 'o' arrays set to zero but  
 I got indexing errors, and the documentation really seems to imply  
 that the local matrix is really not a standard CSR anyway. If  
 that's true then it seems like the options I have are to write a  
 shell matrix for parallel CSR storage or write some new assembly  
 routines that use MatSetValuesLocal. I'm more concerned about the  
 memory than the overhead of MatSetValuesLocal, but it would  
 certainly be easier on me to use the CSR-based assembly routines  
 we already have.

 Thanks,
 Chris









most efficient way to use local CSR matrix?

2009-10-21 Thread Mark F. Adams
Chris, just a note,

Perhaps I missed something here but do something similar to you, eg  
overlapping partitions, and PETSc is setup very well to be intrusive  
in your code (I sometimes write a little 'addvalue' wrapper) and  
assemble your FE matrices directly into a global matrix.  You use the  
(global) MatSetValues(...) and set the matrix option  
MAT_IGNORE_OFF_PROC_ENTRIES (or something like that) so that these off  
processor matrix entries, which are computed redundantly, are thrown  
away and you get the correct matrix.

As Barry said you don't need exact non-zero counts to allocate storage  
in PETSC matrices, just don't underestimate.

Mark

On Oct 21, 2009, at 9:16 AM, Barry Smith wrote:


 On Oct 20, 2009, at 10:13 PM, Chris Kees wrote:

 Thanks. Just to make sure I'm following, when I create the matrix I  
 do:

 MatCreate(PETSC_COMM_WORLD,self-m);
 MatSetSizes(self-m,n,n,N,N);
 MatSetType(self-m,MATMPIAIJ);
 MatSetFromOptions(self-m);
 MatSeqAIJSetPreallocationCSR(self-m,SMP(L)-A.rowptr,SMP(L)- 
 A.colind,(double*)(SMP(L)-A.nzval));
 MatMPIAIJSetPreallocationCSR(self-m,SMP(L)-A.rowptr,SMP(L)- 
 A.colind,(double*)(SMP(L)-A.nzval));

 and then I copy the matrix at each Newton iteration using the code  
 in the first message below.  You said ...PreallocationCSR() does  
 the copy very efficiently, but I don't think I'm supposed to  
 repeatedly call it, right?  Looking at the output some more it does  
 seem like building the preconditioner initially is taking much more  
 time than the initial pass through MatSetValues.

   If the matrix nonzero structure stays the same then yes you need  
 call MatMPIAIJSetPreallocationCSR() only once and use the  
 MatSetValues() calls each Newton step to update the nonzeros.

   Barry


 Chris

 On Oct 20, 2009, at 8:32 PM, Barry Smith wrote:


 Here's the deal. In parallel PETSc does not use a single CSR  
 matrix on each process to hold the MPIAIJ matrix. Hence if you  
 store the matrix on a process as a single CSR matrix then it has  
 to be copied into the PETSc datastructure.  
 MatMPIAIJSetPreallocationCSR() does the copy very efficiently. But  
 you are right, until you free YOUR rowpt[], colind[] and a[] there  
 are two copies of the matrix.
 One could argue that this is ok, because anyways any  
 preconditioner worth anything will take up at least that much  
 memory later anyways.  For example, if you use the PETSc default  
 preconditioner, block Jacobi with ILU(0) on each block it will  
 take pretty much the same amount of memory.

 For people who use SOR or some other preconditioner that requires  
 very little memory this is not satisfactory because they feel they  
 could run larger problems without the copy.

 I strongly recommend you use MatMPIAIJSetPreallocationCSR() and  
 live with any memory limitation. The reason is that doing more  
 than that has orders of magnitude more pain with generally little  
 payoff.
 If you like pain then you can
 1) change your code to not build directly into CSR, instead call  
 MatSetValuesLocal() as you compute the entries. With this you need  
 to figure out the preallocation which can be painful code.
 2) write an entire matrix class that stores the matrix like you  
 store it, not like you store it. This is a huge project, since you  
 have to write your own matrix-vector, ILU factorization etc.

 Barry



 On Oct 20, 2009, at 8:04 PM, Chris Kees wrote:

 Hi,

 Our unstructured finite element code does a fairly standard  
 overlapping decomposition that allows it to calculate all the non- 
 zero column entries for the rows owned by the processor (rows  
 0...ldim-1 in the local numbering).  We assemble them into a  
 local CSR matrix and then copy them into a PETSc matrix like this:

 for (int i=0;ildim;i++)
 {
 irow[0] = i;
 MatSetValuesLocal(PETSCMAT(par_L),1,irow,rowptr[i+1]- 
 rowptr[i],colind[rowptr[i]],a[rowptr[i]],INSERT_VALUES);
 }

 where rowptr and colind are the CSR indexing arrays in the local  
 numbering.

 I would like to eliminate the duplicate memory (and the copy  
 above). Is there a recommended way to let PETSc reference array  
 'a' directly or a way to get rid of 'a' and translate our CSR  
 assembly routines to use low level PETSc data structures?

 So far I've added MatMPIAIJSetPreallocationCSR to try to speed up  
 the first solve, but the documentation is clear that 'a' is not a  
 pointer to the AIJ storage so I still need the copy. I tried  
 using MatCreateMPIAIJWithSplitArrays with the 'o' arrays set to  
 zero but I got indexing errors, and the documentation really  
 seems to imply that the local matrix is really not a standard CSR  
 anyway. If that's true then it seems like the options I have are  
 to write a shell matrix for parallel CSR storage or write some  
 new assembly routines that use MatSetValuesLocal. I'm more  
 concerned about the memory than the overhead of  
 MatSetValuesLocal, but it would certainly be easier on me to use