MatMult_Scatter woes

2009-04-08 Thread Barry Smith

On Apr 8, 2009, at 3:11 AM, Jed Brown wrote:

 On Tue 2009-04-07 21:02, Barry Smith wrote:

   Jed,

Please remind me of what the simplest fix is? Can we just zero the
 vector before the scatter for the MatMult and MatMultTranspose?
 What about for MatMultAdd and MatMultTransposeAdd, does anything  
 need to
 be done?

 The simplest fix is to zero the vector before and always use  
 ADD_VALUES.

 Why ADD_VALUES? I don't see that as being any more correct then  
INSERT_VALUES
in the case when two different x locations are assigned to a single y  
location.

Perhaps the matrix class based on the scatter should have an  
option allowing the
user to determine add or insert? MatScatterSetType()?

Barry




 I pushed this a week ago and cited this thread in the commit message,
 see 6d3d60abcf7d.  It's just worth remembering that it can be  
 optimized
 if someone ever finds that this is a bottleneck.

 Jed




MatMult_Scatter woes

2009-03-31 Thread Jed Brown
On Mon 2009-03-30 21:28, Barry Smith wrote:

 On Mar 30, 2009, at 7:48 PM, Matthew Knepley wrote:

 On Mon, Mar 30, 2009 at 6:13 PM, Barry Smith bsmith at mcs.anl.gov  
 wrote:

   I don't think square has anything to do with it. It depends on  
 whether all the entries in y[i] are hit or if some are skipped, this  
 depends exactly on the scatter.

   The first time the mult is called, duplicate x and y temporarility, 
 fill a xdup with all 1s. Do the scatter, locate all entries of ydup 
 that are zero, keep a list of these.
 destroy xdup, ydup.

   Now do the MatMult() by zeroing all the entires in the list and then 
 do the scatter, now you have an efficient and correct MatMult. The 
 drawback of course is the setup time, but in fact that cost is only one
 scatter (MatMult()) so is likely worth it.

  One could do a similar setup for the transpose?

 I must misunderstand you. The problem above is that INSERT_VALUES is  
 used, but when two input locations
 map to the same output location, they should be added.

  No, when two map to the same output the scatter is not well defined and 
 hence the multiply is not defined.
 When building the scatter we could check for multiple inputs going to  
 the same output but we don't, it will end up
 just putting one value in and then the other and depending on the order 
 of the message passing it could be different.

There are two issues, the output index set may be not be injective
(multiple inputs map to the same output) or the output index set may not
be surjective (not every entry of the output vector is hit).

When the output index set is not injective, the scatter is ill-defined
with INSERT_VALUES, but is well-defined with ADD_VALUES.  It is not okay
to disallow such scatters, the transpose of the global-to-local scatter
(_p_DA::gtol) is necessarily this way.

The (forward) scatter is well-defined with INSERT_VALUES if (and only
if) ymax = 1 after

  VecSet(x,1);
  VecSet(y,0);
  VecScatterBegin(scat,x,y,ADD_VALUES,SCATTER_FORWARD);
  VecScatterEnd(scat,x,y,ADD_VALUES,SCATTER_FORWARD);
  VecMin(y,ymin);
  VecMax(y,ymax);

If this is not satisfied, then there is no choice but to use the
implementation

  VecSet(y,0);
  VecScatterBegin(scat,x,y,ADD_VALUES,SCATTER_FORWARD);
  VecScatterEnd(scat,x,y,ADD_VALUES,SCATTER_FORWARD);

If ymin == 0, the index set is not surjective so the scatter is not a
matrix unless these entries (the left null space) are set to 0.  In
this case, we could store the indices of the zero entries in y and do
the scatter as

  VecScatterBegin(scat,x,y,INSERT_VALUES,SCATTER_FORWARD);
  VecGetArray(y,yy);
  for (i=0; iscat-to_nnullidx; i++) yy[scat-to_nullidx[i]] = 0;
  VecRestoreArray(y,yy);
  VecScatterEnd(scat,x,y,INSERT_VALUES,SCATTER_FORWARD);

but this will only win if the number of such entries is small compared
to the size of y.  If the number of zero entries is large, then I don't
think we can beat

  VecSet(y,0);
  VecScatterBegin(scat,x,y,INSERT_VALUES,SCATTER_FORWARD);
  VecScatterEnd(scat,x,y,INSERT_VALUES,SCATTER_FORWARD);


We can do the same thing for the transpose.

So the question is, how much should we optimize this operation?  Is it
necessary to provide something like

  typedef enum {VECSCATTER_IS_INJECTIVE,VECSCATTER_IS_SURJECTIVE} 
VecScatterOption;

  PetscErrorCode 
VecScatterSetOption(VecScatter,Vec,Vec,ScatterMode,VecScatterOption,PetscTruth);

which would (analogous to ISSetPermutation()) test whether you are
telling the truth when in debug mode and believe you in optimized mode?
This seems excessive since it's fairly unlikely that the cost of the
VecSet(y,0), ADD_VALUES will be significant to applications.  Who is
using MATSCATTER anyway (the type name wasn't set prior to my commit
6d3d60abcf7d so it hasn't worked in a while)?

Jed
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
URL: 
http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20090331/82fe64a5/attachment.pgp


MatMult_Scatter woes

2009-03-30 Thread Jed Brown
The current implementation of MatMult_Scatter and
MatMultTranspose_Scatter do not actually have matrix semantics unless
the index sets are permutations.  For example, it's generally desirable
that

  MatMult(A,x,y);

agrees with

  VecZeroEntries(y);
  MatMultAdd(A,x,y,y);

Similarly for MatMultTranspose.  In addition, MatMultTranspose should
actually be the transpose of MatMult.  Consider the scatter defined by

scatter : x - y
isx = [0 0]
isy = [0 1]

where x has length 1 and y has length 2.  This forward scatter is
equivalent to a 2x1 matrix A = [1;1]

Indeed A*[1] = [1;1]

but A'*[1;1] = [2]

where as MatMultTranspose_Scatter(A,y=[1;1],x) gives x=[1].

This can be corrected by changing the body of MatMultTranspose_Scatter
from

  ierr = 
VecScatterBegin(scatter-scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
  ierr = 
VecScatterEnd(scatter-scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);

to

  ierr = VecZeroEntries(y);CHKERRQ(ierr);
  ierr = 
VecScatterBegin(scatter-scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
  ierr = 
VecScatterEnd(scatter-scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);

and similarly for MatMult_Scatter.

Unfortunately, these modifications roughly double the cost of typical
sequential scatters and could be much worse (scattering from a small
vector to a very large one).  I think that correctness is more important
here and users would typically not use MatScatter when this would have
significant impact or when INSERT_VALUES semantics are desired.  Of
course some performance can be recovered by having MatScatter use
INSERT_VALUES when the destination index set is a permutation.


Also, what is the ssh url for the repo?  I've tried some variations on
ssh://petsc at petsc.cs.iit.edu/petsc/petsc-dev without success.

Jed
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: not available
URL: 
http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20090330/e5fffb2a/attachment.pgp


MatMult_Scatter woes

2009-03-30 Thread Matthew Knepley
On Mon, Mar 30, 2009 at 5:52 AM, Jed Brown jed at 59a2.org wrote:

 The current implementation of MatMult_Scatter and
 MatMultTranspose_Scatter do not actually have matrix semantics unless
 the index sets are permutations.  For example, it's generally desirable
 that

  MatMult(A,x,y);

 agrees with

  VecZeroEntries(y);
  MatMultAdd(A,x,y,y);

 Similarly for MatMultTranspose.  In addition, MatMultTranspose should
 actually be the transpose of MatMult.  Consider the scatter defined by

 scatter : x - y
 isx = [0 0]
 isy = [0 1]

 where x has length 1 and y has length 2.  This forward scatter is
 equivalent to a 2x1 matrix A = [1;1]

 Indeed A*[1] = [1;1]

 but A'*[1;1] = [2]

 where as MatMultTranspose_Scatter(A,y=[1;1],x) gives x=[1].

 This can be corrected by changing the body of MatMultTranspose_Scatter
 from

  ierr =
 VecScatterBegin(scatter-scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
  ierr =
 VecScatterEnd(scatter-scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);

 to

  ierr = VecZeroEntries(y);CHKERRQ(ierr);
  ierr =
 VecScatterBegin(scatter-scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
  ierr =
 VecScatterEnd(scatter-scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);

 and similarly for MatMult_Scatter.

 Unfortunately, these modifications roughly double the cost of typical
 sequential scatters and could be much worse (scattering from a small
 vector to a very large one).  I think that correctness is more important
 here and users would typically not use MatScatter when this would have
 significant impact or when INSERT_VALUES semantics are desired.  Of
 course some performance can be recovered by having MatScatter use
 INSERT_VALUES when the destination index set is a permutation.


I would vote for checking for a square matrix, and otherwise use the
expensive form.



 Also, what is the ssh url for the repo?  I've tried some variations on
 ssh://petsc at petsc.cs.iit.edu/petsc/petsc-dev without success.


It is ssh://petsc at petsc.cs.iit.edu//hg/petsc/petsc-dev

  Matt



 Jed

-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener
-- next part --
An HTML attachment was scrubbed...
URL: 
http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20090330/0a8fcfa6/attachment.html


MatMult_Scatter woes

2009-03-30 Thread Barry Smith

I don't think square has anything to do with it. It depends on  
whether all the entries in y[i] are hit or if some are skipped, this  
depends exactly on the scatter.

The first time the mult is called, duplicate x and y  
temporarility, fill a xdup with all 1s. Do the scatter, locate all  
entries of ydup that are zero, keep a list of these.
destroy xdup, ydup.

Now do the MatMult() by zeroing all the entires in the list and  
then do the scatter, now you have an efficient and correct MatMult.  
The drawback of course is the setup time, but in fact that cost is  
only one
scatter (MatMult()) so is likely worth it.

   One could do a similar setup for the transpose?

Barry

On Mar 30, 2009, at 7:04 PM, Matthew Knepley wrote:

 On Mon, Mar 30, 2009 at 5:52 AM, Jed Brown jed at 59a2.org wrote:
 The current implementation of MatMult_Scatter and
 MatMultTranspose_Scatter do not actually have matrix semantics unless
 the index sets are permutations.  For example, it's generally  
 desirable
 that

  MatMult(A,x,y);

 agrees with

  VecZeroEntries(y);
  MatMultAdd(A,x,y,y);

 Similarly for MatMultTranspose.  In addition, MatMultTranspose should
 actually be the transpose of MatMult.  Consider the scatter defined by

 scatter : x - y
 isx = [0 0]
 isy = [0 1]

 where x has length 1 and y has length 2.  This forward scatter is
 equivalent to a 2x1 matrix A = [1;1]

 Indeed A*[1] = [1;1]

 but A'*[1;1] = [2]

 where as MatMultTranspose_Scatter(A,y=[1;1],x) gives x=[1].

 This can be corrected by changing the body of MatMultTranspose_Scatter
 from

  ierr = VecScatterBegin(scatter- 
 scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
  ierr = VecScatterEnd(scatter- 
 scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);

 to

  ierr = VecZeroEntries(y);CHKERRQ(ierr);
  ierr = VecScatterBegin(scatter- 
 scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
  ierr = VecScatterEnd(scatter- 
 scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);

 and similarly for MatMult_Scatter.

 Unfortunately, these modifications roughly double the cost of typical
 sequential scatters and could be much worse (scattering from a small
 vector to a very large one).  I think that correctness is more  
 important
 here and users would typically not use MatScatter when this would have
 significant impact or when INSERT_VALUES semantics are desired.  Of
 course some performance can be recovered by having MatScatter use
 INSERT_VALUES when the destination index set is a permutation.

 I would vote for checking for a square matrix, and otherwise use the  
 expensive form.


 Also, what is the ssh url for the repo?  I've tried some variations on
 ssh://petsc at petsc.cs.iit.edu/petsc/petsc-dev without success.

 It is ssh://petsc at petsc.cs.iit.edu//hg/petsc/petsc-dev

   Matt


 Jed
 -- 
 What most experimenters take for granted before they begin their  
 experiments is infinitely more interesting than any results to which  
 their experiments lead.
 -- Norbert Wiener




MatMult_Scatter woes

2009-03-30 Thread Matthew Knepley
On Mon, Mar 30, 2009 at 6:13 PM, Barry Smith bsmith at mcs.anl.gov wrote:


   I don't think square has anything to do with it. It depends on whether
 all the entries in y[i] are hit or if some are skipped, this depends exactly
 on the scatter.

   The first time the mult is called, duplicate x and y temporarility, fill
 a xdup with all 1s. Do the scatter, locate all entries of ydup that are
 zero, keep a list of these.
 destroy xdup, ydup.

   Now do the MatMult() by zeroing all the entires in the list and then do
 the scatter, now you have an efficient and correct MatMult. The drawback of
 course is the setup time, but in fact that cost is only one
 scatter (MatMult()) so is likely worth it.

  One could do a similar setup for the transpose?


I must misunderstand you. The problem above is that INSERT_VALUES is used,
but when two input locations
map to the same output location, they should be added. You are right, that
square matrices can still have this
property. However, I do not see how the above sequence fixes this. I can't
see how you would get around an
addition.

   Matt



   Barry


 On Mar 30, 2009, at 7:04 PM, Matthew Knepley wrote:

  On Mon, Mar 30, 2009 at 5:52 AM, Jed Brown jed at 59a2.org wrote:
 The current implementation of MatMult_Scatter and
 MatMultTranspose_Scatter do not actually have matrix semantics unless
 the index sets are permutations.  For example, it's generally desirable
 that

  MatMult(A,x,y);

 agrees with

  VecZeroEntries(y);
  MatMultAdd(A,x,y,y);

 Similarly for MatMultTranspose.  In addition, MatMultTranspose should
 actually be the transpose of MatMult.  Consider the scatter defined by

 scatter : x - y
 isx = [0 0]
 isy = [0 1]

 where x has length 1 and y has length 2.  This forward scatter is
 equivalent to a 2x1 matrix A = [1;1]

 Indeed A*[1] = [1;1]

 but A'*[1;1] = [2]

 where as MatMultTranspose_Scatter(A,y=[1;1],x) gives x=[1].

 This can be corrected by changing the body of MatMultTranspose_Scatter
 from

  ierr =
 VecScatterBegin(scatter-scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
  ierr =
 VecScatterEnd(scatter-scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);

 to

  ierr = VecZeroEntries(y);CHKERRQ(ierr);
  ierr =
 VecScatterBegin(scatter-scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
  ierr =
 VecScatterEnd(scatter-scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);

 and similarly for MatMult_Scatter.

 Unfortunately, these modifications roughly double the cost of typical
 sequential scatters and could be much worse (scattering from a small
 vector to a very large one).  I think that correctness is more important
 here and users would typically not use MatScatter when this would have
 significant impact or when INSERT_VALUES semantics are desired.  Of
 course some performance can be recovered by having MatScatter use
 INSERT_VALUES when the destination index set is a permutation.

 I would vote for checking for a square matrix, and otherwise use the
 expensive form.


 Also, what is the ssh url for the repo?  I've tried some variations on
 ssh://petsc at petsc.cs.iit.edu/petsc/petsc-dev without success.

 It is ssh://petsc at petsc.cs.iit.edu//hg/petsc/petsc-dev

  Matt


 Jed
 --
 What most experimenters take for granted before they begin their
 experiments is infinitely more interesting than any results to which their
 experiments lead.
 -- Norbert Wiener





-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener
-- next part --
An HTML attachment was scrubbed...
URL: 
http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20090330/ea22e9e4/attachment.html


MatMult_Scatter woes

2009-03-30 Thread Barry Smith

On Mar 30, 2009, at 7:48 PM, Matthew Knepley wrote:

 On Mon, Mar 30, 2009 at 6:13 PM, Barry Smith bsmith at mcs.anl.gov  
 wrote:

   I don't think square has anything to do with it. It depends on  
 whether all the entries in y[i] are hit or if some are skipped, this  
 depends exactly on the scatter.

   The first time the mult is called, duplicate x and y  
 temporarility, fill a xdup with all 1s. Do the scatter, locate all  
 entries of ydup that are zero, keep a list of these.
 destroy xdup, ydup.

   Now do the MatMult() by zeroing all the entires in the list and  
 then do the scatter, now you have an efficient and correct MatMult.  
 The drawback of course is the setup time, but in fact that cost is  
 only one
 scatter (MatMult()) so is likely worth it.

  One could do a similar setup for the transpose?

 I must misunderstand you. The problem above is that INSERT_VALUES is  
 used, but when two input locations
 map to the same output location, they should be added.

  No, when two map to the same output the scatter is not well defined  
and hence the multiply is not defined.
When building the scatter we could check for multiple inputs going to  
the same output but we don't, it will end up
just putting one value in and then the other and depending on the  
order of the message passing it could be different.


Barry

 You are right, that square matrices can still have this
 property. However, I do not see how the above sequence fixes this. I  
 can't see how you would get around an
 addition.

Matt


   Barry


 On Mar 30, 2009, at 7:04 PM, Matthew Knepley wrote:

 On Mon, Mar 30, 2009 at 5:52 AM, Jed Brown jed at 59a2.org wrote:
 The current implementation of MatMult_Scatter and
 MatMultTranspose_Scatter do not actually have matrix semantics unless
 the index sets are permutations.  For example, it's generally  
 desirable
 that

  MatMult(A,x,y);

 agrees with

  VecZeroEntries(y);
  MatMultAdd(A,x,y,y);

 Similarly for MatMultTranspose.  In addition, MatMultTranspose should
 actually be the transpose of MatMult.  Consider the scatter defined by

 scatter : x - y
 isx = [0 0]
 isy = [0 1]

 where x has length 1 and y has length 2.  This forward scatter is
 equivalent to a 2x1 matrix A = [1;1]

 Indeed A*[1] = [1;1]

 but A'*[1;1] = [2]

 where as MatMultTranspose_Scatter(A,y=[1;1],x) gives x=[1].

 This can be corrected by changing the body of MatMultTranspose_Scatter
 from

  ierr = VecScatterBegin(scatter- 
 scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
  ierr = VecScatterEnd(scatter- 
 scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);

 to

  ierr = VecZeroEntries(y);CHKERRQ(ierr);
  ierr = VecScatterBegin(scatter- 
 scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);
  ierr = VecScatterEnd(scatter- 
 scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr);

 and similarly for MatMult_Scatter.

 Unfortunately, these modifications roughly double the cost of typical
 sequential scatters and could be much worse (scattering from a small
 vector to a very large one).  I think that correctness is more  
 important
 here and users would typically not use MatScatter when this would have
 significant impact or when INSERT_VALUES semantics are desired.  Of
 course some performance can be recovered by having MatScatter use
 INSERT_VALUES when the destination index set is a permutation.

 I would vote for checking for a square matrix, and otherwise use the  
 expensive form.


 Also, what is the ssh url for the repo?  I've tried some variations on
 ssh://petsc at petsc.cs.iit.edu/petsc/petsc-dev without success.

 It is ssh://petsc at petsc.cs.iit.edu//hg/petsc/petsc-dev

  Matt


 Jed
 -- 
 What most experimenters take for granted before they begin their  
 experiments is infinitely more interesting than any results to which  
 their experiments lead.
 -- Norbert Wiener




 -- 
 What most experimenters take for granted before they begin their  
 experiments is infinitely more interesting than any results to which  
 their experiments lead.
 -- Norbert Wiener