MatMult_Scatter woes
On Apr 8, 2009, at 3:11 AM, Jed Brown wrote: On Tue 2009-04-07 21:02, Barry Smith wrote: Jed, Please remind me of what the simplest fix is? Can we just zero the vector before the scatter for the MatMult and MatMultTranspose? What about for MatMultAdd and MatMultTransposeAdd, does anything need to be done? The simplest fix is to zero the vector before and always use ADD_VALUES. Why ADD_VALUES? I don't see that as being any more correct then INSERT_VALUES in the case when two different x locations are assigned to a single y location. Perhaps the matrix class based on the scatter should have an option allowing the user to determine add or insert? MatScatterSetType()? Barry I pushed this a week ago and cited this thread in the commit message, see 6d3d60abcf7d. It's just worth remembering that it can be optimized if someone ever finds that this is a bottleneck. Jed
MatMult_Scatter woes
On Mon 2009-03-30 21:28, Barry Smith wrote: On Mar 30, 2009, at 7:48 PM, Matthew Knepley wrote: On Mon, Mar 30, 2009 at 6:13 PM, Barry Smith bsmith at mcs.anl.gov wrote: I don't think square has anything to do with it. It depends on whether all the entries in y[i] are hit or if some are skipped, this depends exactly on the scatter. The first time the mult is called, duplicate x and y temporarility, fill a xdup with all 1s. Do the scatter, locate all entries of ydup that are zero, keep a list of these. destroy xdup, ydup. Now do the MatMult() by zeroing all the entires in the list and then do the scatter, now you have an efficient and correct MatMult. The drawback of course is the setup time, but in fact that cost is only one scatter (MatMult()) so is likely worth it. One could do a similar setup for the transpose? I must misunderstand you. The problem above is that INSERT_VALUES is used, but when two input locations map to the same output location, they should be added. No, when two map to the same output the scatter is not well defined and hence the multiply is not defined. When building the scatter we could check for multiple inputs going to the same output but we don't, it will end up just putting one value in and then the other and depending on the order of the message passing it could be different. There are two issues, the output index set may be not be injective (multiple inputs map to the same output) or the output index set may not be surjective (not every entry of the output vector is hit). When the output index set is not injective, the scatter is ill-defined with INSERT_VALUES, but is well-defined with ADD_VALUES. It is not okay to disallow such scatters, the transpose of the global-to-local scatter (_p_DA::gtol) is necessarily this way. The (forward) scatter is well-defined with INSERT_VALUES if (and only if) ymax = 1 after VecSet(x,1); VecSet(y,0); VecScatterBegin(scat,x,y,ADD_VALUES,SCATTER_FORWARD); VecScatterEnd(scat,x,y,ADD_VALUES,SCATTER_FORWARD); VecMin(y,ymin); VecMax(y,ymax); If this is not satisfied, then there is no choice but to use the implementation VecSet(y,0); VecScatterBegin(scat,x,y,ADD_VALUES,SCATTER_FORWARD); VecScatterEnd(scat,x,y,ADD_VALUES,SCATTER_FORWARD); If ymin == 0, the index set is not surjective so the scatter is not a matrix unless these entries (the left null space) are set to 0. In this case, we could store the indices of the zero entries in y and do the scatter as VecScatterBegin(scat,x,y,INSERT_VALUES,SCATTER_FORWARD); VecGetArray(y,yy); for (i=0; iscat-to_nnullidx; i++) yy[scat-to_nullidx[i]] = 0; VecRestoreArray(y,yy); VecScatterEnd(scat,x,y,INSERT_VALUES,SCATTER_FORWARD); but this will only win if the number of such entries is small compared to the size of y. If the number of zero entries is large, then I don't think we can beat VecSet(y,0); VecScatterBegin(scat,x,y,INSERT_VALUES,SCATTER_FORWARD); VecScatterEnd(scat,x,y,INSERT_VALUES,SCATTER_FORWARD); We can do the same thing for the transpose. So the question is, how much should we optimize this operation? Is it necessary to provide something like typedef enum {VECSCATTER_IS_INJECTIVE,VECSCATTER_IS_SURJECTIVE} VecScatterOption; PetscErrorCode VecScatterSetOption(VecScatter,Vec,Vec,ScatterMode,VecScatterOption,PetscTruth); which would (analogous to ISSetPermutation()) test whether you are telling the truth when in debug mode and believe you in optimized mode? This seems excessive since it's fairly unlikely that the cost of the VecSet(y,0), ADD_VALUES will be significant to applications. Who is using MATSCATTER anyway (the type name wasn't set prior to my commit 6d3d60abcf7d so it hasn't worked in a while)? Jed -- next part -- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available URL: http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20090331/82fe64a5/attachment.pgp
MatMult_Scatter woes
The current implementation of MatMult_Scatter and MatMultTranspose_Scatter do not actually have matrix semantics unless the index sets are permutations. For example, it's generally desirable that MatMult(A,x,y); agrees with VecZeroEntries(y); MatMultAdd(A,x,y,y); Similarly for MatMultTranspose. In addition, MatMultTranspose should actually be the transpose of MatMult. Consider the scatter defined by scatter : x - y isx = [0 0] isy = [0 1] where x has length 1 and y has length 2. This forward scatter is equivalent to a 2x1 matrix A = [1;1] Indeed A*[1] = [1;1] but A'*[1;1] = [2] where as MatMultTranspose_Scatter(A,y=[1;1],x) gives x=[1]. This can be corrected by changing the body of MatMultTranspose_Scatter from ierr = VecScatterBegin(scatter-scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); ierr = VecScatterEnd(scatter-scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); to ierr = VecZeroEntries(y);CHKERRQ(ierr); ierr = VecScatterBegin(scatter-scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); ierr = VecScatterEnd(scatter-scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); and similarly for MatMult_Scatter. Unfortunately, these modifications roughly double the cost of typical sequential scatters and could be much worse (scattering from a small vector to a very large one). I think that correctness is more important here and users would typically not use MatScatter when this would have significant impact or when INSERT_VALUES semantics are desired. Of course some performance can be recovered by having MatScatter use INSERT_VALUES when the destination index set is a permutation. Also, what is the ssh url for the repo? I've tried some variations on ssh://petsc at petsc.cs.iit.edu/petsc/petsc-dev without success. Jed -- next part -- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: not available URL: http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20090330/e5fffb2a/attachment.pgp
MatMult_Scatter woes
On Mon, Mar 30, 2009 at 5:52 AM, Jed Brown jed at 59a2.org wrote: The current implementation of MatMult_Scatter and MatMultTranspose_Scatter do not actually have matrix semantics unless the index sets are permutations. For example, it's generally desirable that MatMult(A,x,y); agrees with VecZeroEntries(y); MatMultAdd(A,x,y,y); Similarly for MatMultTranspose. In addition, MatMultTranspose should actually be the transpose of MatMult. Consider the scatter defined by scatter : x - y isx = [0 0] isy = [0 1] where x has length 1 and y has length 2. This forward scatter is equivalent to a 2x1 matrix A = [1;1] Indeed A*[1] = [1;1] but A'*[1;1] = [2] where as MatMultTranspose_Scatter(A,y=[1;1],x) gives x=[1]. This can be corrected by changing the body of MatMultTranspose_Scatter from ierr = VecScatterBegin(scatter-scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); ierr = VecScatterEnd(scatter-scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); to ierr = VecZeroEntries(y);CHKERRQ(ierr); ierr = VecScatterBegin(scatter-scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); ierr = VecScatterEnd(scatter-scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); and similarly for MatMult_Scatter. Unfortunately, these modifications roughly double the cost of typical sequential scatters and could be much worse (scattering from a small vector to a very large one). I think that correctness is more important here and users would typically not use MatScatter when this would have significant impact or when INSERT_VALUES semantics are desired. Of course some performance can be recovered by having MatScatter use INSERT_VALUES when the destination index set is a permutation. I would vote for checking for a square matrix, and otherwise use the expensive form. Also, what is the ssh url for the repo? I've tried some variations on ssh://petsc at petsc.cs.iit.edu/petsc/petsc-dev without success. It is ssh://petsc at petsc.cs.iit.edu//hg/petsc/petsc-dev Matt Jed -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -- next part -- An HTML attachment was scrubbed... URL: http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20090330/0a8fcfa6/attachment.html
MatMult_Scatter woes
I don't think square has anything to do with it. It depends on whether all the entries in y[i] are hit or if some are skipped, this depends exactly on the scatter. The first time the mult is called, duplicate x and y temporarility, fill a xdup with all 1s. Do the scatter, locate all entries of ydup that are zero, keep a list of these. destroy xdup, ydup. Now do the MatMult() by zeroing all the entires in the list and then do the scatter, now you have an efficient and correct MatMult. The drawback of course is the setup time, but in fact that cost is only one scatter (MatMult()) so is likely worth it. One could do a similar setup for the transpose? Barry On Mar 30, 2009, at 7:04 PM, Matthew Knepley wrote: On Mon, Mar 30, 2009 at 5:52 AM, Jed Brown jed at 59a2.org wrote: The current implementation of MatMult_Scatter and MatMultTranspose_Scatter do not actually have matrix semantics unless the index sets are permutations. For example, it's generally desirable that MatMult(A,x,y); agrees with VecZeroEntries(y); MatMultAdd(A,x,y,y); Similarly for MatMultTranspose. In addition, MatMultTranspose should actually be the transpose of MatMult. Consider the scatter defined by scatter : x - y isx = [0 0] isy = [0 1] where x has length 1 and y has length 2. This forward scatter is equivalent to a 2x1 matrix A = [1;1] Indeed A*[1] = [1;1] but A'*[1;1] = [2] where as MatMultTranspose_Scatter(A,y=[1;1],x) gives x=[1]. This can be corrected by changing the body of MatMultTranspose_Scatter from ierr = VecScatterBegin(scatter- scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); ierr = VecScatterEnd(scatter- scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); to ierr = VecZeroEntries(y);CHKERRQ(ierr); ierr = VecScatterBegin(scatter- scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); ierr = VecScatterEnd(scatter- scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); and similarly for MatMult_Scatter. Unfortunately, these modifications roughly double the cost of typical sequential scatters and could be much worse (scattering from a small vector to a very large one). I think that correctness is more important here and users would typically not use MatScatter when this would have significant impact or when INSERT_VALUES semantics are desired. Of course some performance can be recovered by having MatScatter use INSERT_VALUES when the destination index set is a permutation. I would vote for checking for a square matrix, and otherwise use the expensive form. Also, what is the ssh url for the repo? I've tried some variations on ssh://petsc at petsc.cs.iit.edu/petsc/petsc-dev without success. It is ssh://petsc at petsc.cs.iit.edu//hg/petsc/petsc-dev Matt Jed -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
MatMult_Scatter woes
On Mon, Mar 30, 2009 at 6:13 PM, Barry Smith bsmith at mcs.anl.gov wrote: I don't think square has anything to do with it. It depends on whether all the entries in y[i] are hit or if some are skipped, this depends exactly on the scatter. The first time the mult is called, duplicate x and y temporarility, fill a xdup with all 1s. Do the scatter, locate all entries of ydup that are zero, keep a list of these. destroy xdup, ydup. Now do the MatMult() by zeroing all the entires in the list and then do the scatter, now you have an efficient and correct MatMult. The drawback of course is the setup time, but in fact that cost is only one scatter (MatMult()) so is likely worth it. One could do a similar setup for the transpose? I must misunderstand you. The problem above is that INSERT_VALUES is used, but when two input locations map to the same output location, they should be added. You are right, that square matrices can still have this property. However, I do not see how the above sequence fixes this. I can't see how you would get around an addition. Matt Barry On Mar 30, 2009, at 7:04 PM, Matthew Knepley wrote: On Mon, Mar 30, 2009 at 5:52 AM, Jed Brown jed at 59a2.org wrote: The current implementation of MatMult_Scatter and MatMultTranspose_Scatter do not actually have matrix semantics unless the index sets are permutations. For example, it's generally desirable that MatMult(A,x,y); agrees with VecZeroEntries(y); MatMultAdd(A,x,y,y); Similarly for MatMultTranspose. In addition, MatMultTranspose should actually be the transpose of MatMult. Consider the scatter defined by scatter : x - y isx = [0 0] isy = [0 1] where x has length 1 and y has length 2. This forward scatter is equivalent to a 2x1 matrix A = [1;1] Indeed A*[1] = [1;1] but A'*[1;1] = [2] where as MatMultTranspose_Scatter(A,y=[1;1],x) gives x=[1]. This can be corrected by changing the body of MatMultTranspose_Scatter from ierr = VecScatterBegin(scatter-scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); ierr = VecScatterEnd(scatter-scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); to ierr = VecZeroEntries(y);CHKERRQ(ierr); ierr = VecScatterBegin(scatter-scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); ierr = VecScatterEnd(scatter-scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); and similarly for MatMult_Scatter. Unfortunately, these modifications roughly double the cost of typical sequential scatters and could be much worse (scattering from a small vector to a very large one). I think that correctness is more important here and users would typically not use MatScatter when this would have significant impact or when INSERT_VALUES semantics are desired. Of course some performance can be recovered by having MatScatter use INSERT_VALUES when the destination index set is a permutation. I would vote for checking for a square matrix, and otherwise use the expensive form. Also, what is the ssh url for the repo? I've tried some variations on ssh://petsc at petsc.cs.iit.edu/petsc/petsc-dev without success. It is ssh://petsc at petsc.cs.iit.edu//hg/petsc/petsc-dev Matt Jed -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -- next part -- An HTML attachment was scrubbed... URL: http://lists.mcs.anl.gov/pipermail/petsc-dev/attachments/20090330/ea22e9e4/attachment.html
MatMult_Scatter woes
On Mar 30, 2009, at 7:48 PM, Matthew Knepley wrote: On Mon, Mar 30, 2009 at 6:13 PM, Barry Smith bsmith at mcs.anl.gov wrote: I don't think square has anything to do with it. It depends on whether all the entries in y[i] are hit or if some are skipped, this depends exactly on the scatter. The first time the mult is called, duplicate x and y temporarility, fill a xdup with all 1s. Do the scatter, locate all entries of ydup that are zero, keep a list of these. destroy xdup, ydup. Now do the MatMult() by zeroing all the entires in the list and then do the scatter, now you have an efficient and correct MatMult. The drawback of course is the setup time, but in fact that cost is only one scatter (MatMult()) so is likely worth it. One could do a similar setup for the transpose? I must misunderstand you. The problem above is that INSERT_VALUES is used, but when two input locations map to the same output location, they should be added. No, when two map to the same output the scatter is not well defined and hence the multiply is not defined. When building the scatter we could check for multiple inputs going to the same output but we don't, it will end up just putting one value in and then the other and depending on the order of the message passing it could be different. Barry You are right, that square matrices can still have this property. However, I do not see how the above sequence fixes this. I can't see how you would get around an addition. Matt Barry On Mar 30, 2009, at 7:04 PM, Matthew Knepley wrote: On Mon, Mar 30, 2009 at 5:52 AM, Jed Brown jed at 59a2.org wrote: The current implementation of MatMult_Scatter and MatMultTranspose_Scatter do not actually have matrix semantics unless the index sets are permutations. For example, it's generally desirable that MatMult(A,x,y); agrees with VecZeroEntries(y); MatMultAdd(A,x,y,y); Similarly for MatMultTranspose. In addition, MatMultTranspose should actually be the transpose of MatMult. Consider the scatter defined by scatter : x - y isx = [0 0] isy = [0 1] where x has length 1 and y has length 2. This forward scatter is equivalent to a 2x1 matrix A = [1;1] Indeed A*[1] = [1;1] but A'*[1;1] = [2] where as MatMultTranspose_Scatter(A,y=[1;1],x) gives x=[1]. This can be corrected by changing the body of MatMultTranspose_Scatter from ierr = VecScatterBegin(scatter- scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); ierr = VecScatterEnd(scatter- scatter,x,y,INSERT_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); to ierr = VecZeroEntries(y);CHKERRQ(ierr); ierr = VecScatterBegin(scatter- scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); ierr = VecScatterEnd(scatter- scatter,x,y,ADD_VALUES,SCATTER_REVERSE);CHKERRQ(ierr); and similarly for MatMult_Scatter. Unfortunately, these modifications roughly double the cost of typical sequential scatters and could be much worse (scattering from a small vector to a very large one). I think that correctness is more important here and users would typically not use MatScatter when this would have significant impact or when INSERT_VALUES semantics are desired. Of course some performance can be recovered by having MatScatter use INSERT_VALUES when the destination index set is a permutation. I would vote for checking for a square matrix, and otherwise use the expensive form. Also, what is the ssh url for the repo? I've tried some variations on ssh://petsc at petsc.cs.iit.edu/petsc/petsc-dev without success. It is ssh://petsc at petsc.cs.iit.edu//hg/petsc/petsc-dev Matt Jed -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener