On Thu, Jan 14, 2016 at 5:04 AM, Hoang Giang Bui <hgbk2...@gmail.com> wrote:
> This is a very interesting thread, because the use of block matrices improves the
> performance of AMG a lot. In my case it is the elasticity problem.
>
> One more question I would like to ask, which is more about the performance of the
> solver: if I have a coupled problem, say the point block is [u_x u_y u_z p],
> in which the entries of the p block in the stiffness matrix are on a much smaller
> scale than those of u (p ~ 1e-6, u ~ 1e+8), does AMG with hypre in PETSc still scale?
> Also, is there a utility in PETSc that does automatic scaling of variables?

You could use PCJACOBI, or perhaps
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPSetDiagonalScale.html

  Thanks,

    Matt

> Giang
>
> On Thu, Jan 14, 2016 at 7:13 AM, Justin Chang <jychan...@gmail.com> wrote:
>
>> Okay, that makes sense. Thanks.
>>
>> On Wed, Jan 13, 2016 at 10:12 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>
>>> > On Jan 13, 2016, at 10:24 PM, Justin Chang <jychan...@gmail.com> wrote:
>>> >
>>> > Thanks Barry,
>>> >
>>> > 1) So for block matrices, the ja array is smaller. But what's the
>>> > "hardware" explanation for this performance improvement? Does it have to do
>>> > with spatial locality, where you are more likely to reuse data in that ja
>>> > array, or does it have to do with the fact that loading/storing smaller
>>> > arrays is less likely to invoke a cache miss, thus reducing the amount of
>>> > bandwidth used?
>>>
>>>    There are two distinct reasons for the improvement:
>>>
>>>    1) For 5-by-5 blocks, the ja array is 1/25th the size. The "hardware"
>>> savings is that you have to load something that is much smaller than
>>> before. Cache/spatial locality has nothing to do with this particular
>>> improvement.
>>>
>>>    2) The other improvement comes from the reuse of each x[j] value,
>>> which is multiplied by 5 values (a column) of the little block.
>>> The hardware
>>> explanation is that x[j] can be reused in a register for the 5 multiplies
>>> (while otherwise it would have to come from cache to register 5 times, and
>>> sometimes it might even have been flushed from the cache, so it would have
>>> to come from memory). This is why we have code like
>>>
>>>    for (j=0; j<n; j++) {
>>>      xb    = x + 5*(*idx++);
>>>      x1    = xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4];
>>>      sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3 + v[15]*x4 + v[20]*x5;
>>>      sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3 + v[16]*x4 + v[21]*x5;
>>>      sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3 + v[17]*x4 + v[22]*x5;
>>>      sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3 + v[18]*x4 + v[23]*x5;
>>>      sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3 + v[19]*x4 + v[24]*x5;
>>>      v    += 25;
>>>    }
>>>
>>> to do the block multiply.
>>>
>>> > 2) So if one wants to assemble a monolithic matrix (i.e., an aggregation
>>> > of more than one dof per point), then using the BAIJ format is highly
>>> > advisable. But if I want to form a nested matrix, say when I am solving
>>> > the Stokes equations, then each "submatrix" is of AIJ format? Can these
>>> > submatrices also be BAIJ?
>>>
>>>    Sure, but if you have separated all the variables of pressure,
>>> velocity_x, velocity_y, etc. into their own regions of the vector, then the
>>> block size for the submatrices would be 1, so BAIJ does not help.
>>>
>>>    There are Stokes solvers that use Vanka smoothing that keep the
>>> variables interlaced and hence would use BAIJ and NOT use fieldsplit.
>>>
>>> > Thanks,
>>> > Justin
>>> >
>>> > On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>> >
>>> > > On Jan 13, 2016, at 9:57 PM, Justin Chang <jychan...@gmail.com> wrote:
>>> > >
>>> > > Hi all,
>>> > >
>>> > > 1) I am guessing MATMPIBAIJ could theoretically have better
>>> > > performance than simply using MATMPIAIJ. Why is that? Is it similar to
>>> > > the reasoning that block (dense) matrix-vector multiply is "faster" than
>>> > > simple matrix-vector multiply?
>>> >
>>> >    See, for example, Table 1 in
>>> > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf
>>> >
>>> > >
>>> > > 2) I am looking through the manual and online documentation, and it
>>> > > seems the term "block" is used everywhere. In the section on "block
>>> > > matrices" (3.1.3 of the manual), it refers to field splitting, where you
>>> > > could either have a monolithic matrix or a nested matrix. Does that
>>> > > concept have anything to do with MATMPIBAIJ?
>>> >
>>> >    Unfortunately the numerical analysis literature uses the term "block"
>>> > in multiple ways: for small blocks, sometimes called "point blocks" (as in
>>> > BAIJ), and for very large blocks (where the blocks are sparse themselves).
>>> > I used "fieldsplit" for big sparse blocks to try to avoid confusion in
>>> > PETSc.
>>> > >
>>> > > It makes sense to me that one could create a BAIJ matrix where, if you
>>> > > have 5 dofs of the same type of physics (e.g., five different primary
>>> > > species of a geochemical reaction) per grid point, you could use a block
>>> > > size of 5. And if you have different physics (e.g., velocity and
>>> > > pressure), you would ideally want to separate them out (i.e., nested
>>> > > matrices) for better preconditioning.
>>> >
>>> >    Sometimes you put them together with BAIJ, and sometimes you keep
>>> > them separate with nested matrices.
>>> >
>>> > >
>>> > > Thanks,
>>> > > Justin

--
What most experimenters take for granted before they begin their experiments is
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener