Also, note that the problem I am running has inactive processes (i.e., empty processes) on the non-coarsest grids (i.e., the grids with smoothers). I don't recall whether the old problems where (I believe) ASM worked had that.
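For reference, the question in the thread below boils down to how an empty rank declares its subdomain to ASM. Here is a self-contained sketch of the two alternatives (a sketch only, not the exact ex56/GAMG code; subpc is assumed to be the level smoother's PC):

    #include <petscpc.h>

    /* Sketch: declare this rank's subdomain(s) to ASM when the rank owns
       no rows of the level operator (MatGetLocalSize() returning 0). */
    static PetscErrorCode DeclareEmptyRankSubdomain(PC subpc, PetscBool useZeroLengthIS)
    {
      PetscErrorCode ierr;

      PetscFunctionBegin;
      if (useZeroLengthIS) {
        IS is;
        /* Option A: one subdomain containing zero indices */
        ierr = ISCreateGeneral(PETSC_COMM_SELF, 0, NULL, PETSC_COPY_VALUES, &is);CHKERRQ(ierr);
        ierr = PCASMSetLocalSubdomains(subpc, 1, &is, NULL);CHKERRQ(ierr);
        ierr = ISDestroy(&is);CHKERRQ(ierr);
      } else {
        /* Option B: declare zero subdomains on this rank */
        ierr = PCASMSetLocalSubdomains(subpc, 0, NULL, NULL);CHKERRQ(ierr);
      }
      PetscFunctionReturn(0);
    }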
On Fri, Jun 24, 2016 at 4:37 PM, Mark Adams <mfad...@lbl.gov> wrote:
>
>>> > Just to be clear: ASM used to work. Did the semantics of ASM change?
>>
>> Hi Mark,
>>
>> I assume you meant that GASM used to work and is now not working any more.
>
> I see everyone thinks that, but I said (and intended to say) ASM, not GASM, used to work.
>
>> GASM was originally written by Dmitry. The basic idea is to allow multi-rank blocks, that is, a multi-rank subdomain problem can be solved in parallel by a small number of processor cores. This is different from ASM.
>>
>> I was involved in the development of GASM last summer. There are some changes:
>>
>> (1) Added a function to increase the overlap of the multi-rank subdomains. The function is called by GASM by default.
>>
>> (2) Added a hierarchical partitioning to optimize data exchange and to ensure that the small subdomains within a multi-rank subdomain are geometrically connected. GASM does not use this functionality by default.
>>
>> Anyway, if you have an example (like Barry asked for) showing the broken GASM, I will debug it (of course, if Barry does not mind).
>>
>> Fande Kong,
>>
>>> Show me a commit where ASM worked! Do you mean that GASM worked? The code has GASM calls in it, not ASM, so how could ASM have previously worked? It is possible that something changed in GASM that broke GAMG's usage of GASM. Once you tell me how to reproduce the problem with GASM I can try to track down the problem.
>>>
>>> > But before that, please, please tell me the command line arguments and example you use where GASM crashes so I can get that fixed. Then I will look at using ASM instead, after I have the current GASM code running again.
>>> >
>>> > In branch mark/gamg-agg-asm in ksp ex56, 'make runex56':
>>>
>>> I don't care about this! This is where you have tried to change from GASM to ASM, which I told you is non-trivial. Give me the example and command line where the GASM version in master (or maint) doesn't work, where the error message includes ** Max-trans not allowed because matrix is distributed.
>>>
>>> We are not communicating very well; you jumped from stating that GASM crashed to monkeying with ASM, and now refuse to tell me how to reproduce the GASM crash. We have to start by fixing the current code to work with GASM (if it ever worked) and then move on to using ASM (which is just an optimization of the GASM usage).
>>>
>>> Barry
>>>
>>> > 14:12 nid00495 ~/petsc/src/ksp/ksp/examples/tutorials$ make runex56
>>> > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>> > [0]PETSC ERROR: Petsc has generated inconsistent data
>>> > [0]PETSC ERROR: MPI_Allreduce() called in different locations (code lines) on different processors
>>> > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>> > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.2-633-g4f88208  GIT Date: 2016-06-23 18:53:31 +0200
>>> > [0]PETSC ERROR: /global/u2/m/madams/petsc/src/ksp/ksp/examples/tutorials/./ex56 on a arch-xc30-dbg64-intel named nid00495 by madams Thu Jun 23 14:12:57 2016
>>> > [0]PETSC ERROR: Configure options --COPTFLAGS="-no-ipo -g -O0" --CXXOPTFLAGS="-no-ipo -g -O0" --FOPTFLAGS="-fast -no-ipo -g -O0" --download-parmetis --download-metis --with-ssl=0 --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=1 --with-fc=0 --with-shared-libraries=0 --with-x=0 --with-mpiexec=srun LIBS=-lstdc++ --with-64-bit-indices PETSC_ARCH=arch-xc30-dbg64-intel
>>> > [0]PETSC ERROR: #1 MatGetSubMatrices_MPIAIJ() li
>>> >
>>> > Barry
>>> >
>>> > > On Jun 23, 2016, at 4:19 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>> > >
>>> > > The question boils down to: for empty processors, do we do
>>> > >
>>> > >   ierr = ISCreateGeneral(PETSC_COMM_SELF, 0, NULL, PETSC_COPY_VALUES, &is);CHKERRQ(ierr);
>>> > >   ierr = PCASMSetLocalSubdomains(subpc, 1, &is, NULL);CHKERRQ(ierr);
>>> > >   ierr = ISDestroy(&is);CHKERRQ(ierr);
>>> > >
>>> > > or
>>> > >
>>> > >   PCASMSetLocalSubdomains(subpc, 0, NULL, NULL);
>>> > >
>>> > > The latter gives an error that one subdomain is needed, and the former gives the error appended below.
>>> > >
>>> > > I've checked in the code for this second error in ksp (make runex56).
>>> > >
>>> > > Thanks,
>>> > >
>>> > > [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>> > > [0]PETSC ERROR: Petsc has generated inconsistent data
>>> > > [0]PETSC ERROR: MPI_Allreduce() called in different locations (code lines) on different processors
>>> > > [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>> > > [0]PETSC ERROR: Petsc Development GIT revision: v3.7.2-633-g4f88208  GIT Date: 2016-06-23 18:53:31 +0200
>>> > > [0]PETSC ERROR: /global/u2/m/madams/petsc/src/ksp/ksp/examples/tutorials/./ex56 on a arch-xc30-dbg64-intel named nid00495 by madams Thu Jun 23 14:12:57 2016
>>> > > [0]PETSC ERROR: Configure options --COPTFLAGS="-no-ipo -g -O0" --CXXOPTFLAGS="-no-ipo -g -O0" --FOPTFLAGS="-fast -no-ipo -g -O0" --download-parmetis --download-metis --with-ssl=0 --with-cc=cc --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=1 --with-fc=0 --with-shared-libraries=0 --with-x=0 --with-mpiexec=srun LIBS=-lstdc++ --with-64-bit-indices PETSC_ARCH=arch-xc30-dbg64-intel
>>> > > [0]PETSC ERROR: #1 MatGetSubMatrices_MPIAIJ() line 1147 in /global/u2/m/madams/petsc/src/mat/impls/aij/mpi/mpiov.c
>>> > > [0]PETSC ERROR: #2 MatGetSubMatrices_MPIAIJ() line 1147 in /global/u2/m/madams/petsc/src/mat/impls/aij/mpi/mpiov.c
>>> > > [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>> > >
>>> > > On Thu, Jun 23, 2016 at 8:05 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>> > >
>>> > > Where is the command line that generates the error?
>>> > > >
>>> > > > On Jun 23, 2016, at 12:08 AM, Mark Adams <mfad...@lbl.gov> wrote:
>>> > > >
>>> > > > [adding Garth]
>>> > > >
>>> > > > On Thu, Jun 23, 2016 at 12:52 AM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>> > > >
>>> > > > Mark,
>>> > > >
>>> > > > I think there is a misunderstanding here. With GASM an individual block problem is __solved__ (via a parallel KSP) in parallel by several processes; with ASM each block is "owned" by, and solved on, a single process.
>>> > > >
>>> > > > Ah, OK, so this is for multiple processes per block. Yes, we are looking at small smoother blocks.
>>> > > >
>>> > > > With both, the "block" can come from any unknowns on any processes. You can have, for example, a block that comes from a region snaking across several processes if you like (or if it makes sense due to coupling in the matrix).
>>> > > >
>>> > > > By default, if you use ASM it will create one non-overlapping block defined by all unknowns owned by a single process and then extend it by "one level" (defined by the nonzero structure of the matrix) to get overlap.
>>> > > >
>>> > > > The default in ASM is one level of overlap? That is new. (OK, I have not looked at ASM in over 10 years.)
>>> > > >
>>> > > > If you use multiple blocks per process, it defines the non-overlapping blocks within a single process's unknowns
>>> > > >
>>> > > > I assume this still chops the matrix and does not call a partitioner.
>>> > > >
>>> > > > and extends each of them to have overlap (again by the nonzero structure of the matrix). The default is simple because the user only needs to indicate the number of blocks per process; the drawback, of course, is that it depends on the process layout, the number of processes, etc., and does not take into account particular "coupling information" that the user may know about their problem.
>>> > > >
>>> > > > If the user wishes to define the blocks themselves, that is also possible with PCASMSetLocalSubdomains(). Each process provides one or more index sets for the subdomains it will solve on. Note that the index sets can contain any unknowns in the entire problem, so the blocks do not have to "line up" with the parallel decomposition at all.
>>> > > >
>>> > > > Oh, OK, this is what I want. (I thought this worked.)
>>> > > >
>>> > > > Of course, determining and providing good such subdomains may not always be clear.
>>> > > >
>>> > > > In smoothed aggregation there is an argument that the aggregates are good, but the scale is fixed, obviously. On a regular grid smoothed aggregation wants 3^D-sized aggregates, which is obviously wonderful for ASM. And for anisotropy you want your ASM blocks to be on strongly connected components, which is what smoothed aggregation wants (not that I do this very well).
>>> > > >
>>> > > > I see in GAMG you have PCGAMGSetUseASMAggs
>>> > > >
>>> > > > But the code calls PCGASMSetSubdomains and the command line option is -pc_gamg_use_agg_gasm, so this is all messed up. (More below.)
>>> > > >
>>> > > > which sadly does not have an explanation in the users manual, and sadly does not have a matching options database name: -pc_gamg_use_agg_gasm does not follow the rule of dropping the word "Set", using all lower case, and putting _ between words; the option should be -pc_gamg_use_asm_aggs.
>>> > > >
>>> > > > BUT, THIS IS THE WAY IT WAS! It looks like someone hijacked this code and made it GASM. I never did this.
>>> > > >
>>> > > > Barry: you did this, apparently, in 2013.
>>> > > >
>>> > > > In addition to this one, you could also have one that uses the aggs but uses PCASM to manage the solves instead of GASM; it would likely be less buggy and more efficient.
>>> > > >
>>> > > > Yes.
>>> > > >
>>> > > > Please tell me exactly what example you tried to run with what options and I will debug it.
>>> > > >
>>> > > > We got an error message:
>>> > > >
>>> > > > ** Max-trans not allowed because matrix is distributed
>>> > > >
>>> > > > Garth: is this from your code, perhaps? I don't see it in PETSc.
>>> > > >
>>> > > > Note that ALL functionality included in PETSc should have tests that exercise that functionality; then we find out immediately when it is broken, instead of two years later when it is much harder to debug. If this -pc_gamg_use_agg_gasm had had a test, we wouldn't be in this mess now. (Jed's damn code reviews sure don't pick up this stuff.)
>>> > > >
>>> > > > First we need to change gasm to asm.
>>> > > >
>>> > > > We could add this argument, -pc_gamg_use_agg_asm, to ksp/ex56 (runex56, or make a new test). The SNES version (also ex56) is my current test that I like to refer to as recommended parameters for elasticity, so I'd like to keep that clean, but we can add junk to ksp/ex56.
>>> > > >
>>> > > > I've done this in a branch, mark/gamg-agg-asm. I get an error (appended). It looks like the second-coarsest grid, which has 36 dof on one processor, has an index 36 in the block on every processor. Strange. I can take a look at it later.
>>> > > >
>>> > > > Mark
>>> > > >
>>> > > > > [3]PETSC ERROR: [4]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>> > > > > [4]PETSC ERROR: Petsc has generated inconsistent data
>>> > > > > [4]PETSC ERROR: ith 0 block entry 36 not owned by any process, upper bound 36
>>> > > > > [4]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>> > > > > [4]PETSC ERROR: Petsc Development GIT revision: v3.7.2-630-g96e0c40  GIT Date: 2016-06-22 10:03:02 -0500
>>> > > > > [4]PETSC ERROR: ./ex56 on a arch-macosx-gnu-g named MarksMac-3.local by markadams Thu Jun 23 06:53:27 2016
>>> > > > > [4]PETSC ERROR: Configure options COPTFLAGS="-g -O0" CXXOPTFLAGS="-g -O0" FOPTFLAGS="-g -O0" --download-hypre=1 --download-parmetis=1 --download-metis=1 --download-ml=1 --download-p4est=1 --download-exodus=1 --download-triangle=1 --with-hdf5-dir=/Users/markadams/Codes/hdf5 --with-x=0 --with-debugging=1 PETSC_ARCH=arch-macosx-gnu-g --download-chaco
>>> > > > > [4]PETSC ERROR: #1 VecScatterCreate_PtoS() line 2348 in /Users/markadams/Codes/petsc/src/vec/vec/utils/vpscat.c
>>> > > > > [4]PETSC ERROR: #2 VecScatterCreate() line 1552 in /Users/markadams/Codes/petsc/src/vec/vec/utils/vscat.c
>>> > > > > [4]PETSC ERROR: Petsc has generated inconsistent data
>>> > > > > [3]PETSC ERROR: ith 0 block entry 36 not owned by any process, upper bound 36
>>> > > > > [3]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
>>> > > > > [3]PETSC ERROR: Petsc Development GIT revision: v3.7.2-630-g96e0c40  GIT Date: 2016-06-22 10:03:02 -0500
>>> > > > > [3]PETSC ERROR: ./ex56 on a arch-macosx-gnu-g named MarksMac-3.local by markadams Thu Jun 23 06:53:27 2016
>>> > > > > [3]PETSC ERROR: Configure options COPTFLAGS="-g -O0" CXXOPTFLAGS="-g -O0" FOPTFLAGS="-g -O0" --download-hypre=1 --download-parmetis=1 --download-metis=1 --download-ml=1 --download-p4est=1 --download-exodus=1 --download-triangle=1 --with-hdf5-dir=/Users/markadams/Codes/hdf5 --with-x=0 --with-debugging=1 PETSC_ARCH=arch-macosx-gnu-g --download-chaco
>>> > > > > [3]PETSC ERROR: #1 VecScatterCreate_PtoS() line 2348 in /Users/markadams/Codes/petsc/src/vec/vec/utils/vpscat.c
>>> > > > > [3]PETSC ERROR: #2 VecScatterCreate() line 1552 in /Users/markadams/Codes/petsc/src/vec/vec/utils/vscat.c
>>> > > > > [3]PETSC ERROR: #3 PCSetUp_ASM() line 279 in /Users/markadams/Codes/petsc/src/ksp/pc/impls/asm/asm.c
>>> > > >
>>> > > > Barry
>>> > > >
>>> > > > > On Jun 22, 2016, at 5:20 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>> > > > >
>>> > > > > On Wed, Jun 22, 2016 at 8:06 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>> > > > >
>>> > > > > I suggest focusing on asm.
>>> > > > >
>>> > > > > OK, I will switch gasm to asm; this does not work anyway.
>>> > > > >
>>> > > > > Having blocks that span multiple processes seems like overkill for a smoother?
>>> > > > >
>>> > > > > No, because it is a pain to have the math convolved with the parallel decomposition strategy (i.e., I can't tell an application how to partition their problem). If an aggregate spans processor boundaries, which is fine and needed, and let's say we have a pretty uniform problem, then if the block gets split up, H is small in part of the domain and convergence could suffer along processor boundaries. And having the math change as the parallel decomposition changes is annoying.
>>> > > > >
>>> > > > > (Major-league overkill.) In fact, doesn't one want multiple blocks per process, i.e. pretty small blocks?
>>> > > > >
>>> > > > > No, it is just doing what would be done in serial. If the cost of moving the data across processes is a problem, then that is a tradeoff to consider.
>>> > > > >
>>> > > > > And I think you are misunderstanding me. There are lots of blocks per process (the aggregates are, say, 3^D in size). And many of the aggregates/blocks along the processor boundary will be split between processors, resulting in small blocks and a weak ASM PC on processor boundaries.
>>> > > > >
>>> > > > > I can understand ASM not being general and not letting blocks span processor boundaries, but I don't think the extra matrix communication costs are a big deal (done just once), and the vector communication costs are not bad; it probably does not add (too many) new processors to communicate with.
>>> > > > >
>>> > > > > Barry
>>> > > > >
>>> > > > > > On Jun 22, 2016, at 7:51 AM, Mark Adams <mfad...@lbl.gov> wrote:
>>> > > > > >
>>> > > > > > I'm trying to get block smoothers to work for gamg. We (Garth) tried this and got this error:
>>> > > > > >
>>> > > > > > - Another option is to use '-pc_gamg_use_agg_gasm true' and '-mg_levels_pc_type gasm'.
>>> > > > > >
>>> > > > > > Running in parallel, I get
>>> > > > > >
>>> > > > > > ** Max-trans not allowed because matrix is distributed
>>> > > > > > ----
>>> > > > > >
>>> > > > > > First, what is the difference between asm and gasm?
>>> > > > > >
>>> > > > > > Second, I need to fix this to get block smoothers. This used to work. Did we lose the capability to have blocks that span processor subdomains?
>>> > > > > >
>>> > > > > > gamg only aggregates across processor subdomains within one layer, so maybe I could use one layer of overlap in some way?
>>> > > > > >
>>> > > > > > Thanks,
>>> > > > > > Mark
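PS: here is a minimal sketch of the user-defined-subdomain route Barry describes above, i.e., PCASMSetLocalSubdomains() with the aggregates as blocks. To be clear, nagg, aggLen, and aggIdx are hypothetical stand-ins for however GAMG exposes its aggregates on this rank; this is not the mark/gamg-agg-asm branch code:

    #include <petscpc.h>

    /* Sketch: hand each rank's aggregates to ASM as user-defined subdomains.
       The index sets hold global row indices, so an aggregate may reference
       off-process unknowns, i.e., blocks need not line up with the parallel
       decomposition. */
    static PetscErrorCode SetAggregateSubdomains(PC subpc, PetscInt nagg,
                                                 const PetscInt *aggLen,
                                                 PetscInt **aggIdx)
    {
      PetscErrorCode ierr;
      IS             *is;
      PetscInt       i;

      PetscFunctionBegin;
      ierr = PetscMalloc1(nagg, &is);CHKERRQ(ierr);
      for (i = 0; i < nagg; i++) {
        /* One IS per aggregate (block) */
        ierr = ISCreateGeneral(PETSC_COMM_SELF, aggLen[i], aggIdx[i], PETSC_COPY_VALUES, &is[i]);CHKERRQ(ierr);
      }
      ierr = PCASMSetLocalSubdomains(subpc, nagg, is, NULL);CHKERRQ(ierr);
      for (i = 0; i < nagg; i++) {
        ierr = ISDestroy(&is[i]);CHKERRQ(ierr);
      }
      ierr = PetscFree(is);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

This way an aggregate that snakes across a process boundary stays a single block instead of being chopped at the boundary, which is exactly the weak-smoother-along-boundaries problem discussed above.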
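PPS: for comparison, plain process-block ASM smoothing (without aggregate blocks) should be reachable entirely from the command line via the usual option-prefix composition; a sketch, untested here:

    srun -n 8 ./ex56 -pc_type gamg -mg_levels_ksp_type richardson \
        -mg_levels_pc_type asm -mg_levels_pc_asm_overlap 1 \
        -mg_levels_sub_pc_type lu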