OK, I've figured out that we are definitely messing something up in our split index set generation. For process 1, our Jacobian/preconditioning matrix local size is 14307, but the split '0' IS has local size 4129 and the split '1' IS has local size 10170, which leaves us 8 dofs short.
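
For anyone following along, a size check along these lines right after our field decomposition routine fills `islist` would catch this immediately. This is just a sketch, not actual MOOSE code: `J`, `nsplits`, `islist`, `dm`, and `ierr` stand in for whatever is in scope at the call site, and it assumes the splits are meant to be non-overlapping.

  PetscInt jac_local_rows, sum_split_local = 0;
  /* local row count of the Jacobian/preconditioning matrix on this rank */
  ierr = MatGetLocalSize(J, &jac_local_rows, NULL);
  CHKERRQ(ierr);
  for (PetscInt s = 0; s < nsplits; ++s)
  {
    PetscInt split_local;
    /* local size of the IS backing split s */
    ierr = ISGetLocalSize((*islist)[s], &split_local);
    CHKERRQ(ierr);
    sum_split_local += split_local;
  }
  /* for non-overlapping splits, the IS local sizes must sum to the local row count */
  if (sum_split_local != jac_local_rows)
    SETERRQ3(((PetscObject)dm)->comm,
             PETSC_ERR_PLIB,
             "Splits cover %D local dofs but the matrix has %D local rows (off by %D)",
             sum_split_local,
             jac_local_rows,
             jac_local_rows - sum_split_local);

On process 1 that comparison is exactly the mismatch above: 4129 + 10170 = 14299, not 14307.
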
I now know where I need to dig in our field decomposition. Thanks Matt for
helping me process through this stuff!

On Tue, Nov 8, 2022 at 4:53 PM Alexander Lindsay <alexlindsay...@gmail.com> wrote:

> This is from our DMCreateFieldDecomposition_Moose routine. The IS size on
> process 1 (which is the process from which I took the error in the
> original post) is reported as 4129, which is consistent with the row size
> of A00.
>
> Split '0' has local size 4129 on processor 1
> Split '0' has local size 4484 on processor 6
> Split '0' has local size 4471 on processor 12
> Split '0' has local size 4040 on processor 14
> Split '0' has local size 3594 on processor 20
> Split '0' has local size 4423 on processor 22
> Split '0' has local size 2791 on processor 27
> Split '0' has local size 3014 on processor 29
> Split '0' has local size 3183 on processor 30
> Split '0' has local size 3328 on processor 3
> Split '0' has local size 4689 on processor 4
> Split '0' has local size 8016 on processor 8
> Split '0' has local size 6367 on processor 10
> Split '0' has local size 5973 on processor 17
> Split '0' has local size 4431 on processor 18
> Split '0' has local size 7564 on processor 25
> Split '0' has local size 12504 on processor 9
> Split '0' has local size 10081 on processor 11
> Split '0' has local size 13808 on processor 24
> Split '0' has local size 14049 on processor 31
> Split '0' has local size 15324 on processor 7
> Split '0' has local size 15337 on processor 15
> Split '0' has local size 14849 on processor 19
> Split '0' has local size 15660 on processor 23
> Split '0' has local size 14728 on processor 26
> Split '0' has local size 15724 on processor 28
> Split '0' has local size 17249 on processor 5
> Split '0' has local size 15519 on processor 13
> Split '0' has local size 16511 on processor 16
> Split '0' has local size 16496 on processor 21
> Split '0' has local size 18291 on processor 2
> Split '0' has local size 18042 on processor 0
>
> On Mon, Nov 7, 2022 at 6:04 PM Matthew Knepley <knep...@gmail.com> wrote:
>
>> On Mon, Nov 7, 2022 at 5:48 PM Alexander Lindsay <alexlindsay...@gmail.com> wrote:
>>
>>> My understanding looking at PCFieldSplitSetDefaults is that our
>>> implementation of `createfielddecomposition` should get called, we'll
>>> set `fields`, and then (ignoring possible user setting of the
>>> -pc_fieldsplit_%D_fields flag) PCFieldSplitSetIS will get called with
>>> whatever we did to `fields`. So yeah, I guess just looking over that I
>>> would assume we're not supplying two different index sets for rows and
>>> columns, or, put more precisely, we (MOOSE) are not really afforded the
>>> opportunity to. But my interpretation could very well be wrong.
>>>
>> Oh wait. I read the error message again. It does not say that the whole
>> selection is rectangular. It says
>>
>> Local columns of A10 4137 do not equal local rows of A00 4129
>>
>> So this is a parallel partitioning thing. Since A00 has 4129 local rows,
>> it should have this many columns as well. However, A10 has 4137 local
>> columns. How big is IS_0, on each process, that you pass in to
>> PCFIELDSPLIT?
>>
>> Thanks,
>>
>> Matt
>>
>>> On Mon, Nov 7, 2022 at 12:33 PM Matthew Knepley <knep...@gmail.com> wrote:
>>>
>>>> On Mon, Nov 7, 2022 at 2:09 PM Alexander Lindsay <alexlindsay...@gmail.com> wrote:
>>>>
>>>>> The libMesh/MOOSE specific code that identifies dof indices for
>>>>> ISCreateGeneral is in DMMooseGetEmbedding_Private. I can share that
>>>>> function (it's quite long) or more details if that could be helpful.
>>>>
>>>> Sorry, I should have written more. The puzzling thing for me is that
>>>> somehow it looks like the row and column index sets are not the same.
>>>> I did not think PCFIELDSPLIT could do that. The PCFieldSplitSetIS()
>>>> interface does not allow it. I was wondering how you were setting the
>>>> ISes.
>>>>
>>>> Thanks,
>>>>
>>>> Matt
>>>>
>>>>> On Mon, Nov 7, 2022 at 10:55 AM Alexander Lindsay <alexlindsay...@gmail.com> wrote:
>>>>>
>>>>>> I'm not sure exactly what you mean, but I'll try to give more
>>>>>> details. We have our own DM class (DM_Moose) and we set our own
>>>>>> field and domain decomposition routines:
>>>>>>
>>>>>>   dm->ops->createfielddecomposition = DMCreateFieldDecomposition_Moose;
>>>>>>   dm->ops->createdomaindecomposition = DMCreateDomainDecomposition_Moose;
>>>>>>
>>>>>> The field and domain decomposition routines are as follows (see also
>>>>>> https://github.com/idaholab/moose/blob/next/framework/src/utils/PetscDMMoose.C):
>>>>>>
>>>>>> static PetscErrorCode
>>>>>> DMCreateFieldDecomposition_Moose(
>>>>>>     DM dm, PetscInt * len, char *** namelist, IS ** islist, DM ** dmlist)
>>>>>> {
>>>>>>   PetscErrorCode ierr;
>>>>>>   DM_Moose * dmm = (DM_Moose *)(dm->data);
>>>>>>
>>>>>>   PetscFunctionBegin;
>>>>>>   /* Only called after DMSetUp(). */
>>>>>>   if (!dmm->_splitlocs)
>>>>>>     PetscFunctionReturn(0);
>>>>>>   *len = dmm->_splitlocs->size();
>>>>>>   if (namelist)
>>>>>>   {
>>>>>>     ierr = PetscMalloc(*len * sizeof(char *), namelist);
>>>>>>     CHKERRQ(ierr);
>>>>>>   }
>>>>>>   if (islist)
>>>>>>   {
>>>>>>     ierr = PetscMalloc(*len * sizeof(IS), islist);
>>>>>>     CHKERRQ(ierr);
>>>>>>   }
>>>>>>   if (dmlist)
>>>>>>   {
>>>>>>     ierr = PetscMalloc(*len * sizeof(DM), dmlist);
>>>>>>     CHKERRQ(ierr);
>>>>>>   }
>>>>>>   for (const auto & dit : *(dmm->_splitlocs))
>>>>>>   {
>>>>>>     unsigned int d = dit.second;
>>>>>>     std::string dname = dit.first;
>>>>>>     DM_Moose::SplitInfo & dinfo = (*dmm->_splits)[dname];
>>>>>>     if (!dinfo._dm)
>>>>>>     {
>>>>>>       ierr = DMCreateMoose(((PetscObject)dm)->comm, *dmm->_nl, &dinfo._dm);
>>>>>>       CHKERRQ(ierr);
>>>>>>       ierr = PetscObjectSetOptionsPrefix((PetscObject)dinfo._dm,
>>>>>>                                          ((PetscObject)dm)->prefix);
>>>>>>       CHKERRQ(ierr);
>>>>>>       std::string suffix = std::string("fieldsplit_") + dname + "_";
>>>>>>       ierr = PetscObjectAppendOptionsPrefix((PetscObject)dinfo._dm, suffix.c_str());
>>>>>>       CHKERRQ(ierr);
>>>>>>     }
>>>>>>     ierr = DMSetFromOptions(dinfo._dm);
>>>>>>     CHKERRQ(ierr);
>>>>>>     ierr = DMSetUp(dinfo._dm);
>>>>>>     CHKERRQ(ierr);
>>>>>>     if (namelist)
>>>>>>     {
>>>>>>       ierr = PetscStrallocpy(dname.c_str(), (*namelist) + d);
>>>>>>       CHKERRQ(ierr);
>>>>>>     }
>>>>>>     if (islist)
>>>>>>     {
>>>>>>       if (!dinfo._rembedding)
>>>>>>       {
>>>>>>         IS dembedding, lembedding;
>>>>>>         ierr = DMMooseGetEmbedding_Private(dinfo._dm, &dembedding);
>>>>>>         CHKERRQ(ierr);
>>>>>>         if (dmm->_embedding)
>>>>>>         {
>>>>>>           // Create a relative embedding into the parent's index space.
>>>>>>           ierr = ISEmbed(dembedding, dmm->_embedding, PETSC_TRUE, &lembedding);
>>>>>>           CHKERRQ(ierr);
>>>>>>           const PetscInt * lindices;
>>>>>>           PetscInt len, dlen, llen, *rindices, off, i;
>>>>>>           ierr = ISGetLocalSize(dembedding, &dlen);
>>>>>>           CHKERRQ(ierr);
>>>>>>           ierr = ISGetLocalSize(lembedding, &llen);
>>>>>>           CHKERRQ(ierr);
>>>>>>           if (llen != dlen)
>>>>>>             SETERRQ1(((PetscObject)dm)->comm, PETSC_ERR_PLIB, "Failed to embed split %D", d);
>>>>>>           ierr = ISDestroy(&dembedding);
>>>>>>           CHKERRQ(ierr);
>>>>>>           // Convert local embedding to global (but still relative) embedding
>>>>>>           ierr = PetscMalloc(llen * sizeof(PetscInt), &rindices);
>>>>>>           CHKERRQ(ierr);
>>>>>>           ierr = ISGetIndices(lembedding, &lindices);
>>>>>>           CHKERRQ(ierr);
>>>>>>           ierr = PetscMemcpy(rindices, lindices, llen * sizeof(PetscInt));
>>>>>>           CHKERRQ(ierr);
>>>>>>           ierr = ISDestroy(&lembedding);
>>>>>>           CHKERRQ(ierr);
>>>>>>           // We could get the index offset from a corresponding global vector,
>>>>>>           // but subDMs don't yet have global vectors
>>>>>>           ierr = ISGetLocalSize(dmm->_embedding, &len);
>>>>>>           CHKERRQ(ierr);
>>>>>>
>>>>>>           ierr = MPI_Scan(&len,
>>>>>>                           &off,
>>>>>>                           1,
>>>>>> #ifdef PETSC_USE_64BIT_INDICES
>>>>>>                           MPI_LONG_LONG_INT,
>>>>>> #else
>>>>>>                           MPI_INT,
>>>>>> #endif
>>>>>>                           MPI_SUM,
>>>>>>                           ((PetscObject)dm)->comm);
>>>>>>           CHKERRQ(ierr);
>>>>>>
>>>>>>           off -= len;
>>>>>>           for (i = 0; i < llen; ++i)
>>>>>>             rindices[i] += off;
>>>>>>           ierr = ISCreateGeneral(
>>>>>>               ((PetscObject)dm)->comm, llen, rindices, PETSC_OWN_POINTER, &(dinfo._rembedding));
>>>>>>           CHKERRQ(ierr);
>>>>>>         }
>>>>>>         else
>>>>>>         {
>>>>>>           dinfo._rembedding = dembedding;
>>>>>>         }
>>>>>>       }
>>>>>>       ierr = PetscObjectReference((PetscObject)(dinfo._rembedding));
>>>>>>       CHKERRQ(ierr);
>>>>>>       (*islist)[d] = dinfo._rembedding;
>>>>>>     }
>>>>>>     if (dmlist)
>>>>>>     {
>>>>>>       ierr = PetscObjectReference((PetscObject)dinfo._dm);
>>>>>>       CHKERRQ(ierr);
>>>>>>       (*dmlist)[d] = dinfo._dm;
>>>>>>     }
>>>>>>   }
>>>>>>   PetscFunctionReturn(0);
>>>>>> }
>>>>>>
>>>>>> static PetscErrorCode
>>>>>> DMCreateDomainDecomposition_Moose(
>>>>>>     DM dm, PetscInt * len, char *** namelist, IS ** innerislist,
>>>>>>     IS ** outerislist, DM ** dmlist)
>>>>>> {
>>>>>>   PetscErrorCode ierr;
>>>>>>
>>>>>>   PetscFunctionBegin;
>>>>>>   /* Use DMCreateFieldDecomposition_Moose() to obtain everything but
>>>>>>    * outerislist, which is currently PETSC_NULL. */
>>>>>>   if (outerislist)
>>>>>>     *outerislist = PETSC_NULL; /* FIX: allow mesh-based overlap. */
>>>>>>   ierr = DMCreateFieldDecomposition_Moose(dm, len, namelist, innerislist, dmlist);
>>>>>>   CHKERRQ(ierr);
>>>>>>   PetscFunctionReturn(0);
>>>>>> }
>>>>>>
>>>>>> On Thu, Nov 3, 2022 at 5:19 PM Matthew Knepley <knep...@gmail.com> wrote:
>>>>>>
>>>>>>> On Thu, Nov 3, 2022 at 7:52 PM Alexander Lindsay <alexlindsay...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I have errors like the following on quite a few (but not all)
>>>>>>>> processes
>>>>>>>>
>>>>>>>> [1]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
>>>>>>>> [1]PETSC ERROR: Nonconforming object sizes
>>>>>>>> [1]PETSC ERROR: Local columns of A10 4137 do not equal local rows of A00 4129
>>>>>>>>
>>>>>>>> when performing field splits. We (MOOSE) have some code for
>>>>>>>> identifying the index sets for each split. However, the code was
>>>>>>>> written by some authors who are no longer with us. Normally I
>>>>>>>> would chase this down in a debugger, but this error only seems to
>>>>>>>> crop up for pretty complex and large meshes. If anyone has an idea
>>>>>>>> for what we might be doing wrong, that might help me chase this
>>>>>>>> down faster. I guess intuitively I'm pretty perplexed that we
>>>>>>>> could get ourselves into this pickle, as it almost appears that we
>>>>>>>> have two different local dof index counts for a given block (0 in
>>>>>>>> this case). More background, if helpful, can be found in
>>>>>>>> https://github.com/idaholab/moose/issues/22359 as well as
>>>>>>>> https://github.com/idaholab/moose/discussions/22468.
>>>>>>>
>>>>>>> How are you specifying the blocks? I would not have thought this
>>>>>>> was possible.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Matt
>>>>>>>
>>>>>>>> I should note that we are currently running with 3.16.6 as our
>>>>>>>> PETSc submodule hash (we are talking about updating to 3.18 soon).
>>>>>>>
>>>>>>> --
>>>>>>> What most experimenters take for granted before they begin their
>>>>>>> experiments is infinitely more interesting than any results to
>>>>>>> which their experiments lead.
>>>>>>> -- Norbert Wiener
>>>>>>>
>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which
>>>> their experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>> https://www.cse.buffalo.edu/~knepley/
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which
>> their experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/