On Sep 11, 2013, at 3:47 AM, Matthew Knepley <knep...@gmail.com> wrote:
> On Tue, Sep 10, 2013 at 11:25 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>
>> On Sep 10, 2013, at 8:55 PM, Jed Brown <jedbr...@mcs.anl.gov> wrote:
>>
>>> Matthew Knepley <knep...@gmail.com> writes:
>>>
>>>>> PetscSF cannot currently scatter individual bytes (4 bytes minimum), and
>>>>> even if it could, it's a horribly inefficient representation (4 or 8
>>>>> bytes of metadata for each byte of payload). The quick fix at that
>>>>> moment was to send in units of size PetscInt (the struct was always
>>>>> going to be divisible by that size).
>>
>>    Since when are PETSc people interested in quick fixes? We are only
>> concerned with doing things right; quick fixes lead to bad code and bad
>> ideas.
>>
>>>> The way I understand it, he never had a single MPI_BYTE, but always a
>>>> bunch. Shouldn't that packing handle that case?
>>>
>>> In DMPlexDistributeData, your fieldSF is blown up so that every unit has
>>> its own metadata (rank, offset). That's a terrible mechanism for moving
>>> structs with hundreds of bytes.
>>>
>>> I would rather not make PetscSF deal with fully-heterogeneous data
>>> (where each node can have a different size) because it makes indexing a
>>> nightmare (you need a PetscSection or similar just to get in the door; I
>>> don't want PetscSF to depend on PetscSection).
>>
>>    Having PetscSF depend on PetscSection would be insane (after all,
>> PetscSection has information about constrained variables and other silly
>> nonsense) but having PetscSF depend on XX makes perfect sense.
>>
>>    A review (current situation, with unimportant stuff removed):
>>
>> + sf      - star forest
>> . nroots  - number of root vertices on the current process (these are
>>             possible targets for other processes to attach leaves)
>> . nleaves - number of leaf vertices on the current process, each of these
>>             references a root on any process
>> . ilocal  - locations of leaves in leafdata buffers, pass NULL for
>>             contiguous storage
>> . iremote - remote locations of root vertices for each leaf on the current
>>             process
>>
>> PetscSFSetGraph(PetscSF sf,PetscInt nroots,PetscInt nleaves,PetscInt *ilocal,PetscSFNode *iremote)
>>
>> typedef struct {
>>   PetscInt rank;   /* Rank of owner */       (why is this not PetscMPIInt?)
>>   PetscInt index;  /* Index of node on rank */
>> } PetscSFNode;
>>
>>    Abstractly, nleaves and ilocal are the indices (into an array) you are
>> copying to; iremote are the indices (into an array) you are copying from.
>>
>>    Now if PETSc had an abstract concept of indexing (funny no one ever put
>> that in PETSc decades ago) it would look like
>>
>> PetscSFSetGraph(PetscSF sf,PetscInt nroots,toIndices,fromIndices)
>>
>>    But wait: PETSc does have an abstraction for indexing (into regular
>> arrays) called IS; come to think of it, PETSc has another abstraction for
>> indexing (into arrays with different sized items) called XX. Thinking a
>> tiny bit more, one realizes that there is a third: PetscSFNode is a way of
>> indexing (into regular arrays) on a bunch of different processes. If we
>> come up with a couple more related, but inconsistent, syntaxes for
>> indexing, we'll have code as good as Trilinos.
>>
>>    So let's fix this up! One abstraction for indexing that handles regular
>> arrays, arrays with different sized items, and both kinds of arrays on a
>> bunch of different processes, with simple enough syntax that it can be
>> used whenever indexing is needed, including passing to PetscSF! (We can
>> ignore the need for MPI datatypes in communication for now.)
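   To make the quoted review concrete, here is a minimal sketch of the
current interface in use (against the full petsc-3.4-era PetscSFSetGraph()
signature, whose two PetscSFCopyMode arguments are elided in the prototype
above). Each process attaches its leaves to roots on the next rank, so the
PetscSFNode (rank,index) pairs are exactly the "copying from" indices under
discussion:

#include <petscsf.h>

int main(int argc,char **argv)
{
  PetscErrorCode ierr;
  PetscSF        sf;
  PetscMPIInt    rank,size;
  PetscInt       i,nroots = 2,nleaves = 2;
  PetscSFNode    iremote[2];
  PetscInt       rootdata[2],leafdata[2];

  ierr = PetscInitialize(&argc,&argv,NULL,NULL);CHKERRQ(ierr);
  ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
  ierr = MPI_Comm_size(PETSC_COMM_WORLD,&size);CHKERRQ(ierr);

  /* leaf i on this process references root i on the next process (a ring);
     every remote index pays for a full (rank,index) pair */
  for (i=0; i<nleaves; i++) {
    iremote[i].rank  = (rank+1)%size;
    iremote[i].index = i;
  }
  ierr = PetscSFCreate(PETSC_COMM_WORLD,&sf);CHKERRQ(ierr);
  /* ilocal = NULL: the "copying to" indices are just 0..nleaves-1 */
  ierr = PetscSFSetGraph(sf,nroots,nleaves,NULL,PETSC_COPY_VALUES,iremote,PETSC_COPY_VALUES);CHKERRQ(ierr);

  for (i=0; i<nroots; i++) rootdata[i] = 100*rank + i;
  /* pull a copy of each referenced root into the local leaf buffer */
  ierr = PetscSFBcastBegin(sf,MPIU_INT,rootdata,leafdata);CHKERRQ(ierr);
  ierr = PetscSFBcastEnd(sf,MPIU_INT,rootdata,leafdata);CHKERRQ(ierr);

  ierr = PetscSFDestroy(&sf);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}

   Note the asymmetry: the "copying to" side collapses to NULL when the
leaves are contiguous, while every "copying from" index carries a full
(rank,index) pair even when a plain offset (or nothing at all) would do.
That is exactly what a unified indexing abstraction should smooth over.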
>>    Concrete examples to demonstrate this need not be so difficult. A
>> completely general set of indices for this situation could be handled by
>>
>> typedef struct {
>>   PetscInt    nindices;
>>   PetscMPIInt *ranks;
>>   PetscInt    *offsets;
>>   PetscInt    *sizes;
>> } XXData;
>>
>> Hardly more than PetscSFNode. Or it could be handled as an array of
>> structs.
>>
>>    Special cases would include:
>>   - all on one process, so ranks is not needed
>>   - all items the same size, so sizes is not needed
>>   - contiguous memory (or strided), so offsets is not needed
>>
>>    If we decided not to go with an abstract object but instead just a
>> concrete data structure that could handle all cases, it could look
>> something like
>>
>> typedef struct {
>>   PetscInt    nindices;
>>   PetscMPIInt *ranks;
>>   PetscInt    *offsets;
>>   PetscInt    *sizes;
>>   PetscInt    chunksize,strideoffset;
>> } XXData;
>>
>>    For completeness I note that with IS (for example when a VecScatter is
>> created with parallel vectors) we don't use rank,offset notation; we use
>> global offset == rstart[rank] + offset, but that is something I'm sure we
>> can deal with.
>>
>>    QED
>>
>>    Comments?
>
> This sounds like a good idea. So
>
> - we always store indices in blocks (get rid of regular IS)
> - we can optimize for strided (get rid of strided IS)
> - they can be local or nonlocal (get rid of PetscSFNode)
>
> Right now, I have a GlobalSection, which uses XX but puts in different
> values. It has negative offsets and sizes for nonlocal indices. Instead, I
> could just allocate the ranks array and check for a different rank rather
> than a negative offset.
>
> This would increase the size of GlobalSection, but then I could share
> indices with PetscSF, so overall we could have a decrease. So it seems like
> PetscSection and PetscSF could share a substantial amount of code if we go
> to a common data structure.

   Sometimes it is more convenient to use "global indices" than rank/local
offset pairs, so we need a "scalable" utility to convert back and forth
between them, possibly with interfaces to supply the information in either
format. For example, in VecScatterCreate_PtoS():

  j      = 0;
  nsends = 0;
  for (i=0; i<nx; i++) {
    idx = bs*inidx[i];
    if (idx < owners[j]) j = 0;
    for (; j<size; j++) {
      if (idx < owners[j+1]) {
        if (!nprocs[j]++) nsends++;
        owner[i] = j;
        break;
      }
    }
    if (j == size) SETERRQ3(PETSC_COMM_SELF,PETSC_ERR_PLIB,"ith %D block entry %D not owned by any process, upper bound %D",i,idx,owners[size]);
  }

   Call the new unified beast PetscIS?

   Barry

> I believe all this is orthogonal to the packing issue, which I think can be
> done before we unify indices. In PetscSectionCreateFieldSF() I choose a
> granularity for communication. It would be easy to check for a common
> multiple of sizes here, and only check when the MPI type is small. This is
> likely to be what we want to do in general, even creating our own packed
> types for communication. I will try and code it up.
>
>    Matt
>
> --
> What most experimenters take for granted before they begin their experiments
> is infinitely more interesting than any results to which their experiments
> lead.
>  -- Norbert Wiener
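   For concreteness, a sketch of the global <-> rank/offset conversion
discussed above (the helper names XXGlobalFromRankOffset and
XXRankOffsetFromGlobal are invented here, not existing PETSc calls). It
assumes the usual ownership-range array owners[0..size] from the
VecScatterCreate_PtoS() fragment, where process r owns the half-open range
[owners[r],owners[r+1]) and owners[size] is the global upper bound.
Rank/offset -> global is just the identity global == rstart[rank] + offset;
the reverse direction can binary-search the ranges instead of scanning them
linearly:

#include <petscsys.h>

/* rank/offset -> global: global index == rstart[rank] + offset */
static PetscInt XXGlobalFromRankOffset(const PetscInt *owners,PetscMPIInt rank,PetscInt offset)
{
  return owners[rank] + offset;
}

/* global -> rank/offset: binary search over the ownership ranges,
   O(log size) per query instead of a linear scan */
static PetscErrorCode XXRankOffsetFromGlobal(const PetscInt *owners,PetscMPIInt size,PetscInt idx,PetscMPIInt *rank,PetscInt *offset)
{
  PetscMPIInt lo = 0,hi = size;

  if (idx < owners[0] || idx >= owners[size]) SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_ARG_OUTOFRANGE,"Global index %D not owned by any process",idx);
  while (hi - lo > 1) {                  /* invariant: owners[lo] <= idx < owners[hi] */
    PetscMPIInt mid = lo + (hi - lo)/2;
    if (idx < owners[mid]) hi = mid;
    else                   lo = mid;
  }
  *rank   = lo;
  *offset = idx - owners[lo];
  return 0;
}

   For batches of queries (as in VecScatterCreate_PtoS) one would instead
sort the incoming indices once and merge them against the ranges, which is
what the quoted loop effectively does when its input happens to be sorted.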