As you said, for 10 sparse axes in J64 each datum requires 80 bytes for the indices. If you encode the indices as columns in a literal matrix, at one byte per index, then the indices for each datum require 10 bytes. The computations would be even simpler if you also encode each of the "high cardinality dimensions" (400, too many values for a single byte) as 2 columns in this literal matrix.
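A minimal sketch of the encoding on toy data, not a tested implementation. The names idx, sel, keys and sums and the data itself are hypothetical; I and v are the literal index matrix and the value vector. It assumes every small dimension has cardinality below 256 and stores the 400-way dimension as two base-256 digits.

```j
NB. Hypothetical toy data: 4 observations, 3 dimensions.
NB. Columns 0 and 1 have small cardinality; column 2 has cardinality 400.
idx =: 4 3 $ 0 5 399  1 5 20  0 7 399  1 5 20
v   =: 10 20 30 40

NB. One byte per small dimension, two bytes (base-256 digits) for the 400-way one.
I =: a. {~ (2 {."1 idx) ,. 0 256 #: 2 {"1 idx

NB. Selection: values of the observations whose third dimension is 399.
mask =: (2 3 {"1 I) -:"1 a. {~ 0 256 #: 399
sel  =: mask # v

NB. Summation with key: total v over each distinct pair of the first two dimensions.
keys =: a. i. ~. 2 {."1 I     NB. the distinct pairs, decoded back to integers
sums =: (2 {."1 I) +//. v     NB. one total per row of keys
```

With 16 index bytes per observation instead of 80, the index data for 10 to 50 million observations would come to roughly 160 to 800 MB per period instead of 800 to 4000 MB.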
That is, you'd have a literal matrix I with 16=+/ 10 3#1 2 columns, and a corresponding vector v of the numeric values. (Each a mapped file?)

Both selection and summation are not too complicated, with the latter depending heavily on the key adverb (/.). The complications are not too bad compared to the alternatives.

----- Original Message -----
From: david alis <[email protected]>
Date: Sunday, July 3, 2011 6:35
Subject: [Jprogramming] Sparse arrays with 13 dimensions
To: Programming forum <[email protected]>

> Does anyone have advice about how to tackle a problem where sparse
> arrays would be a good implementation in principle, but not in
> practice?
>
> This particular problem comes from a group of colleagues that
> compiles statistics.
> In the proposed data model there will be about 13 dimensions.
>
> The cardinality of three of these dimensions is around 400.
> These dimensions represent countries - either individually or grouped.
> The remaining dimensions have cardinalities of between 3 and 30.
>
> The data is very sparse - probably only 3 dimensions will be dense.
> None of the high cardinality dimensions are dense.
>
> Time period (i.e. year and month) is an additional dimension but
> does not present an issue because data for each period can quite
> naturally be held in its own file.
>
> The types of operations are simple -
> (i) storage and retrieval of selections for display in Excel etc
> (ii) totaling and subtotaling up most of the dimensions (e.g.
> aggregating countries).
>
> A J-sparse array implementation would have 10 sparse axes.
> This means that for every observation there would be 10 extra
> numbers (i.e. integers).
> i.e. for each 8 bytes of useful data there needs to be 80 bytes
> of support (J64).
>
> The problem comes from the fact that for each period there may be
> between 10 and 50 million observations.
>
> Assuming that each element in the index array for a sparse noun
> uses 8 bytes then this implies a memory requirement of 800 -
> 4000 Mb for each period.
>
> If it's really true that an element for each index in each
> sparse dimension needs 8 bytes then the sparse implementation is
> quite inefficient.
>
> A way around this could be to combine several dimensions
> using #. and #: (something old-time APL programmers did
> using code and decode).
>
> Using this trick the number of sparse dimensions could be
> reduced to 3 or 4.
> While this would reduce space requirements it introduces lots of
> complexity.
> As things stand, sparse arrays are not supported by mapped nouns.
>
> Given that the source is now available, how practical would it
> be to implement mapped noun support for sparse arrays? And if it
> was, are we talking days or months?

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
