As you said, for 10 sparse axes in J64 each datum
requires 80 bytes for the indices.  If you encode the
indices as columns in a literal matrix, then the indices
for each datum require only 10 bytes.  The computations
would be even simpler if you also encode each of
the "high cardinality dimensions" (cardinality ~400) as 2 columns
in this literal matrix.
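
For illustration, a sketch of that byte encoding (the particular indices and cardinalities here are invented):

```j
NB. sketch: a small index (cardinality < 256) fits in one byte
idx =: 17 3 29               NB. hypothetical indices along low-cardinality axes
b1  =: a. {~ idx             NB. one literal (byte) per index
idx -: a. i. b1              NB. decoding recovers the indices

NB. a high-cardinality dimension (~400) needs 2 bytes
i   =: 397
b2  =: a. {~ 256 #: i        NB. high byte, low byte
i -: 256 #. a. i. b2         NB. round trip via base-256 decode
```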

That is, you'd have a literal matrix I with 16=+/ 10 3#1 2
columns, and a corresponding vector v of the numeric
values.  (Each a mapped file?)  Neither selection nor
summation is too complicated, with the latter
depending heavily on the key adverb (/.).  The complications
are not too bad compared to the alternatives.
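
A sketch of both operations on that layout, with invented stand-in data (the real I and v would presumably be mapped files):

```j
NB. toy stand-ins: 1000 key rows over a tiny byte alphabet, and values
I =: a. {~ ? 1000 16 $ 4     NB. n by 16 literal matrix of packed indices
v =: ? 1000 # 0              NB. matching vector of numeric values

NB. selection: observations whose first column holds byte code 2
sel =: v #~ 2 = a. i. 0 {"1 I

NB. subtotaling over one dimension: drop its column, then sum by key
k      =: }."1 I             NB. keys with the first column removed
totals =: k +//. v           NB. sum of v for each distinct remaining key
keys   =: ~. k               NB. the distinct keys, in matching order
```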



----- Original Message -----
From: david alis <[email protected]>
Date: Sunday, July 3, 2011 6:35
Subject: [Jprogramming] Sparse arrays with 13 dimensions
To: Programming forum <[email protected]>

> Does anyone have advice about how to tackle a problem where 
> sparse arrays
> would be a good implementation in principle, but not in practice?
> 
> This particular problem comes from a group of colleagues that compiles
> statistics.
> In the proposed data model there will be about 13 dimensions.
> 
> The cardinalities of three of these dimensions is around 400.
> These dimensions represent countries - either individually or grouped.
> The remaining dimensions have cardinalities of between 3 and 30.
> 
> The data is very sparse - probably only 3 dimensions will be dense.
> None of the high cardinality dimensions are dense.
> 
> Time period (i.e. year and month) is an additional dimension but
> does not
> present an issue because data for each period can quite naturally
> be held in its own file.
> 
> 
> The types of operations are simple -
> (i) storage and retrieval of selections for display in Excel etc
> (ii) totaling and subtotaling up most of the dimensions (e.g.
> aggregating countries).
> 
> A J-sparse array implementation would have 10 sparse axes.
> This means that for every observation there would be 10 extra
> numbers (i.e. integers),
> i.e. for each 8 bytes of useful data there need to be 80 bytes
> of support (J64).
> 
> The problem comes from the fact that for each period there may be
> between 10 and 50 million observations.
> 
> Assuming that each element in the index array for a sparse noun
> uses 8 bytes, then this implies a memory requirement of
> 800-4000 MB for each period.
> 
> If it's really true that an element for each index in each 
> sparse dimension
> needs 8 bytes then the sparse implementation is quite inefficient.
> 
> A way around this could be to combine several dimensions
> using #. and #: (something old-time APL programmers did
> using code and decode).
> 
> Using this trick the number of sparse dimensions could be 
> reduced to 3 or 4.
> While this would reduce space requirements it introduces lots of 
> complexity.
> As things stand, sparse arrays are not supported by mapped nouns.
> 
> Given that the source is now available, how practical would it 
> be to
> implement
> mapped noun support for sparse arrays? And if it was, are we 
> talking days or
> months?

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm