Re: [Jgeneral] Compressing a data set

Mike Day Sat, 09 Jun 2007 03:46:35 -0700

Two minor variations on Raul's approach avoid the
gerund, and the second also allows a choice of
key column.


(Apologies for my gratuitous line-throws, as usual)

The kernel of the approach is the same as Raul's:

  ({."1  </.]) raw   NB. partition by column 0

+---------+---------+
|+--+--+-+|+--+--+-+|
||t1|f1|5|||t2|f7|7||
|+--+--+-+|+--+--+-+|
||t1|f2|5|||t2|f8|7||
|+--+--+-+|+--+--+-+|
||t1|f3|5||         |
|+--+--+-+|         |
+---------+---------+
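For readers less familiar with J: the dyad `x u/. y` (key) applies `u` to each group of items of `y` that share the same value in `x`, with groups taken in order of first occurrence. Here is a rough Python analogue of the partition step above. It is only a sketch: it assumes the table is a list of tuples, and `key_groups` is a hypothetical helper, not part of J.

```python
def key_groups(keys, rows):
    """Partition rows by the matching key, with groups in order of first
    appearance of each key (roughly what J's key adverb /. does with <)."""
    groups = {}  # dicts preserve insertion order in Python 3.7+
    for k, row in zip(keys, rows):
        groups.setdefault(k, []).append(row)
    return list(groups.values())


raw = [("t1", "f1", 5), ("t1", "f2", 5), ("t1", "f3", 5),
       ("t2", "f7", 7), ("t2", "f8", 7)]

# Key on column 0, as ({."1 </. ]) raw does
parts = key_groups([row[0] for row in raw], raw)
for part in parts:
    print(part)
```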

Reduce each column of each sub-table to its unique elements:

  ({."1  <@~."1 @|:/.]) raw

+----+----------+---+
|+--+|+--+--+--+|+-+|
||t1|||f1|f2|f3|||5||
|+--+|+--+--+--+|+-+|
+----+----------+---+
|+--+|+--+--+   |+-+|
||t2|||f7|f8|   ||7||
|+--+|+--+--+   |+-+|
+----+----------+---+
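In the same loose Python terms, `|:` turns each sub-table's columns into lists, and `~."1` keeps the distinct values of each one in order of first occurrence. The sketch below assumes a list-of-tuples table; `compress` is a hypothetical name used only for illustration.

```python
def compress(rows):
    """Group rows on column 0, then reduce each column of each group to its
    distinct values, in order of first occurrence (like <@~."1@|: under /.)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    result = []
    for part in groups.values():
        columns = zip(*part)  # transpose the sub-table, like |:
        # dict.fromkeys drops repeats but keeps first-seen order, like ~.
        result.append([list(dict.fromkeys(col)) for col in columns])
    return result


raw = [("t1", "f1", 5), ("t1", "f2", 5), ("t1", "f3", 5),
       ("t2", "f7", 7), ("t2", "f8", 7)]

for row in compress(raw):
    print(row)
# [['t1'], ['f1', 'f2', 'f3'], [5]]
# [['t2'], ['f7', 'f8'], [7]]
```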

This is similar to your specification, except
that all elements are boxed.

The following form might be more useful:

  ]S:0 each ({."1  <@~."1 @|:/.]) raw

+--+--+-+
|t1|f1|5|
|  |f2| |
|  |f3| |
+--+--+-+
|t2|f7|7|
|  |f8| |
+--+--+-+

A small change allows you to choose your key:

  0 ( {"1 <@~."1@|:/. ]) raw  NB. with column 0

+----+----------+---+
|+--+|+--+--+--+|+-+|
||t1|||f1|f2|f3|||5||
|+--+|+--+--+--+|+-+|
+----+----------+---+
|+--+|+--+--+   |+-+|
||t2|||f7|f8|   ||7||
|+--+|+--+--+   |+-+|
+----+----------+---+
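The left argument works because, in the fork `x ({"1 u/. ]) y`, the phrase `x {"1 y` picks column `x` from every row to serve as the key. A Python sketch can generalize the same way with a hypothetical `key_col` parameter (again an illustration, not a J API):

```python
def compress(rows, key_col=0):
    """Group rows on column key_col, then reduce every column of each group to
    its distinct values in first-seen order (like x ({"1 <@~."1@|:/. ]) y)."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key_col], []).append(row)
    return [[list(dict.fromkeys(col)) for col in zip(*part)]
            for part in groups.values()]


raw = [("t1", "f1", 5), ("t1", "f2", 5), ("t1", "f3", 5),
       ("t2", "f7", 7), ("t2", "f8", 7)]

by_col0 = compress(raw, 0)  # two groups: t1 and t2
by_col1 = compress(raw, 1)  # column 1 values are all distinct: five one-row groups
print(by_col0)
print(by_col1)
```

Keying on column 1 leaves every group with a single row, because no value repeats in that column.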

An index may also be used with Raul's method:

  0({"1 {.`<`{.&spread@|:/. ])raw

+--+----------+-+
|t1|+--+--+--+|5|
|  ||f1|f2|f3|| |
|  |+--+--+--+| |
+--+----------+-+
|t2|+--+--+   |7|
|  ||f7|f8|   | |
|  |+--+--+   | |
+--+----------+-+

Again, we can tidy things up a bit:

  ]S:0 each 0 ( {"1 <@~."1@|:/. ]) raw

+--+--+-+
|t1|f1|5|
|  |f2| |
|  |f3| |
+--+--+-+
|t2|f7|7|
|  |f8| |
+--+--+-+

  ]S:0 each 1({"1 <@~."1@|:/.])raw  NB. column 1 !!!

+--+--+-+
|t1|f1|5|
+--+--+-+
|t1|f2|5|
+--+--+-+
|t1|f3|5|
+--+--+-+
|t2|f7|7|
+--+--+-+
|t2|f8|7|
+--+--+-+

All this loses any association between elements
in columns 1 & 2, e.g. if

  rawa=:('t1 t1 t1 t2 t2',.&;:'f1 f2 f3 f7 f8'),.]&.>5 5 6 7 7

  ]S:0 each ({."1  <@~."1 @|:/.]) rawa

+--+--+---+
|t1|f1|5 6|
|  |f2|   |
|  |f3|   |
+--+--+---+
|t2|f7|7  |
|  |f8|   |
+--+--+---+
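The same loss is easy to reproduce in a Python sketch (the `compress` helper is hypothetical, not J): once each column is reduced to its distinct values independently, nothing records which of 5 and 6 went with which of f1, f2, f3.

```python
def compress(rows, key_col=0):
    """Group on key_col; reduce each column of each group to distinct values."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key_col], []).append(row)
    return [[list(dict.fromkeys(col)) for col in zip(*part)]
            for part in groups.values()]


# Same as raw, except the t1/f3 row carries 6 instead of 5 (mirrors rawa)
rawa = [("t1", "f1", 5), ("t1", "f2", 5), ("t1", "f3", 6),
        ("t2", "f7", 7), ("t2", "f8", 7)]

t1 = compress(rawa)[0]
print(t1)  # [['t1'], ['f1', 'f2', 'f3'], [5, 6]]

# Rebuilding rows from the compressed columns yields every combination,
# not the three rows that were actually present for t1
rebuilt = [(k, f, v) for k in t1[0] for f in t1[1] for v in t1[2]]
print(len(rebuilt))  # 6
```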

... so you've lost the information that 5 goes with f1 and f2 but not f3,
and that 6 goes with f3 but not f1 or f2. Your opening remarks suggest
this isn't a problem, but it could be for more general data.

I haven't explored timings; the transpose (|:) might be expensive
for large sub-tables.

Mike



Raul Miller wrote:

On 6/8/07, Terrence Brannon <[EMAIL PROTECTED]> wrote:

't1' 'f1' 5
't1' 'f2' 5
't1' 'f3' 5
't2' 'f7' 7
't2' 'f8' 7

...

Let's say that you want the compression of this data to be searchable by the first field, hence you want a list in which each item is 3 boxes like so:
't1' ; (<('f1';'f2';'f3')) ; 5
't2' ; (<('f7';'f8')) ; 7


You have not specified the original format of the data, but let's
say it's a boxed two dimensional array, with as many columns
as sql columns and as many rows as sql rows.
  raw=:('t1 t1 t1 t2 t2',.&;:'f1 f2 f3 f7 f8'),.]&.>5 5 5 7 7

First off, observe that you can group things the way you want using a
/. dyad:
  ({."1 </. ])raw

Now we just need a verb to rearrange things the way you want.
  spread=:4 :'x`:6 y'"0 _1
  ({."1 {.`<`{.&spread@|:/. ])raw
+--+----------+-+
|t1|+--+--+--+|5|
|  ||f1|f2|f3|| |
|  |+--+--+--+| |
+--+----------+-+
|t2|+--+--+   |7|
|  ||f7|f8|   | |
|  |+--+--+   | |
+--+----------+-+
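`spread` pairs each verb of the gerund on its left with the matching row on its right (rank `0 _1`) and applies it, so `{.`<`{.` takes the first key, boxes the whole list of f-values, and takes the first number. A loose Python analogue follows; `spread`, `head`, `whole` and `compress_spread` are hypothetical helpers written for this illustration.

```python
def spread(funcs, columns):
    """Apply funcs[i] to columns[i], like a gerund applied at rank 0 _1."""
    return [f(col) for f, col in zip(funcs, columns)]


def head(col):
    return col[0]      # like {. : first item only


def whole(col):
    return list(col)   # like < : keep the entire column as one item


def compress_spread(rows, key_col=0):
    groups = {}
    for row in rows:
        groups.setdefault(row[key_col], []).append(row)
    # Transpose each group, then apply head / whole / head column by column
    return [spread([head, whole, head], list(zip(*part)))
            for part in groups.values()]


raw = [("t1", "f1", 5), ("t1", "f2", 5), ("t1", "f3", 5),
       ("t2", "f7", 7), ("t2", "f8", 7)]

for row in compress_spread(raw):
    print(row)
# ['t1', ['f1', 'f2', 'f3'], 5]
# ['t2', ['f7', 'f8'], 7]
```

Like `{.`, `head` keeps only the first value, so on data where the third column varies within a group this variant silently reports just the first value.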

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
