Re: [HACKERS] Aggregate rollup

2003-03-06 Thread Merlin Moncure
 -----Original Message-----
 From: mlw [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, March 05, 2003 3:47 PM
 To: [EMAIL PROTECTED]
 Subject: [HACKERS] Aggregate rollup
 
 I had written a piece of code about two years ago that used the
 aggregate feature of PostgreSQL to create an array of integers from an
 aggregate, as:
 
 select int_array_aggregate( column ) from table group by column
 

Do I understand correctly that this still follows the normal rules for
grouping, so that only like values are put in the array?

Example: column has values 1,1,1,2,2 spread over 5 rows.
Your query returns two rows with row1={1,1,1} and row2 = {2,2}...is this
correct?

Also, what if your aggregate column is different from the group column: 
Table t with columns c1, c2 with 5 rows:
C1 C2
1, 1
1, 2
1, 3
2, 1
2, 2

Does select C1, int_array_aggregate( C2 ) from table group by C1 return

1, {1, 2, 3}
2, {1, 2}
??
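
A minimal sketch of that second case. It assumes a PostgreSQL release with the built-in array_agg aggregate (8.4 or later), used here as a stand-in for int_array_aggregate from contrib/intagg. The table name t and the columns c1 and c2 follow the example above; the alias c2_values is made up for illustration.

```sql
-- Sketch only: array_agg stands in for int_array_aggregate here.
CREATE TABLE t (c1 integer, c2 integer);

INSERT INTO t (c1, c2) VALUES
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 2);

-- One output row per distinct c1. The ORDER BY inside the aggregate
-- (9.0 or later) makes the element order deterministic.
SELECT c1, array_agg(c2 ORDER BY c2) AS c2_values
FROM t
GROUP BY c1
ORDER BY c1;
```

Under those assumptions, normal GROUP BY rules apply: each array holds only the c2 values that share a c1, so the result has one row per c1 value.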

FWIW, I think that's a pretty cool function.  This allows the backend to
telescope 1 dimension (only) out of a dataset, the most detailed one.
In certain situations with large datasets over slow connections, this
could be a big payoff.

Also, all this talk about XML has got me thinking about how to allow
basic query features to provide simple nesting services.  Consider:

select C1, C2 from t for xml;  returns:
<t>
<C1>1</C1><C2>1</C2>
<C1>1</C1><C2>2</C2>
<C1>1</C1><C2>3</C2>
<C1>2</C1><C2>1</C2>
<C1>2</C1><C2>2</C2>
</t>

select C1, xml_aggregate(C2) from t for xml; returns:
<t>
<C1 value=1><C2>1</C2><C2>2</C2><C2>3</C2></C1>
<C1 value=2><C2>1</C2><C2>2</C2></C1>
</t>
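
A rough sketch of how this kind of nesting can be approximated with the SQL/XML functions that later PostgreSQL releases provide (xmlelement, xmlattributes, xmlagg). This is not the "for xml" syntax proposed above. It assumes a table t(c1 integer, c2 integer) holding the five example rows and a server built with XML support; the alias nested is made up for illustration.

```sql
-- Sketch only: assumes table t(c1 integer, c2 integer) from the example
-- above and a PostgreSQL server built with XML support.
SELECT xmlelement(
           name "C1",
           xmlattributes(c1 AS "value"),
           -- Collect one <C2> element per row in the group.
           xmlagg(xmlelement(name "C2", c2) ORDER BY c2)
       ) AS nested
FROM t
GROUP BY c1
ORDER BY c1;
```

Each output row is one C1 element carrying its C2 children. The outer wrapper element from the example is not produced by this query.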

 create table fast_lookup as select reference,
 int_array_aggregate(result) from table group by result
 
 The question is, would a more comprehensive solution be wanted?
 Possible? Something like:
 
 
 Any thoughts? I think I need to fix the code in the current
 /contrib/intagg anyway, so is it worth doing the extra work to include
 multiple data types?

Yes.

Just a thought.
Merlin


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Aggregate rollup

2003-03-06 Thread mlw


Merlin Moncure wrote:

-----Original Message-----
From: mlw [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 05, 2003 3:47 PM
To: [EMAIL PROTECTED]
Subject: [HACKERS] Aggregate rollup
I had written a piece of code about two years ago that used the
aggregate feature of PostgreSQL to create an array of integers from an
aggregate, as:
select int_array_aggregate( column ) from table group by column

   

Do I understand correctly that this still follows the normal rules for
grouping, so that only like values are put in the array?
Example: column has values 1,1,1,2,2 spread over 5 rows.
Your query returns two rows with row1={1,1,1} and row2 = {2,2}...is this
correct?

Actually, it is more intended to put all the entries in a one-to-many
table in a single column, as:

create table classic_one_to_many
(
   leftside integer,
   rightside integer
);
create table fast_lookup as select leftside, 
int_array_aggregate(rightside) from classic_one_to_many group by leftside;
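
A small end-to-end sketch of that pattern. It assumes the built-in array_agg (PostgreSQL 8.4 or later) in place of int_array_aggregate. The sample rows and the column alias rightsides are made up for illustration.

```sql
-- Sketch only: array_agg stands in for int_array_aggregate, and the
-- alias rightsides is a name chosen for this illustration.
CREATE TABLE classic_one_to_many (
    leftside  integer,
    rightside integer
);

INSERT INTO classic_one_to_many (leftside, rightside) VALUES
    (1, 10), (1, 11), (2, 20);

CREATE TABLE fast_lookup AS
SELECT leftside, array_agg(rightside) AS rightsides
FROM classic_one_to_many
GROUP BY leftside;

-- A single row now carries every rightside for a given leftside.
SELECT rightsides FROM fast_lookup WHERE leftside = 1;

-- unnest (8.4 or later) expands the array back into rows when needed.
SELECT leftside, unnest(rightsides) AS rightside
FROM fast_lookup
WHERE leftside = 1;
```

The lookup reads one row per leftside instead of one row per pair, which is where the saving comes from when many rows share the same leftside.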





Re: [HACKERS] Aggregate rollup

2003-03-05 Thread Joe Conway
mlw wrote:
I had written a piece of code about two years ago that used the 
aggregate feature of PostgreSQL to create an array of integers from an 
aggregate, as:

select int_array_aggregate( column ) from table group by column

While it seems pointless to create an array on a select, it has a 
purpose in OLAP. For instance, suppose you do this:

create table fast_lookup as select reference, 
int_array_aggregate(result) from table group by result

The fast_lookup table now has all the result entries as an array in a 
single row. In the systems that I have used this, it has provided a 
dramatic improvement, especially when you have a high number of 
identical reference entries in a classic one to many table.

The question is, would a more comprehensive solution be wanted? 
Possible? Something like:

create table fast_lookup as select reference, aggregate_array( field ) 
from table group by field

Where the function aggregate_array takes any number of data types.

Any thoughts? I think I need to fix the code in the current 
/contrib/intagg anyway, so is it worth doing the extra work to include 
multiple data types?

It's also useful in conjunction with statistical processing. There is 
an array_accum function in PL/R; I just made a post to the SQL list the 
other day on this.
(http://archives.postgresql.org/pgsql-sql/2003-03/msg00124.php)
Here's the meat of it:

CREATE OR REPLACE FUNCTION array_accum (_name, name)
RETURNS name[]
AS '$libdir/plr','array_accum'
LANGUAGE 'C';
CREATE AGGREGATE accumulate (
  sfunc = array_accum,
  basetype = name,
  stype = _name
);
regression=# SELECT accumulate(tablename) as cruft FROM pg_tables WHERE 
tablename LIKE 'c%';
cruft
---
{connectby_int,connectby_text,ct,cth}
(1 row)
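
For comparison, a sketch of a type-generic accumulator that needs no C code, built on the array_append function. The aggregate name array_accum_any is hypothetical. The anycompatible pseudo-types assume PostgreSQL 14 or later; earlier releases with polymorphic aggregates would declare anyelement and anyarray instead.

```sql
-- Sketch only: array_accum_any is a hypothetical name. The
-- anycompatible pseudo-types assume PostgreSQL 14 or later; earlier
-- releases would declare anyelement / anyarray instead.
CREATE AGGREGATE array_accum_any (anycompatible) (
    sfunc    = array_append,        -- appends each input value to the state
    stype    = anycompatiblearray,  -- state is an array of the input type
    initcond = '{}'                 -- start from an empty array
);

-- Same query shape as above, now independent of the element type.
SELECT array_accum_any(tablename) AS cruft
FROM pg_tables
WHERE tablename LIKE 'c%';
```

Because the state type follows the input type, one definition covers integer, text, name, and so on, which is the multi-type behavior asked about earlier in the thread.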

See:
  http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html
and download at:
  http://www.joeconway.com/plr/
I'd be happy to split the array functions out of PL/R and submit them to 
PATCHES if there is any interest.

Joe



Re: [HACKERS] Aggregate rollup

2003-03-05 Thread Greg Stark

mlw [EMAIL PROTECTED] writes:

 I had written a piece of code about two years ago that used the aggregate
 feature of PostgreSQL to create an array of integers from an aggregate, as:
 
 select int_array_aggregate( column ) from table group by column

I found this and am using it extensively. It's extremely helpful, thank you.

It goes well with either the *= operators in contrib/array or the gist
indexing in contrib/intarray.
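
A sketch of the intarray combination. It assumes the intarray module is installed as an extension (CREATE EXTENSION, PostgreSQL 9.1 or later) and a table fast_lookup(leftside integer, rightsides integer[]). The table layout and the index name are illustrative, not taken from the thread.

```sql
-- Sketch only: assumes the intarray extension is available and a table
-- fast_lookup(leftside integer, rightsides integer[]); names are
-- illustrative.
CREATE EXTENSION IF NOT EXISTS intarray;

-- GiST index using the intarray operator class for small integer sets.
CREATE INDEX fast_lookup_rightsides_idx
    ON fast_lookup USING gist (rightsides gist__int_ops);

-- Rows whose array contains the value 11.
SELECT leftside FROM fast_lookup WHERE rightsides @> ARRAY[11];

-- Rows whose array shares at least one element with {10, 20}.
SELECT leftside FROM fast_lookup WHERE rightsides && ARRAY[10, 20];
```

Whether the planner actually uses the index depends on its selectivity estimates for these operators, which is the weak spot described in the next paragraph.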

One problem I've found though is that the optimizer doesn't have any good
statistics for estimating the number of matches of such operators. It seems
like fixing that would require a lot of changes to the statistics gathered.

 create table fast_lookup as select reference, aggregate_array( field ) from
 table group by field
 
 Where the function aggregate_array takes any number of data types.

Sure, that seems logical. Actually I already bumped into a situation where I
wanted an array of char(1). I just kludged it to use ascii() of that first
character, but it would be cleaner and perhaps better for unicode later to use
the actual character.

Someone else on the list already asked for a function that gave an array of
varchar. I think they were pointed at a general purpose function from plr.

--
greg

