Re: [HACKERS] an aggregate array function

2003-07-29 Thread Merlin Moncure
Joe Conway wrote:
> > Do you think there would be any use for an aggregate which returns an
> > array of the aggregated (usually simple) type?
>
> What exactly have you looked at? In current cvs there is array_append
> and array_cat. There *was* array_accum, but that was yanked due to an
> objection that it was redundant with the other two.

Actually, I was looking at array_set, which has a provision to grow a 1d
array.  I was looking at the 7.3.2 sources, so it's pretty clear I would
have to look at something newer.  I'll wait for 7.4.

> BTW, I tried array_accum() (which is not really much different than
> array_append()) with groups of about 10,000 elements and feeding the
> array into a plr final function for a mean calculation. It was for sure
> slow compared to a native AVG() aggregate, but it wasn't that bad
> either. I don't remember the specifics, but it would be easy enough to
> try it out for yourself.

Well, if dynamic growth is expected, there are some easy optimizations
that could reduce the time spent reallocating the array, such as growing
the allocation geometrically rather than once per element.  If there is
a detectable difference vs. the avg() function, which performs zero
reallocations, it's probably worthwhile.

Based on what I saw with the older source, I assumed there was little or
no dynamic growth in normal use.

> No one is currently working on it that I'm aware of, but I was
> considering working on it for 7.5

By all means!  What do you think about the other question, about an
'array creating aggregate' - is that a useful contribution?

Joe



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] an aggregate array function

2003-07-29 Thread Joe Conway
Merlin Moncure wrote:
> What do you think about the other question about an
> 'array creating aggregate', is that a useful contribution?
Hmm, either I'm not understanding you, or you're not understanding me ;-)

First, see contrib/intagg.
Second, the following works in 7.4devel:
-- create test data for polymorphic aggregates
create table t(f1 int, f2 float, f3 float);
insert into t values(1,11.1,21.1);
insert into t values(1,11.2,21.2);
insert into t values(1,11.3,21.3);
insert into t values(2,12.1,22.1);
insert into t values(2,12.2,22.2);
insert into t values(2,12.3,22.3);
insert into t values(3,13.1,23.1);
insert into t values(3,13.2,23.2);
CREATE AGGREGATE myagg1
(
  BASETYPE = float8,
  SFUNC = array_append,
  STYPE = float8[],
  INITCOND = '{}'
);
CREATE AGGREGATE myagg2
(
  BASETYPE = float8[],
  SFUNC = array_cat,
  STYPE = float8[],
  INITCOND = '{}'
);
regression=# select f1, myagg1(f2) from t group by f1;
 f1 |      myagg1
----+------------------
  3 | {13.1,13.2}
  2 | {12.1,12.2,12.3}
  1 | {11.1,11.2,11.3}
(3 rows)
regression=# select f1, myagg2(array[f2,f3]) from t group by f1;
 f1 |                myagg2
----+---------------------------------------
  3 | {{13.1,23.1},{13.2,23.2}}
  2 | {{12.1,22.1},{12.2,22.2},{12.3,22.3}}
  1 | {{11.1,21.1},{11.2,21.2},{11.3,21.3}}
(3 rows)
Joe



Re: [HACKERS] an aggregate array function

2003-07-29 Thread Dani Oderbolz
Merlin Moncure wrote:

> Dear hackers,
>
> Do you think there would be any use for an aggregate which returns an
> array of the aggregated (usually simple) type?  Has this already been
> done by anyone?  I looked at the source and noticed that for each
> inserted item, the array utility functions perform a deep copy of the
> entire array (plus a reallocation).  Normally, this is no big deal,
> but if executed in the query stage, it could be kind of slow.  I also
> noticed that support for null values inside arrays is an item on the
> todo list.  Is anybody currently working on this?
>
> Merlin

I have another question about this:
why does PostgreSQL support arrays in the first place?
For my taste, it's a very messy concept...
Regards,
Dani


Re: [HACKERS] an aggregate array function

2003-07-29 Thread Andrew Dunstan
It's in the SQL99 standard. There's nothing forcing you to use them - I
am a (possibly) old-fashioned data architect, so I never use them ;-)

SQL99 actually allows you to use more or less arbitrary composite types
as columns (although Pg currently doesn't) - many would argue that this
violates first normal form. OTOH there's probably a good case to be made
that all this is necessary to provide good support for object-relational
mappings, and other OO stuff.

andrew

Dani Oderbolz wrote:

> I have another question about this:
> why does PostgreSQL support arrays in the first place?
> For my taste, it's a very messy concept...
> Regards,
> Dani





Re: [HACKERS] an aggregate array function

2003-07-29 Thread Merlin Moncure
Andrew wrote:
> It's in the SQL99 standard. There's nothing forcing you to use them - I
> am a (possibly) old-fashioned data architect, so I never use them ;-)
>
> SQL99 actually allows you to use more or less arbitrary composite types
> as columns (although Pg currently doesn't) - many would argue that this
> violates first normal form.
[snip]

I would (strongly) disagree with your last statement: I believe the
opposite to be true. I do agree that they are a violation of the first
normal form when used as a mechanism for general data storage; however,
the intent here is for arrays to help get around SQL's difficulty
(largely due to the lack of recursive queries in Postgres, but also in a
more general way) in dealing with post-query related data.

Arrays (as a result of a query) help to enhance relational use of the
database by indirectly allowing a more natural method of storage, by
giving you more power to query the data.  The main problem is SQL's
general treatment of result sets as two-dimensional tables when in fact
they are just possible branch points in an n-dimensional space
(especially expressible by arrays in limited, but useful circumstances).

In other words, arrays for input = bad, arrays for output = not so bad.
When recursive queries become available, I'll probably use them instead
(I've never had the luxury), but in the meantime...

p.s.
Joe, you were right, I did misunderstand both you and Postgres's
capabilities at the present time.  The functionality in your example was
exactly what I was looking for.  I still hold to my point that if the
array is performing deep copies upon growth, there can be vast speed
improvements in cases (such as during array aggregation) when
aggressive growth can be predicted in advance.  The worst case of
'reallocate after each aggregation' can be particularly bad.  In any
case, I'll shut up now :)

Regards,
Merlin






Re: [HACKERS] an aggregate array function

2003-07-29 Thread Andrew Dunstan
well, (smile) I didn't say *I* saw violation of FNF as an objection. I 
think my statement is true - many would see it as a violation of FNF. 
Many others like you might argue differently.

I first got into this business nearly 20 years ago when I came to 
realise the severe limitations of the relational algebra. Nothing much 
has changed about that - it can still be very cumbersome, to say the 
least. So I'm not dogmatic about obeying some design law that some 
theorist has laid down. It's more like the English grammar rule about 
split infinitives - I understand why some people disagree with the rule 
(or deny it exists), but I still (almost) never use split infinitives 
myself - I just don't find the need.

*shrug* Never mind - I'll go back to wrestling with Ant now.

cheers

andrew

Merlin Moncure wrote:

> Andrew wrote:
>> It's in the SQL99 standard. There's nothing forcing you to use them - I
>> am a (possibly) old-fashioned data architect, so I never use them ;-)
>>
>> SQL99 actually allows you to use more or less arbitrary composite types
>> as columns (although Pg currently doesn't) - many would argue that this
>> violates first normal form.
> [snip]
>
> I would (strongly) disagree with your last statement: I believe the
> opposite to be true. I do agree that they are a violation of the first
> normal form when used as a mechanism for general data storage; however,
> the intent here is for arrays to help get around SQL's difficulty in
> dealing with post-query related data.
>
> [snip]
>
> In other words, arrays for input = bad, arrays for output = not so bad.
> When recursive queries become available, I'll probably use them instead
> (I've never had the luxury), but in the meantime...
>
> Regards,
> Merlin





Re: [HACKERS] an aggregate array function

2003-07-28 Thread Joe Conway
Merlin Moncure wrote:

> Dear hackers,
>
> Do you think there would be any use for an aggregate which returns an
> array of the aggregated (usually simple) type?  Has this already been
> done by anyone?  I looked at the source and noticed that for each
> inserted item, the array utility functions perform a deep copy of the
> entire array (plus a reallocation).  Normally, this is no big deal, but
> if executed in the query stage, it could be kind of slow.
What exactly have you looked at? In current cvs there is array_append 
and array_cat. There *was* array_accum, but that was yanked due to an 
objection that it was redundant with the other two.

There is a contrib (intagg) that avoids the deep copy by passing 
pointers as integers, but I don't think that would be accepted as a 
builtin solution. I've considered maybe using a hash table to register 
valid pointers, but have not thought too hard about it yet. In any case 
it won't happen for 7.4.

BTW, I tried array_accum() (which is not really much different than 
array_append()) with groups of about 10,000 elements and feeding the 
array into a plr final function for a mean calculation. It was for sure 
slow compared to a native AVG() aggregate, but it wasn't that bad 
either. I don't remember the specifics, but it would be easy enough to 
try it out for yourself.

> I also noticed that support for null values inside arrays is an item on
> the todo list.  Is anybody currently working on this?
No one is currently working on it that I'm aware of, but I was 
considering working on it for 7.5

Joe


