Re: [HACKERS] proposal: array utility functions phase 1
Tom Lane wrote:
> It seems like somehow we need a level of FROM/WHERE producing some base
> rows, and then a set of table function calls to apply to each of the
> base rows, and then another level of WHERE to filter the results of the
> function calls (in particular to provide join conditions to identify
> which rows to match up in the function outputs). I don't see any way to
> do this without inventing new SELECT clauses out of whole cloth ...
> unless SQL99's WITH clause helps, but I don't think it does ...

Well, maybe this is a start. It allows a table function's input
parameter to be declared with setof. The changes involved are primarily:

1) a big loop in ExecMakeTableFunctionResult so that functions with
   set-returning arguments get called for each row of the argument, and
2) always initializing the tuplestore in ExecMakeTableFunctionResult and
   passing that to the function, even when SFRM_Materialize mode is used.

The result looks like:

create table foot(f1 text, f2 text);
insert into foot values('a','b');
insert into foot values('c','d');
insert into foot values('e','f');

create or replace function test2() returns setof foot as
  'select * from foot order by 1 asc' language 'sql';
create or replace function test(setof foot) returns foot as
  'select $1.f1, $1.f2' language 'sql';

regression=# select * from test(test2());
 f1 | f2
----+----
 a  | b
 c  | d
 e  | f
(3 rows)

I know it doesn't solve all the issues discussed, but is it a step
forward? Suggestions?

Joe

Index: contrib/tablefunc/tablefunc.c
===================================================================
RCS file: /opt/src/cvs/pgsql-server/contrib/tablefunc/tablefunc.c,v
retrieving revision 1.11
diff -c -r1.11 tablefunc.c
*** contrib/tablefunc/tablefunc.c	23 Nov 2002 01:54:09 -	1.11
--- contrib/tablefunc/tablefunc.c	11 Dec 2002 22:07:01 -
***************
*** 53,59 ****
  				int max_depth,
  				bool show_branch,
  				MemoryContext per_query_ctx,
! 				AttInMetadata *attinmeta);
  static Tuplestorestate *build_tuplestore_recursively(char *key_fld,
  				char *parent_key_fld,
  				char *relname,
--- 53,60 ----
  				int max_depth,
  				bool show_branch,
  				MemoryContext per_query_ctx,
! 				AttInMetadata *attinmeta,
! 				Tuplestorestate *tupstore);
  static Tuplestorestate *build_tuplestore_recursively(char *key_fld,
  				char *parent_key_fld,
  				char *relname,
***************
*** 641,646 ****
--- 642,648 ----
  	char	   *branch_delim = NULL;
  	bool		show_branch = false;
  	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ 	Tuplestorestate *tupstore;
  	TupleDesc	tupdesc;
  	AttInMetadata *attinmeta;
  	MemoryContext per_query_ctx;
***************
*** 673,678 ****
--- 675,681 ----
  				"allowed in this context");

  	/* OK, go to work */
+ 	tupstore = rsinfo->setResult;
  	rsinfo->returnMode = SFRM_Materialize;
  	rsinfo->setResult = connectby(relname,
  				key_fld,
***************
*** 682,688 ****
  				max_depth,
  				show_branch,
  				per_query_ctx,
! 				attinmeta);
  	rsinfo->setDesc = tupdesc;
  	MemoryContextSwitchTo(oldcontext);
--- 685,692 ----
  				max_depth,
  				show_branch,
  				per_query_ctx,
! 				attinmeta,
! 				tupstore);
  	rsinfo->setDesc = tupdesc;
  	MemoryContextSwitchTo(oldcontext);
***************
*** 709,732 ****
  				int max_depth,
  				bool show_branch,
  				MemoryContext per_query_ctx,
! 				AttInMetadata *attinmeta)
  {
- 	Tuplestorestate *tupstore = NULL;
  	int		ret;
- 	MemoryContext oldcontext;

  	/* Connect to SPI manager */
  	if ((ret = SPI_connect()) < 0)
  		elog(ERROR, "connectby: SPI_connect returned %d", ret);

- 	/* switch to longer term context to create the tuple store */
- 	oldcontext = MemoryContextSwitchTo(per_query_ctx);
-
- 	/* initialize our tuplestore */
- 	tupstore = tuplestore_begin_heap(tru
Re: [HACKERS] proposal: array utility functions phase 1
Tom Lane wrote:
> This seems fairly unworkable to me as-is. By definition, WHERE selects
> from a cross-product of the FROM tables; to make the above do what you
> want, you'd have to break that fundamental semantics. The semantics of
> explicit JOIN cases would be broken too. What we need is some kind of
> explicit multi-level SELECT operation. Perhaps it would help to think
> about the analogy of aggregates of aggregate functions, which are
> impossible to express properly in a single SELECT but work nicely
> given subselect-in-FROM. Subselect-in-FROM doesn't seem to get this
> job done though. Right offhand I don't see any reasonable syntax other
> than function-in-the-SELECT-list, which shoots us right back into the
> messinesses of the Berkeley implementation. However, we do now have
> the precedent of the table-function AS clause. Does it help any to do
> something like
>
>   SELECT grosysid, array_values(grolist) AS (array_index, member_id)
>   FROM pg_group where groname = 'g2';

After further thought, and ignoring the difficulty of implementation,
what seems ideal is to be able to specify a 'setof' row source as an
input to the function, and fire the function once for each row of the
input. Basically, allow anything that now qualifies as a FROM item -- a
table reference, a subselect with AS clause, another table function, or
maybe even a join clause. Some (totally contrived) examples of how it
would look:

create table foo1(f1 int, f2 text);
insert into foo1 values(1,'a');
insert into foo1 values(2,'b');
insert into foo1 values(3,'c');

create table foo2(f1 int, f2 text);
insert into foo2 values(1,'w');
insert into foo2 values(1,'x');
insert into foo2 values(2,'y');
insert into foo2 values(2,'z');

create function funcfoo1(setof foo1) returns setof foo2 as
  'select * from foo2 where foo2.f1 = $1.f1' language 'sql';

select * from funcfoo1(foo1);
 f1 | f2
----+----
  1 | w
  1 | x
  2 | y
  2 | z

select * from funcfoo1((select * from foo1 where f1=1) as t);
 f1 | f2
----+----
  1 | w
  1 | x

What do you think?
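Under the proposed semantics, funcfoo1(foo1) would fire once per foo1 row and concatenate the results -- roughly this Python model (a sketch of the intended behavior, not an implementation; the names come from the contrived example above):

```python
# Model of a "setof"-input table function: the executor feeds it one
# row of the FROM-item argument at a time and unions the result sets.

foo1 = [(1, "a"), (2, "b"), (3, "c")]
foo2 = [(1, "w"), (1, "x"), (2, "y"), (2, "z")]

def funcfoo1(row):
    """select * from foo2 where foo2.f1 = $1.f1"""
    return [r for r in foo2 if r[0] == row[0]]

def call_with_setof(func, rows):
    out = []
    for row in rows:          # fire the function once per input row
        out.extend(func(row))
    return out

print(call_with_setof(funcfoo1, foo1))
# -> [(1, 'w'), (1, 'x'), (2, 'y'), (2, 'z')]
print(call_with_setof(funcfoo1, [r for r in foo1 if r[0] == 1]))
# -> [(1, 'w'), (1, 'x')]
```

The second call models the subselect-with-AS-clause variant: the input row set is filtered before the function ever runs.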
> (Again you could wrap this in an outer SELECT to transform the
> member_ids to member_names.)
>
> The real problem with the Berkeley approach shows up when you consider
> what happens with multiple table functions called in a single SELECT.
> The code we currently have produces the cross-product of the implied
> rows (or at least it tries to, I seem to recall that it doesn't
> necessarily get it right). That's pretty unpleasant, and though you
> can filter the rows in an outer SELECT, there's no way to optimize the
> implementation into a smarter-than-nested-loop join.

What if there were a way to declare that a table function returns
sorted results, and on which column(s)?

> It seems like somehow we need a level of FROM/WHERE producing some
> base rows, and then a set of table function calls to apply to each of
> the base rows, and then another level of WHERE to filter the results
> of the function calls (in particular to provide join conditions to
> identify which rows to match up in the function outputs). I don't see
> any way to do this without inventing new SELECT clauses out of whole
> cloth ... unless SQL99's WITH clause helps, but I don't think it does ...

Is this still needed given my approach above?

>> How ugly/difficult would it be to allow the planner to interrogate
>> the function and let the function report back a tupledesc based on
>> the actual runtime input parameters?
>
> Parse-time, not run-time. It could be done --- IIRC, the auxiliary
> "function info" call we introduced in the V1 fmgr protocol was
> deliberately designed to allow expansion in this sort of direction.
> But it would have to take a tupledesc (or some similar static
> description) and return another one.

Nice! I'll dig in to that a bit.

Thanks,

Joe

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/users-lounge/docs/faq.html
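If a function could declare its output sort order, the executor could pair up two function results with a merge join instead of filtering a cross-product. A toy illustration of that payoff (merge_join and the sample rows are invented for this sketch):

```python
# Two key-sorted inputs can be matched in one coordinated pass,
# instead of producing every combination and filtering afterwards.

def merge_join(left, right, key=lambda r: r[0]):
    """Pair rows with equal keys from two inputs sorted on that key."""
    out, j = [], 0
    for l in left:
        while j < len(right) and key(right[j]) < key(l):
            j += 1                      # skip right rows below this key
        k = j
        while k < len(right) and key(right[k]) == key(l):
            out.append((l, right[k]))   # emit each matching pair
            k += 1
    return out

left = [(1, "a"), (2, "b"), (3, "c")]   # sorted on column 0
right = [(1, "w"), (1, "x"), (2, "y")]  # sorted on column 0
print(merge_join(left, right))
# -> [((1, 'a'), (1, 'w')), ((1, 'a'), (1, 'x')), ((2, 'b'), (2, 'y'))]
```

The planner could only pick such a plan if it knew the functions' sort order in advance, which is exactly what the declaration would provide.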
Re: [HACKERS] proposal: array utility functions phase 1
Joe Conway <[EMAIL PROTECTED]> writes:
> Yes, this is exactly what I was yearning to do. Was there a spec or
> technical reason (or both) for not allowing the following?
>   select * from array_values(g.grolist), pg_group g where g.groname = 'g2';

This seems fairly unworkable to me as-is. By definition, WHERE selects
from a cross-product of the FROM tables; to make the above do what you
want, you'd have to break that fundamental semantics. The semantics of
explicit JOIN cases would be broken too.

What we need is some kind of explicit multi-level SELECT operation.
Perhaps it would help to think about the analogy of aggregates of
aggregate functions, which are impossible to express properly in a
single SELECT but work nicely given subselect-in-FROM.

Subselect-in-FROM doesn't seem to get this job done though. Right
offhand I don't see any reasonable syntax other than
function-in-the-SELECT-list, which shoots us right back into the
messinesses of the Berkeley implementation. However, we do now have the
precedent of the table-function AS clause. Does it help any to do
something like

  SELECT grosysid, array_values(grolist) AS (array_index, member_id)
  FROM pg_group where groname = 'g2';

(Again you could wrap this in an outer SELECT to transform the
member_ids to member_names.)

The real problem with the Berkeley approach shows up when you consider
what happens with multiple table functions called in a single SELECT.
The code we currently have produces the cross-product of the implied
rows (or at least it tries to, I seem to recall that it doesn't
necessarily get it right). That's pretty unpleasant, and though you can
filter the rows in an outer SELECT, there's no way to optimize the
implementation into a smarter-than-nested-loop join.

It seems like somehow we need a level of FROM/WHERE producing some base
rows, and then a set of table function calls to apply to each of the
base rows, and then another level of WHERE to filter the results of the
function calls (in particular to provide join conditions to identify
which rows to match up in the function outputs). I don't see any way to
do this without inventing new SELECT clauses out of whole cloth ...
unless SQL99's WITH clause helps, but I don't think it does ...

> How ugly/difficult would it be to allow the planner to interrogate the
> function and let the function report back a tupledesc based on the
> actual runtime input parameters?

Parse-time, not run-time. It could be done --- IIRC, the auxiliary
"function info" call we introduced in the V1 fmgr protocol was
deliberately designed to allow expansion in this sort of direction. But
it would have to take a tupledesc (or some similar static description)
and return another one.

			regards, tom lane
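The cross-product behavior Tom describes can be made concrete with a small Python sketch (f, g, the base rows, and the join condition are all invented stand-ins for two table functions called in one SELECT):

```python
# Berkeley-style semantics: two table functions in one SELECT yield the
# cross-product of their result rows per base row; only an outer WHERE
# can pair them back up, which is a nested-loop join by construction.

def f(n):                        # first table function
    return [(n, i) for i in range(2)]

def g(n):                        # second table function
    return [(n, i * 10) for i in range(2)]

base = [1, 2]                    # rows produced by the base FROM/WHERE

# cross-product of both functions' outputs, for each base row
cross = [(a, b) for n in base for a in f(n) for b in g(n)]

# outer WHERE acting as the join condition to match rows up
joined = [(a, b) for (a, b) in cross if a[1] * 10 == b[1]]

print(len(cross))                # 8 rows materialized before filtering
print(joined)
```

Even though only half the pairs survive the filter, all eight are produced first -- that is the optimization opportunity a smarter join strategy would remove.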
Re: [HACKERS] proposal: array utility functions phase 1
Tom Lane wrote:
> This crystallizes something that has been bothering me for awhile: the
> table function syntax is severely hobbled (not to say crippled :-() by
> the fact that the function arguments have to be constants. You really
> don't want to have to invent intermediate functions every time you
> want a slightly different query --- yet this technique seems to
> require *two* bespoke functions for every query, one on each end of
> the array_values() function.

It did for me too. I was thinking along these lines while working on
the connectby function, but this work really makes it clear.

> The original Berkeley syntax, messy as it was, at least avoided this
> problem. For example, I believe this same problem could be solved
> (approximately) with
>
>   select array_values(grolist) from pg_group where groname = 'g2'

Yes, this is exactly what I was yearning to do. Was there a spec or
technical reason (or both) for not allowing the following?

  select * from array_values(g.grolist), pg_group g where g.groname = 'g2';

It seems like you could treat it like a one-to-many join between
pg_group and the function. I'm sure this is a bad idea and breaks down
for more complex examples, but I have often found myself wishing I
could do exactly that.

> I think we ought to try to find a way that table functions could be
> used with inputs that are taken from tables. In a narrow sense you can
> do this already, with a sub-SELECT:
>
>   select * from my_table_func((select x from ...));
>
> but (a) the sub-select can only return a single value, and (b) you
> can't get at any of the other columns in the row the sub-select is
> selecting. For instance it won't help me much to do
>
>   select * from array_values((select grolist from pg_group where groname = 'g2'))
>
> if I want to show the group's grosysid as well.

You could do something like:

  select * from array_values('pg_group','grolist') ...

and repeat the rest of pg_group's columns for each row produced from
grolist in the output (this is closer to what Nigel did, IIRC). This
even works in the current table function implementation. It does not
get around the issue of specifying query-time column refs though.

> I know I'm not explaining this very well (I'm only firing on one
> cylinder today :-(), but basically I think we need to step back and
> take another look at the mechanism before we start inventing tons of
> helper functions to make up for the lack of adequate mechanism.

Nope, you're explaining it just fine -- it's what I've been thinking
for a while, but couldn't articulate myself.

> As for array_values() itself, it seems fairly inelegant to rely on the
> user to get the input and output types to match up. Now that we have
> an "anyarray" pseudotype, I think it wouldn't be unreasonable to hack
> up some kluge in the parser to allow reference to the element type of
> such an argument --- that is, you'd say something like
>
>   create function array_values(anyarray) returns setof anyarray_element
>
> and the parser would automatically understand what return type to
> assign to any particular use of array_values. (Since type resolution
> is done bottom-up, I see no logical difficulty here, though the
> implementation might be a bit of a wart...)

That doesn't quite work as written (you'd have to account for the array
index column or lose it -- which loses any ability to get position in
the array), and it has even more problems with the
array_values('pg_group','grolist') approach.

How ugly/difficult would it be to allow the planner to interrogate the
function and let the function report back a tupledesc based on the
actual runtime input parameters? Kind of a special mode of function
call that the function could detect and respond to differently than
during execution (to avoid excessive runtime and/or side effects --
just form a tupledesc and return it). Then the planner could move
forward without requiring a specific declared return composite type or
a return type of record with a runtime query column definition.
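The "special mode of function call" could look something like this Python sketch (purely hypothetical -- the real hook would live in the fmgr protocol; DESCRIBE, the column names, and the type strings are all invented for illustration):

```python
# Sketch of a describe-mode call: the planner invokes the function with
# a static input description and gets back only a tuple descriptor,
# without the function actually executing.

DESCRIBE = object()   # sentinel marking the describe-mode call

def array_values(arg, mode=None):
    if mode is DESCRIBE:
        # arg is a static description: the array input's type name;
        # report the result columns derived from its element type
        elem_type = arg.rstrip("[]")
        return [("array_index", "int"), ("value", elem_type)]
    # normal execution: expand the array into (subscript, element) rows
    return list(enumerate(arg, start=1))

# planner side: obtain the tupledesc from the input type alone
print(array_values("int[]", mode=DESCRIBE))
# -> [('array_index', 'int'), ('value', 'int')]

# executor side: the ordinary call
print(array_values([101, 102, 103]))
# -> [(1, 101), (2, 102), (3, 103)]
```

The point of the sentinel is that the describe path forms a descriptor and returns immediately, so there is no runtime cost or side effect at plan time.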
Joe
Re: [HACKERS] proposal: array utility functions phase 1
Joe Conway <[EMAIL PROTECTED]> writes:
> [ much snipped ]
> The first function borrows from an idea Nigel Andrews had -- i.e.
> expand an array into rows (and possibly columns). It currently works
> like this:
> -- 1D array
> test=# select * from array_values('{101,102,103,104}'::int[]) as (a int, b1 int);

> Now on to the TODO item. Given the array_values() function, here's
> what I was thinking of to implement listing members of a group:
> CREATE OR REPLACE FUNCTION pg_get_grolist(text) RETURNS INT[] AS
>   'SELECT grolist FROM pg_group WHERE groname = $1' LANGUAGE 'sql';
> CREATE TYPE pg_grolist_rec AS (array_index int, member_name text);
> CREATE OR REPLACE FUNCTION group_list(text) RETURNS SETOF pg_grolist_rec AS
>   'SELECT g.id, pg_get_userbyid(g.usesysid)::text AS member_name FROM
>   array_values(pg_get_grolist($1)) AS g(id int, usesysid int)' LANGUAGE 'sql';

This crystallizes something that has been bothering me for awhile: the
table function syntax is severely hobbled (not to say crippled :-() by
the fact that the function arguments have to be constants. You really
don't want to have to invent intermediate functions every time you want
a slightly different query --- yet this technique seems to require
*two* bespoke functions for every query, one on each end of the
array_values() function.

The original Berkeley syntax, messy as it was, at least avoided this
problem. For example, I believe this same problem could be solved
(approximately) with

  select array_values(grolist) from pg_group where groname = 'g2'

--- getting the users shown as names instead of numbers would take an
extra level of SELECT, but I leave the details to the reader.

I think we ought to try to find a way that table functions could be
used with inputs that are taken from tables. In a narrow sense you can
do this already, with a sub-SELECT:

  select * from my_table_func((select x from ...));

but (a) the sub-select can only return a single value, and (b) you
can't get at any of the other columns in the row the sub-select is
selecting. For instance it won't help me much to do

  select * from array_values((select grolist from pg_group where groname = 'g2'))

if I want to show the group's grosysid as well.

I know I'm not explaining this very well (I'm only firing on one
cylinder today :-(), but basically I think we need to step back and
take another look at the mechanism before we start inventing tons of
helper functions to make up for the lack of adequate mechanism.

As for array_values() itself, it seems fairly inelegant to rely on the
user to get the input and output types to match up. Now that we have an
"anyarray" pseudotype, I think it wouldn't be unreasonable to hack up
some kluge in the parser to allow reference to the element type of such
an argument --- that is, you'd say something like

  create function array_values(anyarray) returns setof anyarray_element

and the parser would automatically understand what return type to
assign to any particular use of array_values. (Since type resolution is
done bottom-up, I see no logical difficulty here, though the
implementation might be a bit of a wart...)

			regards, tom lane
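The bottom-up resolution Tom alludes to can be mimicked in a few lines (a toy resolver; the expression shape, "anyarray_element" behavior, and type-name strings are invented for this sketch):

```python
# Toy bottom-up type resolution: resolve the argument's type first,
# then derive the call's result type from it -- the way a parser kluge
# for "returns setof anyarray_element" would have to work.

def resolve(expr, column_types):
    if isinstance(expr, str):            # a column reference
        return column_types[expr]
    func, arg = expr                     # a one-argument function call
    arg_type = resolve(arg, column_types)
    if func == "array_values":           # anyarray -> setof anyarray_element
        if not arg_type.endswith("[]"):
            raise TypeError(f"{arg_type} is not an array type")
        return "setof " + arg_type[:-2]
    raise NotImplementedError(func)

# grolist is int[], so array_values(grolist) resolves to setof int
print(resolve(("array_values", "grolist"), {"grolist": "int[]"}))
# -> setof int
```

Because the argument type is known before the call's type is assigned, there is no circularity -- which is why Tom sees "no logical difficulty" in the scheme.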
[HACKERS] proposal: array utility functions phase 1
I'm working on the TODO item "Allow easy display of usernames in a
group" in the context of a slightly larger effort to improve usability
of arrays. I'm far enough down the road to have a better idea of where
I want to go with this, but I'd like to vet those ideas with the list
so I don't waste too much effort if everyone hates them ;-)

The first function borrows from an idea Nigel Andrews had -- i.e.
expand an array into rows (and possibly columns). It currently works
like this:

-- 1D array
test=# select * from array_values('{101,102,103,104}'::int[]) as (a int, b1 int);
 a | b1
---+-----
 1 | 101
 2 | 102
 3 | 103
 4 | 104
(4 rows)

CREATE TABLE arr_text(f1 int, f2 text[]);
INSERT INTO arr_text VALUES (1, '{"a","b","c"}');
UPDATE arr_text SET f2[-2:0] = '{"x","y","z"}' WHERE f1 = 1;
CREATE OR REPLACE FUNCTION get_arr_text(int) RETURNS text[] AS
  'SELECT f2 FROM arr_text WHERE f1 = $1' LANGUAGE 'sql';

test=# select * from array_values(get_arr_text(1)) as (a int, b1 text);
 a  | b1
----+----
 -2 | x
 -1 | y
  0 | z
  1 | a
  2 | b
  3 | c
(6 rows)

-- 2D array
test=# select * from array_values('{{1,2,3,4},{5,6,7,8}}'::int[]) as (a int, b1 int, b2 int, b3 int, b4 int);
 a | b1 | b2 | b3 | b4
---+----+----+----+----
 1 |  1 |  2 |  3 |  4
 2 |  5 |  6 |  7 |  8
(2 rows)

It accepts type anyarray, and returns record. The first column
preserves the array subscript for the 1st dimension.

One question I have is this: what, if anything, should be done with 3
(and higher) dimension arrays? I was considering returning 2 columns --
the 1st dimension array subscript, and a 2nd column containing the
sub-array left over. E.g.:

array_values('{{{111,112},{121,122}},{{211,212},{221,222}}}'::int[])

would become:

 a | b1
---+-----------------------
 1 | {{111,112},{121,122}}
 2 | {{211,212},{221,222}}

Does this make sense, or is something else better, or would it be
better not to support 3 dim arrays and up?

Now on to the TODO item.
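The expansion rules demonstrated earlier -- the first column carries the first-dimension subscript, which need not start at 1, and 2-D arrays spread across columns -- can be modeled in Python (a behavioral sketch only; the real function is C code working on PostgreSQL anyarray values, and the lower_bound parameter is an invented stand-in for the array's stored lower bound):

```python
# Behavioral model of array_values(): emit one row per first-dimension
# subscript, preserving the array's (possibly non-default) lower bound.

def array_values(arr, lower_bound=1):
    rows = []
    for i, elem in enumerate(arr, start=lower_bound):
        if isinstance(elem, list):      # 2-D: spread sub-array across columns
            rows.append((i, *elem))
        else:                           # 1-D: one subscript, one value
            rows.append((i, elem))
    return rows

print(array_values([101, 102, 103, 104]))
# -> [(1, 101), (2, 102), (3, 103), (4, 104)]
print(array_values(["x", "y", "z", "a", "b", "c"], lower_bound=-2))
# -> [(-2, 'x'), (-1, 'y'), (0, 'z'), (1, 'a'), (2, 'b'), (3, 'c')]
print(array_values([[1, 2, 3, 4], [5, 6, 7, 8]]))
# -> [(1, 1, 2, 3, 4), (2, 5, 6, 7, 8)]
```

The second call mirrors the f2[-2:0] example, where the lower bound of the stored array is -2 rather than 1.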
Given the array_values() function, here's what I was thinking of to
implement listing members of a group:

CREATE OR REPLACE FUNCTION pg_get_grolist(text) RETURNS INT[] AS
  'SELECT grolist FROM pg_group WHERE groname = $1' LANGUAGE 'sql';

CREATE TYPE pg_grolist_rec AS (array_index int, member_name text);

CREATE OR REPLACE FUNCTION group_list(text) RETURNS SETOF pg_grolist_rec AS
  'SELECT g.id, pg_get_userbyid(g.usesysid)::text AS member_name
   FROM array_values(pg_get_grolist($1)) AS g(id int, usesysid int)'
  LANGUAGE 'sql';

test=# select * from pg_group;
 groname | grosysid |    grolist
---------+----------+---------------
 g1      |      100 | {100,101}
 g2      |      101 | {100,101,102}
(2 rows)

test=# select * from group_list('g2');
 array_index | member_name
-------------+-------------
           1 | user1
           2 | user2
           3 | user3

pg_get_grolist(text) is intended for internal use, as is the
pg_grolist_rec composite type. group_list() is intended as the
user-facing table function. I would implement this by running the three
statements above during initdb. Any comments or objections WRT object
names or the method of implementation?

I don't think this is a very speed-critical application, but even using
the sql functions it is very fast:

test=# explain analyze select * from group_list('g2');
                          QUERY PLAN
---------------------------------------------------------------------
 Function Scan on group_list  (cost=0.00..12.50 rows=1000 width=36)
   (actual time=1.49..1.50 rows=3 loops=1)
 Total runtime: 1.55 msec
(2 rows)

I have more planned beyond the above, as outlined in an earlier post
(see http://archives.postgresql.org/pgsql-hackers/2002-11/msg01213.php).
Next on my list will be a split() function (as discussed in early
September) that creates an array from an input string by splitting on a
given delimiter. This is similar to functions in perl, php, and
undoubtedly other languages. It should work nicely in conjunction with
array_values().

Sorry for the long mail and thanks for any feedback!

Joe
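The planned split() would behave much like Python's str.split -- a sketch of the intended pairing with array_values() (the SQL function itself would take text and a delimiter and return an array; the names here just mirror that):

```python
# Model of the proposed split(): cut an input string on a delimiter,
# producing an array that array_values() could then expand into rows.

def split(s: str, delim: str):
    """split('100,101,102', ',') -> ['100', '101', '102']"""
    return s.split(delim)

parts = split("100,101,102", ",")
print(parts)
# -> ['100', '101', '102']

# feeding the result through an array_values-style expansion gives
# subscript/value rows
print(list(enumerate(parts, 1)))
# -> [(1, '100'), (2, '101'), (3, '102')]
```

Chaining the two functions turns a delimited string directly into a row set, which is the "works nicely in conjunction" case described above.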