Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs

Boszormenyi Zoltan Mon, 19 Aug 2013 09:25:17 -0700

Hi,

2013-08-13 15:54 keltezéssel, Andrew Gierth írta:

Summary:


This patch implements a method for expanding multiple SRFs in parallel
that does not have the surprising LCM behaviour of SRFs-in-select-list.
(Functions returning fewer rows are padded with nulls instead.)

It then uses this method combined with a parse-time hack to implement
the (intended to be) spec-conforming behaviour of UNNEST with multiple
parameters, including flattening of composite results.

The upshot is that given a table like this:

postgres=# select * from t1;
        a       |         b         |                      c
---------------+-------------------+----------------------------------------------
  {11,12,13}    | {wombat}          |
  {5,10}        | {foo,bar}         | 
{"(123,xyzzy)","(456,plugh)","(789,plover)"}
  {21,31,41,51} | {fred,jim,sheila} | {"(111,xyzzy)","(222,plugh)"}
(3 rows)

(where column "c" is an array of a composite type with 2 cols, "x" and "y")

You can do this:

postgres=# select u.* from t1, unnest(a,b,c) with ordinality as u;
  ?column? | ?column? |  x  |   y    | ordinality
----------+----------+-----+--------+------------
        11 | wombat   |     |        |          1
        12 |          |     |        |          2
        13 |          |     |        |          3
         5 | foo      | 123 | xyzzy  |          1
        10 | bar      | 456 | plugh  |          2
           |          | 789 | plover |          3
        21 | fred     | 111 | xyzzy  |          1
        31 | jim      | 222 | plugh  |          2
        41 | sheila   |     |        |          3
        51 |          |     |        |          4
(10 rows)

Or for an example of general combination of functions:

postgres=# select * from table(generate_series(10,20,5), 
unnest(array['fred','jim']));
  ?column? | ?column?
----------+----------
        10 | fred
        15 | jim
        20 |
(3 rows)

Implementation Details:

The spec syntax for table function calls, <table function derived table>
in <table reference>, looks like TABLE(func(args...)) AS ...

This patch implements that, plus an extension: it allows multiple
functions, TABLE(func1(...), func2(...), func3(...)) [WITH ORDINALITY]
and defines this as meaning that the functions are to be evaluated in
parallel.

This is implemented by changing RangeFunction, function RTEs, and the
FunctionScan node to take lists of function calls rather than a single
function. The calling convention for SRFs is completely unchanged; each
function returns its own rows (or a tuplestore in materialize mode) just
as before, and FunctionScan combines the results into a single output
tuple (keeping track of which functions are exhausted in order to
correctly fill in nulls on a backwards scan).

Then, a hack in the parser converts unnest(...) appearing as a
func_table (and only there) into a list of unnest() calls, one for each
parameter.  So

    select ... from unnest(a,b,c)

is converted to

    select ... from TABLE(unnest(a),unnest(b),unnest(c))

and if unnest appears as part of an existing list inside TABLE(), it's
expanded to multiple entries there too.

This parser hackery is of course somewhat ugly. But given the objective
of implementing the spec's unnest syntax, it seems to be the least ugly
of the possible approaches. (The hard part of doing it any other way
would be generating the description of the result type; composite array
parameters expand into multiple result columns.)


Harder maybe but it may still be cleaner in the long run.

Overall, it's my intention here to remove as many as feasible of the old
reasons why one might use an SRF in the select list.


Indeed, it's a big nail in the coffin for SRFs-in-targetlist. Having
WITH ORDINALITY and this feature, I would vote for removing
SRF-in-targetlist and call the release PostgreSQL 10.0.

  This should also
address the points that Josh brought up in discussion of ORDINALITY
regarding use of SRF-in-select to unnest multiple arrays.

(As a side issue, this patch also sets up pathkeys for ordinality along
the lines of a patch I suggested to Greg a while back in response to
his.)

Current patch status:

This is a first working cut: no docs, no tests, not enough comments, the
deparse logic probably needs more work (it deparses correctly but the
formatting may be suboptimal). However all the functionality is believed
to be in place.


With this last paragraph in mind, I am trying a little review.

* Is the patch in a patch format which has context? (eg: context diff format)

Yes.

* Does it apply cleanly to the current git master?

Applies with some offsets on a few files but without fuzz.

* Does it include reasonable tests, necessary doc patches, etc?

No, as told by the patch author.

* Does the patch actually implement what it's supposed to do?

Yes.

* Do we want that?

Yes.

* Do we already have it?

No.

* Does it follow SQL spec, or the community-agreed behavior?

The SQL spec says these:

In 7.6 <table reference>

<table primary> ::=
...

| <table function derived table> [ AS ] <correlation name> [ <left paren> <derived columnlist> <right paren> ]


also later in the same section:

<table function derived table> ::=
        TABLE <left paren> <collection value expression> <right paren>

In 6.26 <value expression>

<collection value expression> ::=
        <array value expression> | <multiset value expression>

In 6.36 <array value expression>

<array value expression> ::= <array concatenation> | <array primary>
<array concatenation> ::= <array value expression 1> <concatenation operator> 
<array primary>
<array value expression 1> ::= <array value expression>
<array primary> ::= <array value function> | <value expression primary>

6.3 <value expression primary>

<value expression primary> ::=
        <parenthesized value expression>
        | <nonparenthesized value expression primary>

<parenthesized value expression> ::= <left paren> <value expression> <right 
paren>

<nonparenthesized value expression primary> ::=
        <unsigned value specification>
        | <column reference>
        | <set function specification>
        | <window function>
        | <nested window function>
        | <scalar subquery>
        | <case expression>
        | <cast specification>
        | <field reference>
        | <subtype treatment>
        | <method invocation>
        | <static method invocation>
        | <new specification>
        | <attribute or method reference>
        | <reference resolution>
        | <collection value constructor>
        | <array element reference>
        | <multiset element reference>
        | <next value expression>
        | <routine invocation>

collection value constructor> ::=
        | <array value constructor>
        | <multiset value constructor>

So, the FROM TABLE(...) AS (...) syntax is a big can of worms and
I haven't even quoted <multiset value expression>.

As far as I can tell, these should also be allowed but isn't:

zozo=# select * from table('a'::text) as x;
ERROR:  syntax error at or near "'a'"
LINE 1: select * from table('a'::text) as x;
                            ^
zozo=# select x.* from t1, table(t1.a) as x;
ERROR:  syntax error at or near ")"
LINE 1: select x.* from t1, table(t1.a) as x;
                                      ^
zozo=# select x.* from table((6)) as x(a int4);
ERROR:  syntax error at or near "("
LINE 1: select x.* from table((6)) as x(a int4);
                              ^
zozo=# select x.* from table(values (6)) as x(a int4);
ERROR:  syntax error at or near "("
LINE 1: select x.* from table(values (6)) as x(a int4);
                                     ^
zozo=# select x.* from table(values(6)) as x(a int4);
ERROR:  syntax error at or near "("
LINE 1: select x.* from table(values(6)) as x(a int4);
                                    ^

What the patch implements is only the last choice for
<nonparenthesized value expression primary>: <routine invocation>

When you add documentation, it would be nice to mention it.

Also, the grammar extension is a start for adding all the other
standard choices for TABLE().

* Does it include pg_dump support (if applicable)?

n/a

* Are there dangers?

I can't see any. 8-)

* Have all the bases been covered?

My previous comments about the TABLE() syntax says it all.
You can interpret it either way. :-)

* Does the feature work as advertised?

Yes.

* Are there corner cases the author has failed to consider?

I don't know.

* Are there any assertion failures or crashes?

No.

* Does the patch slow down simple tests?

No.

* If it claims to improve performance, does it?

It certainly improves writing queries, as functions inside
unnest() get processed in one scan.

* Does it slow down other things?

I don't think so.

* Does it follow the project coding guidelines?

Yes.

* Are there portability issues?

No.

* Will it work on Windows/BSD etc?

It should, the code uses standard internal PostgreSQL APIs
and extends them. No new system call.

* Are the comments sufficient and accurate?

According to the author, no.

* Does it do what it says, correctly?

Yes.

* Does it produce compiler warnings?

No.

* Can you make it crash?

No.

* Is everything done in a way that fits together coherently with other 
features/modules?

I think so

* Are there interdependencies that can cause problems?

I don't know.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/



--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs

Reply via email to