Values list-of-targetlists patch for comments (was Re: [PATCHES] [HACKERS] 8.2 features?)

Joe Conway Sun, 23 Jul 2006 21:57:38 -0700

Tom Lane wrote:

Joe Conway <[EMAIL PROTECTED]> writes:

I'm liking this too. But when you say "jointree node", are you saying to model the new node type after NestLoop/MergeJoin/HashJoin nodes? These are referred to as "join nodes" in ExecInitNode. Or as you mentioned a couple of times, should this look more like an Append node?



No, I guess I confused you by talking about the executor representation
at the same time.  This is really unrelated to the executor.  The join
tree I'm thinking of here is the data structure that dangles off
Query.jointree --- it's a representation of the query's FROM clause,
and (at present) can contain RangeTblRef, FromExpr, and JoinExpr nodes.
See the last hundred or so lines of primnodes.h for some details.
The jointree is used by the planner to compute the plan node tree that
the executor will run, but it's not the same thing.

There are basically two ways you could go about this:
1. Make a new jointree leaf node type to represent a VALUES construct,
   and dangle the list of lists of expressions off that.
2. Make a new RangeTblEntry type to represent a VALUES construct, and
   just put a RangeTblRef to it into the jointree.  The expressions
   dangle off the RangeTblEntry.

Offhand I'm not certain which of these would be cleanest.  The second
way has some similarities to the way we handle set operation trees
(UNION et al), so it might be worth looking at that stuff.  However,
being a RangeTblEntry has a lot of baggage (eg, various routines expect
to find an RTE alias, column names, column types, etc) and maybe we
don't need all that for VALUES.

Since the feature freeze is only about a week off, I wanted to post this patch even though it is not yet ready to be applied.


Executive summary:
==================
1. The patch is now large and invasive based on adding new node
   types and associated infrastructure. I modelled the nodes largely
   on RangeFunction and FunctionScan.
2. Performance is close enough to mysql to not be a big issue (I think,
   more data below) as long as the machine does not get into a memory
   swapping regime. Memory usage is now better, but not as good as
   mysql.
3. I specifically coded with the intent of preserving current insert
   statement behavior and code paths for current functionality. So there
   *should* be no performance degradation or subtle semantics changes
   for "INSERT DEFAULT VALUES", "INSERT ... VALUES (with one target
   list)", "INSERT ... SELECT ...". Even Tom's recently discovered
   "insert into foo values (tenk1.*)" still works ;-)

Performance:
============

On my development machine (dual core amd64, 2GB RAM) I get the following results using the php script posted earlier:


Postgres:
---------
$loopcount = 100000;
multi-INSERT-at-once Elapsed time is 1 second

$loopcount = 300000;
multi-INSERT-at-once Elapsed time is 5 seconds

$loopcount = 500000;
multi-INSERT-at-once Elapsed time is 9 seconds

$loopcount = 800000;
multi-INSERT-at-once Elapsed time is 14 seconds

$loopcount = 900000;
multi-INSERT-at-once Elapsed time is 17 seconds

$loopcount = 1000000;
multi-INSERT-at-once Elapsed time is 42 seconds

$loopcount = 2000000;
killed after 5 minutes due to swapping

MySQL:
------
$loopcount = 100000;
multi-INSERT-at-once Elapsed time is 2 seconds

$loopcount = 300000;
INSERT failed:Got a packet bigger than 'max_allowed_packet' bytes
changed max_allowed_packet=64M
multi-INSERT-at-once Elapsed time is 5 seconds

$loopcount = 500000;
multi-INSERT-at-once Elapsed time is 8 seconds

$loopcount = 800000;
multi-INSERT-at-once Elapsed time is 13 seconds

$loopcount = 900000;
multi-INSERT-at-once Elapsed time is 15 seconds

$loopcount = 1000000;
multi-INSERT-at-once Elapsed time is 17 seconds

$loopcount = 2000000;
multi-INSERT-at-once Elapsed time is 36 seconds

$loopcount = 3000000;
multi-INSERT-at-once Elapsed time is 54 seconds

$loopcount = 4000000;
multi-INSERT-at-once Elapsed time is 134 seconds

<table value constructor>:
==========================

Included in this patch is support for <table value constructor> in the FROM clause, e.g.:


regression=# select * from {values (1,array[1,2]),(2,array[3,4])};
 ?column? | array
----------+-------
        1 | {1,2}
        2 | {3,4}
(2 rows)

The strange syntax is a temporary hack to eliminate shift/reduce conflicts. I'm not entirely sure we want to try to support this (or something like it) for 8.2, but much of what is needed is now readily available. More on known issues next.


Known Issues:
=============

General:
--------
1. Several comments in the patch are marked "FIXME". These are areas
   where I was uncertain what was the "right thing to do". Any advice
   on these specific spots would be very much appreciated.
2. I broke the rules regression test -- still need to look at what I
   did to mess that up. Somewhere in the reconstruction of "VALUES ..."
   according to the diff.

VALUES multi-targetlist INSERTS:
--------------------------------
3. Not yet quite sure how to get DEFAULT to work for "INSERT ...
   multi-values". As noted above, works fine if there is only
   one targetlist.

<table value constructor>:
--------------------------
4. I'm getting shift/reduce conflicts that are not easily eliminated.
   Making VALUES fully reserved only made it 1 shift/reduce conflict.
5. Column aliases are still not working correctly. Haven't really looked
   closely at this yet.
6. Data types are being deduced currently based on the first row,
   and not currently getting checked on subsequent rows. So it is
   easy to induce a crash:

regression=# select * from {values (1,array[1,2]),(2,3)};
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

7. In general, <table value constructor> in the FROM clause needs
   more discussion -- among other things, how should we determine and
   enforce column types? I think this could be a very useful feature,
   but I'm not comfortable I understand it yet.

=================
As usual, review, advise, comments, flames, etc. requested

Joe

multi-insert-r6a.diff.gz
Description: GNU Zip compressed data

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Values list-of-targetlists patch for comments (was Re: [PATCHES] [HACKERS] 8.2 features?)

Reply via email to