Hello Gregory,

Gregory Stark wrote:
I've been hacking on the idea of an Append node which maintains the ordering
of its subtables merging their records in order. This is important for
partitioned tables since otherwise a lot of plans are not available such as
merge joins.

Cool!

Some time ago, I've been trying something very similar, but didn't get very far. I'd like to share my thoughts anyway.

First of, I envisioned that append node to be applicable also to unordered reading from multiple partitions, simply interleaving between the partitions.

The logic I've followed is to do as follows:

1) Go through all subrels asking for any interesting pathkey lists. Gather up
   the union of all of these.

I also tried to modify the Append node first, then figured that it might be better to base on the merge join node instead. While this seems farther away, I had the hope that a binary tree of such 'plain merge' nodes would require less comparisons in total.

Plus, I found it a lot simpler to cope with exactly two input relations instead of n, as with the append node. :-)

2) Go back through all the subrels and for each accumulated pathkey list ask
for the best path for that subrel for that pathkey list.
3) Generate an two append paths for each of these pathkey lists, one using the
   best startup_cost subpath and one with the best total_cost subpath
   available for the pathkey list for each child rel. If there is none
   available take the best unordered path and put a sort node above it.

4) Additionally advertise the traditional unordered append node which our
   parent could choose to put a sort node above same as ever.

5) Append plan nodes look a lot like Sort plan nodes glued onto an Append plan
   node, with sort function oids and so on.

6) Append exec state nodes look a lot like a (a few fields from)
   tuplesortstate glued onto an Append node. They have the ScanKey array and
   the fmgr sort functions. They also have an array of TupleTableSlot and a
   heap of indexes into that array.

8) ExecAppend does something similar to tuplesort's bounded sort (I fear I'm
   going to get typecasted) or more to the point, similar to the final merge
   of a merge sort. It directly returns the slot the child rel last returned
   to it.

Uh.. well, yeah. I guess you have a better understanding of the planner and executor that I do.

Open questions:

1) I still haven't completely figured out what to do with equivalence classes.
   My hack of just stuffing all the append subrel vars into there seems to
   work fine. I need to understand what's going on to see if there's really a
   problem with it or not.

2) I'm not sure this code will work when the append rel is a target (ie UPDATE
   and DELETE stmts).

3) It does seem to work when the columns in the subrels don't line up but I
   didn't do anything special to handle this case.

Uh.. is there any use case for that? WRT partitioning certainly not, is there?

I'll have a look at your patch.

Regards

Markus


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Reply via email to