Ordered Partitioned Table Scans

David Rowley Thu, 25 Oct 2018 19:51:02 -0700

RANGE partitioning of time-series data is quite a common range to use
partitioning, and such tables tend to grow fairly large.  I thought
since we always store RANGE partitioned tables in the PartitionDesc in
ascending range order that it might be useful to make use of this and
when the required pathkeys match the order of the range, then we could
make use of an Append node instead of uselessly using a MergeAppend,
since the MergeAppend will just exhaust each subplan one at a time, in
order.


It does not seem very hard to implement this and it does not add much
in the way of additional processing to the planner.

Performance wise it seems to give a good boost to getting sorted
results from a partitioned table. I performed a quick test just on my
laptop with:

Setup:
CREATE TABLE partbench (id BIGINT NOT NULL, i1 INT NOT NULL, i2 INT
NOT NULL, i3 INT NOT NULL, i4 INT NOT NULL, i5 INT NOT NULL) PARTITION
BY RANGE (id);
select 'CREATE TABLE partbench' || x::text || ' PARTITION OF partbench
FOR VALUES FROM (' || (x*100000)::text || ') TO (' ||
((x+1)*100000)::text || ');' from generate_Series(0,299) x;
\gexec
\o
INSERT INTO partbench SELECT x,1,2,3,4,5 from generate_Series(0,29999999) x;
create index on partbench (id);
vacuum analyze;

Test:
select * from partbench order by id limit 1 offset 29999999;

Results Patched:

Time: 4234.807 ms (00:04.235)
Time: 4237.928 ms (00:04.238)
Time: 4241.289 ms (00:04.241)
Time: 4234.030 ms (00:04.234)
Time: 4244.197 ms (00:04.244)
Time: 4266.000 ms (00:04.266)

Unpatched:

Time: 5917.288 ms (00:05.917)
Time: 5937.775 ms (00:05.938)
Time: 5911.146 ms (00:05.911)
Time: 5906.881 ms (00:05.907)
Time: 5918.309 ms (00:05.918)

(about 39% faster)

The implementation is fairly simple. One thing I don't like about is
I'd rather build_partition_pathkeys() performed all the checks to know
if the partition should support a natural pathkey, but as of now, I
have the calling code ensuring that there are no sub-partitioned
tables. These could cause tuples to be output in the wrong order.

Does this idea seem like something we'd want?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

v1-0001-Allow-Append-to-be-used-in-place-of-MergeAppend-f.patch
Description: Binary data

Ordered Partitioned Table Scans

Reply via email to