Vincenzo Romano wrote:
By using PREPARE the query gets planned sooner, and that plan is then
used by the later execution. You can bet that some of the PREPAREd
query's variables will pertain either to the child tables' CHECK
constraints (for table partitions) or to the partial indexes' WHERE
conditions (for index partitioning).

Prepared statements are not necessarily a cure for long query planning time, because the sorts of planning decisions made with partitioned child tables and index selection need to know the parameter values in order to produce a good plan; with partitions, that's the rule rather than the exception. You run the risk that the generic prepared plan will end up scanning all the partitions, because at PREPARE time the planner can't tell which of them can be excluded. For some types of queries, that can only be determined once the actual values are in there.
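
Here's a minimal sketch of how to watch that happen, using made-up table names; exactly what the prepared plan looks like will depend on how your version handles generic plans:

CREATE TABLE events (id int, created timestamp);
CREATE TABLE events_2009_01 (
    CHECK (created >= '2009-01-01' AND created < '2009-02-01')
) INHERITS (events);

SET constraint_exclusion = on;

-- With a literal value, the planner can exclude non-matching children
EXPLAIN SELECT * FROM events WHERE created = '2009-01-15';

-- With a parameter, compare how many children show up in the plan
PREPARE get_events(timestamp) AS
    SELECT * FROM events WHERE created = $1;
EXPLAIN EXECUTE get_events('2009-01-15');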

I think you and the people suggesting "test it" aren't quite aligned on what that means. The idea is not that you should build a full application test case yet, which can be very expensive. The idea is that you might explore questions like "if I partition this way and increase the number of partitions from 1 to n, does query planning time go up linearly?" by measuring against fake data and a machine-generated schema. What's happened in some of these cases is that, despite the theoretical expectation, some constant or external overhead ends up dominating the behavior at the lower end of the range. As an example, it was recognized at one point that the amount of statistics collected for a table with default_statistics_target had a quadratic impact on some aspects of performance. But it turned out that, for the range of values interesting to most people, the measured runtime did not go up with the square as feared. The only way that was sorted out was to build a simple simulation.
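
One way to machine-generate such a schema is to let SQL write the DDL for you; this is just a sketch, with a made-up parent table t(id int) partitioned on ranges of id:

-- Emit CREATE TABLE statements for 100 children; capture the output
-- (for example with psql -t), feed it back in, then re-run with a
-- larger upper bound to walk through the 1 to n series
SELECT 'CREATE TABLE t_p' || i
    || ' (CHECK (id >= ' || i * 1000
    || ' AND id < ' || (i + 1) * 1000
    || ')) INHERITS (t);'
FROM generate_series(0, 99) AS s(i);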

Here are a couple of messages from that discussion that show the sorts of tests you probably want to try, along with comments on the perils of guessing based on theory rather than testing:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg00601.php
http://archives.postgresql.org/pgsql-hackers/2008-12/msg00687.php

generate_series can be very helpful here, and you can even use that to generate timestamps if you need them in the data set.
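
For example, to fill the hypothetical January child from the sketch above with one row per hour (plain inheritance doesn't route tuples, so the child is loaded directly):

INSERT INTO events_2009_01 (id, created)
SELECT i, timestamp '2009-01-01' + i * interval '1 hour'
FROM generate_series(0, 30 * 24 - 1) AS s(i);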

That said, the anecdotal consensus is that partitions don't scale well for most people even into the very low hundreds, and multi-level partitioning won't necessarily reduce query planning time--just the cost of maintaining the underlying tables and indexes. My opinion is that building a simple partitioned test case and watching how the EXPLAIN plans change as you adjust things will be more instructive for you than either asking about it or reading the source. Vary the parameters, watch the plans, and measure and graph the results if you want to visualize the behavior better. The same goes for large numbers of partial indexes, which have a similar query planning impact, although unlike partitions I haven't seen anyone analyze them via benchmarks. I'm sure you could get help here (though the performance list is probably a better spot) with getting your test case right if you wanted to try to nail that down.
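
A partial index version of the same experiment might start out like this, again with made-up names; the interesting part is how planning behavior changes as the number of indexes grows:

CREATE TABLE events_flat (id int, created timestamp);
CREATE INDEX events_flat_2009_01 ON events_flat (created)
    WHERE created >= '2009-01-01' AND created < '2009-02-01';
CREATE INDEX events_flat_2009_02 ON events_flat (created)
    WHERE created >= '2009-02-01' AND created < '2009-03-01';

-- Check whether the planner proves the query's WHERE clause matches
-- a partial index's predicate, then repeat with many more indexes
EXPLAIN SELECT * FROM events_flat
WHERE created BETWEEN '2009-01-10' AND '2009-01-20';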

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us


