subject:"\[PERFORM\] Transparent table partitioning in future version of PG\?"

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-08 Thread Robert Haas

On Thu, May 7, 2009 at 10:52 PM,  da...@lang.hm wrote:
 Hopefully, notions of partitioning won't be directly tied to chunking of
 data for parallel query access. Most queries access recent data and
 hence only a single partition (or stripe), so partitioning and
 parallelism and frequently exactly orthogonal.

 Yes, I think those things are unrelated.

 I'm not so sure (warning, I am relativly inexperianced in this area)

 it sounds like you can take two basic approaches to partition a database

 1. The Isolation Plan
[...]
 2. The Load Balancing Plan

Well, even if the table is not partitioned at all, I don't see that it
should preclude parallel query access.  If I've got a 1 GB table that
needs to be sequentially scanned for rows meeting some restriction
clause, and I have two CPUs and plenty of I/O bandwidth, ISTM it
should be possible to have them each scan half of the table and
combine the results.  Now, this is not easy and there are probably
substantial planner and executor changes required to make it work, but
I don't know that it would be particularly easier if I had two 500 MB
partitions instead of a single 1 GB table.

IOW, I don't think you should need to partition if all you want is
load balancing.  Partitioning should be for isolation, and load
balancing should happen when appropriate, whether there is
partitioning involved or not.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-08 Thread Scott Carey


On 5/7/09 7:52 PM, da...@lang.hm da...@lang.hm wrote:

 
 
 I believe that the isolation plan is probably more common than the load
 balancing plan, but I don't see them as being that different for the
 database engine point of view. To tune a system that can handle the
 isolation plan for load balancing, the key thing to do would be to have a
 knob to disable the partition planning, and just blindly send the search
 out to every partition.

Lots of good points.  However, implicit in the above is that the process of
identifying which partitions contain the data is expensive.
Right now it is (1.5 sec if 6000 partitions with the most simple possible
constraint (column = CONSTANT).

But identifying which partitions can contain a value is really nothing more
than an index.  If you constrain the possible partitioning functions to
those where a single partition key can only exist in one partition, then
this index and its look up should be very fast even for large partition
counts.  From what I can tell empirically, the current system does this in
more of a sequential scan, running the constraint checks for each
possibility.   
Furthremore, the actual tables don't have to contain the data if the key is
a column identity function (date = X ) rather than a range or hash.

At the core, partitioning is really just a form of 'chunky' indexing that
doesn't fragment, or need re-indexing, or have much MVCC complexity.

 
 David Lang
 
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-08 Thread david


On Fri, 8 May 2009, Robert Haas wrote:


On Thu, May 7, 2009 at 10:52 PM,  da...@lang.hm wrote:

Hopefully, notions of partitioning won't be directly tied to chunking of
data for parallel query access. Most queries access recent data and
hence only a single partition (or stripe), so partitioning and
parallelism and frequently exactly orthogonal.


Yes, I think those things are unrelated.


I'm not so sure (warning, I am relativly inexperianced in this area)

it sounds like you can take two basic approaches to partition a database

1. The Isolation Plan

[...]

2. The Load Balancing Plan


Well, even if the table is not partitioned at all, I don't see that it
should preclude parallel query access.  If I've got a 1 GB table that
needs to be sequentially scanned for rows meeting some restriction
clause, and I have two CPUs and plenty of I/O bandwidth, ISTM it
should be possible to have them each scan half of the table and
combine the results.  Now, this is not easy and there are probably
substantial planner and executor changes required to make it work, but
I don't know that it would be particularly easier if I had two 500 MB
partitions instead of a single 1 GB table.

IOW, I don't think you should need to partition if all you want is
load balancing.  Partitioning should be for isolation, and load
balancing should happen when appropriate, whether there is
partitioning involved or not.


actually, I will contridict myself slightly.

with the Isolation Plan there is not nessasarily a need to run the query 
on each parition in parallel.


  if parallel queries are possible, it will benifit Isolation Plan 
paritioning, but the biggest win with this plan is just reducing the 
number of paritions that need to be queried.


with the Load Balancing Plan there is no benifit in partitioning unless 
you have the ability to run queries on each parition in parallel



using a seperate back-end process to do a query on a seperate partition is 
a fairly straightforward, but not trivial thing to do (there are 
complications in merging the result sets, including the need to be able to 
do part of a query, merge the results, then use those results for the next 
step in the query)


  I would also note that there does not seem to be a huge conceptual 
difference between doing these parallel queries on one computer and 
shipping the queries off to other computers.



however, trying to split the work on a single table runs into all sorts of 
'interesting' issues with things needing to be shared between the multiple 
processes (they both need to use the same indexes, for example)


so I think that it is much easier for the database engine to efficiantly 
search two 500G tables instead of one 1T table.


David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-08 Thread Scott Carey


On 5/8/09 11:20 AM, da...@lang.hm da...@lang.hm wrote:
 
 with the Load Balancing Plan there is no benifit in partitioning unless
 you have the ability to run queries on each parition in parallel
 

I think there is a benefit to partitioning in this case.  If the statistics
on other columns are highly skewed WRT the column(s) partitioned, the
planner statistics will be better.  It may have to access every partition,
but it doesn't have to access every partition in the same way.

Perhaps something like:  user_id = 'FOO' is one of the most common vals in
date partition A, and one of the  least common vals in B, so a where clause
with user_id = 'FOO' will sequential scan one and index scan another.

For really large tables with data correlation that varies significantly,
this can be a huge performance gain even if all partitions are accessed.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-08 Thread Robert Haas

 IOW, I don't think you should need to partition if all you want is
 load balancing.  Partitioning should be for isolation, and load
 balancing should happen when appropriate, whether there is
 partitioning involved or not.

 actually, I will contridict myself slightly.

[...]
 however, trying to split the work on a single table runs into all sorts of
 'interesting' issues with things needing to be shared between the multiple
 processes (they both need to use the same indexes, for example)

I disagree with this part of your email.  It is already the case that
tables and indexes need to support concurrent access by multiple
Postgres processes.  I don't see why that part of the problem would be
any more difficult for parallel query execution than it would be for
executing two different and unrelated queries on the same table.

 so I think that it is much easier for the database engine to efficiantly
 search two 500G tables instead of one 1T table.

And that leads me to the opposite conclusion on this point.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-08 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 so I think that it is much easier for the database engine to efficiantly
 search two 500G tables instead of one 1T table.

 And that leads me to the opposite conclusion on this point.

I don't think there would be any difference on that score, either.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-08 Thread Craig Ringer

Robert Haas wrote:

 Well, even if the table is not partitioned at all, I don't see that it
 should preclude parallel query access.  If I've got a 1 GB table that
 needs to be sequentially scanned for rows meeting some restriction
 clause, and I have two CPUs and plenty of I/O bandwidth, ISTM it
 should be possible to have them each scan half of the table and
 combine the results.  Now, this is not easy and there are probably
 substantial planner and executor changes required to make it work, but
 I don't know that it would be particularly easier if I had two 500 MB
 partitions instead of a single 1 GB table.

The point of partitioning in this scenario is primarily that you can put
the different partitions in different tablespaces, most likely on
independent disk devices. You therefore get more I/O bandwidth.

--
Craig Ringer

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-07 Thread Simon Riggs


On Thu, 2009-05-07 at 10:56 +0800, Craig Ringer wrote:
 Tom Lane wrote:
  Alvaro Herrera alvhe...@commandprompt.com writes:
  I think there should be a way to refer to individual partitions as
  objects.
  
  Yeah, the individual partitions should be nameable tables, otherwise we
  will be reinventing a *whole* lot of management stuff to little gain.
  I don't actually think there is anything wrong with using table
  inheritance as the basic infrastructure --- I just want more smarts
  about one particular use pattern of inheritance.
 
 Maybe it's worth examining and documenting existing partition setups,
 the reasoning behind them, and how they're implemented, in order to
 guide any future plans for native partitioning support?
 
 Maybe that's already been/being done. On the off chance that it's not:
 
 Ones I can think of:
 
 - Partitioning an equally active dataset by ranges over a key to improve
  scan performance, INSERT/UPDATE costs on indexes, locking issues, etc.
 
 - The classic active/archive partition scheme where there's only one
 partition growing at any one time, and the others are historical data
 that's nowhere near as hot.
 
 - A variant on the basic active/archive structure, where query activity
 decreases slowly over time and there are many partitions of recent data.
 Partitions are merged into larger ones as they age, somewhat like a RRD
 database.
 
 I also expect that in the future there will be demand for striping data
 across multiple partitions in different tablespaces to exploit
 in-parallel scanning (when/if supported) for better I/O utilization in
 multiple-disk-array situations. For example, partitioning on
 MOD(id,10) across 10 separate volumes, and firing off 10 concurrent
 scans, one per partition, to satisfy a query.

That's a good summary. It has already been documented and discussed, but
saying it again and again is the best way to get this across.

You've highlighted that partitioning is a feature with many underlying
requirements: infrequent access to data (frequently historical),
striping for parallelism and getting around RDBMS flaws (if any). We
must be careful to implement each requirement in full, yet separately,
so we don't end up with 60% functionality in each case by delivering an
average or least common denominator solution.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-07 Thread Scott Carey




On 5/7/09 1:54 AM, Simon Riggs si...@2ndquadrant.com wrote:

 
 
 On Thu, 2009-05-07 at 10:56 +0800, Craig Ringer wrote:
 Tom Lane wrote:
 
 I also expect that in the future there will be demand for striping data
 across multiple partitions in different tablespaces to exploit
 in-parallel scanning (when/if supported) for better I/O utilization in
 multiple-disk-array situations. For example, partitioning on
 MOD(id,10) across 10 separate volumes, and firing off 10 concurrent
 scans, one per partition, to satisfy a query.
 
 That's a good summary. It has already been documented and discussed, but
 saying it again and again is the best way to get this across.
 
 You've highlighted that partitioning is a feature with many underlying
 requirements: infrequent access to data (frequently historical),

Actually, infrequent access is not a requirement.  It is a common
requirement however.

Take for instance, a very large set of data that contains an integer column
'type_id' that has about 200 distinct values.  The data is accessed with a
strict 'type_id = X' requirement 99.9% of the time.  If this was one large
table, then scans of all sorts become much more expensive than if it is
partitioned on 'type_id'.  Furthermore, partitioning on type_id removes the
requirement to even index on this value.  Statistics on each partition may
vary significantly, and the plannner can thus adapt to changes in the data
per value of type_id naturally.

The raw need is not infrequent access but highly partitioned access.  It
doesn't matter if your date-partitioned data is accessed evenly across all
dates or skewed to the most frequent -- it matters that you are almost
always accessing by small date ranges.

 striping for parallelism and getting around RDBMS flaws (if any). We
 must be careful to implement each requirement in full, yet separately,
 so we don't end up with 60% functionality in each case by delivering an
 average or least common denominator solution.
 
 --
  Simon Riggs   www.2ndQuadrant.com
  PostgreSQL Training, Services and Support
 
 
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-07 Thread Robert Haas

On Wed, May 6, 2009 at 6:08 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Agreed. Perhaps I should say then that the syntax needs to express the
 requirements of the planner/executor behaviour, rather than being the
 main aspect of the feature, as some have suggested.

Agreed.

 Hopefully, notions of partitioning won't be directly tied to chunking of
 data for parallel query access. Most queries access recent data and
 hence only a single partition (or stripe), so partitioning and
 parallelism and frequently exactly orthogonal.

Yes, I think those things are unrelated.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-07 Thread david


On Thu, 7 May 2009, Robert Haas wrote:


On Wed, May 6, 2009 at 6:08 PM, Simon Riggs si...@2ndquadrant.com wrote:

Agreed. Perhaps I should say then that the syntax needs to express the
requirements of the planner/executor behaviour, rather than being the
main aspect of the feature, as some have suggested.


Agreed.


Hopefully, notions of partitioning won't be directly tied to chunking of
data for parallel query access. Most queries access recent data and
hence only a single partition (or stripe), so partitioning and
parallelism and frequently exactly orthogonal.


Yes, I think those things are unrelated.


I'm not so sure (warning, I am relativly inexperianced in this area)

it sounds like you can take two basic approaches to partition a database

1. The Isolation Plan

   you want to have it so that your queries match your partitioning.

  this is with the goal of only having to query a small number of 
paritions, minimizing the total amount of data touched (including 
minimumizing the number of indexes searched)


 this matches the use case mentioned above, with the partition based on 
date and only looking at the most recent date range.


2. The Load Balancing Plan

  you want to have your partitioning and your queries _not_ match as much 
as possible


  this is with the goal of having the query hit as many partitions as 
possible, so that the different parts of the search can happen in parallel



However, with either partitioning plan, you will have queries that 
degenerate to look like the other plan.


In the case of the isolation plan, you may need to search for all 
instances of a rare thing over the entire history (after all, if you never 
need to access that history, why do you pay for disks to store it? ;-)


and even when you are searching a narrow time window, it may still span 
multiple partitions. I have a log analysis setup using the Splunk 
prioriatary database, it paritions by time, creating a new parition as the 
current one hits a configurable size (by default 10G on 64 bit systems). 
for my volume of logs I end up with each parition only covering a few 
hours. it's very common to want to search over a few days, which can be a 
few dozen partitions (this is out of many hundreds of partitions, so it's 
still a _huge_ win to narrow the timeframe)



In the case of the load balancing plan, you may run into a query that 
happens to only fall into one partition (the query matches your 
paritioning logic)





I think the only real difference is how common it is to need to search 
multiple partitions.


If the expectation is that you will frequently need to search most/all of 
the partitions (the load balancing plan), then it's a waste of time to 
analyse the query to try and figure out which paritions you need to look 
at.


If the expectation is that you will frequently only need to search a small 
number of the partitions (the isolation plan), then it's extremely valuble 
to spend as much time as needed working to analyse the query to try and 
figure out which partitions you need to look at.



I believe that the isolation plan is probably more common than the load 
balancing plan, but I don't see them as being that different for the 
database engine point of view. To tune a system that can handle the 
isolation plan for load balancing, the key thing to do would be to have a 
knob to disable the partition planning, and just blindly send the search 
out to every partition.


David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-06 Thread Simon Riggs


On Fri, 2009-05-01 at 11:27 -0400, Robert Haas wrote:

 The problem has been finding someone who has both the time and the
 ability to do the work.

Unfortunately there has been significant debate over which parts of
partitioning need to be improved. My own view is that considerable
attention needs to be applied to both the executor and planner to
improve matters and that syntax improvements are largely irrelevant,
though seductive.

Deep improvements will require significant analysis, agreement, effort
and skill. What we have now took approximately 20 days to implement,
with later patches adding about another 10-20 days work. I'd estimate
the required work as 60-100 days work from primary author, plus planning
and discussion time. YMMV.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-06 Thread Tom Lane

Simon Riggs si...@2ndquadrant.com writes:
 On Fri, 2009-05-01 at 11:27 -0400, Robert Haas wrote:
 The problem has been finding someone who has both the time and the
 ability to do the work.

 Unfortunately there has been significant debate over which parts of
 partitioning need to be improved. My own view is that considerable
 attention needs to be applied to both the executor and planner to
 improve matters and that syntax improvements are largely irrelevant,
 though seductive.

My thought about it is that what we really need is an explicit notion
of partitioned tables built into the system, instead of trying to make
the planner re-deduce the partitioning behavior from first principles
every time it builds a plan for such a table.  Such a notion would
presumably involve some new syntax to allow the partitioning rule to be
specified at table creation time.  I agree that the syntax details are a
minor issue, but the set of possible partitioning rules is certainly a
topic of great interest.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-06 Thread Simon Riggs


On Wed, 2009-05-06 at 17:55 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Fri, 2009-05-01 at 11:27 -0400, Robert Haas wrote:
  The problem has been finding someone who has both the time and the
  ability to do the work.
 
  Unfortunately there has been significant debate over which parts of
  partitioning need to be improved. My own view is that considerable
  attention needs to be applied to both the executor and planner to
  improve matters and that syntax improvements are largely irrelevant,
  though seductive.
 
 My thought about it is that what we really need is an explicit notion
 of partitioned tables built into the system, instead of trying to make
 the planner re-deduce the partitioning behavior from first principles
 every time it builds a plan for such a table.  Such a notion would
 presumably involve some new syntax to allow the partitioning rule to be
 specified at table creation time.  I agree that the syntax details are a
 minor issue, but the set of possible partitioning rules is certainly a
 topic of great interest.

Agreed. Perhaps I should say then that the syntax needs to express the
requirements of the planner/executor behaviour, rather than being the
main aspect of the feature, as some have suggested.

Hopefully, notions of partitioning won't be directly tied to chunking of
data for parallel query access. Most queries access recent data and
hence only a single partition (or stripe), so partitioning and
parallelism and frequently exactly orthogonal. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-06 Thread Tom Lane

Alvaro Herrera alvhe...@commandprompt.com writes:
 I think there should be a way to refer to individual partitions as
 objects.

Yeah, the individual partitions should be nameable tables, otherwise we
will be reinventing a *whole* lot of management stuff to little gain.
I don't actually think there is anything wrong with using table
inheritance as the basic infrastructure --- I just want more smarts
about one particular use pattern of inheritance.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-06 Thread Alvaro Herrera

Simon Riggs escribió:


 Hopefully, notions of partitioning won't be directly tied to chunking of
 data for parallel query access. Most queries access recent data and
 hence only a single partition (or stripe), so partitioning and
 parallelism and frequently exactly orthogonal. 

I think there should be a way to refer to individual partitions as
objects.  That way we could execute some commands to enable certain
optimizations, for example mark this partition read only which would
mean it could be marked as not needing vacuum.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[PERFORM] Transparent table partitioning in future version of PG?

2009-05-01 Thread henk de wit


Hi,
I was looking at the support that PostgreSQL offers for table partitioning at 
http://www.postgresql.org/docs/8.4/static/ddl-partitioning.html. The concept 
looks promising, but its maybe fair to say that PG itself doesn't really 
supports partitioning natively, but one can simulate it using some of the 
existing PG features (namely inheritance, triggers, rules and constraint 
exclusion). This simulating does seem to work, but there are some disadvantages 
and caveats. 
A major disadvantage is obviously that you need to set up and maintain the 
whole structure yourself (which is somewhat dangerous, or at least involves a 
lot of maintenance overhead). Next to that, it seemingly becomes hard to do 
simple queries likes 'select * from foo where bar 1000 and bar  5000', in 
case the answer to this query spans multiple partitions. constraint exclusion 
works to some degree, but the document I referred to above tells me I can no 
longer use prepared statements then.
I wonder if there are any plans to incorporate 'native' or 'transparent' 
partitioning in some future version of PG? With this I mean that I would 
basically be able to say something like (pseudo): alter table foo partition on 
bar range 100, and PG would then simply start doing internally what we now 
have to do manually.
Is something like this on the radar or is it just wishful thinking of me?
Kind regards


_
What can you do with the new Windows Live? Find out
http://www.microsoft.com/windows/windowslive/default.aspx

Re: [PERFORM] Transparent table partitioning in future version of PG?

2009-05-01 Thread Scott Carey

On 5/1/09 7:32 AM, henk de wit henk53...@hotmail.com wrote:

 Hi,
 
 I was looking at the support that PostgreSQL offers for table partitioning
 at http://www.postgresql.org/docs/8.4/static/ddl-partitioning.html. The
 concept looks promising, but its maybe fair to say that PG itself doesn't
 really supports partitioning natively, but one can simulate it using some of
 the existing PG features (namely inheritance, triggers, rules and constraint
 exclusion). This simulating does seem to work, but there are some
 disadvantages and caveats. 
 
 A major disadvantage is obviously that you need to set up and maintain the
 whole structure yourself (which is somewhat dangerous, or at least involves a
 lot of maintenance overhead). Next to that, it seemingly becomes hard to do
 simple queries likes 'select * from foo where bar 1000 and bar  5000', in
 case the answer to this query spans multiple partitions. constraint exclusion
 works to some degree, but the document I referred to above tells me I can no
 longer use prepared statements then.

More caveats:  

Query plans go bad pretty quickly because the planner doesn't aggregate
statistics correctly when scanning more than one table.

Constraint exclusion code is completely and utterly broken if the table
count gets large on DELETE or UPDATE queries -- I can get the query planner
/ constraint exclusion stuff to eat up 7GB of RAM trying to figure out what
table to access when the number of partitions ~=6000.
The same thing in select form  doesn't consume that memory but still takes
over a second. 

This is not a bug.
http://www.nabble.com/8.3.5:-Query-Planner-takes-15%2B-seconds-to-plan-Updat
e-or-Delete-queries-on-partitioned-tables.-td21992054.html

Its pretty much faster to do merge joins or hash joins client side on
multiple tables -- basically doing partitioning client side -- after a point
and for any more complicated aggregation or join.

There is a lot of talk about overly complicated partitioning or
auto-partitioning, but two much more simple things would go a long way to
making this fairly workable:

Make stat aggregation across tables better -- use weighted average for
estimating row width, aggregate distinct counts and correlations better.
Right now it mostly assumes the worst possible case and can end up with very
unoptimal plans.

Make a special case for unique child inheritance constraints that can be
checked much faster -- nobody wants to partition and have overlapping
constraint regions.  And whatever is going on for it on the update / delete
side that causes it to take so much longer and use so much more memory for
what should be the same constraint exclusion check as a select needs to be
attended to.

There would still be manual work for managing creating partitions, but at
this point, that is the _least_ of the problems.


 
 I wonder if there are any plans to incorporate 'native' or 'transparent'
 partitioning in some future version of PG? With this I mean that I would
 basically be able to say something like (pseudo): alter table foo partition
 on bar range 100, and PG would then simply start doing internally what we now
 have to do manually.
 
 Is something like this on the radar or is it just wishful thinking of me?
 
 Kind regards
 
 
 
 
 What can you do with the new Windows Live? Find out
 http://www.microsoft.com/windows/windowslive/default.aspx 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] Transparent table partitioning in future version of PG?

Re: [PERFORM] Transparent table partitioning in future version of PG?

Re: [PERFORM] Transparent table partitioning in future version of PG?

Re: [PERFORM] Transparent table partitioning in future version of PG?

Re: [PERFORM] Transparent table partitioning in future version of PG?

Re: [PERFORM] Transparent table partitioning in future version of PG?

Re: [PERFORM] Transparent table partitioning in future version of PG?

Re: [PERFORM] Transparent table partitioning in future version of PG?

Re: [PERFORM] Transparent table partitioning in future version of PG?

Re: [PERFORM] Transparent table partitioning in future version of PG?

Re: [PERFORM] Transparent table partitioning in future version of PG?

Re: [PERFORM] Transparent table partitioning in future version of PG?

Re: [PERFORM] Transparent table partitioning in future version of PG?

Re: [PERFORM] Transparent table partitioning in future version of PG?

Re: [PERFORM] Transparent table partitioning in future version of PG?

Re: [PERFORM] Transparent table partitioning in future version of PG?

[PERFORM] Transparent table partitioning in future version of PG?

Re: [PERFORM] Transparent table partitioning in future version of PG?

18 matches

Site Navigation

Mail list logo

Footer information