Re: [PERFORM] partitioned table performance

2006-10-30 Thread Simon Riggs
On Mon, 2006-10-30 at 22:58 +0100, Andreas Kostyrka wrote:
> On Monday, 2006-10-30 at 08:18, Simon Riggs wrote:
> > On Sun, 2006-10-29 at 00:28 +0200, Andreas Kostyrka wrote:
> > 
> > > Any ideas how to make the optimizer handle partitioned tables more
> > > sensibly?
> > 
> > Yes, those are known inefficiencies in the current implementation which
> > we expect to address for 8.3.
> 
> Any ideas to force the current optimizer to do something sensible?

Brute force & ignorance: PL/pgSQL

Perhaps some other domain knowledge might help you shorten the search?
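
For instance, a rough PL/pgSQL sketch of that brute-force idea (hypothetical
function name, and it assumes the monthly children all share the parent's
rowtype): query each child directly so every partition gets its own cheap,
LIMITed index scan, and let the caller re-sort the small combined result.

   CREATE OR REPLACE FUNCTION media_downloads_after(min_id integer,
                                                     max_rows integer)
   RETURNS SETOF media_downloads AS $$
   DECLARE
       part text;
       r    media_downloads%ROWTYPE;
   BEGIN
       -- loop over the inheritance children of the parent table
       FOR part IN
           SELECT c.relname
             FROM pg_class c
             JOIN pg_inherits i ON i.inhrelid = c.oid
            WHERE i.inhparent = 'media_downloads'::regclass
       LOOP
           -- each per-child query can use that child's pkey index on id
           FOR r IN EXECUTE 'SELECT * FROM ' || quote_ident(part)
                    || ' WHERE id > ' || min_id::text
                    || ' ORDER BY id LIMIT ' || max_rows::text
           LOOP
               RETURN NEXT r;
           END LOOP;
       END LOOP;
       RETURN;
   END;
   $$ LANGUAGE plpgsql;

   -- the caller still re-sorts and trims the combined set
   -- (at most max_rows per child):
   SELECT * FROM media_downloads_after(9000, 100) ORDER BY id LIMIT 100;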

That's all for now. It's not a minor fix, and nobody had time to address it
for 8.2 since there were bigger fish to fry.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [PERFORM] partitioned table performance

2006-10-30 Thread Andreas Kostyrka
On Monday, 2006-10-30 at 08:18, Simon Riggs wrote:
> On Sun, 2006-10-29 at 00:28 +0200, Andreas Kostyrka wrote:
> 
> > Any ideas how to make the optimizer handle partitioned tables more
> > sensibly?
> 
> Yes, those are known inefficiencies in the current implementation which
> we expect to address for 8.3.

Any ideas to force the current optimizer to do something sensible?

Andreas

> 




Re: [PERFORM] partitioned table performance

2006-10-30 Thread Simon Riggs
On Sun, 2006-10-29 at 00:28 +0200, Andreas Kostyrka wrote:

> Any ideas how to make the optimizer handle partitioned tables more
> sensibly?

Yes, those are known inefficiencies in the current implementation which
we expect to address for 8.3.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





[PERFORM] partitioned table performance

2006-10-28 Thread Andreas Kostyrka
Hi!

I'm just wondering: I've got a table that is partitioned into monthly
tables:

media_downloads -> media_downloads_YYYYMM
  \- id (primary key)
  \- created_on (timestamp; the criterion for the monthly split)

There are CHECK constraints on the created_on column, and all the needed
INSERT-instead rules are defined too.
One additional hardship is that ids are not monotonic with respect to
created_on: id1 < id2 does not imply created_on1 <= created_on2 :(
The table contains almost 100M rows, and the number is growing (the table
is about a 12GB pg_dump).
All relevant indexes (primary key on id, index on created_on) are defined
too.
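
For reference, a sketch of what each monthly child looks like (the rule and
index names here are only illustrative, and one month's date range is shown;
constraint_exclusion is enabled so the planner can use the CHECKs):

   CREATE TABLE media_downloads_200610 (
       CHECK ( created_on >= DATE '2006-10-01'
           AND created_on <  DATE '2006-11-01' )
   ) INHERITS ( media_downloads );

   ALTER TABLE media_downloads_200610 ADD PRIMARY KEY ( id );
   CREATE INDEX media_downloads_200610_created_on
       ON media_downloads_200610 ( created_on );

   CREATE RULE media_downloads_ins_200610 AS
       ON INSERT TO media_downloads
       WHERE ( NEW.created_on >= DATE '2006-10-01'
           AND NEW.created_on <  DATE '2006-11-01' )
       DO INSTEAD
       INSERT INTO media_downloads_200610 VALUES ( NEW.* );

   SET constraint_exclusion = on;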

The good thing is that queries like "all rows in the last 7 days" work
reasonably fast; the optimizer just checks the last one or two monthly
tables.

Using postgres 8.1.4-0ubuntu1, I need to implement the following queries
in a reasonably fast way:

-- sequential reading of rows
SELECT * FROM media_downloads WHERE id > 100 ORDER BY id LIMIT 100;

Against the same data in a monolithic table with about 16.5M rows, I'm
getting a cost of 20.6 pages (index scan).

Against the partitioned tables, I'm getting a cost of 5406822 pages.
Now, I understand that without any additional conditions PostgreSQL needs
to run the query against all subtables first, but EXPLAIN against the
individual subtables shows costs of 4-5 pages.
events=# explain select * from media_downloads where id >9000 order by id limit 100;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Limit  (cost=5406822.39..5406822.64 rows=100 width=1764)
   ->  Sort  (cost=5406822.39..5413639.50 rows=2726843 width=1764)
         Sort Key: public.media_downloads.id
         ->  Result  (cost=0.00..115960.71 rows=2726843 width=1764)
               ->  Append  (cost=0.00..115960.71 rows=2726843 width=1764)
                     ->  Seq Scan on media_downloads  (cost=0.00..10.50 rows=13 width=1764)
                           Filter: (id > 9000)
                     ->  Index Scan using media_downloads_200510_pkey on media_downloads_200510 media_downloads  (cost=0.00..3.75 rows=14 width=243)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200511_pkey on media_downloads_200511 media_downloads  (cost=0.00..72.19 rows=172 width=239)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200512_pkey on media_downloads_200512 media_downloads  (cost=0.00..603.64 rows=172 width=240)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200601_pkey on media_downloads_200601 media_downloads  (cost=0.00..19.33 rows=232 width=239)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200602_pkey on media_downloads_200602 media_downloads  (cost=0.00..56.82 rows=316 width=240)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200603_pkey on media_downloads_200603 media_downloads  (cost=0.00..18.88 rows=270 width=243)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200604_pkey on media_downloads_200604 media_downloads  (cost=0.00..1194.16 rows=939 width=298)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200605_pkey on media_downloads_200605 media_downloads  (cost=0.00..79.28 rows=672 width=326)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200606_pkey on media_downloads_200606 media_downloads  (cost=0.00..75.26 rows=1190 width=314)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200607_pkey on media_downloads_200607 media_downloads  (cost=0.00..55.29 rows=1238 width=319)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200608_pkey on media_downloads_200608 media_downloads  (cost=0.00..73.95 rows=1305 width=319)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200609_pkey on media_downloads_200609 media_downloads  (cost=0.00..144.10 rows=1575 width=324)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200610_pkey on media_downloads_200610 media_downloads  (cost=0.00..113532.57 rows=2718709 width=337)
                           Index Cond: (id > 9000)
                     ->  Seq Scan on media_downloads_200611 media_downloads  (cost=0.00..10.50 rows=13 width=1764)
                           Filter: (id > 9000)
                     ->  S

Re: [PERFORM] Partitioned table performance

2004-12-21 Thread Stacy White
The discussion seems to have diverged a little, so I don't feel too bad
about making some semi-off-topic comments.

From: "Greg Stark" <[EMAIL PROTECTED]>
> Like I said though, we found "global indexes" defeated the whole purpose.

First semi-off-topic comment: I think this depends on the index, the data,
and the goal of the partitioning.  We use partitioning on one of our Oracle
projects for performance rather than manageability.  In this case, a global
index on a non-partitioned field can be helpful.

Imagine an 'orders' table with 100 weekly partitions.  Products have a
short life cycle, and are typically around for only a week or two.  A query
like 'SELECT * FROM orders WHERE product_no = ?' forces a lookup on 100
different local indexes, but only one global index.


Second semi-off-topic comment: Josh mentioned that Oracle's partitioning
introduces its own overhead, so I re-ran my earlier benchmarks on one of
our Oracle machines.

I believe Oracle's licensing agreement prohibits me from posting any
benchmarks, so all I'll say is that Postgres' inheritance partitioning
implementation is _very_ low overhead, and even union views are competitive.




Re: [PERFORM] Partitioned table performance

2004-12-21 Thread Jim C. Nasby
On Wed, Dec 15, 2004 at 11:56:40AM -0800, Josh Berkus wrote:
> Greg,
> 
> > Well Oracle has lots of partitioning intelligence pushed up to the planner
> > to avoid overhead.
> >
> > If you have a query with something like "WHERE date = '2004-01-01'" and
> > date is your partition key (even if it's a range) then Oracle will figure
> > out which partition it will need at planning time.
> 
> Hmmm ... well, we're looking at making a spec for Postgres Table 
> Partitioning.   
> Maybe you could help?

This is something I've been thinking about doing for
http://stats.distributed.net; is there a formal project for this
somewhere?

On a different note, has anyone looked at the savings you get by
omitting the partition field from the child tables? ISTM that the
savings would be substantial for narrow tables. Of course that most
likely means doing a union view instead of inheritance, but I'm guessing
here. The table I'm thinking of partitioning is quite narrow (see
below), so I suspect that dropping project_id out would result in a
substantial savings (there's basically nothing that ever queries across
the whole table). With the data distribution, I suspect just breaking
project IDs 205, 5, and 25 into partitioned tables that didn't contain
project_id would save about 450MB (4 bytes * 95% * 130M rows).

(the table has ~130M rows)

          Table "public.email_contrib"
   Column   |  Type   | Modifiers
------------+---------+-----------
 project_id | integer | not null
 id         | integer | not null
 date       | date    | not null
 team_id    | integer |
 work_units | bigint  | not null
Indexes:
    "email_contrib_pkey" primary key, btree (project_id, id, date)
    "email_contrib__pk24" btree (id, date) WHERE (project_id = 24)
    "email_contrib__pk25" btree (id, date) WHERE (project_id = 25)
    "email_contrib__pk8" btree (id, date) WHERE (project_id = 8)
    "email_contrib__project_date" btree (project_id, date)
Foreign-key constraints:
    "fk_email_contrib__id" FOREIGN KEY (id) REFERENCES stats_participant(id) ON UPDATE CASCADE
    "fk_email_contrib__team_id" FOREIGN KEY (team_id) REFERENCES stats_team(team) ON UPDATE CASCADE

stats=# select * from pg_stats where tablename='email_contrib' and attname='project_id';
 schemaname |   tablename   |  attname   | null_frac | avg_width | n_distinct | most_common_vals  |                    most_common_freqs                     | histogram_bounds | correlation
------------+---------------+------------+-----------+-----------+------------+-------------------+----------------------------------------------------------+------------------+-------------
 public     | email_contrib | project_id |         0 |         4 |          6 | {205,5,25,8,24,3} | {0.461133,0.4455,0.0444333,0.0418667,0.0049,0.00216667} |                  |    0.703936
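
For what it's worth, a sketch of what that union-view layout could look like
here (hypothetical child names; the hot projects lose the project_id column
entirely and the view adds the constant back so existing queries still work):

CREATE TABLE email_contrib_p205 (
    id         integer not null,
    date       date    not null,
    team_id    integer,
    work_units bigint  not null
);
-- likewise email_contrib_p5 and email_contrib_p25, plus an
-- email_contrib_other table that keeps the project_id column

CREATE VIEW email_contrib_all AS
    SELECT 205 AS project_id, id, date, team_id, work_units
      FROM email_contrib_p205
    UNION ALL
    SELECT 5, id, date, team_id, work_units
      FROM email_contrib_p5
    UNION ALL
    SELECT 25, id, date, team_id, work_units
      FROM email_contrib_p25
    UNION ALL
    SELECT project_id, id, date, team_id, work_units
      FROM email_contrib_other;
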
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"



Re: [PERFORM] Partitioned table performance

2004-12-21 Thread Jim C. Nasby
Sorry for the late reply, so I included the whole thread. Should this be
a TODO?

On Wed, Dec 15, 2004 at 08:30:08PM -0500, Tom Lane wrote:
> Greg Stark <[EMAIL PROTECTED]> writes:
> > But I'm a bit puzzled. Why would Append have any significant cost? It's just
> > taking the tuples from one plan node and returning them until they run out,
> > then taking the tuples from another plan node. It should have no i/o cost 
> > and
> > hardly any cpu cost. Where is the time going?
> 
> As best I can tell by profiling, the cost of the Append node per se is
> indeed negligible --- no more than a couple percent of the runtime in
> CVS tip for a test case similar to Stacy White's example.
> 
> It looks bad in EXPLAIN ANALYZE, but you have to realize that passing
> the tuples up through the Append node doubles the instrumentation
> overhead of EXPLAIN ANALYZE, which is pretty sizable already.  (If you
> turn on \timing in psql and try the query itself vs. EXPLAIN ANALYZE,
> the actual elapsed time is about double, at least for me.)
> 
> The other effect, which I hadn't expected, is that the seqscans
> themselves actually slow down.  I get
> 
> regression=# explain analyze SELECT COUNT(*), MAX(bar1) FROM super_foo ;
>                                QUERY PLAN
> --------------------------------------------------------------------------
>  Aggregate  (cost=16414.32..16414.32 rows=1 width=4) (actual time=32313.980..32313.988 rows=1 loops=1)
>    ->  Append  (cost=0.00..13631.54 rows=556555 width=4) (actual time=0.232..21848.401 rows=524289 loops=1)
>          ->  Seq Scan on super_foo  (cost=0.00..0.00 rows=1 width=4) (actual time=0.020..0.020 rows=0 loops=1)
>          ->  Seq Scan on sub_foo1 super_foo  (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.187..6926.395 rows=262144 loops=1)
>          ->  Seq Scan on sub_foo2 super_foo  (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.168..7026.953 rows=262145 loops=1)
>  Total runtime: 32314.993 ms
> (6 rows)
> 
> regression=# explain analyze SELECT COUNT(*), MAX(bar1) FROM sub_foo1;
>                                QUERY PLAN
> --------------------------------------------------------------------------
>  Aggregate  (cost=8207.16..8207.16 rows=1 width=4) (actual time=9850.420..9850.428 rows=1 loops=1)
>    ->  Seq Scan on sub_foo1  (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.202..4642.401 rows=262144 loops=1)
>  Total runtime: 9851.423 ms
> (3 rows)
> 
> Notice the actual times for the sub_foo1 seqscans.  That increase (when
> counted for both input tables) almost exactly accounts for the
> difference in non-EXPLAIN ANALYZE runtime.
> 
> After digging around, I find that the reason for the difference is that
> the optimization to avoid a projection step (ExecProject) isn't applied
> for scans of inheritance unions:
> 
>     /*
>      * Can't do it with inheritance cases either (mainly because Append
>      * doesn't project).
>      */
>     if (rel->reloptkind != RELOPT_BASEREL)
>         return false;
> 
> So if you were to try the example in a pre-7.4 PG, which didn't have
> that optimization, you'd probably find that the speeds were just about
> the same.  (I'm too lazy to verify this though.)
> 
> I looked briefly at what it would take to cover this case, and decided
> that it's a nontrivial change, so it's too late to do something about it
> for 8.0.  I think it's probably possible to fix it though, at least for
> cases where the child tables have rowtypes identical to the parent.
> 
>   regards, tom lane
> 

-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"



Re: [PERFORM] Partitioned table performance

2004-12-15 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> But I'm a bit puzzled. Why would Append have any significant cost? It's just
> taking the tuples from one plan node and returning them until they run out,
> then taking the tuples from another plan node. It should have no i/o cost and
> hardly any cpu cost. Where is the time going?

As best I can tell by profiling, the cost of the Append node per se is
indeed negligible --- no more than a couple percent of the runtime in
CVS tip for a test case similar to Stacy White's example.

It looks bad in EXPLAIN ANALYZE, but you have to realize that passing
the tuples up through the Append node doubles the instrumentation
overhead of EXPLAIN ANALYZE, which is pretty sizable already.  (If you
turn on \timing in psql and try the query itself vs. EXPLAIN ANALYZE,
the actual elapsed time is about double, at least for me.)
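
For instance, a minimal psql session showing that comparison (table name
taken from the test case below; timings will of course vary):

   -- wall-clock time as psql reports it ("Time: ..." line)
   \timing
   SELECT COUNT(*), MAX(bar1) FROM super_foo;

   -- versus the "Total runtime:" reported with instrumentation enabled;
   -- the gap between the two is the EXPLAIN ANALYZE overhead, and it
   -- roughly doubles when tuples also pass through an Append node
   EXPLAIN ANALYZE SELECT COUNT(*), MAX(bar1) FROM super_foo;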

The other effect, which I hadn't expected, is that the seqscans
themselves actually slow down.  I get

regression=# explain analyze SELECT COUNT(*), MAX(bar1) FROM super_foo ;
                               QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=16414.32..16414.32 rows=1 width=4) (actual time=32313.980..32313.988 rows=1 loops=1)
   ->  Append  (cost=0.00..13631.54 rows=556555 width=4) (actual time=0.232..21848.401 rows=524289 loops=1)
         ->  Seq Scan on super_foo  (cost=0.00..0.00 rows=1 width=4) (actual time=0.020..0.020 rows=0 loops=1)
         ->  Seq Scan on sub_foo1 super_foo  (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.187..6926.395 rows=262144 loops=1)
         ->  Seq Scan on sub_foo2 super_foo  (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.168..7026.953 rows=262145 loops=1)
 Total runtime: 32314.993 ms
(6 rows)

regression=# explain analyze SELECT COUNT(*), MAX(bar1) FROM sub_foo1;
                               QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=8207.16..8207.16 rows=1 width=4) (actual time=9850.420..9850.428 rows=1 loops=1)
   ->  Seq Scan on sub_foo1  (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.202..4642.401 rows=262144 loops=1)
 Total runtime: 9851.423 ms
(3 rows)

Notice the actual times for the sub_foo1 seqscans.  That increase (when
counted for both input tables) almost exactly accounts for the
difference in non-EXPLAIN ANALYZE runtime.

After digging around, I find that the reason for the difference is that
the optimization to avoid a projection step (ExecProject) isn't applied
for scans of inheritance unions:

    /*
     * Can't do it with inheritance cases either (mainly because Append
     * doesn't project).
     */
    if (rel->reloptkind != RELOPT_BASEREL)
        return false;

So if you were to try the example in a pre-7.4 PG, which didn't have
that optimization, you'd probably find that the speeds were just about
the same.  (I'm too lazy to verify this though.)

I looked briefly at what it would take to cover this case, and decided
that it's a nontrivial change, so it's too late to do something about it
for 8.0.  I think it's probably possible to fix it though, at least for
cases where the child tables have rowtypes identical to the parent.

regards, tom lane



Re: [PERFORM] Partitioned table performance

2004-12-15 Thread Greg Stark

Josh Berkus <[EMAIL PROTECTED]> writes:

> > But I'm a bit puzzled. Why would Append have any significant cost? It's
> > just taking the tuples from one plan node and returning them until they run
> > out, then taking the tuples from another plan node. It should have no i/o
> > cost and hardly any cpu cost. Where is the time going?
> 
> Beats me.  Tom?
> 
> > In my experience "global indexes" defeat the whole purpose of having the
> > partitions. They make dropping and adding partitions expensive which was
> > always the reason we wanted to partition something anyways.
> 
> Hmmm ... possibly. I was just thinking about the cost to partitioned tables
> when you do a selection *not* on the partitioned axis.   Also that currently
> we can't enforce UNIQUE constraints across partitions.

Like I said though, we found "global indexes" defeated the whole purpose. That
meant no global UNIQUE constraints for us when we went to partitioned tables.
It gave the DBAs the willies but it really wasn't a big deal.

You can still do unique local indexes on a specific partition. So as long as
your partition key is in the primary key you can have a trustworthy primary
key.

And even if not, you usually find you're only loading data into one
partition. In most applications it's pretty hard to end up with records in
two different partitions with conflicting IDs, and it's not hard to check
for. You could easily put a constraint saying that all PO numbers in the new
fiscal year have to be greater than the last PO number from last year, for
example.
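
Something along these lines, purely illustrative:

    -- per-partition CHECK keeping this fiscal year's PO numbers above last
    -- year's maximum, so IDs can't collide across partitions even without
    -- a global unique index
    CREATE TABLE orders_fy2005 (
        CHECK ( po_number > 700000 )   -- last PO number issued in FY2004
    ) INHERITS ( orders );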

> But maybe reducing the cost of Append is the answer to this.

The problem with global indexes is that adding or removing an entire partition
becomes a large job. [Actually with Postgres MVCC I suppose removing might
not. But cleaning up would eventually be a large job, and the point remains
for adding a partition.]

Ideally adding and removing a partition should be an O(1) operation. No data
modification at all, purely catalog changes.
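
With the existing inheritance approach and purely local indexes that's
roughly what you already get (illustrative names; the CHECK range is
whatever the partitioning rule dictates):

    -- adding a partition: create a new, empty child
    CREATE TABLE orders_2005q1 (
        CHECK ( order_date >= DATE '2005-01-01'
            AND order_date <  DATE '2005-04-01' )
    ) INHERITS ( orders );

    -- removing a partition: drop (or archive) the child; rows in the other
    -- partitions are never touched
    DROP TABLE orders_2004q1;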

> > It is handy having a higher level interface to deal with partitioned
> > tables. You can create a single "local" or "segmented" index and not have
> > to manually deal with all the partitions as separate tables. But that's
> > just syntactic sugar.
> 
> Right, and the easy part.

I think the hard part lies in the optimizer actually. The semantics of the
operations to manipulate partitions might be tricky to get right but the
coding should be straightforward. Having the optimizer be able to recognize
when it can prune partitions will be a lot of work.

-- 
greg




Re: [PERFORM] Partitioned table performance

2004-12-15 Thread Greg Stark
Josh Berkus <[EMAIL PROTECTED]> writes:

> Stacy,
> 
> > Thanks again for the reply.  So it sounds like the answer to my original
> > question is that it's expected that the pseudo-partitioning would introduce
> > a fairly significant amount of overhead.  Correct?
> 
> Correct.   For that matter, Oracle table partitioning introduces significant 
> overhead, from what I've seen.  I don't think there's a way not to.

Well Oracle has lots of partitioning intelligence pushed up to the planner to
avoid overhead.

If you have a query with something like "WHERE date = '2004-01-01'" and date
is your partition key (even if it's a range) then Oracle will figure out which
partition it will need at planning time.

Even if your query is something like "WHERE date = ?" then Oracle will still
recognize that it will only need a single partition at planning time, though
it has to decide which partition at execution time.

We didn't notice any run-time performance degradation when we went to
partitioned tables. Maybe we were just blinded by the joy they brought us on
the maintenance side, though; I don't think we specifically checked for
run-time consequences.

But I'm a bit puzzled. Why would Append have any significant cost? It's just
taking the tuples from one plan node and returning them until they run out,
then taking the tuples from another plan node. It should have no i/o cost and
hardly any cpu cost. Where is the time going?

> What would improve the situation significantly, and the utility of 
> pseudo-partitioning, is the ability to have a single index span multiple 
> partitions.   This would allow you to have a segmented index for the 
> partitioned axis, yet still use an unsegmented index for the other columns.  
> However, there's a *lot* of work to do to make that happen.

In my experience "global indexes" defeat the whole purpose of having the
partitions. They make dropping and adding partitions expensive which was
always the reason we wanted to partition something anyways.

It is handy having a higher level interface to deal with partitioned tables.
You can create a single "local" or "segmented" index and not have to manually
deal with all the partitions as separate tables. But that's just syntactic
sugar.

-- 
greg




Re: [PERFORM] Partitioned table performance

2004-12-15 Thread Josh Berkus
Greg,

> Well Oracle has lots of partitioning intelligence pushed up to the planner
> to avoid overhead.
>
> If you have a query with something like "WHERE date = '2004-01-01'" and
> date is your partition key (even if it's a range) then Oracle will figure
> out which partition it will need at planning time.

Hmmm ... well, we're looking at making a spec for Postgres Table
Partitioning.  Maybe you could help?

> But I'm a bit puzzled. Why would Append have any significant cost? It's
> just taking the tuples from one plan node and returning them until they run
> out, then taking the tuples from another plan node. It should have no i/o
> cost and hardly any cpu cost. Where is the time going?

Beats me.  Tom?

> In my experience "global indexes" defeat the whole purpose of having the
> partitions. They make dropping and adding partitions expensive which was
> always the reason we wanted to partition something anyways.

Hmmm ... possibly. I was just thinking about the cost to partitioned tables 
when you do a selection *not* on the partitioned axis.   Also that currently 
we can't enforce UNIQUE constraints across partitions.

But maybe reducing the cost of Append is the answer to this.

> It is handy having a higher level interface to deal with partitioned
> tables. You can create a single "local" or "segmented" index and not have
> to manually deal with all the partitions as separate tables. But that's
> just syntactic sugar.

Right, and the easy part.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Partitioned table performance

2004-12-15 Thread Josh Berkus
Stacy,

> Thanks again for the reply.  So it sounds like the answer to my original
> question is that it's expected that the pseudo-partitioning would introduce
> a fairly significant amount of overhead.  Correct?

Correct.   For that matter, Oracle table partitioning introduces significant 
overhead, from what I've seen.  I don't think there's a way not to.

Generally, I counsel people that they only want to consider 
pseudo-partitioning if they have one axis on the table which is used in 90% 
or more of the queries against that table.

What would improve the situation significantly, and the utility of 
pseudo-partitioning, is the ability to have a single index span multiple 
partitions.   This would allow you to have a segmented index for the 
partitioned axis, yet still use an unsegmented index for the other columns.  
However, there's a *lot* of work to do to make that happen.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Partitioned table performance

2004-12-14 Thread Stacy White
Josh,

You're absolutely correct that the overhead becomes less significant as the
partitioning prunes more rows.  I can even see a two-partition table being
useful in some situations (e.g., a table divided into a relatively small
"recent data" partition and a much larger "historical data" partition).  The
break-even point is when your partitioning scheme prunes 20% of the rows
(assuming you're using the inheritance-based scheme).

Thanks again for the reply.  So it sounds like the answer to my original
question is that it's expected that the pseudo-partitioning would introduce
a fairly significant amount of overhead.  Correct?



- Original Message - 
From: "Josh Berkus" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "Stacy White" <[EMAIL PROTECTED]>
Sent: Friday, December 10, 2004 9:52 PM
Subject: Re: [PERFORM] Partitioned table performance


Stacy,

> Each set of test tables holds 1,000,000 tuples with a partition value of
> '1', and 1,000,000 with a partition value of '2'. The bar* columns are all
> set to non-null values. The 'one_big_foo' table stores all 2M rows in one
> table. 'super_foo' and 'union_foo' split the data into two tables, and use
> inheritance and union views (respectively) to tie them together, as
> described in my previous message.
>
> Query timings and 'EXPLAIN ANALYZE' results for full table scans and for
> partition scans follow:

Hmmm ... interesting.   I think you've demonstrated that pseudo-partitioning
doesn't pay for having only 2 partitions.   Examine this:

         ->  Index Scan using idx_sub_foo2_partition on sub_foo2 super_foo  (cost=0.00..2.01 rows=1 width=4) (actual time=0.221..0.221 rows=0 loops=1)
               Index Cond: (partition = 1::numeric)
 Total runtime: 15670.463 ms

As you see, even though the aggregate operation requires a seq scan, the
planner is still able to scan, and discard, sub_foo2, using its index in 0.2
seconds.  Unfortunately, super_foo still needs to contend with:

   ->  Append  (cost=0.00..28376.79 rows=164 width=4) (actual time=6.699..12072.483 rows=100 loops=1)

Right there, in the Append, you lose 6 seconds.   This means that
pseudo-partitioning via inheritance will become a speed gain once you can
"make up" that 6 seconds by being able to discard more partitions.   If you
want, do a test with 6 partitions instead of 2 and let us know how it comes
out.

Also, keep in mind that there are other reasons to do pseudo-partitioning
than your example.  Data write performance, expiring partitions, and vacuum
are big reasons that can motivate partitioning even in cases when selects are
slower.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Partitioned table performance

2004-12-10 Thread Josh Berkus
Stacy,

> Each set of test tables holds 1,000,000 tuples with a partition value of
> '1', and 1,000,000 with a partition value of '2'.  The bar* columns are all
> set to non-null values.  The 'one_big_foo' table stores all 2M rows in one
> table.  'super_foo' and 'union_foo' split the data into two tables, and use
> inheritance and union views (respectively) to tie them together, as
> described in my previous message.
>
> Query timings and 'EXPLAIN ANALYZE' results for full table scans and for
> partition scans follow:

Hmmm ... interesting.   I think you've demonstrated that pseudo-partitioning 
doesn't pay for having only 2 partitions.   Examine this:

         ->  Index Scan using idx_sub_foo2_partition on sub_foo2 super_foo  (cost=0.00..2.01 rows=1 width=4) (actual time=0.221..0.221 rows=0 loops=1)
               Index Cond: (partition = 1::numeric)
 Total runtime: 15670.463 ms

As you see, even though the aggregate operation requires a seq scan, the 
planner is still able to scan, and discard, sub_foo2, using its index in 0.2 
seconds.  Unfortunately, super_foo still needs to contend with:

   ->  Append  (cost=0.00..28376.79 rows=164 width=4) (actual time=6.699..12072.483 rows=100 loops=1)

Right there, in the Append, you lose 6 seconds.   This means that 
pseudo-partitioning via inheritance will become a speed gain once you can 
"make up" that 6 seconds by being able to discard more partitions.   If you 
want, do a test with 6 partitions instead of 2 and let us know how it comes 
out.

Also, keep in mind that there are other reasons to do pseudo-partitioning than 
your example.  Data write performance, expiring partitions, and vacuum are 
big reasons that can motivate partitioning even in cases when selects are 
slower.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Partitioned table performance

2004-12-10 Thread Stacy White
 Aggregate  (cost=33377.11..33377.11 rows=1 width=4) (actual time=15670.309..15670.312 rows=1 loops=1)
   ->  Append  (cost=0.00..28376.79 rows=164 width=4) (actual time=6.699..12072.483 rows=100 loops=1)
         ->  Seq Scan on super_foo  (cost=0.00..0.00 rows=1 width=4) (actual time=0.005..0.005 rows=0 loops=1)
               Filter: (partition = 1::numeric)
         ->  Seq Scan on sub_foo1 super_foo  (cost=0.00..28374.78 rows=162 width=4) (actual time=0.106..6688.812 rows=100 loops=1)
               Filter: (partition = 1::numeric)
         ->  Index Scan using idx_sub_foo2_partition on sub_foo2 super_foo  (cost=0.00..2.01 rows=1 width=4) (actual time=0.221..0.221 rows=0 loops=1)
               Index Cond: (partition = 1::numeric)
 Total runtime: 15670.463 ms
(9 rows)

Time: 15672.235 ms

vod=# EXPLAIN ANALYZE SELECT COUNT(*), MAX(bar1) FROM union_foo WHERE partition = 1 ;
                               QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=98573.40..98573.40 rows=1 width=4) (actual time=31897.629..31897.632 rows=1 loops=1)
   ->  Subquery Scan union_foo  (cost=0.00..88573.20 rows=240 width=4) (actual time=0.134..28323.692 rows=100 loops=1)
         ->  Append  (cost=0.00..68572.80 rows=240 width=80) (actual time=0.125..21969.522 rows=100 loops=1)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..34286.40 rows=120 width=80) (actual time=0.120..16867.005 rows=100 loops=1)
                     ->  Seq Scan on union_foo1  (cost=0.00..24286.20 rows=120 width=80) (actual time=0.108..8017.931 rows=100 loops=1)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..34286.40 rows=120 width=80) (actual time=0.011..0.011 rows=0 loops=1)
                     ->  Result  (cost=0.00..24286.20 rows=120 width=80) (actual time=0.004..0.004 rows=0 loops=1)
                           One-Time Filter: false
                           ->  Seq Scan on union_foo2  (cost=0.00..24286.20 rows=120 width=80) (never executed)
 Total runtime: 31897.897 ms
(10 rows)

Time: 31900.204 ms



- Original Message - 
From: "Josh Berkus" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "Stacy White" <[EMAIL PROTECTED]>
Sent: Sunday, December 05, 2004 3:06 PM
Subject: Re: [PERFORM] Partitioned table performance


Stacy,

Thanks for the stats!

> In some cases we've seen some increased performance in tests by splitting
> the table into several smaller tables. Both 'UNION ALL' views, and the
> superclass/subclass scheme work well at pruning down the set of rows a
> query uses, but they seem to introduce a large performance hit to the time
> to process each row (~50% for superclass/subclass, and ~150% for union
> views).

This seems reasonable, actually, given your test.   Really, what you should
be comparing it against is not selecting from an individual partition, but
selecting from the whole business as one large table.

I also suspect that wider rows result in less overhead proportionally; note
that your test contains *only* the indexed rows.   I should soon have a test
to prove this, hopefully.

However, I would be interested in seeing EXPLAIN ANALYZE from your tests
rather than just EXPLAIN.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco




Re: [PERFORM] Partitioned table performance

2004-12-05 Thread Josh Berkus
Stacy,

Thanks for the stats!

> In some cases we've seen some increased performance in tests by splitting
> the table into several smaller tables.  Both 'UNION ALL' views, and the
> superclass/subclass scheme work well at pruning down the set of rows a
> query uses, but they seem to introduce a large performance hit to the time
> to process each row (~50% for superclass/subclass, and ~150% for union
> views).

This seems reasonable, actually, given your test.   Really, what you should be 
comparing it against is not selecting from an individual partition, but 
selecting from the whole business as one large table.

I also suspect that wider rows result in less overhead proportionally; note 
that your test contains *only* the indexed rows.   I should soon have a test 
to prove this, hopefully.

However, I would be interested in seeing EXPLAIN ANALYZE from your tests 
rather than just EXPLAIN.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



[PERFORM] Partitioned table performance

2004-12-04 Thread Stacy White
We're working with a Postgres database that includes a fairly large table
(100M rows, increasing at around 2M per day).

In some cases we've seen some increased performance in tests by splitting
the table into several smaller tables.  Both 'UNION ALL' views, and the
superclass/subclass scheme work well at pruning down the set of rows a query
uses, but they seem to introduce a large performance hit to the time to
process each row (~50% for superclass/subclass, and ~150% for union views).

Is this to be expected?  Or is this a problem with our test setup?

I've listed details on our tests at the end of this message.  The results
are similar with our larger tables; the overhead appears to be per record
returned from the subquery/subclass; it's not a constant overhead per query.
Our production instance is running 7.4.2, but the results are the same on
8.0.

For reference, I tested with this setup (for the superclass/subclass
partitioning scheme):

   CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC );
   ANALYZE super_foo ;

   CREATE TABLE sub_foo1 () INHERITS ( super_foo );
   INSERT INTO sub_foo1 VALUES ( 1, 1 );
   -- repeat insert until sub_foo1 has 1,000,000 rows
   CREATE INDEX idx_subfoo1_partition ON sub_foo1 ( partition );
   ANALYZE sub_foo1 ;

   CREATE TABLE sub_foo2 () INHERITS ( super_foo );
   INSERT INTO sub_foo2 VALUES ( 2, 1 );
   -- repeat insert until sub_foo2 has 1,000,000 rows
   CREATE INDEX idx_subfoo2_partition ON sub_foo2 ( partition );
   ANALYZE sub_foo2 ;

and this setup for the union all scheme:

   CREATE TABLE union_foo1 ( bar NUMERIC );
   INSERT INTO union_foo1 VALUES ( 1 ) ;
   -- repeat insert until union_foo1 has 1,000,000 rows
   ANALYZE union_foo1 ;

   CREATE TABLE union_foo2 ( bar NUMERIC );
   INSERT INTO union_foo2 VALUES ( 1 ) ;
   -- repeat insert until union_foo2 has 1,000,000 rows
   ANALYZE union_foo2 ;

   CREATE VIEW union_foo AS
  SELECT 1 AS partition, * FROM union_foo1
  UNION ALL
  SELECT 2 AS partition, * FROM union_foo2 ;
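
One way to do the "repeat insert" steps above (assuming 8.0, where
generate_series() is available; on 7.4 a client-side loop or COPY does the
same job):

   INSERT INTO sub_foo1 ( partition, bar )
       SELECT 1, 1 FROM generate_series(1, 1000000);
   INSERT INTO sub_foo2 ( partition, bar )
       SELECT 2, 1 FROM generate_series(1, 1000000);
   INSERT INTO union_foo1 ( bar ) SELECT 1 FROM generate_series(1, 1000000);
   INSERT INTO union_foo2 ( bar ) SELECT 1 FROM generate_series(1, 1000000);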

The partition pruning works marvelously:

  EXPLAIN SELECT SUM(bar) FROM super_foo WHERE partition = 2 ;
                                QUERY PLAN
 -----------------------------------------------------------------------------
  Aggregate  (cost=21899.02..21899.02 rows=1 width=32)
    ->  Append  (cost=0.00..19399.01 rows=102 width=32)
          ->  Seq Scan on super_foo  (cost=0.00..0.00 rows=1 width=32)
                Filter: (partition = 2::numeric)
          ->  Index Scan using idx_subfoo1_partition on sub_foo1 super_foo  (cost=0.00..2.01 rows=1 width=10)
                Index Cond: (partition = 2::numeric)
          ->  Seq Scan on sub_foo2 super_foo  (cost=0.00..19397.00 rows=100 width=10)
                Filter: (partition = 2::numeric)

and

  EXPLAIN SELECT SUM(bar) FROM union_foo WHERE partition = 2 ;
                                QUERY PLAN
 -----------------------------------------------------------------------------
  Aggregate  (cost=75819.15..75819.15 rows=1 width=32)
    ->  Subquery Scan union_foo  (cost=0.00..70818.60 rows=2000220 width=32)
          ->  Append  (cost=0.00..50816.40 rows=2000220 width=10)
                ->  Subquery Scan "*SELECT* 1"  (cost=0.00..25408.20 rows=1000110 width=10)
                      ->  Result  (cost=0.00..15407.10 rows=1000110 width=10)
                            One-Time Filter: false
                            ->  Seq Scan on union_foo1  (cost=0.00..15407.10 rows=1000110 width=10)
                ->  Subquery Scan "*SELECT* 2"  (cost=0.00..25408.20 rows=1000110 width=10)
                      ->  Seq Scan on union_foo2  (cost=0.00..15407.10 rows=1000110 width=10)


But you can see a fair amount of overhead, especially in the case of the
union view:

  SELECT SUM(bar) FROM sub_foo1 UNION ALL SELECT SUM(bar) FROM sub_foo2 ;
  Time: 2291.637 ms

  SELECT SUM(bar) FROM union_foo1 UNION ALL SELECT SUM(bar) FROM union_foo2 ;
  Time: 2248.225 ms

  SELECT SUM(bar) FROM super_foo ;
  Time: 3329.953 ms

  SELECT SUM(bar) FROM union_foo ;
  Time: 5267.742 ms


  SELECT SUM(bar) FROM sub_foo2 ;
  Time: 1124.496 ms

  SELECT SUM(bar) FROM union_foo2 ;
  Time: 1090.616 ms

  SELECT SUM(bar) FROM super_foo WHERE partition = 2 ;
  Time: 2137.767 ms

  SELECT SUM(bar) FROM union_foo WHERE partition = 2 ;
  Time: 2774.887 ms

