Re: [PERFORM] partitioned table performance
On Mon, 2006-10-30 at 22:58 +0100, Andreas Kostyrka wrote:
> On Monday, 2006-10-30 at 08:18, Simon Riggs wrote:
> > On Sun, 2006-10-29 at 00:28 +0200, Andreas Kostyrka wrote:
> > >
> > > Any ideas how to make the optimizer handle partitioned tables more
> > > sensibly?
> >
> > Yes, those are known inefficiencies in the current implementation which
> > we expect to address for 8.3.
>
> Any ideas to force the current optimizer to do something sensible?

Brute force & ignorance: PL/pgSQL.

Perhaps some other domain knowledge might help you shorten the search?

That's all for now. It's not a minor fixup, and nobody had time to fix it
for 8.2 since other fish were bigger.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
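To make the "brute force" PL/pgSQL suggestion above concrete, here is a minimal
sketch (not from the thread) of one way to work around the missing optimization:
query each monthly child table separately so that every child can use its own
index on id, then let the caller sort the small combined result. The function
name, parameter types, and the id value in the usage example are illustrative
assumptions; the sketch also assumes the children have the same row layout as
the parent.

CREATE OR REPLACE FUNCTION media_downloads_after(min_id bigint, max_rows integer)
RETURNS SETOF media_downloads AS $$
DECLARE
    part text;
    r    media_downloads%ROWTYPE;
BEGIN
    -- visit every child table of media_downloads
    FOR part IN
        SELECT c.relname
          FROM pg_inherits i
          JOIN pg_class c ON c.oid = i.inhrelid
         WHERE i.inhparent = 'media_downloads'::regclass
         ORDER BY c.relname
    LOOP
        -- each per-child query can use that child's primary-key index on id
        FOR r IN EXECUTE 'SELECT * FROM ' || quote_ident(part)
                      || ' WHERE id > ' || min_id::text
                      || ' ORDER BY id LIMIT ' || max_rows::text
        LOOP
            RETURN NEXT r;
        END LOOP;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- The caller re-sorts the (at most partitions * max_rows) collected rows, so the
-- result is still correct even though ids are not monotone across partitions:
SELECT * FROM media_downloads_after(90000000, 100) ORDER BY id LIMIT 100;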
Re: [PERFORM] partitioned table performance
On Monday, 2006-10-30 at 08:18, Simon Riggs wrote:
> On Sun, 2006-10-29 at 00:28 +0200, Andreas Kostyrka wrote:
> >
> > Any ideas how to make the optimizer handle partitioned tables more
> > sensibly?
>
> Yes, those are known inefficiencies in the current implementation which
> we expect to address for 8.3.

Any ideas to force the current optimizer to do something sensible?

Andreas
Re: [PERFORM] partitioned table performance
On Sun, 2006-10-29 at 00:28 +0200, Andreas Kostyrka wrote:
> Any ideas how to make the optimizer handle partitioned tables more
> sensibly?

Yes, those are known inefficiencies in the current implementation which
we expect to address for 8.3.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
[PERFORM] partitioned table performance
Hi!

I've got a table that is partitioned into monthly tables:

    media_downloads -> media_downloads_YYYYMM
        \- id         (primary key)
        \- created_on (timestamp; criterion for the monthly table split)

There are constraints on the created_on column, and all needed INSERT ... DO
INSTEAD rules are defined too. One additional hardship is that ids are not
monotone with respect to created_on: id1 < id2 does not imply
created_on1 <= created_on2 :(

The table contains almost 100M rows, and the number is growing (a pg_dump of
the table is about 12GB). All relevant indexes (primary key on id, index on
created_on) are defined as well.

The good thing is that queries like "all rows in the last 7 days" work
reasonably fast; the optimizer just checks the last 1-2 monthly tables.

Using postgres 8.1.4-0ubuntu1, I've got to implement the following query in a
reasonably fast way:

-- sequential reading of rows
SELECT * FROM media_downloads WHERE id > 100 ORDER BY id LIMIT 100;

Against a monolithic table with about 16.5M rows, I'm getting a cost estimate
of 20.6 (index scan). Against the partitioned tables, I'm getting a cost
estimate of 5406822. Now I understand that without any additional conditions
postgresql needs to run the query against all subtables first, but EXPLAIN
against the individual subtables shows costs of only 4-5.

events=# explain select * from media_downloads where id > 9000 order by id limit 100;
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5406822.39..5406822.64 rows=100 width=1764)
   ->  Sort  (cost=5406822.39..5413639.50 rows=2726843 width=1764)
         Sort Key: public.media_downloads.id
         ->  Result  (cost=0.00..115960.71 rows=2726843 width=1764)
               ->  Append  (cost=0.00..115960.71 rows=2726843 width=1764)
                     ->  Seq Scan on media_downloads  (cost=0.00..10.50 rows=13 width=1764)
                           Filter: (id > 9000)
                     ->  Index Scan using media_downloads_200510_pkey on media_downloads_200510 media_downloads  (cost=0.00..3.75 rows=14 width=243)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200511_pkey on media_downloads_200511 media_downloads  (cost=0.00..72.19 rows=172 width=239)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200512_pkey on media_downloads_200512 media_downloads  (cost=0.00..603.64 rows=172 width=240)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200601_pkey on media_downloads_200601 media_downloads  (cost=0.00..19.33 rows=232 width=239)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200602_pkey on media_downloads_200602 media_downloads  (cost=0.00..56.82 rows=316 width=240)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200603_pkey on media_downloads_200603 media_downloads  (cost=0.00..18.88 rows=270 width=243)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200604_pkey on media_downloads_200604 media_downloads  (cost=0.00..1194.16 rows=939 width=298)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200605_pkey on media_downloads_200605 media_downloads  (cost=0.00..79.28 rows=672 width=326)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200606_pkey on media_downloads_200606 media_downloads  (cost=0.00..75.26 rows=1190 width=314)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200607_pkey on media_downloads_200607 media_downloads  (cost=0.00..55.29 rows=1238 width=319)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200608_pkey on media_downloads_200608 media_downloads  (cost=0.00..73.95 rows=1305 width=319)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200609_pkey on media_downloads_200609 media_downloads  (cost=0.00..144.10 rows=1575 width=324)
                           Index Cond: (id > 9000)
                     ->  Index Scan using media_downloads_200610_pkey on media_downloads_200610 media_downloads  (cost=0.00..113532.57 rows=2718709 width=337)
                           Index Cond: (id > 9000)
                     ->  Seq Scan on media_downloads_200611 media_downloads  (cost=0.00..10.50 rows=13 width=1764)
                           Filter: (id > 9000)
                     ->  S
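The poster's actual partition definitions are not shown in the thread, but a
monthly inheritance partition of this kind is typically set up roughly as
follows; the date range, rule name, and exact DDL here are illustrative
guesses, not the poster's own definitions.

CREATE TABLE media_downloads_200610 (
    CHECK (created_on >= '2006-10-01' AND created_on < '2006-11-01')
) INHERITS (media_downloads);

ALTER TABLE media_downloads_200610 ADD PRIMARY KEY (id);
CREATE INDEX media_downloads_200610_created_on
    ON media_downloads_200610 (created_on);

-- redirect inserts that fall into October 2006 to the child table
CREATE RULE media_downloads_insert_200610 AS
    ON INSERT TO media_downloads
    WHERE (NEW.created_on >= '2006-10-01' AND NEW.created_on < '2006-11-01')
    DO INSTEAD
    INSERT INTO media_downloads_200610 VALUES (NEW.*);

The CHECK constraint on created_on is what lets the planner skip children for
date-based predicates (the "last 7 days" case above); it does not help the
id-ordered query, which is exactly the problem being discussed.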
Re: [PERFORM] Partitioned table performance
The discussion seems to have diverged a little, so I don't feel too bad about
making some semi-off-topic comments.

From: "Greg Stark" <[EMAIL PROTECTED]>
> Like I said though, we found "global indexes" defeated the whole purpose.

First semi-off-topic comment: I think this depends on the index, the data, and
the goal of the partitioning. We use partitioning on one of our Oracle projects
for performance rather than manageability. In this case, a global index on a
non-partitioned field can be helpful. Imagine an 'orders' table with 100
partitions on week. Products have a short life cycle, and are typically around
for only a week or two. A query like 'SELECT * FROM orders WHERE product_no = ?'
forces a lookup on 100 different local indexes, but only one global index.

Second semi-off-topic comment: Josh mentioned that Oracle's partitioning
introduces its own overhead, so I re-ran my earlier benchmarks on one of our
Oracle machines. I believe Oracle's licensing agreement prohibits me from
posting any benchmarks, so all I'll say is that Postgres' inheritance
partitioning implementation is _very_ low overhead, and even union views are
competitive.
Re: [PERFORM] Partitioned table performance
On Wed, Dec 15, 2004 at 11:56:40AM -0800, Josh Berkus wrote:
> Greg,
>
> > Well Oracle has lots of partitioning intelligence pushed up to the planner
> > to avoid overhead.
> >
> > If you have a query with something like "WHERE date = '2004-01-01'" and
> > date is your partition key (even if it's a range) then Oracle will figure
> > out which partition it will need at planning time.
>
> Hmmm ... well, we're looking at making a spec for Postgres Table
> Partitioning. Maybe you could help?

This is something I've been thinking about doing for
http://stats.distributed.net; is there a formal project for this somewhere?

On a different note, has anyone looked at the savings you get by omitting the
partition field from the child tables? ISTM that the savings would be
substantial for narrow tables. Of course that most likely means doing a union
view instead of inheritance, but I'm guessing here.

The table I'm thinking of partitioning is quite narrow (see below), so I
suspect that dropping project_id out would result in a substantial savings
(there's basically nothing that ever queries across the whole table). With the
data distribution, I suspect just breaking project IDs 205, 5, and 25 into
partitioned tables that didn't contain project_id would save about 450M
(4 bytes * 95% * 130M rows). (The table has ~130M rows.)

           Table "public.email_contrib"
   Column   |  Type   | Modifiers
------------+---------+-----------
 project_id | integer | not null
 id         | integer | not null
 date       | date    | not null
 team_id    | integer |
 work_units | bigint  | not null
Indexes:
    "email_contrib_pkey" primary key, btree (project_id, id, date)
    "email_contrib__pk24" btree (id, date) WHERE (project_id = 24)
    "email_contrib__pk25" btree (id, date) WHERE (project_id = 25)
    "email_contrib__pk8" btree (id, date) WHERE (project_id = 8)
    "email_contrib__project_date" btree (project_id, date)
Foreign-key constraints:
    "fk_email_contrib__id" FOREIGN KEY (id) REFERENCES stats_participant(id)
        ON UPDATE CASCADE
    "fk_email_contrib__team_id" FOREIGN KEY (team_id) REFERENCES stats_team(team)
        ON UPDATE CASCADE

stats=# select * from pg_stats where tablename='email_contrib' and attname='project_id';
 schemaname |   tablename   |  attname   | null_frac | avg_width | n_distinct
------------+---------------+------------+-----------+-----------+------------
 public     | email_contrib | project_id |         0 |         4 |          6

  most_common_vals  |                    most_common_freqs                     | histogram_bounds | correlation
--------------------+----------------------------------------------------------+------------------+-------------
 {205,5,25,8,24,3}  | {0.461133,0.4455,0.0444333,0.0418667,0.0049,0.00216667} |                  |    0.703936

--
Jim C. Nasby, Database Consultant               [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
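A rough sketch of the layout Jim is describing: per-project child tables that
omit the project_id column entirely, tied together by a UNION ALL view that
re-adds it as a constant, so existing queries still see the old shape. The
table and view names are illustrative, and only the three heaviest project IDs
from the statistics above are shown.

-- child tables without the 4-byte project_id column
CREATE TABLE email_contrib_p205 (
    id         integer NOT NULL,
    date       date    NOT NULL,
    team_id    integer,
    work_units bigint  NOT NULL
);
CREATE TABLE email_contrib_p5  (LIKE email_contrib_p205);
CREATE TABLE email_contrib_p25 (LIKE email_contrib_p205);

-- the view restores project_id as a constant per branch
CREATE VIEW email_contrib_union AS
          SELECT 205 AS project_id, id, date, team_id, work_units FROM email_contrib_p205
UNION ALL SELECT 5,                 id, date, team_id, work_units FROM email_contrib_p5
UNION ALL SELECT 25,                id, date, team_id, work_units FROM email_contrib_p25;

With "WHERE project_id = 205" against the view, the planner can reduce the
non-matching branches to one-time "false" filters, as in the union-view plans
shown later in this thread.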
Re: [PERFORM] Partitioned table performance
Sorry for the late reply, so I included the whole thread. Should this be a
TODO?

On Wed, Dec 15, 2004 at 08:30:08PM -0500, Tom Lane wrote:
> Greg Stark <[EMAIL PROTECTED]> writes:
> > But I'm a bit puzzled. Why would Append have any significant cost? It's just
> > taking the tuples from one plan node and returning them until they run out,
> > then taking the tuples from another plan node. It should have no i/o cost and
> > hardly any cpu cost. Where is the time going?
>
> As best I can tell by profiling, the cost of the Append node per se is
> indeed negligible --- no more than a couple percent of the runtime in
> CVS tip for a test case similar to Stacy White's example.
>
> It looks bad in EXPLAIN ANALYZE, but you have to realize that passing
> the tuples up through the Append node doubles the instrumentation
> overhead of EXPLAIN ANALYZE, which is pretty sizable already. (If you
> turn on \timing in psql and try the query itself vs. EXPLAIN ANALYZE,
> the actual elapsed time is about double, at least for me.)
>
> The other effect, which I hadn't expected, is that the seqscans
> themselves actually slow down. I get
>
> regression=# explain analyze SELECT COUNT(*), MAX(bar1) FROM super_foo;
>                                                       QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=16414.32..16414.32 rows=1 width=4) (actual time=32313.980..32313.988 rows=1 loops=1)
>    ->  Append  (cost=0.00..13631.54 rows=556555 width=4) (actual time=0.232..21848.401 rows=524289 loops=1)
>          ->  Seq Scan on super_foo  (cost=0.00..0.00 rows=1 width=4) (actual time=0.020..0.020 rows=0 loops=1)
>          ->  Seq Scan on sub_foo1 super_foo  (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.187..6926.395 rows=262144 loops=1)
>          ->  Seq Scan on sub_foo2 super_foo  (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.168..7026.953 rows=262145 loops=1)
>  Total runtime: 32314.993 ms
> (6 rows)
>
> regression=# explain analyze SELECT COUNT(*), MAX(bar1) FROM sub_foo1;
>                                                       QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=8207.16..8207.16 rows=1 width=4) (actual time=9850.420..9850.428 rows=1 loops=1)
>    ->  Seq Scan on sub_foo1  (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.202..4642.401 rows=262144 loops=1)
>  Total runtime: 9851.423 ms
> (3 rows)
>
> Notice the actual times for the sub_foo1 seqscans. That increase (when
> counted for both input tables) almost exactly accounts for the
> difference in non-EXPLAIN ANALYZE runtime.
>
> After digging around, I find that the reason for the difference is that
> the optimization to avoid a projection step (ExecProject) isn't applied
> for scans of inheritance unions:
>
>     /*
>      * Can't do it with inheritance cases either (mainly because Append
>      * doesn't project).
>      */
>     if (rel->reloptkind != RELOPT_BASEREL)
>         return false;
>
> So if you were to try the example in a pre-7.4 PG, which didn't have
> that optimization, you'd probably find that the speeds were just about
> the same. (I'm too lazy to verify this though.)
>
> I looked briefly at what it would take to cover this case, and decided
> that it's a nontrivial change, so it's too late to do something about it
> for 8.0. I think it's probably possible to fix it though, at least for
> cases where the child tables have rowtypes identical to the parent.
>
> regards, tom lane

--
Jim C. Nasby, Database Consultant               [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
Re: [PERFORM] Partitioned table performance
Greg Stark <[EMAIL PROTECTED]> writes:
> But I'm a bit puzzled. Why would Append have any significant cost? It's just
> taking the tuples from one plan node and returning them until they run out,
> then taking the tuples from another plan node. It should have no i/o cost and
> hardly any cpu cost. Where is the time going?

As best I can tell by profiling, the cost of the Append node per se is
indeed negligible --- no more than a couple percent of the runtime in
CVS tip for a test case similar to Stacy White's example.

It looks bad in EXPLAIN ANALYZE, but you have to realize that passing
the tuples up through the Append node doubles the instrumentation
overhead of EXPLAIN ANALYZE, which is pretty sizable already. (If you
turn on \timing in psql and try the query itself vs. EXPLAIN ANALYZE,
the actual elapsed time is about double, at least for me.)

The other effect, which I hadn't expected, is that the seqscans
themselves actually slow down. I get

regression=# explain analyze SELECT COUNT(*), MAX(bar1) FROM super_foo;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16414.32..16414.32 rows=1 width=4) (actual time=32313.980..32313.988 rows=1 loops=1)
   ->  Append  (cost=0.00..13631.54 rows=556555 width=4) (actual time=0.232..21848.401 rows=524289 loops=1)
         ->  Seq Scan on super_foo  (cost=0.00..0.00 rows=1 width=4) (actual time=0.020..0.020 rows=0 loops=1)
         ->  Seq Scan on sub_foo1 super_foo  (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.187..6926.395 rows=262144 loops=1)
         ->  Seq Scan on sub_foo2 super_foo  (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.168..7026.953 rows=262145 loops=1)
 Total runtime: 32314.993 ms
(6 rows)

regression=# explain analyze SELECT COUNT(*), MAX(bar1) FROM sub_foo1;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=8207.16..8207.16 rows=1 width=4) (actual time=9850.420..9850.428 rows=1 loops=1)
   ->  Seq Scan on sub_foo1  (cost=0.00..6815.77 rows=278277 width=4) (actual time=0.202..4642.401 rows=262144 loops=1)
 Total runtime: 9851.423 ms
(3 rows)

Notice the actual times for the sub_foo1 seqscans. That increase (when
counted for both input tables) almost exactly accounts for the
difference in non-EXPLAIN ANALYZE runtime.

After digging around, I find that the reason for the difference is that
the optimization to avoid a projection step (ExecProject) isn't applied
for scans of inheritance unions:

    /*
     * Can't do it with inheritance cases either (mainly because Append
     * doesn't project).
     */
    if (rel->reloptkind != RELOPT_BASEREL)
        return false;

So if you were to try the example in a pre-7.4 PG, which didn't have
that optimization, you'd probably find that the speeds were just about
the same. (I'm too lazy to verify this though.)

I looked briefly at what it would take to cover this case, and decided
that it's a nontrivial change, so it's too late to do something about it
for 8.0. I think it's probably possible to fix it though, at least for
cases where the child tables have rowtypes identical to the parent.

regards, tom lane
Re: [PERFORM] Partitioned table performance
Josh Berkus <[EMAIL PROTECTED]> writes:
> > But I'm a bit puzzled. Why would Append have any significant cost? It's
> > just taking the tuples from one plan node and returning them until they
> > run out, then taking the tuples from another plan node. It should have no
> > i/o cost and hardly any cpu cost. Where is the time going?
>
> Beats me. Tom?
>
> > In my experience "global indexes" defeat the whole purpose of having the
> > partitions. They make dropping and adding partitions expensive which was
> > always the reason we wanted to partition something anyways.
>
> Hmmm. Possibly. I was just thinking about the cost to partitioned tables
> when you do a selection *not* on the partitioned axis. Also that currently
> we can't enforce UNIQUE constraints across partitions.

Like I said though, we found "global indexes" defeated the whole purpose. That
meant no global UNIQUE constraints for us when we went to partitioned tables.
It gave the DBAs the willies but it really wasn't a big deal.

You can still do unique local indexes on a specific partition. So as long as
your partition key is in the primary key you can have a trustworthy primary
key. And even if not, you usually find you're only loading data into one
partition anyway. In most applications it's pretty hard to get a record from
two different partitions with conflicting IDs, and not hard to check for. You
could easily put a constraint saying that all PO numbers in the new fiscal
year have to be greater than the last PO number from last year, for example.

> But maybe reducing the cost of Append is the answer to this.

The problem with global indexes is that adding or removing an entire partition
becomes a large job. [Actually with Postgres MVCC I suppose removing might
not. But cleaning up would eventually be a large job, and the point remains
for adding a partition.]

Ideally adding and removing a partition should be an O(1) operation. No data
modification at all, purely catalog changes.

> > It is handy having a higher level interface to deal with partitioned
> > tables. You can create a single "local" or "segmented" index and not have
> > to manually deal with all the partitions as separate tables. But that's
> > just syntactic sugar.
>
> Right, and the easy part.

I think the hard part lies in the optimizer actually. The semantics of the
operations to manipulate partitions might be tricky to get right but the
coding should be straightforward. Having the optimizer be able to recognize
when it can prune partitions will be a lot of work.

--
greg
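For context, a sketch of what catalog-only add/remove operations look like with
inheritance partitioning in later PostgreSQL releases. The NO INHERIT / INHERIT
forms of ALTER TABLE require 8.2 or newer, and the table names, PO-number
column, and the 1000000 cutoff are illustrative assumptions, not something from
the thread.

-- detach last year's partition: no rows are touched, it simply stops being
-- scanned as part of the parent
ALTER TABLE orders_fy2003 NO INHERIT orders;

-- attach a pre-loaded table as the new partition, with the kind of constraint
-- Greg mentions (all new PO numbers above last year's maximum)
ALTER TABLE orders_fy2005
    ADD CONSTRAINT orders_fy2005_po_check CHECK (po_number > 1000000);
ALTER TABLE orders_fy2005 INHERIT orders;

The INHERIT / NO INHERIT steps are pure catalog changes; adding the CHECK
constraint scans the new table once to validate it, but rewrites nothing.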
Re: [PERFORM] Partitioned table performance
Josh Berkus <[EMAIL PROTECTED]> writes:
> Stacy,
>
> > Thanks again for the reply. So it sounds like the answer to my original
> > question is that it's expected that the pseudo-partitioning would
> > introduce a fairly significant amount of overhead. Correct?
>
> Correct. For that matter, Oracle table partitioning introduces significant
> overhead, from what I've seen. I don't think there's a way not to.

Well Oracle has lots of partitioning intelligence pushed up to the planner to
avoid overhead.

If you have a query with something like "WHERE date = '2004-01-01'" and date
is your partition key (even if it's a range) then Oracle will figure out which
partition it will need at planning time.

Even if your query is something like "WHERE date = ?" then Oracle will still
recognize that it will only need a single partition at planning time, though
it has to decide which partition at execution time.

We didn't notice any run-time performance degradation when we went to
partitioned tables. Maybe we were just so blinded by the joy they brought us
on the maintenance side, though; I don't think we specifically checked for
run-time consequences.

But I'm a bit puzzled. Why would Append have any significant cost? It's just
taking the tuples from one plan node and returning them until they run out,
then taking the tuples from another plan node. It should have no i/o cost and
hardly any cpu cost. Where is the time going?

> What would improve the situation significantly, and the utility of
> pseudo-partitioning, is the ability to have a single index span multiple
> partitions. This would allow you to have a segmented index for the
> partitioned axis, yet still use an unsegmented index for the other columns.
> However, there's a *lot* of work to do to make that happen.

In my experience "global indexes" defeat the whole purpose of having the
partitions. They make dropping and adding partitions expensive, which was
always the reason we wanted to partition something anyways.

It is handy having a higher level interface to deal with partitioned tables.
You can create a single "local" or "segmented" index and not have to manually
deal with all the partitions as separate tables. But that's just syntactic
sugar.

--
greg
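For comparison, PostgreSQL later grew a limited form of the plan-time pruning
Greg describes: constraint exclusion, available from 8.1 onward. It only works
when the predicate is a constant known at plan time, so the "WHERE date = ?"
case is not covered. A minimal sketch, with illustrative table and column
names:

-- enable constraint exclusion (off by default in 8.1)
SET constraint_exclusion = on;

-- given children of "orders" carrying constraints such as
--   CHECK (order_date >= '2004-01-01' AND order_date < '2004-02-01')
-- the planner can drop the non-matching children from the Append entirely:
EXPLAIN SELECT * FROM orders WHERE order_date = DATE '2004-01-01';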
Re: [PERFORM] Partitioned table performance
Greg,

> Well Oracle has lots of partitioning intelligence pushed up to the planner
> to avoid overhead.
>
> If you have a query with something like "WHERE date = '2004-01-01'" and
> date is your partition key (even if it's a range) then Oracle will figure
> out which partition it will need at planning time.

Hmmm ... well, we're looking at making a spec for Postgres Table Partitioning.
Maybe you could help?

> But I'm a bit puzzled. Why would Append have any significant cost? It's
> just taking the tuples from one plan node and returning them until they
> run out, then taking the tuples from another plan node. It should have no
> i/o cost and hardly any cpu cost. Where is the time going?

Beats me. Tom?

> In my experience "global indexes" defeat the whole purpose of having the
> partitions. They make dropping and adding partitions expensive which was
> always the reason we wanted to partition something anyways.

Hmmm. Possibly. I was just thinking about the cost to partitioned tables when
you do a selection *not* on the partitioned axis. Also that currently we can't
enforce UNIQUE constraints across partitions.

But maybe reducing the cost of Append is the answer to this.

> It is handy having a higher level interface to deal with partitioned
> tables. You can create a single "local" or "segmented" index and not have
> to manually deal with all the partitions as separate tables. But that's
> just syntactic sugar.

Right, and the easy part.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Partitioned table performance
Stacy,

> Thanks again for the reply. So it sounds like the answer to my original
> question is that it's expected that the pseudo-partitioning would introduce
> a fairly significant amount of overhead. Correct?

Correct. For that matter, Oracle table partitioning introduces significant
overhead, from what I've seen. I don't think there's a way not to.

Generally, I counsel people that they only want to consider
pseudo-partitioning if they have one axis on the table which is used in 90% or
more of the queries against that table.

What would improve the situation significantly, and the utility of
pseudo-partitioning, is the ability to have a single index span multiple
partitions. This would allow you to have a segmented index for the partitioned
axis, yet still use an unsegmented index for the other columns. However,
there's a *lot* of work to do to make that happen.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Partitioned table performance
Josh,

You're absolutely correct that the overhead becomes less significant as the
partitioning prunes more rows. I can even see a two-partition table being
useful in some situations (e.g., a table divided into a relatively small
"recent data" partition and a much larger "historical data" partition). The
break-even point is when your partitioning scheme prunes 20% of the rows
(assuming you're using the inheritance-based scheme).

Thanks again for the reply. So it sounds like the answer to my original
question is that it's expected that the pseudo-partitioning would introduce a
fairly significant amount of overhead. Correct?

----- Original Message -----
From: "Josh Berkus" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "Stacy White" <[EMAIL PROTECTED]>
Sent: Friday, December 10, 2004 9:52 PM
Subject: Re: [PERFORM] Partitioned table performance

Stacy,

> Each set of test tables holds 1,000,000 tuples with a partition value of
> '1', and 1,000,000 with a partition value of '2'. The bar* columns are all
> set to non-null values. The 'one_big_foo' table stores all 2M rows in one
> table. 'super_foo' and 'union_foo' split the data into two tables, and use
> inheritance and union views (respectively) to tie them together, as
> described in my previous message.
>
> Query timings and 'EXPLAIN ANALYZE' results for full table scans and for
> partition scans follow:

Hmmm, interesting. I think you've demonstrated that pseudo-partitioning
doesn't pay for having only 2 partitions. Examine this:

   ->  Index Scan using idx_sub_foo2_partition on sub_foo2 super_foo
       (cost=0.00..2.01 rows=1 width=4) (actual time=0.221..0.221 rows=0 loops=1)
         Index Cond: (partition = 1::numeric)
 Total runtime: 15670.463 ms

As you see, even though the aggregate operation requires a seq scan, the
planner is still able to scan, and discard, sub_foo2, using its index, in 0.2
seconds. Unfortunately, super_foo still needs to contend with:

   ->  Append  (cost=0.00..28376.79 rows=164 width=4)
       (actual time=6.699..12072.483 rows=100 loops=1)

Right there, in the Append, you lose 6 seconds. This means that
pseudo-partitioning via inheritance will become a speed gain once you can
"make up" that 6 seconds by being able to discard more partitions. If you
want, do a test with 6 partitions instead of 2 and let us know how it comes
out.

Also, keep in mind that there are other reasons to do pseudo-partitioning than
your example. Data write performance, expiring partitions, and vacuum are big
reasons that can motivate partitioning even in cases when selects are slower.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Partitioned table performance
Stacy,

> Each set of test tables holds 1,000,000 tuples with a partition value of
> '1', and 1,000,000 with a partition value of '2'. The bar* columns are all
> set to non-null values. The 'one_big_foo' table stores all 2M rows in one
> table. 'super_foo' and 'union_foo' split the data into two tables, and use
> inheritance and union views (respectively) to tie them together, as
> described in my previous message.
>
> Query timings and 'EXPLAIN ANALYZE' results for full table scans and for
> partition scans follow:

Hmmm, interesting. I think you've demonstrated that pseudo-partitioning
doesn't pay for having only 2 partitions. Examine this:

   ->  Index Scan using idx_sub_foo2_partition on sub_foo2 super_foo
       (cost=0.00..2.01 rows=1 width=4) (actual time=0.221..0.221 rows=0 loops=1)
         Index Cond: (partition = 1::numeric)
 Total runtime: 15670.463 ms

As you see, even though the aggregate operation requires a seq scan, the
planner is still able to scan, and discard, sub_foo2, using its index, in 0.2
seconds. Unfortunately, super_foo still needs to contend with:

   ->  Append  (cost=0.00..28376.79 rows=164 width=4)
       (actual time=6.699..12072.483 rows=100 loops=1)

Right there, in the Append, you lose 6 seconds. This means that
pseudo-partitioning via inheritance will become a speed gain once you can
"make up" that 6 seconds by being able to discard more partitions. If you
want, do a test with 6 partitions instead of 2 and let us know how it comes
out.

Also, keep in mind that there are other reasons to do pseudo-partitioning than
your example. Data write performance, expiring partitions, and vacuum are big
reasons that can motivate partitioning even in cases when selects are slower.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Partitioned table performance
vod=# EXPLAIN ANALYZE SELECT COUNT(*), MAX(bar1) FROM super_foo WHERE partition = 1 ;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=33377.11..33377.11 rows=1 width=4) (actual time=15670.309..15670.312 rows=1 loops=1)
   ->  Append  (cost=0.00..28376.79 rows=164 width=4) (actual time=6.699..12072.483 rows=100 loops=1)
         ->  Seq Scan on super_foo  (cost=0.00..0.00 rows=1 width=4) (actual time=0.005..0.005 rows=0 loops=1)
               Filter: (partition = 1::numeric)
         ->  Seq Scan on sub_foo1 super_foo  (cost=0.00..28374.78 rows=162 width=4) (actual time=0.106..6688.812 rows=100 loops=1)
               Filter: (partition = 1::numeric)
         ->  Index Scan using idx_sub_foo2_partition on sub_foo2 super_foo  (cost=0.00..2.01 rows=1 width=4) (actual time=0.221..0.221 rows=0 loops=1)
               Index Cond: (partition = 1::numeric)
 Total runtime: 15670.463 ms
(9 rows)

Time: 15672.235 ms

vod=# EXPLAIN ANALYZE SELECT COUNT(*), MAX(bar1) FROM union_foo WHERE partition = 1 ;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=98573.40..98573.40 rows=1 width=4) (actual time=31897.629..31897.632 rows=1 loops=1)
   ->  Subquery Scan union_foo  (cost=0.00..88573.20 rows=240 width=4) (actual time=0.134..28323.692 rows=100 loops=1)
         ->  Append  (cost=0.00..68572.80 rows=240 width=80) (actual time=0.125..21969.522 rows=100 loops=1)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..34286.40 rows=120 width=80) (actual time=0.120..16867.005 rows=100 loops=1)
                     ->  Seq Scan on union_foo1  (cost=0.00..24286.20 rows=120 width=80) (actual time=0.108..8017.931 rows=100 loops=1)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..34286.40 rows=120 width=80) (actual time=0.011..0.011 rows=0 loops=1)
                     ->  Result  (cost=0.00..24286.20 rows=120 width=80) (actual time=0.004..0.004 rows=0 loops=1)
                           One-Time Filter: false
                           ->  Seq Scan on union_foo2  (cost=0.00..24286.20 rows=120 width=80) (never executed)
 Total runtime: 31897.897 ms
(10 rows)

Time: 31900.204 ms

----- Original Message -----
From: "Josh Berkus" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "Stacy White" <[EMAIL PROTECTED]>
Sent: Sunday, December 05, 2004 3:06 PM
Subject: Re: [PERFORM] Partitioned table performance

Stacy,

Thanks for the stats!

> In some cases we've seen some increased performance in tests by splitting
> the table into several smaller tables. Both 'UNION ALL' views, and the
> superclass/subclass scheme work well at pruning down the set of rows a
> query uses, but they seem to introduce a large performance hit to the time
> to process each row (~50% for superclass/subclass, and ~150% for union
> views).

This seems reasonable, actually, given your test. Really, what you should be
comparing it against is not selecting from an individual partition, but
selecting from the whole business as one large table.

I also suspect that wider rows result in less overhead proportionally; note
that your test contains *only* the indexed rows. I should soon have a test to
prove this, hopefully.

However, I would be interested in seeing EXPLAIN ANALYZE from your tests
rather than just EXPLAIN.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Partitioned table performance
Stacy,

Thanks for the stats!

> In some cases we've seen some increased performance in tests by splitting
> the table into several smaller tables. Both 'UNION ALL' views, and the
> superclass/subclass scheme work well at pruning down the set of rows a
> query uses, but they seem to introduce a large performance hit to the time
> to process each row (~50% for superclass/subclass, and ~150% for union
> views).

This seems reasonable, actually, given your test. Really, what you should be
comparing it against is not selecting from an individual partition, but
selecting from the whole business as one large table.

I also suspect that wider rows result in less overhead proportionally; note
that your test contains *only* the indexed rows. I should soon have a test to
prove this, hopefully.

However, I would be interested in seeing EXPLAIN ANALYZE from your tests
rather than just EXPLAIN.

--
Josh Berkus
Aglio Database Solutions
San Francisco
[PERFORM] Partitioned table performance
We're working with a Postgres database that includes a fairly large table
(100M rows, increasing at around 2M per day).

In some cases we've seen some increased performance in tests by splitting the
table into several smaller tables. Both 'UNION ALL' views, and the
superclass/subclass scheme work well at pruning down the set of rows a query
uses, but they seem to introduce a large performance hit to the time to
process each row (~50% for superclass/subclass, and ~150% for union views).

Is this to be expected? Or is this a problem with our test setup? I've listed
details on our tests at the end of this message. The results are similar with
our larger tables; the overhead appears to be per record returned from the
subquery/subclass, not a constant overhead per query. Our production instance
is running 7.4.2, but the results are the same on 8.0.

For reference, I tested with this setup (for the superclass/subclass
partitioning scheme):

CREATE TABLE super_foo ( partition NUMERIC, bar NUMERIC );
ANALYZE super_foo ;

CREATE TABLE sub_foo1 () INHERITS ( super_foo );
INSERT INTO sub_foo1 VALUES ( 1, 1 );
-- repeat insert until sub_foo1 has 1,000,000 rows
CREATE INDEX idx_subfoo1_partition ON sub_foo1 ( partition );
ANALYZE sub_foo1 ;

CREATE TABLE sub_foo2 () INHERITS ( super_foo );
INSERT INTO sub_foo2 VALUES ( 2, 1 );
-- repeat insert until sub_foo2 has 1,000,000 rows
CREATE INDEX idx_subfoo2_partition ON sub_foo2 ( partition );
ANALYZE sub_foo2 ;

and this setup for the union all scheme:

CREATE TABLE union_foo1 ( bar NUMERIC );
INSERT INTO union_foo1 VALUES ( 1 ) ;
-- repeat insert until union_foo1 has 1,000,000 rows
ANALYZE union_foo1 ;

CREATE TABLE union_foo2 ( bar NUMERIC );
INSERT INTO union_foo2 VALUES ( 1 ) ;
-- repeat insert until union_foo2 has 1,000,000 rows
ANALYZE union_foo2 ;

CREATE VIEW union_foo AS
  SELECT 1 AS partition, * FROM union_foo1
  UNION ALL
  SELECT 2 AS partition, * FROM union_foo2 ;

The partition pruning works marvelously:

EXPLAIN SELECT SUM(bar) FROM super_foo WHERE partition = 2 ;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=21899.02..21899.02 rows=1 width=32)
   ->  Append  (cost=0.00..19399.01 rows=102 width=32)
         ->  Seq Scan on super_foo  (cost=0.00..0.00 rows=1 width=32)
               Filter: (partition = 2::numeric)
         ->  Index Scan using idx_subfoo1_partition on sub_foo1 super_foo  (cost=0.00..2.01 rows=1 width=10)
               Index Cond: (partition = 2::numeric)
         ->  Seq Scan on sub_foo2 super_foo  (cost=0.00..19397.00 rows=100 width=10)
               Filter: (partition = 2::numeric)

and

EXPLAIN SELECT SUM(bar) FROM union_foo WHERE partition = 2 ;
                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=75819.15..75819.15 rows=1 width=32)
   ->  Subquery Scan union_foo  (cost=0.00..70818.60 rows=2000220 width=32)
         ->  Append  (cost=0.00..50816.40 rows=2000220 width=10)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..25408.20 rows=1000110 width=10)
                     ->  Result  (cost=0.00..15407.10 rows=1000110 width=10)
                           One-Time Filter: false
                           ->  Seq Scan on union_foo1  (cost=0.00..15407.10 rows=1000110 width=10)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..25408.20 rows=1000110 width=10)
                     ->  Seq Scan on union_foo2  (cost=0.00..15407.10 rows=1000110 width=10)

But you can see a fair amount of overhead, especially in the case of the
union view:

SELECT SUM(bar) FROM sub_foo1 UNION ALL SELECT SUM(bar) FROM sub_foo2 ;
Time: 2291.637 ms

SELECT SUM(bar) FROM union_foo1 UNION ALL SELECT SUM(bar) FROM union_foo2 ;
Time: 2248.225 ms

SELECT SUM(bar) FROM super_foo ;
Time: 3329.953 ms

SELECT SUM(bar) FROM union_foo ;
Time: 5267.742 ms

SELECT SUM(bar) FROM sub_foo2 ;
Time: 1124.496 ms

SELECT SUM(bar) FROM union_foo2 ;
Time: 1090.616 ms

SELECT SUM(bar) FROM super_foo WHERE partition = 2 ;
Time: 2137.767 ms

SELECT SUM(bar) FROM union_foo WHERE partition = 2 ;
Time: 2774.887 ms
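A side note on the test setup above: the repeated single-row inserts can be
replaced with generate_series (available in 8.0 and later), which also makes
it easy to build the 6-partition variant Josh suggests earlier in the thread.
This is a sketch only; it simply follows the naming used in the setup above.

-- fill one child with 1,000,000 rows in a single statement
INSERT INTO sub_foo1 SELECT 1, 1 FROM generate_series(1, 1000000);

-- a 6-way version of the same test: create sub_foo1 .. sub_foo6 with
-- partition values 1 .. 6, e.g. for the third partition:
CREATE TABLE sub_foo3 () INHERITS ( super_foo );
INSERT INTO sub_foo3 SELECT 3, 1 FROM generate_series(1, 1000000);
CREATE INDEX idx_subfoo3_partition ON sub_foo3 ( partition );
ANALYZE sub_foo3 ;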