Re: [PERFORM] Order by (for 15 rows) adds 30 seconds to query time
On Tue, 1 Dec 2009, Jean-Michel Pouré wrote:
> PostgreSQL query analyzer needs to run a couple of times before it
> can rewrite and optimize the query. Make sure demand_id, id and join
> IDs carry indexes.

Huh? At what point does the planner carry over previous plans and use
them to further optimise the query?

But perhaps the biggest factor here is calling a five-table join a
"pretty simple query".

Matthew

--
Prolog doesn't have enough parentheses. -- Computer Science Lecturer
Re: [PERFORM] Cost of sort/order by not estimated by the query planner
hummm. Adding pgsql-perf :)

On Mon, Nov 30, 2009 at 5:54 PM, Laurent Laborde <kerdez...@gmail.com> wrote:

Friendly greetings!
I use PostgreSQL 8.3.6.

Here is some info about the table I'm querying:
- select count(*) from _article : 17301610
- select count(*) from _article WHERE (_article.bitfield & getbit(0)) : 6729

Here are both queries with problems:

QUERY 1: very fast!
-------------------
explain SELECT *
FROM _article
WHERE (_article.bitfield & getbit(0))
ORDER BY _article.id ASC
LIMIT 500;

QUERY PLAN
Limit (cost=66114.13..66115.38 rows=500 width=1114)
  -> Sort (cost=66114.13..66157.37 rows=17296 width=1114)
       Sort Key: id
       -> Bitmap Heap Scan on _article (cost=138.32..65252.29 rows=17296 width=1114)
            Recheck Cond: (bitfield & B'1'::bit varying)
            -> Bitmap Index Scan on idx_article_bitfield (cost=0.00..134.00 rows=17296 width=0)
                 Index Cond: (bitfield & B'1'::bit varying)

QUERY 2: endless... (more than 30 min; I stopped the query)
-----------------------------------------------------------
explain SELECT *
FROM _article
WHERE (_article.bitfield & getbit(0))
ORDER BY _article.id ASC
LIMIT 5;

QUERY PLAN
Limit (cost=0.00..2042.87 rows=5 width=1114)
  -> Index Scan using _article_pkey on _article (cost=0.00..7066684.46 rows=17296 width=1114)
       Filter: (bitfield & B'1'::bit varying)
(3 rows)

With LIMIT 5 and LIMIT 500, the query plans are different. PostgreSQL
estimates that it can do a simple index scan to find only 5 rows; with
more than LIMIT ~400 it estimates that a more complex plan is faster,
and that makes sense!

The problem is in the ORDER BY, of course. If I remove the ORDER BY,
the LIMIT 5 is faster (0.044 ms) and does an index scan. At LIMIT 500
(without order) it still uses an index scan and is slightly slower. At
LIMIT 5000 (without order) it switches to a Bitmap Index Scan + Bitmap
Heap Scan, which is slower but acceptable (5.275 ms).

Why, with QUERY 2, doesn't PostgreSQL estimate the cost of the
Sort/ORDER BY? Of course, ignoring the order, both query plans are
right, and the choice between those different plans totally makes
sense. But... if the planner were kind enough to consider the cost of
the ORDER BY, it would certainly choose the Bitmap Index + Bitmap Heap
scan for the LIMIT 5, and not an index scan on the pkey!

I have set the statistics to 1000 for _article.bitfield, just in case
(and ran a VACUUM ANALYZE); it doesn't change anything.

Is that a bug? Any idea?
Thank you :)

--
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/
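For reference, the statistics-target change Laurent describes can be
written like this (a sketch using the table and column names from this
thread):

  ALTER TABLE _article ALTER COLUMN bitfield SET STATISTICS 1000;
  VACUUM ANALYZE _article;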
Re: [PERFORM] Cost of sort/order by not estimated by the query planner
On Wed, Dec 2, 2009 at 11:13 AM, Laurent Laborde <kerdez...@gmail.com> wrote:
> QUERY PLAN
> Limit (cost=66114.13..66115.38 rows=500 width=1114)
>   -> Sort (cost=66114.13..66157.37 rows=17296 width=1114)
>        Sort Key: id
>        -> Bitmap Heap Scan on _article (cost=138.32..65252.29 rows=17296 width=1114)
>             Recheck Cond: (bitfield & B'1'::bit varying)
>             -> Bitmap Index Scan on idx_article_bitfield (cost=0.00..134.00 rows=17296 width=0)
>                  Index Cond: (bitfield & B'1'::bit varying)

Uhm, what kind of index is idx_article_bitfield?

--
greg
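For anyone following along, the index definition can be pulled
straight from the catalogs, e.g. (a sketch against the standard
pg_indexes view, with the index name from this thread):

  SELECT indexdef FROM pg_indexes WHERE indexname = 'idx_article_bitfield';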
Re: [PERFORM] Cost of sort/order by not estimated by the query planner
On Wed, Dec 2, 2009 at 11:13 AM, Laurent Laborde <kerdez...@gmail.com> wrote:
> QUERY PLAN
> Limit (cost=0.00..2042.87 rows=5 width=1114)
>   -> Index Scan using _article_pkey on _article (cost=0.00..7066684.46 rows=17296 width=1114)
>        Filter: (bitfield & B'1'::bit varying)

Ah, I missed this the first time around. It's scanning _article_pkey
here. Ie, it's scanning the table from the oldest to the newest
article, assuming that the values which satisfy that constraint are
evenly distributed, and that it'll find five of them pretty quickly.

In reality there's a correlation between this bit being set and the
value of _article.id, and all the ones with it set are towards the
end. Postgres doesn't have any statistics on how multiple columns are
related yet, so it can't know this.

If this is an important query you might try having an index on
<bitfield, id>, or a partial index on id where bitfield & B'1'. The
latter sounds like what you really need.

--
greg
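Spelled out, those two suggestions might look like this (a sketch, not
from the original thread; getbit() is the poster's own function, and a
partial-index predicate must match the query's WHERE clause exactly,
however that boolean test is actually spelled, for the planner to use
the index):

  -- composite index:
  CREATE INDEX idx_article_bitfield_id ON _article (bitfield, id);

  -- partial index, mirroring the query's predicate as posted:
  CREATE INDEX idx_article_bit0_id ON _article (id)
  WHERE (_article.bitfield & getbit(0));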
Re: [PERFORM] Cost of sort/order by not estimated by the query planner
On Wed, Dec 2, 2009 at 8:01 AM, Laurent Laborde <kerdez...@gmail.com> wrote:
> On Wed, Dec 2, 2009 at 1:47 PM, Greg Stark <gsst...@mit.edu> wrote:
>> Ah, I missed this the first time around. It's scanning _article_pkey
>> here. [...] If this is an important query you might try having an
>> index on <bitfield, id>, or a partial index on id where bitfield &
>> B'1'. The latter sounds like what you really need.
>
> There are, indeed, a lot of tricks and hacks. Maybe my question was
> too confusing. The question is: why is a LIMIT 5 much, much slower
> than a LIMIT 500? The problem is in the ORDER BY, not in finding the
> data that match the filter. Even if it's not evenly distributed, the
> queries without ORDER BY are much, much faster, EVEN when using the
> pkey query plan.
>
> without order by, using the bitmap -> fast
> without order by, using the pkey index -> fast
> with order by, using the bitmap -> fast
> with order by, using the pkey index -> slow

I'm confused. I think you've only shown us two query plans, so it's
hard to judge what's going on here in the two cases you haven't shown.
Also, you haven't shown the EXPLAIN ANALYZE output, so it's a bit
tricky to judge what is really happening.

However... as a general rule, the usual reason why the planner makes
bad decisions with small LIMITs is that it overestimates the impact of
the startup cost. If one plan has a startup cost of 1 and a run cost
of 100, and another plan has a startup cost of 0 and a run cost of
1,000,000, the planner will pick the latter plan if a sufficiently
small fraction of the rows are being fetched (less than a millionth of
them). It's easy for the estimates to be off by enough to make this a
bad decision, especially when using operations that the planner
doesn't have good estimates for (& may be one such).

...Robert
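To spell out the arithmetic in that example (a worked version of the
hypothetical costs above, not from the original thread): with a LIMIT,
each plan is charged its startup cost plus the fetched fraction f of
its run cost:

  plan A: 1 + 100 * f
  plan B: 0 + 1,000,000 * f

  plan B is picked when 1,000,000 * f < 1 + 100 * f,
  i.e. when f < 1/999,900, roughly one millionth of the rows.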
Re: [PERFORM] Cost of sort/order by not estimated by the query planner
* without order by, limit 5 : 70ms

explain analyze SELECT *
FROM _article
WHERE (_article.bitfield & getbit(0))
LIMIT 5;

QUERY PLAN:
Limit (cost=0.00..20.03 rows=5 width=1109) (actual time=70.190..70.265 rows=5 loops=1)
  -> Index Scan using idx_article_bitfield on _article (cost=0.00..69290.99 rows=17298 width=1109) (actual time=70.188..70.260 rows=5 loops=1)
       Index Cond: (bitfield & B'1'::bit varying)
Total runtime: 70.406 ms
(4 rows)

* without order by, limit 500 (same plan as above) : 371ms

explain analyze SELECT *
FROM _article
WHERE (_article.bitfield & getbit(0))
LIMIT 500;

QUERY PLAN:
Limit (cost=0.00..2002.86 rows=500 width=1109) (actual time=0.087..371.257 rows=500 loops=1)
  -> Index Scan using idx_article_bitfield on _article (cost=0.00..69290.99 rows=17298 width=1109) (actual time=0.086..371.075 rows=500 loops=1)
       Index Cond: (bitfield & B'1'::bit varying)
Total runtime: 371.369 ms

* without order by, limit 5000 (query plan changed) : 1307ms

explain analyze SELECT *
FROM _article
WHERE (_article.bitfield & getbit(0))
LIMIT 5000;

QUERY PLAN:
Limit (cost=138.34..18971.86 rows=5000 width=1109) (actual time=53.782..1307.173 rows=5000 loops=1)
  -> Bitmap Heap Scan on _article (cost=138.34..65294.79 rows=17298 width=1109) (actual time=53.781..1305.565 rows=5000 loops=1)
       Recheck Cond: (bitfield & B'1'::bit varying)
       -> Bitmap Index Scan on idx_article_bitfield (cost=0.00..134.01 rows=17298 width=0) (actual time=53.606..53.606 rows=6743 loops=1)
            Index Cond: (bitfield & B'1'::bit varying)
Total runtime: 1307.972 ms

So... *without* order by, different limits and different query plans:
the queries are fast.

* with order by, limit 5 :

explain analyze SELECT *
FROM _article
WHERE (_article.bitfield & getbit(0))
ORDER BY _article.id ASC
LIMIT 5;

QUERY PLAN:
Mmmm... the query has been running for 2h... waiting, waiting.

* with order by, limit 500 : 546ms

explain analyze SELECT *
FROM _article
WHERE (_article.bitfield & getbit(0))
ORDER BY _article.id ASC
LIMIT 500;

QUERY PLAN:
Limit (cost=66156.73..66157.98 rows=500 width=1109) (actual time=545.671..545.900 rows=500 loops=1)
  -> Sort (cost=66156.73..66199.98 rows=17298 width=1109) (actual time=545.670..545.766 rows=500 loops=1)
       Sort Key: id
       Sort Method: top-N heapsort  Memory: 603kB
       -> Bitmap Heap Scan on _article (cost=138.34..65294.79 rows=17298 width=1109) (actual time=1.059..541.359 rows=6729 loops=1)
            Recheck Cond: (bitfield & B'1'::bit varying)
            -> Bitmap Index Scan on idx_article_bitfield (cost=0.00..134.01 rows=17298 width=0) (actual time=0.922..0.922 rows=6743 loops=1)
                 Index Cond: (bitfield & B'1'::bit varying)
Total runtime: 546.163 ms

Now... with order by, different limits, different query plans: the
LIMIT 5 query is insanely *SLOW* (while the LIMIT 500 is super fast).

What I think: the query planner does not consider the time taken by
the ORDER BY... which is *much* slower!!

--
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/
Re: [PERFORM] Cost of sort/order by not estimated by the query planner
On Wed, Dec 2, 2009 at 10:32 AM, Laurent Laborde <kerdez...@gmail.com> wrote:
> [... full EXPLAIN ANALYZE results quoted above snipped ...]
>
> So... *without* order by, different limits and different query plans:
> the queries are fast.
> [...]
> Now... with order by, different limits, different query plans: the
> LIMIT 5 query is insanely *SLOW* (while the LIMIT 500 is super fast).
>
> What I think: the query planner does not consider the time taken by
> the ORDER BY... which is *much* slower!!

That is certainly not the case. If the query planner did not consider
the time required to perform a sort, well, that would have been fixed
a lot sooner than now.

The real problem here is exactly what I said upthread. Without ORDER
BY, the query planner picks an index scan or a bitmap index scan and
just runs it until it gets enough rows to satisfy the LIMIT. No
problem. With ORDER BY, it has to make a decision: should it fetch ALL
the rows that satisfy the bitfield condition, sort them by article ID,
and then pick the top five?
Or should it instead use the index on article ID to start retrieving
the lowest-numbered article IDs and hope to find 5 that satisfy the
bitfield condition before it goes through too many rows? The answer
depends on how frequently the bitfield condition will be satisfied. If
most rows in the table satisfy the bitfield condition, then the second
plan is better; if very few do, the first plan is better.

Somewhat more subtly, the plan also depends on the LIMIT. The first
plan requires almost the same amount of work for a small limit as it
does for a large one: you still have to find ALL the rows that match
the bitfield condition and sort them. Then you return a larger or
smaller number of rows from the result of the sort depending on the
LIMIT. But the amount of work that the second plan requires varies
dramatically depending on the LIMIT. If the LIMIT is only
one-hundredth as large (5 instead of 500), then the second plan
figures to have to scan only one one-hundredth as many rows, so it
takes about a hundredth as long.
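Plugging the numbers from the plans above into that logic (a
back-of-the-envelope check, not from the original thread): the full
index scan over _article_pkey is costed at 7066684.46 for an expected
17296 matching rows, and the LIMIT 5 node is charged almost exactly
5/17296 of that:

  7066684.46 * (5 / 17296) ~= 2042.9   (matching the cost=0.00..2042.87 in the LIMIT 5 plan)

which is why that plan looks so cheap on paper: the estimate assumes
the matching rows are spread evenly through the id order.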
Re: [PERFORM] Cost of sort/order by not estimated by the query planner
Robert Haas <robertmh...@gmail.com> writes:
> The exact break-even point between the two plans will vary depending
> on what percentage of the rows in the table satisfy the bitmap
> condition.

It's worse than that. The planner is not too bad about understanding
the percentage-of-rows problem --- at least, assuming you are using a
condition it has statistics for, which it doesn't for "bitvector &".
But whether the indexscan plan is fast will also depend on where the
matching rows are in the index ordering. If they're all towards the
end you can lose big, and the planner hasn't got stats to let it
predict that. It just assumes the filter condition is uncorrelated to
the ordering condition.

My own advice would be to forget the bitmap field and see if you can't
use a collection of plain boolean columns instead. You might still
lose if there's a correlation problem, but "bitfield & B'1'" is
absolutely positively guaranteed to produce stupid row estimates and
hence bad plan choices.

Or you could work on introducing a non-stupid selectivity estimator
for &, but it's not a trivial project.

regards, tom lane
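A sketch of the boolean-columns approach (hypothetical column and
index names, not from the original thread): replace the tested bit
with a plain boolean column, which the planner keeps ordinary
statistics for, and pair it with a partial index:

  ALTER TABLE _article ADD COLUMN is_flagged boolean NOT NULL DEFAULT false;
  -- backfill from the existing bitfield test, then keep the column
  -- maintained by the application or a trigger
  CREATE INDEX idx_article_flagged_id ON _article (id) WHERE is_flagged;
  ANALYZE _article;

After the ANALYZE, the planner has a real selectivity estimate for
"WHERE is_flagged" instead of the default guess it uses for &.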
Re: [PERFORM] Order by (for 15 rows) adds 30 seconds to query time
Craig Ringer <cr...@postnewspapers.com.au> wrote:
> Some of those tables are views composed of multiple unions, too, by
> the looks of things. Doesn't the planner have some ... issues ...
> with estimation of row counts on joins over unions? Or is my memory
> just more faulty than usual?

So far I can't tell if it's views with unions or (as I suspect)
inheritance. The views and tables shown so far reference other objects
not yet shown: core.inventory, h.location, h.actor.

However, I'm pretty sure that the problem is that the estimated row
count explodes, for no reason that I can see, when the Nested Loop
Left Join has an Append node from a parent table on the right:

  28 rows joined to a 4 row Append yields 51160 rows?
  51160 rows joined to a 2 row Append yields 203176856 rows?
  203176856 rows joined to a 2 row Append yields 806903677108 rows?

Something seems funny with the math. I would have expected 28 times 4
times 2 times 2, equaling 448. Still higher than 15, but only by one
order of magnitude -- where it might still make relatively sane plan
choices.

-Kevin
Re: [PERFORM] Order by (for 15 rows) adds 30 seconds to query time
Kevin Grittner <kevin.gritt...@wicourts.gov> writes:
> Estimates extracted from the problem plan:
>
> Nested Loop Left Join (rows=806903677108)
>   -> Nested Loop Left Join (rows=203176856)
>        -> Nested Loop Left Join (rows=51160)
>             -> Nested Loop Left Join (rows=28)
>             -> Append (rows=4)
>        -> Append (rows=2)
>   -> Append (rows=2)

That does look weird. Do we have a self-contained test case?

I wouldn't necessarily expect the join rowcount to be exactly the
product of the input rowcounts, but it shouldn't be that far off, I
should think.

regards, tom lane
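For reference, the skeleton of such a test case might start along
these lines (a sketch only, loosely modelled on the schema later in
this thread; whether it actually reproduces the bad estimate is
exactly what would need checking):

  CREATE TABLE tag (id bigint, tag text, name text);
  CREATE TABLE material (mass integer, volume integer) INHERITS (tag);
  CREATE TABLE inventory (id bigint, material_id bigint);

  -- a left join against the parent table scans parent plus children
  -- via an Append node, as in the problem plan:
  EXPLAIN SELECT *
  FROM inventory
  LEFT JOIN tag ON inventory.material_id = tag.id;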
Re: [PERFORM] Order by (for 15 rows) adds 30 seconds to query time
Kevin Grittner wrote:
> Tom Lane <t...@sss.pgh.pa.us> wrote:
>> That does look weird. Do we have a self-contained test case?
>
> Not at the moment. It seems to only occur with relatively complex
> joins. Richard, could you capture the schema for the affected tables
> and views with pg_dump -s and also the related rows from
> pg_statistic? (The actual table contents aren't needed to see this
> issue.)

Here are the relevant parts of the schema - I've cut this out of the
source-tree rather than pg_dump, since it seems more readable.

Regarding pg_statistic, I don't understand how to find the relevant
rows - what am I looking for? (the pg_statistic table is 247M in
size).

Thanks for your help,

Richard

THE PROBLEM QUERY
-----------------
SELECT *
FROM h.inventory
WHERE demand_id = 289276563
ORDER BY id;

# Note that using core.inventory (which is simpler) does not have the
# 30-second problem. In general, the h namespace is intended to be a
# human-readable diagnostic version, whereas the core namespace is
# optimised for the application.

h.inventory and core.inventory
------------------------------
--
-- Inventory
--
CREATE TABLE core.inventory (
  id          bigint  NOT NULL DEFAULT core.new_id(),
  material_id bigint  NOT NULL,
  location_id bigint  NOT NULL,
  qty         integer NOT NULL,
  divergence  integer NOT NULL DEFAULT 0,
  ctime       timestamp with time zone NOT NULL DEFAULT now(),
  actor_id    bigint  NULL,
  demand_id   bigint  NULL,
  PRIMARY KEY ( id ),
  FOREIGN KEY ( material_id ) REFERENCES core.__material_id ( id ),
  FOREIGN KEY ( location_id ) REFERENCES core.__location_id ( id ),
  FOREIGN KEY ( actor_id ) REFERENCES core.actor ( id ),
  FOREIGN KEY ( demand_id ) REFERENCES core.demand ( id )
);

CREATE INDEX inventory_material_id ON core.inventory ( material_id );
CREATE INDEX inventory_location_id ON core.inventory ( location_id );
CREATE INDEX inventory_actor_id ON core.inventory ( actor_id );
CREATE INDEX inventory_demand_id ON core.inventory ( demand_id );

CREATE OR REPLACE VIEW h.inventory AS
SELECT
  core.inventory.id,
  core.inventory.material_id,
  h_material.tag  AS material_tag,
  h_material.name AS material_name,
  core.inventory.location_id,
  h_location.tag  AS location_tag,
  h_location.name AS location_name,
  core.inventory.qty,
  core.inventory.divergence,
  core.inventory.ctime,
  core.inventory.actor_id,
  h_actor.tag  AS actor_tag,
  h_actor.name AS actor_name,
  core.inventory.demand_id,
  h_demand.target_id,
  h_demand.target_tag,
  h_demand.target_name
FROM core.inventory
LEFT OUTER JOIN h.material AS h_material ON core.inventory.material_id = h_material.id
LEFT OUTER JOIN h.location AS h_location ON core.inventory.location_id = h_location.id
LEFT OUTER JOIN h.actor    AS h_actor    ON core.inventory.actor_id    = h_actor.id
LEFT OUTER JOIN h.demand   AS h_demand   ON core.inventory.demand_id   = h_demand.id;

h.material and core.material
----------------------------
--
-- Weights, dimensions, and other material data
--
--
-- Materials
--
CREATE TABLE core.material (
  LIKE core.tag INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES,
  mass   integer NOT NULL CHECK ( mass >= 0 ),
  volume integer NOT NULL CHECK ( volume >= 0 )
) INHERITS ( core.tag );

SELECT core.inherit_unique_index ( 'material', 'tag', 'id' );
SELECT core.inherit_unique_index ( 'material', 'tag', 'tag' );
SELECT core.create_inheritable_unique_index ( 'material', 'id', 'bigint' );

COMMENT ON COLUMN core.material.mass IS 'Mass in grams';
COMMENT ON COLUMN core.material.volume IS 'Volume in ml';

CREATE OR REPLACE VIEW h.material AS
SELECT
  core.material.id,
  core.material.tag,
  core.material.name,
  core.material.mass,
  core.material.volume
FROM core.material;

h.location and core.location
----------------------------
--
-- Locations
--
CREATE TABLE core.location (
  LIKE core.tag INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES
) INHERITS ( core.tag );

SELECT core.inherit_unique_index ( 'location', 'tag', 'id' );
SELECT core.inherit_unique_index ( 'location', 'tag', 'tag' );
SELECT core.create_inheritable_unique_index ( 'location', 'id', 'bigint' );

CREATE OR REPLACE VIEW h.location AS
SELECT core.location.id, core.location.tag, core.location.name
FROM core.location;

h.actor and core.actor
----------------------
--
-- Actors
--
CREATE TABLE core.actor (
  LIKE core.tag INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES
) INHERITS ( core.tag );

SELECT core.inherit_unique_index ( 'actor', 'tag', 'id' );
SELECT core.inherit_unique_index ( 'actor', 'tag', 'tag' );

CREATE OR REPLACE VIEW h.actor AS
SELECT core.actor.id, core.actor.tag, core.actor.name
FROM core.actor;

h.demand and core.demand
------------------------
--
-- Demand
--
CREATE TABLE core.demand (
  id          bigint  NOT NULL DEFAULT core.new_id(),
  target_id   bigint  NOT NULL,
  material_id bigint  NOT NULL,
  qty         integer NOT NULL,
  -- HACK
  benefit     integer NOT NULL DEFAULT 0,
  PRIMARY KEY ( id ),
  UNIQUE ( target_id, material_id ),
  FOREIGN KEY (
Re: [PERFORM] Order by (for 15 rows) adds 30 seconds to query time
Kevin Grittner wrote:
> Richard Neill <rn...@cam.ac.uk> wrote:
>> Regarding pg_statistic, I don't understand how to find the relevant
>> rows - what am I looking for? (the pg_statistic table is 247M in
>> size).
>
> I think the only relevant rows would be the ones with starelid =
> pg_class.oid for a table used in the query, and I think you could
> further limit it to rows where staattnum = pg_attribute.attnum for a
> column referenced in the WHERE clause or a JOIN's ON clause
> (including in the views). To help match them up, and to cover all
> the bases, listing the related pg_class and pg_attribute rows would
> help. Hopefully that will allow us to generate the same plan in an
> EXPLAIN, and then see how it gets such an overblown estimate of the
> result rows.

Thanks for your explanation. I ran the query:

SELECT * FROM pg_statistic
WHERE starelid IN (
  SELECT oid FROM pg_class
  WHERE relname IN ('demand','waypoint','actor','location','material','inventory')
);

and it's 228kB compressed, so rather than attaching it, I'm placing it
here: http://www.richardneill.org/tmp/pg_statistic.bz2

Likewise, the much smaller (16kB) output from:

SELECT * FROM pg_class
WHERE relname IN ('demand','waypoint','actor','location','material','inventory');

SELECT * FROM pg_attribute;

is at: http://www.richardneill.org/tmp/pg_attribute_pg_class.bz2

P.S. Would it be easier for you if I set up SSH access to a spare
machine, with a copy of the database?

Thanks very much for your help,

Richard
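To trim that dump down to just the referenced columns, the catalog
join Kevin describes can be written along these lines (a sketch, not
from the original thread; the attname list is illustrative):

  SELECT c.relname, a.attname, s.*
  FROM pg_statistic s
  JOIN pg_class c ON c.oid = s.starelid
  JOIN pg_attribute a ON a.attrelid = s.starelid AND a.attnum = s.staattnum
  WHERE c.relname IN ('demand','waypoint','actor','location','material','inventory')
    AND a.attname IN ('id','demand_id','material_id','location_id','actor_id','target_id');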
Re: [PERFORM] Analyse without locking?
Dear All,

I'm still puzzled by this one - it looks like it's causing about 5% of
queries to rise in duration from ~300ms to 2-6 seconds.

On the other hand, the system never seems to be I/O bound. (We have at
least 25 MB/sec of write bandwidth, and normally use a small fraction
of that.)

Here's a typical checkpoint log entry:

2009-12-03 06:21:21 GMT LOG:  checkpoint complete: wrote 12400 buffers
(2.2%); 0 transaction log file(s) added, 0 removed, 12 recycled;
write=149.883 s, sync=5.143 s, total=155.040 s

We're using 8.4.1, on ext4 with SSD. Is it possible that something
exotic is occurring to do with write barriers (on by default in ext4,
and we haven't changed this)?

Perhaps a low-priority IO process for writing the previous WAL to disk
is blocking a high-priority transaction (which is trying to write to
the new WAL). If the latter is trying to sync, could the large amount
of lower-priority IO be getting in the way thanks to write barriers?
If so, can I safely turn off write barriers?

Thanks,

Richard

P.S. Should I rename this thread?

Richard Neill wrote:
> Dear All,
>
> It definitely looks checkpoint-related - the checkpoint timeout is
> set to 5 minutes, and here is a graph of our response time (in ms)
> over a 1 hour period. The query is pretty much identical each time.
>
> Any ideas what I could do to make checkpoints not hurt performance
> like this?
>
> Thanks, Richard
>
> Tom Lane wrote:
>> Richard Neill <rn...@cam.ac.uk> writes:
>>> Now, I understand that increasing checkpoint_segments is generally
>>> a good thing (subject to some limit), but doesn't that just mean
>>> that instead of say a 1 second outage every minute, it's a 10
>>> second outage every 10 minutes?
>>
>> In recent PG versions you can spread the checkpoint I/O out over a
>> period of time, so it shouldn't be an outage at all, just background
>> load. Other things being equal, a longer checkpoint cycle is better
>> since it improves the odds of being able to coalesce multiple
>> changes to the same page into a single write. The limiting factor
>> is your threshold of pain on how much WAL-replay work would be
>> needed to recover after a crash.
Re: [PERFORM] Checkpoint spikes
Richard Neill wrote:
> On the other hand, the system never seems to be I/O bound. (We have
> at least 25 MB/sec of write bandwidth, and normally use a small
> fraction of that.)

I would bet that if you sampled vmstat or iostat every single second,
you'd discover there's a large burst in write speed for the same few
seconds that queries are stuck. If you're averaging out the data over
a 5 second or longer period, you'll never see it--the spike will get
lost in the average. You just can't monitor checkpoint spikes unless
you're watching I/O with an extremely tight time resolution. Watching
the "Writeback" figure in /proc/meminfo is helpful too; that is where
I normally see everything jammed up.

> Here's a typical checkpoint log entry:
>
> 2009-12-03 06:21:21 GMT LOG:  checkpoint complete: wrote 12400
> buffers (2.2%); 0 transaction log file(s) added, 0 removed, 12
> recycled; write=149.883 s, sync=5.143 s, total=155.040 s

See that sync number there? That's your problem; while that sync
operation is going on, everybody else is grinding to a halt waiting
for it. Not a coincidence that the duration is about the same amount
of time that your queries are getting stuck. This example shows 12400
buffers = 97MB of total data written. Since those writes are pretty
random I/O, it's easily possible to get stuck for a few seconds
waiting for that much data to make it out to disk. You only gave the
write phase a couple of minutes to spread things out over; meanwhile,
Linux may not even bother starting to write things out until 30
seconds into that, so the effective time between when writes to disk
start and when the matching sync happens on your system is extremely
small. That's not good--you have to give that several minutes of
breathing room if you want to avoid checkpoint spikes.

> We're using 8.4.1, on ext4 with SSD. Is it possible that something
> exotic is occurring to do with write barriers (on by default in
> ext4, and we haven't changed this)?
> [...]
> If so, can I safely turn off write barriers?

Linux is pretty dumb in general here. fsync operations will usually
end up writing out way more of the OS buffer cache than they need to.
And the write cache can get quite big before pdflush decides it should
actually do some work; the whole thing is optimized for throughput
rather than latency. I don't really trust barriers at all, so I don't
know if there's some specific tuning you can do with those to improve
things. Your whole system is bleeding edge craziness IMHO--SSD, ext4,
write barriers, all stuff that just doesn't work reliably yet as far
as I'm concerned. ...but that's not what you want to hear.

What I can suggest that should help is increasing checkpoint_segments
(>32), checkpoint_timeout (>=10 minutes), and
checkpoint_completion_target (=0.9), and lowering the amount of writes
Linux will cache before it gets more aggressive about flushing them.
Those things will fight the root cause of the problem, by giving more
time between the write and sync phases of the checkpoint. It's OK if
the write phase takes a long while; decreasing the sync number is the
goal you need to keep your eye on.
I've written a couple of articles on this specific topic if you want
more background on the underlying issues; it's kind of heavy reading:

http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
http://www.westnet.com/~gsmith/content/linux-pdflush.htm
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html

--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com    www.2ndQuadrant.com
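As a concrete starting point, those suggestions translate into
something like the following (a sketch only; the PostgreSQL values are
the rules of thumb from this thread, and the sysctl lines are one
common way to make Linux start flushing dirty pages sooner, with
illustrative, aggressive values rather than settings measured for this
machine):

  # postgresql.conf (8.4-era parameters)
  checkpoint_segments = 32             # or higher, per the ">32" suggestion
  checkpoint_timeout = 10min           # more breathing room per checkpoint
  checkpoint_completion_target = 0.9   # spread the write phase over most of the interval

  # Linux, as root: lower the dirty-page thresholds so pdflush
  # starts writing earlier instead of hoarding a large write cache
  sysctl -w vm.dirty_background_ratio=1
  sysctl -w vm.dirty_ratio=5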