Re: [PERFORM] Order by (for 15 rows) adds 30 seconds to query time

2009-12-02 Thread Matthew Wakeling

On Tue, 1 Dec 2009, Jean-Michel Pouré wrote:

PostgreSQL query analyzer needs to run a couple of times before it can
rewrite and optimize the query. Make sure demand_id, id and join IDs
carry indexes.


Huh? At what point does the planner carry over previous plans and use them 
to further optimise the query?


But perhaps the biggest factor here is calling a five-table join a "pretty 
simple query".


Matthew

--
Prolog doesn't have enough parentheses. -- Computer Science Lecturer
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cost of sort/order by not estimated by the query planner

2009-12-02 Thread Laurent Laborde
hummm Adding pgsql-perf :)

On Mon, Nov 30, 2009 at 5:54 PM, Laurent Laborde kerdez...@gmail.com wrote:
 Friendly greetings !
 I use postgresql 8.3.6.

 Here is a little info about the table I'm querying:
 -
 - select count(*) from _article : 17301610
 - select count(*) from _article WHERE (_article.bitfield & getbit(0)) : 6729


 Here are both queries with problems:
 --

 QUERY 1 : Very fast !
 -

 explain SELECT *
 FROM   _article
 WHERE (_article.bitfield & getbit(0))
 ORDER BY _article.id ASC
 LIMIT 500;
                                             QUERY PLAN
 -
  Limit  (cost=66114.13..66115.38 rows=500 width=1114)
    ->  Sort  (cost=66114.13..66157.37 rows=17296 width=1114)
          Sort Key: id
          ->  Bitmap Heap Scan on _article  (cost=138.32..65252.29
 rows=17296 width=1114)
                Recheck Cond: (bitfield & B'1'::bit varying)
                ->  Bitmap Index Scan on idx_article_bitfield
 (cost=0.00..134.00 rows=17296 width=0)
                      Index Cond: (bitfield & B'1'::bit varying)




 QUERY 2 : Endless ... (more than 30 min... I stopped the query)
 -

 explain SELECT *
 FROM   _article
 WHERE (_article.bitfield & getbit(0))
 ORDER BY _article.id ASC
 LIMIT 5;
                                           QUERY PLAN
 -
  Limit  (cost=0.00..2042.87 rows=5 width=1114)
    ->  Index Scan using _article_pkey on _article
 (cost=0.00..7066684.46 rows=17296 width=1114)
          Filter: (bitfield & B'1'::bit varying)
 (3 rows)


 With LIMIT 5 and LIMIT 500, the query plans are different.
 PostgreSQL estimates that it can do a simple index scan to find only 5 rows.
 With more than LIMIT ~400 it estimates that it's faster to use a more
 complex plan, and that makes sense!

 The problem is in the ORDER BY, of course.
 If I remove the ORDER BY, the LIMIT 5 query is faster (0.044 ms) and does an
 index scan.
 At LIMIT 500 (without ORDER BY) it still uses an index scan and is
 slightly slower.
 At LIMIT 5000 (without ORDER BY) it switches to a Bitmap Index Scan +
 Bitmap Heap Scan, which is slower but acceptable (5.275 ms).

 Why, with QUERY 2, doesn't PostgreSQL estimate the cost of the
 Sort/ORDER BY?
 Of course, ignoring the ORDER BY, both query plans are right and the
 choice between those different plans totally makes sense.

 But... if the planner were kind enough to consider the cost of the
 ORDER BY, it would certainly choose the Bitmap Index + Bitmap Heap
 scan for the LIMIT 5.
 And not an index scan on the pkey!

 I have set the statistics target to 1000 for _article.bitfield, just in
 case (and ran a VACUUM ANALYZE); it doesn't change anything.
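 For reference, those two steps look roughly like this (a sketch using the
 table and column names above):

 ALTER TABLE _article ALTER COLUMN bitfield SET STATISTICS 1000;
 VACUUM ANALYZE _article;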

 Is that a bug? Any ideas?

 Thank you :)

 --
 Laurent ker2x Laborde
 Sysadmin & DBA at http://www.over-blog.com/




-- 
Laurent ker2x Laborde
Sysadmin & DBA at http://www.over-blog.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cost of sort/order by not estimated by the query planner

2009-12-02 Thread Greg Stark
On Wed, Dec 2, 2009 at 11:13 AM, Laurent Laborde kerdez...@gmail.com wrote:
                                             QUERY PLAN
 -
  Limit  (cost=66114.13..66115.38 rows=500 width=1114)
    ->  Sort  (cost=66114.13..66157.37 rows=17296 width=1114)
          Sort Key: id
          ->  Bitmap Heap Scan on _article  (cost=138.32..65252.29
  rows=17296 width=1114)
                Recheck Cond: (bitfield & B'1'::bit varying)
                ->  Bitmap Index Scan on idx_article_bitfield
  (cost=0.00..134.00 rows=17296 width=0)
                      Index Cond: (bitfield & B'1'::bit varying)


Uhm, what kind of index is idx_article_bitfield?



-- 
greg

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cost of sort/order by not estimated by the query planner

2009-12-02 Thread Greg Stark
On Wed, Dec 2, 2009 at 11:13 AM, Laurent Laborde kerdez...@gmail.com wrote:
                                           QUERY PLAN
 -
  Limit  (cost=0.00..2042.87 rows=5 width=1114)
    ->  Index Scan using _article_pkey on _article
  (cost=0.00..7066684.46 rows=17296 width=1114)
          Filter: (bitfield & B'1'::bit varying)


Ah, I missed this the first time around. It's scanning _article_pkey
here. Ie, it's scanning the table from the oldest to the newest
article, assuming that the values which satisfy that constraint are
evenly distributed and it'll find five of them pretty quickly. In
reality there's a correlation between this bit being set and the value
of _article.id, and all the ones with it set are towards the end.
Postgres doesn't have any statistics on how multiple columns are
related yet, so it can't know this.

If this is an important query you might try having an index on
(bitfield, id) or a partial index on id where bitfield & B'1'. The
latter sounds like what you really need.
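
A sketch of what those two could look like (index names invented here; the
partial-index predicate just reuses the WHERE condition from your queries,
assuming it is boolean-valued as used there):

CREATE INDEX idx_article_bitfield_id ON _article (bitfield, id);

CREATE INDEX idx_article_bit0_id ON _article (id)
    WHERE (bitfield & getbit(0));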

-- 
greg

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cost of sort/order by not estimated by the query planner

2009-12-02 Thread Robert Haas
On Wed, Dec 2, 2009 at 8:01 AM, Laurent Laborde kerdez...@gmail.com wrote:
 On Wed, Dec 2, 2009 at 1:47 PM, Greg Stark gsst...@mit.edu wrote:
 On Wed, Dec 2, 2009 at 11:13 AM, Laurent Laborde kerdez...@gmail.com wrote:
                                           QUERY PLAN
 -
  Limit  (cost=0.00..2042.87 rows=5 width=1114)
    ->  Index Scan using _article_pkey on _article
  (cost=0.00..7066684.46 rows=17296 width=1114)
          Filter: (bitfield & B'1'::bit varying)


 Ah, I missed this the first time around. It's scanning _article_pkey
 here. Ie, it's scanning the table from the oldest to the newest
 article, assuming that the values which satisfy that constraint are
 evenly distributed and it'll find five of them pretty quickly. In
 reality there's a correlation between this bit being set and the value
 of _article.id, and all the ones with it set are towards the end.
 Postgres doesn't have any statistics on how multiple columns are
 related yet, so it can't know this.

 If this is an important query you might try having an index on
 (bitfield, id) or a partial index on id where bitfield & B'1'. The
 latter sounds like what you really need.

 There are, indeed, a lot of tricks and hacks.
 Maybe my question was too confusing.

 The question is: why is a LIMIT 5 so much slower than a LIMIT 500?

 The problem is in the ORDER BY, not in finding the data that
 match the filter.
 Even if the matches aren't evenly distributed, the queries without ORDER BY
 are much, much faster, EVEN when using the pkey query plan.

 without order by, using the bitmap - fast
 without order by, using the pkey index - fast
 with order by, using the bitmap - fast
 with order by, using the pkey index - slow

I'm confused.  I think you've only shown us two query plans, so it's
hard to judge what's going on here in the two cases you haven't shown.
 Also, you haven't shown the EXPLAIN ANALYZE output, so it's a bit
tricky to judge what is really happening.

However... as a general rule, the usual reason why the planner makes
bad decisions with small LIMITs is that it overestimates the impact of
the startup cost.  If one plan has a startup cost of 1 and a run cost
of 100, and another plan has a startup cost of 0 and a run cost of
100, the planner will pick the latter plan if a sufficiently small
fraction of the rows are being fetched (less than a millionth of
them).  It's easy for the estimates to be off by enough to make this
a bad decision, especially when using operations that the planner
doesn't have good estimates for (& may be one such).
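
To make that concrete with the estimates from the plans posted upthread
(rough arithmetic, assuming LIMIT is costed as startup cost plus a
proportional share of the run cost):

  LIMIT 5 via _article_pkey  : 0.00 + (5/17296)   * 7066684.46 =~ 2043
                               (matches the cost=0.00..2042.87 shown)
  LIMIT 5 via sort plan      : at least the sort's startup cost, ~66114
  LIMIT 500 via _article_pkey: 0.00 + (500/17296) * 7066684.46 =~ 204300
  LIMIT 500 via sort plan    : ~66115 (matches cost=66114.13..66115.38)

On paper the pkey scan looks ~30x cheaper for LIMIT 5, so it wins; in
reality the matching rows are clustered at the wrong end of the index,
and the scan has to wade through most of the table to find five of them.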

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cost of sort/order by not estimated by the query planner

2009-12-02 Thread Laurent Laborde
* without order by, limit 5 : 70ms
--
 explain analyze SELECT *
FROM   _article
WHERE (_article.bitfield & getbit(0))
LIMIT 5;

QUERY PLAN  :
Limit  (cost=0.00..20.03 rows=5 width=1109) (actual
time=70.190..70.265 rows=5 loops=1)
   ->  Index Scan using idx_article_bitfield on _article
(cost=0.00..69290.99 rows=17298 width=1109) (actual
time=70.188..70.260 rows=5 loops=1)
         Index Cond: (bitfield & B'1'::bit varying)
 Total runtime: 70.406 ms
(4 rows)

* without order by, limit 500 (same plan as above) : 371ms
--
explain analyze SELECT *
FROM   _article
WHERE (_article.bitfield & getbit(0))
LIMIT 500;

QUERY PLAN:
 Limit  (cost=0.00..2002.86 rows=500 width=1109) (actual
time=0.087..371.257 rows=500 loops=1)
   ->  Index Scan using idx_article_bitfield on _article
(cost=0.00..69290.99 rows=17298 width=1109) (actual
time=0.086..371.075 rows=500 loops=1)
         Index Cond: (bitfield & B'1'::bit varying)
 Total runtime: 371.369 ms

* without order by, limit 5000 (query plan changed) : 1307ms
---
 explain analyze SELECT *
FROM   _article
WHERE (_article.bitfield & getbit(0))
LIMIT 5000;

QUERY PLAN :
 Limit  (cost=138.34..18971.86 rows=5000 width=1109) (actual
time=53.782..1307.173 rows=5000 loops=1)
   ->  Bitmap Heap Scan on _article  (cost=138.34..65294.79 rows=17298
width=1109) (actual time=53.781..1305.565 rows=5000 loops=1)
         Recheck Cond: (bitfield & B'1'::bit varying)
         ->  Bitmap Index Scan on idx_article_bitfield
(cost=0.00..134.01 rows=17298 width=0) (actual time=53.606..53.606
rows=6743 loops=1)
               Index Cond: (bitfield & B'1'::bit varying)
 Total runtime: 1307.972 ms


So... *without* ORDER BY, different limits and different query plans:
the queries are fast.

* with order by, limit 5 :
--
explain analyze SELECT *
FROM   _article
WHERE (_article.bitfield & getbit(0))
ORDER BY _article.id ASC
LIMIT 5;

QUERY PLAN :
Mmmm... the query has been running for 2 hours now ... waiting, waiting.


* with order by, limit 500 : 546ms
---
explain analyze SELECT *
FROM   _article
WHERE (_article.bitfield & getbit(0))
ORDER BY _article.id ASC
LIMIT 500;
QUERY PLAN :
 Limit  (cost=66156.73..66157.98 rows=500 width=1109) (actual
time=545.671..545.900 rows=500 loops=1)
   ->  Sort  (cost=66156.73..66199.98 rows=17298 width=1109) (actual
time=545.670..545.766 rows=500 loops=1)
         Sort Key: id
         Sort Method:  top-N heapsort  Memory: 603kB
         ->  Bitmap Heap Scan on _article  (cost=138.34..65294.79
rows=17298 width=1109) (actual time=1.059..541.359 rows=6729 loops=1)
               Recheck Cond: (bitfield & B'1'::bit varying)
               ->  Bitmap Index Scan on idx_article_bitfield
(cost=0.00..134.01 rows=17298 width=0) (actual time=0.922..0.922
rows=6743 loops=1)
                     Index Cond: (bitfield & B'1'::bit varying)
 Total runtime: 546.163 ms


Now... with ORDER BY, different limits, different query plans: the
LIMIT 5 query is insanely *SLOW* (while the LIMIT 500 is super fast).

What I think: the query planner does not consider the time taken by
the ORDER BY... which is *much* slower!!


-- 
Laurent ker2x Laborde
Sysadmin & DBA at http://www.over-blog.com/

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cost of sort/order by not estimated by the query planner

2009-12-02 Thread Robert Haas
On Wed, Dec 2, 2009 at 10:32 AM, Laurent Laborde kerdez...@gmail.com wrote:
 * without order by, limit 5 : 70ms
 --
  explain analyze SELECT *
 FROM   _article
 WHERE (_article.bitfield & getbit(0))
 LIMIT 5;

 QUERY PLAN  :
 Limit  (cost=0.00..20.03 rows=5 width=1109) (actual
 time=70.190..70.265 rows=5 loops=1)
    ->  Index Scan using idx_article_bitfield on _article
 (cost=0.00..69290.99 rows=17298 width=1109) (actual
 time=70.188..70.260 rows=5 loops=1)
          Index Cond: (bitfield & B'1'::bit varying)
  Total runtime: 70.406 ms
 (4 rows)

 * without order by, limit 500 (same plan as above) : 371ms
 --
 explain analyze SELECT *
 FROM   _article
 WHERE (_article.bitfield & getbit(0))
 LIMIT 500;

 QUERY PLAN:
  Limit  (cost=0.00..2002.86 rows=500 width=1109) (actual
 time=0.087..371.257 rows=500 loops=1)
    ->  Index Scan using idx_article_bitfield on _article
 (cost=0.00..69290.99 rows=17298 width=1109) (actual
 time=0.086..371.075 rows=500 loops=1)
          Index Cond: (bitfield & B'1'::bit varying)
  Total runtime: 371.369 ms

 * without order by, limit 5000 (query plan changed) : 1307ms
 ---
  explain analyze SELECT *
 FROM   _article
 WHERE (_article.bitfield & getbit(0))
 LIMIT 5000;

 QUERY PLAN :
  Limit  (cost=138.34..18971.86 rows=5000 width=1109) (actual
 time=53.782..1307.173 rows=5000 loops=1)
    ->  Bitmap Heap Scan on _article  (cost=138.34..65294.79 rows=17298
 width=1109) (actual time=53.781..1305.565 rows=5000 loops=1)
          Recheck Cond: (bitfield & B'1'::bit varying)
          ->  Bitmap Index Scan on idx_article_bitfield
 (cost=0.00..134.01 rows=17298 width=0) (actual time=53.606..53.606
 rows=6743 loops=1)
                Index Cond: (bitfield & B'1'::bit varying)
  Total runtime: 1307.972 ms


 So... *without* ORDER BY, different limits and different query plans:
 the queries are fast.

 * with order by, limit 5 :
 --
 explain analyze SELECT *
 FROM   _article
 WHERE (_article.bitfield & getbit(0))
 ORDER BY _article.id ASC
 LIMIT 5;

 QUERY PLAN :
 Mmmm... the query has been running for 2 hours now ... waiting, waiting.


 * with order by, limit 500 : 546ms
 ---
 explain analyze SELECT *
 FROM   _article
 WHERE (_article.bitfield & getbit(0))
 ORDER BY _article.id ASC
 LIMIT 500;
 QUERY PLAN :
  Limit  (cost=66156.73..66157.98 rows=500 width=1109) (actual
 time=545.671..545.900 rows=500 loops=1)
    ->  Sort  (cost=66156.73..66199.98 rows=17298 width=1109) (actual
 time=545.670..545.766 rows=500 loops=1)
          Sort Key: id
          Sort Method:  top-N heapsort  Memory: 603kB
          ->  Bitmap Heap Scan on _article  (cost=138.34..65294.79
 rows=17298 width=1109) (actual time=1.059..541.359 rows=6729 loops=1)
                Recheck Cond: (bitfield & B'1'::bit varying)
                ->  Bitmap Index Scan on idx_article_bitfield
 (cost=0.00..134.01 rows=17298 width=0) (actual time=0.922..0.922
 rows=6743 loops=1)
                      Index Cond: (bitfield & B'1'::bit varying)
  Total runtime: 546.163 ms


 Now... with ORDER BY, different limits, different query plans: the
 LIMIT 5 query is insanely *SLOW* (while the LIMIT 500 is super fast).

 What I think: the query planner does not consider the time taken by
 the ORDER BY... which is *much* slower!!

That is certainly not the case.  If the query planner did not consider
the time required to perform a sort, well, that would have been fixed
a lot sooner than now.  The real problem here is exactly what
I said upthread.  Without order-by, the query planner picks an
index-scan or a bitmap-index-scan and just runs it until it gets
enough rows to satisfy the LIMIT.  No problem.  With order-by, it has
to make a decision: should it fetch ALL the rows that satisfy the
bitfield condition, sort them by article ID, and then pick the top
five?  Or should it instead use the index on article ID to start
retrieving the lowest-numbered article IDs and hope to find 5 that
satisfy the bitfield condition before it goes through too many rows?
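
One way to look at both strategies side by side for the LIMIT 5 case is
to temporarily disable plain index scans in your session and re-run the
EXPLAIN ANALYZE -- just a diagnostic sketch, not a fix:

BEGIN;
SET LOCAL enable_indexscan = off;  -- planner should fall back to the bitmap scan + sort plan
EXPLAIN ANALYZE
SELECT * FROM _article
WHERE (_article.bitfield & getbit(0))
ORDER BY _article.id ASC
LIMIT 5;
ROLLBACK;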

The answer depends on how frequently the bitfield condition will be
satisfied.  If most rows in the table satisfy the bitfield condition,
then the second plan is better; if very few do, the first plan is
better.  Somewhat more subtly, the choice of plan also depends on the LIMIT.
The first plan requires almost the same amount of work for a small
limit as it does for a large one - you still have to find ALL the rows
that match the bitfield condition and sort them.  Then you return a
larger or smaller number of rows from the result of the sort depending
on the LIMIT.  But the amount of work that the second plan requires
varies dramatically depending on the LIMIT.  If the LIMIT is only
one-hundredth as large (5 instead of 500), then the second plan
figures to have to scan only one one-hundredth as many rows, so it
takes about a 

Re: [PERFORM] Cost of sort/order by not estimated by the query planner

2009-12-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 The exact break-even point between the two plans will vary depending
 on what percentage of the rows in the table satisfy the bitmap
 condition.

It's worse than that.  The planner is not too bad about understanding
the percentage-of-rows problem --- at least, assuming you are using
a condition it has statistics for, which it doesn't for bitvector &.
But whether the indexscan plan is fast will also depend on where the
matching rows are in the index ordering.  If they're all towards the
end you can lose big, and the planner hasn't got stats to let it
predict that.  It just assumes the filter condition is uncorrelated
to the ordering condition.

My own advice would be to forget the bitmap field and see if you can't
use a collection of plain boolean columns instead.  You might still
lose if there's a correlation problem, but bitfield & B'1' is
absolutely positively guaranteed to produce stupid row estimates and
hence bad plan choices.
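
A minimal sketch of that direction (column and index names invented here;
backfill and application changes not shown):

ALTER TABLE _article ADD COLUMN flag0 boolean NOT NULL DEFAULT false;
-- backfill flag0 from the existing bitfield, then:
CREATE INDEX idx_article_flag0_id ON _article (id) WHERE flag0;

ANALYZE then has a real fraction-true statistic for the boolean column,
so the row estimate for that condition stops being a blind guess.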

Or you could work on introducing a non-stupid selectivity estimator
for &, but it's not a trivial project.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Order by (for 15 rows) adds 30 seconds to query time

2009-12-02 Thread Kevin Grittner
Craig Ringer cr...@postnewspapers.com.au wrote:
 
 Some of those tables are views composed of multiple unions, too,
 by the looks of things.
 
 Doesn't the planner have some ... issues ... with estimation of
 row counts on joins over unions? Or is my memory just more faulty
 than usual?
 
So far I can't tell if it's views with unions or (as I suspect)
inheritance.  The views and tables shown so far reference other
objects not yet shown:
 
core.inventory
h.location
h.actor
 
However, I'm pretty sure that the problem is that the estimated row
count explodes for no reason that I can see when the Nested Loop
Left Join has an Append node from a parent table on the right.
 
28 rows joined to a 4 row append yields 51160 rows?
51160 rows joined to a 2 row append yields 203176856 rows?
203176856 rows joined to a 2 row append yields 806903677108 rows?
 
Something seems funny with the math.  I would have expected 28 times
4 times 2 times 2, equaling 448.  Still higher than 15, but only by
one order of magnitude -- where it might still make relatively sane
plan choices.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Order by (for 15 rows) adds 30 seconds to query time

2009-12-02 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Estimates extracted from the problem plan:
 
   Nested Loop Left Join  (rows=806903677108)
     ->  Nested Loop Left Join  (rows=203176856)
           ->  Nested Loop Left Join  (rows=51160)
                 ->  Nested Loop Left Join  (rows=28)
                 ->  Append  (rows=4)
           ->  Append  (rows=2)
     ->  Append  (rows=2)

That does look weird.  Do we have a self-contained test case?

I wouldn't necessarily expect the join rowcount to be exactly the
product of the input rowcounts, but it shouldn't be that far off,
I should think.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Order by (for 15 rows) adds 30 seconds to query time

2009-12-02 Thread Richard Neill



Kevin Grittner wrote:

Tom Lane t...@sss.pgh.pa.us wrote:
 

That does look weird.  Do we have a self-contained test case?


Not at the moment. It seems to only occur with relatively complex joins.

 
Richard, could you capture the schema for the affected tables and

views with pg_dump -s and also the related rows from pg_statistic?
(The actual table contents aren't needed to see this issue.)
 


Here are the relevant parts of the schema - I've cut this out of the 
source-tree rather than pg_dump, since it seems more readable.


Regarding  pg_statistic, I don't understand how to find the relevant 
rows - what am I looking for? (the pg_statistic table is 247M in size).


Thanks for your help,

Richard

THE PROBLEM QUERY
-

SELECT * FROM h.inventory WHERE demand_id = 289276563 ORDER BY id;

#Note that using core.inventory (which is simpler) does not have the 30-second problem.

#In general the h namespace is intended to be a human-readable diagnostic version whereas the core namespace
#is optimised for the application






h.inventory and core.inventory
--

--
-- Inventory
--
CREATE TABLE core.inventory (
	id bigint NOT NULL DEFAULT core.new_id(),
	material_id bigint NOT NULL,
	location_id bigint NOT NULL,
	qty integer NOT NULL,
	divergence integer NOT NULL DEFAULT 0,
	ctime timestamp with time zone NOT NULL DEFAULT now(),
	actor_id bigint NULL,
	demand_id bigint NULL,
	PRIMARY KEY ( id ),
	FOREIGN KEY ( material_id ) REFERENCES core.__material_id ( id ),
	FOREIGN KEY ( location_id ) REFERENCES core.__location_id ( id ),
	FOREIGN KEY ( actor_id ) REFERENCES core.actor ( id ),
	FOREIGN KEY ( demand_id ) REFERENCES core.demand ( id )
);
CREATE INDEX inventory_material_id ON core.inventory ( material_id );
CREATE INDEX inventory_location_id ON core.inventory ( location_id );
CREATE INDEX inventory_actor_id ON core.inventory ( actor_id );
CREATE INDEX inventory_demand_id ON core.inventory ( demand_id );
CREATE OR REPLACE VIEW h.inventory AS
SELECT	core.inventory.id,
	core.inventory.material_id,
	h_material.tag AS material_tag,
	h_material.name AS material_name,
	core.inventory.location_id,
	h_location.tag AS location_tag,
	h_location.name AS location_name,
	core.inventory.qty,
	core.inventory.divergence,
	core.inventory.ctime,
	core.inventory.actor_id,
	h_actor.tag AS actor_tag,
	h_actor.name AS actor_name,
	core.inventory.demand_id,
	h_demand.target_id,
	h_demand.target_tag,
	h_demand.target_name
FROM	core.inventory
	LEFT OUTER JOIN h.material AS h_material
		ON core.inventory.material_id = h_material.id
	LEFT OUTER JOIN h.location AS h_location
		ON core.inventory.location_id = h_location.id
	LEFT OUTER JOIN h.actor AS h_actor
		ON core.inventory.actor_id = h_actor.id
	LEFT OUTER JOIN h.demand AS h_demand
		ON core.inventory.demand_id = h_demand.id;





h.material and core.material


--
-- Weights, dimensions, and other material data
--

--
-- Materials
--
CREATE TABLE core.material (
	LIKE core.tag
		INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES,
	mass integer NOT NULL CHECK ( mass >= 0 ),
	volume integer NOT NULL CHECK ( volume >= 0 )
) INHERITS ( core.tag );
SELECT core.inherit_unique_index ( 'material', 'tag', 'id' );
SELECT core.inherit_unique_index ( 'material', 'tag', 'tag' );
SELECT core.create_inheritable_unique_index ( 'material', 'id', 'bigint' );
COMMENT ON COLUMN core.material.mass IS
	'Mass in grams';
COMMENT ON COLUMN core.material.volume IS
	'Volume in ml';
CREATE OR REPLACE VIEW h.material AS
SELECT	core.material.id,
	core.material.tag,
	core.material.name,
	core.material.mass,
	core.material.volume
FROM	core.material;












h.location and core.location


--
-- Locations
--
CREATE TABLE core.location (
	LIKE core.tag
		INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES
) INHERITS ( core.tag );
SELECT core.inherit_unique_index ( 'location', 'tag', 'id' );
SELECT core.inherit_unique_index ( 'location', 'tag', 'tag' );
SELECT core.create_inheritable_unique_index ( 'location', 'id', 'bigint' );
CREATE OR REPLACE VIEW h.location AS
SELECT	core.location.id,
	core.location.tag,
	core.location.name
FROM	core.location;







h.actor and core.actor
--

--
-- Actors
--
CREATE TABLE core.actor (
	LIKE core.tag
		INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES
) INHERITS ( core.tag );
SELECT core.inherit_unique_index ( 'actor', 'tag', 'id' );
SELECT core.inherit_unique_index ( 'actor', 'tag', 'tag' );
CREATE OR REPLACE VIEW h.actor AS
SELECT	core.actor.id,
	core.actor.tag,
	core.actor.name
FROM	core.actor;





h.demand and core.demand
-

--
-- Demand
--
CREATE TABLE core.demand (
	id bigint NOT NULL DEFAULT core.new_id(),
	target_id bigint NOT NULL,
	material_id bigint NOT NULL,
	qty integer NOT NULL,
	-- HACK
	benefit integer NOT NULL DEFAULT 0,
	PRIMARY KEY ( id ),
	UNIQUE ( target_id, material_id ),
	FOREIGN KEY ( 

Re: [PERFORM] Order by (for 15 rows) adds 30 seconds to query time

2009-12-02 Thread Richard Neill



Kevin Grittner wrote:

Richard Neill rn...@cam.ac.uk wrote:
 

Regarding  pg_statistic, I don't understand how to find the
relevant rows - what am I looking for? (the pg_statistic table is
247M in size).
 
I think the only relevant rows would be the ones with starelid =

pg_class.oid for a table used in the query, and I think you could
further limit it to rows where staattnum = pg_attribute.attnum for a
column referenced in the WHERE clause or a JOIN's ON clause
(including in the views).  To help match them up, and to cover all
the bases, listing the related pg_class and pg_attribute rows would
help.
 
Hopefully that will allow us to generate the same plan in an

EXPLAIN, and then see how it gets such an overblown estimate of the
result rows.



Thanks for your explanation. I ran the query:

SELECT * from pg_statistic WHERE starelid IN
  (SELECT oid FROM pg_class where relname IN
('demand','waypoint','actor','location','material','inventory')
  );

and it's 228kB compressed, so rather than attaching it, I'm placing it 
here:  http://www.richardneill.org/tmp/pg_statistic.bz2



Likewise, the much smaller (16kB) output from:

SELECT * from pg_class where relname IN
  ('demand','waypoint','actor','location','material','inventory');

SELECT * from pg_attribute ;

is at: http://www.richardneill.org/tmp/pg_attribute_pg_class.bz2



P.S. Would it be easier for you if I set up SSH access to a spare 
machine, with a copy of the database?



Thanks very much for your help,

Richard

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Analyse without locking?

2009-12-02 Thread Richard Neill

Dear All,

I'm still puzzled by this one - it looks like it's causing about 5% of 
queries to rise in duration from ~300ms to 2-6 seconds.


On the other hand, the system never seems to be I/O bound. (we have at 
least 25 MB/sec of write bandwidth, and use a small fraction of that 
normally).


Here's the typical checkpoint logs:

2009-12-03 06:21:21 GMT LOG:  checkpoint complete: wrote 12400 buffers 
(2.2%); 0 transaction log file(s) added, 0 removed, 12 recycled; 
write=149.883 s, sync=5.143 s, total=155.040 s


We're using 8.4.1, on ext4 with SSD. Is it possible that something 
exotic is occurring to do with write barriers (on by default in ext4, 
and we haven't changed this)?


Perhaps a low priority IO process for writing the previous WAL to disk 
is blocking a high-priority transaction (which is trying to write to the 
new WAL). If the latter is trying to sync, could the large amount of 
lower priority IO be getting in the way thanks to write barriers?


If so, can I safely turn off write barriers?

Thanks,

Richard


P.S. Should I rename this thread?




Richard Neill wrote:

Dear All,

It definitely looks checkpoint-related - the checkpoint timeout is set 
to 5 minutes, and here is a graph of our response time (in ms) over a 1 
hour period. The query is pretty much identical each time.


Any ideas what I could do to make checkpoints not hurt performance like 
this?


Thanks,

Richard



Tom Lane wrote:

Richard Neill rn...@cam.ac.uk writes:
Now, I understand that increasing checkpoint_segments is generally a 
good thing (subject to some limit), but doesn't that just mean that 
instead of say a 1 second outage every minute, it's a 10 second 
outage every 10 minutes?


In recent PG versions you can spread the checkpoint I/O out over a
period of time, so it shouldn't be an outage at all, just background
load.  Other things being equal, a longer checkpoint cycle is better
since it improves the odds of being able to coalesce multiple changes
to the same page into a single write.  The limiting factor is your
threshold of pain on how much WAL-replay work would be needed to recover
after a crash.







--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Checkpoint spikes

2009-12-02 Thread Greg Smith

Richard Neill wrote:
On the other hand, the system never seems to be I/O bound. (we have at 
least 25 MB/sec of write bandwidth, and use a small fraction of that 
normally).
I would bet that if you sampled vmstat or iostat every single second, 
you'd discover there's a large burst in write speed for the same few 
seconds that queries are stuck.  If you're averaging out the data over a 
5 second or longer period, you'll never see it--the spike will get lost 
in the average.  You just can't monitor checkpoint spikes unless you're 
watching I/O with an extremely tight time resolution.  Watching the 
Writeback figure in /proc/meminfo is helpful too, that is where I 
normally see everything jammed up.
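
Something along these lines, run while a stall is happening, usually makes
the burst obvious (illustrative commands only):

  vmstat 1        # watch the bo (blocks out) column spike
  iostat -x 1     # per-device write throughput and utilization
  watch -n1 'grep -E "Dirty|Writeback" /proc/meminfo'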



Here's the typical checkpoint logs:
2009-12-03 06:21:21 GMT LOG:  checkpoint complete: wrote 12400 buffers 
(2.2%); 0 transaction log file(s) added, 0 removed, 12 recycled; 
write=149.883 s, sync=5.143 s, total=155.040 s
See that sync number there?  That's your problem; while that sync 
operation is going on, everybody else is grinding to a halt waiting for 
it.  Not a coincidence that the duration is about the same amount of 
time that your queries are getting stuck.  This example shows 12400 
buffers = 97MB of total data written.  Since those writes are pretty 
random I/O, it's easily possible to get stuck for a few seconds waiting 
for that much data to make it out to disk.  You only gave the write 
phase a couple of minutes to spread things out over; meanwhile, Linux 
may not even bother starting to write things out until 30 seconds into 
that, so the effective time between when writes to disk start and when 
the matching sync happens on your system is extremely small.  That's not 
good--you have to give that several minutes of breathing room if you 
want to avoid checkpoint spikes.


We're using 8.4.1, on ext4 with SSD. Is it possible that something 
exotic is occurring to do with write barriers (on by default in ext4, 
and we haven't changed this).
Perhaps a low priority IO process for writing the previous WAL to disk 
is blocking a high-priority transaction (which is trying to write to 
the new WAL). If the latter is trying to sync, could the large amount 
of lower priority IO be getting in the way thanks to write barriers?

If so, can I safely turn off write barriers?
Linux is pretty dumb in general here.  fsync operations will usually end 
up writing out way more of the OS buffer cache than they need to.  And 
the write cache can get quite big before pdflush decides it should 
actually do some work, the whole thing is optimized for throughput 
rather than latency.  I don't really trust barriers at all, so I don't 
know if there's some specific tuning you can do with those to improve 
things.  Your whole system is bleeding edge craziness IMHO--SSD, ext4, 
write barriers, all stuff that just doesn't work reliably yet as far as I'm 
concerned.


...but that's not what you want to hear.  What I can suggest that should 
help is increasing checkpoint_segments (32), checkpoint_timeout (>=10 
minutes), checkpoint_completion_target (0.9), and lowering the amount of 
writes Linux will cache before it gets more aggressive about flushing 
them.  Those things will fight the root cause of the problem, by giving 
more time between the write and sync phases of the checkpoint.  It's 
OK if the write phase takes a long while; decreasing the sync number is 
the goal you need to keep your eye on.
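
Expressed as postgresql.conf lines (treat the exact values as starting
points to test, not tuned recommendations):

checkpoint_segments = 32
checkpoint_timeout = 10min
checkpoint_completion_target = 0.9

The Linux write-cache side is controlled by sysctls such as
vm.dirty_background_ratio and vm.dirty_ratio; lowering them makes pdflush
start flushing dirty pages sooner instead of letting them pile up until
the checkpoint's fsync.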


I've written a couple of articles on this specific topic if you want 
more background on the underlying issues, it's kind of heavy reading:


http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
http://www.westnet.com/~gsmith/content/linux-pdflush.htm
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html

--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance