Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-26 Thread Matthew Wakeling

On Wed, 25 Nov 2009, Grzegorz Jaśkiewicz wrote:

The out-of-order data layout is the primary reason for index bloat. That
happens, and it gets worse over time as the data becomes more and more
scattered (random deletes, etc.).


That's not index bloat. Sure, having the table not in the same order as 
the index will slow down an index scan, but that's a different problem 
altogether.


Index bloat is caused by exactly the same mechanism as table bloat. The 
index needs an entry for every row in the table that may still be 
visible to anyone. As with the table, it is not possible to 
deterministically delete entries as the rows become non-visible, so the 
index (and the table) will be left with dead entries after deletes and 
updates. The vacuum command performs garbage collection, marking these 
dead rows and index entries as free, so that some time in the future 
more data can be written to those places.


Index bloat is when there is an excessive amount of dead space in an 
index. It can be prevented by (auto)vacuuming regularly, but can only be 
reversed by REINDEX (or of course deleting the index, or adding loads of 
new entries to fill up the dead space after vacuuming).
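As a sketch, the maintenance commands discussed above look like this (the table and index names are hypothetical):

```sql
-- Regular garbage collection: marks dead row/index entries as reusable
VACUUM ANALYZE orders;

-- Reverse existing index bloat by rebuilding the index from scratch
REINDEX INDEX orders_customer_id_idx;
REINDEX TABLE orders;  -- or rebuild every index on the table
```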


Matthew

--
for a in past present future; do
  for b in clients employers associates relatives neighbours pets; do
    echo "The opinions here in no way reflect the opinions of my $a $b."
  done
done
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-26 Thread Sergey Aleynikov
Hello,

2009/11/25 Richard Neill rn...@cam.ac.uk:

It's a simple query, but using a complex view. So I can't really re-order it.
The view is inlined directly into your query by PG, and then reordered
according to from_collapse_limit. Perhaps the problem lies in the view
itself? How well does it perform on its own? Or is from_collapse_limit
too low, so the view isn't expanded properly?

Are you saying that this means that the query planner frequently makes the 
wrong choice here?
Look at EXPLAIN ANALYZE. If at some step the planner's estimate differs
from what is actually retrieved by (for a start) two orders of magnitude,
then the statistics are wrong. But if, at every step, the estimate is
not too far from reality, you are suffering from what I described: the
planner can't reorder the query efficiently enough. Because it only
happens sometimes, I suspect GEQO. Or wrong statistics.
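For example, an illustrative (not real) fragment of EXPLAIN ANALYZE output showing the kind of mis-estimation described:

```sql
EXPLAIN ANALYZE SELECT ...;
--  Nested Loop  (cost=0.00..142.78 rows=3 width=48)
--               (actual time=0.091..312.456 rows=14580 loops=1)
-- The planner estimated 3 rows but 14580 were actually retrieved:
-- nearly four orders of magnitude off, so the statistics for that
-- relation are stale or the statistics target is too low.
```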

I hadn't changed it from the defaults; now I've changed it to:
 autovacuum_max_workers = 6
 autovacuum_vacuum_scale_factor = 0.002
 autovacuum_analyze_scale_factor = 0.001

If your tables are under 100 million rows, that's aggressive enough. At
100 million rows, this would analyze the table every 100k changed
(inserted/updated/deleted) rows. Is that frequent enough for you? The
defaults are definitely too low for large tables. If you now get
consistent times, then you had been hit by wrong statistics.

Best regards,
Sergey Aleynikov



Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-26 Thread Sergey Aleynikov
Hello,

2009/11/25 Richard Neill rn...@cam.ac.uk:

Also, if you find odd statistics on a freshly analyzed table, try
increasing the statistics target, using
ALTER TABLE .. ALTER COLUMN .. SET STATISTICS ...

If you're using the default, it's again too low for large tables. Start
with 200, for example.
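Concretely, assuming a hypothetical table and column:

```sql
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 200;
ANALYZE orders;  -- re-gather statistics at the new target
```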

Best regards,
Sergey Aleynikov



Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-26 Thread Richard Neill



Sergey Aleynikov wrote:

Hello,

2009/11/25 Richard Neill rn...@cam.ac.uk:

Also, if you find odd statistics on a freshly analyzed table, try
increasing the statistics target, using
ALTER TABLE .. ALTER COLUMN .. SET STATISTICS ...

If you're using the default, it's again too low for large tables. Start
with 200, for example.


Thanks. I already had it set way up: 3000.

Is there a good description of exactly what analyse does, and how?
(in particular, what sort of statistics it gathers).
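One way to see what ANALYZE has gathered is the standard pg_stats view (the table name here is hypothetical):

```sql
SELECT attname, n_distinct, null_frac, correlation,
       most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'orders';
```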

Richard



Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-25 Thread Richard Neill



Matthew Wakeling wrote:

On Sun, 22 Nov 2009, Richard Neill wrote:
Worse still, doing a cluster of most of the tables and vacuum full 
analyze


Why are you doing a vacuum full? That command is not meant to be used 
except in the most unusual of circumstances, as it causes bloat to indexes.


We'd left it too long, and the DB was reaching 90% of disk space. I 
didn't realise that vacuum full was ever actively bad, only sometimes 
unneeded. I do now - thanks for the tip.




If you have run a cluster command, then running vacuum full will make 
the table and index layout worse, not better.




So, having managed to bloat the indexes in this way, what can I do to 
fix it? Will a regular vacuum do the job?


Richard





Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-25 Thread Matthew Wakeling

On Wed, 25 Nov 2009, Richard Neill wrote:

On Sun, 22 Nov 2009, Richard Neill wrote:

Worse still, doing a cluster of most of the tables and vacuum full analyze


Why are you doing a vacuum full? That command is not meant to be used 
except in the most unusual of circumstances, as it causes bloat to indexes.


We'd left it too long, and the DB was reaching 90% of disk space. I
didn't realise that vacuum full was ever actively bad, only sometimes
unneeded. I do now - thanks for the tip.


The problem is that vacuum full does a full compact of the table, but it 
has to update all the indexes as it goes. This makes it slow, and causes 
bloat to the indexes. There has been some discussion of removing the 
command or at least putting a big warning next to it.



So, having managed to bloat the indexes in this way, what can I do to
fix it? Will a regular vacuum do the job?


In fact, cluster is exactly the command you are looking for. It will drop 
the indexes, do a complete table rewrite (in the correct order), and then 
recreate all the indexes again.


In normal operation, a regular vacuum will keep the table under control, 
but if you actually want to shrink the database files in exceptional 
circumstances, then cluster is the tool for the job.
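For example (names hypothetical; note that CLUSTER takes an exclusive lock on the table while it runs):

```sql
CLUSTER orders USING orders_pkey;  -- rewrite table in index order,
                                   -- rebuilding all its indexes
```

(On PostgreSQL 8.3 and earlier the spelling is CLUSTER orders_pkey ON orders.)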


Matthew

--
Matthew: That's one of things about Cambridge - all the roads keep changing
  names as you walk along them, like Hills Road in particular.
Sagar:   Yes, Sidney Street is a bit like that too.
Matthew: Sidney Street *is* Hills Road.



Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-25 Thread Richard Neill



Matthew Wakeling wrote:

On Wed, 25 Nov 2009, Richard Neill wrote:

On Sun, 22 Nov 2009, Richard Neill wrote:
Worse still, doing a cluster of most of the tables and vacuum full 
analyze


In fact, cluster is exactly the command you are looking for. It will 
drop the indexes, do a complete table rewrite (in the correct order), 
and then recreate all the indexes again.


In normal operation, a regular vacuum will keep the table under control, 
but if you actually want to shrink the database files in exceptional 
circumstances, then cluster is the tool for the job.




Thanks - now I understand.

In terms of just index bloat, does a regular vacuum help?

Richard



Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-25 Thread Richard Neill

Sergey Aleynikov wrote:

Hello,


* Is there any way I can nail the query planner to a particular query plan,
rather than have it keep changing its mind?


All these settings lead to choosing different plans. If you have a small
number of complex, sensitive queries, you can run EXPLAIN on them with
the correct settings, then re-order the query (joins, subselects)
according to the resulting query plan, and, before running it, call

set local join_collapse_limit = 1;
set local from_collapse_limit = 1;


It's a simple query, but using a complex view. So I can't really 
re-order it.



This will prevent join/subselect reordering inside the current
transaction block, leading to consistent plans. But that gives no 100%
guarantee of choosing, for example, a hash join over a nested loop.


Are you saying that this means that the query planner frequently makes 
the wrong choice here?




Worse still, doing a cluster of most of the tables and vacuum full analyze  
 made most of the queries respond much better, but the vox query 
became very slow again, until I set it to A (which, a few days ago, did 
not work well).


Is your autovacuuming tuned correctly? For large tables, I set it
running much more aggressively than in the default install.


I hadn't changed it from the defaults; now I've changed it to:

autovacuum_max_workers = 6
autovacuum_vacuum_scale_factor = 0.002
autovacuum_analyze_scale_factor = 0.001

is that enough?

The DB isn't growing that much, but  it does seem to need frequent 
vacuum/analyze.



Richard




Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-25 Thread Kevin Grittner
Richard Neill rn...@cam.ac.uk wrote:
 
 In terms of just index bloat, does a regular vacuum help?
 
You might want to use the REINDEX command to correct serious index
bloat.  A regular vacuum will make dead space available for re-use,
but won't eliminate bloat directly.  (If run regularly, it will
prevent bloat.)
 
-Kevin



Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-25 Thread Grzegorz Jaśkiewicz
On Wed, Nov 25, 2009 at 4:26 PM, Kevin Grittner kevin.gritt...@wicourts.gov
 wrote:

 Richard Neill rn...@cam.ac.uk wrote:

  In terms of just index bloat, does a regular vacuum help?

 You might want to use the REINDEX command to correct serious index
 bloat.  A regular vacuum will make dead space available for re-use,
 but won't eliminate bloat directly.  (If run regularly, it will
 prevent bloat.)

For that reason, it makes sense to actually partition your data - even
though you don't see performance degradation because of data size, but
purely because of the nature of the data.
The other way is to perform a regular CLUSTER & REINDEX - but this
blocks the relations you are clustering.



-- 
GJ


Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-25 Thread Kevin Grittner
Grzegorz Jaśkiewicz gryz...@gmail.com wrote:
 
 The other way is to perform a regular CLUSTER & REINDEX
 
If you CLUSTER there is no reason to REINDEX; indexes are rebuilt by
the CLUSTER command.
 
Also, if you do a good job with regular VACUUMs, there isn't any bloat
to fix.  In that case a regular CLUSTER would only be needed if it was
worth the cost to keep data physically organized in the index
sequence.
 
-Kevin



Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-25 Thread Robert Haas
On Wed, Nov 25, 2009 at 7:27 AM, Richard Neill rn...@cam.ac.uk wrote:
 Sergey Aleynikov wrote:

 Hello,

 * Is there any way I can nail the query planner to a particular query
 plan,
 rather than have it keep changing its mind?

 All these settings lead to choosing different plans. If you have a small
 number of complex, sensitive queries, you can run EXPLAIN on them with
 the correct settings, then re-order the query (joins, subselects)
 according to the resulting query plan, and, before running it, call

 set local join_collapse_limit = 1;
 set local from_collapse_limit = 1;

 It's a simple query, but using a complex view. So I can't really re-order
 it.

Almost all queries can be reordered to some degree, but you might have
to inline the view into the main query to actually be able to do it.
Forcing a particular query plan in the manner described here is
generally sort of a last resort, though.  Usually you want to figure
out how to tune things so that the query planner picks the right plan
by itself - that's sort of the point of having a query planner...

...Robert



Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-23 Thread Matthew Wakeling

On Sun, 22 Nov 2009, Richard Neill wrote:

Worse still, doing a cluster of most of the tables and vacuum full analyze


Why are you doing a vacuum full? That command is not meant to be used 
except in the most unusual of circumstances, as it causes bloat to 
indexes.


If you have run a cluster command, then running vacuum full will make the 
table and index layout worse, not better.


Matthew

--
Riker: Our memory pathways have become accustomed to your sensory input.
Data:  I understand - I'm fond of you too, Commander. And you too Counsellor



Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-23 Thread Robert Haas
On Sun, Nov 22, 2009 at 10:31 AM, Richard Neill rn...@cam.ac.uk wrote:
 Dear All,

 Thanks for your help earlier with the previous question. I wonder if I might
 ask another.


 We have various queries that need to run, of which I'm going to focus on 2,
 vox and du_report.

 Both of them are extremely sensitive to the precise values of
 random_page_cost and seq_page_cost. Experimentally, I've used:

  A:  seq_page_cost = 0.25;  random_page_cost = 0.75
  B:  seq_page_cost = 0.5;  random_page_cost = 2
  C: seq_page_cost = 1;  random_page_cost = 4

 (and a few in between).


 If I pick the wrong one, then either vox becomes 2 orders of magnitude
 slower (22ms -> 3.5 seconds), or du_report becomes 10x slower. I can't use
 the same setting for both.

 So, as a very ugly hack, I've tuned the sweet spots for each query.
 Vox normally sits at B; du_report at C.


 Now, the real killer is that the position of that sweet spot changes over
 time as the DB ages over a few days (even though autovacuum is on).

 Worse still, doing a cluster of most of the tables and vacuum full analyze
 made most of the queries respond much better, but the vox query became very
 slow again, until I set it to A (which, a few days ago, did not work well).


 *  Why is the query planner so precisely sensitive to the combination of
 page costs and time since last vacuum full?

It sounds like your tables are getting bloated.  If you have
autovacuum turned on, this shouldn't be happening.  What sort of
workload is this?  What PG version?

 * Why is it that what improves one query can make another get so much worse?

Because it changes the plan you get.

 * Is there any way I can nail the query planner to a particular query plan,
 rather than have it keep changing its mind?

See other responses.

 * Is it normal to keep having to tune the query-planner's settings, or
 should it be possible to set it once, and leave it?

Leave it.

...Robert



[PERFORM] Query times change by orders of magnitude as DB ages

2009-11-22 Thread Richard Neill

Dear All,

Thanks for your help earlier with the previous question. I wonder if I 
might ask another.



We have various queries that need to run, of which I'm going to focus on 
2, vox and du_report.


Both of them are extremely sensitive to the precise values of 
random_page_cost and seq_page_cost. Experimentally, I've used:


 A:  seq_page_cost = 0.25;  random_page_cost = 0.75
 B:  seq_page_cost = 0.5;  random_page_cost = 2
 C: seq_page_cost = 1;  random_page_cost = 4

(and a few in between).


If I pick the wrong one, then either vox becomes 2 orders of magnitude 
slower (22ms -> 3.5 seconds), or du_report becomes 10x slower. I can't 
use the same setting for both.


So, as a very ugly hack, I've tuned the sweet spots for each query.
Vox normally sits at B; du_report at C.
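One way to pin a setting to a single query, rather than to the whole cluster, is to set the page costs locally inside a transaction (a sketch; the values are the "B" setting above):

```sql
BEGIN;
SET LOCAL seq_page_cost = 0.5;
SET LOCAL random_page_cost = 2;
-- run the vox query here
COMMIT;  -- the costs revert automatically
```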


Now, the real killer is that the position of that sweet spot changes 
over time as the DB ages over a few days (even though autovacuum is on).


Worse still, doing a cluster of most of the tables and vacuum full 
analyze   made most of the queries respond much better, but the vox 
query became very slow again, until I set it to A (which, a few days 
ago, did not work well).



*  Why is the query planner so precisely sensitive to the combination of 
page costs and time since last vacuum full?


* Why is it that what improves one query can make another get so much worse?

* Is there any way I can nail the query planner to a particular query 
plan, rather than have it keep changing its mind?


* Is it normal to keep having to tune the query-planner's settings, or 
should it be possible to set it once, and leave it?



Tuning this feels rather like adjusting several old radios, which are 
exceptionally finicky about the precise settings, having a very sharp 
resonance peak (in different places), and which drift out of tune at 
different rates. I must be doing something wrong, but what?


Thanks for your advice,

Richard





Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-22 Thread Sergey Aleynikov
Hello,

 * Is there any way I can nail the query planner to a particular query plan,
 rather than have it keep changing its mind?

All these settings lead to choosing different plans. If you have a small
number of complex, sensitive queries, you can run EXPLAIN on them with
the correct settings, then re-order the query (joins, subselects)
according to the resulting query plan, and, before running it, call

set local join_collapse_limit = 1;
set local from_collapse_limit = 1;

This will prevent join/subselect reordering inside the current
transaction block, leading to consistent plans. But that gives no 100%
guarantee of choosing, for example, a hash join over a nested loop.
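Since SET LOCAL only lasts until the end of the transaction, the pattern in practice is (sketch):

```sql
BEGIN;
SET LOCAL join_collapse_limit = 1;
SET LOCAL from_collapse_limit = 1;
SELECT ...;   -- joins are planned in exactly the order written
COMMIT;       -- the limits revert to their previous values
```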

You can, as noted in the previous message, experiment with the geqo_*
constants - especially, lower geqo_threshold to catch better plans
(but this can take many runs). Or, for production, set geqo = off - this
can dramatically increase query planning time, but the results will be
more consistent.
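The GEQO settings mentioned look like this (the threshold value is only a suggestion):

```sql
SET geqo = off;          -- always plan exhaustively and deterministically
-- or, instead of disabling it outright:
SET geqo_threshold = 18; -- default 12; use GEQO only for very large joins
```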

Is it normal to keep having to tune the query-planner's settings, or should it 
be possible to set it once, and leave it?

I have collapse limits set for some complex reporting queries, and
think it's an adequate solution.

Worse still, doing a cluster of most of the tables and vacuum full analyze   
made most of the queries respond much better, but the vox query became very 
slow again, until I set it to A (which, a few days ago, did not work well).

Is your autovacuuming tuned correctly? For large tables, I set it
running much more aggressively than in the default install.

Best regards,
Sergey Aleynikov
