Re: [GENERAL] Strange statistics

Henrik Thu, 05 Jun 2008 01:34:20 -0700


On 3 Jun 2008, at 23:31, Joris Dobbelsteen wrote:

Henrik wrote:
Hi list,
I have a table with a lot of file names in it (approx. 3 million) in an 8.3.1 db. Running this simple query shows that the statistics are way off, but I can't get them right even when I raise the statistics target to 1000.
db=# alter table tbl_file alter file_name set statistics 1000;
ALTER TABLE
db=# analyze tbl_file;
ANALYZE
db=# explain analyze select * from tbl_file where lower(file_name) like lower('to%');
                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl_file  (cost=23.18..2325.13 rows=625 width=134) (actual time=7.938..82.386 rows=17553 loops=1)
   Filter: (lower((file_name)::text) ~~ 'to%'::text)
   ->  Bitmap Index Scan on tbl_file_idx  (cost=0.00..23.02 rows=625 width=0) (actual time=6.408..6.408 rows=17553 loops=1)
         Index Cond: ((lower((file_name)::text) ~>=~ 'to'::text) AND (lower((file_name)::text) ~<~ 'tp'::text))
Total runtime: 86.230 ms
(5 rows)
How can it be off by a factor of 28??
These are statistics and represent only an estimate! In this case, the planner seems to be doing the right thing(tm) anyway.
Statistics is a frequently misunderstood subject and usually provides excellent material for drawing plain wrong conclusions. There is a good chance that, due to the physical layout of your data, the algorithms in the statistics collector, the existence of uncertainty and some more unknown factors, your statistics will be biased. This is a situation where you noticed it.
Running "SELECT * FROM pg_stats;" will give you the statistics the planner uses and can provide some hints as to why the planner has chosen these estimates. The statistics will probably vary between ANALYZE runs. It's also possible to try "CLUSTER" and friends. Try different queries and look at the deviations.
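To narrow the pg_stats output down to what matters here, something like the following might be a starting point. This is only a sketch: it assumes tbl_file_idx is an expression index on lower(file_name) (which the Index Cond in the plan suggests), and that the statistics gathered for the indexed expression show up in pg_stats under the index name rather than the table name. The attname reported for the expression column can differ between versions, so the query does not filter on it.

```sql
-- Sketch only: inspect the statistics the planner has for the table
-- column and for the (assumed) expression index on lower(file_name).
-- Rows for the expression are assumed to appear with tablename set
-- to the index name, tbl_file_idx.
SELECT tablename,
       attname,
       n_distinct,
       most_common_vals,
       most_common_freqs,
       histogram_bounds
FROM pg_stats
WHERE tablename IN ('tbl_file', 'tbl_file_idx');
```

Re-running this after each ANALYZE and comparing histogram_bounds around 'to' and 'tp' should show how much the estimate for the 'to%' range moves between runs.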

Thanks, Joris, for your input. You are the second person to suggest CLUSTER to me. Maybe I should take a look. The problem is that our select queries are kind of random. Would CLUSTER still help then? Should I just CLUSTER on the most used index, or?


Thanks
/henke

All in all, you should really start worrying when the planner starts producing inefficient plans. Since it's a file name, it might be highly irregular (random), and a low statistics target might be good enough anyway.
Unfortunately I'm not a statistics expert...

- Joris



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
