Tom Lane wrote:
Alessandro Baretta [EMAIL PROTECTED] writes:
Tom Lane wrote:
I'm curious to see how many rows the planner thinks this will produce,
and whether it will use the index or not.
dmd-freerp-1-alex=# EXPLAIN ANALYZE SELECT * FROM articolo WHERE
articolo.xdbs_modified '2006-01-08 18:25:00+01';
QUERY PLAN
---
Index Scan using articolo_modified_index on articolo (cost=0.00..3914.91
rows=17697 width=653) (actual time=0.032..0.032 rows=0 loops=1)
Index Cond: (xdbs_modified '2006-01-08 18:25:00'::timestamp without time
zone)
Total runtime: 0.150 ms
(3 rows)
Well, there's your problem: 17697 rows estimated vs 0 actual. With a
closer row estimate it would've probably done the right thing for the
join problem.
Hmmm, what you are telling me is very interesting, Tom. So, let me see if I got
this straight: the first 'rows=... in the result from EXPLAIN ANALYZE gives me
estimates, while the second gives the actual cardinality of the selected record
set. Correct? If this is true, two questions arise: why is the estimated number
of rows completele wrong, and why, given such a large estimated record set does
PostgreSQL schedule an Index Scan as opposed to a Seq Scan?
How many rows are really in the table, anyway? Could we see the
pg_stats row for articolo.xdbs_modified?
dmd-freerp-1-alex=# select count(*) from articolo;
count
---
53091
(1 row)
dmd-freerp-1-alex=# explain analyze select * from articolo;
QUERY PLAN
-
Seq Scan on articolo (cost=0.00..1439.91 rows=53091 width=653) (actual
time=0.013..151.189 rows=53091 loops=1)
Total runtime: 295.152 ms
(2 rows)
Now let me get the pg_stats for xdbs_modified.
dmd-freerp-1-alex=# select * from pg_stats where tablename = 'articolo' and
attname = 'xdbs_modified';
schemaname | tablename |attname| null_frac | avg_width | n_distinct |
most_common_vals| most_common_freqs | histogram_bounds | correlation
+---+---+---+---+++---+--+-
public | articolo | xdbs_modified | 0 | 8 | 1 |
{2006-01-10 08:12:58.605327} | {1} | | 1
(1 row)
For sake of simplicity I have re-timestamped all tuples in the table with the
current timestamp, as you can see above. Now, obviously, the planner must
estimate ~0 rows for queries posing a selection condition on xdbs_modified, for
any value other than 2006-01-10 08:12:58.605327. Let me try selecting from
articolo first.
dmd-freerp-1-alex=# EXPLAIN ANALYZE SELECT * FROM articolo WHERE
articolo.xdbs_modified '2006-01-10 18:25:00+01';
QUERY PLAN
Index Scan using articolo_modified_index on articolo (cost=0.00..2.01 rows=1
width=653) (actual time=0.139..0.139 rows=0 loops=1)
Index Cond: (xdbs_modified '2006-01-10 18:25:00'::timestamp without time
zone)
Total runtime: 0.257 ms
(3 rows)
The planner produces a sensible estimate of the number of rows and consequently
chooses the appropriate query plan. Now, the join.
dmd-freerp-1-alex=# explain analyze SELECT * FROM articolo JOIN ubicazione
USING (id_ente, id_produttore, id_articolo) WHERE articolo.id_ente = 'dmd' AND
allarme IS NULL AND manutenzione IS NULL AND articolo.xdbs_modified
'2006-01-10 18:25:00+01';
QUERY PLAN
Nested Loop (cost=0.00..5.05 rows=1 width=1146) (actual time=0.043..0.043
rows=0 loops=1)
- Index Scan using articolo_modified_index on articolo (cost=0.00..2.02
rows=1 width=653) (actual time=0.035..0.035 rows=0 loops=1)
Index Cond: (xdbs_modified '2006-01-10 18:25:00'::timestamp without
time zone)
Filter: (id_ente = 'dmd'::text)
- Index Scan using ubicazione_fkey_articolo on ubicazione
(cost=0.00..3.02 rows=1 width=536) (never executed)
Index Cond: (('dmd'::text = ubicazione.id_ente) AND
(outer.id_produttore = ubicazione.id_produttore) AND (outer.id_articolo =
ubicazione.id_articolo))
Filter: ((allarme IS NULL) AND (manutenzione IS NULL))
Total runtime: 0.382 ms
(8 rows)
Dear Tom, you're my hero! I have no clue as to how or why the statistics were
wrong yesterday--as I vacuum-analyzed continuously out of lack of any