Re: [HACKERS] Selectivity estimation for equality and range queries

2008-01-03 Thread Peter Eisentraut
Am Freitag, 28. Dezember 2007 schrieb Tom Lane:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  I have been observing a case where the row count estimation for LIKE
  'foo' is (much) higher than for LIKE 'foo%', the rest of the query being
  the same. This is a special case of the estimation for equality being
  higher than for a range query that includes the value used in the
  equality.

 Not really --- LIKE estimation is only weakly related to range
 estimation.

Here is a narrowed down example.

* Pattern search

EXPLAIN ANALYZE
SELECT id FROM person
WHERE lower(person.name) LIKE 'foo%'
AND person.follow_nr=0
AND person.person_type='P' AND person.batch_nr=0;

Index Scan using person_idx_3 on person  (cost=0.01..6.03 rows=1 width=8) 
(actual time=0.276..4.917 rows=188 loops=1)
  Index Cond: ((lower((name)::text) ~=~ 'foo'::text) AND (lower((name)::text) 
~~ 'fop'::text) AND (person_type = 'P'::bpchar) AND (batch_nr = 0) AND 
(follow_nr = 0))
  Filter: (lower((name)::text) ~~ 'foo%'::text)

* Equality search

EXPLAIN ANALYZE
SELECT id FROM person
WHERE lower(person.name) LIKE 'foo'
AND person.follow_nr=0
AND person.person_type='P' AND person.batch_nr=0;

Index Scan using person_idx_3 on person  (cost=0.00..2527.84 rows=627 width=8) 
(actual time=0.043..0.072 rows=7 loops=1)
  Index Cond: ((lower((name)::text) ~=~ 'foo'::text) AND (person_type = 
'P'::bpchar) AND (batch_nr = 0) AND (follow_nr = 0))
  Filter: (lower((name)::text) ~~ 'foo'::text)

So it expects 1 row for the pattern search and 627 rows for the equality
search, which doesn't make mathematical sense.

What I had meant earlier with range and equality estimation is that this is
(presumably) about the same as guessing 1 row for (x = 5 AND x  6), but 627
rows for (x = 5).  Somehow, these two estimation methods should be talking
to each other.

PostgreSQL is version 8.1.9.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Selectivity estimation for equality and range queries

2008-01-03 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Here is a narrowed down example.

In what locale/encoding?  Can we see the pg_stats row for person_idx_3?

 PostgreSQL is version 8.1.9.

So it hasn't got the LIKE estimation fixes I put in two months ago ...

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] Selectivity estimation for equality and range queries

2007-12-28 Thread Peter Eisentraut
I have been observing a case where the row count estimation for LIKE 'foo' is 
(much) higher than for LIKE 'foo%', the rest of the query being the same.  
This is a special case of the estimation for equality being higher than for a 
range query that includes the value used in the equality.

I haven't been able to get a copy of the data from the client yet, but 
considering the nature of the data and the description of the selectivity 
estimation algorithms 
(http://www.postgresql.org/docs/8.3/static/row-estimation-examples.html), 
this behavior appears to be mathematically plausible.  I have been wondering 
whether in general the eqsel should try to compare its result with the 
estimation of (x = 'foo' AND x = 'foo') and use that as a ceiling or 
something.

Has anyone else observed something similar?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Selectivity estimation for equality and range queries

2007-12-28 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 I have been observing a case where the row count estimation for LIKE 'foo' is
 (much) higher than for LIKE 'foo%', the rest of the query being the same.  
 This is a special case of the estimation for equality being higher than for a
 range query that includes the value used in the equality.

Not really --- LIKE estimation is only weakly related to range
estimation.

Relevant questions here include exactly which PG version is in use and
what's the database encoding/locale.  If it's not C locale, the fixes
I made in selfuncs.c during November might be relevant.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly