Re: [HACKERS] Expression index ignores column statistics target

2005-10-01 Thread Tom Lane
I wrote:
 I seem to recall bringing up the question of whether
 we could find a less implementation-specific way of commanding this
 behavior, but I can't find it in the archives right now.

Ah, here it is:
http://archives.postgresql.org/pgsql-hackers/2004-03/msg00502.php

No responses :-(

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Expression index ignores column statistics target

2005-10-01 Thread Michael Fuhr
On Sat, Oct 01, 2005 at 02:19:06AM -0400, Tom Lane wrote:
 I wrote:
  I seem to recall bringing up the question of whether
  we could find a less implementation-specific way of commanding this
  behavior, but I can't find it in the archives right now.
 
 Ah, here it is:
 http://archives.postgresql.org/pgsql-hackers/2004-03/msg00502.php
 
 No responses :-(

Would an ALTER INDEX SET STATISTICS form be possible?

-- 
Michael Fuhr

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Expression index ignores column statistics target

2005-10-01 Thread Tom Lane
Michael Fuhr [EMAIL PROTECTED] writes:
 On Sat, Oct 01, 2005 at 02:19:06AM -0400, Tom Lane wrote:
 Ah, here it is:
 http://archives.postgresql.org/pgsql-hackers/2004-03/msg00502.php

 Would an ALTER INDEX SET STATISTICS form be possible?

It's not so much the table/index misnomer that's bothering me, it's
the lack of a clean way to identify which column of the index you
are talking about.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Expression index ignores column statistics target

2005-10-01 Thread Michael Fuhr
On Sat, Oct 01, 2005 at 02:42:32AM -0400, Tom Lane wrote:
 Michael Fuhr [EMAIL PROTECTED] writes:
  Would an ALTER INDEX SET STATISTICS form be possible?
 
 It's not so much the table/index misnomer that's bothering me, it's
 the lack of a clean way to identify which column of the index you
 are talking about.

Ah, I see -- I wasn't thinking about expressions in multicolumn
indexes.  What about identifying the column with the expression
itself, ala quote_ident(pg_get_indexdef())?  That might be tedious
for the user to type but it would be attractive from a self-documentation
standpoint.

ALTER INDEX indexname ALTER COLUMN the expression SET STATISTICS 100;

I do see that indexes allow multiple instances of the same expression,
so this approach could be ambiguous.  Or should such repetition be
prohibited as it is with column names?

test= CREATE TABLE foo (x integer);
CREATE TABLE
test= CREATE INDEX foo1_idx ON foo (x, x);
ERROR:  duplicate key violates unique constraint 
pg_attribute_relid_attnam_index
test= CREATE INDEX foo2_idx ON foo (abs(x), abs(x));
CREATE INDEX

-- 
Michael Fuhr

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Expression index ignores column statistics target

2005-10-01 Thread Bruce Momjian
Tom Lane wrote:
 Michael Fuhr [EMAIL PROTECTED] writes:
  I've noticed that row count estimates for expression indexes appear
  to rely on default_statistics_target rather than on a column's
  actual statistics target.  That is, if I use ALTER TABLE SET
  STATISTICS to increase a column's statistics target and then run
  ANALYZE, then estimates for non-expression-index queries improve
  as expected.  However, queries that use an expression index remain
  accurate for only around the N most common values, where N is the
  default_statistics_target that was in effect when ANALYZE ran.
 
 The code does in fact honor per-column statistics targets attached to
 expression indexes, viz
 
 alter table myfuncindex alter column pg_expression_1 set statistics 100;
 
 This isn't documented, mainly because pg_dump doesn't promise to dump
 such things, which it doesn't do because I didn't want to see the
 pg_expression_N naming for expression index columns become graven on
 stone tablets.  I seem to recall bringing up the question of whether
 we could find a less implementation-specific way of commanding this
 behavior, but I can't find it in the archives right now.

Is this a TODO?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Expression index ignores column statistics target

2005-10-01 Thread Tom Lane
Michael Fuhr [EMAIL PROTECTED] writes:
 ALTER INDEX indexname ALTER COLUMN the expression SET STATISTICS 100;

Yeah, that could probably be made to work.

 I do see that indexes allow multiple instances of the same expression,
 so this approach could be ambiguous.

I can't think of an actual use for that, though, so we could just ignore
the possible ambiguity.  Or we could have the ALTER update all columns
matching the given expression.

 test= CREATE INDEX foo1_idx ON foo (x, x);
 ERROR:  duplicate key violates unique constraint 
 pg_attribute_relid_attnam_index

Hmm, seems like there should be a more direct check for this ...

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Expression index ignores column statistics target

2005-09-30 Thread Michael Fuhr
I've noticed that row count estimates for expression indexes appear
to rely on default_statistics_target rather than on a column's
actual statistics target.  That is, if I use ALTER TABLE SET
STATISTICS to increase a column's statistics target and then run
ANALYZE, then estimates for non-expression-index queries improve
as expected.  However, queries that use an expression index remain
accurate for only around the N most common values, where N is the
default_statistics_target that was in effect when ANALYZE ran.  I'm
still rummaging through the archives looking for past discussion;
is this behavior a known limitation or just an oversight?

CREATE TABLE foo (x integer);

CREATE INDEX foo_x_idx ON foo (x);
CREATE INDEX foo_abs_x_idx ON foo (abs(x));

INSERT INTO foo (x)
  SELECT r1 % r2
  FROM generate_series(1, 100) AS g1(r1),
   generate_series(1, 100) AS g2(r2);

SET default_statistics_target TO 15;
ALTER TABLE foo ALTER COLUMN x SET STATISTICS 20;
ANALYZE foo;

SELECT most_common_vals FROM pg_stats WHERE attname = 'x';
  most_common_vals   
-
 {0,1,2,3,4,5,6,7,8,10,9,11,12,14,13,15,16,19,17,18}
(1 row)

EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 13;
  QUERY PLAN
  
--
 Bitmap Heap Scan on foo  (cost=2.72..50.28 rows=205 width=4) (actual 
time=0.370..1.766 rows=220 loops=1)
   Recheck Cond: (x = 13)
   -  Bitmap Index Scan on foo_x_idx  (cost=0.00..2.72 rows=205 width=0) 
(actual time=0.314..0.314 rows=220 loops=1)
 Index Cond: (x = 13)
 Total runtime: 2.905 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM foo WHERE abs(x) = 13;
QUERY PLAN  
  
--
 Bitmap Heap Scan on foo  (cost=2.72..50.80 rows=205 width=4) (actual 
time=0.358..1.720 rows=220 loops=1)
   Recheck Cond: (abs(x) = 13)
   -  Bitmap Index Scan on foo_abs_x_idx  (cost=0.00..2.72 rows=205 width=0) 
(actual time=0.305..0.305 rows=220 loops=1)
 Index Cond: (abs(x) = 13)
 Total runtime: 2.875 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 18;
  QUERY PLAN
  
--
 Bitmap Heap Scan on foo  (cost=2.60..49.75 rows=172 width=4) (actual 
time=0.312..1.442 rows=180 loops=1)
   Recheck Cond: (x = 18)
   -  Bitmap Index Scan on foo_x_idx  (cost=0.00..2.60 rows=172 width=0) 
(actual time=0.262..0.262 rows=180 loops=1)
 Index Cond: (x = 18)
 Total runtime: 2.393 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM foo WHERE abs(x) = 18;
   QUERY PLAN   
 
-
 Bitmap Heap Scan on foo  (cost=2.22..43.65 rows=63 width=4) (actual 
time=0.313..1.436 rows=180 loops=1)
   Recheck Cond: (abs(x) = 18)
   -  Bitmap Index Scan on foo_abs_x_idx  (cost=0.00..2.22 rows=63 width=0) 
(actual time=0.263..0.263 rows=180 loops=1)
 Index Cond: (abs(x) = 18)
 Total runtime: 2.418 ms
(5 rows)

-- 
Michael Fuhr

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Expression index ignores column statistics target

2005-09-30 Thread Bruce Momjian

This is expected.  The main TODO items is:

* Allow accurate statistics to be collected on indexes with more than
  one column or expression indexes, perhaps using per-index statistics

Basically, we don't have multi-column or expression statistics.  ANALYZE
just analyzes columns, even if an expression index exists.

---

Michael Fuhr wrote:
 I've noticed that row count estimates for expression indexes appear
 to rely on default_statistics_target rather than on a column's
 actual statistics target.  That is, if I use ALTER TABLE SET
 STATISTICS to increase a column's statistics target and then run
 ANALYZE, then estimates for non-expression-index queries improve
 as expected.  However, queries that use an expression index remain
 accurate for only around the N most common values, where N is the
 default_statistics_target that was in effect when ANALYZE ran.  I'm
 still rummaging through the archives looking for past discussion;
 is this behavior a known limitation or just an oversight?
 
 CREATE TABLE foo (x integer);
 
 CREATE INDEX foo_x_idx ON foo (x);
 CREATE INDEX foo_abs_x_idx ON foo (abs(x));
 
 INSERT INTO foo (x)
   SELECT r1 % r2
   FROM generate_series(1, 100) AS g1(r1),
generate_series(1, 100) AS g2(r2);
 
 SET default_statistics_target TO 15;
 ALTER TABLE foo ALTER COLUMN x SET STATISTICS 20;
 ANALYZE foo;
 
 SELECT most_common_vals FROM pg_stats WHERE attname = 'x';
   most_common_vals   
 -
  {0,1,2,3,4,5,6,7,8,10,9,11,12,14,13,15,16,19,17,18}
 (1 row)
 
 EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 13;
   QUERY PLAN  
 
 --
  Bitmap Heap Scan on foo  (cost=2.72..50.28 rows=205 width=4) (actual 
 time=0.370..1.766 rows=220 loops=1)
Recheck Cond: (x = 13)
-  Bitmap Index Scan on foo_x_idx  (cost=0.00..2.72 rows=205 width=0) 
 (actual time=0.314..0.314 rows=220 loops=1)
  Index Cond: (x = 13)
  Total runtime: 2.905 ms
 (5 rows)
 
 EXPLAIN ANALYZE SELECT * FROM foo WHERE abs(x) = 13;
 QUERY PLAN
 
 --
  Bitmap Heap Scan on foo  (cost=2.72..50.80 rows=205 width=4) (actual 
 time=0.358..1.720 rows=220 loops=1)
Recheck Cond: (abs(x) = 13)
-  Bitmap Index Scan on foo_abs_x_idx  (cost=0.00..2.72 rows=205 width=0) 
 (actual time=0.305..0.305 rows=220 loops=1)
  Index Cond: (abs(x) = 13)
  Total runtime: 2.875 ms
 (5 rows)
 
 EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 18;
   QUERY PLAN  
 
 --
  Bitmap Heap Scan on foo  (cost=2.60..49.75 rows=172 width=4) (actual 
 time=0.312..1.442 rows=180 loops=1)
Recheck Cond: (x = 18)
-  Bitmap Index Scan on foo_x_idx  (cost=0.00..2.60 rows=172 width=0) 
 (actual time=0.262..0.262 rows=180 loops=1)
  Index Cond: (x = 18)
  Total runtime: 2.393 ms
 (5 rows)
 
 EXPLAIN ANALYZE SELECT * FROM foo WHERE abs(x) = 18;
QUERY PLAN 

 -
  Bitmap Heap Scan on foo  (cost=2.22..43.65 rows=63 width=4) (actual 
 time=0.313..1.436 rows=180 loops=1)
Recheck Cond: (abs(x) = 18)
-  Bitmap Index Scan on foo_abs_x_idx  (cost=0.00..2.22 rows=63 width=0) 
 (actual time=0.263..0.263 rows=180 loops=1)
  Index Cond: (abs(x) = 18)
  Total runtime: 2.418 ms
 (5 rows)
 
 -- 
 Michael Fuhr
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Expression index ignores column statistics target

2005-09-30 Thread Tom Lane
Michael Fuhr [EMAIL PROTECTED] writes:
 I've noticed that row count estimates for expression indexes appear
 to rely on default_statistics_target rather than on a column's
 actual statistics target.  That is, if I use ALTER TABLE SET
 STATISTICS to increase a column's statistics target and then run
 ANALYZE, then estimates for non-expression-index queries improve
 as expected.  However, queries that use an expression index remain
 accurate for only around the N most common values, where N is the
 default_statistics_target that was in effect when ANALYZE ran.

The code does in fact honor per-column statistics targets attached to
expression indexes, viz

alter table myfuncindex alter column pg_expression_1 set statistics 100;

This isn't documented, mainly because pg_dump doesn't promise to dump
such things, which it doesn't do because I didn't want to see the
pg_expression_N naming for expression index columns become graven on
stone tablets.  I seem to recall bringing up the question of whether
we could find a less implementation-specific way of commanding this
behavior, but I can't find it in the archives right now.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Expression index ignores column statistics target

2005-09-30 Thread Michael Fuhr
On Fri, Sep 30, 2005 at 11:59:26PM -0400, Bruce Momjian wrote:
 This is expected.  The main TODO items is:
   
   * Allow accurate statistics to be collected on indexes with more than
 one column or expression indexes, perhaps using per-index statistics
 
 Basically, we don't have multi-column or expression statistics.  ANALYZE
 just analyzes columns, even if an expression index exists.

But the row count estimates imply that expression index queries do
use column statistics, presumably as a proxy in the absence of
expression statistics.  This looks like a relevant commit:

http://archives.postgresql.org/pgsql-committers/2004-02/msg00124.php

The behavior I observed is that the planner does appear to use
column statistics when planning an expression index query, but it
doesn't appear to honor a column's non-default statistics target.
In other words:

* Row count estimates for expression index queries (at least simple
  ones) are reasonably accurate for the N most common column values,
  where N is the value of default_statistics_target when ANALYZE
  was run.

* Specifically setting the column's statistics target with ALTER
  TABLE SET STATISTICS doesn't result in better statistics for
  expression index queries.

That difference in behavior seems odd: if default_statistics_target
has an effect, why doesn't ALTER TABLE SET STATISTICS?

-- 
Michael Fuhr

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Expression index ignores column statistics target

2005-09-30 Thread Michael Fuhr
On Sat, Oct 01, 2005 at 12:53:03AM -0400, Tom Lane wrote:
 The code does in fact honor per-column statistics targets attached to
 expression indexes, viz
 
 alter table myfuncindex alter column pg_expression_1 set statistics 100;

Aha -- that's the piece I didn't know about.  I was wondering where
those statistics were being stored, since they were affected by
default_statistics_target but not by per-column statistics targets.
And now I see them when I don't restrict queries against pg_stats
by just the table or column name.  Thanks.

-- 
Michael Fuhr

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings