Re: [GENERAL] forcing use of a specific (expression) index?

2007-05-03 Thread Dan Weber

Thanks to you and Tom.  The partial index solution is working splendidly.

On 5/3/07, Richard Huxton <[EMAIL PROTECTED]> wrote:

Dan Weber wrote:
> I made an expression index specifically for that where clause:
>
> CREATE INDEX special_testing_idx on my_table (((bool_1 or
> int_1 = 0) AND (int_2 IS NULL) AND (int_3 IS
> NULL) AND (protocol = 2)));

No, you haven't. What you've done here is create an index *for that
expression*. And it's not terribly useful because your expression will
only have two possible values: true, false.

What you were after is something like:

CREATE INDEX special_idx2 ON my_table (id) WHERE (...long expression...)

--
   Richard Huxton
   Archonet Ltd



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[GENERAL] forcing use of a specific (expression) index?

2007-05-02 Thread Dan Weber

I've got a table with about 15 columns and 200,000 rows.  I have indexes on a
lot of my columns, but postgres doesn't seem to be grabbing the ideal index --
in this case, an expression index that exactly matches my WHERE clause.

I have the following query:

SELECT columns FROM my_table WHERE (bool_1 or int_1 = 0)
AND (int_2 IS NULL) AND (int_3 IS NULL) AND (protocol =
2) ORDER BY id LIMIT 1;

Here is an explain analyze:

Limit (cost=0.00..8.43 rows=1 width=111) (actual
  time=17511.939..17511.940 rows=1 loops=1)

 -> Index Scan using my_table_id_key on my_table
  (cost=0.00..384000.58 rows=45562 width=111) (actual
  time=17511.935..17511.935 rows=1 loops=1)

   Filter: ((bool_1 OR (int_1 = 0)) AND (int_2
  IS NULL) AND (int_3 IS NULL) AND (protocol = 2))

Total runtime: 17512.031 ms

I made an expression index specifically for that where clause:

CREATE INDEX special_testing_idx on my_table (((bool_1 or
int_1 = 0) AND (int_2 IS NULL) AND (int_3 IS
NULL) AND (protocol = 2)));

It looks like the database is letting the ORDER BY id LIMIT 1 override
the above index.  I suppose I could live with that, except removing both
the ORDER and LIMIT clauses doesn't make it use that index.

Here is the EXPLAIN ANALYZE without the ORDER or LIMIT:

Bitmap Heap Scan on my_table (cost=1558.71..10154.01 rows=45562
  width=111) (actual time=7442.835..14391.969 rows=678 loops=1)

 Filter: ((bool_1 OR (int_1 = 0)) AND (int_2 IS
  NULL) AND (int_3 IS NULL) AND (protocol = 2))

 -> Bitmap Index Scan on my_table_int_2_null_idx
  (cost=0.00..1558.71 rows=122487 width=0) (actual
  time=7081.952..7081.952 rows=123001 loops=1)

   Index Cond: ((int_2 IS NULL) = true)

Total runtime: 14392.966 ms

I've done lots of ANALYZEs of my_table, as well as tried setting
enable_seqscan to off.

Have I just created too many indexes?  Is there some syntax I can use
to make the database recognize my big hairy WHERE clause?  Would I be
better of making a special boolean column that resolves to that expression,
and then indexing that column?

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [GENERAL] query takes 65 times longer if I add 1 column (explain attached)

2006-11-02 Thread Dan Weber
What have you got work_mem set to?The default, 1024.
 The fact that the estimates diverge so far from reality can no doubt beblamed on the horridly bad rowcount estimates: 3 rows estimated vs
179 actual for table_one, and 128592 estimated vs 4202 actual fortable_two.  Have you ANALYZEd these tables lately?Nope.  Doing that caused the first operation to respond very quickly.(And the second as well, of course.)
Thanks for the assistance.


[GENERAL] query takes 65 times longer if I add 1 column (explain attached)

2006-11-02 Thread Dan Weber
I've got a query that takes a very long time to execute if I add anadditional column to the SELECT clause. Here is Query A, the slow one.  I've numbered the rowsof the EXPLAIN for convenience.psql=# explain analyze SELECT timestamp, src_ip, dst_ip, id,
  string_one, string_two FROM table_one LEFT JOIN table_two USING(id)  WHERE TRUE AND src_ip = '192.168.10.128' ORDER BY timestamp DESC  LIMIT 10; QUERY PLAN
---a1. Limit (cost=18954.93..18954.95 rows=10 width=136)    (actual time=169038.823..169038.851 rows=10 loops=1)a2.   ->  Sort  (cost=
18954.93..18959.75 rows=1929 width=136)      (actual time=169038.816..169038.825 rows=10 loops=1)a3. Sort Key: table_one."timestamp"a4. ->  Nested Loop Left Join        (cost=
0.00..18849.66 rows=1929 width=136)         (actual time=862.661..169036.636 rows=179 loops=1)a5.   Join Filter: ("outer".id = "inner".id)a6.   ->  Index Scan using table_one_type_key on
          table_one  (cost=0.00..17.70 rows=3 width=72)          (actual time=0.083..864.386 rows=179 loops=1)a7. Index Cond: (src_ip = '192.168.10.128
'::inet)a8.   ->  Seq Scan on table_two  (cost=0.00..4669.92          rows=128592 width=68)          (actual time=842.166..933.496 rows=4202 loops=179) Total runtime: 169039.368
 ms(9 rows)Time: 169041.495 msIt takes almost 3 minutes to run.  If I knock off either string_one orstring_two, the query runs in <3 seconds.  Here is Query B:psql=# explain analyze SELECT timestamp, src_ip, dst_ip, id,
  string_one FROM table_one LEFT JOIN table_two USING(id) WHERE TRUE  AND src_ip = '192.168.10.128' ORDER BY timestamp DESC LIMIT 10; QUERY PLAN---
b1. Limit  (cost=17617.45..17617.47 rows=10 width=104    (actual time=2644.975..2645.003 rows=10 loops=1)b2.   ->  Sort  (cost=17617.45..17622.27 rows=1929 width=104)      (actual time=2644.968..2644.978
 rows=10 loops=1)b3. Sort Key: table_one."timestamp"b4. ->  Nested Loop Left Join  (cost=5810.75..17512.19         rows=1929 width=104)        (actual time=956.129..2643.458
 rows=179 loops=1)b5.   Join Filter: ("outer".id = "inner".id)b6.   ->  Bitmap Heap Scan on table_one          (cost=7.24..13.72 rows=3 width=72) 
          (actual time=39.745..67.818 rows=179 loops=1)b7. Recheck Cond: (src_ip = '192.168.10.128'::inet)b8. ->  Bitmap Index Scan on table_one_type_key  
                (cost=0.00..7.24 rows=3 width=0)                 (actual time=29.399..29.399 rows=179 loops=1)b9.   Index Cond: (src_ip = '192.168.10.128
'::inet)b10.  ->  Materialize  (cost=5803.51..8094.43           rows=128592 width=36)          (actual time=4.573..9.462 rows=4202 loops=179)b11.    ->  Seq Scan on table_two  
                (cost=0.00..4669.92 rows=128592 width=36)                 (actual time=818.216..895.127             rows=4202 loops=1) Total runtime: 2646.899 ms(12 rows)If we compare each EXPLAIN, they're about the same down through line
5, the Join Filter.  But Query A joins things from an "Index Scan" anda "Seq Scan," while Query B joins things from a "Bitmap Index Scan"and a "Materialize."1. Why is the additional column causing such a big change in the
   searching technique, given that the column is not involved in    the WHERE at all?2. Besides indices, how can I coerce Postgres to act like Query   B instead of like Query A?