You're right, a lot of information is missing but I'm unsure that the other 
information will make too much difference.
I could drop all the other indexes on the table which aren't used here and the 
queries would still use the indexes they are currently using.

I appreciate the idea that a boolean column selectivity might not be great. 
I've just tried creating indexes as follows:
CREATE INDEX messageq17
  ON messageq_table
  USING btree
  (staff_ty, staff_id, incoming, inactive, entity_id);

CREATE INDEX messageq18
  ON messageq_table
  USING btree
  (staff_ty, staff_id);

When running query 2 as it stands the same thing happens, it still uses the 
messageq1 index.

The query is logically the same as using max, you are correct, but it's 
generated on the fly so the limit or the queried column may change.

The query plans were for the second query as I'm unsure that the first query is 
really relevant, it was simply there to justify the messageq1 index.

Thanks,

From: Robert Klemme [mailto:shortcut...@googlemail.com]
Sent: 03 August 2012 10:18
To: Russell Keane; pgsql-performance
Subject: Re: [PERFORM] query using incorrect index


On Thu, Aug 2, 2012 at 4:54 PM, Russell Keane 
<russell.ke...@inps.co.uk<mailto:russell.ke...@inps.co.uk>> wrote:

Using PG 9.0 and given 2 queries (messageq_current is a view on the 
messageq_table):

select entity_id from messageq_current
where entity_id = 123456;

select entity_id from messageq_current
where incoming = true
and inactive = false
and staff_ty = 2
and staff_id = 2
order by entity_id desc
limit 1;

and 2 indexes (there are 15 indexes in total but they are left out here for 
brevity):

messageq1:
CREATE INDEX messageq1
  ON messageq_table
  USING btree
  (entity_id);

And messageq4:

CREATE INDEX messageq4
  ON messageq_table
  USING btree
  (inactive, staff_ty, staff_id, incoming, tran_dt);

Of course a lot of detail is missing (full schema of table, all the other 
indexes) but with "inactive" a boolean column I suspect selectivity might not 
be too good here and so having it as a first column in a covering index is at 
least questionable.  If query 2 is frequent you might also want to consider 
creating a partial index only on (staff_ty, staff_id) with filtering criteria 
on incoming and active as present in query 2.

Btw, why don't you formulate query 2 as max query?
select max(entity_id) as entity_id
from messageq_current
where incoming = true
and inactive = false
and staff_ty = 2
and staff_id = 2;


With the messageq1 index present, query 1 is very quick (0.094ms) and query 2 
is very slow (241.515ms).
If I remove messageq1 then query 2 uses messageq4 and is very quick (0.098ms) 
but then query 1 must use a different index and is therefore slower (> 5ms).

So, to the Query plans:

Of which query?  Shouldn't there be four plans in total?  I'd post plans here:
http://explain.depesz.com/

With messageq1:
"Limit  (cost=0.00..2670.50 rows=1 width=4) (actual time=241.481..241.481 
rows=0 loops=1)"
"  Output: messageq_table.entity_id"
"  Buffers: shared hit=32 read=18870 written=12"
"  ->  Index Scan Backward using messageq1 on prac_live_10112.messageq_table  
(cost=0.00..66762.53 rows=25 width=4) (actual time=241.479..241.479 rows=0 
loops=1)"
"        Output: messageq_table.entity_id"
"        Filter: (messageq_table.incoming AND (NOT messageq_table.inactive) AND 
(messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2) AND 
(aud_status_to_flag(messageq_table.aud_status) = 1))"
"        Buffers: shared hit=32 read=18870 written=12"
"Total runtime: 241.515 ms"

Without messageq1:
"Limit  (cost=12534.45..12534.45 rows=1 width=4) (actual time=0.055..0.055 
rows=0 loops=1)"
"  Output: messageq_table.entity_id"
"  Buffers: shared read=3"
"  ->  Sort  (cost=12534.45..12534.51 rows=25 width=4) (actual 
time=0.054..0.054 rows=0 loops=1)"
"        Output: messageq_table.entity_id"
"        Sort Key: messageq_table.entity_id"
"        Sort Method:  quicksort  Memory: 17kB"
"        ->  Bitmap Heap Scan on prac_live_10112.messageq_table  
(cost=174.09..12534.32 rows=25 width=4) (actual time=0.043..0.043 rows=0 
loops=1)"
"              Output: messageq_table.entity_id"
"              Recheck Cond: ((messageq_table.staff_ty = 2) AND 
(messageq_table.staff_id = 2))"
"              Filter: (messageq_table.incoming AND (NOT 
messageq_table.inactive) AND (aud_status_to_flag(messageq_table.aud_status) = 
1))"
"              Buffers: shared read=3"
"              ->  Bitmap Index Scan on messageq4  (cost=0.00..174.08 rows=4920 
width=0) (actual time=0.040..0.040 rows=0 loops=1)"
"                    Index Cond: ((messageq_table.inactive = false) AND 
(messageq_table.staff_ty = 2) AND (messageq_table.staff_id = 2) AND 
(messageq_table.incoming = true))"
"                    Buffers: shared read=3"
"Total runtime: 0.098 ms"

Clearly the statistics are off somehow but I really don't know where to start.

Any help you can give me would be very much appreciated.

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Reply via email to