Tom,

I've found the problem with TPC-R query #19.  And it, unfortunately, appears 
to be a problem in the PostgreSQL query planner.

To sum up the below:  it appears that whenever a set of WHERE conditions 
exceeds a certain level of complexity, the planner just ignores all 
applicable indexes and goes for a seq scan.   While this may be unavoidable 
to some degree, it seems to me that we need to raise the threshold of 
complexity at which it does this.

tpcr=# select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3 20030226 
(prerelease) (SuSE Linux)
(1 row)

I've tested a number of indexes on the query, and found the two most efficient 
on subsets of the query.  Thus:

explain analyze
select
        sum(l_extendedprice* (1 - l_discount)) as revenue
from
        lineitem,
        part
where
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#33'
                and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
                and l_quantity >= 8 and l_quantity <= 8 + 10
                and p_size between 1 and 5
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        );
                                                                                       
           
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10380.70..10380.70 rows=1 width=30) (actual 
time=161.61..161.61 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..10380.67 rows=13 width=30) (actual 
time=81.54..161.47 rows=17 loops=1)
         ->  Index Scan using idx_part_1 on part  (cost=0.00..9466.33 rows=62 
width=4) (actual time=81.21..137.24 rows=98 loops=1)
               Index Cond: (p_brand = 'Brand#33'::bpchar)
               Filter: (((p_container = 'SM CASE'::bpchar) OR (p_container = 
'SM BOX'::bpchar) OR (p_container = 'SM PACK'::bpchar) OR (p_container = 'SM 
PKG'::bpchar)) AND (p_size >= 1) AND (p_size <= 5))
         ->  Index Scan using idx_lineitem_3 on lineitem  (cost=0.00..14.84 
rows=1 width=26) (actual time=0.22..0.24 rows=0 loops=98)
               Index Cond: (("outer".p_partkey = lineitem.l_partkey) AND 
(lineitem.l_quantity >= 8::numeric) AND (lineitem.l_quantity <= 18::numeric))
               Filter: (((l_shipmode = 'AIR'::bpchar) OR (l_shipmode = 'AIR 
REG'::bpchar)) AND (l_shipinstruct = 'DELIVER IN PERSON'::bpchar))
 Total runtime: 161.71 msec



This also works for a similar query:

explain analyze
select
        sum(l_extendedprice* (1 - l_discount)) as revenue
from
        lineitem,
        part
where
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#52'
                and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
                and l_quantity >= 14 and l_quantity <= 14 + 10
                and p_size between 1 and 10
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        );

 Aggregate  (cost=11449.36..11449.36 rows=1 width=30) (actual 
time=195.72..195.72 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..11449.29 rows=28 width=30) (actual 
time=56.42..195.39 rows=48 loops=1)
         ->  Index Scan using idx_part_1 on part  (cost=0.00..9466.33 rows=139 
width=4) (actual time=56.15..153.17 rows=166 loops=1)
               Index Cond: (p_brand = 'Brand#52'::bpchar)
               Filter: (((p_container = 'MED BAG'::bpchar) OR (p_container = 
'MED BOX'::bpchar) OR (p_container = 'MED PKG'::bpchar) OR (p_container = 
'MED PACK'::bpchar)) AND (p_size >= 1) AND (p_size <= 10))
         ->  Index Scan using idx_lineitem_3 on lineitem  (cost=0.00..14.29 
rows=1 width=26) (actual time=0.23..0.25 rows=0 loops=166)
               Index Cond: (("outer".p_partkey = lineitem.l_partkey) AND 
(lineitem.l_quantity >= 14::numeric) AND (lineitem.l_quantity <= 
24::numeric))
               Filter: (((l_shipmode = 'AIR'::bpchar) OR (l_shipmode = 'AIR 
REG'::bpchar)) AND (l_shipinstruct = 'DELIVER IN PERSON'::bpchar))
 Total runtime: 195.82 msec
(9 rows)


If, however, I combine the two where clauses with an OR, the planner gets 
confused and insists on loading the entire tables into memory (even though I 
don't have that much memory):

explain
select
        sum(l_extendedprice* (1 - l_discount)) as revenue
from
        lineitem,
        part
where
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#33'
                and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')
                and l_quantity >= 8 and l_quantity <= 8 + 10
                and p_size between 1 and 5
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        )
        or
        (
                p_partkey = l_partkey
                and p_brand = 'Brand#52'
                and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
                and l_quantity >= 14 and l_quantity <= 14 + 10
                and p_size between 1 and 10
                and l_shipmode in ('AIR', 'AIR REG')
                and l_shipinstruct = 'DELIVER IN PERSON'
        );

 Aggregate  (cost=488301096525.25..488301096525.25 rows=1 width=146)
   ->  Nested Loop  (cost=0.00..488301096525.15 rows=42 width=146)
         Join Filter: ((("outer".l_shipmode = 'AIR'::bpchar) AND 
("inner".p_container = 'SM CASE'::bpchar) AND ("inner".p_partkey = 
"outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND 
("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric) 
AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND 
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR 
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM 
CASE'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND 
("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric) 
AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND 
("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN 
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND 
("inner".p_container = 'SM BOX'::bpchar) AND ("inner".p_partkey = 
"outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND 
("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric) 
AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND 
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR 
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM 
BOX'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND 
("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric) 
AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND 
("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN 
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND 
("inner".p_container = 'SM PACK'::bpchar) AND ("inner".p_partkey = 
"outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND 
("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric) 
AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND 
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR 
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM 
PACK'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND 
("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric) 
AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND 
("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN 
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND 
("inner".p_container = 'SM PKG'::bpchar) AND ("inner".p_partkey = 
"outer".l_partkey) AND ("inner".p_brand = 'Brand#33'::bpchar) AND 
("outer".l_quantity >= 8::numeric) AND ("outer".l_quantity <= 18::numeric) 
AND ("inner".p_size >= 1) AND ("inner".p_size <= 5) AND 
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR 
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'SM 
PKG'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND 
("inner".p_brand = 'Brand#33'::bpchar) AND ("outer".l_quantity >= 8::numeric) 
AND ("outer".l_quantity <= 18::numeric) AND ("inner".p_size >= 1) AND 
("inner".p_size <= 5) AND ("outer".l_shipinstruct = 'DELIVER IN 
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND 
("inner".p_container = 'MED BAG'::bpchar) AND ("inner".p_partkey = 
"outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND 
("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric) 
AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND 
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR 
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'MED 
BAG'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND 
("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >= 
14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >= 
1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN 
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND 
("inner".p_container = 'MED BOX'::bpchar) AND ("inner".p_partkey = 
"outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND 
("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric) 
AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND 
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR 
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'MED 
BOX'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND 
("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >= 
14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >= 
1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN 
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND 
("inner".p_container = 'MED PKG'::bpchar) AND ("inner".p_partkey = 
"outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND 
("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric) 
AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND 
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR 
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'MED 
PKG'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND 
("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >= 
14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >= 
1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN 
PERSON'::bpchar)) OR (("outer".l_shipmode = 'AIR'::bpchar) AND 
("inner".p_container = 'MED PACK'::bpchar) AND ("inner".p_partkey = 
"outer".l_partkey) AND ("inner".p_brand = 'Brand#52'::bpchar) AND 
("outer".l_quantity >= 14::numeric) AND ("outer".l_quantity <= 24::numeric) 
AND ("inner".p_size >= 1) AND ("inner".p_size <= 10) AND 
("outer".l_shipinstruct = 'DELIVER IN PERSON'::bpchar)) OR 
(("outer".l_shipmode = 'AIR REG'::bpchar) AND ("inner".p_container = 'MED 
PACK'::bpchar) AND ("inner".p_partkey = "outer".l_partkey) AND 
("inner".p_brand = 'Brand#52'::bpchar) AND ("outer".l_quantity >= 
14::numeric) AND ("outer".l_quantity <= 24::numeric) AND ("inner".p_size >= 
1) AND ("inner".p_size <= 10) AND ("outer".l_shipinstruct = 'DELIVER IN 
PERSON'::bpchar)))
         ->  Seq Scan on lineitem  (cost=0.00..235620.15 rows=6001215 
width=95)
         ->  Seq Scan on part  (cost=0.00..7367.00 rows=200000 width=51)


You'll pardon me for not doing an "ANALYZE", but I didn't want to wait 
overnight.   Manually disabling Seqscan and Nestloop did nothing to affect 
this query plan; neither did removing the aggregate.

Tommorrow I will test 7.4 Beta 4.

How can we fix this?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Reply via email to