[GENERAL] Evil Nested Loops

2009-06-03 Thread Ow Mun Heng
What can I do about this plan?



HashAggregate  (cost=8035443.21..8035445.17 rows=157 width=24)
  ->  Nested Loop  (cost=37680.95..7890528.72 rows=28982898 width=24)  <-- suspect
        Join Filter: ((a.test_run_start_date_time >= date.start_time) AND (a.test_run_start_date_time <= date.end_time))
        ->  Bitmap Heap Scan on d_trh_pbert a  (cost=37677.22..1369372.99 rows=1661440 width=24)
              Recheck Cond: ((test_run_start_date_time >= '2009-05-08 07:00:00'::timestamp without time zone) AND (test_run_start_date_time <= '2009-05-15 06:59:59'::timestamp without time zone))
              ->  Bitmap Index Scan on idx_d_trh_pbert_sdate  (cost=0.00..37261.86 rows=1661440 width=0)
                    Index Cond: ((test_run_start_date_time >= '2009-05-08 07:00:00'::timestamp without time zone) AND (test_run_start_date_time <= '2009-05-15 06:59:59'::timestamp without time zone))
        ->  Materialize  (cost=3.73..5.30 rows=157 width=24)
              ->  Seq Scan on lookup_ww_date2 date  (cost=0.00..3.57 rows=157 width=24)


The query
select week_id,count(serial_number) 
from d_trh a
inner join lookup_ww_date2 date
on ( a.test_run_start_date_time between start_time and end_time)
where a.test_run_start_date_time between '5/8/2009 7:00:00 AM' and '5/15/2009 
6:59:59 AM'
group by 
week_id

the lookup_ww_date2 table looks like this

( week_id bigint
  start_time timestamp
  end_time timestamp
)

eg: 200949|5/8/2009 7:00:00am|5/15/2009 6:59:59AM

The whole aim of this exercise is to determine which WW a particular date falls 
into.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Evil Nested Loops

2009-06-03 Thread Ow Mun Heng
On Wed, 2009-06-03 at 01:28 -0600, Scott Marlowe wrote:

 OK, looking at your query and the plan, what you're doing is kind of this:
 
 157 Rows times 1661440 Rows (cross product) = 260M or so and then you
 filter out the 157 original rows and their matches.  Note that an
 explain ANALYZE might shed more light, but given the high cost in this
 query for the nested loop I'm guessing the only thing you can do is
 throw more work_mem at it.  But it's fundamentally flawed in design I
 think.

The EXPLAIN ANALYZE ran for 10 minutes before I aborted it.

WW49 is basically between 5/8 and 5/15; unfortunately it's not just the base
dates, the time matters too.

eg: 200949|5/8/2009 7:00:00am|5/15/2009 6:59:59AM

The definition of a WW day is actually a span like
eg: 5/8 7:00:00am to 5/9 6:59:59am
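
For instance, shifting a timestamp back by seven hours before truncating would map both ends of such a day onto the same calendar date (a sketch, using the 07:00 boundary from the examples above):

```sql
-- Subtract the 7-hour offset, then truncate to the day: both boundary
-- timestamps of one "WW day" land on the same date.
SELECT date_trunc('day', timestamp '2009-05-08 07:00:00' - interval '7 hours');
-- 2009-05-08 00:00:00
SELECT date_trunc('day', timestamp '2009-05-09 06:59:59' - interval '7 hours');
-- 2009-05-08 00:00:00
```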


 If you're always working with dates maybe joining on
 date_trunc('day',test_run_start_date)=date_trunc('day',startdate)
 with an index on both terms will work?



Re: [GENERAL] Evil Nested Loops

2009-06-03 Thread Scott Marlowe
On Wed, Jun 3, 2009 at 2:18 AM, Ow Mun Heng ow.mun.h...@wdc.com wrote:

 The explain analyze runs 10 mins and then I just aborted it.

 WW49 is basically between 5/8 and 5/15, unfortunately, it's not actually
 just the base dates, it's also the time.

 eg: 200949|5/8/2009 7:00:00am|5/15/2009 6:59:59AM

 The definition of  WW or a day is actually between
 eg: 5/8 7am to 5/9 6:59:59am


 If you're always working with dates maybe joining on
 date_trunc('day',test_run_start_date)=date_trunc('day',startdate)
 with an index on both terms will work?

Well, you could either transform your 24-hour day to match date_trunc, or
write your own stable / immutable function to break it on those boundaries.
I think it could be done, and you could then get a useful index on it.
I think you'd have to use plain timestamps, not timestamps with time zones,
to make the function immutable.

Anyway, I think that would get rid of that huge nested loop.
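
That function-plus-index idea might look something like this (the function and index names, and the seven-hour offset, are illustrative, not from the thread):

```sql
-- Hypothetical immutable helper: fold the 07:00-boundary "WW day"
-- back onto a plain midnight-based day.
CREATE FUNCTION ww_day(ts timestamp) RETURNS timestamp AS $$
  SELECT date_trunc('day', $1 - interval '7 hours');
$$ LANGUAGE sql IMMUTABLE;

-- Expression indexes on both sides of the intended equality join.
CREATE INDEX idx_d_trh_ww_day   ON d_trh          (ww_day(test_run_start_date_time));
CREATE INDEX idx_lookup_ww_day  ON lookup_ww_date2 (ww_day(start_time));
```

An equality join on ww_day(...) is something the planner can hash or drive off an index, unlike the BETWEEN range join; with one lookup row per week rather than per day, the lookup side would still need truncating to the week boundary instead of the day.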



Re: [GENERAL] Evil Nested Loops

2009-06-03 Thread Scott Marlowe
On Wed, Jun 3, 2009 at 12:32 AM, Ow Mun Heng ow.mun.h...@wdc.com wrote:
 What can I do about this plan?




OK, looking at your query and the plan, what you're doing is kind of this:

157 rows times 1661440 rows (cross product) = 260M or so, and then you
filter out everything but the 157 original rows and their matches.  Note
that an EXPLAIN ANALYZE might shed more light, but given the high cost of
the nested loop in this query, I'm guessing the only thing you can do is
throw more work_mem at it.  But it's fundamentally flawed in design I
think.
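
For the work_mem knob, a per-session override before re-running the query would be enough to test it (the value below is only an example):

```sql
-- Session-local override; reverts when the connection closes.
SET work_mem = '256MB';
```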

If you're always working with dates maybe joining on
date_trunc('day',test_run_start_date)=date_trunc('day',startdate)
with an index on both terms will work?
