[GENERAL] Evil Nested Loops
What can I do about this plan?

HashAggregate  (cost=8035443.21..8035445.17 rows=157 width=24)
  ->  Nested Loop  (cost=37680.95..7890528.72 rows=28982898 width=24)   <-- suspect
        Join Filter: ((a.test_run_start_date_time >= date.start_time) AND (a.test_run_start_date_time <= date.end_time))
        ->  Bitmap Heap Scan on d_trh_pbert a  (cost=37677.22..1369372.99 rows=1661440 width=24)
              Recheck Cond: ((test_run_start_date_time >= '2009-05-08 07:00:00'::timestamp without time zone) AND (test_run_start_date_time <= '2009-05-15 06:59:59'::timestamp without time zone))
              ->  Bitmap Index Scan on idx_d_trh_pbert_sdate  (cost=0.00..37261.86 rows=1661440 width=0)
                    Index Cond: ((test_run_start_date_time >= '2009-05-08 07:00:00'::timestamp without time zone) AND (test_run_start_date_time <= '2009-05-15 06:59:59'::timestamp without time zone))
        ->  Materialize  (cost=3.73..5.30 rows=157 width=24)
              ->  Seq Scan on lookup_ww_date2 date  (cost=0.00..3.57 rows=157 width=24)

The query:

select week_id, count(serial_number)
  from d_trh a
 inner join lookup_ww_date2 date
    on (a.test_run_start_date_time between start_time and end_time)
 where a.test_run_start_date_time between '5/8/2009 7:00:00 AM'
                                      and '5/15/2009 6:59:59 AM'
 group by week_id

The lookup_ww_date2 table looks like this:

    week_id    bigint
    start_time timestamp
    end_time   timestamp

eg: 200949 | 5/8/2009 7:00:00 AM | 5/15/2009 6:59:59 AM

The whole aim of this exercise is to look at which WW a particular date falls into.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Evil Nested Loops
On Wed, 2009-06-03 at 01:28 -0600, Scott Marlowe wrote:
> On Wed, Jun 3, 2009 at 12:32 AM, Ow Mun Heng <ow.mun.h...@wdc.com> wrote:
> > [EXPLAIN plan snipped; same as the original post]
>
> OK, looking at your query and the plan, what you're doing is kind of
> this: 157 rows times 1661440 rows (cross product) = 260M or so, and
> then you filter out the 157 original rows and their matches. Note that
> an EXPLAIN ANALYZE might shed more light, but given the high cost of
> the nested loop in this query, I'm guessing the only thing you can do
> is throw more work_mem at it. But I think the design is fundamentally
> flawed.

The EXPLAIN ANALYZE ran for 10 minutes before I just aborted it.

WW49 is basically between 5/8 and 5/15. Unfortunately, it's not just the base dates; the time matters too, eg:

    200949 | 5/8/2009 7:00:00 AM | 5/15/2009 6:59:59 AM

The definition of a WW day is actually, eg, 5/8 7:00:00 AM to 5/9 6:59:59 AM.

> If you're always working with dates, maybe joining on
> date_trunc('day', test_run_start_date) = date_trunc('day', startdate)
> with an index on both terms would work?
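One way to exploit that 7:00 AM boundary (a sketch, not from the thread; column and table names are taken from the original query, the rewrite itself is an assumption): subtracting 7 hours shifts each WW day onto an ordinary midnight boundary, so date_trunc can bucket rows without consulting the lookup table at all.

```sql
-- Sketch (assumption, untested against the original schema): shifting
-- every timestamp back by 7 hours turns the 7:00:00 AM .. 6:59:59 AM
-- "WW day" into a midnight-to-midnight day, which date_trunc can
-- bucket directly.
SELECT date_trunc('day', test_run_start_date_time - interval '7 hours')
           AS ww_day,
       count(serial_number)
FROM   d_trh
GROUP  BY 1;
```

Mapping the resulting ww_day back to the custom week_id numbering (eg 200949) would still need the lookup table or an equivalent rule, since that numbering isn't derivable from the date alone.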
Re: [GENERAL] Evil Nested Loops
On Wed, Jun 3, 2009 at 2:18 AM, Ow Mun Heng <ow.mun.h...@wdc.com> wrote:
> [quoted EXPLAIN plan and earlier discussion snipped]
>
> The EXPLAIN ANALYZE ran for 10 minutes before I just aborted it.
>
> WW49 is basically between 5/8 and 5/15. Unfortunately, it's not just
> the base dates; the time matters too, eg:
>
>     200949 | 5/8/2009 7:00:00 AM | 5/15/2009 6:59:59 AM
>
> The definition of a WW day is actually, eg, 5/8 7:00:00 AM to 5/9
> 6:59:59 AM.
Well, if you could transform your 24-hour day to match date_trunc, OR write your own stable/immutable function to break it on those times, I think it could be done. You could then get a useful index on it. I think you'd have to use timestamps rather than timestamps with time zones to make the function immutable. Anyway, I think that would get rid of that huge nested loop.
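Scott's immutable-function idea could look roughly like this (a sketch under assumptions: ww_day is a hypothetical name, and marking it IMMUTABLE is only legitimate because the column is timestamp without time zone, so no time-zone lookups are involved):

```sql
-- Hypothetical helper: normalize a timestamp to the 7:00 AM start of
-- its WW day. Safe to mark IMMUTABLE for plain timestamps.
CREATE OR REPLACE FUNCTION ww_day(ts timestamp)
RETURNS timestamp AS $$
    SELECT date_trunc('day', ts - interval '7 hours') + interval '7 hours';
$$ LANGUAGE sql IMMUTABLE;

-- An expression index on the function makes it usable in index scans
-- and equality joins.
CREATE INDEX idx_d_trh_ww_day ON d_trh (ww_day(test_run_start_date_time));
```

With one lookup row per WW day, the range join could then become an equality join on ww_day(a.test_run_start_date_time), which the planner can satisfy with a hash or merge join instead of the nested loop.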
Re: [GENERAL] Evil Nested Loops
On Wed, Jun 3, 2009 at 12:32 AM, Ow Mun Heng <ow.mun.h...@wdc.com> wrote:
> What can I do about this plan?
>
> [EXPLAIN plan, query, and lookup table definition snipped; see the
> original post]
>
> The whole aim of this exercise is to look at which WW a particular
> date falls into.

OK, looking at your query and the plan, what you're doing is kind of this: 157 rows times 1661440 rows (cross product) = 260M or so, and then you filter out the 157 original rows and their matches. Note that an EXPLAIN ANALYZE might shed more light, but given the high cost of the nested loop in this query, I'm guessing the only thing you can do is throw more work_mem at it. But I think the design is fundamentally flawed.
If you're always working with dates, maybe joining on date_trunc('day', test_run_start_date) = date_trunc('day', startdate) with an index on both terms would work?
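As a sketch of that suggestion (column names assumed from the thread; note that as written the equality only matches rows falling on each lookup row's start day, so the lookup table would need one row per day rather than one per week):

```sql
-- Sketch (assumption): matching expression indexes on both sides of a
-- date_trunc equality let the planner pick a hash or merge join
-- instead of the nested loop over a range condition.
CREATE INDEX idx_d_trh_day
    ON d_trh (date_trunc('day', test_run_start_date_time));
CREATE INDEX idx_lookup_day
    ON lookup_ww_date2 (date_trunc('day', start_time));

SELECT d.week_id, count(a.serial_number)
FROM   d_trh a
JOIN   lookup_ww_date2 d
  ON   date_trunc('day', a.test_run_start_date_time)
     = date_trunc('day', d.start_time)
GROUP  BY d.week_id;
```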