> -----Original Message-----
> From: Tom Lane [mailto:[email protected]]
> Sent: Friday, July 20, 2012 4:47 PM
> To: David Johnston
> Cc: [email protected]
> Subject: Re: [GENERAL] A Better Way? (Multi-Left Join Lookup)
>
> "David Johnston" <[email protected]> writes:
> > WITH
> > full_set AS ( ) -- 8,500 records
> > , sub_1 AS () -- also about 8,500
> > , sub_2 AS () -- maybe 5,000
> > , sub_3 AS () - - maybe 3,000
> > SELECT full_set.*
> > , COALESCE(sub_1.field, FALSE)
> > , COALESCE(sub_2.field, FALSE)
> > , COALESCE(sub_2.field, FALSE)
> > FROM full_set
> > LEFT JOIN sub_1
> > LEFT JOIN sub_2
> > LEFT JOIN sub_3
>
> > The performance of this query is exponential due to the fact that the
> > sub-queries/CTEs are not indexed and so each subset has to be scanned
> > completely for each record in the full set.
>
> Surely not. Neither merge nor hash joins require an index. What plan is
> getting selected? Are you sure there's at most one match in each "sub"
set
> for each row in the "full" set? If you were getting a large number of
matches
> in some cases, the size of the result could balloon to something
unfortunate
> ... but we have not got enough information to know.
>
> regards, tom lane
The final result, in this case would have 8,500 records AND
sub_1.field would be TRUE for basically all of them and FALSE for the
minimal remainder
sub_2.field would be TRUE for 5,000 of them and FALSE for 3,500 of them
sub_3.field would be TRUE for 3,000 of them and FALSE for 5,500 of them
There is never, in reality, two records in a sub-table for a single record
in the master table. It is possible a record exists in a sub-table but not
in the main table but I do not care about those (thus the LEFT instead of a
FULL OUTER JOIN).
I have attached a scrubbed query and explain/analyze. Let me know if
something more is needed.
I have included two versions of the query, one using CTE and the other using
mostly sub-selects.
I had run ANALYZE on the pertinent tables but the CTE queries all perform
quite quickly when run by themselves.
In looking at the source tables for the data I did notice that I have not
properly defined the relevant INDEXes as being UNIQUE. This applies to two
of the sub-tables. The third sub-table requires the use of "DISTINCT". The
joining columns with each set of data are unique when fed into the LEFT
JOIN. The master CTE/Query is generated via a function call and it also
generates unique keys for the LEFT JOIN.
Thank you for your help!
David J.
QUERY
WITH
scenario_info AS ()
, source_listing AS ()
, detail_listing AS ()
, sequence_listing AS ()
, master_listing AS (
SELECT
-- identifier fields
FROM (
SELECT (func).* FROM (
SELECT fuction_generating_8500_records(...)
) func
FROM scenario_info
) call
) master (function_column_rename)
)
--# end "master_listing" CTE
SELECT
scenariokey AS "Scenario Key"
, invoicenumber AS "Invoice Number"
, COALESCE(has_sequence,FALSE) AS "Has Sequence"
, COALESCE(has_detail,FALSE) AS "Has Detail"
, COALESCE(has_source,FALSE) AS "Has Source"
/* Omit a CASE WHEN + COALESCE That Provides Names To Each of the
Possible Combinations */
FROM master_listing
LEFT JOIN source_listing USING ()
LEFT JOIN detail_listing USING ()
LEFT JOIN sequence_listing USING ()
;
QUERY PLAN
Nested Loop Left Join (cost=57926.57..58810.79 rows=1 width=67) (actual
time=842.368..263801.626 rows=8656 loops=1)
CTE scenario_info {# Support CTE #}
-> Index Scan [...] (cost=0.25..8.52 rows=1 width=65) (actual
time=44.855..44.859 rows=1 loops=1)
CTE source_listing {# Sub 1 #}
-> Nested Loop (cost=120.26..12219.68 rows=3594 width=21) (actual
time=1.831..9.624 rows=2091 loops=1)
Buffers: shared hit=408
-> CTE Scan [...] (cost=0.00..0.02 rows=1 width=62) (actual
time=0.001..0.006 rows=1 loops=1)
-> Bitmap Heap Scan [...] (cost=120.26..12174.74 rows=3594
width=21) (actual time=1.813..6.711 rows=2091 loops=1)
-> Bitmap Index Scan [...] (cost=0.00..119.36 rows=3594
width=0) (actual time=1.604..1.604 rows=2091 loops=1)
CTE detail_listing {# Sub 2 #}
-> HashAggregate (cost=3046.18..3167.02 rows=12084 width=41) (actual
time=129.044..132.509 rows=6042 loops=1)
-> Seq Scan on [...] (cost=0.00..2872.53 rows=23153 width=41)
(actual time=0.004..60.430 rows=23153 loops=1)
CTE sequence_listing {# Sub 3 #}
-> Bitmap Heap Scan [...] (cost=623.58..19768.42 rows=9585 width=36)
(actual time=64.801..77.267 rows=8656 loops=1)
-> Bitmap Index Scan [...] (cost=0.00..621.18 rows=9585 width=0)
(actual time=64.635..64.635 rows=8656 loops=1)
CTE master_listing {# The LEFT side of the multi-joins #}
-> Subquery Scan on call (cost=22762.65..22762.94 rows=1 width=32)
(actual time=619.158..735.559 rows=8656 loops=1)
-> CTE Scan on scenario_info (cost=22762.65..22762.93 rows=1
width=196) (actual time=619.150..658.142 rows=8656 loops=1)
InitPlan 5 (returns $4)
-> HashAggregate (cost=22752.62..22762.65 rows=1003
width=7) (actual time=219.015..219.172 rows=796 loops=1)
-> Append (cost=19790.25..22750.11 rows=1003 width=7)
(actual time=98.569..217.774 rows=1510 loops=1)
-> HashAggregate (cost=19790.25..19790.47
rows=22 width=6) (actual time=98.567..99.428 rows=796 loops=1)
-> Bitmap Heap Scan on [...]
(cost=621.20..19790.01 rows=96 width=6) (actual time=49.550..96.681 rows=796
loops=1)
Recheck Cond: [...]
Filter: [...]
-> Bitmap Index Scan on [...]
(cost=0.00..621.18 rows=9585 width=0) (actual time=49.350..49.350 rows=8656
loops=1)
-> HashAggregate (cost=2939.80..2949.61
rows=981 width=7) (actual time=117.373..117.543 rows=714 loops=1)
-> Seq Scan on [...] (cost=0.00..2930.41
rows=3756 width=7) (actual time=0.582..115.888 rows=3429 loops=1)
{ # The CTE Queries Return Quite Quickly #}
{ # Now We Combine Them #}
-> Nested Loop Left Join (cost=0.00..461.28 rows=1 width=66) (actual
time=700.682..169850.684 rows=8656 loops=1)
Join Filter: [...]
-> Nested Loop Left Join (cost=0.00..125.81 rows=1 width=65) (actual
time=635.866..33832.263 rows=8656 loops=1)
Join Filter: [...]
-> CTE Scan on master_listing (cost=0.00..0.02 rows=1 width=96)
(actual time=619.163..798.343 rows=8656 loops=1)
-> CTE Scan on source_listing (cost=0.00..71.88 rows=3594
width=101) (actual time=0.001..1.335 rows=2091 loops=8656)
-> CTE Scan on sequence_listing (cost=0.00..191.70 rows=9585
width=143) (actual time=0.008..5.505 rows=8656 loops=8656)
-> CTE Scan on detail_listing (cost=0.00..241.68 rows=12084 width=143)
(actual time=0.019..3.790 rows=6042 loops=8656)
Total runtime: 263811.946 ms
# For this plan instead of using CTE the query inside the CTEs was moved inside
the LEFT JOIN
# I.E. LEFT JOIN ( <CTE QUERY> ) sub_n USING (...)
WITH
scenario_info AS ()
, master_listing AS (
SELECT
scenariokey
, invoicenumber
, COALESCE(has_sequence, FALSE) AS has_sequence
, COALESCE(has_detail, FALSE) AS has_detail
, COALESCE(has_source, FALSE) AS has_source
FROM (
SELECT (func).* FROM (
SELECT mage_create_sequencemaster(...)
) func
FROM scenario_info
) call
) master (...)
LEFT JOIN () seq USING (...)
LEFT JOIN () detail USING (...)
LEFT JOIN () src USING (...)
ORDER BY invoicesequence --# Didn't Bother Ordering the other query...
)
SELECT [...] FROM master_listing
;
QUERY PLAN
CTE Scan on master_listing (cost=26258.77..26258.79 rows=1 width=67) (actual
time=541681.104..541690.533 rows=8656 loops=1)
CTE scenario_info
-> Index Scan using magescenario_pkey on magescenario (cost=0.25..8.52
rows=1 width=65) (actual time=40.845..40.852 rows=1 loops=1)
CTE master_listing
-> Sort (cost=26250.24..26250.25 rows=1 width=35) (actual
time=541681.097..541682.659 rows=8656 loops=1)
Sort Method: quicksort Memory: 1111kB
-> Nested Loop Left Join (cost=25809.07..26250.23 rows=1 width=35)
(actual time=860.953..541541.541 rows=8656 loops=1)
-> Nested Loop Left Join (cost=25809.07..26241.73 rows=1
width=34) (actual time=860.769..540748.350 rows=8656 loops=1)
-> Nested Loop Left Join (cost=22762.90..22772.61
rows=1 width=33) (actual time=758.079..415662.947 rows=8656 loops=1)
-> CTE Scan on scenario_info
(cost=22762.65..22762.93 rows=1 width=196) (actual time=715.729..816.423
rows=8656 loops=1)
InitPlan 2 (returns $1)
-> HashAggregate (cost=22752.62..22762.65
rows=1003 width=7) (actual time=243.929..244.647 rows=796 loops=1)
-> Append (cost=19790.25..22750.11
rows=1003 width=7) (actual time=117.295..242.829 rows=1510 loops=1)
-> HashAggregate
(cost=19790.25..19790.47 rows=22 width=6) (actual time=117.294..117.476
rows=796 loops=1)
-> Bitmap Heap Scan on
[...] (cost=621.20..19790.01 rows=96 width=6) (actual time=83.059..116.925
rows=796 loops=1)
-> Bitmap Index
Scan [...] (cost=0.00..621.18 rows=9585 width=0) (actual time=82.989..82.989
rows=8656 loops=1)
-> HashAggregate
(cost=2939.80..2949.61 rows=981 width=7) (actual time=123.224..124.175 rows=714
loops=1)
-> Seq Scan on [...]
(cost=0.00..2930.41 rows=3756 width=7) (actual time=0.171..120.353 rows=3429
loops=1)
-> Index Scan [...] (cost=0.25..9.66 rows=1
width=37) (actual time=0.099..0.110 rows=1 loops=8656)
-> HashAggregate (cost=3046.18..3167.02 rows=12084
width=41) (actual time=0.013..6.093 rows=6042 loops=8656)
-> Seq Scan on [...] (cost=0.00..2872.53
rows=23153 width=41) (actual time=0.005..48.263 rows=23153 loops=1)
-> Index Scan using [...] (cost=0.00..8.49 rows=1 width=22)
(actual time=0.065..0.066 rows=0 loops=8656)
Total runtime: 541694.052 ms
--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general