Re: Why could GEQO produce plans with lower costs than the standard_join_search?

Donald Dong Wed, 22 May 2019 12:54:16 -0700

On May 22, 2019, at 11:42 AM, Tom Lane <[email protected]> wrote:
> 
> Donald Dong <[email protected]> writes:
>> I find the cost from cheapest_total_path->total_cost is different
>> from the cost from  queryDesc->planstate->total_cost. What I saw was
>> that GEQO tends to form paths with lower
>> cheapest_total_path->total_cost (aka the fitness of the children).
>> However, standard_join_search is more likely to produce a lower
>> queryDesc->planstate->total_cost, which is the cost we get using
>> explain.
> 
>> I wonder why those two total costs are different? If the total_cost
>> from the planstate is more accurate, could we use that instead as the
>> fitness in geqo_eval?
> 
> You're still asking us to answer hypothetical questions unsupported
> by evidence.  In what case does that really happen?


Hi,

My apologies if this is not the minimal necessary set up. But here's
more information about what I saw using the following query
(JOB/1a.sql):

SELECT MIN(mc.note) AS production_note,
       MIN(t.title) AS movie_title,
       MIN(t.production_year) AS movie_year
FROM company_type AS ct,
     info_type AS it,
     movie_companies AS mc,
     movie_info_idx AS mi_idx,
     title AS t
WHERE ct.kind = 'production companies'
  AND it.info = 'top 250 rank'
  AND mc.note NOT LIKE '%(as Metro-Goldwyn-Mayer Pictures)%'
  AND (mc.note LIKE '%(co-production)%'
       OR mc.note LIKE '%(presents)%')
  AND ct.id = mc.company_type_id
  AND t.id = mc.movie_id
  AND t.id = mi_idx.movie_id
  AND mc.movie_id = mi_idx.movie_id
  AND it.id = mi_idx.info_type_id;

I attached the query plan and debug_print_rel output for GEQO and
standard_join_search.

                        planstate->total_cost   cheapest_total_path
GEQO            54190.13                                54239.03
STD                     54179.02                                54273.73

Here I observe GEQO  produces a lower
cheapest_total_path->total_cost, but its planstate->total_cost is higher
than what standard_join_search produces.

Regards,
Donald Dong

 Finalize Aggregate  (cost=54190.12..54190.13 rows=1 width=68)
   ->  Gather  (cost=54189.90..54190.11 rows=2 width=68)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=53189.90..53189.91 rows=1 width=68)
               ->  Nested Loop  (cost=15318.71..53189.55 rows=46 width=45)
                     Join Filter: (mc.movie_id = t.id)
                     ->  Hash Join  (cost=15318.28..53162.39 rows=46 width=32)
                           Hash Cond: (mc.company_type_id = ct.id)
                           ->  Parallel Hash Join  (cost=15317.21..53160.33 
rows=185 width=36)
                                 Hash Cond: (mc.movie_id = mi_idx.movie_id)
                                 ->  Parallel Seq Scan on movie_companies mc  
(cost=0.00..37814.90 rows=7320 width=32)
                                       Filter: (((note)::text !~~ '%(as 
Metro-Goldwyn-Mayer Pictures)%'::text) AND (((note)::text ~~ 
'%(co-production)%'::text) OR ((note)::text ~~ ' %(presents)%'::text)))
                                 ->  Parallel Hash  (cost=15253.60..15253.60 
rows=5089 width=4)
                                       ->  Hash Join  (cost=2.43..15253.60 
rows=5089 width=4)
                                             Hash Cond: (mi_idx.info_type_id = 
it.id)
                                             ->  Parallel Seq Scan on 
movie_info_idx mi_idx  (cost=0.00..13685.15 rows=575015 width=8)
                                             ->  Hash  (cost=2.41..2.41 rows=1 
width=4)
                                                   ->  Seq Scan on info_type it 
 (cost=0.00..2.41 rows=1 width=4)
                                                         Filter: ((info)::text 
= 'top 250 rank'::text)
                           ->  Hash  (cost=1.05..1.05 rows=1 width=4)
                                 ->  Seq Scan on company_type ct  
(cost=0.00..1.05 rows=1 width=4)
                                       Filter: ((kind)::text = 'production 
companies'::text)
                     ->  Index Scan using title_pkey on title t  
(cost=0.43..0.58 rows=1 width=25)
                           Index Cond: (id = mi_idx.movie_id)

 Finalize Aggregate  (cost=54179.01..54179.02 rows=1 width=68)
   ->  Gather  (cost=54178.79..54179.00 rows=2 width=68)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=53178.79..53178.80 rows=1 width=68)
               ->  Nested Loop  (cost=37881.27..53178.44 rows=46 width=45)
                     Join Filter: (mc.movie_id = t.id)
                     ->  Parallel Hash Join  (cost=37880.84..53151.28 rows=46 
width=32)
                           Hash Cond: (mi_idx.movie_id = mc.movie_id)
                           ->  Hash Join  (cost=2.43..15253.60 rows=5089 
width=4)
                                 Hash Cond: (mi_idx.info_type_id = it.id)
                                 ->  Parallel Seq Scan on movie_info_idx mi_idx 
 (cost=0.00..13685.15 rows=575015 width=8)
                                 ->  Hash  (cost=2.41..2.41 rows=1 width=4)
                                       ->  Seq Scan on info_type it  
(cost=0.00..2.41 rows=1 width=4)
                                             Filter: ((info)::text = 'top 250 
rank'::text)
                           ->  Parallel Hash  (cost=37855.54..37855.54 
rows=1830 width=28)
                                 ->  Hash Join  (cost=1.06..37855.54 rows=1830 
width=28)
                                       Hash Cond: (mc.company_type_id = ct.id)
                                       ->  Parallel Seq Scan on movie_companies 
mc  (cost=0.00..37814.90 rows=7320 width=32)
                                             Filter: (((note)::text !~~ '%(as 
Metro-Goldwyn-Mayer Pictures)%'::text) AND (((note)::text ~~ 
'%(co-production)%'::text) OR ((note)::tex
t ~~ '%(presents)%'::text)))
                                       ->  Hash  (cost=1.05..1.05 rows=1 
width=4)
                                             ->  Seq Scan on company_type ct  
(cost=0.00..1.05 rows=1 width=4)
                                                   Filter: ((kind)::text = 
'production companies'::text)
                     ->  Index Scan using title_pkey on title t  
(cost=0.43..0.58 rows=1 width=25)
                           Index Cond: (id = mi_idx.movie_id)

cheapest total path:
NestLoop(ct it mc mi_idx t) rows=111 cost=16318.71..54273.73
  clauses: mc.movie_id = t.id
        HashJoin(ct it mc mi_idx) rows=111 cost=16318.28..54208.19
          clauses: mc.company_type_id = ct.id
                Gather(it mc mi_idx) rows=444 cost=16317.21..54204.73
                        HashJoin(it mc mi_idx) rows=185 cost=15317.21..53160.33
                          clauses: mi_idx.movie_id = mc.movie_id
                                SeqScan(mc) rows=7320 cost=0.00..37814.90
                                HashJoin(it mi_idx) rows=5089 
cost=2.43..15253.60
                                  clauses: it.id = mi_idx.info_type_id
                                        SeqScan(mi_idx) rows=575015 
cost=0.00..13685.15
                                        SeqScan(it) rows=1 cost=0.00..2.41
                SeqScan(ct) rows=1 cost=0.00..1.05
        IdxScan(t) required_outer (mi_idx) rows=1 cost=0.43..0.58
          pathkeys: ((t.id, mc.movie_id, mi_idx.movie_id))

cheapest total path:
NestLoop(ct it mc mi_idx t) rows=111 cost=16318.71..54239.03
  clauses: mc.movie_id = t.id
        Gather(ct it mc mi_idx) rows=111 cost=16318.28..54173.49
                HashJoin(ct it mc mi_idx) rows=46 cost=15318.28..53162.39
                  clauses: mc.company_type_id = ct.id
                        HashJoin(it mc mi_idx) rows=185 cost=15317.21..53160.33
                          clauses: mc.movie_id = mi_idx.movie_id
                                SeqScan(mc) rows=7320 cost=0.00..37814.90
                                HashJoin(it mi_idx) rows=5089 
cost=2.43..15253.60
                                  clauses: it.id = mi_idx.info_type_id
                                        SeqScan(mi_idx) rows=575015 
cost=0.00..13685.15
                                        SeqScan(it) rows=1 cost=0.00..2.41
                        SeqScan(ct) rows=1 cost=0.00..1.05
        IdxScan(t) required_outer (mi_idx) rows=1 cost=0.43..0.58
          pathkeys: ((t.id, mc.movie_id, mi_idx.movie_id))

Re: Why could GEQO produce plans with lower costs than the standard_join_search?

Reply via email to