HI, all hackers:
In the GitHub repository for PostgreSQL’s pg_hint_plan extension, there is an
issue where the generated join order does not match the assigned join order.
After reviewing the source code, I found that this inconsistency with input
hints is due to PostgreSQL’s implementation and is not a bug in pg_hint_plan.
PostgreSQL with pg_hint_plan supports disabling certain operators (e.g., hash
join, seq scan) by setting pg parameters like “set enable_hashjoin = false”.
This setting causes PostgreSQL to add a high disable_cost (e.g., 1e10) to the
estimated cost of the hash join operator, effectively preventing the planner
from selecting hash joins due to the inflated cost. Additionally, pg_hint_plan
supports enforcing specific join orders. To do this, pg_hint_plan disables all
join algorithms when it encounters inconsistent join orders, by adding the
disable_cost to each join operator. As a result, only the assigned join order
will be selected. This is the mechanism behind pg_hint_plan.
Then, we take an example of the GitHub issue to demonstrate this problem:
Here is a query with pg_hint:
/*+
Leading((rt (it ((n (chn (mc (mi (t (ci an)))))) cn))))
HashJoin(ci an)
NestLoop(ci an t)
NestLoop(ci an t mi)
NestLoop(ci an t mi mc)
NestLoop(ci an t mi mc chn)
NestLoop(ci an t mi mc chn n)
NestLoop(ci an t mi mc chn n cn)
NestLoop(ci an t mi mc chn n cn it)
NestLoop(ci an t mi mc chn n cn it rt)
*/
EXPLAIN (FORMAT TEXT)
SELECT MIN(n.name) AS voicing_actress,
MIN(t.title) AS voiced_movie
FROM aka_name AS an,
char_name AS chn,
cast_info AS ci,
company_name AS cn,
info_type AS it,
movie_companies AS mc,
movie_info AS mi,
name AS n,
role_type AS rt,
title AS t
WHERE ci.note IN ('(voice)',
'(voice: Japanese version)',
'(voice) (uncredited)',
'(voice: English version)')
AND cn.country_code ='[us]'
AND it.info = 'release dates'
AND mc.note IS NOT NULL
AND (mc.note LIKE '%(USA)%'
OR mc.note LIKE '%(worldwide)%')
AND mi.info IS NOT NULL
AND (mi.info LIKE 'Japan:%200%'
OR mi.info LIKE 'USA:%200%')
AND n.gender ='f'
AND n.name LIKE '%Ang%'
AND rt.role ='actress'
AND t.production_year BETWEEN 2005 AND 2009
AND t.id = mi.movie_id
AND t.id = mc.movie_id
AND t.id = ci.movie_id
AND mc.movie_id = ci.movie_id
AND mc.movie_id = mi.movie_id
AND mi.movie_id = ci.movie_id
AND cn.id = mc.company_id
AND it.id = mi.info_type_id
AND n.id = ci.person_id
AND rt.id = ci.role_id
AND n.id = an.person_id
AND ci.person_id = an.person_id
AND chn.id = ci.person_role_id;
The hint specifies a join order (rt (it ((n (chn (mc (mi (t (ci an)))))) cn))),
but the generated join order is (rt (it ((n ((mc (mi ((ci an) t))) chn)) cn))).
Here, PostgreSQL generates sub-join order ((ci an) t) instead of the assigned
sub-join order (t (ci an)), and ((mc (mi ((ci an) t))) chn) instead of (chn (mc
(mi ((ci an) t)))). This discrepancy arises because PostgreSQL estimates
operator costs in two phases. In the first phase, it filters out paths that are
obviously suboptimal based on estimated costs; however, it does not factor in
disable_cost for disabled operators in this phase, only doing so in the second
phase. As a result, while (t (ci an)) would use a regular nested loop join
with a sequential scan on t, ((ci an) t) uses an index-based nested loop join
with an index scan on t, which is significantly faster. Consequently, (t (ci
an)) is filtered out after the first phase of cost estimation. The same
reasoning applies to (chn (mc (mi ((ci an) t)))).
In the following example, by forcing PostgreSQL to access relations t and chn
with a sequential scan, PostgreSQL generates the assigned join order. This is
because forcing a sequential scan for t and chn prevents PostgreSQL from
considering index-based nested loop joins for them.
/*+
SeqScan(t) SeqScan(chn)
Leading((rt (it ((n (chn (mc (mi (t (ci an)))))) cn))))
HashJoin(ci an)
NestLoop(ci an t)
NestLoop(ci an t mi)
NestLoop(ci an t mi mc)
NestLoop(ci an t mi mc chn)
NestLoop(ci an t mi mc chn n)
NestLoop(ci an t mi mc chn n cn)
NestLoop(ci an t mi mc chn n cn it)
NestLoop(ci an t mi mc chn n cn it rt)
*/
EXPLAIN (FORMAT TEXT)
SELECT MIN(n.name) AS voicing_actress,
MIN(t.title) AS voiced_movie
FROM aka_name AS an,
char_name AS chn,
cast_info AS ci,
company_name AS cn,
info_type AS it,
movie_companies AS mc,
movie_info AS mi,
name AS n,
role_type AS rt,
title AS t
WHERE ci.note IN ('(voice)',
'(voice: Japanese version)',
'(voice) (uncredited)',
'(voice: English version)')
AND cn.country_code ='[us]'
AND it.info = 'release dates'
AND mc.note IS NOT NULL
AND (mc.note LIKE '%(USA)%'
OR mc.note LIKE '%(worldwide)%')
AND mi.info IS NOT NULL
AND (mi.info LIKE 'Japan:%200%'
OR mi.info LIKE 'USA:%200%')
AND n.gender ='f'
AND n.name LIKE '%Ang%'
AND rt.role ='actress'
AND t.production_year BETWEEN 2005 AND 2009
AND t.id = mi.movie_id
AND t.id = mc.movie_id
AND t.id = ci.movie_id
AND mc.movie_id = ci.movie_id
AND mc.movie_id = mi.movie_id
AND mi.movie_id = ci.movie_id
AND cn.id = mc.company_id
AND it.id = mi.info_type_id
AND n.id = ci.person_id
AND rt.id = ci.role_id
AND n.id = an.person_id
AND ci.person_id = an.person_id
AND chn.id = ci.person_role_id;
As I explained, the issue arises because PG does not account for the
disable_cost of disabled operators in the initial phase of cost estimation. To
address this, I modified the process to include disable_cost for disabled
operators in the first phase. As a result, the initial query generated the
desired join order.
I want to submit a patch to PG to solve this issue. How do you like this
solution?
Kindly regards,
Qilong.