[jira] [Created] (HIVE-29613) Cross-product join falls back to single-reducer shuffle merge when small-side row estimate marginally exceeds hive.xprod.mapjoin.small.table.rows

Araika Singh (Jira) Thu, 14 May 2026 04:47:06 -0700

Araika Singh created HIVE-29613:
-----------------------------------

             Summary: Cross-product join falls back to single-reducer shuffle 
merge when small-side row estimate marginally exceeds 
hive.xprod.mapjoin.small.table.rows
                 Key: HIVE-29613
                 URL: https://issues.apache.org/jira/browse/HIVE-29613
             Project: Hive
          Issue Type: Bug
            Reporter: Araika Singh
            Assignee: Araika Singh



*Background:*

{color:#4c9aff}ConvertJoinMapJoin{color} decides whether to convert each join 
into a broadcast map join or leave it as a shuffle merge join. For joins the 
optimizer classifies as cross products ({color:#4c9aff}isCrossProduct(joinOp) 
== true{color}), an additional row-count gate is applied: the build side's 
estimated row count must not exceed 
{color:#4c9aff}hive.xprod.mapjoin.small.table.rows{color} (default 1). When the 
estimate exceeds this threshold, the cross-product map-join conversion is 
rejected outright and the join falls back through 
{color:#4c9aff}fallbackToReduceSideJoin / fallbackToMergeJoin{color} into a 
shuffle merge join.

In practice, the build side of a tiny lookup-style table can end up estimated 
at 2 or 3 rows after CBO pushes down predicates and applies NDV-based 
selectivity, even when its actual byte footprint is a few hundred bytes — 
easily within the existing broadcast byte budget 
{color:#4c9aff}hive.auto.convert.join.noconditionaltask.size{color}. NDV-driven 
filter estimates routinely overshoot the row threshold by a small margin on 
tiny tables. When this happens the row-only gate rejects a broadcast that would 
have been entirely safe by the same byte budget that map-join conversion 
already uses elsewhere.

The resulting plan is a {color:#4c9aff}MERGEJOIN{color} with 
{color:#4c9aff}XPROD_EDGE{color} inputs. Because a keyless cross-product offers 
nothing real to shuffle on, the shuffle collapses to a single reduce key, and 
the entire output of the big side is materialised on a single reducer task. The 
downstream symptom on real-world data is a job pinned indefinitely at ~98–99% 
with one reducer running while the rest finish. We have observed this on a join 
of a ~29 Million-row, ~9 GB big-side table against a build side filtered down 
to 2 estimated rows / a few hundred bytes {color:#4c9aff}onlineDataSize{color} 
— the small side fits the byte budget by ~7 orders of magnitude, yet the 
row-only gate rejects the broadcast because 2 > 1:
{code:java}
ConvertJoinMapJoin#getMapJoinConversion

if (parentStats.getNumRows() >
       HiveConf.getIntVar(context.conf, 
HiveConf.ConfVars.XPRODSMALLTABLEROWSTHRESHOLD)) {
     // if any of smaller side is estimated to generate more than
     // threshold rows we would disable mapjoin
     return null;
   } {code}
 

At compile time the optimizer emits, three times in a row, the two log lines:
{code:java}
INFO  optimizer.ConvertJoinMapJoin - Could not get a valid join position. 
Defaulting to position 0
INFO  optimizer.ConvertJoinMapJoin - Fallback to common merge join operator
....
INFO  SessionState - Warning: Shuffle Join MERGEJOIN[141] in Stage 'Reducer 2' 
is a cross product
{code}
The root cause is that the cross-product gate consults only the row count of 
the build side, not its byte footprint, and NDV-driven row estimates on tiny 
tables are exactly the case where the byte footprint is small but the row count 
overshoots the threshold by 1.

*Proposed Fix:*

Relax the cross-product gate — when the row estimate exceeds 
{color:#4c9aff}hive.xprod.mapjoin.small.table.rows{color}, consult a byte check 
against {color:#4c9aff}hive.auto.convert.join.noconditionaltask.size{color} 
(the same budget map-join conversion already uses). If the small side's 
computeOnlineDataSize fits the budget, allow the conversion. If it doesn't, 
reject as before.

The log line is emitted at compile time when the byte-fallback branch admits, 
so the new path is observable in HS2 logs:
{code:java}
INFO  SessionState - Warning: Shuffle Join MERGEJOIN[141] in Stage 'Reducer 2' 
is a cross product{code}
*Risk and scope:*

- The change is confined to the cross-product branch of 
{color:#4c9aff}ConvertJoinMapJoin.getMapJoinConversion{color}. 
Non-cross-product joins are unaffected.
- For cross products whose build side is genuinely large in bytes, the new 
check rejects with the same outcome as today.
- The change is gated by 
{color:#4c9aff}hive.tez.cartesian-product.enabled{color} (the existing flag) — 
clusters that have cartesian-product edges disabled don't enter this branch at 
all.
- No public API or configuration surface is added.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (HIVE-29613) Cross-product join falls back to single-reducer shuffle merge when small-side row estimate marginally exceeds hive.xprod.mapjoin.small.table.rows

Reply via email to