[ 
https://issues.apache.org/jira/browse/HIVE-29084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Rebele updated HIVE-29084:
---------------------------------
    Description: 
When CBO is enabled, queries with multiple LATERAL VIEW explode operations and 
WHERE clause filtering return incorrect results. WHERE clause conditions could 
be dropped during post-Calcite PPD processing, causing the query to return rows 
that should be filtered out.

I can reliably reproduce the bug with 2+ occurrences of "LATERAL VIEW 
explode()" *and* a WHERE clause; it does not seem to happen with only one 
"LATERAL VIEW explode()" or without a WHERE clause.

Originally reproduced with published 4.0.1 container, then verified with 
locally built current master branch.

I was *not* able to reproduce it with unit tests.

{{*Minimal Reproduction Case:*}}
{{  }}
{{  DROP TABLE IF EXISTS cbo_lateral_view_where_bug;}}
{{  CREATE TABLE cbo_lateral_view_where_bug (values array<string>);}}
{{  INSERT INTO cbo_lateral_view_where_bug SELECT array('a', 'b');}}
{{{}  ANALYZE TABLE cbo_lateral_view_where_bug COMPUTE STATISTICS;{}}}{\{  **  
}}

{{  *-- Correct result (CBO OFF)*}}
{{  SET hive.cbo.enable=false;}}
{{  SELECT first_val, second_val}}
{{  FROM cbo_lateral_view_where_bug}}
{{  LATERAL VIEW explode(values) lv1 AS first_val}}
{{  LATERAL VIEW explode(values) lv2 AS second_val}}
{{  WHERE first_val != second_val;}}

{{  *-- Incorrect result (CBO ON)*}}
{{  SET hive.cbo.enable=true;}}
{{  SELECT first_val, second_val}}
{{  FROM cbo_lateral_view_where_bug}}
{{  LATERAL VIEW explode(values) lv1 AS first_val}}
{{  LATERAL VIEW explode(values) lv2 AS second_val}}
{{{}  WHERE first_val != second_val;{}}}{\{  **  }}

{{  *–- Expected vs Actual Results:*}}
{{  - CBO OFF (correct): 2 rows: ('a','b') and ('b','a')}}
{{  - CBO ON (incorrect): 4 rows: ('a','b'), ('b','a'), ('a','a'), ('b','b')}}

  *Root Cause:*
  ASTConverter's LV processing assigns the same tableAlias value to all columns 
of the query, even ones generated as LATERAL VIEW. This causes certain 
conditions (OR, "!=") between columns of different LV clauses (and even the LV 
clause and the base table) to be "incorrectly pushable" during PPD processing

  *Impact:*
  - Data Correctness: Queries return wrong results, not just performance issues
  - Scope: Requires 2+ LATERAL VIEW explode operations with WHERE filtering
  - Severity: Silent data corruption - users may not notice incorrect results

  {*}Workaround{*}:
Set {*}hive.cbo.enable{*}=false for queries with multiple lateral view 
cross-products and WHERE clause filtering. Disabling PPD also seems to bypass 
the problem.

 

Attachments:
 - [^hive_cbo_lateral_view_where_bug.sql] - SQL to reproduce
 - Hive log: [^hive_cbo_lateral_view_where_bug.log]

  was:
When CBO is enabled, queries with multiple LATERAL VIEW explode operations and 
WHERE clause filtering return incorrect results. WHERE clause conditions could 
be dropped during post-Calcite PPD processing, causing the query to return rows 
that should be filtered out.

I can reliably reproduce the bug with 2+ occurrences of "LATERAL VIEW 
explode()" *and* a WHERE clause; it does not seem to happen with only one 
"LATERAL VIEW explode()" or without a WHERE clause.

Originally reproduced with published 4.0.1 container, then verified with 
locally built current master branch.

I was *not* able to reproduce it with unit tests.

{{*Minimal Reproduction Case:*}}
{{  }}
{{  DROP TABLE IF EXISTS cbo_lateral_view_where_bug;}}
{{  CREATE TABLE cbo_lateral_view_where_bug (values array<string>);}}
{{  INSERT INTO cbo_lateral_view_where_bug SELECT array('a', 'b');}}
{{{}  ANALYZE TABLE cbo_lateral_view_where_bug COMPUTE STATISTICS;{}}}{\{  **  
}}

{{  *-- Correct result (CBO OFF)*}}
{{  SET hive.cbo.enable=false;}}
{{  SELECT first_val, second_val}}
{{  FROM cbo_lateral_view_where_bug}}
{{  LATERAL VIEW explode(values) lv1 AS first_val}}
{{  LATERAL VIEW explode(values) lv2 AS second_val}}
{{  WHERE first_val != second_val;}}{{  *-- Incorrect result (CBO ON)*}}
{{  SET hive.cbo.enable=true;}}
{{  SELECT first_val, second_val}}
{{  FROM cbo_lateral_view_where_bug}}
{{  LATERAL VIEW explode(values) lv1 AS first_val}}
{{  LATERAL VIEW explode(values) lv2 AS second_val}}
{{{}  WHERE first_val != second_val;{}}}{\{  **  }}

{{  *–- Expected vs Actual Results:*}}
{{  - CBO OFF (correct): 2 rows: ('a','b') and ('b','a')}}
{{  - CBO ON (incorrect): 4 rows: ('a','b'), ('b','a'), ('a','a'), ('b','b')}}

  *Root Cause:*
  ASTConverter's LV processing assigns the same tableAlias value to all columns 
of the query, even ones generated as LATERAL VIEW. This causes certain 
conditions (OR, "!=") between columns of different LV clauses (and even the LV 
clause and the base table) to be "incorrectly pushable" during PPD processing

  *Impact:*
  - Data Correctness: Queries return wrong results, not just performance issues
  - Scope: Requires 2+ LATERAL VIEW explode operations with WHERE filtering
  - Severity: Silent data corruption - users may not notice incorrect results

  {*}Workaround{*}:
Set {*}hive.cbo.enable{*}=false for queries with multiple lateral view 
cross-products and WHERE clause filtering. Disabling PPD also seems to bypass 
the problem.

 

Attachments:
 - [^hive_cbo_lateral_view_where_bug.sql] - SQL to reproduce
 - Hive log: [^hive_cbo_lateral_view_where_bug.log]


> Wrong results for LATERAL VIEW queries due to incorrect WHERE filter removal
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-29084
>                 URL: https://issues.apache.org/jira/browse/HIVE-29084
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO
>    Affects Versions: 4.1.0, 4.0.1, 4.2.0
>         Environment:   - Affected Versions: Hive 4.0.1, 4.2.0-SNAPSHOT 
> (likely all CBO+lateral view versions)
>   - Execution Engine: MapReduce (local mode)
>   - Metastore: Derby embedded
>            Reporter: Konstantin Bereznyakov
>            Assignee: Konstantin Bereznyakov
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.2.0
>
>         Attachments: HIVE-29084-ASTConverter.patch, 
> HIVE-29084-after-fix-filter-schema-different-table-aliases.png, 
> HIVE-29084-before-fix-filter-schema-different-table-aliases.png, 
> hive-29084.POC.patch, hive_29084.q, hive_29084.q.out, 
> hive_cbo_lateral_view_where_bug.log, hive_cbo_lateral_view_where_bug.sql
>
>
> When CBO is enabled, queries with multiple LATERAL VIEW explode operations 
> and WHERE clause filtering return incorrect results. WHERE clause conditions 
> could be dropped during post-Calcite PPD processing, causing the query to 
> return rows that should be filtered out.
> I can reliably reproduce the bug with 2+ occurrences of "LATERAL VIEW 
> explode()" *and* a WHERE clause; it does not seem to happen with only one 
> "LATERAL VIEW explode()" or without a WHERE clause.
> Originally reproduced with published 4.0.1 container, then verified with 
> locally built current master branch.
> I was *not* able to reproduce it with unit tests.
> {{*Minimal Reproduction Case:*}}
> {{  }}
> {{  DROP TABLE IF EXISTS cbo_lateral_view_where_bug;}}
> {{  CREATE TABLE cbo_lateral_view_where_bug (values array<string>);}}
> {{  INSERT INTO cbo_lateral_view_where_bug SELECT array('a', 'b');}}
> {{{}  ANALYZE TABLE cbo_lateral_view_where_bug COMPUTE STATISTICS;{}}}{\{  ** 
>  }}
> {{  *-- Correct result (CBO OFF)*}}
> {{  SET hive.cbo.enable=false;}}
> {{  SELECT first_val, second_val}}
> {{  FROM cbo_lateral_view_where_bug}}
> {{  LATERAL VIEW explode(values) lv1 AS first_val}}
> {{  LATERAL VIEW explode(values) lv2 AS second_val}}
> {{  WHERE first_val != second_val;}}
> {{  *-- Incorrect result (CBO ON)*}}
> {{  SET hive.cbo.enable=true;}}
> {{  SELECT first_val, second_val}}
> {{  FROM cbo_lateral_view_where_bug}}
> {{  LATERAL VIEW explode(values) lv1 AS first_val}}
> {{  LATERAL VIEW explode(values) lv2 AS second_val}}
> {{{}  WHERE first_val != second_val;{}}}{\{  **  }}
> {{  *–- Expected vs Actual Results:*}}
> {{  - CBO OFF (correct): 2 rows: ('a','b') and ('b','a')}}
> {{  - CBO ON (incorrect): 4 rows: ('a','b'), ('b','a'), ('a','a'), ('b','b')}}
>   *Root Cause:*
>   ASTConverter's LV processing assigns the same tableAlias value to all 
> columns of the query, even ones generated as LATERAL VIEW. This causes 
> certain conditions (OR, "!=") between columns of different LV clauses (and 
> even the LV clause and the base table) to be "incorrectly pushable" during 
> PPD processing
>   *Impact:*
>   - Data Correctness: Queries return wrong results, not just performance 
> issues
>   - Scope: Requires 2+ LATERAL VIEW explode operations with WHERE filtering
>   - Severity: Silent data corruption - users may not notice incorrect results
>   {*}Workaround{*}:
> Set {*}hive.cbo.enable{*}=false for queries with multiple lateral view 
> cross-products and WHERE clause filtering. Disabling PPD also seems to bypass 
> the problem.
>  
> Attachments:
>  - [^hive_cbo_lateral_view_where_bug.sql] - SQL to reproduce
>  - Hive log: [^hive_cbo_lateral_view_where_bug.log]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to