[
https://issues.apache.org/jira/browse/HIVE-29516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18072671#comment-18072671
]
Stamatis Zampetakis commented on HIVE-29516:
--------------------------------------------
The test that was added in the PR produces the following stack trace on current
master (d4d166d51d03ecbb1411f4eb701bb0310786a3f9) without the fix:
{noformat}
java.lang.NullPointerException: Cannot invoke "java.util.List.iterator()" because "colStats" is null
at org.apache.hadoop.hive.ql.stats.StatsUtils.updateStats(StatsUtils.java:2036)
at org.apache.hadoop.hive.ql.parse.TezCompiler.removeSemijoinOptimizationByBenefit(TezCompiler.java:1980)
at org.apache.hadoop.hive.ql.parse.TezCompiler.semijoinRemovalBasedTransformations(TezCompiler.java:566)
at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:247)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:182)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.compilePlan(SemanticAnalyzer.java:13159)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:13384)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:481)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:358)
at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:187)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:358)
at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:224)
at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:109)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:499)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:451)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:415)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:409)
{noformat}
Putting it here for future reference.
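The exception message above is consistent with a for-each loop over a list that is null. Below is a minimal sketch of that failure mode and of one possible defensive guard. It is only an illustration: the class, the ColStat type and both method names are hypothetical and are not taken from Hive, and this is not necessarily how the PR fixes the problem.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names come from Hive.
class ColStatsGuardSketch {

    // Stand-in for a per-column statistics entry (not Hive's ColStatistics).
    static final class ColStat {
        final String columnName;
        final long distinctValues;

        ColStat(String columnName, long distinctValues) {
            this.columnName = columnName;
            this.distinctValues = distinctValues;
        }
    }

    // Unguarded: the for-each calls colStats.iterator(), so a null list
    // throws NullPointerException before the loop body runs.
    static List<String> columnNamesUnguarded(List<ColStat> colStats) {
        List<String> names = new ArrayList<>();
        for (ColStat cs : colStats) {
            names.add(cs.columnName);
        }
        return names;
    }

    // Guarded: treats missing column statistics as "nothing to process"
    // instead of failing.
    static List<String> columnNamesGuarded(List<ColStat> colStats) {
        if (colStats == null) {
            return Collections.emptyList();
        }
        return columnNamesUnguarded(colStats);
    }
}
```

Whether an empty result is the right behaviour when statistics are missing depends on the caller; skipping the update entirely is another option.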
> NPE in StatsUtils.updateStats when removing semijoin by benefit and column
> statistics are missing
> -------------------------------------------------------------------------------------------------
>
> Key: HIVE-29516
> URL: https://issues.apache.org/jira/browse/HIVE-29516
> Project: Hive
> Issue Type: Bug
> Components: Query Processor, Statistics
> Affects Versions: 4.2.0
> Reporter: Shubham Sharma
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.3.0
>
>
> h3. Problem
> Query compilation fails with {{NullPointerException}} in
> {{StatsUtils.updateStats()}} when column statistics are not available for
> certain operators. This occurs during the semijoin optimization phase in
> {{TezCompiler.removeSemijoinOptimizationByBenefit()}}.
> The issue is reproducible with TPC-DS queries at scale factors of 100GB or
> higher, where column-level statistics may be incomplete or unavailable for
> some tables.
>
>
> {code:java}
> java.lang.NullPointerException
> at org.apache.hadoop.hive.ql.stats.StatsUtils.updateStats(StatsUtils.java:2067)
> at org.apache.hadoop.hive.ql.parse.TezCompiler.removeSemijoinOptimizationByBenefit(TezCompiler.java:1982)
> at org.apache.hadoop.hive.ql.parse.TezCompiler.semijoinRemovalBasedTransformations(TezCompiler.java:539)
> at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:238)
> at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:174)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.compilePlan(SemanticAnalyzer.java:12521)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12739)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:460)
> ...
> {code}
> h3. How to Reproduce
> # Generate a TPC-DS dataset at 100GB scale or larger
> # Run TPC-DS queries that involve semijoin optimizations (queries with
> subqueries or complex joins, e.g. 10, 17, 19, 23, 24, 25, 29, 32)
> # Ensure column statistics are not fully computed for all tables
> # Observe the NPE during query compilation
>