Shubham Sharma created HIVE-29516:
-------------------------------------

             Summary: NullPointerException in StatsUtils.updateStats when column statistics are unavailable during semijoin optimization
                 Key: HIVE-29516
                 URL: https://issues.apache.org/jira/browse/HIVE-29516
             Project: Hive
          Issue Type: Bug
          Components: Query Processor, Statistics
    Affects Versions: 4.2.0
            Reporter: Shubham Sharma
             Fix For: 4.3.0


h3. Problem

Query compilation fails with a {{NullPointerException}} in {{StatsUtils.updateStats()}} when column statistics are unavailable for certain operators. The failure occurs during the semijoin optimization phase, in {{TezCompiler.removeSemijoinOptimizationByBenefit()}}.

The issue is reproducible with TPC-DS queries on datasets of 100 GB or larger, where column-level statistics may be incomplete or unavailable for some tables.

{code:java}
java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.stats.StatsUtils.updateStats(StatsUtils.java:2067)
    at org.apache.hadoop.hive.ql.parse.TezCompiler.removeSemijoinOptimizationByBenefit(TezCompiler.java:1982)
    at org.apache.hadoop.hive.ql.parse.TezCompiler.semijoinRemovalBasedTransformations(TezCompiler.java:539)
    at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:238)
    at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:174)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.compilePlan(SemanticAnalyzer.java:12521)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12739)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:460)
    ...
{code}
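
The trace only shows that {{StatsUtils.updateStats()}} dereferences a null value at line 2067; given the issue summary, the missing column statistics are the likely culprit. As a hedged illustration (not the actual Hive code), the following self-contained Java sketch shows the kind of null guard that lets a statistics update proceed when column statistics are missing. {{ColStat}}, {{StatsSketch.updateStats}} and the proportional NDV scaling rule are hypothetical and are not Hive's real API.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: ColStat and updateStats are illustrative names,
// not Hive's actual StatsUtils API.
public class StatsSketch {

    // Hypothetical per-column statistic: column name and number of distinct values (NDV).
    static final class ColStat {
        final String column;
        final long ndv;

        ColStat(String column, long ndv) {
            this.column = column;
            this.ndv = ndv;
        }

        @Override
        public String toString() {
            return column + ":" + ndv;
        }
    }

    // Scales each column's NDV to a new row count (assumed rule). Missing column
    // statistics (a null list or null entries) are skipped instead of dereferenced.
    static List<ColStat> updateStats(List<ColStat> colStats, long oldRows, long newRows) {
        List<ColStat> updated = new ArrayList<>();
        if (colStats == null || oldRows <= 0) {
            return updated; // nothing to scale: leave column stats absent
        }
        for (ColStat cs : colStats) {
            if (cs == null) {
                continue; // tolerate individual missing entries
            }
            // Proportional scaling, capped at the new row count.
            long scaled = Math.min(newRows, cs.ndv * newRows / oldRows);
            updated.add(new ColStat(cs.column, scaled));
        }
        return updated;
    }

    public static void main(String[] args) {
        // No column statistics at all: returns an empty list instead of throwing.
        System.out.println(updateStats(null, 1000, 100)); // prints []
        // Mixed available and missing statistics; the null entry is skipped.
        List<ColStat> stats = Arrays.asList(new ColStat("ss_item_sk", 500), null);
        System.out.println(updateStats(stats, 1000, 100)); // prints [ss_item_sk:50]
    }
}
{code}

Skipping the update rather than failing compilation lets the planner continue with less precise estimates; whether the real fix should skip, fall back to defaults, or disable the semijoin for that branch is a separate design decision.
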

h3. How to Reproduce
 # Generate a TPC-DS dataset at 100 GB or larger scale.
 # Run TPC-DS queries that involve semijoin optimizations (queries with subqueries or complex joins, e.g., queries 10, 17, 19, 23, 24, 25, 29, and 32).
 # Ensure that column statistics are missing for at least some of the tables (for example, do not run {{ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS}} on every table).
 # Observe the NPE during query compilation.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
