John LeBrun created HIVE-21853:
----------------------------------

             Summary: NPE in StatsUtils.getWritableSize() when value passed in is null
                 Key: HIVE-21853
                 URL: https://issues.apache.org/jira/browse/HIVE-21853
             Project: Hive
          Issue Type: Bug
         Environment: Hortonworks 
 * Ambari version 2.7.3.0
 * HDP stack version 3.1
 * HDP stack repo version 3.1.0.0
 * stack vdf version 3.1.0.0-78
            Reporter: John LeBrun


The getWritableSize(ObjectInspector oi, Object value) method in the 
org.apache.hadoop.hive.ql.stats.StatsUtils class throws a NullPointerException 
when its second parameter (Object value) is null.

Attached is a patch containing a unit test and the fix.
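
For readers without the attachment, the failure mode and the kind of guard 
involved can be sketched in isolation. This is only an illustration and not the 
attached patch: the class NullSafeSizeSketch and the methods unsafeSize and 
safeSize are hypothetical stand-ins that do not use Hive's ObjectInspector API, 
and returning 0 for a null value is an assumption about what a reasonable size 
estimate would be.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a size estimator; this is NOT Hive's StatsUtils code.
class NullSafeSizeSketch {

    // Mirrors the failing pattern: dereferences the value without a null check.
    public static long unsafeSize(Object value) {
        return value.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    // Guarded variant: a null value is treated as contributing zero bytes
    // (an assumption made for this sketch) instead of throwing.
    public static long safeSize(Object value) {
        if (value == null) {
            return 0L;
        }
        return value.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(safeSize("somestring"));
        System.out.println(safeSize(null));
        try {
            unsafeSize(null);
        } catch (NullPointerException e) {
            System.out.println("unsafeSize(null) threw NullPointerException");
        }
    }
}
```

A unit test for a guard like this would pass null as the value and assert that 
no exception is thrown.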

The issue was originally found when running a UDF query against a Hortonworks 
cluster with HDP 3.1 running Hive 3.1.0. It occurs when the UDF is executed 
on a cluster that uses the Tez execution engine.

Beeline Hive configuration:

set hive.execution.engine=tez;
set hive.fetch.task.conversion=none;

Attached is sample code implementing a simple UDF that reproduces the 
behavior.

Steps to reproduce

On a Hortonworks cluster with HDP 3.1 deployed:

- Start a beeline Hive session.
- Set the Hive configurations listed above.
- Add the jar containing the UDF from the sample code.
- Create a table containing one string column:
    create table tmptable(col1 string);
    insert into table tmptable values ('somestring');
- create function bugUdf as 'BugUDF';
- select bugUdf from tmptable;

This results in a NullPointerException similar to the following:

ql.Driver ()) - FAILED: NullPointerException null
java.lang.NullPointerException
  at org.apache.hadoop.hive.ql.stats.StatsUtils.getWritableSize(StatsUtils.java:1373)
  at org.apache.hadoop.hive.ql.stats.StatsUtils.getSizeOfStruct(StatsUtils.java:1356)
  at org.apache.hadoop.hive.ql.stats.StatsUtils.getSizeOfComplexTypes(StatsUtils.java:1212)
  at org.apache.hadoop.hive.ql.stats.StatsUtils.getAvgColLenOf(StatsUtils.java:1140)
  at org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatisticsFromExpression(StatsUtils.java:1584)
  at org.apache.hadoop.hive.ql.stats.StatsUtils.getColStatisticsFromExprMap(StatsUtils.java:1424)
  at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$SelectStatsRule.process(StatsRulesProcFactory.java:196)
  at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
  at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
  at org.apache.hadoop.hive.ql.lib.LevelOrderWalker.startWalking(LevelOrderWalker.java:122)
  at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
  at org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:397)
  at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:161)
  at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:148)
  at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12443)
  at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:358)
  at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:285)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:664)
  at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1863)
  at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1810)
  at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1805)
  at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
  at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:197)
  at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:262)
  at org.apache.hive.service.cli.operation.Operation.run(Operation.java:247)
  at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541)
  at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527)
  at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
  at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
  at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
  at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
  at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
