Prasanth Jayachandran created HIVE-12712:
--------------------------------------------
Summary: HiveInputFormat may fail to set column names to read in some
cases
Key: HIVE-12712
URL: https://issues.apache.org/jira/browse/HIVE-12712
Project: Hive
Issue Type: Bug
Affects Versions: 2.0.0, 2.1.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
The primary issue is that when the plan is generated, the pathToAliases map is
populated with directory paths mapped to table aliases. pathToAliases.put()
uses path.toString() as the map key, but probing uses path.toUri().toString().
path.toUri() escapes spaces in the path whereas path.toString() does not, so
probes miss when the path contains spaces. As a result, HiveInputFormat can
take a different code path that fails to set the list of columns to read from
the source table. This caused an unexpected NPE in OrcInputFormat after the
HIVE-11705 refactoring, which removed the null check for column names. The
resulting exception is:
{code}
Caused by: java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1288)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1354)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:367)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:457)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:152)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:246)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:240)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:240)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:227)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1282)
    ... 15 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.extractNeededColNames(OrcInputFormat.java:422)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.extractNeededColNames(OrcInputFormat.java:417)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.access$2000(OrcInputFormat.java:134)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1072)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:919)
    ... 4 more
{code}
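The key mismatch described above can be reproduced in isolation. The following is a minimal sketch, not Hive code: it uses only java.net.URI and a HashMap, on the assumption that the multi-argument URI constructor stands in for the escaping done by path.toUri(). The class name PathKeyMismatch, the helper methods, and the sample directory are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

class PathKeyMismatch {
    // Escaped form: the multi-argument java.net.URI constructor percent-encodes
    // characters that are illegal in a URI path, such as spaces.
    // Assumption: this stands in for path.toUri().toString().
    static String escapedKey(String dir) {
        try {
            return new URI(null, null, dir, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Hypothetical stand-in for the plan's pathToAliases map.
    // The key is the raw, unescaped string (the path.toString() side).
    static String probe(String dir, String alias, boolean escapeOnProbe) {
        Map<String, String> pathToAliases = new HashMap<>();
        pathToAliases.put(dir, alias);
        String key = escapeOnProbe ? escapedKey(dir) : dir;
        return pathToAliases.get(key); // null means the probe missed
    }

    public static void main(String[] args) {
        String dir = "/warehouse/my table";
        System.out.println("raw key:       " + dir);
        System.out.println("escaped key:   " + escapedKey(dir));
        System.out.println("raw probe:     " + probe(dir, "t1", false));
        System.out.println("escaped probe: " + probe(dir, "t1", true));
    }
}
```

For a directory without spaces both probes return the alias, which is consistent with the failure appearing only in some cases. Using the same string form for both put() and the probe avoids the miss.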