[ https://issues.apache.org/jira/browse/HIVE-18441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321883#comment-16321883 ]
Hengyu Dai commented on HIVE-18441:
-----------------------------------

I will change the approach: restore the scheme on paths whose scheme is null
instead of removing all such paths, because removing them affects
NullScanOptimizer.

> NullPointerException due to Hadoop23Shims doesn't compatible with Hadoop 2.2
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-18441
>                 URL: https://issues.apache.org/jira/browse/HIVE-18441
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.1.1, 2.2.0, 2.3.0
>            Reporter: Hengyu Dai
>         Attachments: HIVE-18441.01.patch, HIVE-18441.patch, hadoop2.2.jpg,
> hadoop2.9.jpg
>
>
> Hive 2.x is not compatible with Hadoop 2.2 (other Hadoop versions may have
> the same problem) when a "nullscan" path exists.
> Here is the listStatus() method in Hadoop23Shims.java:
> {code:java}
> protected List<FileStatus> listStatus(JobContext job) throws IOException {
>   List<FileStatus> result = super.listStatus(job);
>   Iterator<FileStatus> it = result.iterator();
>   while (it.hasNext()) {
>     FileStatus stat = it.next();
>     if (!stat.isFile() || (stat.getLen() == 0 &&
>         !stat.getPath().toUri().getScheme().equals("nullscan"))) {
>       it.remove();
>     }
>   }
>   return result;
> }
> {code}
> The first line, super.listStatus(job), returns different FileStatus objects
> on Hadoop 2.2 and Hadoop 2.9.
> I have tested Hive 2.1 with Hadoop 2.2 and Hive 2.1 with Hadoop 2.9; the NPE
> occurs in Hive 2.1 with Hadoop 2.2.
> My test SQL is:
> {code:java}
> select * from (select key from src where false) a left outer join (select key
> from srcpart limit 0) b on a.key=b.key;
> {code}
> It comes from optimize_nullscan.q; the tables src and srcpart used in the SQL
> are created by q_test_init.sql.
> The problem is that in Hadoop 2.2, super.listStatus(job) returns a FileStatus
> object whose Path has no scheme for the "nullscan" path, so
> stat.getPath().toUri().getScheme() in the if statement returns null, and
> calling equals("nullscan") on that null value throws an NPE.
> In contrast, in Hadoop 2.9 super.listStatus(job) returns a valid Path whose
> scheme is "nullscan".
> Debug screenshots from Hadoop 2.2 and Hadoop 2.9 are attached. They show that
> the lists returned by super.listStatus(job) differ: Hadoop 2.2 returns
> "/default.srcpart/part..." and Hadoop 2.9 returns
> "nullscan://null/default.srcpart/part..."
> (This bug does not occur with normal paths such as "hdfs://...".)
> We should handle the case where stat.getPath().toUri().getScheme() returns
> null.
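
For illustration only, here is a minimal sketch of a null-safe scheme check applied to the same filter. It is not taken from the attached patches. The class NullScanSchemeCheck, the Entry type, and the isNullScan() and filter() helpers are hypothetical names; Entry stands in for the three FileStatus properties the filter reads, so the example runs with plain java.net.URI and without Hadoop on the classpath.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not the Hive patch: same filter logic as
// Hadoop23Shims.listStatus(), but with a null-safe scheme comparison.
final class NullScanSchemeCheck {

    // Hypothetical stand-in for the FileStatus properties used by the filter.
    static final class Entry {
        final URI uri;      // plays the role of stat.getPath().toUri()
        final boolean file; // plays the role of stat.isFile()
        final long len;     // plays the role of stat.getLen()

        Entry(URI uri, boolean file, long len) {
            this.uri = uri;
            this.file = file;
            this.len = len;
        }
    }

    // Calling equals() on the constant avoids the NPE when getScheme()
    // returns null, as it does for "/default.srcpart/part...".
    static boolean isNullScan(URI uri) {
        return "nullscan".equals(uri.getScheme());
    }

    // Drops non-files and zero-length files unless the path is a nullscan path.
    static List<Entry> filter(List<Entry> input) {
        List<Entry> result = new ArrayList<>(input);
        Iterator<Entry> it = result.iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (!e.file || (e.len == 0 && !isNullScan(e.uri))) {
                it.remove();
            }
        }
        return result;
    }
}
```

Note that this check only prevents the exception. A zero-length entry whose scheme is missing is still filtered out, which is the behavior reported to affect NullScanOptimizer, so restoring the scheme on such paths would still be needed on top of it.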