[
https://issues.apache.org/jira/browse/HIVE-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414212#comment-13414212
]
Zhenxiao Luo commented on HIVE-3257:
------------------------------------
The problem is in
ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java:
in getSchema(), the FileSplit path is missing the scheme part of the URI, in
this case "pfile:".
The matching function pathIsInPartition() checks whether the split path starts
with partitionPath.
On hadoop0.23, partitionPath still holds the pfile: prefix, while the FileSplit
does not, so pathIsInPartition() returns false.
On hadoop0.20, both partitionPath and the FileSplit hold the pfile: prefix, so
pathIsInPartition() returns true.
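To illustrate the mismatch, here is a minimal standalone sketch. It is not the actual Hive code: the class name, the string-based signature, and the example paths are assumptions made up for illustration. It only shows why a plain prefix check fails once the split loses its scheme.

```java
final class SchemeMismatchDemo {
    // Simplified stand-in for the prefix check described above
    // (hypothetical signature; the real method lives in AvroGenericRecordReader).
    static boolean pathIsInPartition(String splitPath, String partitionPath) {
        return splitPath.startsWith(partitionPath);
    }

    public static void main(String[] args) {
        // Hypothetical paths, chosen only for this example.
        String partitionPath = "pfile:/tmp/warehouse/episodes";
        String splitWithScheme = "pfile:/tmp/warehouse/episodes/episodes.avro";
        String splitWithoutScheme = "/tmp/warehouse/episodes/episodes.avro";

        // hadoop0.20-like case: both sides carry the scheme, so the prefix matches.
        System.out.println(pathIsInPartition(splitWithScheme, partitionPath));
        // hadoop0.23-like case: the split has no scheme, so the prefix check fails.
        System.out.println(pathIsInPartition(splitWithoutScheme, partitionPath));
    }
}
```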
The root of the problem is in:
shims/src/common-secure/java/org/apache/hadoop/hive/shims/HadoopShimsSecure.java
ql/src/java/org/apache/hadoop/hive/ql/io/CombineHiveInputFormat.java
In getSplits(), hadoop0.23 removes the scheme part of the path URI in
CombineFileInputFormat, in this case "pfile:". This differs from the hadoop0.20
behavior.
The same problem occurred in HIVE-2737, HIVE-2778, and HIVE-2784.
The patches already committed for those issues include a workaround that checks
whether the path is schemeless.
I will do the same for AvroGenericRecordReader.
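One possible shape for such a check is sketched below. This is not the committed patch: the class name, the URI-based signature, and the example paths are assumptions. The idea is to compare only the path components when the split arrives without a scheme, and to fall back to the full comparison otherwise.

```java
import java.net.URI;

final class SchemelessPathMatch {
    // Hypothetical helper: prefix match that tolerates a split with no scheme.
    static boolean pathIsInPartition(URI split, URI partition) {
        if (split.getScheme() == null) {
            // Schemeless split (hadoop0.23-like): compare path components only.
            return split.getPath().startsWith(partition.getPath());
        }
        // Split carries a scheme (hadoop0.20-like): compare the full strings.
        return split.toString().startsWith(partition.toString());
    }

    public static void main(String[] args) {
        // Hypothetical paths, chosen only for this example.
        URI partition = URI.create("pfile:/tmp/warehouse/episodes");
        URI schemeless = URI.create("/tmp/warehouse/episodes/episodes.avro");
        URI withScheme = URI.create("pfile:/tmp/warehouse/episodes/episodes.avro");

        System.out.println(pathIsInPartition(schemeless, partition));
        System.out.println(pathIsInPartition(withScheme, partition));
    }
}
```

Like the original check, this is a plain string prefix test, so it does not distinguish sibling directories that share a prefix.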
> Fix avro_joins.q testcase failure when building hive on hadoop0.23
> ------------------------------------------------------------------
>
> Key: HIVE-3257
> URL: https://issues.apache.org/jira/browse/HIVE-3257
> Project: Hive
> Issue Type: Bug
> Reporter: Zhenxiao Luo
> Assignee: Zhenxiao Luo
>
> avro_joins.q fails when building hive on hadoop0.23, for both MR1 and
> MR2, with an execution exception.
> The following query fails during execution:
> SELECT e.title, e.air_date, d.first_name, d.last_name, d.extra_field,
> e.air_date
> FROM doctors4 d JOIN episodes e ON (d.number=e.doctor)
> ORDER BY d.last_name, e.title
> Execution failed with exit status: 2
> Obtaining error information
> Task failed!
> Task ID:
> Stage-1
> Logs:
> /home/cloudera/Code/hive/build/ql/tmp//hive.log
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask