[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-6016:
-----------------------------
Status: Patch Available (was: Open)
Marking it as Patch Available for precommit tests.
> Hadoop23Shims has a bug in listLocatedStatus impl.
> --------------------------------------------------
>
> Key: HIVE-6016
> URL: https://issues.apache.org/jira/browse/HIVE-6016
> Project: Hive
> Issue Type: Bug
> Components: Shims
> Affects Versions: 0.13.0
> Reporter: Sushanth Sowmyan
> Assignee: Prasanth J
> Attachments: HIVE-6016.1.patch
>
>
> Prasanth and I discovered that the implementation of the wrapping Iterator in
> listLocatedStatus at
> https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393
> is broken.
> Basically, given files (a,b,_s) and a filter that is supposed to exclude _s,
> we expect a result of (a,b). Internally the iterator produces (a,b,null);
> hasNext looks at the next value and treats null as the end of the listing,
> so (a,b,_s) happens to come out as (a,b).
> There is also a boundary condition on the very first pick: the first entry
> bypasses the filter, so (_s,a,b) comes back unfiltered as (_s,a,b), which
> ORC breaks on.
> The effect of this bug is that ORC cannot read a directory in which a file
> such as _SUCCESS is the first entry returned by the FileStatus listing.
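
For illustration only, here is a minimal sketch of a look-ahead filtering
iterator that avoids both problems described above. It is not the code in
HIVE-6016.1.patch. It assumes a plain java.util.Iterator over file names in
place of Hadoop's RemoteIterator<LocatedFileStatus>, and a Predicate in place
of the PathFilter; the class and method names (FilteringIteratorSketch,
FilteringIterator, listFiltered) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

class FilteringIteratorSketch {

    /** Wraps an iterator and yields only the elements accepted by the filter. */
    static final class FilteringIterator<T> implements Iterator<T> {
        private final Iterator<T> inner;
        private final Predicate<T> filter;
        private T buffered;          // look-ahead element, valid only if hasBuffered
        private boolean hasBuffered; // explicit flag, so null is never an end marker

        FilteringIterator(Iterator<T> inner, Predicate<T> filter) {
            this.inner = inner;
            this.filter = filter;
        }

        @Override
        public boolean hasNext() {
            // Skip rejected entries until one passes or the source is exhausted.
            // The very first entry goes through the same filter check.
            while (!hasBuffered && inner.hasNext()) {
                T candidate = inner.next();
                if (filter.test(candidate)) {
                    buffered = candidate;
                    hasBuffered = true;
                }
            }
            return hasBuffered;
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            T result = buffered;
            buffered = null;
            hasBuffered = false;
            return result;
        }
    }

    /** Hypothetical helper: drops names starting with '_' or '.'. */
    static List<String> listFiltered(List<String> names) {
        Predicate<String> notHidden = n -> !n.startsWith("_") && !n.startsWith(".");
        Iterator<String> it = new FilteringIterator<>(names.iterator(), notHidden);
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(listFiltered(Arrays.asList("a", "b", "_s"))); // [a, b]
        System.out.println(listFiltered(Arrays.asList("_s", "a", "b"))); // [a, b]
    }
}
```

The filter is applied inside hasNext to every candidate, including the first,
and a separate boolean records whether a look-ahead element is buffered, so
the result does not depend on where the filtered file sits in the listing.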