[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846837#comment-13846837 ]
Ashutosh Chauhan commented on HIVE-6016:
----------------------------------------

Sorry, I was confused: those are not two loops, but a constructor and an overloaded method. Patch looks good. +1

> Hadoop23Shims has a bug in listLocatedStatus impl.
> --------------------------------------------------
>
>                 Key: HIVE-6016
>                 URL: https://issues.apache.org/jira/browse/HIVE-6016
>             Project: Hive
>          Issue Type: Bug
>          Components: Shims
>    Affects Versions: 0.13.0
>            Reporter: Sushanth Sowmyan
>            Assignee: Prasanth J
>         Attachments: HIVE-6016.1.patch
>
>
> Prashant and I discovered that the implementation of the wrapping Iterator in
> listLocatedStatus at
> https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393
> is broken.
> Basically, if you have files (a,b,_s) and a filter that is supposed to
> filter out _s, we expect the result (a,b). Internally we instead get
> (a,b,null): hasNext looks at the next value, checks whether it is null, and
> uses that to decide whether there are more entries, so (a,b,_s) happens to
> come out as (a,b).
> However, there is a boundary condition on the very first pick: (_s,a,b)
> results in (_s,a,b) because the first entry bypasses the filter, and we wind
> up with an unfiltered (_s,a,b), which ORC breaks on.
> The effect of this bug is that ORC cannot read directories in which, say, a
> _SUCCESS file is the first entry returned in the FileStatus listing.

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
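The failure mode in the description (the first entry skipping the filter, and hasNext inferring exhaustion from a null lookahead) is a common mistake in lookahead iterators. Below is a minimal sketch of a filtering iterator that avoids both problems. It is not the HIVE-6016 patch: it assumes plain java.util.Iterator and Predicate in place of Hadoop's RemoteIterator<LocatedFileStatus> and PathFilter, uses file names as strings, and the FilteringIterator class and isVisible filter are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

public class FilteringIteratorDemo {

    // Hypothetical wrapper: yields only the elements accepted by the filter.
    // The next accepted element is fetched ahead of time, and the very first
    // element goes through the same filter as every later one.
    static final class FilteringIterator<T> implements Iterator<T> {
        private final Iterator<T> source;
        private final Predicate<T> filter;
        private T pending;           // next accepted element, valid only if hasPending
        private boolean hasPending;  // explicit flag instead of a null sentinel

        FilteringIterator(Iterator<T> source, Predicate<T> filter) {
            this.source = source;
            this.filter = filter;
            advance(); // prime the lookahead through the filter
        }

        // Skips rejected elements until an accepted one is found or the source ends.
        private void advance() {
            hasPending = false;
            pending = null;
            while (source.hasNext()) {
                T candidate = source.next();
                if (filter.test(candidate)) {
                    pending = candidate;
                    hasPending = true;
                    return;
                }
            }
        }

        @Override
        public boolean hasNext() {
            return hasPending;
        }

        @Override
        public T next() {
            if (!hasPending) {
                throw new NoSuchElementException();
            }
            T result = pending;
            advance(); // look ahead for the following accepted element
            return result;
        }
    }

    // Hypothetical hidden-file filter, standing in for one that should drop _SUCCESS.
    static boolean isVisible(String name) {
        return !name.startsWith("_") && !name.startsWith(".");
    }

    // Drains the filtering iterator into a list.
    static List<String> listFiltered(List<String> names) {
        List<String> out = new ArrayList<>();
        Iterator<String> it =
            new FilteringIterator<>(names.iterator(), FilteringIteratorDemo::isVisible);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(listFiltered(Arrays.asList("a", "b", "_s"))); // [a, b]
        System.out.println(listFiltered(Arrays.asList("_s", "a", "b"))); // [a, b]
    }
}
```

Priming the lookahead in the constructor through the same advance() that next() uses is what keeps the first entry from bypassing the filter. A separate hasPending flag, rather than a null check on the lookahead, means the end of the listing is never inferred from a sentinel value.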