[ https://issues.apache.org/jira/browse/CRUNCH-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015179#comment-14015179 ]
Chao Shi commented on CRUNCH-408:
---------------------------------
Thanks Josh. After debugging into the code, I found that globStatus behaves
differently on Hadoop 2. Your patch looks good to me.
> HFileSource does not estimate the size of input correctly when there is a wildcard in path
> ------------------------------------------------------------------------------------------
>
> Key: CRUNCH-408
> URL: https://issues.apache.org/jira/browse/CRUNCH-408
> Project: Crunch
> Issue Type: Bug
> Affects Versions: 0.8.2, 0.10.0
> Reporter: Chao Shi
> Fix For: 0.10.0, 0.8.3
>
> Attachments: CRUNCH-408b.patch, crunch-408.patch
>
>
> The cause is that HFileSource calls FileSystem#listStatus, which does not
> expand wildcards, rather than FileSystem#globStatus to retrieve the list of
> files under the given path. So the fix is straightforward.
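To illustrate why listing versus globbing changes the size estimate, here is a minimal sketch that does not depend on Hadoop. It uses java.nio.file on the local filesystem as a stand-in for Hadoop's FileSystem: the hypothetical sizeLiteral treats the wildcard path as a literal name (roughly what listing the path as given does), while the hypothetical sizeGlobbed expands the wildcard first and sums the size of every match (the globStatus approach). This is not Crunch's HFileSource code, and unlike Hadoop's globStatus, Files.newDirectoryStream only expands a wildcard in the final path component.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical local-filesystem analogue of the CRUNCH-408 bug and fix (not Crunch code).
public class GlobSizeSketch {

    /** Total bytes of regular files at or below p (p may be a file or a directory). */
    static long sumSizes(Path p) throws IOException {
        long total = 0L;
        try (Stream<Path> walk = Files.walk(p)) {
            for (Path f : (Iterable<Path>) walk::iterator) {
                if (Files.isRegularFile(f)) {
                    total += Files.size(f);
                }
            }
        }
        return total;
    }

    /** Bug analogue: the pattern is used as a literal file name, so "part-*" matches nothing. */
    static long sizeLiteral(Path dir, String pattern) throws IOException {
        try {
            Path literal = dir.resolve(pattern);
            // Simplification: a missing path just counts as 0 bytes here.
            return Files.exists(literal) ? sumSizes(literal) : 0L;
        } catch (InvalidPathException e) {
            return 0L; // e.g. '*' is not a legal file-name character on Windows
        }
    }

    /** Fix analogue: expand the glob against the directory's entries, then size each match. */
    static long sizeGlobbed(Path dir, String glob) throws IOException {
        long total = 0L;
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path match : matches) {
                total += sumSizes(match);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hfiles");
        Files.write(dir.resolve("part-00000"), new byte[100]);
        Files.write(dir.resolve("part-00001"), new byte[50]);

        System.out.println(sizeLiteral(dir, "part-*")); // 0: no file is literally named "part-*"
        System.out.println(sizeGlobbed(dir, "part-*")); // 150
    }
}
```

With Hadoop itself, the analogous step would be to call FileSystem#globStatus on the input path and size each returned FileStatus (for example with FileSystem#getContentSummary for directories); the exact code in the attached patches is not reproduced here.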