[ 
https://issues.apache.org/jira/browse/CRUNCH-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015179#comment-14015179
 ] 

Chao Shi commented on CRUNCH-408:
---------------------------------

Thanks Josh. After stepping through the code in a debugger, I found that 
globStatus behaves differently on Hadoop 2. Your patch looks good to me.

> HFileSource does not estimate the size of input correctly when there is a 
> wildcard in path
> ------------------------------------------------------------------------------------------
>
>                 Key: CRUNCH-408
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-408
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.8.2, 0.10.0
>            Reporter: Chao Shi
>             Fix For: 0.10.0, 0.8.3
>
>         Attachments: CRUNCH-408b.patch, crunch-408.patch
>
>
> The cause is that it calls FileSystem#listStatus rather than 
> FileSystem#globStatus to retrieve the list of files under the given path. So 
> the fix is straightforward.
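
A minimal sketch of why the two lookups give different size estimates. It is not the Crunch patch and does not use the Hadoop API: java.nio stands in for FileSystem#listStatus (the wildcard is treated as a literal name) and FileSystem#globStatus (the wildcard is expanded). The class name GlobSizeSketch, both method names, and the file names are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class GlobSizeSketch {

    // Stand-in for the buggy lookup: the pattern is compared as a literal
    // file name, so "part-*" only matches a file actually named "part-*".
    static long sizeTreatingPatternAsLiteral(Path dir, String pattern) throws IOException {
        long total = 0L;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (p.getFileName().toString().equals(pattern) && Files.isRegularFile(p)) {
                    total += Files.size(p);
                }
            }
        }
        return total;
    }

    // Stand-in for the fixed lookup: the pattern is expanded as a glob and
    // the sizes of all matching regular files are summed.
    static long sizeWithGlob(Path dir, String pattern) throws IOException {
        long total = 0L;
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, pattern)) {
            for (Path p : matches) {
                if (Files.isRegularFile(p)) {
                    total += Files.size(p);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input directory: two matching files and one non-matching file.
        Path dir = Files.createTempDirectory("glob-size-sketch");
        Files.write(dir.resolve("part-0"), new byte[3]);
        Files.write(dir.resolve("part-1"), new byte[5]);
        Files.write(dir.resolve("other"), new byte[7]);

        System.out.println(sizeTreatingPatternAsLiteral(dir, "part-*")); // prints 0
        System.out.println(sizeWithGlob(dir, "part-*"));                 // prints 8
    }
}
```

With the literal lookup the estimate comes out as 0 bytes because no file is named "part-*", while the glob lookup counts only the two matching files (3 + 5 = 8 bytes) and ignores "other".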



--
This message was sent by Atlassian JIRA
(v6.2#6252)
