[ https://issues.apache.org/jira/browse/PIG-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2856:
-------------------------------

    Attachment: PIG-2856-2.patch

testGlob1 did not catch this problem for two reasons:

# The expected output was incorrect (as mentioned above).
# The job status was never checked, so a failed job could still pass the test 
as long as it produced the expected output. In testGlob1, the job failed after 
loading 3 files, but since those 3 files happened to be the expected output, 
the test still passed.

I've updated the patch so that testGlob1 uses the corrected expected output 
and also checks the job status.
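
To illustrate why comparing the output alone was not enough, here is a minimal, 
self-contained Java sketch. It is not the test code from the patch: JobResult, 
Status, and the two check methods are hypothetical stand-ins for a Pig job and 
the test's assertions, and the file names are made up.

```java
import java.util.Arrays;
import java.util.List;

public class JobStatusCheckSketch {
    /** Hypothetical stand-in for a job's final status. */
    public enum Status { COMPLETED, FAILED }

    /** Hypothetical stand-in for what a test can observe after running a job. */
    public static final class JobResult {
        final Status status;
        final List<String> output;

        public JobResult(Status status, List<String> output) {
            this.status = status;
            this.output = output;
        }
    }

    /** Old behavior: compares the output only, so a failed job can slip through. */
    public static boolean outputOnlyCheck(JobResult result, List<String> expected) {
        return result.output.equals(expected);
    }

    /** New behavior: the job must have completed AND produced the expected output. */
    public static boolean statusAndOutputCheck(JobResult result, List<String> expected) {
        return result.status == Status.COMPLETED && result.output.equals(expected);
    }

    public static void main(String[] args) {
        // Made-up data: the job dies after loading 3 files, and the (incorrect)
        // expected output happens to list exactly those 3 files.
        List<String> loaded = Arrays.asList("file1", "file2", "file3");
        JobResult failedJob = new JobResult(Status.FAILED, loaded);

        System.out.println(outputOnlyCheck(failedJob, loaded));      // true
        System.out.println(statusAndOutputCheck(failedJob, loaded)); // false
    }
}
```

The first check passes even though the job failed, which mirrors what happened 
in testGlob1. The second check rejects the same result because it also looks at 
the status.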

Thanks!
                
> AvroStorage doesn't load files in the directories when a glob pattern matches 
> both files and directories.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2856
>                 URL: https://issues.apache.org/jira/browse/PIG-2856
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.11
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>         Attachments: PIG-2856-2.patch, PIG-2856.patch
>
>
> This is a regression from PIG-2492.
> When a glob pattern such as '*' matches not only files but also directories, 
> AvroStorage does not load files in the directories. This is a bug in 
> getAllSubDirs() that can be fixed as follows:
> {code}
> static boolean getAllSubDirs(Path path, Job job, Set<Path> paths)
> ...
> FileStatus[] matchedFiles = fs.globStatus(path, PATH_FILTER);
> ...
> for (FileStatus file : matchedFiles) {
>     if (file.isDir()) {
> -        for (FileStatus sub : fs.listStatus(path)) {
> +        for (FileStatus sub : fs.listStatus(file.getPath())) {
>             getAllSubDirs(sub.getPath(), job, paths);
>         }
>     }
> }
> {code}


        
