[ https://issues.apache.org/jira/browse/MAPREDUCE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911671#comment-13911671 ]

Jason Lowe commented on MAPREDUCE-5756:
---------------------------------------

Are you sure that's the relevant code change? Looking at the patch above, both before and after the change it will recursively process directories. Am I missing something? Also [~jdere] verified in [a comment|https://issues.apache.org/jira/browse/MAPREDUCE-5756?focusedCommentId=13900772&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13900772] that the FileInputFormat.listStatus behavior didn't change between 1.x and 2.x with respect to directories.

Instead it appears to be caused by MAPREDUCE-4470 which changed the way CombineFileInputFormat treats files without any blocks. Before it was failing to generate any splits for empty files, and afterwards it looks like it generates a degenerate split for them. Since directories also have no blocks, I'm wondering if that change caused it to also generate a degenerate split for directories as well as empty files.

> CombineFileInputFormat.getSplits() including directories in its results
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5756
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5756
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Jason Dere
>
> Trying to track down HIVE-6401, where we see some "is not a file" errors
> because getSplits() is giving us directories. I believe the culprit is
> FileInputFormat.listStatus():
> {code}
>         if (recursive && stat.isDirectory()) {
>           addInputPathRecursively(result, fs, stat.getPath(),
>               inputFilter);
>         } else {
>           result.add(stat);
>         }
> {code}
> Which seems to be allowing directories to be added to the results if
> recursive is false. Is this meant to return directories? If not, I think it
> should look like this:
> {code}
>         if (stat.isDirectory()) {
>           if (recursive) {
>             addInputPathRecursively(result, fs, stat.getPath(),
>                 inputFilter);
>           }
>         } else {
>           result.add(stat);
>         }
> {code}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)