[ https://issues.apache.org/jira/browse/HADOOP-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526079 ]
Thomas Friol commented on HADOOP-1818:
--------------------------------------
{quote}
1. MultiFileSplit actually is intended to be used in cases where numPaths >>
numSplits, so this issue is an extreme case (just for the record)
{quote}
In our case, we use MultiFileSplit because the number of input paths is
unknown before the job runs. It can vary from 0 to thousands of paths.
{quote}
2. The argument intToWrite in TestFileInputFormat#initFiles() is not used
effectively, so please remove it.
{quote}
Yes, it is used:
{noformat}
if (intToWrite == -1) {
  intToWrite = rand.nextInt();
}
out.write(intToWrite);
{noformat}
and called from the unit test:
{noformat}
public void testFormatWithLessPathsThanSplits() throws Exception {
  MultiFileInputFormat format = new DummyMultiFileInputFormat();
  FileSystem fs = FileSystem.getLocal(job);

  // Test with no paths
  initFiles(fs, 0, -1, -1);
  assertEquals(0, format.getSplits(job, 2).length);

  // Test with 2 paths and 4 splits
  initFiles(fs, 2, 500, 20);
  assertEquals(2, format.getSplits(job, 4).length);
}
{noformat}
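For the record, the behaviour this test expects can be sketched without any Hadoop dependency. This is only an illustration and not the attached patch: the class SplitCapSketch and the helper groupPaths are hypothetical names, and the sketch groups paths by count, whereas the real input format may balance splits by file length. The idea is simply to cap the number of splits at the number of paths so that no split is ever empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration, not Hadoop code and not the attached patch.
class SplitCapSketch {

    // Groups the given paths into at most numSplits non-empty groups.
    // Assumption: grouping is by path count only, not by file size.
    static List<List<String>> groupPaths(List<String> paths, int numSplits) {
        List<List<String>> groups = new ArrayList<List<String>>();
        if (paths.isEmpty() || numSplits <= 0) {
            return groups; // nothing to split, so return no groups at all
        }
        // Never create more groups than there are paths.
        int effective = Math.min(numSplits, paths.size());
        int base = paths.size() / effective;
        int extra = paths.size() % effective;
        int start = 0;
        for (int i = 0; i < effective; i++) {
            // The first 'extra' groups take one additional path each.
            int size = base + (i < extra ? 1 : 0);
            groups.add(new ArrayList<String>(paths.subList(start, start + size)));
            start += size;
        }
        return groups;
    }

    public static void main(String[] args) {
        // No paths, 2 requested splits: no groups.
        System.out.println(groupPaths(Collections.<String>emptyList(), 2).size());
        // 2 paths, 4 requested splits: capped at 2 groups.
        System.out.println(groupPaths(Arrays.asList("a", "b"), 4).size());
    }
}
```

The two calls in main mirror the two cases in the unit test above: zero paths yield zero groups, and two paths with four requested splits yield two groups.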
> MultiFileInputFormat returns "empty" MultiFileSplit when number of paths <
> number of splits
> -------------------------------------------------------------------------------------------
>
> Key: HADOOP-1818
> URL: https://issues.apache.org/jira/browse/HADOOP-1818
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.14.0, 0.14.1
> Reporter: Thomas Friol
> Attachments: HADOOP-1818.patch
>
>
> Coming with a patch soon.