[ 
https://issues.apache.org/jira/browse/HADOOP-1818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526079
 ] 

Thomas Friol commented on HADOOP-1818:
--------------------------------------

{quote}
1. MultiFileSplit actually is intended to be used in cases where numPaths >> 
numSplits, so this issue is an extreme case (just for the record)
{quote}
In our case, we are using MultiFileSplit because the number of input paths is 
unknown before the job runs. It can vary from 0 to thousands of paths.

{quote}
2. The argument intToWrite in TestFileInputFormat#initFiles() is not used 
effectively, so please remove it.
{quote}
Yes, it is used:

{noformat}
if (intToWrite == -1) {
    intToWrite = rand.nextInt();
}
out.write(intToWrite);
{noformat}

and called from the unit test:
{noformat}
public void testFormatWithLessPathsThanSplits() throws Exception {
  MultiFileInputFormat format = new DummyMultiFileInputFormat();
  FileSystem fs = FileSystem.getLocal(job);     
  
  // Test with no path
  initFiles(fs, 0, -1, -1);    
  assertEquals(0, format.getSplits(job, 2).length);
  
  // Test with 2 paths and 4 splits
  initFiles(fs, 2, 500, 20);
  assertEquals(2, format.getSplits(job, 4).length);
}
{noformat}
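
For illustration only, the behavior this test expects (no paths gives no splits, and fewer paths than requested splits gives one split per path) can be sketched outside Hadoop as below. This is not the code from the attached patch or from MultiFileInputFormat: the class name SplitSketch, the method group(), and the round-robin assignment are all assumptions made for the example, and a real implementation may balance splits by file size instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT Hadoop's MultiFileInputFormat. It only
// illustrates capping the split count at the number of input paths.
final class SplitSketch {

    // Groups paths into at most numSplits groups, none of them empty.
    static List<List<String>> group(List<String> paths, int numSplits) {
        List<List<String>> splits = new ArrayList<>();
        if (paths.isEmpty() || numSplits <= 0) {
            return splits; // no input paths: return no splits at all
        }
        // Never create more splits than there are paths.
        int n = Math.min(numSplits, paths.size());
        for (int i = 0; i < n; i++) {
            splits.add(new ArrayList<>());
        }
        // Round-robin assignment (an assumption made for simplicity).
        // Every group gets at least one path because n <= paths.size().
        for (int i = 0; i < paths.size(); i++) {
            splits.get(i % n).add(paths.get(i));
        }
        return splits;
    }
}
```

Under these assumptions, group() with an empty list and 2 requested splits yields 0 groups, and group() with 2 paths and 4 requested splits yields 2 groups, which matches the two assertions in the unit test above.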

> MultiFileInputFormat returns "empty" MultiFileSplit when number of paths < 
> number of splits
> -------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1818
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1818
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0, 0.14.1
>            Reporter: Thomas Friol
>         Attachments: HADOOP-1818.patch
>
>
> Coming with a patch soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
