[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422360#comment-13422360
 ] 

Mariappan Asokan commented on MAPREDUCE-4470:
---------------------------------------------

I think it does break other InputFormat implementations.  For example, 
FileInputFormat returns 1 split for an empty input file, and getLocations() 
on that FileSplit returns a zero-length array.  Code that indexes into the 
array without first checking its length will throw an 
ArrayIndexOutOfBoundsException.
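
A minimal sketch of the kind of length check meant here.  The class and 
method names are hypothetical, and the String[] argument stands in for 
whatever getLocations() returned; this is not code from Hadoop itself.

```java
public class SplitLocations {
    /**
     * Returns the first host in the locations array, or the fallback when
     * the array is null or empty (as it can be for an empty input file).
     * Hypothetical helper for illustration only.
     */
    public static String firstLocationOrDefault(String[] locations, String fallback) {
        // Check the length before indexing; locations[0] on an empty
        // array would throw ArrayIndexOutOfBoundsException.
        if (locations == null || locations.length == 0) {
            return fallback;
        }
        return locations[0];
    }
}
```

A caller would pass the result of getLocations() together with some 
placeholder meaning "no preferred host".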

There is another potential problem.  When an MR job is run with an empty 
input and the number of splits is 0, the number of mappers will be 0 while 
the number of reducers can be non-zero.  I am not sure whether such a job 
will run successfully.  My preference is to have all InputFormat 
implementations return 1 split of size 0 when the input is empty.
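
To illustrate that preference, here is a rough sketch of split computation 
that always yields at least one split.  SimpleSplit and splitsFor() are 
hypothetical stand-ins, not the actual InputFormat or FileSplit API, and 
the fixed-size splitting is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyInputSplits {
    /** Hypothetical stand-in for a file split: a byte range within a path. */
    public static final class SimpleSplit {
        public final String path;
        public final long start;
        public final long length;

        public SimpleSplit(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** Cuts a file into fixed-size splits; an empty file yields one split of size 0. */
    public static List<SimpleSplit> splitsFor(String path, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<SimpleSplit> splits = new ArrayList<>();
        if (fileLength == 0) {
            // Empty input: still hand out one zero-length split so that
            // at least one mapper is scheduled.
            splits.add(new SimpleSplit(path, 0, 0));
            return splits;
        }
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new SimpleSplit(path, offset, length));
        }
        return splits;
    }
}
```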

On a side note, when this test failed, the assertion message
{code}
testForEmptyFile(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat):
 expected:<0> but was:<1>
{code}
was confusing.  It should be:
{code}
testForEmptyFile(org.apache.hadoop.mapreduce.lib.input.TestCombineFileInputFormat):
 expected:<1> but was:<0>
{code}

Upon examining the test, I found several places where the arguments to 
assertEquals() are passed in the wrong order (actual first, expected 
second).  I raised MAPREDUCE-4479 and posted a patch there.
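
The effect of the argument order can be shown with a small sketch.  
failureMessage() is a hypothetical stand-in that only mimics the 
"expected:<...> but was:<...>" text seen above; it is not JUnit code.

```java
public class AssertOrderDemo {
    /**
     * Builds a failure message in the same shape as the one quoted above.
     * The first argument is reported as "expected", the second as "was".
     */
    public static String failureMessage(long expected, long actual) {
        return "expected:<" + expected + "> but was:<" + actual + ">";
    }

    /** The test expects 1 split but the code under test returned 0. */
    public static String correctOrder() {
        long expectedSplits = 1;
        long actualSplits = 0;
        return failureMessage(expectedSplits, actualSplits);
    }

    /** Same values with the arguments swapped, as in the failing test. */
    public static String swappedOrder() {
        long expectedSplits = 1;
        long actualSplits = 0;
        return failureMessage(actualSplits, expectedSplits);
    }
}
```

With the arguments swapped, the message reports the actual value as the 
expected one, which is what made the failure output misleading.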
                
> Fix TestCombineFileInputFormat.testForEmptyFile
> -----------------------------------------------
>
>                 Key: MAPREDUCE-4470
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4470
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.0.0-alpha
>            Reporter: Kihwal Lee
>             Fix For: 2.1.0-alpha, 3.0.0
>
>
> TestCombineFileInputFormat.testForEmptyFile started failing after 
> HADOOP-8599. 
> It expects one split on an empty input file, but with HADOOP-8599 it gets 
> zero. The new behavior seems correct, but is it breaking anything else?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
