[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038086#comment-13038086
 ] 

Ramkumar Vadali commented on MAPREDUCE-2186:
--------------------------------------------

The main motivation for opening this JIRA was to allow CombineFileInputFormat to 
work when there are missing blocks. CombineFileInputFormat figures out the 
host/rack information for input blocks and uses that information to create 
input splits. It does not handle the case where a block does not have any 
host/rack information.

The proposed fix, returning the locations of parity blocks when source blocks 
are missing, is not a good one because it fixes the problem in the wrong place. 
It also gives us false locality, since the reported hosts hold parity data 
rather than the source blocks themselves. 
Instead of changing the RAID FS to handle this case, it is better to fix CFIF 
(CombineFileInputFormat) to handle missing blocks (MAPREDUCE-2185).
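
To illustrate what "handling missing blocks" in the split logic could look 
like, here is a minimal sketch. It is not the actual CombineFileInputFormat 
code and does not use any Hadoop classes; SplitGrouping, Block, groupByHost and 
the NO_LOCATION bucket are all hypothetical names made up for this example. 
The idea is only that a block with no host information goes into a fallback 
bucket instead of breaking the grouping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitGrouping {
    /** Hypothetical stand-in for a block and the hosts reported for it. */
    static final class Block {
        final String id;
        final List<String> hosts;

        Block(String id, List<String> hosts) {
            this.id = id;
            this.hosts = hosts;
        }
    }

    /** Illustrative bucket key for blocks that report no location at all. */
    static final String NO_LOCATION = "<no-location>";

    /**
     * Groups block ids by host. A block whose host list is null or empty
     * (for example, a missing block) is placed in the NO_LOCATION bucket
     * so that it can still be assigned to some split later.
     */
    static Map<String, List<String>> groupByHost(List<Block> blocks) {
        Map<String, List<String>> byHost = new LinkedHashMap<>();
        for (Block b : blocks) {
            if (b.hosts == null || b.hosts.isEmpty()) {
                byHost.computeIfAbsent(NO_LOCATION, k -> new ArrayList<>()).add(b.id);
                continue;
            }
            for (String host : b.hosts) {
                byHost.computeIfAbsent(host, k -> new ArrayList<>()).add(b.id);
            }
        }
        return byHost;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
            new Block("blk_1", Arrays.asList("hostA", "hostB")),
            new Block("blk_2", Collections.<String>emptyList()), // missing block
            new Block("blk_3", Arrays.asList("hostA")));
        System.out.println(groupByHost(blocks));
    }
}
```

Blocks in the fallback bucket carry no locality hint, so a scheduler is free 
to place them anywhere, which avoids advertising locality that does not exist.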

> DistributedRaidFileSystem should implement getFileBlockLocations()
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2186
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2186
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>            Reporter: Ramkumar Vadali
>            Assignee: Ramkumar Vadali
>
> If a RAIDed file has missing blocks, 
> DistributedRaidFileSystem.getFileBlockLocations() would return no block 
> locations. This could lead a client to believe that the file is not readable. 
> But if parity data is available, the file actually is readable.
> It would be better to implement getFileBlockLocations() and return the 
> location of the parity blocks that would be needed to reconstruct the missing 
> block.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
