This should be the default behavior for configurations when locality
(processing the data on the same node where the input resides) cannot be
achieved anyway.
For example, in cases where a MapReduce job runs on a (small) subset of the DFS nodes.
Runping Qi (JIRA) wrote:
It is nice if hadoop provides NonSplitable TextInputFormat and
SequenceFileInputFormat
--------------------------------------------------------------------------------------
Key: HADOOP-1617
URL: https://issues.apache.org/jira/browse/HADOOP-1617
Project: Hadoop
Issue Type: Improvement
Reporter: Runping Qi
As more applications find the -reduce NONE option useful, they also want to control
the splitability of the inputs.
A simple way to do this is to implement a class that extends
SequenceFileInputFormat or TextInputFormat.
It would be nice if the framework provided such classes.
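
To make the subclassing idea above concrete, here is a minimal, self-contained sketch of the pattern. It does not use the real Hadoop classes: SimpleFileInputFormat, NonSplittableFileInputFormat and Split are hypothetical stand-ins, and the split arithmetic is simplified. In Hadoop itself the corresponding hook is the protected isSplitable method on FileInputFormat, which a subclass of TextInputFormat or SequenceFileInputFormat would presumably override to return false.

```java
import java.util.ArrayList;
import java.util.List;

public class NonSplittableSketch {

    /** A byte range of one input file (hypothetical stand-in for a file split). */
    static final class Split {
        final String file;
        final long start;
        final long length;

        Split(String file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }
    }

    /** Hypothetical stand-in for a splittable file-based input format. */
    static class SimpleFileInputFormat {

        /** Hook that subclasses override to forbid splitting. */
        protected boolean isSplitable(String file) {
            return true;
        }

        /** Cuts the file into ranges of at most splitSize bytes, or one range if unsplittable. */
        public List<Split> getSplits(String file, long fileLength, long splitSize) {
            if (splitSize <= 0) {
                throw new IllegalArgumentException("splitSize must be positive");
            }
            List<Split> splits = new ArrayList<>();
            if (!isSplitable(file)) {
                // The whole file goes to a single map task.
                splits.add(new Split(file, 0, fileLength));
                return splits;
            }
            for (long start = 0; start < fileLength; start += splitSize) {
                splits.add(new Split(file, start, Math.min(splitSize, fileLength - start)));
            }
            return splits;
        }
    }

    /** The requested variant: identical behavior, but never splits a file. */
    static class NonSplittableFileInputFormat extends SimpleFileInputFormat {
        @Override
        protected boolean isSplitable(String file) {
            return false;
        }
    }

    public static void main(String[] args) {
        long fileLength = 250;
        long splitSize = 100;
        int splittable = new SimpleFileInputFormat()
                .getSplits("part-00000", fileLength, splitSize).size();
        int whole = new NonSplittableFileInputFormat()
                .getSplits("part-00000", fileLength, splitSize).size();
        System.out.println("splittable: " + splittable + " splits, non-splittable: " + whole);
    }
}
```

With one split per file, each map task sees exactly one complete input file, which is the kind of control a job running with -reduce NONE would want over its inputs.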