[
https://issues.apache.org/jira/browse/HADOOP-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doug Cutting updated HADOOP-1441:
---------------------------------
Status: Open (was: Patch Available)
One can already implement this with either:
{code}
class MyInputFormat extends FileInputFormat {
  // FileInputFormat is abstract, so getRecordReader(...) must still be implemented.
  protected boolean isSplitable(FileSystem fs, Path path) { return false; }
}
...
job.setInputFormat(MyInputFormat.class);
{code}
or even more simply with
{code}
job.setLong("mapred.min.split.size", Long.MAX_VALUE);
{code}
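To see why this works: FileInputFormat chooses a split size as max(minSize, min(goalSize, blockSize)), so a minimum of Long.MAX_VALUE dominates and each file ends up in a single split. A minimal standalone sketch of that formula (an illustration, not Hadoop's actual source):

```java
// Sketch of the split-size formula FileInputFormat applies per file:
// the split is never smaller than minSize, nor larger than needed.
class SplitSize {
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }
}
```

With minSize set to Long.MAX_VALUE the result is Long.MAX_VALUE, larger than any file, so no file is ever divided.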
So I'm not convinced we need this.
Also, if we were to add a new FileInputFormat configuration parameter for this,
then it should have a name that indicates it's specific to FileInputFormat,
like "mapred.fileinputformat.splitable", and we should add static methods in
FileInputFormat to get and set it. (We have not been good about this in the
past, but, for new code, that's the preferred style.) And then we probably
don't need to document it in hadoop-default.xml, since it's not something folks
would need to specify in a config file.
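The accessor style described above could look like the following sketch. The method names are hypothetical, the parameter name is the one suggested in this comment, and a plain java.util.Properties stands in for JobConf so the example is self-contained:

```java
import java.util.Properties;

// Hypothetical sketch of static get/set accessors on FileInputFormat;
// java.util.Properties stands in for Hadoop's JobConf here.
class FileInputFormatConfig {
    // Parameter name suggested in the comment above.
    static final String SPLITABLE_KEY = "mapred.fileinputformat.splitable";

    // Record whether input files may be split into block-sized pieces.
    static void setSplitable(Properties conf, boolean splitable) {
        conf.setProperty(SPLITABLE_KEY, Boolean.toString(splitable));
    }

    // Read the flag back, defaulting to true (the current behavior).
    static boolean isSplitable(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(SPLITABLE_KEY, "true"));
    }
}
```

Because callers go through the accessors, the default lives in code rather than in hadoop-default.xml.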
> Splittability of input should be controllable by application
> ------------------------------------------------------------
>
> Key: HADOOP-1441
> URL: https://issues.apache.org/jira/browse/HADOOP-1441
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.12.3
> Environment: ALL
> Reporter: Milind Bhandarkar
> Assignee: Milind Bhandarkar
> Fix For: 0.14.0
>
> Attachments: HADOOP-1441_1.patch
>
>
> Currently, the isSplitable method of FileInputFormat always returns true. For
> some applications, it is necessary that a map task process an entire
> file rather than a single block. Therefore, the splittability of the input (i.e.
> block-level split vs. file-level split) should be controllable by the user via a
> configuration variable. The default could remain block-level split, as it is now.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.