[
https://issues.apache.org/jira/browse/HADOOP-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Enis Soztutar resolved HADOOP-1054.
-----------------------------------
Resolution: Duplicate
HADOOP-1515 covers exactly the same request.
> Add more than one input file per map?
> -------------------------------------
>
> Key: HADOOP-1054
> URL: https://issues.apache.org/jira/browse/HADOOP-1054
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.11.2
> Reporter: Johan Oskarsson
> Priority: Trivial
>
> I've got a problem with MapReduce overhead when it comes to small input files.
> Roughly 100 MB comes into the DFS every few hours. Afterwards, data
> related to that batch might be added on for another few weeks.
> The problem is that this later data arrives in files of roughly 4-5 KB each,
> so for every reasonably big file we might have 4-5 small ones.
> As far as I understand it, each small file gets assigned a map task of its
> own. This causes performance issues, since the per-task overhead for such
> small files is substantial.
> Would it be possible to have Hadoop assign multiple files to a map task, up
> to a configurable limit?
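
The packing requested above is, in essence, what Hadoop later offered via `CombineFileInputFormat`. As a standalone sketch (not Hadoop code; the class name, method, and the 100 MB limit below are all illustrative), a greedy grouping of files into splits up to a configurable byte limit might look like:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: greedily pack input files into "splits" so that one
// map task can process several small files, up to a configurable byte limit.
public class MultiFileSplitter {

    // Group file sizes (in bytes) into splits of at most maxSplitBytes each.
    // A single file larger than the limit still gets a split of its own.
    static List<List<Integer>> groupFiles(long[] sizes, long maxSplitBytes) {
        List<List<Integer>> splits = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentBytes = 0;
        for (int i = 0; i < sizes.length; i++) {
            // Close the current split when adding this file would overflow it.
            if (!current.isEmpty() && currentBytes + sizes[i] > maxSplitBytes) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(i);
            currentBytes += sizes[i];
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        // One 100 MB file followed by four 5 KB files, as in the report.
        long[] sizes = {100L << 20, 5 << 10, 5 << 10, 5 << 10, 5 << 10};
        List<List<Integer>> splits = groupFiles(sizes, 100L << 20);
        // The big file fills one split; the four small files share a second,
        // so five files need only two map tasks instead of five.
        System.out.println(splits.size()); // prints 2
    }
}
```

This avoids launching one JVM-backed task per 5 KB file while keeping per-task input bounded by the configured limit.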
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.