[
https://issues.apache.org/jira/browse/HADOOP-657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614408#action_12614408
]
Ari Rabkin commented on HADOOP-657:
-----------------------------------
I don't have strong feelings about whether to do space-consumed measurement in
the TaskTracker or the Task. I figured it made more sense to fill out the
whole TaskStatus in one place. Otherwise it becomes unclear in the TaskTracker
code whether the space-consumed field has been filled in yet. I'm open to
doing this the other way 'round and having the TaskTracker responsible for it.
Certainly, if other similar resource counters were being filled in in the
TaskTracker, this one ought to be too.
I was tempted to use metrics for this, and looked at piggybacking this sort
of thing on heartbeats more generally. I was promptly shot down. There was a
strong sentiment, notably from Owen and Arun, that Hadoop's core functionality
shouldn't depend on Metrics, and that Metrics should just be for analytics.
> Free temporary space should be modelled better
> ----------------------------------------------
>
> Key: HADOOP-657
> URL: https://issues.apache.org/jira/browse/HADOOP-657
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.17.0
> Reporter: Owen O'Malley
> Assignee: Ari Rabkin
> Fix For: 0.19.0
>
> Attachments: clean_spaceest.patch, diskspaceest.patch,
> diskspaceest_v2.patch, diskspaceest_v3.patch, diskspaceest_v4.patch
>
>
> Currently, there is a configurable size that must be free for a task tracker
> to accept a new task. However, that isn't a very good model of what the task
> is likely to take. I'd like to propose:
> Map tasks: totalInputSize * conf.getFloat("map.output.growth.factor", 1.0) /
> numMaps
> Reduce tasks: totalInputSize * 2 * conf.getFloat("map.output.growth.factor",
> 1.0) / numReduces
> where totalInputSize is the size of all the maps inputs for the given job.
> To start a new task,
> newTaskAllocation + (sum over running tasks of (1.0 - done) * allocation)
> <=
> free disk * conf.getFloat("mapred.max.scratch.allocation", 0.90);
> So in English, we will model the expected sizes of tasks and only start tasks
> that should leave us a 10% margin. With:
> map.output.growth.factor -- the size of the transient data relative to the
> map inputs
> mapred.max.scratch.allocation -- the maximum amount of our disk we want to
> allocate to tasks.
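
For concreteness, here is a minimal, self-contained Java sketch of the
per-task scratch-space estimates described in the proposal above. The class
and method names are hypothetical (this is not the attached patch); only the
formulas and the configuration defaults come from the description.

public class TaskScratchEstimate {

    // Default for map.output.growth.factor: transient data is assumed to be
    // the same size as the map input.
    static final double DEFAULT_GROWTH_FACTOR = 1.0;

    // Estimated scratch space (bytes) for one map task:
    // totalInputSize * growthFactor / numMaps
    static long estimateMapTask(long totalInputSize, int numMaps,
                                double growthFactor) {
        return (long) (totalInputSize * growthFactor / numMaps);
    }

    // Estimated scratch space (bytes) for one reduce task:
    // totalInputSize * 2 * growthFactor / numReduces
    static long estimateReduceTask(long totalInputSize, int numReduces,
                                   double growthFactor) {
        return (long) (totalInputSize * 2 * growthFactor / numReduces);
    }

    public static void main(String[] args) {
        long totalInputSize = 10L * 1024 * 1024 * 1024;  // 10 GB of map input
        System.out.println("per-map estimate:    "
                + estimateMapTask(totalInputSize, 100, DEFAULT_GROWTH_FACTOR));
        System.out.println("per-reduce estimate: "
                + estimateReduceTask(totalInputSize, 10, DEFAULT_GROWTH_FACTOR));
    }
}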
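
And a sketch of the admission check itself, under the same caveats: it reads
the condition as "commit no more than mapred.max.scratch.allocation (default
0.90) of the currently free disk", counting only the unfinished portion of
each running task's estimate. RunningTask and canStartTask are hypothetical
names, not TaskTracker API.

import java.util.Arrays;
import java.util.List;

public class ScratchAdmission {

    // Default for mapred.max.scratch.allocation: leave a 10% margin.
    static final double DEFAULT_MAX_SCRATCH_FRACTION = 0.90;

    // Minimal stand-in for a running task: completion fraction plus the
    // scratch allocation that was estimated for it when it was scheduled.
    static class RunningTask {
        final double done;           // fraction complete, 0.0 .. 1.0
        final long allocationBytes;  // estimated scratch allocation
        RunningTask(double done, long allocationBytes) {
            this.done = done;
            this.allocationBytes = allocationBytes;
        }
    }

    // The new task's allocation plus the unfinished share of every running
    // task's allocation must fit within freeDisk * maxScratchFraction.
    static boolean canStartTask(long newTaskAllocation,
                                List<RunningTask> running,
                                long freeDiskBytes,
                                double maxScratchFraction) {
        double committed = newTaskAllocation;
        for (RunningTask t : running) {
            committed += (1.0 - t.done) * t.allocationBytes;
        }
        return committed <= freeDiskBytes * maxScratchFraction;
    }

    public static void main(String[] args) {
        List<RunningTask> running = Arrays.asList(
                new RunningTask(0.5, 1000L * 1024 * 1024),   // half done, ~1 GB
                new RunningTask(0.9, 2000L * 1024 * 1024));  // nearly done, ~2 GB
        boolean ok = canStartTask(500L * 1024 * 1024, running,
                10L * 1024 * 1024 * 1024, DEFAULT_MAX_SCRATCH_FRACTION);
        System.out.println("can start new task: " + ok);
    }
}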