[
https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963278#comment-13963278
]
Jason Lowe commented on MAPREDUCE-5465:
---------------------------------------
Thanks for picking this up, Ming. Some comments on the patch:
- The new interface for TaskAttemptListener seems more complicated than
necessary. IIUC it's not valid to call registerFinishingTask without having
called unregister essentially at the same time, so wondering if we should just
add a single method, something like setTaskFinishing(TaskAttemptId,
WrappedJvmID) to denote when the task JVM has finished and we're simply waiting
for the container to complete. A typical lifecycle could pass a null JVM to
the unregister method if the JVM already finished, and we can still skip the
finishing method and go straight from registerLauchedTask -> unregister if the
task container exits unexpectedly. Either that or it seems the
TaskAttemptExitHandler should not be part of TaskAttemptListener and instead
conveyed separately to the TaskAttemptImpl (possibly via AppContext along with
other things), since it seems tacked on and not related to the other workings
of TaskAttemptListener.
- It looks like we're only handling the successful task case. We don't want to
proactively kill tasks that have reported failure just like we don't want to
proactively kill tasks that have reported success.
- Why are we transitioning from FINISHING_CONTAINER to
SUCCESS_CONTAINER_CLEANUP rather than to SUCCEEDED when we receive a container
completed event? The SUCCESS_CONTAINER_CLEANUP state is for waiting for a
container completed event to arrive, but we are leaving the FINISHING_CONTAINER
state due to the arrival of that very event.
- The new properties should be added to mapred-default.xml for documentation
purposes.
- Suggestion: TaskAttemptFinishingMonitor may be a more accurate name than
TaskAttemptExitHandler
- Nit: The default values for the new properties should be named in MRJobConfig
- Nit: comments should be formatted for 80 columns
- Nit: In the comment for the new FINISHING_CONTAINER state: "That will a
chance" should be "That will give a chance"
- Nit: testFinshingAttemptTimedout -> testFinishingAttemptTimeout
- Nit: testTaskAttemptDiognosticEventOnFinishing ->
testTaskAttemptDiagnosticEventOnFinishing
> Container killed before hprof dumps profile.out
> -----------------------------------------------
>
> Key: MAPREDUCE-5465
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am, mrv2
> Affects Versions: trunk, 2.0.3-alpha
> Reporter: Radim Kolar
> Assignee: Ming Ma
> Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch,
> MAPREDUCE-5465.patch
>
>
> If there is profiling enabled for mapper or reducer then hprof dumps
> profile.out at process exit. It is dumped after task signaled to AM that work
> is finished.
> AM kills container with finished work without waiting for hprof to finish
> dumps. If hprof is dumping larger outputs (such as with depth=4 while depth=3
> works) , it could not finish dump in time before being killed making entire
> dump unusable because cpu and heap stats are missing.
> There needs to be better delay before container is killed if profiling is
> enabled.
--
This message was sent by Atlassian JIRA
(v6.2#6252)