[
https://issues.apache.org/jira/browse/TEZ-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058560#comment-14058560
]
Jeff Zhang edited comment on TEZ-1122 at 7/11/14 12:31 PM:
-----------------------------------------------------------
Attach the patch.
Changes in this patch
* Move the Task State Transition into getTask(). And make it sync call rather
than async call. This transition is very lightweight (LaunchTransition in
TaskImpl do nothing), so it wouldn't block the RPC call getTask()
* Update the testcase
was (Author: zjffdu):
Attach the patch.
Changes in this patch
* Move the Task State Transition into getTask(). And make it sync call rather
than async call. This transition is very lightweight (LaunchTransition in
TaskImpl is empty), so it wouldn't block the RPC call getTask()
* Update the testcase
> Race between canCommit and Task moving into RUNNING state
> ---------------------------------------------------------
>
> Key: TEZ-1122
> URL: https://issues.apache.org/jira/browse/TEZ-1122
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Siddharth Seth
> Assignee: Jeff Zhang
> Priority: Critical
> Attachments: Tez-1122.patch
>
>
> A task moves into RUNNING state via async events generated after a
> TaskAttempt moves into RUNNING state, which is triggered by getTask().
> canCommit() is a synchronous call on the umbilical - for short running tasks,
> a canCommit can come in before the async events are handled.
> {code}
> 2014-05-15 13:21:15,531 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.app.dag.impl.TaskAttemptImpl: TaskAttempt:
> [attempt_1400183444139_0007_1_00_000000_0] started. Is using containerId:
> [container_1400183444139_0007_01_000002] on NM: []
> 2014-05-15 13:21:15,533 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.history.HistoryEventHandler:
> [HISTORY][DAG:dag_1400183444139_0007_1][Event:TASK_ATTEMPT_STARTED]:
> vertexName=datagen, taskAttemptId=attempt_1400183444139_0007_1_00_000000_0,
> startTime=1400185273335, containerId=container_1400183444139_0007_01_000002,
> nodeId=,
> inProgressLogs=/node/containerlogs/container_1400183444139_0007_01_000002/,
> completedLogs=localhost:19888/jobhistory/logs///container_1400183444139_0007_01_000002/v_datagen_attempt_1400183444139_0007_1_00_000000_0/
> 2014-05-15 13:21:15,534 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.app.dag.impl.TaskAttemptImpl:
> attempt_1400183444139_0007_1_00_000000_0 TaskAttempt Transitioned from
> START_WAIT to RUNNING due to event TA_STARTED_REMOTELY
> 2014-05-15 13:21:15,534 INFO [IPC Server handler 6 on 61779]
> org.apache.tez.dag.app.dag.impl.TaskImpl: Task not running. Issuing kill to
> bad commit attempt attempt_1400183444139_0007_1_00_000000_0
> 2014-05-15 13:21:15,534 INFO [AMRM Callback Handler Thread]
> org.apache.tez.dag.app.rm.TaskScheduler: App total resource memory: 0 cpu: -1
> taskAllocations: 1
> 2014-05-15 13:21:15,537 INFO [AsyncDispatcher event handler]
> org.apache.tez.common.counters.Limits: Counter limits initialized with
> parameters: GROUP_NAME_MAX=128, MAX_GROUPS=500, COUNTER_NAME_MAX=64,
> MAX_COUNTERS=1200
> 2014-05-15 13:21:15,541 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.app.dag.impl.TaskImpl: task_1400183444139_0007_1_00_000000
> Task Transitioned from SCHEDULED to RUNNING
> 2014-05-15 13:21:15,544 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.history.HistoryEventHandler:
> [HISTORY][DAG:dag_1400183444139_0007_1][Event:TASK_ATTEMPT_FINISHED]:
> vertexName=datagen, taskAttemptId=attempt_1400183444139_0007_1_00_000000_0,
> startTime=1400185273335, finishTime=1400185275542, timeTaken=2207,
> status=KILLED, diagnostics=, counters=Counters: 0
> 2014-05-15 13:21:15,544 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.app.dag.impl.TaskAttemptImpl:
> attempt_1400183444139_0007_1_00_000000_0 TaskAttempt Transitioned from
> RUNNING to KILL_IN_PROGRESS due to event TA_KILL_REQUEST
> 2014-05-15 13:21:15,546 INFO [TaskSchedulerEventHandlerThread]
> org.apache.tez.dag.app.rm.TaskSchedulerEventHandler: Processing the event
> EventType: S_TA_ENDED
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)