[https://issues.apache.org/jira/browse/TEZ-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000433#comment-14000433]
Bikas Saha commented on TEZ-1122:
---------------------------------
We should look at class-specific dispatchers (Vertex, Task, TaskAttempt) so that
slowness in one does not affect the others. That does not solve this JIRA, but
it goes towards improving the general performance of the dispatcher model.
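A minimal sketch of the class-specific dispatcher idea, assuming one single-threaded queue per event class. PerClassDispatcher, EventClass and dispatch() are hypothetical names, not Tez's actual AsyncDispatcher API. Events stay FIFO within a class, but a slow Vertex handler no longer holds up Task or TaskAttempt events. As the comment notes, this does not fix the canCommit race itself: ordering across classes is no stronger than before.

{code:java}
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not Tez's AsyncDispatcher): one single-threaded
 * queue per event class, so a slow Vertex handler cannot delay Task or
 * TaskAttempt events. Ordering is FIFO within a class only.
 */
public class PerClassDispatcher {
    enum EventClass { VERTEX, TASK, TASK_ATTEMPT }

    private final Map<EventClass, ExecutorService> queues = new EnumMap<>(EventClass.class);

    public PerClassDispatcher() {
        for (EventClass c : EventClass.values()) {
            // One thread per class keeps events of that class in arrival order.
            queues.put(c, Executors.newSingleThreadExecutor());
        }
    }

    /** Queue a handler on the dispatcher thread that owns its event class. */
    public void dispatch(EventClass c, Runnable handler) {
        queues.get(c).execute(handler);
    }

    /** Stop accepting events and wait for queued handlers to finish. */
    public void shutdown() throws InterruptedException {
        for (ExecutorService q : queues.values()) {
            q.shutdown();
            q.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PerClassDispatcher d = new PerClassDispatcher();
        CountDownLatch taskHandled = new CountDownLatch(1);
        // A slow Vertex handler occupies only the VERTEX thread...
        d.dispatch(EventClass.VERTEX, () -> sleepQuietly(2000));
        // ...so a Task event queued afterwards is still handled promptly.
        d.dispatch(EventClass.TASK, taskHandled::countDown);
        boolean prompt = taskHandled.await(500, TimeUnit.MILLISECONDS);
        System.out.println("Task event handled while Vertex handler still busy: " + prompt);
        d.shutdown();
    }
}
{code}

With a single shared dispatcher thread, the Task event in main() would wait the full two seconds behind the Vertex handler.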
> Race between canCommit and Task moving into RUNNING state
> ---------------------------------------------------------
>
> Key: TEZ-1122
> URL: https://issues.apache.org/jira/browse/TEZ-1122
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Siddharth Seth
> Priority: Critical
>
> A task moves into the RUNNING state via async events that are generated after
> a TaskAttempt moves into the RUNNING state, which is triggered by getTask().
> canCommit() is a synchronous call on the umbilical; for short-running tasks,
> a canCommit() call can arrive before the async events have been handled.
> {code}
> 2014-05-15 13:21:15,531 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.app.dag.impl.TaskAttemptImpl: TaskAttempt:
> [attempt_1400183444139_0007_1_00_000000_0] started. Is using containerId:
> [container_1400183444139_0007_01_000002] on NM: []
> 2014-05-15 13:21:15,533 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.history.HistoryEventHandler:
> [HISTORY][DAG:dag_1400183444139_0007_1][Event:TASK_ATTEMPT_STARTED]:
> vertexName=datagen, taskAttemptId=attempt_1400183444139_0007_1_00_000000_0,
> startTime=1400185273335, containerId=container_1400183444139_0007_01_000002,
> nodeId=,
> inProgressLogs=/node/containerlogs/container_1400183444139_0007_01_000002/,
> completedLogs=localhost:19888/jobhistory/logs///container_1400183444139_0007_01_000002/v_datagen_attempt_1400183444139_0007_1_00_000000_0/
> 2014-05-15 13:21:15,534 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.app.dag.impl.TaskAttemptImpl:
> attempt_1400183444139_0007_1_00_000000_0 TaskAttempt Transitioned from
> START_WAIT to RUNNING due to event TA_STARTED_REMOTELY
> 2014-05-15 13:21:15,534 INFO [IPC Server handler 6 on 61779]
> org.apache.tez.dag.app.dag.impl.TaskImpl: Task not running. Issuing kill to
> bad commit attempt attempt_1400183444139_0007_1_00_000000_0
> 2014-05-15 13:21:15,534 INFO [AMRM Callback Handler Thread]
> org.apache.tez.dag.app.rm.TaskScheduler: App total resource memory: 0 cpu: -1
> taskAllocations: 1
> 2014-05-15 13:21:15,537 INFO [AsyncDispatcher event handler]
> org.apache.tez.common.counters.Limits: Counter limits initialized with
> parameters: GROUP_NAME_MAX=128, MAX_GROUPS=500, COUNTER_NAME_MAX=64,
> MAX_COUNTERS=1200
> 2014-05-15 13:21:15,541 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.app.dag.impl.TaskImpl: task_1400183444139_0007_1_00_000000
> Task Transitioned from SCHEDULED to RUNNING
> 2014-05-15 13:21:15,544 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.history.HistoryEventHandler:
> [HISTORY][DAG:dag_1400183444139_0007_1][Event:TASK_ATTEMPT_FINISHED]:
> vertexName=datagen, taskAttemptId=attempt_1400183444139_0007_1_00_000000_0,
> startTime=1400185273335, finishTime=1400185275542, timeTaken=2207,
> status=KILLED, diagnostics=, counters=Counters: 0
> 2014-05-15 13:21:15,544 INFO [AsyncDispatcher event handler]
> org.apache.tez.dag.app.dag.impl.TaskAttemptImpl:
> attempt_1400183444139_0007_1_00_000000_0 TaskAttempt Transitioned from
> RUNNING to KILL_IN_PROGRESS due to event TA_KILL_REQUEST
> 2014-05-15 13:21:15,546 INFO [TaskSchedulerEventHandlerThread]
> org.apache.tez.dag.app.rm.TaskSchedulerEventHandler: Processing the event
> EventType: S_TA_ENDED
> {code}
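The race in the description can be modelled deterministically in a single thread. This is a sketch under stated assumptions: CanCommitRace, drainOne(), canCommitStrict() and canCommitTolerant() are hypothetical names, and the queue stands in for the AM's async dispatcher. getTask() queues an event that moves the TaskAttempt to RUNNING, and that handler queues a second event that moves the Task to RUNNING. A synchronous canCommit() that runs between the two hops sees the Task still in SCHEDULED, which matches the "Task not running. Issuing kill" line in the log. The "tolerant" variant, which asks the caller to retry instead of killing, is only an illustration and ignores genuinely bad commit attempts.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

/**
 * Hypothetical, single-threaded model of the TEZ-1122 race. State names
 * mirror the log; the classes and methods are illustrative only.
 */
public class CanCommitRace {
    enum TaskState { SCHEDULED, RUNNING }
    enum AttemptState { START_WAIT, RUNNING }
    enum Decision { COMMIT, RETRY, KILL }

    // Stand-in for the AM's async dispatcher: queued events only take
    // effect when the "dispatcher thread" drains them.
    private final Queue<Runnable> dispatcherQueue = new ArrayDeque<>();
    TaskState taskState = TaskState.SCHEDULED;
    AttemptState attemptState = AttemptState.START_WAIT;

    /** getTask() on the umbilical: queues the TA_STARTED_REMOTELY handling. */
    void getTask() {
        dispatcherQueue.add(() -> {
            // TaskAttempt START_WAIT -> RUNNING, which queues a second
            // event that moves the Task SCHEDULED -> RUNNING.
            attemptState = AttemptState.RUNNING;
            dispatcherQueue.add(() -> taskState = TaskState.RUNNING);
        });
    }

    void drainOne() {
        Runnable event = dispatcherQueue.poll();
        if (event != null) {
            event.run();
        }
    }

    void drainAll() {
        while (!dispatcherQueue.isEmpty()) {
            drainOne();
        }
    }

    /** Behaviour in the log: synchronous check, kill if Task is not RUNNING. */
    Decision canCommitStrict() {
        return taskState == TaskState.RUNNING ? Decision.COMMIT : Decision.KILL;
    }

    /**
     * Hypothetical alternative: treat "Task not yet RUNNING" as transient
     * and ask the caller to poll again. Ignores genuinely bad attempts.
     */
    Decision canCommitTolerant() {
        return taskState == TaskState.RUNNING ? Decision.COMMIT : Decision.RETRY;
    }

    public static void main(String[] args) {
        CanCommitRace am = new CanCommitRace();
        am.getTask();
        am.drainOne(); // attempt is RUNNING; the Task event is still queued
        System.out.println(am.attemptState + "/" + am.taskState
                + " strict=" + am.canCommitStrict()
                + " tolerant=" + am.canCommitTolerant());
        am.drainAll(); // the Task transition is finally handled
        System.out.println(am.attemptState + "/" + am.taskState
                + " strict=" + am.canCommitStrict()
                + " tolerant=" + am.canCommitTolerant());
    }
}
{code}

Running main() prints "RUNNING/SCHEDULED strict=KILL tolerant=RETRY" followed by "RUNNING/RUNNING strict=COMMIT tolerant=COMMIT". The first line corresponds to the window between 13:21:15,534 and 13:21:15,541 in the log.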