[ 
https://issues.apache.org/jira/browse/TEZ-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17582702#comment-17582702
 ] 

zhangbutao commented on TEZ-4445:
---------------------------------

[~zhengchenyu] sure, please see attachment *Tez-task-stuck-full-jstack.txt* 

> Tez task can get stuck when waiting for all initializers on 
> LogicalIOProcessorRuntimeTask:initialize
> ----------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-4445
>                 URL: https://issues.apache.org/jira/browse/TEZ-4445
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.10.2
>            Reporter: zhangbutao
>            Assignee: zhangbutao
>            Priority: Major
>         Attachments: 
> Tez-task-stuck-LogicalIOProcessorRuntimeTask-initialize.jpg, 
> Tez-task-stuck-full-jstack.txt, Tez-task-stuck-log.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Cluster environment: Hadoop 3.1.0, Hive 3.1.0, Tez 0.9.2
> In a busy cluster, I find that some Tez tasks can get stuck in 
> LogicalIOProcessorRuntimeTask:initialize, waiting for all initializers to 
> finish. Such a stuck task can cause the entire Tez job to run forever. If I 
> kill the Tez job and resubmit it, the job often runs successfully. Please 
> see the task jstack attachment 
> _*Tez-task-stuck-LogicalIOProcessorRuntimeTask-initialize.jpg*_ and the task log 
> _*Tez-task-stuck-log.png*_ for more information.
> I have not found the root cause of the task getting stuck, but I 
> think it would be good to add a timeout when waiting for initializers. That 
> way, a stuck task can be interrupted after a certain time, and a new 
> task attempt can be launched immediately.
>  
>  
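
For illustration only, the timeout idea in the description could look roughly like the sketch below. It assumes the initializers are submitted to a java.util.concurrent.CompletionService and that the waiting loop can use poll() with a deadline instead of a blocking take(). The class InitializerWait, the method awaitAll, and the timeout parameter are hypothetical names, not existing Tez code or configuration.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of Tez. */
final class InitializerWait {
  private InitializerWait() {}

  /**
   * Waits for {@code count} tasks previously submitted to {@code service}.
   * Throws TimeoutException if they have not all completed within
   * {@code timeoutMillis} in total, measured from the start of this call.
   */
  static void awaitAll(CompletionService<Void> service, int count, long timeoutMillis)
      throws InterruptedException, ExecutionException, TimeoutException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    for (int done = 0; done < count; done++) {
      long remaining = deadline - System.nanoTime();
      // poll() returns null when nothing completes in time; take() would block forever.
      Future<Void> finished = remaining > 0
          ? service.poll(remaining, TimeUnit.NANOSECONDS)
          : service.poll();
      if (finished == null) {
        throw new TimeoutException(
            "Only " + done + " of " + count + " initializers finished within "
                + timeoutMillis + " ms");
      }
      // Surfaces any exception thrown by the completed initializer.
      finished.get();
    }
  }
}
```

The caller would treat the TimeoutException as a task failure so that a new attempt can be scheduled. A single overall deadline is used here rather than a per-initializer timeout, so the total wait stays bounded regardless of how many initializers there are.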



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
