[ https://issues.apache.org/jira/browse/TEZ-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated TEZ-1962:
--------------------------------
    Attachment: TEZ-1962.1.txt

Patch to fix this.

The root cause is an NPE in a log statement that runs when the task is 
interrupted. The exception causes TezChild.run to exit without shutting down 
the executor and TaskReporter threads.

The patch fixes the NPE, adds some checks to ensure shutdown is called, and 
changes LocalContainerLauncher to invoke a TezChild shutdown in case of an 
error from TezChild.
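
For readers following along, the failure mode and the shape of the fix 
described above can be sketched roughly as below. This is an illustration 
only, not the code in the attached patch: ShutdownSketch, ChildRunner, 
runLeaky, runSafe and logInterrupt are hypothetical names, and a plain 
single-thread executor stands in for the TezChild executor and TaskReporter 
threads.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; none of these names come from the Tez code base.
final class ShutdownSketch {

    /** Stand-in for a child runner that owns a worker thread. */
    static final class ChildRunner {
        private final ExecutorService executor = Executors.newSingleThreadExecutor();

        /** Leaky variant: an exception on the logging path skips the shutdown call. */
        void runLeaky(String taskId) {
            executor.submit(() -> { });   // starts the worker thread
            logInterrupt(taskId);         // throws NPE when taskId is null
            executor.shutdownNow();       // never reached if the line above throws
        }

        /** Safer variant: shutdown runs in finally, whatever the body throws. */
        void runSafe(String taskId) {
            try {
                executor.submit(() -> { });
                logInterrupt(taskId);
            } finally {
                executor.shutdownNow();
            }
        }

        /** Explicit shutdown hook a launcher can call after a failed run. */
        void shutdown() {
            executor.shutdownNow();
        }

        boolean isShutdown() {
            return executor.isShutdown();
        }
    }

    /** Log statement that dereferences a value which may be null on interrupt. */
    static void logInterrupt(String taskId) {
        System.out.println("Interrupted, task id length: " + taskId.length());
    }

    public static void main(String[] args) {
        ChildRunner leaky = new ChildRunner();
        try {
            leaky.runLeaky(null);
        } catch (NullPointerException e) {
            // Launcher-side cleanup: shut the child down when its run fails.
            System.out.println("leaky shut down before cleanup: " + leaky.isShutdown());
            leaky.shutdown();
        }

        ChildRunner safe = new ChildRunner();
        try {
            safe.runSafe(null);
        } catch (NullPointerException e) {
            System.out.println("safe shut down: " + safe.isShutdown());
        }
    }
}
```

The finally block corresponds to the "checks to ensure shutdown is called", 
and the catch in main mirrors the launcher invoking a shutdown when the child 
reports an error. Without either, the worker thread in the leaky variant 
stays alive after the exception.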

I'm going to open a couple of follow-up JIRAs to change the way tasks are 
cancelled.

Tested locally, and there are no hung threads after this.

[~hitesh] - please review.


> Running out of threads in tez local mode
> ----------------------------------------
>
>                 Key: TEZ-1962
>                 URL: https://issues.apache.org/jira/browse/TEZ-1962
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Gunther Hagleitner
>            Assignee: Siddharth Seth
>            Priority: Critical
>         Attachments: TEZ-1962.1.txt, stack5.txt
>
>
> I've been trying to port the Hive unit tests to Tez local mode. However, 
> local mode seems to leak threads, which causes tests to crash after a while 
> (OOM). See the attached stack trace: there are a lot of "TezChild" threads 
> still hanging around.
> ([~sseth] as discussed offline)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
