[ 
https://issues.apache.org/jira/browse/DRILL-4595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz closed DRILL-4595.
---------------------------------
    Resolution: Fixed

> FragmentExecutor.fail() should interrupt the fragment thread to avoid 
> possible query hangs
> ------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4595
>                 URL: https://issues.apache.org/jira/browse/DRILL-4595
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.4.0
>            Reporter: Deneche A. Hakim
>            Assignee: Deneche A. Hakim
>             Fix For: Future
>
>
> When a fragment fails, it is assumed that it will be able to close itself and 
> send its FAILED state to the foreman, which will then cancel any running 
> fragments. FragmentExecutor.cancel() interrupts the fragment thread, making 
> sure those fragments don't stay blocked.
> However, if a fragment is already blocked when its fail() method is called, 
> the foreman may never be notified and the query will hang forever. One such 
> scenario is the following:
> - generally it's a CTAS running on a large cluster (lots of writers running 
> in parallel)
> - logs show that the user channel was closed and UserServer caused the root 
> fragment to move to a FAILED state
> - jstack shows that the root fragment is blocked in its receiver, waiting for 
> data
> - jstack also shows that ALL other fragments are no longer running, and the 
> logs show that all of them succeeded
> - the foreman waits *forever* for the root fragment to finish
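
The hang described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not Drill's code: SketchFragment, its receiver queue, the fail(boolean) signature and foremanNotified() are all hypothetical names invented for illustration. A worker thread blocks on an empty queue (standing in for the root fragment's receiver), and fail() either only records the FAILED state or also interrupts the thread, which is the behaviour the issue title asks for.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a fragment executor; NOT Drill's actual class.
class SketchFragment implements Runnable {
    enum State { RUNNING, FAILED, FINISHED }

    // Stands in for the receiver the root fragment is blocked on.
    private final BlockingQueue<String> receiver = new LinkedBlockingQueue<>();
    private final AtomicReference<State> state = new AtomicReference<>(State.RUNNING);
    private final AtomicReference<Thread> runner = new AtomicReference<>();
    private final CountDownLatch started = new CountDownLatch(1);
    // Counted down when the fragment "reports its final state to the foreman".
    private final CountDownLatch reported = new CountDownLatch(1);

    @Override
    public void run() {
        runner.set(Thread.currentThread());
        started.countDown();
        try {
            receiver.take(); // blocks indefinitely: no data ever arrives
            state.compareAndSet(State.RUNNING, State.FINISHED);
        } catch (InterruptedException e) {
            // Woken up by an interrupt; fall through to the final report.
        } finally {
            reported.countDown();
        }
    }

    // Records the failure; optionally interrupts the thread running the fragment.
    void fail(boolean interruptThread) {
        state.set(State.FAILED);
        Thread t = runner.get();
        if (interruptThread && t != null) {
            t.interrupt();
        }
    }

    // Returns true if the fragment reported back within one second of fail().
    static boolean foremanNotified(boolean interruptOnFail) {
        SketchFragment fragment = new SketchFragment();
        Thread thread = new Thread(fragment, "sketch-fragment");
        thread.setDaemon(true);
        thread.start();
        try {
            fragment.started.await();
            fragment.fail(interruptOnFail);
            return fragment.reported.await(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        } finally {
            thread.interrupt(); // cleanup so the blocked worker does not linger
        }
    }

    public static void main(String[] args) {
        System.out.println("fail() without interrupt, foreman notified: "
                + foremanNotified(false));
        System.out.println("fail() with interrupt, foreman notified: "
                + foremanNotified(true));
    }
}
```

In this sketch, foremanNotified(false) times out because the worker stays parked in take() and never reaches its final report, while foremanNotified(true) returns promptly because the interrupt makes take() throw InterruptedException. How the real FragmentExecutor tracks its thread and reports state is not shown in this message.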



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
