[ 
https://issues.apache.org/jira/browse/HAWQ-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Guo updated HAWQ-1334:
---------------------------
    Description: 
In the QD thread function dispmgt_thread_func_run(), if a failure occurs in 
either the QE or the QD itself, the thread cancels the query and then cleans 
up. The main process for the query needs the error code in meleeResults to be 
set so that it proceeds to cancel the query promptly; otherwise it has to wait 
for a timeout. Normally dispmgt_thread_func_run() should set the error code, 
but I found some cases that do not, e.g. when poll() fails with ENOMEM. One 
symptom of this issue is that a query sometimes hangs when it is canceled for 
some reason.

The potential solutions are:

1) Make each branch jump ("goto error_cleanup") set a proper error code 
itself. This is not an easy job.
2) Add a "guard" function in the error_cleanup code that sets an error code if 
none has been set, i.e. when 1) was not done properly.

I'd like this JIRA to cover 2).
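
As a rough illustration of option 2), the guard could look something like the 
sketch below. Every name in it (SketchResults, the SKETCH_ERRCODE_* values, 
sketch_guard_error_code, sketch_thread_func_run) is hypothetical and only 
stands in for the real dispatcher code, which this issue does not show. The 
poll() failure is simulated through a parameter, and the sketch ignores 
whatever synchronization the real code needs to share the error code with the 
main process.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical error codes; the real HAWQ values are not shown in this issue. */
enum {
    SKETCH_ERRCODE_NONE = 0,      /* nothing recorded yet */
    SKETCH_ERRCODE_QE_FAILED = 1, /* set by a branch that reports its own error */
    SKETCH_ERRCODE_UNKNOWN = 2    /* fallback recorded by the guard */
};

/* Hypothetical stand-in for the result state read by the main process. */
typedef struct SketchResults {
    int errcode;
} SketchResults;

/* Guard: record a fallback error code only if no branch has set one. */
static void sketch_guard_error_code(SketchResults *results)
{
    assert(results != NULL);
    if (results->errcode == SKETCH_ERRCODE_NONE)
        results->errcode = SKETCH_ERRCODE_UNKNOWN;
}

/*
 * Simplified thread body. poll_errno simulates the errno left by a failed
 * poll() call (0 means poll() succeeded); qe_failed simulates a QE failure.
 */
static void sketch_thread_func_run(SketchResults *results,
                                   int poll_errno, int qe_failed)
{
    if (qe_failed) {
        /* This branch sets its own error code before jumping. */
        results->errcode = SKETCH_ERRCODE_QE_FAILED;
        goto error_cleanup;
    }
    if (poll_errno == ENOMEM) {
        /* This branch jumps without setting an error code. */
        goto error_cleanup;
    }
    return; /* success: no error code is recorded */

error_cleanup:
    /* The guard ensures the main process always sees a non-zero code. */
    sketch_guard_error_code(results);
    /* Query cancellation and resource cleanup would follow here. */
}
```

With this shape, a call such as sketch_thread_func_run(&r, ENOMEM, 0) leaves 
r.errcode at the fallback value, while a branch that already set a code keeps 
its own value because the guard only fills in a missing one.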

In general, the cleanup code in the QD seems obscure and inelegant. Maybe we 
should file another JIRA to refactor its error handling logic.

  was:
In the QD thread dispmgt_thread_func_run(), if a failure occurs in either the 
QE or the QD itself, the thread cancels the query and then cleans up. The main 
process for the query needs the error code in meleeResults to be set so that 
it proceeds to cancel the query promptly; otherwise it has to wait for a 
timeout. Normally dispmgt_thread_func_run() should set the error code, but I 
found some cases that do not, e.g. when poll() fails with ENOMEM. One symptom 
of this issue is that a query sometimes hangs when it is canceled for some 
reason.

The potential solutions are:

1) Make each branch jump ("goto error_cleanup") set a proper error code 
itself.
2) Add a "guard" function in the error_cleanup code that sets an error code if 
none has been set.

In general, the cleanup code in the QD seems obscure and inelegant. Maybe we 
should file another JIRA to refactor its error handling logic.


> QD thread should set error code if failing so that the main process for the 
> query could exit soon
> -------------------------------------------------------------------------------------------------
>
>                 Key: HAWQ-1334
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1334
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Dispatcher
>            Reporter: Paul Guo
>            Assignee: Ed Espino
>
> In the QD thread function dispmgt_thread_func_run(), if a failure occurs in 
> either the QE or the QD itself, the thread cancels the query and then cleans 
> up. The main process for the query needs the error code in meleeResults to 
> be set so that it proceeds to cancel the query promptly; otherwise it has to 
> wait for a timeout. Normally dispmgt_thread_func_run() should set the error 
> code, but I found some cases that do not, e.g. when poll() fails with 
> ENOMEM. One symptom of this issue is that a query sometimes hangs when it is 
> canceled for some reason.
> The potential solutions are:
> 1) Make each branch jump ("goto error_cleanup") set a proper error code 
> itself. This is not an easy job.
> 2) Add a "guard" function in the error_cleanup code that sets an error code 
> if none has been set, i.e. when 1) was not done properly.
> I'd like this JIRA to cover 2).
> In general, the cleanup code in the QD seems obscure and inelegant. Maybe we 
> should file another JIRA to refactor its error handling logic.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
