[ 
https://issues.apache.org/jira/browse/IMPALA-6662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-6662 stopped by Tim Armstrong.
---------------------------------------------
> Make stress test resilient to hangs due to client crashes
> ---------------------------------------------------------
>
>                 Key: IMPALA-6662
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6662
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>            Reporter: Sailesh Mukil
>            Assignee: Sailesh Mukil
>            Priority: Critical
>
> The concurrent_select.py process starts multiple subprocesses (called query 
> runners) to run the queries. It also starts two threads: the query producer 
> thread and the query consumer thread. The query producer thread adds queries 
> to a query queue, and the query consumer thread pulls queries off the queue 
> and feeds them to the query runners.
> The query runner, once it gets queries, does the following:
> {code:java}
> # Pseudocode. Real code here:
> # https://github.com/apache/impala/blob/d49f629c447ea59ad73ceeb0547fde4d41c651d1/tests/stress/concurrent_select.py#L583-L595
> with _submit_query_lock:
>     increment(num_queries_started)
> run_query()    # One runner crashes here.
> increment(num_queries_finished)
> {code}
> One of the runners crashes inside run_query() and therefore never increments 
> num_queries_finished.
> Another thread, which is supposed to check for memory leaks (but actually 
> doesn't), periodically acquires '_submit_query_lock' and waits for the number 
> of running queries to reach 0 before releasing the lock:
> https://github.com/apache/impala/blob/d49f629c447ea59ad73ceeb0547fde4d41c651d1/tests/stress/concurrent_select.py#L449-L511
> However, in the above case, the number of running queries never reaches 0, 
> because the crashed query runner exited without incrementing 
> 'num_queries_finished'. The poll_mem_usage() function therefore holds the 
> lock indefinitely, so no new queries can be submitted and the stress test 
> never completes.
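
The hang described above could be avoided in roughly two ways: make sure the finished counter is updated even when run_query() raises, and bound the wait in the polling thread so it gives up when a runner has died. The following is only a minimal sketch of that idea, not the fix that was applied to concurrent_select.py. The names QueryCounters, run_one_query and wait_for_idle are hypothetical, and the counters are assumed to be multiprocessing.Value integers shared between the runners and the poller.

```python
import multiprocessing
import time


class QueryCounters:
    """Hypothetical stand-in for num_queries_started / num_queries_finished."""

    def __init__(self):
        self.submit_lock = multiprocessing.Lock()
        self.started = multiprocessing.Value("i", 0)
        self.finished = multiprocessing.Value("i", 0)

    def running(self):
        # Queries that were started but have not been counted as finished.
        return self.started.value - self.finished.value


def run_one_query(counters, run_query):
    """Run one query; count it as finished even if run_query() raises."""
    with counters.submit_lock:
        with counters.started.get_lock():
            counters.started.value += 1
    try:
        run_query()
    finally:
        # Covers Python-level exceptions only. A hard crash of the process
        # (for example a segfault) skips this block, hence wait_for_idle().
        with counters.finished.get_lock():
            counters.finished.value += 1


def wait_for_idle(counters, runners, timeout_s=5.0, poll_s=0.05):
    """Hold the submit lock until no queries are running.

    Returns True once the running count reaches 0. Returns False if a runner
    exited abnormally or the timeout expired, so the caller can abort the test
    instead of holding the lock forever. Each item in `runners` is assumed to
    expose is_alive() and exitcode, as multiprocessing.Process does.
    """
    deadline = time.monotonic() + timeout_s
    with counters.submit_lock:
        while counters.running() > 0:
            for runner in runners:
                if not runner.is_alive() and runner.exitcode not in (0, None):
                    return False
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)
    return True
```

In this sketch the polling thread would call wait_for_idle() with the list of query runner processes and treat a False result as a test failure. The lock is released on every return path because the wait happens inside the with block.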



