[ https://issues.apache.org/jira/browse/IMPALA-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16707533#comment-16707533 ]

Lars Volker commented on IMPALA-6591:
-------------------------------------

I've seen this again. Here's the loop with the failing assertion:

 
{code}
# In practice, sending SIGINT to the shell process doesn't always seem to get caught
# (and a search shows up some bugs in Python where SIGINT might be ignored). So retry
# for 30s until one signal takes.
while impalad.get_num_in_flight_queries() == 1:
  time.sleep(1)
  LOG.info("Sending signal...")
  os.kill(p.pid(), signal.SIGINT)
  num_tries += 1
  assert num_tries < 30, "SIGINT was not caught by shell within 30s"
{code}

{{p}} is an {{ImpalaShell}} object. It looks possible that the shell process 
has already terminated while the query is still registered. I think we should 
at least improve the code to log whether the shell process is still alive, so 
we can tell the two cases apart and better understand where to look next.
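
A minimal sketch of what that logging could look like, not the actual test code. It assumes the shell's underlying {{subprocess.Popen}} handle is reachable from the test (passed in here as {{proc}}), which I haven't verified for {{ImpalaShell}}. The function name {{interrupt_until_cancelled}} and the {{query_in_flight}} callable (standing in for the {{impalad.get_num_in_flight_queries() == 1}} check) are made up for illustration.

```python
import logging
import signal
import time

LOG = logging.getLogger(__name__)


def interrupt_until_cancelled(proc, query_in_flight, max_tries=30, interval_s=1.0):
    """Send SIGINT to proc until query_in_flight() returns False.

    proc is assumed to be a subprocess.Popen object. Returns the number of
    attempts made. If the query is still registered after max_tries attempts,
    raises AssertionError with a message saying whether the process was still
    alive or had already exited.
    """
    num_tries = 0
    while query_in_flight():
        time.sleep(interval_s)
        exit_code = proc.poll()  # None while the process is still running.
        if exit_code is None:
            LOG.info("Shell process %d is alive, sending SIGINT...", proc.pid)
            proc.send_signal(signal.SIGINT)
        else:
            # Nothing left to signal; the query outlived the shell process.
            LOG.info("Shell process %d already exited with code %s",
                     proc.pid, exit_code)
        num_tries += 1
        if exit_code is None:
            state = "is still alive"
        else:
            state = "exited with code %s" % exit_code
        assert num_tries < max_tries, (
            "Query still in flight after %d tries; shell process %s"
            % (num_tries, state))
    return num_tries
```

Using {{poll()}} rather than probing the pid with signal 0 matters here: a child that has exited but not yet been waited on still has a pid, so a pid probe would report it as alive.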

[~csringhofer] - I picked you randomly; feel free to find another person or 
assign back to me if you're swamped.

> TestClientSsl hung for a long time
> ----------------------------------
>
>                 Key: IMPALA-6591
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6591
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.12.0, Impala 3.1.0, Impala 3.2.0
>            Reporter: Tim Armstrong
>            Assignee: Sailesh Mukil
>            Priority: Critical
>              Labels: broken-build, flaky, hang
>
> {noformat}
> 18:49:13 custom_cluster/test_catalog_wait.py::TestCatalogWait::test_delayed_catalog PASSED
> 18:49:53 custom_cluster/test_client_ssl.py::TestClientSsl::test_ssl[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: text/none] Build timed out (after 1,440 minutes). Marking the build as failed.
> 12:20:15 Build was aborted
> 12:20:15 Archiving artifacts
> {noformat}
> I unfortunately wasn't able to get any logs...


