[ 
https://issues.apache.org/jira/browse/IMPALA-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853196#comment-17853196
 ] 

ASF subversion and git services commented on IMPALA-12616:
----------------------------------------------------------

Commit 1935f9e1a199c958c5fb12ad53277fa720d6ae5c in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=1935f9e1a ]

IMPALA-12616: Fix test_restart_services.py::TestRestart tests for S3

The test_restart_catalogd_while_handling_rpc_response* tests
from custom_cluster/test_restart_services.py have been failing
consistently on S3. The alter table statement is expected to
succeed, but instead it fails with:
"CatalogException: Detected catalog service ID changes"
This manifests as a timeout waiting for the statement to reach
the finished state.

The test relies on specific timing with a sleep injected via a
debug action. The failure stems from the catalog being slower
on S3. The alter table wakes up before the catalog service ID
change has fully completed, and it fails when it sees the
catalog service ID change.

This change increases two sleep times:
1. The sleep before restarting the catalogd goes from 0.5 seconds
   to 5 seconds. This gives the catalogd longer to receive the
   alter table request and respond to the impalad.
2. The WAIT_BEFORE_PROCESSING_CATALOG_UPDATE sleep goes from
   10 seconds to 30 seconds, so the alter table statement doesn't
   wake up until the catalog service ID change is finalized.

The test still verifies that the expected messages appear in the
impalad logs, so it continues to exercise the same condition.
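
The timing argument above can be sketched as a toy model. This is
only an illustration, not code from the Impala test suite: the
function name, the parameter names, and the example values for
response_time_s and finalize_time_s are all hypothetical, and the
real sequence of events in the cluster is more involved.

```python
def alter_table_outcome(restart_delay_s, wake_delay_s,
                        response_time_s, finalize_time_s):
    """Toy model of the race; all times are seconds after query submission.

    Assumptions (hypothetical, for illustration only):
      - catalogd answers the alter table RPC at response_time_s
      - the test restarts catalogd at restart_delay_s
      - the catalog service ID change is finalized finalize_time_s
        after the restart
      - the alter table sleeps wake_delay_s after receiving the response
    """
    if restart_delay_s < response_time_s:
        # catalogd was restarted before it could answer the impalad.
        return "restarted before catalogd responded"
    wake_at = response_time_s + wake_delay_s
    finalized_at = restart_delay_s + finalize_time_s
    if wake_at < finalized_at:
        # The statement wakes up while the service ID change is in flight.
        return "woke before service ID change finalized"
    return "ok"


# Hypothetical slow-catalog numbers: response after 0.2s, finalize takes 15s.
old = alter_table_outcome(0.5, 10, response_time_s=0.2, finalize_time_s=15)
new = alter_table_outcome(5, 30, response_time_s=0.2, finalize_time_s=15)
```

Under these made-up numbers the old sleeps (0.5s and 10s) lose the
race while the new sleeps (5s and 30s) do not. How much slower the
catalog really is on S3 is not stated in this commit.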

This also modifies the tests to use wait_for_finished_timeout()
rather than wait_for_state(). wait_for_finished_timeout() bails
out immediately if the query fails rather than waiting
unnecessarily for the full timeout. The tests now also clear the
query options so that later statements don't inherit the
debug_action that the alter table statement used.
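
The difference between the two waiting styles can be sketched as
follows. This is a minimal, self-contained illustration of a
fail-fast polling loop, not the Impala implementation: QueryState,
QueryFailed, and wait_until_finished are hypothetical names, and
get_state stands in for whatever call reports the query's state.

```python
import time
from enum import Enum


class QueryState(Enum):
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


class QueryFailed(Exception):
    """Raised as soon as the polled query reports a failed state."""


def wait_until_finished(get_state, timeout_s, poll_interval_s=0.05):
    """Poll get_state() until FINISHED, failure, or timeout.

    Returns True if the query finished within timeout_s and False if
    the timeout expired. Raises QueryFailed on the first FAILED state
    instead of continuing to poll until the deadline.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_state()
        if state is QueryState.FINISHED:
            return True
        if state is QueryState.FAILED:
            raise QueryFailed("query failed before reaching FINISHED")
        time.sleep(poll_interval_s)
    return False
```

A loop that only checks for the expected state would keep polling a
failed query until the deadline and then report a timeout, which is
the symptom described at the top of this commit message.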

Testing:
 - Ran the tests 100x in a loop on S3
 - Ran the tests 100x in a loop on HDFS

Change-Id: Ieb5699b8fb0b2ad8bad4ac30922a7b4d7fa17d29
Reviewed-on: http://gerrit.cloudera.org:8080/21485
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Daniel Becker <daniel.bec...@cloudera.com>


> test_restart_catalogd_while_handling_rpc_response* tests fail not reaching 
> expected states
> ------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-12616
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12616
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 1.4.2
>            Reporter: Andrew Sherman
>            Assignee: Daniel Becker
>            Priority: Critical
>
> There are failures in both
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_timeout
> and
> custom_cluster.test_restart_services.TestRestart.test_restart_catalogd_while_handling_rpc_response_with_max_iters.
> Both look the same:
> {code:java}
> custom_cluster/test_restart_services.py:232: in test_restart_catalogd_while_handling_rpc_response_with_timeout
>     self.wait_for_state(handle, self.client.QUERY_STATES["FINISHED"], max_wait_time)
> common/impala_test_suite.py:1181: in wait_for_state
>     self.wait_for_any_state(handle, [expected_state], timeout, client)
> common/impala_test_suite.py:1199: in wait_for_any_state
>     raise Timeout(timeout_msg)
> E   Timeout: query '6a4e0bad9b511ccf:bf93de6800000000' did not reach one of the expected states [4], last known state 5
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
