[jira] [Commented] (IMPALA-14227) In HA failover, passive catalogd should apply pending HMS events before being active

ASF subversion and git services (Jira) Wed, 30 Jul 2025 12:13:05 -0700


    [ https://issues.apache.org/jira/browse/IMPALA-14227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18011015#comment-18011015 ]


ASF subversion and git services commented on IMPALA-14227:
----------------------------------------------------------

Commit 5bdd9c7f392af0273f73ba9be4c4f549dc54af8a in impala's branch 
refs/heads/master from stiga-huang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=5bdd9c7f3 ]

IMPALA-14227: (Addendum) Add more tests for catalogd HA warm failover

This adds more tests in test_catalogd_ha.py for warm failover.
Refactored _test_metadata_after_failover to run in the following way:
 - Run DDL/DML in the active catalogd.
 - Kill the active catalogd and wait until the failover finishes.
 - Verify the DDL/DML results in the new active catalogd.
 - Restart the killed catalogd.
It accepts two methods as parameters: one that performs the DDL/DML and
one that verifies the results. In the last step, the killed catalogd is
restarted so that two catalogd instances are always running. This lets
us merge these tests into a single test that invokes
_test_metadata_after_failover for different method pairs, which saves
some test time.

The following DDL/DML statements are tested:
 - CreateTable
 - AddPartition
 - REFRESH
 - DropPartition
 - INSERT
 - DropTable
After each failover, the table is verified to be warmed up (i.e. loaded).
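The refactored flow (run DDL/DML, kill the active catalogd, verify on the
new active one, restart the killed one) and its reuse across method pairs
can be sketched roughly as follows. This is a simplified, self-contained
illustration, not the code in test_catalogd_ha.py: FakeHACluster,
run_metadata_after_failover and the DDL/verifier functions are hypothetical
stand-ins, and the sketch only checks a simulated metastore rather than
whether the table is warmed up in the catalogd cache.

```python
# Hypothetical, self-contained sketch of the refactored test flow. The real
# helper lives in test_catalogd_ha.py; FakeHACluster and the DDL/verifier
# functions below are illustrative stand-ins, not Impala test APIs.


class FakeHACluster(object):
    """Two catalogd instances sharing one (simulated) metastore."""

    def __init__(self):
        self.hms_tables = set()          # tables persisted in the metastore
        self.alive = {0: True, 1: True}  # catalogd index -> running?
        self.active = 0                  # index of the active catalogd

    def kill_active(self):
        killed = self.active
        self.alive[killed] = False
        return killed

    def wait_for_failover(self):
        # Promote the first surviving catalogd to active.
        self.active = next(i for i in sorted(self.alive) if self.alive[i])

    def restart(self, idx):
        # The restarted instance rejoins as the standby.
        self.alive[idx] = True


def run_metadata_after_failover(cluster, run_ddl, verify):
    """Simplified stand-in for _test_metadata_after_failover."""
    run_ddl(cluster)                # 1. run DDL/DML on the active catalogd
    killed = cluster.kill_active()  # 2. kill the active catalogd ...
    cluster.wait_for_failover()     #    ... and wait for the failover
    verify(cluster)                 # 3. verify results on the new active one
    cluster.restart(killed)         # 4. restart it so two catalogds remain


def create_table(c):
    c.hms_tables.add("tbl")


def verify_table_exists(c):
    assert "tbl" in c.hms_tables


def drop_table(c):
    c.hms_tables.discard("tbl")


def verify_table_dropped(c):
    assert "tbl" not in c.hms_tables


# One cluster is reused for every (DDL/DML, verifier) pair, which is what
# allows several scenarios to be merged into a single test.
cluster = FakeHACluster()
for ddl, verifier in [(create_table, verify_table_exists),
                      (drop_table, verify_table_dropped)]:
    run_metadata_after_failover(cluster, ddl, verifier)
```

Because step 4 always restarts the killed instance, each pair starts from
a two-catalogd cluster, and the active role simply alternates between the
two instances from one pair to the next.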

Also validate flags at startup to make sure enable_insert_events and
enable_reload_events are both set to true when warm failover is enabled,
i.e. --catalogd_ha_reset_metadata_on_failover=false.

Change-Id: I6b20adeb0bd175592b425e521138c41196347600
Reviewed-on: http://gerrit.cloudera.org:8080/23206
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Wenzhe Zhou <[email protected]>
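The startup check described in the commit message amounts to a simple
rule: if --catalogd_ha_reset_metadata_on_failover is false, both event
flags must be true. A minimal Python sketch of that rule follows. Only the
three flag names come from the commit; the function name, its signature
and the error text are hypothetical, and Impala's actual check is
implemented inside catalogd and may differ.

```python
# Hypothetical sketch of the startup flag check. The three flag names come
# from the commit message; the function name and signature are illustrative
# assumptions, not Impala's real implementation.


def validate_warm_failover_flags(catalogd_ha_reset_metadata_on_failover,
                                 enable_insert_events,
                                 enable_reload_events):
    """Raise ValueError if warm failover is enabled without the event flags."""
    if catalogd_ha_reset_metadata_on_failover:
        # Metadata is reset on failover (not warm failover): nothing to check.
        return
    missing = [name for name, value in (
        ("enable_insert_events", enable_insert_events),
        ("enable_reload_events", enable_reload_events)) if not value]
    if missing:
        raise ValueError(
            "--catalogd_ha_reset_metadata_on_failover=false requires "
            + " and ".join("--%s=true" % name for name in missing))


validate_warm_failover_flags(False, True, True)   # warm failover, valid
validate_warm_failover_flags(True, False, False)  # no warm failover, valid
```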


> In HA failover, passive catalogd should apply pending HMS events before being active
> ------------------------------------------------------------------------------------
>
>                 Key: IMPALA-14227
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14227
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Blocker
>             Fix For: Impala 5.0.0
>
>
> After IMPALA-14074, the passive catalogd can have a warmed-up metadata cache
> during failover (with catalogd_ha_reset_metadata_on_failover=false). However,
> it could still have pending HMS events that are not yet applied, so its
> metadata cache may be stale.
> For instance, if the active catalogd creates a table and then crashes, the
> passive catalogd should apply the CREATE_TABLE event before becoming active.
> Otherwise, Impala queries might see stale metadata for a while (until the new
> catalogd catches up with the HMS events generated by the previous active
> catalogd).
> There is a test failure caused by this:
> {code:python}
> custom_cluster/test_catalogd_ha.py:540: in test_warmed_up_metadata_after_failover
>     latest_catalogd = self._test_metadata_after_failover(unique_database, True)
> custom_cluster/test_catalogd_ha.py:584: in _test_metadata_after_failover
>     self.execute_query_expect_success(self.client, "describe %s.tbl" % unique_database)
> common/impala_test_suite.py:1121: in wrapper
>     return function(*args, **kwargs)
> common/impala_test_suite.py:1131: in execute_query_expect_success
>     result = cls.__execute_query(impalad_client, query, query_options, user)
> common/impala_test_suite.py:1294: in __execute_query
>     return impalad_client.execute(query, user=user)
> common/impala_connection.py:687: in execute
>     cursor.execute(sql_stmt, configuration=self.__query_options)
> ../infra/python/env-gcc10.4.0/lib/python2.7/site-packages/impala/hiveserver2.py:392: in execute
>     configuration=configuration)
> ../infra/python/env-gcc10.4.0/lib/python2.7/site-packages/impala/hiveserver2.py:443: in execute_async
>     self._execute_async(op)
> ../infra/python/env-gcc10.4.0/lib/python2.7/site-packages/impala/hiveserver2.py:462: in _execute_async
>     operation_fn()
> ../infra/python/env-gcc10.4.0/lib/python2.7/site-packages/impala/hiveserver2.py:440: in op
>     run_async=True)
> ../infra/python/env-gcc10.4.0/lib/python2.7/site-packages/impala/hiveserver2.py:1324: in execute
>     return self._operation('ExecuteStatement', req, False)
> ../infra/python/env-gcc10.4.0/lib/python2.7/site-packages/impala/hiveserver2.py:1244: in _operation
>     resp = self._rpc(kind, request, safe_to_retry)
> ../infra/python/env-gcc10.4.0/lib/python2.7/site-packages/impala/hiveserver2.py:1181: in _rpc
>     err_if_rpc_not_ok(response)
> ../infra/python/env-gcc10.4.0/lib/python2.7/site-packages/impala/hiveserver2.py:867: in err_if_rpc_not_ok
>     raise HiveServer2Error(resp.status.errorMessage)
> E   HiveServer2Error: Query eb405217bbb418ee:a1033c0000000000 failed:
> E   AnalysisException: Could not resolve path: 'test_warmed_up_metadata_after_failover_452d93b4.tbl'
> {code}
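The behavior the issue asks for, where the passive catalogd applies every
pending HMS event before it starts serving as the active catalogd, can be
illustrated with the minimal sketch below. It is not Impala's
implementation: StandbyCatalog and its methods are hypothetical names, HMS
events are modeled as (event_id, event_type, table_name) tuples in a plain
list instead of being read from the metastore's notification log, and only
CREATE_TABLE and DROP_TABLE events are handled.

```python
# Hypothetical sketch: a standby catalogd catches up on pending HMS events
# before it starts serving as the active catalogd. Events are modeled as
# (event_id, event_type, table_name) tuples; in a real deployment they would
# come from the HMS notification log, which this sketch does not access.


class StandbyCatalog(object):
    def __init__(self):
        self.tables = set()            # warmed-up metadata cache (names only)
        self.last_synced_event_id = 0  # highest HMS event id already applied
        self.active = False

    def apply_event(self, event):
        event_id, event_type, table = event
        if event_type == "CREATE_TABLE":
            self.tables.add(table)
        elif event_type == "DROP_TABLE":
            self.tables.discard(table)
        self.last_synced_event_id = event_id

    def become_active(self, hms_events):
        # Apply every event newer than the last synced one, in event-id order,
        # and only then flip to active so queries never see the stale cache.
        for event in sorted(hms_events):
            if event[0] > self.last_synced_event_id:
                self.apply_event(event)
        self.active = True


standby = StandbyCatalog()
events = [(1, "CREATE_TABLE", "db.t1")]
standby.apply_event(events[0])                # processed while still passive
events.append((2, "CREATE_TABLE", "db.tbl"))  # active creates tbl, then crashes
standby.become_active(events)
```

In this toy model, skipping the catch-up loop in become_active would leave
"db.tbl" missing from the new active catalogd's cache, which mirrors the
"Could not resolve path" failure in the test above.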



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
