Quanlong Huang created IMPALA-14283:
---------------------------------------
Summary: Coordinator could use stale metadata in catalogd warm
failover
Key: IMPALA-14283
URL: https://issues.apache.org/jira/browse/IMPALA-14283
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang
There is a corner case in local catalog mode that when catalogd warm failover
finishes, coordinator fetches the new table object from the new active catalogd
and it might coincidently "activate" the stale metadata in the cache. This
happens when matching all the following conditions:
* The new table object has a catalog version that has been used in the
previous active catalogd and it was assigned to the same table name.
* The coordinator has loaded table metadata associated to that version from
the previous active catalogd, e.g. partition list, partitions or column stats
are loaded and still in the cache (not aged out or evicted yet).
This bug is found by *TestCatalogdHA.test_warmed_up_metadata_failover_catchup*
{code:python}
tests/custom_cluster/test_catalogd_ha.py:640: in
test_warmed_up_metadata_failover_catchup
db, self._add_new_partition, self._verify_new_partition)
tests/custom_cluster/test_catalogd_ha.py:762: in _test_metadata_after_failover
assert str(active_catalogd.service.service_port) in str(e)
E assert '26001' in "assert 'year=2025/month=1' in
'Total\\t\\t-1\\t0\\t0B\\t0B\\t\\t\\t\\t\\t'"
E + where '26001' = str(26001)
E + where 26001 = <tests.common.impala_service.CatalogdService object at
0x7f5d8c24fd10>.service_port
E + where <tests.common.impala_service.CatalogdService object at
0x7f5d8c24fd10> = <CatalogdProcess PID: None
(/root/Impala-upstream/be/build/latest/service/cata...te_store_subscriber_port=23021
-webserver_
port=25021 -enable_catalogd_ha=true)>.service
E + and "assert 'year=2025/month=1' in
'Total\\t\\t-1\\t0\\t0B\\t0B\\t\\t\\t\\t\\t'" = str(AssertionError(u"assert
'year=2025/month=1' in 'Total\\t\\t-1\\t0\\t0B\\t0B\\t\\t\\t\\t\\t'",)){code}
The test runs an AddPartition statement and kills the active catalogd, then
runs ShowPartitions in the new active catalogd. It fails since coordinator
returns an empty partition list. Test logs:
{noformat}
-- 2025-07-31 17:10:00,755 INFO MainThread: hs2: executing against Impala
at quanlong-ha-test.vpc.cloudera.com:21050. session:
ea4e2fca5d5b35f3:c51bd83bdb2417a9 main_cursor: True user: None
alter table warmup_test_db.tbl add partition(year=2025, month=1);
-- 2025-07-31 17:10:01,119 INFO MainThread:
b84a8a27014a3340:af77289200000000: query started
-- 2025-07-31 17:10:01,120 INFO MainThread:
b84a8a27014a3340:af77289200000000: getting log for operation
-- 2025-07-31 17:10:01,120 INFO MainThread:
b84a8a27014a3340:af77289200000000: getting runtime profile operation
-- 2025-07-31 17:10:01,120 INFO MainThread:
b84a8a27014a3340:af77289200000000: closing query for operation
-- 2025-07-31 17:10:01,147 INFO MainThread: Found PID 102378 for
/root/Impala-upstream/be/build/latest/service/catalogd -logbufsecs=5 -v=1
-max_log_files=10 -log_rotation_match_pid=true -log_filename=catalogd_n
ode1 -log_dir=/tmp/ -kudu_master_hosts localhost --catalog_topic_mode=minimal
--catalogd_ha_reset_metadata_on_failover=false
--debug_actions=catalogd_event_processing_delay:SLEEP@1000
--enable_reload_events=true --
warmup_tables_config_file=file:///root/Impala-upstream/testdata/data/warmup_test_config.txt
-catalog_service_port=26001 -state_store_subscriber_port=23021
-webserver_port=25021 -enable_catalogd_ha=true
-- 2025-07-31 17:10:01,172 INFO MainThread: Killing <CatalogdProcess PID:
102378 (/root/Impala-upstream/be/build/latest/service/catalogd -logbufsecs=5
-v=1 -max_log_files=10 -log_rotation_match_pid=true -log_fi
lename=catalogd_node1 -log_dir=/tmp/ -kudu_master_hosts localhost
--catalog_topic_mode=minimal --catalogd_ha_reset_metadata_on_failover=false
--debug_actions=catalogd_event_processing_delay:SLEEP@1000 --enable_relo
ad_events=true
--warmup_tables_config_file=file:///root/Impala-upstream/testdata/data/warmup_test_config.txt
-catalog_service_port=26001 -state_store_subscriber_port=23021
-webserver_port=25021 -enable_catalogd_ha=
true)> with signal 9
-- 2025-07-31 17:10:01,177 INFO MainThread: Getting metric:
catalog-server.active-status from quanlong-ha-test.vpc.cloudera.com:25020
-- 2025-07-31 17:10:01,180 INFO MainThread: Waiting for metric value
'catalog-server.active-status'=True. Current value: False. total_wait: 0s
-- 2025-07-31 17:10:01,180 INFO MainThread: Sleeping 1s before next retry.
-- 2025-07-31 17:10:02,181 INFO MainThread: Getting metric:
catalog-server.active-status from quanlong-ha-test.vpc.cloudera.com:25020
-- 2025-07-31 17:10:02,184 INFO MainThread: Metric
'catalog-server.active-status' has reached desired value: True. total_wait:
1.00396800041s
-- 2025-07-31 17:10:02,188 INFO MainThread: hs2: executing against Impala
at quanlong-ha-test.vpc.cloudera.com:21050. session:
ea4e2fca5d5b35f3:c51bd83bdb2417a9 main_cursor: True user: None
show partitions warmup_test_db.tbl;
-- 2025-07-31 17:10:03,737 INFO MainThread:
664e4d97f4712dfb:e4d0877500000000: query started
-- 2025-07-31 17:10:03,738 INFO MainThread:
664e4d97f4712dfb:e4d0877500000000: getting log for operation
-- 2025-07-31 17:10:03,738 INFO MainThread:
664e4d97f4712dfb:e4d0877500000000: getting runtime profile operation
-- 2025-07-31 17:10:03,738 INFO MainThread:
664e4d97f4712dfb:e4d0877500000000: closing query for operation
-- 2025-07-31 17:10:03,739 INFO MainThread: partition result:
['Total\t\t-1\t0\t0B\t0B\t\t\t\t\t']{noformat}
Profile of ShowPartitions indicates some stale metadata are used:
{noformat}
Original Table Versions: warmup_test_db.tbl, 314, 1754007002807, Thu Jul
31 17:10:02 PDT 2025
- CatalogFetch.ColumnStats.Hits: 13 <--- This shouldn't hit
- CatalogFetch.ColumnStats.Misses: 0
- CatalogFetch.ColumnStats.Requests: 13
- CatalogFetch.ColumnStats.Time: 0
- CatalogFetch.DatabaseList.Hits: 1
- CatalogFetch.DatabaseList.Misses: 0
- CatalogFetch.DatabaseList.Requests: 1
- CatalogFetch.DatabaseList.Time: 0
- CatalogFetch.PartitionLists.Hits: 1 <--- This shouldn't hit
- CatalogFetch.PartitionLists.Misses: 0
- CatalogFetch.PartitionLists.Requests: 1
- CatalogFetch.PartitionLists.Time: 0
- CatalogFetch.RPCs.Bytes: 1.78 KB (1818)
- CatalogFetch.RPCs.Requests: 2
- CatalogFetch.RPCs.Time: 1s531ms
- CatalogFetch.StorageLoad.Time: 22ms
- CatalogFetch.TableList.Hits: 1
- CatalogFetch.TableList.Misses: 1
- CatalogFetch.TableList.Requests: 2
- CatalogFetch.TableList.Time: 1s516ms
- CatalogFetch.Tables.Hits: 0
- CatalogFetch.Tables.Misses: 1
- CatalogFetch.Tables.Requests: 1
- CatalogFetch.Tables.Time: 15ms
- ExecutorGroupsConsidered: 1 (1){noformat}
In logs of the new active catalogd (catalogd.INFO), the catalog version of the
table is 314:
{noformat}
I20250731 17:10:02.760269 103771 CatalogOpExecutor.java:4872] Found 1/1
partitions to add in table warmup_test_db.tbl from event 888
I20250731 17:10:02.807034 103771 HdfsTable.java:781] Loaded file and block
metadata for warmup_test_db.tbl partitions: year=2025/month=1. Time taken:
27.856ms
I20250731 17:10:02.807621 103771 MetastoreEvents.java:860] EventId: 888
EventType: ADD_PARTITION Target: warmup_test_db.tbl. Successfully added 1
partitions to table warmup_test_db.tbl
I20250731 17:10:02.807871 103771 MetastoreEventsProcessor.java:1578] Time
elapsed in processing event batch: 1s156ms
I20250731 17:10:03.673661 103802 JniUtil.java:167] getEventProcessorSummary
request: Getting event processor summary
I20250731 17:10:03.674260 103802 JniUtil.java:178] Finished
getEventProcessorSummary request: Getting event processor summary. Time spent:
1ms
I20250731 17:10:03.674295 103802 catalog-server.cc:978] EventProcessor is
synced up with HMS event id during failover: 888
I20250731 17:10:03.674309 103802 catalog-server.cc:887] This catalogd instance
is changed to active status
I20250731 17:10:03.677537 103773 JniUtil.java:167] getCatalogDelta request:
Getting catalog delta from version 0
I20250731 17:10:03.687700 103773 catalog-server.cc:1615] Collected update:
2:DATABASE:default, version=2, original size=48, compressed size=49
...
I20250731 17:10:03.720110 103773 CatalogServiceCatalog.java:1965] Skipped 0
partitions of table warmup_test_db.tbl in the incremental update
I20250731 17:10:03.720285 103773 CatalogServiceCatalog.java:1978] Collected 1
partition update(s): 2:HDFS_PARTITION:warmup_test_db.tbl:year=2025/month=1,
version=314, original size=100, compressed size=101
I20250731 17:10:03.720324 103773 catalog-server.cc:1615] Collected update:
2:TABLE:warmup_test_db.tbl, version=314, original size=65, compressed
size=66{noformat}
In the previous catalogd logs (catalogd_node1.INFO), the table version before
AddPartition is also 314!
{noformat}
I20250731 17:09:58.553951 103647 HdfsTable.java:1256] Loading metadata for
table definition and all partition(s) of warmup_test_db.tbl (needed by
coordinator)
I20250731 17:09:58.600651 103647 HdfsTable.java:1919] Loaded 13 columns from
HMS. Actual columns: 13
I20250731 17:09:58.623989 103647 HdfsTable.java:2987] Load Valid Write Id List
Done. Time taken: 24.253us
I20250731 17:09:58.624092 103647 HdfsTable.java:1303] Fetching partition
metadata from the Metastore: warmup_test_db.tbl
I20250731 17:09:58.628580 103647 MetaStoreUtil.java:191] Fetching 0 partitions
for: warmup_test_db.tbl using partition batch size: 1000
I20250731 17:09:58.628685 103647 HdfsTable.java:1310] Fetched partition
metadata from the Metastore: warmup_test_db.tbl
I20250731 17:09:58.649619 103647 HdfsTable.java:781] Loaded file and block
metadata for warmup_test_db.tbl partitions: . Time taken: 3.319ms
I20250731 17:09:58.651077 103647 Table.java:1106] last refreshed event id for
table: warmup_test_db.tbl set to: -1
I20250731 17:09:58.651185 103647 TableLoader.java:187] Loaded metadata for:
warmup_test_db.tbl (129ms)
I20250731 17:09:58.685761 102954 JniUtil.java:167] getCatalogDelta request:
Getting catalog delta from version 313
I20250731 17:09:58.693717 102954 CatalogServiceCatalog.java:1965] Skipped 0
partitions of table warmup_test_db.tbl in the incremental update
I20250731 17:09:58.693792 102954 catalog-server.cc:1615] Collected update:
2:TABLE:warmup_test_db.tbl, version=314, original size=65, compressed
size=66{noformat}
So coordinator has loaded the empty partition list of it in cache. It's
associated with table name "warmup_test_db.tbl" and catalog version 314. Then
coincidently reused after catalogd warm failover.
We should add the catalog service id in the getPartialCatalogObject response
and coordinator should clear the cache when the id changes, just like what we
do in updateCatalogCache() ->
[witnessCatalogServiceId()|https://github.com/apache/impala/blob/5414c3009890a736901b6ae3b56b9953b1992de7/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L1614].
--
This message was sent by Atlassian Jira
(v8.20.10#820010)