Quanlong Huang created IMPALA-14283:
---------------------------------------

             Summary: Coordinator could use stale metadata in catalogd warm 
failover
                 Key: IMPALA-14283
                 URL: https://issues.apache.org/jira/browse/IMPALA-14283
             Project: IMPALA
          Issue Type: Bug
          Components: Catalog
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang


There is a corner case in local catalog mode that when catalogd warm failover 
finishes, coordinator fetches the new table object from the new active catalogd 
and it might coincidently "activate" the stale metadata in the cache. This 
happens when matching all the following conditions:
 * The new table object has a catalog version that has been used in the 
previous active catalogd and it was assigned to the same table name.
 * The coordinator has loaded table metadata associated to that version from 
the previous active catalogd, e.g. partition list, partitions or column stats 
are loaded and still in the cache (not aged out or evicted yet).

This bug is found by *TestCatalogdHA.test_warmed_up_metadata_failover_catchup*
{code:python}
tests/custom_cluster/test_catalogd_ha.py:640: in 
test_warmed_up_metadata_failover_catchup                                        
                                                                                
     
    db, self._add_new_partition, self._verify_new_partition)                    
                                                                                
                                                      
tests/custom_cluster/test_catalogd_ha.py:762: in _test_metadata_after_failover  
                                                                                
                                                      
    assert str(active_catalogd.service.service_port) in str(e)                  
                                                                                
                                                      
E   assert '26001' in "assert 'year=2025/month=1' in 
'Total\\t\\t-1\\t0\\t0B\\t0B\\t\\t\\t\\t\\t'"                                   
                                                                                
 
E    +  where '26001' = str(26001)                                              
                                                                                
                                                      
E    +    where 26001 = <tests.common.impala_service.CatalogdService object at 
0x7f5d8c24fd10>.service_port                                                    
                                                       
E    +      where <tests.common.impala_service.CatalogdService object at 
0x7f5d8c24fd10> = <CatalogdProcess PID: None 
(/root/Impala-upstream/be/build/latest/service/cata...te_store_subscriber_port=23021
 -webserver_
port=25021 -enable_catalogd_ha=true)>.service                                   
                                                                                
                                                      
E    +  and   "assert 'year=2025/month=1' in 
'Total\\t\\t-1\\t0\\t0B\\t0B\\t\\t\\t\\t\\t'" = str(AssertionError(u"assert 
'year=2025/month=1' in 'Total\\t\\t-1\\t0\\t0B\\t0B\\t\\t\\t\\t\\t'",)){code}
The test runs an AddPartition statement and kills the active catalogd, then 
runs ShowPartitions in the new active catalogd. It fails since coordinator 
returns an empty partition list. Test logs:
{noformat}
-- 2025-07-31 17:10:00,755 INFO     MainThread: hs2: executing against Impala 
at quanlong-ha-test.vpc.cloudera.com:21050. session: 
ea4e2fca5d5b35f3:c51bd83bdb2417a9 main_cursor: True user: None                  
   
                                                                                
                                                                                
                                                      
alter table warmup_test_db.tbl add partition(year=2025, month=1);               
                                                                                
                                                      
                                                                                
                                                                                
                                                      
-- 2025-07-31 17:10:01,119 INFO     MainThread: 
b84a8a27014a3340:af77289200000000: query started                                
                                                                                
      
-- 2025-07-31 17:10:01,120 INFO     MainThread: 
b84a8a27014a3340:af77289200000000: getting log for operation                    
                                                                                
      
-- 2025-07-31 17:10:01,120 INFO     MainThread: 
b84a8a27014a3340:af77289200000000: getting runtime profile operation            
                                                                                
      
-- 2025-07-31 17:10:01,120 INFO     MainThread: 
b84a8a27014a3340:af77289200000000: closing query for operation                  
                                                                                
      
-- 2025-07-31 17:10:01,147 INFO     MainThread: Found PID 102378 for 
/root/Impala-upstream/be/build/latest/service/catalogd -logbufsecs=5 -v=1 
-max_log_files=10 -log_rotation_match_pid=true -log_filename=catalogd_n
ode1 -log_dir=/tmp/ -kudu_master_hosts localhost --catalog_topic_mode=minimal 
--catalogd_ha_reset_metadata_on_failover=false 
--debug_actions=catalogd_event_processing_delay:SLEEP@1000 
--enable_reload_events=true --
warmup_tables_config_file=file:///root/Impala-upstream/testdata/data/warmup_test_config.txt
 -catalog_service_port=26001 -state_store_subscriber_port=23021 
-webserver_port=25021 -enable_catalogd_ha=true             
-- 2025-07-31 17:10:01,172 INFO     MainThread: Killing <CatalogdProcess PID: 
102378 (/root/Impala-upstream/be/build/latest/service/catalogd -logbufsecs=5 
-v=1 -max_log_files=10 -log_rotation_match_pid=true -log_fi
lename=catalogd_node1 -log_dir=/tmp/ -kudu_master_hosts localhost 
--catalog_topic_mode=minimal --catalogd_ha_reset_metadata_on_failover=false 
--debug_actions=catalogd_event_processing_delay:SLEEP@1000 --enable_relo
ad_events=true 
--warmup_tables_config_file=file:///root/Impala-upstream/testdata/data/warmup_test_config.txt
 -catalog_service_port=26001 -state_store_subscriber_port=23021 
-webserver_port=25021 -enable_catalogd_ha=
true)> with signal 9                                                            
                                                                                
                                                      
-- 2025-07-31 17:10:01,177 INFO     MainThread: Getting metric: 
catalog-server.active-status from quanlong-ha-test.vpc.cloudera.com:25020       
           
-- 2025-07-31 17:10:01,180 INFO     MainThread: Waiting for metric value 
'catalog-server.active-status'=True. Current value: False. total_wait: 0s
-- 2025-07-31 17:10:01,180 INFO     MainThread: Sleeping 1s before next retry.  
                                                                                
                                                      
-- 2025-07-31 17:10:02,181 INFO     MainThread: Getting metric: 
catalog-server.active-status from quanlong-ha-test.vpc.cloudera.com:25020       
                                                                      
-- 2025-07-31 17:10:02,184 INFO     MainThread: Metric 
'catalog-server.active-status' has reached desired value: True. total_wait: 
1.00396800041s
-- 2025-07-31 17:10:02,188 INFO     MainThread: hs2: executing against Impala 
at quanlong-ha-test.vpc.cloudera.com:21050. session: 
ea4e2fca5d5b35f3:c51bd83bdb2417a9 main_cursor: True user: None
                                                                                
                                                                                
                                                      
show partitions warmup_test_db.tbl;
                                                                                
                                                                                
                                                      
-- 2025-07-31 17:10:03,737 INFO     MainThread: 
664e4d97f4712dfb:e4d0877500000000: query started           
-- 2025-07-31 17:10:03,738 INFO     MainThread: 
664e4d97f4712dfb:e4d0877500000000: getting log for operation
-- 2025-07-31 17:10:03,738 INFO     MainThread: 
664e4d97f4712dfb:e4d0877500000000: getting runtime profile operation   
-- 2025-07-31 17:10:03,738 INFO     MainThread: 
664e4d97f4712dfb:e4d0877500000000: closing query for operation      
-- 2025-07-31 17:10:03,739 INFO     MainThread: partition result: 
['Total\t\t-1\t0\t0B\t0B\t\t\t\t\t']{noformat}
Profile of ShowPartitions indicates some stale metadata are used:
{noformat}
      Original Table Versions: warmup_test_db.tbl, 314, 1754007002807, Thu Jul 
31 17:10:02 PDT 2025
       - CatalogFetch.ColumnStats.Hits: 13  <--- This shouldn't hit
       - CatalogFetch.ColumnStats.Misses: 0
       - CatalogFetch.ColumnStats.Requests: 13
       - CatalogFetch.ColumnStats.Time: 0
       - CatalogFetch.DatabaseList.Hits: 1
       - CatalogFetch.DatabaseList.Misses: 0
       - CatalogFetch.DatabaseList.Requests: 1
       - CatalogFetch.DatabaseList.Time: 0
       - CatalogFetch.PartitionLists.Hits: 1  <--- This shouldn't hit
       - CatalogFetch.PartitionLists.Misses: 0
       - CatalogFetch.PartitionLists.Requests: 1
       - CatalogFetch.PartitionLists.Time: 0
       - CatalogFetch.RPCs.Bytes: 1.78 KB (1818)
       - CatalogFetch.RPCs.Requests: 2
       - CatalogFetch.RPCs.Time: 1s531ms
       - CatalogFetch.StorageLoad.Time: 22ms
       - CatalogFetch.TableList.Hits: 1
       - CatalogFetch.TableList.Misses: 1
       - CatalogFetch.TableList.Requests: 2
       - CatalogFetch.TableList.Time: 1s516ms
       - CatalogFetch.Tables.Hits: 0
       - CatalogFetch.Tables.Misses: 1
       - CatalogFetch.Tables.Requests: 1
       - CatalogFetch.Tables.Time: 15ms
       - ExecutorGroupsConsidered: 1 (1){noformat}
In logs of the new active catalogd (catalogd.INFO), the catalog version of the 
table is 314:
{noformat}
I20250731 17:10:02.760269 103771 CatalogOpExecutor.java:4872] Found 1/1 
partitions to add in table warmup_test_db.tbl from event 888
I20250731 17:10:02.807034 103771 HdfsTable.java:781] Loaded file and block 
metadata for warmup_test_db.tbl partitions: year=2025/month=1. Time taken: 
27.856ms
I20250731 17:10:02.807621 103771 MetastoreEvents.java:860] EventId: 888 
EventType: ADD_PARTITION Target: warmup_test_db.tbl. Successfully added 1 
partitions to table warmup_test_db.tbl
I20250731 17:10:02.807871 103771 MetastoreEventsProcessor.java:1578] Time 
elapsed in processing event batch: 1s156ms
I20250731 17:10:03.673661 103802 JniUtil.java:167] getEventProcessorSummary 
request: Getting event processor summary
I20250731 17:10:03.674260 103802 JniUtil.java:178] Finished 
getEventProcessorSummary request: Getting event processor summary. Time spent: 
1ms
I20250731 17:10:03.674295 103802 catalog-server.cc:978] EventProcessor is 
synced up with HMS event id during failover: 888
I20250731 17:10:03.674309 103802 catalog-server.cc:887] This catalogd instance 
is changed to active status
I20250731 17:10:03.677537 103773 JniUtil.java:167] getCatalogDelta request: 
Getting catalog delta from version 0
I20250731 17:10:03.687700 103773 catalog-server.cc:1615] Collected update: 
2:DATABASE:default, version=2, original size=48, compressed size=49
...
I20250731 17:10:03.720110 103773 CatalogServiceCatalog.java:1965] Skipped 0 
partitions of table warmup_test_db.tbl in the incremental update
I20250731 17:10:03.720285 103773 CatalogServiceCatalog.java:1978] Collected 1 
partition update(s): 2:HDFS_PARTITION:warmup_test_db.tbl:year=2025/month=1, 
version=314, original size=100, compressed size=101
I20250731 17:10:03.720324 103773 catalog-server.cc:1615] Collected update: 
2:TABLE:warmup_test_db.tbl, version=314, original size=65, compressed 
size=66{noformat}
In the previous catalogd logs (catalogd_node1.INFO), the table version before 
AddPartition is also 314!
{noformat}
I20250731 17:09:58.553951 103647 HdfsTable.java:1256] Loading metadata for 
table definition and all partition(s) of warmup_test_db.tbl (needed by 
coordinator)
I20250731 17:09:58.600651 103647 HdfsTable.java:1919] Loaded 13 columns from 
HMS. Actual columns: 13
I20250731 17:09:58.623989 103647 HdfsTable.java:2987] Load Valid Write Id List 
Done. Time taken: 24.253us
I20250731 17:09:58.624092 103647 HdfsTable.java:1303] Fetching partition 
metadata from the Metastore: warmup_test_db.tbl
I20250731 17:09:58.628580 103647 MetaStoreUtil.java:191] Fetching 0 partitions 
for: warmup_test_db.tbl using partition batch size: 1000
I20250731 17:09:58.628685 103647 HdfsTable.java:1310] Fetched partition 
metadata from the Metastore: warmup_test_db.tbl
I20250731 17:09:58.649619 103647 HdfsTable.java:781] Loaded file and block 
metadata for warmup_test_db.tbl partitions: . Time taken: 3.319ms
I20250731 17:09:58.651077 103647 Table.java:1106] last refreshed event id for 
table: warmup_test_db.tbl set to: -1
I20250731 17:09:58.651185 103647 TableLoader.java:187] Loaded metadata for: 
warmup_test_db.tbl (129ms)
I20250731 17:09:58.685761 102954 JniUtil.java:167] getCatalogDelta request: 
Getting catalog delta from version 313
I20250731 17:09:58.693717 102954 CatalogServiceCatalog.java:1965] Skipped 0 
partitions of table warmup_test_db.tbl in the incremental update
I20250731 17:09:58.693792 102954 catalog-server.cc:1615] Collected update: 
2:TABLE:warmup_test_db.tbl, version=314, original size=65, compressed 
size=66{noformat}
So coordinator has loaded the empty partition list of it in cache. It's 
associated with table name "warmup_test_db.tbl" and catalog version 314. Then 
coincidently reused after catalogd warm failover.

We should add the catalog service id in the getPartialCatalogObject response 
and coordinator should clear the cache when the id changes, just like what we 
do in updateCatalogCache() -> 
[witnessCatalogServiceId()|https://github.com/apache/impala/blob/5414c3009890a736901b6ae3b56b9953b1992de7/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L1614].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to