Ryan19929 opened a new issue, #61063: URL: https://github.com/apache/doris/issues/61063
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

Doris 3.0.8

### What's Wrong?

## Problem

CCR fails to ingest binlog with **get binlog info failed** when fetching from a tablet that was recently cloned to a new BE:

```
The requested URL returned error: get binlog info failed, binlog_version=2 url=http://<BE_IP>:8040/api/_binlog/_download?method=get_binlog_info&tablet_id=x&binlog_version=2
```

## Current situation

- The physical binlog files (`.dat`, `.idx`) exist in the `_binlog/` directory.
- `get_binlog_info` returns an empty `rowset_id` and `num_segments:-1`, indicating that the **binlog metadata is missing in RocksDB**.
- One day later, the `_binlog/` files still remain on this BE, while the other BEs have already expired and cleaned up their binlog files.

## Root Cause Analysis

I suspect a **race condition** during the clone process: the GC thread deletes the binlog metadata before tablet registration completes. The evidence is below.

### Log Evidence

```text
I20260303 14:35:33.846063 1312384 engine_clone_task.cpp:264] clone tablet not exist, begin clone a new tablet from remote be. signature=1771885698583, tablet_id=1771885698583, visible_version=2, req replica=1771885705351
I20260303 14:35:33.928215 1312384 tablet_manager.cpp:960] begin to load tablet from dir. tablet_id=1771885698583 schema_hash=1428677769 path = /mnt/data11/dorisdb_pub/be/storage/data/69/1771885698583/1428677769 force = 0 restore = 0
I20260303 14:35:33.929246 1312155 storage_engine.cpp:992] failed to find tablet 1771885698583 for binlog rowset: 0, tablet may be dropped
I20260303 14:35:33.955437 1312155 storage_engine.cpp:1010] remove 1 invalid binlog meta from dir: /mnt/data11/dorisdb_pub/be/storage
```

**Key observation**: the events come from different threads. The GC thread (1312155) reports the tablet as missing just **1 ms** after the clone thread (1312384) starts loading it (14:35:33.928215 vs. 14:35:33.929246), and logs the removal of the binlog meta about 26 ms after that.
### Timeline

| Time | Event |
|------|-------|
| T1 | TRUNCATE TABLE creates new tablet on BE_X/Y/Z |
| T2 | INSERT publishes txn with binlog (files + RocksDB meta) |
| T3 | Load balancer clones tablet to BE_A |
| T4-T5 | BE_A downloads snapshot and ingests binlog metas |
| T6 | **GC thread deletes binlog meta** (`get_tablet()` returns nullptr) |
| T7 | Tablet registered to `tablet_map` |
| T8 | **CCR requests binlog → 404** |

### Code Analysis: The Race Condition

The race occurs in `TabletManager::load_tablet_from_dir`:

```cpp
// tablet_manager.cpp: load_tablet_from_dir
TabletUid tablet_uid = TabletUid::gen_uid();  // Line 994
// ... read rowset_binlog_metas.pb, rename binlog files ...

if (contain_binlog) {
    auto* meta = store->get_meta();
    // ==========================================
    // Lines 1047-1048: binlog meta written to RocksDB
    // At this point, tablet is NOT yet in tablet_map
    // ==========================================
    RETURN_IF_ERROR(
            RowsetMetaManager::ingest_binlog_metas(meta, tablet_uid, &rowset_binlog_metas_pb));
}

tablet_meta->set_shard_id(shard);
tablet_meta->set_tablet_uid(std::move(tablet_uid));
std::string meta_binary;
tablet_meta->serialize(&meta_binary);

// ==========================================
// Lines 1059-1062: tablet added to tablet_map
// Only here is the tablet registered to tablet_manager
// ==========================================
RETURN_NOT_OK_STATUS_WITH_WARN(
        load_tablet_from_meta(store, tablet_id, schema_hash, meta_binary, true, force, restore,
                              true),
        ...);
```

**The gap**: `ingest_binlog_metas` (line 1048) executes **before** `load_tablet_from_meta` (line 1060), leaving a window where GC can delete the metadata while the tablet remains unregistered.
### GC Logic: Why It Deletes Valid Metadata

`StorageEngine::_clean_unused_binlog_metas` (`be/src/olap/storage_engine.cpp`):

```cpp
// storage_engine.cpp: _clean_unused_binlog_metas
void StorageEngine::_clean_unused_binlog_metas() {
    auto unused_binlog_collector = [this, &unused_binlog_key_suffixes](
                                           std::string_view key, std::string_view value,
                                           bool need_check) -> bool {
        if (need_check) {
            BinlogMetaEntryPB binlog_meta_pb;
            if (UNLIKELY(!binlog_meta_pb.ParseFromArray(...))) {
                // Parse failed, mark for deletion
            } else if (_tablet_manager->get_tablet(binlog_meta_pb.tablet_id()) == nullptr) {
                // ==========================================
                // Sole condition: tablet_id not in tablet_map
                // No time window protection whatsoever
                // ==========================================
                LOG(INFO) << "failed to find tablet " << binlog_meta_pb.tablet_id()
                          << " for binlog rowset: " << binlog_meta_pb.rowset_id()
                          << ", tablet may be dropped";
            } else {
                return false;  // tablet exists, do not clean
            }
        }
        unused_binlog_key_suffixes.emplace_back(...);
        return true;  // mark for deletion
    };
    // ... execute deletion
}
```

The GC's **only check is `get_tablet(tablet_id) == nullptr`**, with no delay protection or grace period. This makes the race window between `ingest_binlog_metas` and tablet registration fatal: once the meta is deleted, `get_binlog_info` can no longer serve the binlog files that are still on disk.

### What You Expected?

Binlog metadata in RocksDB should remain consistent with the physical files on disk throughout the entire clone lifecycle.

### How to Reproduce?

_No response_

### Anything Else?

_No response_

### Are you willing to submit PR?

- [x] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
