This is an automated email from the ASF dual-hosted git repository.
yiguolei pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 7feb5ea6ad7 [fix](chore) path gc should consider tablet migration (#30095)
7feb5ea6ad7 is described below
commit 7feb5ea6ad7a0623d85948eb86529a978f2d1a03
Author: zhannngchen <[email protected]>
AuthorDate: Tue Jan 23 19:42:33 2024 +0800
[fix](chore) path gc should consider tablet migration (#30095)
Background:
Migration creates a new tablet in a different DataDir, and the old tablet is
moved to TabletManager::_shutdown_tablets.
The migration task does not copy the data of stale rowsets to the new tablet,
so after migration the new tablet does not contain the stale rowsets of the
old tablet.
The path GC process checks every path to determine whether it belongs to a
useless tablet or a useless rowset. If it does, path GC removes the data of
that tablet or rowset.
The issue:
When path GC gets a stale rowset path from the data dir of the old tablet, it
extracts the tablet id and rowset id.
It then checks whether the tablet id exists in TabletManager, and the answer
is YES!
It gets the tablet instance, which is the new tablet, then checks whether the
stale rowset id from the old tablet path exists in the new tablet instance,
and gets the answer NO.
The path GC process treats the rowset as a useless rowset, since it cannot
find anything that holds a reference to it, and deletes the data of this
stale rowset.
But some queries may still hold a reference to this stale rowset, so the
deletion causes query failures.
Solution:
The lifecycle of all rowsets in a shutdown tablet should be tied to the
lifecycle of that tablet.
While performing path GC, we need to differentiate the old tablet from the
new one created by the migration task.
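The decision described above can be sketched in isolation. This is a simplified model, not the Doris implementation: `Tablet`, `DataDir`, `TabletMap`, `GcAction` and both `rowset_gc_*` functions are hypothetical stand-ins, and the only idea taken from this commit is the comparison between the running tablet's data dir and the data dir being scanned.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-ins for the real classes; all names are illustrative.
struct DataDir {
    std::string path;
};

struct Tablet {
    const DataDir* data_dir;                     // directory owning the running tablet
    std::unordered_set<std::string> rowset_ids;  // rowsets the running tablet references
};

// Simplified tablet manager: one running tablet per tablet id.
using TabletMap = std::unordered_map<int64_t, std::shared_ptr<Tablet>>;

enum class GcAction { kKeep, kDeleteRowset, kLeaveToTabletGc };

// Model of the old behaviour: the tablet is looked up by id only, so a stale
// rowset path under the old data dir is judged against the migrated tablet.
GcAction rowset_gc_before_fix(const TabletMap& tablets, int64_t tablet_id,
                              const std::string& rowset_id) {
    auto it = tablets.find(tablet_id);
    if (it == tablets.end()) return GcAction::kLeaveToTabletGc;
    return it->second->rowset_ids.count(rowset_id) ? GcAction::kKeep
                                                   : GcAction::kDeleteRowset;
}

// Model of the new behaviour: if the running tablet lives in another data
// dir, the path belongs to the old tablet and is left to tablet-level GC.
GcAction rowset_gc_after_fix(const TabletMap& tablets, const DataDir* scanning_dir,
                             int64_t tablet_id, const std::string& rowset_id) {
    auto it = tablets.find(tablet_id);
    if (it == tablets.end()) return GcAction::kLeaveToTabletGc;
    if (it->second->data_dir != scanning_dir) return GcAction::kLeaveToTabletGc;
    return it->second->rowset_ids.count(rowset_id) ? GcAction::kKeep
                                                   : GcAction::kDeleteRowset;
}

// Scenario: tablet 10001 migrated from old_dir to new_dir; "rs_stale" exists
// only under old_dir and may still be read by a running query.
inline void migration_scenario_example() {
    DataDir old_dir{"/data1"};
    DataDir new_dir{"/data2"};
    TabletMap tablets;
    tablets[10001] = std::make_shared<Tablet>(Tablet{&new_dir, {"rs_live"}});

    // Old logic deletes the stale rowset while scanning old_dir.
    assert(rowset_gc_before_fix(tablets, 10001, "rs_stale") == GcAction::kDeleteRowset);
    // New logic skips it; the old tablet's path is reclaimed as a whole later.
    assert(rowset_gc_after_fix(tablets, &old_dir, 10001, "rs_stale") ==
           GcAction::kLeaveToTabletGc);
}
```

Comparing data dir pointers keeps the check cheap and avoids parsing the path a second time. It assumes, as in this model, that at most one running tablet exists per tablet id.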
---
be/src/olap/data_dir.cpp | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/be/src/olap/data_dir.cpp b/be/src/olap/data_dir.cpp
index e84ebd403b0..0f6d279e729 100644
--- a/be/src/olap/data_dir.cpp
+++ b/be/src/olap/data_dir.cpp
@@ -670,7 +670,14 @@ void DataDir::_perform_path_gc_by_tablet(std::vector<std::string>& tablet_paths)
std::swap(*forward, *backward);
continue;
}
- if (auto tablet = _engine.tablet_manager()->get_tablet(tablet_id); !tablet) {
+ auto tablet = _engine.tablet_manager()->get_tablet(tablet_id);
+ if (!tablet || tablet->data_dir() != this) {
+ if (tablet) {
+ LOG(INFO) << "The tablet in path " << path
+ << " is not same with the running one: " << tablet->data_dir()->_path
+ << "/" << tablet->tablet_path()
+ << ", might be the old tablet after migration, try to move it to trash";
+ }
_engine.tablet_manager()->try_delete_unused_tablet_path(this, tablet_id, schema_hash,
path);
--backward;
@@ -717,6 +724,12 @@ void DataDir::_perform_path_gc_by_rowset(const std::vector<std::string>& tablet_
continue;
}
+ if (tablet->data_dir() != this) {
+ // Current running tablet is not in same data_dir, maybe it's a tablet after migration,
+ // will be reclaimed in the next time `_perform_path_gc_by_tablet`
+ continue;
+ }
+
bool exists;
std::vector<io::FileInfo> files;
auto st = io::global_local_filesystem()->list(path, true, &files, &exists);