github-actions[bot] commented on code in PR #63286:
URL: https://github.com/apache/doris/pull/63286#discussion_r3246471833
##########
cloud/src/recycler/recycler.cpp:
##########
@@ -298,6 +302,11 @@ void Recycler::recycle_callback() {
// skip instance in recycling
if (recycling_instance_map_.count(instance_id)) continue;
}
Review Comment:
This only prevents a dequeued instance from starting after the flag is
already false. If `enable_recycler` is flipped to false while an
`InstanceRecycler::do_recycle()` round is already running, that round keeps
deleting data/KV and `lease_recycle_jobs()` continues renewing its job because
neither path observes the new switch. For a dynamic recycler disable switch,
that leaves the most important operational case uncovered: disabling recycler
during a suspected bad recycle does not stop the in-flight recycle work. Please
also stop/abort the entries in `recycling_instance_map_` when the flag becomes
false, or make the active recycle/lease paths check the flag and stop promptly.
##########
cloud/test/recycler_test.cpp:
##########
@@ -8693,4 +8693,85 @@ TEST(RecyclerTest,
recycle_tablet_with_delete_file_failure) {
EXPECT_EQ(it->size(), 0) << "All recycle rowset keys should be
deleted";
}
}
+
+TEST(RecyclerTest, enable_recycler_default_true) {
+ EXPECT_TRUE(config::enable_recycler);
+}
+
+TEST(RecyclerTest, enable_recycler_skip_instance_scanner) {
+ auto txn_kv = std::make_shared<MemTxnKv>();
+ ASSERT_EQ(txn_kv->init(), 0);
+
+ bool old_val = config::enable_recycler;
+ config::enable_recycler = false;
+ DORIS_CLOUD_DEFER {
+ config::enable_recycler = old_val;
+ };
+
+ int64_t old_recycle_interval = config::recycle_interval_seconds;
+ config::recycle_interval_seconds = 0;
+ DORIS_CLOUD_DEFER {
+ config::recycle_interval_seconds = old_recycle_interval;
+ };
+
+ int64_t old_sleep = config::recycler_sleep_before_scheduling_seconds;
+ config::recycler_sleep_before_scheduling_seconds = 0;
+ DORIS_CLOUD_DEFER {
+ config::recycler_sleep_before_scheduling_seconds = old_sleep;
+ };
+
+ Recycler recycler(txn_kv);
+ std::thread t([&]() { recycler.instance_scanner_callback(); });
+
+ // Let the callback complete one iteration:
+ // sleep(0) -> check enable_recycler (false, skip) -> wait_for(0,
timeout)
+ std::this_thread::sleep_for(std::chrono::milliseconds(100));
+
+ recycler.stopped_ = true;
+ recycler.notifier_.notify_all();
+ t.join();
+
+ EXPECT_TRUE(recycler.pending_instance_queue_.empty());
Review Comment:
This assertion does not prove that scanning was skipped. The test never
writes any instance metadata into `txn_kv`, so `pending_instance_queue_` would
remain empty even if the new `if (config::enable_recycler)` guard in
`instance_scanner_callback()` were removed and `get_all_instances()` ran
normally. Please seed an instance that would be enqueued when the flag is true,
or otherwise assert that `get_all_instances()`/enqueue is not reached while
disabled.
##########
cloud/test/recycler_test.cpp:
##########
@@ -8693,4 +8693,85 @@ TEST(RecyclerTest,
recycle_tablet_with_delete_file_failure) {
EXPECT_EQ(it->size(), 0) << "All recycle rowset keys should be
deleted";
}
}
+
+TEST(RecyclerTest, enable_recycler_default_true) {
+ EXPECT_TRUE(config::enable_recycler);
+}
+
+TEST(RecyclerTest, enable_recycler_skip_instance_scanner) {
+ auto txn_kv = std::make_shared<MemTxnKv>();
+ ASSERT_EQ(txn_kv->init(), 0);
+
+ bool old_val = config::enable_recycler;
+ config::enable_recycler = false;
+ DORIS_CLOUD_DEFER {
+ config::enable_recycler = old_val;
+ };
+
+ int64_t old_recycle_interval = config::recycle_interval_seconds;
+ config::recycle_interval_seconds = 0;
+ DORIS_CLOUD_DEFER {
+ config::recycle_interval_seconds = old_recycle_interval;
+ };
+
+ int64_t old_sleep = config::recycler_sleep_before_scheduling_seconds;
+ config::recycler_sleep_before_scheduling_seconds = 0;
+ DORIS_CLOUD_DEFER {
+ config::recycler_sleep_before_scheduling_seconds = old_sleep;
+ };
+
+ Recycler recycler(txn_kv);
+ std::thread t([&]() { recycler.instance_scanner_callback(); });
+
+ // Let the callback complete one iteration:
+ // sleep(0) -> check enable_recycler (false, skip) -> wait_for(0,
timeout)
+ std::this_thread::sleep_for(std::chrono::milliseconds(100));
+
+ recycler.stopped_ = true;
+ recycler.notifier_.notify_all();
+ t.join();
+
+ EXPECT_TRUE(recycler.pending_instance_queue_.empty());
+}
+
+TEST(RecyclerTest, enable_recycler_skip_recycle_callback) {
+ auto txn_kv = std::make_shared<MemTxnKv>();
+ ASSERT_EQ(txn_kv->init(), 0);
+
+ bool old_val = config::enable_recycler;
+ config::enable_recycler = false;
+ DORIS_CLOUD_DEFER {
+ config::enable_recycler = old_val;
+ };
+
+ Recycler recycler(txn_kv);
+
+ InstanceInfoPB instance;
+ instance.set_instance_id("test_instance");
+ recycler.pending_instance_queue_.push_back(instance);
+ recycler.pending_instance_set_.insert("test_instance");
+
+ std::thread t([&]() { recycler.recycle_callback(); });
+
+ // Wait until the callback has popped the instance from the queue.
+ // Can not wait on pending_instance_cond_ here because the callback does
+ // not notify after popping, which may cause a deadlock: both the main
+ // thread and the callback end up waiting on the same CV with different
+ // predicates and no one will wake them up.
+ while (true) {
+ {
+ std::lock_guard lock(recycler.mtx_);
+ if (recycler.pending_instance_queue_.empty()) break;
+ }
+ std::this_thread::yield();
+ }
+
+ recycler.stopped_ = true;
+ recycler.pending_instance_cond_.notify_all();
+ t.join();
+
+ EXPECT_TRUE(recycler.pending_instance_queue_.empty());
Review Comment:
These final-state assertions do not prove that the recycle callback skipped
work because of `enable_recycler=false`. With the guard removed, the worker
still pops the queue and can fail early on this mostly empty `InstanceInfoPB`,
leaving the queue/set/map empty as asserted here. Please make the test
distinguish the disabled path from an attempted recycle, for example by
asserting no recycle job key is created or by using a valid instance plus
instrumentation/sync point that would fail if `InstanceRecycler::init()` or
`do_recycle()` is reached.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]