Does this new functionality do anything that is not covered by the REST API?
https://bookkeeper.apache.org/docs/4.10.0/admin/http/#endpoint-apiv1autorecoverytrigger_audit
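
For reference, calling that endpoint looks roughly like the sketch below. As I read the linked docs, it is a PUT on /api/v1/autorecovery/trigger_audit. The host, the port (8000), and the helper names are assumptions for illustration, and the bookie HTTP server has to be enabled for this to work.

```python
import urllib.request

# Path taken from the linked admin HTTP docs.
TRIGGER_AUDIT_PATH = "/api/v1/autorecovery/trigger_audit"


def build_trigger_audit_request(host="localhost", port=8000):
    """Build (but do not send) the PUT request for the trigger_audit endpoint.

    host and port are assumptions; use whatever the HTTP server listens on.
    """
    url = f"http://{host}:{port}{TRIGGER_AUDIT_PATH}"
    return urllib.request.Request(url, method="PUT")


def trigger_audit(host="localhost", port=8000, timeout=10.0):
    """Send the request and return the HTTP status code."""
    req = build_trigger_audit_request(host, port)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Building the request separately from sending it makes it easy to check the method and URL without a running cluster.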
On 2023/07/10 06:34:35 Wenbing Shen wrote:
> Hi everyone,
>
> I would like to initiate a discussion regarding the current bookie force
> reschedule auditor tasks. Below is the detailed BP content. If you have any
> questions or ideas, please feel free to reply to this email for further
> discussion. Thank you!
>
> This is the master ticket for tracking BP-63 :
> Proposal PR - #3964 <https://github.com/apache/bookkeeper/pull/3964>
>
> Motivation
>
> Currently, the Bookie can reschedule Auditor check tasks in several ways,
> excluding the auditorBookieTask as it provides a separate mechanism to
> trigger task reexecution. This BP specifically discusses
> AuditorCheckAllLedgersTask/AuditorPlacementPolicyCheckTask/AuditorReplicasCheckTask:
>
> 1: The Bookie provides three execution times based on ZooKeeper,
> checkallledgersctime/placementpolicycheckctime/replicascheckctime. By
> updating these execution times, we can dynamically adjust the execution
> frequency of auditor tasks, but it requires restarting the Auditor process
> or reopening the Auditor election to trigger task execution.
>
> 2: By using the ForceAuditorChecksCmd tool, which is still based on the
> underlying logic of the first point, restarting the Auditor or performing
> an election is also necessary to trigger task execution.
>
> 3: The Decommission and RecoveryBookie tools tend to focus on executing
> recovery logic and only check and recover a specific subset of Bookie
> services.
>
> The above methods are complex and have poor stability when rescheduling the
> Auditor check tasks in a cluster.
>
> Proposal
>
> Therefore, I propose further optimizing the rescheduling of Auditor tasks.
>
> 1: The Auditor monitors the persistent znode path
> /ZK_LEDGERS_ROOT_PATH/underreplication/scheduleAuditor.
> 2: Users modify the task ctime using the ForceAuditorChecksCmd tool and
> forcefully create the above znode path using the force parameter.
> 3: The Auditor creates callbacks through scheduleAuditor to reschedule the
> aforementioned three tasks.
> 4: After the Auditor completes rescheduling the tasks, the scheduleAuditor
> node is deleted.
> 5: When the Auditor starts, it deletes the old scheduleAuditor node to
> avoid logical confusion.
>
> This way, we can trigger the scheduling and execution of Auditor tasks
> through an online interface without relying on service restart or
> re-election.
>
> Compatibility, Deprecation, and Migration Plan
>
> There are no compatibility issues. This BP introduces a new trigger flag
> that does not affect the original logic and does not involve any changes to
> other existing public APIs. There is no deprecation or migration plan.
>
>
> Best regards,
>
> Wenbing Shen
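
To check my reading of the five proposed steps, here is a small in-memory sketch of the flag flow. It does not use ZooKeeper or any BookKeeper code: FakeZk, Auditor, and force_auditor_checks are hypothetical stand-ins, the /ledgers root is an assumed value for ZK_LEDGERS_ROOT_PATH, and the fake watch stays registered, whereas a real ZooKeeper watch would have to be re-armed.

```python
# Assumes ZK_LEDGERS_ROOT_PATH is /ledgers; adjust for a real cluster.
SCHEDULE_PATH = "/ledgers/underreplication/scheduleAuditor"

# The three tasks named in the proposal.
TASKS = (
    "AuditorCheckAllLedgersTask",
    "AuditorPlacementPolicyCheckTask",
    "AuditorReplicasCheckTask",
)


class FakeZk:
    """Hypothetical in-memory stand-in for ZooKeeper: paths plus create-watchers."""

    def __init__(self):
        self.nodes = set()
        self.watchers = {}

    def exists(self, path):
        return path in self.nodes

    def create(self, path):
        self.nodes.add(path)
        # Notify every watcher registered for this path.
        for callback in self.watchers.get(path, []):
            callback(path)

    def delete(self, path):
        self.nodes.discard(path)

    def watch_created(self, path, callback):
        self.watchers.setdefault(path, []).append(callback)


class Auditor:
    """Hypothetical model of the Auditor side of the proposal."""

    def __init__(self, zk):
        self.zk = zk
        self.rescheduled = []

    def start(self):
        # Step 5: drop any stale flag left over from a previous run.
        self.zk.delete(SCHEDULE_PATH)
        # Step 1: watch the flag path.
        self.zk.watch_created(SCHEDULE_PATH, self._on_flag)

    def _on_flag(self, path):
        # Step 3: reschedule the three check tasks.
        self.rescheduled.extend(TASKS)
        # Step 4: remove the flag once rescheduling is done.
        self.zk.delete(path)


def force_auditor_checks(zk):
    # Step 2: stands in for ForceAuditorChecksCmd with the force parameter.
    # The real tool would also update the task ctimes; that is omitted here.
    zk.create(SCHEDULE_PATH)
```

In this model a stale flag created before the Auditor starts is discarded without rescheduling anything, and each later call to force_auditor_checks reschedules all three tasks once and leaves no flag behind.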