Does this new functionality do anything that is not already covered by the REST API?
https://bookkeeper.apache.org/docs/4.10.0/admin/http/#endpoint-apiv1autorecoverytrigger_audit
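
For reference, a minimal sketch of what calling that endpoint could look like from Java. The path is taken from the link above; the PUT method should be checked against that page, and the host, the port 8000, and the class and method names are assumptions for illustration only, so adjust them to your bookie's HTTP server settings. The request is only built and printed here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class TriggerAuditRequest {
    // Builds (does not send) a PUT request for the trigger_audit endpoint.
    // host and port are placeholders for the bookie's HTTP server address.
    static HttpRequest build(String host, int port) {
        URI uri = URI.create(
                "http://" + host + ":" + port + "/api/v1/autorecovery/trigger_audit");
        return HttpRequest.newBuilder(uri)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        // Assumed address; not taken from the proposal or the docs link.
        HttpRequest request = build("localhost", 8000);
        System.out.println(request.method() + " " + request.uri());
        // To actually send it:
        // java.net.http.HttpClient.newHttpClient()
        //         .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```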

On 2023/07/10 06:34:35 Wenbing Shen wrote:
> Hi everyone,
> 
> I would like to initiate a discussion about how the Bookie currently forces
> Auditor tasks to be rescheduled. Below is the detailed BP content. If you have any
> questions or ideas, please feel free to reply to this email for further
> discussion. Thank you!
> 
> This is the master ticket for tracking BP-63:
> Proposal PR - #3964 <https://github.com/apache/bookkeeper/pull/3964>
> 
> Motivation
> 
> Currently, the Bookie can reschedule Auditor check tasks in several ways,
> excluding the auditorBookieTask as it provides a separate mechanism to
> trigger task reexecution. This BP specifically discusses
> AuditorCheckAllLedgersTask/AuditorPlacementPolicyCheckTask/AuditorReplicasCheckTask:
> 
> 1: The Bookie provides three execution times based on ZooKeeper,
> checkallledgersctime/placementpolicycheckctime/replicascheckctime. By
> updating these execution times, we can dynamically adjust the execution
> frequency of auditor tasks, but it requires restarting the Auditor process
> or reopening the Auditor election to trigger task execution.
> 
> 2: By using the ForceAuditorChecksCmd tool, which is still based on the
> underlying logic of the first point, restarting the Auditor or performing
> an election is also necessary to trigger task execution.
> 
> 3: The Decommission and RecoveryBookie tools tend to focus on executing
> recovery logic and only check and recover a specific subset of Bookie
> services.
> 
> The above methods are complex and not reliable for rescheduling the
> Auditor check tasks in a cluster.
> 
> Proposal
> 
> Therefore, I propose further optimizing the rescheduling of Auditor tasks.
> 
> 1: The Auditor monitors the persistent znode path
> /ZK_LEDGERS_ROOT_PATH/underreplication/scheduleAuditor.
> 2: Users modify the task ctime using the ForceAuditorChecksCmd tool and
> forcefully create the above znode path using the force parameter.
> 3: The Auditor creates callbacks through scheduleAuditor to reschedule the
> aforementioned three tasks.
> 4: After the Auditor completes rescheduling the tasks, the scheduleAuditor
> node is deleted.
> 5: When the Auditor starts, it deletes the old scheduleAuditor node to
> avoid logical confusion.
> 
> This way, we can trigger the scheduling and execution of Auditor tasks
> through an online interface without relying on service restart or
> re-election.
> 
> Compatibility, Deprecation, and Migration Plan
> 
> There are no compatibility issues. This BP introduces a new trigger flag
> that does not affect the original logic and does not involve any changes to
> other existing public APIs. There is no deprecation or migration plan.
> 
> 
> Best regards,
> 
> Wenbing Shen
> 
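
To make sure I read the five proposal steps correctly, here is a rough in-memory model of the flow. It is a sketch only: it does not use the ZooKeeper client or any BookKeeper class, the Store and Auditor types are hypothetical, and "/ledgers" is an assumed value for ZK_LEDGERS_ROOT_PATH.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

public class ScheduleAuditorSketch {
    // Assumed path; the proposal only names /ZK_LEDGERS_ROOT_PATH/underreplication/scheduleAuditor.
    static final String FLAG = "/ledgers/underreplication/scheduleAuditor";

    // Toy stand-in for a znode store with a single "node created" callback.
    static class Store {
        final Set<String> nodes = new HashSet<>();
        Consumer<String> onCreate = path -> {};

        void create(String path) {
            if (nodes.add(path)) {
                onCreate.accept(path);
            }
        }

        void delete(String path) {
            nodes.remove(path);
        }

        boolean exists(String path) {
            return nodes.contains(path);
        }
    }

    // Hypothetical Auditor that reacts to the flag node.
    static class Auditor {
        final Store store;
        final List<String> rescheduled = new ArrayList<>();

        Auditor(Store store) {
            this.store = store;
            store.delete(FLAG);                     // step 5: drop a stale flag on startup
            store.onCreate = this::onNodeCreated;   // step 1: watch the flag path
        }

        void onNodeCreated(String path) {
            if (!FLAG.equals(path)) {
                return;
            }
            // step 3: reschedule the three check tasks named in the proposal
            rescheduled.add("AuditorCheckAllLedgersTask");
            rescheduled.add("AuditorPlacementPolicyCheckTask");
            rescheduled.add("AuditorReplicasCheckTask");
            store.delete(FLAG);                     // step 4: remove the flag when done
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.create(FLAG);                         // stale flag from an earlier run
        Auditor auditor = new Auditor(store);       // startup removes it, nothing runs
        store.create(FLAG);                         // step 2: the tool forces the flag
        System.out.println(auditor.rescheduled + " flagExists=" + store.exists(FLAG));
    }
}
```

In this model a flag left over from before the Auditor started is discarded without running anything, which is how I understand step 5.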
