[ 
https://issues.apache.org/jira/browse/KUDU-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shenxingwuying updated KUDU-3364:
---------------------------------
    Attachment: image-2022-06-01-11-02-20-888.png

> Add TimerThread to ThreadPool to support a category of problem
> --------------------------------------------------------------
>
>                 Key: KUDU-3364
>                 URL: https://issues.apache.org/jira/browse/KUDU-3364
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Minor
>         Attachments: image-2022-06-01-11-02-20-888.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> h1. Scenarios
> In general, I am describing a category of problem.
> Kudu has some periodic tasks and automatically triggered scheduling tasks, 
> for example automatic rebalancing of cluster data, some GC tasks, and 
> compaction tasks.
> They are implemented with a Kudu Thread, a std::thread, or a ThreadPool, and 
> the actual work is either scheduled periodically or triggered by an internal 
> strategy.
> Because these triggers are all internal, we cannot control when they run.
> We need a method under our control to trigger these kinds of actions.
> In some scenarios this is significant.
> Examples are below:
>  
> h2. data rebalance
> There are two ways to rebalance:
> 1. Enable auto rebalance.
> 2. Use the rebalance tool (the way used before 1.14).
> The two ways may conflict and race on operations, because the rebalance 
> tool's logic is a little complex and lives in the tool, while auto rebalance 
> runs on the master.
> In the future, auto rebalance on the master will become stable and become 
> the main way to rebalance data. At the same time, admin operators need a way 
> to trigger a rebalance externally, just as auto rebalance does.
> However, auto rebalance currently runs in a thread on a time period.
> We could add an API to MasterService, but that API would be synchronous and 
> very costly, so we need an asynchronous method to trigger the rebalance.
> h2. auto compaction
> Another example is auto compaction.
> I have found that the compaction strategy is not always effective, so we may 
> need a method controlled by admin users to trigger compaction.
> If we could run a RowSetInCompaction on demand, we would not need to restart 
> the Kudu cluster.
> h1. My Solution
> Add a timer to ThreadPool. The timer is a worker thread that hands tasks to 
> the specified thread at their scheduled time.
> We can restrict TimerThread so that only SERIAL ThreadPoolTokens can enable it.
> The pseudocode below expresses my intention:
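> {code:java}
> // Pseudocode sketch (C++ style); elided parts are marked with "...".
> class TimerThread {
>   class Task {
>     ThreadPoolToken* token;    // token the task is submitted to when due
>     std::function<void()> f;   // work to run
>   };
>
>   // Queue 'task' to be submitted after 'delay_ms' milliseconds.
>   void Schedule(Task task, int delay_ms) {
>     tasks_.insert(...);
>   }
>
>   void RunLoop() {
>     while (...) {
>       SleepFor(100ms);
>       // Collect the tasks whose scheduled time has arrived.
>       auto due_tasks = FindTasks();
>       for (auto& task : due_tasks) {
>         task.token->Submit(task.f);
>         tasks_.erase(...);
>       }
>     }
>   }
>
>   scoped_refptr<Thread> thread_;
>   std::multimap<MonoTime, Task> tasks_;  // keyed by scheduled run time
> };
>
> class ThreadPool {
>   ...
>   TimerThread* timer_;  // nullptr by default
>   ...
> };
>
> class ThreadPoolToken {
>   void Scheduler();
> };{code}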
> This scheme is compatible with the existing ThreadPool, since the timer is 
> nullptr by default.
> For periodic tasks, we can use a control ThreadPool with a timer to refactor 
> some code and make it clearer, avoiding the past problem of too many 
> standalone threads.
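
The timer idea in the description can be sketched in standard C++ as follows. This is only a minimal illustration under stated assumptions, not Kudu code: FakeSerialToken is a hypothetical stand-in for a SERIAL ThreadPoolToken, std::thread and std::chrono replace Kudu's Thread and MonoTime, and the loop waits on a condition variable until the earliest deadline instead of polling every 100 ms as in the pseudocode.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a SERIAL ThreadPoolToken: it runs each submitted
// function inline. A real token would queue the work on pool threads.
class FakeSerialToken {
 public:
  void Submit(std::function<void()> f) { f(); }
};

class TimerThread {
 public:
  using Clock = std::chrono::steady_clock;

  TimerThread() : thread_([this] { RunLoop(); }) {}

  // Stops the loop and joins the thread; tasks still pending are dropped.
  ~TimerThread() {
    {
      std::lock_guard<std::mutex> l(lock_);
      stop_ = true;
    }
    cv_.notify_all();
    thread_.join();
  }

  // Queue 'f' to be submitted to 'token' once 'delay' has elapsed.
  void Schedule(FakeSerialToken* token, std::function<void()> f,
                std::chrono::milliseconds delay) {
    assert(token != nullptr);
    {
      std::lock_guard<std::mutex> l(lock_);
      tasks_.emplace(Clock::now() + delay, Task{token, std::move(f)});
    }
    cv_.notify_all();  // the new task may be due earlier than the current wait
  }

 private:
  struct Task {
    FakeSerialToken* token;
    std::function<void()> f;
  };

  void RunLoop() {
    std::unique_lock<std::mutex> l(lock_);
    while (!stop_) {
      if (tasks_.empty()) {
        cv_.wait(l);
        continue;
      }
      auto first = tasks_.begin();
      const Clock::time_point due = first->first;
      if (due > Clock::now()) {
        // Sleep until the earliest deadline, or until Schedule()/stop wakes us.
        cv_.wait_until(l, due);
        continue;
      }
      Task task = std::move(first->second);
      tasks_.erase(first);
      l.unlock();  // do not hold the lock while running user code
      task.token->Submit(std::move(task.f));
      l.lock();
    }
  }

  std::mutex lock_;
  std::condition_variable cv_;
  bool stop_ = false;
  std::multimap<Clock::time_point, Task> tasks_;  // keyed by due time
  std::thread thread_;  // declared last so the other members exist first
};

// Usage: schedule two tasks out of order and return the order they ran in.
std::vector<int> RunDemo() {
  std::vector<int> order;
  FakeSerialToken token;
  {
    TimerThread timer;
    timer.Schedule(&token, [&] { order.push_back(2); },
                   std::chrono::milliseconds(60));
    timer.Schedule(&token, [&] { order.push_back(1); },
                   std::chrono::milliseconds(10));
    std::this_thread::sleep_for(std::chrono::milliseconds(300));
  }  // the destructor joins the timer thread before 'order' is read
  return order;
}
```

A periodic task could be built on the same sketch by having the submitted function call Schedule() again with the period as the delay, which is one way to replace a dedicated thread per periodic job.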



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
