[ 
https://issues.apache.org/jira/browse/KUDU-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shenxingwuying reassigned KUDU-3422:
------------------------------------

    Assignee: shenxingwuying

> provide compact CLI tools for kudu administrators
> -------------------------------------------------
>
>                 Key: KUDU-3422
>                 URL: https://issues.apache.org/jira/browse/KUDU-3422
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Major
>
> h1. Motivation
> In Kudu, compaction jobs can be a pain point in some scenarios, for example:
>  # MemRowSet (MRS) and DeltaMemStore (DMS) flushes are not timely enough. 
> The patch for this: [https://gerrit.cloudera.org/c/17743/]
>  # Disk space amplification is too severe and all rowsets need to be 
> compacted, but no jobs run, even when there is no maintenance job and the 
> workload is very low.
>  # Some kinds of GC jobs should have been launched but no jobs run, even 
> when there is no maintenance job and the workload is very low.
> We can solve each of these problems case by case. The reasons compaction 
> jobs don't work well may be complex: bugs may exist, or the strategies may 
> not be good enough and need improvement. A new optimization may not achieve 
> the effect we expected. To bring a new optimization online we also have to 
> upgrade Kudu, and an upgrade must take the production environment and 
> users' concerns into account. The upgrade itself may hit another pain 
> point: bootstrap is very slow.
> In short, it's very complex. Every problem takes some time to analyze. 
> When such a problem happens in a production environment, administrators 
> have to change some gflags parameters and restart Kudu in the hope that 
> some compaction jobs get scheduled. Restarting Kudu may take too much time, 
> and restarting the cluster may cost availability.
> I want to support a quick way to address these cases without a restart. 
> It's a troubleshooting aid for the cases above, not a root-cause fix. 
> Here I look at the problem from another angle to work around some of the 
> difficulties. The solution should be acceptable to SREs.
> h1. Solution
> We can deal with the problem in a flexible way: Kudu administrators can 
> launch some kinds of compaction jobs based on their own judgment.
> To support this idea, the Kudu CLI tool should add a command like this:
>  
> {{kudu compact <master_list> --tables=<tables> --tablet_ids=<tablet_ids> 
> --servers=<host:port> --compact_type=<compact_rowsets,deleted_rowset_gc,...>}}
> kudu-tserver's network service should add an API: when it receives the 
> command, it launches a corresponding compaction job. The job should run on 
> the ThreadPool 'thread_pool_' in class 'MaintenanceManager'. The compaction 
> job is triggered by administrators and should skip the best-score 
> computation, so it is a method for abnormal cases.
> The compaction job should run on another thread, not the service thread, 
> because it may be long-running.
> So we should also provide a method to check the job's status.
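
The submit-then-poll behaviour described above could be sketched roughly as follows. This is only an illustration in Python, not Kudu code: `ManualCompactionService`, `JobState`, `submit`, `status` and `demo` are hypothetical names, the executor merely stands in for MaintenanceManager's 'thread_pool_', and the real tserver API may look quite different.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from enum import Enum


class JobState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


class ManualCompactionService:
    """Hypothetical stand-in for the proposed tserver API (an assumption)."""

    def __init__(self, max_workers=2):
        # Plays the role of MaintenanceManager's 'thread_pool_'.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._states = {}

    def submit(self, tablet_id, compact_type, work):
        """Queue an admin-triggered job and return its id immediately.

        No score is computed: the job is queued simply because it was asked for.
        """
        job_id = uuid.uuid4().hex
        self._set(job_id, JobState.PENDING)
        self._pool.submit(self._run, job_id, tablet_id, compact_type, work)
        return job_id

    def _run(self, job_id, tablet_id, compact_type, work):
        # Runs on a pool thread, never on the caller's (service) thread.
        self._set(job_id, JobState.RUNNING)
        try:
            work(tablet_id, compact_type)
            self._set(job_id, JobState.DONE)
        except Exception:
            self._set(job_id, JobState.FAILED)

    def _set(self, job_id, state):
        with self._lock:
            self._states[job_id] = state

    def status(self, job_id):
        """Return the current state of a previously submitted job."""
        with self._lock:
            return self._states[job_id]

    def shutdown(self):
        # Blocks until all queued jobs have finished.
        self._pool.shutdown(wait=True)


def demo():
    """Submit one fake long-running job and observe its state twice."""
    service = ManualCompactionService()
    started = threading.Event()
    release = threading.Event()

    def fake_compaction(tablet_id, compact_type):
        started.set()            # signal that the job body is executing
        release.wait(timeout=5)  # simulate a long compaction

    job_id = service.submit("tablet-1", "compact_rowsets", fake_compaction)
    started.wait(timeout=5)
    while_running = service.status(job_id)
    release.set()
    service.shutdown()
    return while_running, service.status(job_id)
```

The point of the sketch is that `submit` returns a job id without waiting, so the calling thread stays free, and `status` gives the administrator something to poll while a long compaction is still in progress.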



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
