shenxingwuying reassigned KUDU-3422:
------------------------------------

    Assignee: shenxingwuying

> provide compact CLI tools for kudu administrators
> -------------------------------------------------
>
>                 Key: KUDU-3422
>                 URL: https://issues.apache.org/jira/browse/KUDU-3422
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: shenxingwuying
>            Assignee: shenxingwuying
>            Priority: Major
>
> h1. Motivation
> In Kudu, compaction jobs can become a pain point in some scenarios, for example:
> # MRS (MemRowSet) and DMS (DeltaMemStore) flushes do not happen in a timely manner. The patch for this: [https://gerrit.cloudera.org/c/17743/]
> # Disk space amplification is severe and all rowsets need to be compacted, but no compaction job runs, even when no other maintenance job is running and the workload is very low.
> # Some kinds of GC jobs should have been launched, but none run, even when no other maintenance job is running and the workload is very low.
>
> We can solve each of these problems case by case. However, the reasons compaction jobs misbehave can be complex: there may be bugs, or the scheduling strategies may not be good enough and need improvement. A new optimization may also fail to achieve the expected effect. Moreover, putting an optimization into production requires upgrading Kudu; the upgrade has to account for the production environment and users' concerns, and the upgrade itself can be painful because bootstrap can be very slow.
>
> In short, the situation is complex, and each problem takes time to analyze. When such a problem occurs in production, administrators have to change some gflags and restart Kudu, hoping that the desired compaction jobs will then be scheduled. Restarting Kudu may take a long time, and restarting the cluster may reduce availability.
>
> I want to provide a quick way to handle these cases without a restart. It is a troubleshooting tool for the cases above, not a root-cause fix. Approaching the problem from this angle avoids some of the difficulties, and the solution should be acceptable to SREs.
>
> h1. Solution
> We can deal with the problem flexibly: Kudu administrators can launch specific kinds of compaction jobs based on their own judgement.
> To support this, the Kudu CLI tool should add a command like this:
>
> {{kudu compact <master_list> --tables=<tables> --tablet_ids=<tablet_ids> --servers=<host:port> --compact_type=<compact_rowsets,deleted_rowset_gc,...>}}
>
> The kudu-tserver network (RPC) service should add an API; when it receives the command, it launches the corresponding compaction job. The job should run in the thread pool 'thread_pool_' of class 'MaintenanceManager'. Because the job is triggered explicitly by an administrator, it should skip the best-score computation the MaintenanceManager normally uses to pick an op, so it is a tool for abnormal cases only.
> The compaction job should run on a thread other than the service thread, because it may take a long time.
> Therefore, a way to check the job's status should also be provided.
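A minimal C++ sketch of the proposed tserver-side flow, under stated assumptions: `ManualCompactionRunner`, `CompactType`, `JobStatus`, `Submit` and `GetStatus` are hypothetical names, not existing Kudu APIs; a small `std::thread` pool stands in for `MaintenanceManager`'s `thread_pool_`; and the compaction itself is a caller-supplied callable. It illustrates the three properties the proposal asks for: the RPC handler returns immediately, the job bypasses score-based selection, and its status can be polled by job id.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Hypothetical job kinds mirroring the proposed --compact_type values.
enum class CompactType { kCompactRowSets, kDeletedRowsetGc };

// Lifecycle of an administrator-triggered job, reported by GetStatus().
enum class JobStatus { kUnknown, kPending, kRunning, kFinished, kFailed };

// Hypothetical stand-in for the tserver-side handler. The worker threads play
// the role of MaintenanceManager::thread_pool_; jobs skip any score-based
// selection and simply run in FIFO order.
class ManualCompactionRunner {
 public:
  // Performs the actual work; returns true on success.
  using Work = std::function<bool()>;

  explicit ManualCompactionRunner(int num_threads) {
    for (int i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Drains the remaining queue, then joins all workers.
  ~ManualCompactionRunner() {
    {
      std::lock_guard<std::mutex> l(mu_);
      shutdown_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }

  // Called from the RPC handler: enqueue and return a job id immediately,
  // so the service thread is never blocked by a long-running compaction.
  int64_t Submit(const std::string& tablet_id, CompactType type, Work work) {
    std::lock_guard<std::mutex> l(mu_);
    int64_t id = next_id_++;
    status_[id] = JobStatus::kPending;
    queue_.push(Job{id, tablet_id, type, std::move(work)});
    cv_.notify_one();
    return id;
  }

  // Backs the proposed status-check API.
  JobStatus GetStatus(int64_t id) const {
    std::lock_guard<std::mutex> l(mu_);
    auto it = status_.find(id);
    return it == status_.end() ? JobStatus::kUnknown : it->second;
  }

  // Blocks until every submitted job has finished or failed.
  void WaitForAll() {
    std::unique_lock<std::mutex> l(mu_);
    done_cv_.wait(l, [this] { return queue_.empty() && running_ == 0; });
  }

 private:
  struct Job {
    int64_t id;
    std::string tablet_id;  // a real handler would resolve the tablet replica
    CompactType type;       // and pick the matching maintenance op
    Work work;
  };

  void WorkerLoop() {
    while (true) {
      Job job;
      {
        std::unique_lock<std::mutex> l(mu_);
        cv_.wait(l, [this] { return shutdown_ || !queue_.empty(); });
        if (queue_.empty()) return;  // shutting down and nothing left to run
        job = std::move(queue_.front());
        queue_.pop();
        status_[job.id] = JobStatus::kRunning;
        ++running_;
      }
      bool ok = false;
      try {
        ok = job.work();  // runs outside the lock; may take a long time
      } catch (...) {
        ok = false;  // an exception marks the job as failed
      }
      {
        std::lock_guard<std::mutex> l(mu_);
        status_[job.id] = ok ? JobStatus::kFinished : JobStatus::kFailed;
        --running_;
      }
      done_cv_.notify_all();
    }
  }

  mutable std::mutex mu_;
  std::condition_variable cv_;       // signals new work or shutdown
  std::condition_variable done_cv_;  // signals job completion
  std::queue<Job> queue_;
  std::map<int64_t, JobStatus> status_;
  std::vector<std::thread> workers_;
  int64_t next_id_ = 1;
  int running_ = 0;
  bool shutdown_ = false;
};
```

In this sketch an RPC handler would call `Submit(...)` and return the job id to the CLI, which could then poll `GetStatus(id)`; `WaitForAll()` exists only to make the sketch easy to exercise. A plain FIFO queue is used deliberately, since the point of a manually triggered job is that it should not compete in the normal best-score selection.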