[ https://issues.apache.org/jira/browse/HBASE-17707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897548#comment-15897548 ]
Hadoop QA commented on HBASE-17707:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 7s {color} | {color:red} HBASE-17707 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12856301/HBASE-17707-06.patch |
| JIRA Issue | HBASE-17707 |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/5964/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |

This message was automatically generated.

> New More Accurate Table Skew cost function/generator
> ----------------------------------------------------
>
>                 Key: HBASE-17707
>                 URL: https://issues.apache.org/jira/browse/HBASE-17707
>             Project: HBase
>          Issue Type: New Feature
>          Components: Balancer
>    Affects Versions: 1.2.0
>         Environment: CentOS Derivative with a derivative of the 3.18.43
> kernel. HBase on CDH5.9.0 with some patches. HDFS CDH 5.9.0 with no patches.
>            Reporter: Kahlil Oppenheimer
>            Assignee: Kahlil Oppenheimer
>            Priority: Minor
>             Fix For: 2.0
>
>         Attachments: HBASE-17707-00.patch, HBASE-17707-01.patch,
> HBASE-17707-02.patch, HBASE-17707-03.patch, HBASE-17707-04.patch,
> HBASE-17707-05.patch, HBASE-17707-06.patch
>
>
> This patch includes a new version of the TableSkewCostFunction and a new
> TableSkewCandidateGenerator.
> The new TableSkewCostFunction computes table skew by counting the minimal
> number of region moves required to perfectly balance a given table across
> the cluster (i.e. as if the regions from that table had been
> round-robin-ed across the cluster).
> This number of moves is computed for each table, then normalized to a score
> between 0 and 1 by dividing by the number of moves required in the absolute
> worst case (i.e. the entire table is stored on one server), and stored in an
> array. The cost function then takes a weighted average of the average and
> maximum values across all tables. The weights in this average are
> configurable, so that users can penalize a situation where one table is
> heavily skewed more strongly than one where every table is slightly skewed.
> To spread this value more evenly across the range 0 to 1, we take the square
> root of the weighted average to get the final value.
> The new TableSkewCandidateGenerator generates region moves/swaps to optimize
> the above TableSkewCostFunction. It first moves regions until each server has
> the right number of regions, then swaps regions such that each swap improves
> table skew across the cluster.
> We tested the cost function and generator in our production clusters with
> 100s of TBs of data and 100s of tables across dozens of servers and found
> both to be very performant and accurate.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
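
For readers following the quoted description, the cost computation it outlines can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in HBASE-17707-06.patch: the class name TableSkewSketch, the method names, and the maxWeight/avgWeight parameters are hypothetical, and the sketch assumes a "perfectly balanced" table means every server holds either floor(total/servers) or floor(total/servers) + 1 of its regions.

```java
import java.util.Arrays;

/** Hypothetical sketch of the described cost; not taken from the HBASE-17707 patch. */
final class TableSkewSketch {

    /** Minimal region moves to balance one table; input is its region count per server. */
    static int minMoves(int[] regionsPerServer) {
        int servers = regionsPerServer.length;
        if (servers == 0) {
            return 0;
        }
        int total = Arrays.stream(regionsPerServer).sum();
        int floor = total / servers;
        int extra = total % servers; // this many servers may hold floor + 1 regions
        int[] sorted = regionsPerServer.clone();
        Arrays.sort(sorted);
        int moves = 0;
        for (int i = 0; i < servers; i++) {
            // Assumption: the most loaded servers keep the larger targets,
            // which minimizes the surplus that has to be moved away.
            int target = floor + (i >= servers - extra ? 1 : 0);
            moves += Math.max(0, sorted[i] - target);
        }
        return moves;
    }

    /** Skew of one table in [0, 1]: moves needed divided by the worst-case moves. */
    static double tableSkew(int[] regionsPerServer) {
        int servers = regionsPerServer.length;
        if (servers == 0) {
            return 0.0;
        }
        int total = Arrays.stream(regionsPerServer).sum();
        // Worst case: the whole table sits on one server, which keeps ceil(total/servers).
        int worst = total - (total + servers - 1) / servers;
        return worst == 0 ? 0.0 : (double) minMoves(regionsPerServer) / worst;
    }

    /** Square root of the weighted average of the max and mean per-table skew. */
    static double cost(int[][] regionsPerServerByTable, double maxWeight, double avgWeight) {
        double max = 0.0;
        double sum = 0.0;
        for (int[] table : regionsPerServerByTable) {
            double skew = tableSkew(table);
            max = Math.max(max, skew);
            sum += skew;
        }
        int tables = regionsPerServerByTable.length;
        double avg = tables == 0 ? 0.0 : sum / tables;
        // Assumption: weights are non-negative and not both zero.
        double weighted = (maxWeight * max + avgWeight * avg) / (maxWeight + avgWeight);
        return Math.sqrt(weighted);
    }

    public static void main(String[] args) {
        int[][] cluster = {
            {4, 0, 0, 0}, // one table entirely on a single server
            {1, 1, 1, 1}, // one table spread evenly
        };
        System.out.println(cost(cluster, 1.0, 1.0));
    }
}
```

Raising maxWeight relative to avgWeight in this sketch penalizes a single badly skewed table more heavily, which corresponds to the configurable trade-off the description mentions.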