[ 
https://issues.apache.org/jira/browse/HBASE-28068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764326#comment-17764326
 ] 

Viraj Jasani commented on HBASE-28068:
--------------------------------------

In fact, the config limit can be applied during plan computation (i.e. 
{_}computeMergeNormalizationPlans(){_}).

For instance, we can limit the size of rangeMembers here:
{code:java}
...
...
...

if (
  // when there are no range members, seed the range with whatever we have.
  // this way we're prepared in case the next region is 0-size.
  rangeMembers.isEmpty()
    // when there is only one region and the size is 0, seed the range with
    // whatever we have.
    || (rangeMembers.size() == 1 && sumRangeMembersSizeMb == 0)
    // always add an empty region to the current range.
    || regionSizeMb == 0
    // add the current region to the range when there's capacity remaining.
    || (regionSizeMb + sumRangeMembersSizeMb <= avgRegionSizeMb)
) {
  rangeMembers.add(new NormalizationTarget(regionInfo, regionSizeMb));
  sumRangeMembersSizeMb += regionSizeMb;
  continue;
}

...
...
... {code}
Once {_}rangeMembers.size(){_} reaches the configured limit, we don't need 
to compute any further. This applies to the merge plan, but it might be 
improved in general as well.
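
A minimal, self-contained sketch of the idea above, not the actual HBase code: regions are reduced to plain sizes in MB instead of NormalizationTarget objects, and the parameter name maxRangeMembers is a hypothetical stand-in for the configured limit. The loop keeps the same seeding conditions as the snippet and only adds a check on rangeMembers.size().

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the merge-range loop; names and types are assumptions.
final class MergeRangeLimitSketch {

  // Groups adjacent regions (given as sizes in MB) into merge ranges,
  // never putting more than maxRangeMembers regions into one range.
  static List<List<Long>> computeMergeRanges(
      List<Long> regionSizesMb, long avgRegionSizeMb, int maxRangeMembers) {
    if (maxRangeMembers < 2) {
      throw new IllegalArgumentException("a merge needs at least two regions");
    }
    List<List<Long>> plans = new ArrayList<>();
    List<Long> rangeMembers = new ArrayList<>();
    long sumRangeMembersSizeMb = 0;

    for (long regionSizeMb : regionSizesMb) {
      // Hypothetical extra condition: stop growing the range at the limit.
      boolean belowLimit = rangeMembers.size() < maxRangeMembers;
      // Same conditions as the snippet quoted above.
      boolean fits = rangeMembers.isEmpty()
          || (rangeMembers.size() == 1 && sumRangeMembersSizeMb == 0)
          || regionSizeMb == 0
          || (regionSizeMb + sumRangeMembersSizeMb <= avgRegionSizeMb);

      if (belowLimit && fits) {
        rangeMembers.add(regionSizeMb);
        sumRangeMembersSizeMb += regionSizeMb;
        continue;
      }

      // Close the current range; a single region is not worth a merge plan.
      if (rangeMembers.size() > 1) {
        plans.add(rangeMembers);
      }
      // Seed the next range with the region that did not fit.
      rangeMembers = new ArrayList<>();
      rangeMembers.add(regionSizeMb);
      sumRangeMembersSizeMb = regionSizeMb;
    }

    if (rangeMembers.size() > 1) {
      plans.add(rangeMembers);
    }
    return plans;
  }
}
```

In this sketch, five empty regions with a limit of 2 yield two ranges of two regions each, and the trailing single region is left for a later run. With a limit of 100, the same input yields one range of five regions.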

> Normalizer should batch merging 0 sized/empty regions
> -----------------------------------------------------
>
>                 Key: HBASE-28068
>                 URL: https://issues.apache.org/jira/browse/HBASE-28068
>             Project: HBase
>          Issue Type: Improvement
>          Components: Normalizer
>    Affects Versions: 2.5.5
>            Reporter: Ravi Kishore Valeti
>            Assignee: Rahul Kumar
>            Priority: Minor
>             Fix For: 2.6.0, 2.5.6, 3.0.0-beta-1
>
>
> In our production environment, while investigating an issue, we observed that 
> the Normalizer had scheduled a single merge procedure on an RS, passing it 
> 27K+ empty regions of a table to merge (the empty regions were left behind by 
> a failed copy table job).
> This caused the procedure to get stuck, and the procedure framework 
> eventually bailed out after ~40 mins. This happened on each normalizer run 
> until we deleted the table manually.
> Logs
> Normalizer triggers a merge procedure
> normalizer.RegionNormalizerWorker - NormalizationTarget[regionInfo=\{ENCODED 
> => 6e8606335a62f6bafceb017dc7edfdf5, NAME => 'TEST.TEST_TABLE,XXXX.', 
> STARTKEY => 'XXXX', ENDKEY => 'YYYY'},{*}regionSizeMb=0{*}], 
> NormalizationTarget[regionInfo=\{ENCODED => 79607df308d7618e632abe8a12c1bf6b, 
> NAME => 'TEST.TEST_TABLE,XXXX', STARTKEY => 'XXYY', ENDKEY => 
> 'YYZZ'},{*}regionSizeMb=0]{*}]] resulting in *pid 21968356*
> procedure immediately gets stuck
> procedure2.ProcedureExecutor - Worker *stuck* PEWorker-56(pid=21968356), run 
> time 12.4850 sec
> Finally fails after ~40 mins
> procedure2.ProcedureExecutor - Worker *stuck* PEWorker-56(pid=21968356), run 
> time *40 mins, 58.055 sec*
> Bails out with RuntimeException
> procedure2.ProcedureExecutor - force=false
> java.lang.UnsupportedOperationException: pid=21968356, 
> state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META, locked=true, 
> exception=java.lang.{*}RuntimeException via CODE-BUG: Uncaught runtime 
> exception{*}: pid=21968356, state=RUNNABLE:MERGE_TABLE_REGIONS_UPDATE_META, 
> locked=true; MergeTableRegionsProcedure table=TEST.TEST_TABLEXXXX, 
> {*}regions={*}{*}[269a1b168af497cce9ba6d3d581568f2{*}
> .
> .
> .
> .
> *27K+ regions printed here]*


