Let me try out the below in a unit test. Good on you Lars,
S

On Wed, Nov 9, 2016 at 2:46 AM, Lars George <lars.geo...@gmail.com> wrote:

> Wait, I am getting more confused (why must this be so hard to read?),
> as in HMaster.balance() we have this
>
> long cutoffTime = System.currentTimeMillis() + maximumBalanceTime;
> int rpCount = 0;  // number of RegionPlans balanced so far
> long totalRegPlanExecTime = 0;
> if (plans != null && !plans.isEmpty()) {
>   for (RegionPlan plan: plans) {
>     LOG.info("balance " + plan);
>     long balStartTime = System.currentTimeMillis();
>     //TODO: bulk assign
>     this.assignmentManager.balance(plan);
>     totalRegPlanExecTime += System.currentTimeMillis()-balStartTime;
>     rpCount++;
>     if (rpCount < plans.size() &&
>         // if performing next balance exceeds cutoff time, exit the loop
>         (System.currentTimeMillis() + (totalRegPlanExecTime /
> rpCount)) > cutoffTime) {
>       //TODO: After balance, there should not be a cutoff time
> (keeping it as a security net for now)
>       LOG.debug("No more balancing till next balance run;
> maximumBalanceTime=" +
>         maximumBalanceTime);
>       break;
>     }
>   }
> }
>
> So the cut off time is T1 (assuming this is now when the balance()
> call was made by the chore) plus the extra 5mins. rpCount is 0 and the
> accrued time it took for plans to execute is also 0. Let's also assume
> that we have many plans to execute, to trigger the checks.
>
> We record the time the assignmentManager.balance(plan) call took, and
> add this to the accrued time counter, while increasing the plan
> counter. We loop now until we executed all plans _or_ (and that is the
> part I am after), until the current time plus "the accrued plan
> execution time divided by the their count" exceeds the cut off time
> computed above.
>
> For starters, why do we add the "rpCount < plans.size()" check at all
> into that condition? What does that actually help or do? Just to skip
> the very last check when the loop end and timeout coincide?
>
> Next, what is that division doing in the time check? It computes the
> average it seems (total time divided by the count) so say if 5 plans
> ran, and each took 4 mins, then the time we check "Tcurr + ( 5 * 4
> mins) / 5 > T1 + 5 mins", or "T1 + 20 mins + 4 mins > T1 + 5 mins". So
> this already triggered way earlier and won't get to this. I am at a
> loss to explain the logic behind this as I do not know what we usually
> see in terms of number of plans and their execution times.
>
> But even assuming say 4 plans at 1 min each, we get "T1 + 4 mins + 1
> mins > T1 + 5 mins" - so is adding the average to skip the last one on
> the assumption it would exceed the total balance time (the cut off)?
> Seems as such.
>
> It then runs say 4 mins, sleeps 1 min, and then starts again due to
> the chore firing. In the end, we continuously rearrange the cluster,
> no?
>
>
>
> On Wed, Nov 9, 2016 at 11:14 AM, Lars George <lars.geo...@gmail.com>
> wrote:
> > Hi,
> >
> > In HMaster.java we have this code (added, mostly, in HBASE-3422):
> >
> > private int getBalancerCutoffTime() {
> >   int balancerCutoffTime =
> >     getConfiguration().getInt("hbase.balancer.max.balancing", -1);
> >   if (balancerCutoffTime == -1) {
> >     // No time period set so create one
> >     int balancerPeriod =
> >       getConfiguration().getInt("hbase.balancer.period", 300000);
> >     balancerCutoffTime = balancerPeriod;
> >     // If nonsense period, set it to balancerPeriod
> >     if (balancerCutoffTime <= 0) balancerCutoffTime = balancerPeriod;
> >   }
> >   return balancerCutoffTime;
> > }
> >
> > Looking at these lines in particular, it seems that HBASE-8119 has
> > added the unconditional line, make the conditional obsolete:
> >
> >     balancerCutoffTime = balancerPeriod;
> >     // If nonsense period, set it to balancerPeriod
> >     if (balancerCutoffTime <= 0) balancerCutoffTime = balancerPeriod;
> >
> > Shall I just create a JIRA, patch and remove the conditional assignment?
> >
> > And since the cut off time is not set, it defaults to the same 5mins
> > as the balancer period resolves to, making this basically a continuous
> > process, as the chore is calling balance every 5 mins, and it then
> > runs for 5 mins before stopping. How does this address the issue
> > reported in HBASE-3422?
> >
> > Best,
> > Lars
>

Reply via email to