[ 
https://issues.apache.org/jira/browse/HBASE-24528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134458#comment-17134458
 ] 

Andrew Kyle Purtell edited comment on HBASE-24528 at 6/12/20, 6:30 PM:
-----------------------------------------------------------------------

bq. Nobody is suggesting “everywhere”.

That said, for the sake of completeness in this discussion, let me quickly 
brainstorm cases where the functionality could be reused, if agreed upon:
* online slow log
* balancer decisions
* procedure execution history (maybe there's already something for this; I 
didn't check)
* compaction decisions and activity
* split/merge decisions and activity
* archiving decisions and activity
* anything implemented as a scheduled chore, I suppose

I would not advocate for many of these, but a subset could be really useful, 
especially the items pertaining to region housekeeping and movement. 

Also, I am thinking about how we might collect really fine-grained statistics 
on scanner activity. I don't have a proposal well formed enough for a JIRA or a 
design doc yet, and I want to play around first, but the gist is that when 
doing comparative benchmarks it would be super beneficial to count many low 
level actions and decisions.

Currently we can use a profiler like async-profiler or VisualVM to determine 
% CPU and time spent in methods or classes, or we can analyze a JFR event 
trace, or we can analyze a GC log, and from these indirect observations about 
the actions of our code generate hypotheses. Because they are indirect, they 
can be hard to confirm or refute, and reasoning from code inspection only 
works reliably if the code fits within a single page of the editor. For 
example, between two versions I might observe that the % of time spent in 
decompression or block encoding increases, and I can see the total cumulative 
time spent in cell comparators is 3x more, so I can hypothesize that we are 
processing more cells during scanning, but I don't know why. Are we doing less 
SKIPping and more SEEKing? Probably? But if I had counters I could compare for 
the SKIP versus SEEK decision, and in general counts of the many low level 
activities that go into scanning (block cache read vs HDFS block read vs 
memstore hit vs block decode, number of blocks read, number of keys iterated 
over, count of filter executions by classname, distribution of filter hint 
types returned, ...), then there is less to infer and more direct evidence 
from which to draw conclusions.

It helps that, at least for now, all activity for a handler happens in the 
same thread context, so we can hang a concurrent map off a thread local and 
populate it with string-keyed atomic counters. How would we expose this, 
though? Perhaps as ScanMetrics. Perhaps as a sliding window of serialized 
trace. The latter would be akin to how the JVM manages such things for Java 
Flight Recorder. 
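To make the thread-local counter idea concrete, here is a minimal sketch in 
plain Java. The names (ScanCounters, increment, snapshot) are hypothetical, 
not existing HBase APIs:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: per-handler-thread counters, keyed by string.
// Each handler thread gets its own map via a thread local, so increments
// are contention-free on the hot path.
public final class ScanCounters {
  private static final ThreadLocal<ConcurrentHashMap<String, LongAdder>> COUNTERS =
      ThreadLocal.withInitial(ConcurrentHashMap::new);

  private ScanCounters() {}

  // Bump a named counter for the current handler thread,
  // e.g. increment("SEEK") at each seek decision.
  public static void increment(String key) {
    COUNTERS.get().computeIfAbsent(key, k -> new LongAdder()).increment();
  }

  // Snapshot for export, e.g. folding into ScanMetrics when the scan ends.
  public static Map<String, Long> snapshot() {
    Map<String, Long> out = new HashMap<>();
    COUNTERS.get().forEach((k, v) -> out.put(k, v.sum()));
    return out;
  }
}
```

LongAdder is chosen over AtomicLong because it is optimized for 
write-mostly counters that are only summed at read time.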

So, to be fair, that is more than just the two cases that prompted this 
discussion about DRY.
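The sliding-window alternative could be sketched as a simple bounded buffer of 
events. Again the names are hypothetical, and a real implementation would 
retain serialized trace records rather than arbitrary objects:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a fixed-capacity sliding window of recent events,
// loosely analogous to how JFR retains a rolling buffer of recordings.
public final class TraceWindow<E> {
  private final Deque<E> events = new ArrayDeque<>();
  private final int capacity;

  public TraceWindow(int capacity) {
    this.capacity = capacity;
  }

  // Append an event, evicting the oldest once the window is full.
  public synchronized void add(E event) {
    if (events.size() == capacity) {
      events.removeFirst();
    }
    events.addLast(event);
  }

  // Copy out the current window, oldest first, for display or export.
  public synchronized List<E> snapshot() {
    return new ArrayList<>(events);
  }
}
```

The same shape would serve the balancer case in this issue: a rolling window 
of the last N region move plans with their decision factors attached.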



> Improve balancer decision observability
> ---------------------------------------
>
>                 Key: HBASE-24528
>                 URL: https://issues.apache.org/jira/browse/HBASE-24528
>             Project: HBase
>          Issue Type: New Feature
>          Components: Admin, Balancer, Operability, shell, UI
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>
> We provide detailed INFO and DEBUG level logging of balancer decision 
> factors, outcome, and reassignment planning, as well as similarly detailed 
> logging of the resulting assignment manager activity. However, an operator 
> may need to perform online and interactive observation, debugging, or 
> performance analysis of current balancer activity. Scraping and correlating 
> the many log lines resulting from a balancer execution is labor intensive and 
> has a lot of latency (order of ~minutes to acquire and index, order of 
> ~minutes to correlate). 
> The balancer should maintain a rolling window of history, e.g. the last 100 
> region move plans, or last 1000 region move plans submitted to the assignment 
> manager. This history should include decision factor details and weights and 
> costs. The rsgroups balancer may be able to provide fairly simple decision 
> factors, for example "this table was reassigned to that regionserver 
> group". The underlying or vanilla stochastic balancer, on the other hand, 
> after a walk over random assignment plans, will have considered a number of 
> cost functions with various inputs (locality, load, etc.) and multipliers, 
> including custom cost functions. We can devise an extensible class structure 
> that represents explanations for balancer decisions, and for each region move 
> plan that is actually submitted to the assignment manager, we can keep the 
> explanations of all relevant decision factors alongside the other details of 
> the assignment plan like the region name, and the source and destination 
> regionservers. 
> This history should be available via API for use by new shell commands and 
> admin UI widgets.
> The new shell commands and UI widgets can unpack the representation of 
> balancer decision components into human readable output. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
