[ https://issues.apache.org/jira/browse/HBASE-24528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133174#comment-17133174 ]
Viraj Jasani edited comment on HBASE-24528 at 6/11/20, 12:58 PM:
-----------------------------------------------------------------

All great suggestions! That said, the work done for the slow_log ring buffer in RegionServer mostly has IA.Private (and some IS.Evolving) interfaces. Given the ongoing release preparations for 2.3.0, HBASE-22978 and its completed sub-tasks are going to land in 2.3.0 soon. If we consider providing a higher-level abstraction for the framework (e.g. LMAX Disruptor publish access should be available to all types of clients), I am wondering if it breaks any convenience of IA by any chance. I hope not.

I believe we can take this up in various sub-tasks / linked tasks:
# Create an abstraction for providing the payload to the RingBuffer, which can include an EventType (could be slow_log or balancer_plans)
# Put the abstractions in a common place accessible to the entire hbase-server module (they are already in hbase-server, but in the regionserver package)
# For the above 2 tasks, slow_log should be the only implementor (ring buffer), so no regression so far
# Implement a new ring buffer, balancer_plans; separate queues should take requests from their respective clients.
# Persistence layer (ZK? System table?)

Although I have listed the items above, in order to make patches easy to review, I will try to focus on specific sub-tasks and maybe introduce more if required, with no tight coupling among them.

> Improve balancer decision observability
> ---------------------------------------
>
>                 Key: HBASE-24528
>                 URL: https://issues.apache.org/jira/browse/HBASE-24528
>             Project: HBase
>          Issue Type: New Feature
>          Components: Admin, Balancer, Operability, shell, UI
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>
> We provide detailed INFO and DEBUG level logging of balancer decision
> factors, outcome, and reassignment planning, as well as similarly detailed
> logging of the resulting assignment manager activity. However, an operator
> may need to perform online and interactive observation, debugging, or
> performance analysis of current balancer activity. Scraping and correlating
> the many log lines resulting from a balancer execution is labor intensive and
> has a lot of latency (order of ~minutes to acquire and index, order of
> ~minutes to correlate).
> The balancer should maintain a rolling window of history, e.g. the last 100
> region move plans, or last 1000 region move plans submitted to the assignment
> manager. This history should include decision factor details and weights and
> costs. The rsgroups balancer may be able to provide fairly simple decision
> factors, like for example "this table was reassigned to that regionserver group".
> The underlying or vanilla stochastic balancer on the other hand,
> after a walk over random assignment plans, will have considered a number of
> cost functions with various inputs (locality, load, etc.) and multipliers,
> including custom cost functions. We can devise an extensible class structure
> that represents explanations for balancer decisions, and for each region move
> plan that is actually submitted to the assignment manager, we can keep the
> explanations of all relevant decision factors alongside the other details of
> the assignment plan like the region name, and the source and destination
> regionservers.
> This history should be available via API for use by new shell commands and
> admin UI widgets.
> The new shell commands and UI widgets can unpack the representation of
> balancer decision components into human readable output.
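
As a rough illustration of the "rolling window of history" idea in the issue description, here is a minimal sketch in Java. It is not HBase code: the class names BalancerDecision and BalancerDecisionHistory, their fields, and the string-valued explanation are all hypothetical, and a plain synchronized ArrayDeque stands in for the LMAX Disruptor ring buffer discussed in the comment.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical record of one submitted region move plan and why it was chosen. */
final class BalancerDecision {
  final String regionName;
  final String sourceServer;
  final String destinationServer;
  // Assumption: a free-form explanation; a real design might use a class
  // hierarchy carrying cost function names, weights and costs.
  final String explanation;

  BalancerDecision(String regionName, String sourceServer,
      String destinationServer, String explanation) {
    this.regionName = regionName;
    this.sourceServer = sourceServer;
    this.destinationServer = destinationServer;
    this.explanation = explanation;
  }
}

/** Hypothetical bounded history that keeps only the most recent decisions. */
final class BalancerDecisionHistory {
  private final int capacity;
  private final Deque<BalancerDecision> window = new ArrayDeque<>();

  BalancerDecisionHistory(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.capacity = capacity;
  }

  /** Adds a decision, evicting the oldest one when the window is full. */
  synchronized void record(BalancerDecision decision) {
    if (window.size() == capacity) {
      window.removeFirst();
    }
    window.addLast(decision);
  }

  /** Returns a copy of the window, oldest decision first. */
  synchronized List<BalancerDecision> snapshot() {
    return new ArrayList<>(window);
  }
}
```

With a capacity of 100, this would retain the last 100 recorded move plans. An API, shell command or UI widget could then render the snapshot() result without touching the balancer's internal state.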