> On Aug. 19, 2014, 6:31 p.m., Josh Elser wrote:
> > One big design concern I have is what gains the final solution would 
> > actually have over what is currently possible with Accumulo as it stands.
> > 
> > Right now, you can force tablets to migrate by stopping a tserver. This 
> > goes back through the balancer, so you have a bit of churn in however many 
> > "rounds" the Balancer takes to choose where those tablets should go, and 
> > then for the master to process the necessary assignments for each tserver. 
> > As I read the description, the only piece of the puzzle that we're 
> > making better is removing the migration components in favor of letting the 
> > user control this directly. How far does a "smart" Balancer implementation 
> > close the performance gap with user-provided migrations? Also, how does 
> > removing the Balancer from the equation change the wall time to get a 
> > tablet assigned (is it significant)?
> > 
> > We have to also understand that while we can decompose the problem into 
> > some simple primitives, I believe this approach is still a rather difficult 
> > distributed state problem that I'm worried is being over-architected. My 
> > $0.02.
> 
> Josh Elser wrote:
>     For context, I was reading about HBase's support on the subject and found 
> http://hbase.apache.org/book/node.management.html. Their general approach is 
> to provide a graceful shutdown for regionservers. This is still subject to 
> problems when large numbers of servers are stopped at one time. To alleviate 
> some of this pain, they use ZK to store which servers are currently in a 
> "draining state" to avoid new assignments to those nodes -- "[...] 
> decommissioning multiple nodes may be non-optimal because regions that are 
> being drained from one region server may be moved to other regionservers that 
> are also draining. Marking RegionServers to be in the draining state prevents 
> this from happening",

An alternative to this design is the one Mike mentioned on the issue: 
temporarily replace the balancer.  I am thinking that providing these 
primitives for manipulating tablets will allow an administrator to quickly 
script a one-off solution to a problem, in addition to solving the rolling 
restart problem.  You do not get this quick flexibility from writing a new 
balancer.

Killing tablet servers is a solution.  I think it would be nice to have a 
solution that avoids log recovery, minimizes downtime of individual tablets, 
preserves locality, and is easy to use.  It does not have to be this solution.  
Without additional scripts, the primary use case in ACCUMULO-1454 would not be 
easy to carry out.  A balancer alone would not be enough to achieve the goal of 
migrating tablets between old and new tservers on the same node.  However, a 
balancer plus tserver states like the ones you mentioned from HBase may provide 
enough.  We should probably try to explore the balancer option a bit more.
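
To make the "balancer + tserver states" idea concrete, here is a rough sketch 
of how a balancer could skip draining servers when picking destinations.  It is 
not Accumulo or HBase code: the function name, the argument shapes, and the 
least-loaded tie-breaking rule are all assumptions made for illustration.

```python
def plan_migrations(servers, assignments, draining):
    """Hypothetical sketch: move tablets off draining servers.

    servers     -- iterable of all live tablet server names
    assignments -- dict mapping tablet name -> current server name
    draining    -- set of server names marked as draining
    Returns a dict mapping tablet name -> destination server.
    """
    # Draining servers are never considered as destinations.
    candidates = [s for s in servers if s not in draining]
    if not candidates:
        raise ValueError("no non-draining tablet servers available")

    # Count the tablets already hosted on each candidate.
    load = {s: 0 for s in candidates}
    for server in assignments.values():
        if server in load:
            load[server] += 1

    plan = {}
    for tablet, server in sorted(assignments.items()):
        if server in draining:
            # Pick the least-loaded candidate; break ties by name.
            dest = min(candidates, key=lambda s: (load[s], s))
            plan[tablet] = dest
            load[dest] += 1
    return plan


if __name__ == "__main__":
    servers = ["ts1", "ts2", "ts3", "ts4"]
    assignments = {"t1": "ts1", "t2": "ts1", "t3": "ts2", "t4": "ts3"}
    print(plan_migrations(servers, assignments, {"ts1", "ts2"}))
```

Because draining servers are filtered out before any destination is chosen, a 
tablet leaving one draining server cannot land on another, which is the problem 
the HBase documentation describes.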


- kturner


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review51006
-----------------------------------------------------------


On Aug. 19, 2014, 5:50 p.m., kturner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24855/
> -----------------------------------------------------------
> 
> (Updated Aug. 19, 2014, 5:50 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-1454
>     https://issues.apache.org/jira/browse/ACCUMULO-1454
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Posting ACCUMULO-1454 design doc for review
> 
> 
> Diffs
> -----
> 
>   docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/24855/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> kturner
> 
>
