> On Aug. 19, 2014, 6:31 p.m., Josh Elser wrote:
> > One big design concern I have is what gains the final solution would 
> > actually offer over what is currently possible with Accumulo as it stands.
> > 
> > Right now, you can force tablets to migrate by stopping a tserver. This 
> > goes back through the balancer, so you have a bit of churn in however many 
> > "rounds" the Balancer takes to choose where those tablets should go, and 
> > then in the master processing the necessary assignments for each tserver. 
> > As I see it described, the only piece of the puzzle we're making better is 
> > removing the migration components in favor of letting the user control 
> > this directly. How much does a "smart" Balancer implementation close that 
> > gap, performance-wise, compared to the user providing migrations? Also, 
> > how does removing the Balancer from the equation change the wall time to 
> > get a tablet assigned (is it significant)?
> > 
> > We also have to understand that while we can decompose the problem into 
> > some simple primitives, I believe this approach is still a rather 
> > difficult distributed-state problem that I'm worried is being 
> > over-architected. My $0.02.
> 
> Josh Elser wrote:
>     For context, I was reading about HBase's support for this and found 
> http://hbase.apache.org/book/node.management.html. Their general approach is 
> to provide a graceful shutdown for regionservers. This is still subject to 
> problems when large numbers of servers are stopped at one time. To alleviate 
> some of this pain, they use ZK to record which servers are currently in a 
> "draining state" and avoid new assignments to those nodes -- "[...] 
> decommissioning multiple nodes may be non-optimal because regions that are 
> being drained from one region server may be moved to other regionservers that 
> are also draining. Marking RegionServers to be in the draining state prevents 
> this from happening".
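>     
>     For illustration, a minimal sketch of that draining marker, assuming a 
> plain ZooKeeper client (the znode path and class name here are made up; they 
> are not anything HBase or Accumulo actually ships):
>     
>         import java.util.List;
>         import org.apache.zookeeper.CreateMode;
>         import org.apache.zookeeper.KeeperException;
>         import org.apache.zookeeper.ZooDefs;
>         import org.apache.zookeeper.ZooKeeper;
>         
>         public class DrainingServers {
>           // Hypothetical znode path; the parent is assumed to already exist.
>           private static final String DRAINING_PATH = "/cluster/draining";
>           private final ZooKeeper zk;
>         
>           public DrainingServers(ZooKeeper zk) {
>             this.zk = zk;
>           }
>         
>           // Mark a server as draining so the assignment logic will skip it.
>           public void markDraining(String server)
>               throws KeeperException, InterruptedException {
>             zk.create(DRAINING_PATH + "/" + server, new byte[0],
>                 ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
>           }
>         
>           // Assignment code reads this list and filters those servers out.
>           public List<String> getDrainingServers()
>               throws KeeperException, InterruptedException {
>             return zk.getChildren(DRAINING_PATH, false);
>           }
>         }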
> 
> kturner wrote:
>     An alternative to this design is one that Mike mentioned on the issue: 
> temporarily replace the balancer.  I am thinking that providing these 
> primitives for manipulating tablets will allow an administrator to quickly 
> script a one-off solution to a problem, in addition to solving the rolling 
> restart problem.  You do not get this quick flexibility with writing a new 
> balancer.
>     
>     Killing tablet servers is a solution.  I think it would be nice to have 
> a solution that avoids log recovery, minimizes downtime of individual 
> tablets, preserves locality, and is easy to use.  It does not have to be 
> this solution.  Without additional scripts, the primary use case in 
> ACCUMULO-1454 would not be easy to use.  A balancer alone would not be 
> enough to achieve the goal of migrating tablets between old and new tservers 
> on the same node.  However, a balancer plus tserver states like the ones you 
> mentioned from HBase may provide enough.  We should probably explore the 
> balancer option a bit more.
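>     
>     To make the "quickly script a one-off solution" point concrete, here is 
> the shape an admin script could take.  Every call below is hypothetical -- 
> none of these methods exist in the current client API; they stand in for the 
> primitives this design would add:
>     
>         // Hypothetical rolling-restart loop built on the proposed primitives.
>         for (String oldServer : serversToRestart) {
>           tabletOps.setDraining(oldServer, true);             // no new assignments here
>           for (KeyExtent extent : tabletOps.getTablets(oldServer)) {
>             tabletOps.migrate(extent, pickDestination(extent)); // user-controlled move
>             tabletOps.waitForHosted(extent);                  // bound per-tablet downtime
>           }
>           // Tablets were unloaded cleanly, so the restart skips log recovery.
>           restartTabletServer(oldServer);
>         }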
> 
> kturner wrote:
>     One other thing I was thinking about: you cannot make assumptions about 
> the environment.  Users may not use the Accumulo scripts to start and stop 
> tservers.
> 
> Josh Elser wrote:
>     I think there would be merit in enumerating what a custom Balancer would 
> need. Is it really something that would need to be written on a per-instance 
> basis, or is there something we could provide that would be more conducive 
> to "heavy" tserver churn?
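>     
>     As a starting point for that enumeration, here is a rough sketch of a 
> churn-friendly balancer.  The signatures are paraphrased from memory against 
> the 1.6 balancer SPI, and isDraining() is a hypothetical hook (it could be 
> backed by a ZK draining list like the HBase approach above):
>     
>         import java.util.Iterator;
>         import java.util.Map;
>         import java.util.SortedMap;
>         import java.util.TreeMap;
>         
>         import org.apache.accumulo.core.data.KeyExtent;
>         import org.apache.accumulo.core.master.thrift.TabletServerStatus;
>         import org.apache.accumulo.server.master.balancer.DefaultLoadBalancer;
>         import org.apache.accumulo.server.master.state.TServerInstance;
>         
>         // Identical to the default balancer, except servers marked as
>         // draining are removed from the set of assignment candidates.
>         public class DrainingAwareBalancer extends DefaultLoadBalancer {
>         
>           // Hypothetical predicate; not part of any existing API.
>           private boolean isDraining(TServerInstance ts) {
>             return false; // placeholder
>           }
>         
>           @Override
>           public void getAssignments(SortedMap<TServerInstance,TabletServerStatus> current,
>               Map<KeyExtent,TServerInstance> unassigned,
>               Map<KeyExtent,TServerInstance> assignments) {
>             SortedMap<TServerInstance,TabletServerStatus> candidates =
>                 new TreeMap<TServerInstance,TabletServerStatus>(current);
>             Iterator<TServerInstance> it = candidates.keySet().iterator();
>             while (it.hasNext()) {
>               if (isDraining(it.next()))
>                 it.remove();
>             }
>             super.getAssignments(candidates, unassigned, assignments);
>           }
>         }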
>     
>     I would definitely not advocate killing tservers. A graceful shutdown 
> would be much more desirable. We get a little bit of help here from the 
> client-side scan retries, so we don't have to quiesce all reads to a tablet, 
> but they could still introduce more latency for a query (e.g. lots of 
> filtering over a large row).
>     
>     Regarding the concerns already raised about the final 
> two-tservers-per-node approach, I'm not entirely convinced that "sibling" 
> tservers are worth the complexity. We really don't have that much locality 
> in how we use HDFS now. Is trying to keep all of the tablets assigned on the 
> same node going to be much more efficient than assigning them to nodes 
> elsewhere? I don't even have a good grasp of what these perf numbers would 
> be at a high level.
> 
> kturner wrote:
>     Eric looked into locality once when running continuous ingest and found 
> that ~50% of tablets had local data.  This matches expectations, as the 
> default balancer will try to migrate one child after a split.
>     
>     The sibling tserver concept may be too complex to implement.  Sigh, but 
> it's so cool :)

Clarification on what I meant by locality: we don't consider HDFS block 
locations when we choose where Tablets get assigned, AFAIK. Yes, we'll have 
locality while we're slamming Accumulo with ingest, but once we start 
agitating at any reasonable rate, that's going to be lost.
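
For what it's worth, per-file locality is easy to spot-check with plain HDFS 
APIs. A rough sketch (the tablet file path is made up; getFileBlockLocations() 
is the real call):

    import java.net.InetAddress;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LocalityCheck {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Made-up example path to one tablet's RFile.
        Path rfile = new Path("/accumulo/tables/1/t-0001/F0001.rf");
        FileStatus stat = fs.getFileStatus(rfile);
        BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
        String thisHost = InetAddress.getLocalHost().getHostName();
        int local = 0;
        for (BlockLocation b : blocks) {
          for (String host : b.getHosts()) {
            if (host.equals(thisHost)) {
              local++;
              break;
            }
          }
        }
        // Fraction of this file's blocks stored on the local datanode.
        System.out.printf("%d/%d blocks local%n", local, blocks.length);
      }
    }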

Requiring sibling tservers also implies that you have ample extra resources on 
a node, which is absolutely not going to be the case for most systems. It 
would be nice, but it sounds to me like a one-off from what would be the 
norm. :)


- Josh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24855/#review51006
-----------------------------------------------------------


On Aug. 19, 2014, 5:50 p.m., kturner wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24855/
> -----------------------------------------------------------
> 
> (Updated Aug. 19, 2014, 5:50 p.m.)
> 
> 
> Review request for accumulo.
> 
> 
> Bugs: ACCUMULO-1454
>     https://issues.apache.org/jira/browse/ACCUMULO-1454
> 
> 
> Repository: accumulo
> 
> 
> Description
> -------
> 
> Posting the ACCUMULO-1454 design doc for review
> 
> 
> Diffs
> -----
> 
>   docs/src/main/asciidoc/design/ACCUMULO-1454-proposal-01.adoc PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/24855/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> kturner
> 
>
