[ https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249103#comment-14249103 ]

Albert P Tobey commented on CASSANDRA-8494:
-------------------------------------------

Neat idea. I think this would make a lot of sense to operators and provide 
visibility into the rebuild process that's easy to understand (how many tokens 
are complete?).

Many of the customers I've talked to in the last few months will be very 
excited about this. In one case, they want to attach ~70TB of very fast SSD. I 
explained everything to them, and they're still going to try.

Another client has more than 100 remote sites that store time-series data. They 
want to store 10-15TB per node on 15K SAS RAID10. It's the gear they can get 
and they have limited ability to control power drops etc. in the remote sites, 
so density is really important to them.

My former employer was trying to run 8 x 3TB SATA. No matter how hard we fought 
for the right drives, the incentives from the HW vendors etc. drove them to buy 
the big SATA drives.

I think ops folks will like this and there's an opportunity to use this feature 
to improve the UX of bootstrap (by using token ranges to improve feedback to 
ops).
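
To make the feedback idea concrete, here is a minimal sketch of a per-token 
progress line. Everything in it is hypothetical: the function name, the 
message format, and the idea that the node can count finished tokens are 
assumptions for illustration, not existing Cassandra code.

```python
def bootstrap_progress(tokens_complete: int, tokens_total: int) -> str:
    """Format a one-line progress message for operators (hypothetical format)."""
    if tokens_total <= 0:
        raise ValueError("tokens_total must be positive")
    if not 0 <= tokens_complete <= tokens_total:
        raise ValueError("tokens_complete must be between 0 and tokens_total")
    percent = 100.0 * tokens_complete / tokens_total
    return f"bootstrap: {tokens_complete}/{tokens_total} tokens complete ({percent:.1f}%)"


if __name__ == "__main__":
    # Example with the default of 256 tokens mentioned in the issue.
    print(bootstrap_progress(64, 256))
```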

> incremental bootstrap
> ---------------------
>
>                 Key: CASSANDRA-8494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jon Haddad
>            Priority: Minor
>              Labels: density
>
> Current bootstrapping involves (to my knowledge) picking tokens and streaming 
> data before the node is available for requests.  This can be problematic with 
> "fat nodes", since it may require 20TB of data to be streamed over before the 
> machine can be useful.  This can result in a massive window of time before 
> the machine can do anything useful.
> As a potential approach to mitigate the huge window of time before a node is 
> available, I suggest modifying the bootstrap process to only acquire a single 
> initial token before being marked UP.  This would likely be a configuration 
> parameter "incremental_bootstrap" or something similar.
> After the node is bootstrapped with this one token, it could go into UP 
> state, and could then acquire additional tokens (one or a handful at a time), 
> which would be streamed over while the node is active and serving requests.  
> The benefit here is that with the default 256 tokens a node could become an 
> active part of the cluster with less than 1% of its final data streamed over.
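
The "less than 1%" figure in the description can be checked with a rough 
sketch. It assumes data is spread evenly across tokens, which is a 
simplification since actual token ranges vary in size. The function names are 
hypothetical, and the 20TB and 256-token values come from the description 
above.

```python
def initial_stream_fraction(num_tokens: int, initial_tokens: int = 1) -> float:
    """Fraction of the node's final data streamed before it is marked UP.

    Assumes each token owns an equal share of the data (a simplification).
    """
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    if not 0 < initial_tokens <= num_tokens:
        raise ValueError("initial_tokens must be between 1 and num_tokens")
    return initial_tokens / num_tokens


def initial_stream_tb(total_tb: float, num_tokens: int, initial_tokens: int = 1) -> float:
    """Approximate data (in TB) streamed before the node is marked UP."""
    return total_tb * initial_stream_fraction(num_tokens, initial_tokens)


if __name__ == "__main__":
    fraction = initial_stream_fraction(256)   # one of 256 tokens
    print(f"{fraction:.2%} of final data")    # roughly 0.39%
    print(f"{initial_stream_tb(20, 256):.3f} TB of a 20 TB node")
```

Under that even-distribution assumption, a 20TB node would stream roughly 
0.08TB before becoming available, instead of the full 20TB.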



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
