[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2020-09-23 Thread Paulo Motta (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201025#comment-17201025
 ] 

Paulo Motta commented on CASSANDRA-8494:


Dynamic virtual nodes (CASSANDRA-16141) will make it trivial to support 
incremental bootstrap. The idea is similar to [~rustyrazorblade]'s suggestion in 
[this comment|https://issues.apache.org/jira/browse/CASSANDRA-8494?focusedCommentId=14264970&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14264970]:
a node will bootstrap one token at a time and announce to the cluster that the 
token is ready to receive requests before bootstrapping the next token. The 
pseudo-code is available 
[here|https://gist.github.com/pauloricardomg/1930c8cf645aa63387a57bb57f79a0f7#file-incremental_bootstrap-py].
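A minimal Python sketch of the one-token-at-a-time loop described above. It is not the code from the linked gist, and `stream_ranges_for` and `announce_ready` are hypothetical callbacks, not Cassandra APIs: each token's data is streamed first and only then announced, so the node serves a growing share of its ranges while the rest is still streaming.

```python
from typing import Callable, Iterable, List


def incremental_bootstrap(
    tokens: Iterable[int],
    stream_ranges_for: Callable[[int], None],
    announce_ready: Callable[[int], None],
) -> List[int]:
    """Bootstrap tokens one at a time; return the tokens in the order announced.

    Hypothetical sketch: stream_ranges_for(token) is assumed to block until the
    data owned by `token` has been streamed, and announce_ready(token) is assumed
    to tell the cluster (e.g. via gossip) that the token can take requests.
    """
    announced: List[int] = []
    for token in tokens:
        stream_ranges_for(token)  # finish streaming this token's data first
        announce_ready(token)     # only then advertise it as ready
        announced.append(token)
    return announced


# Usage: record the order of events with stub callbacks.
events = []
done = incremental_bootstrap(
    [10, 20, 30],
    stream_ranges_for=lambda t: events.append(("stream", t)),
    announce_ready=lambda t: events.append(("announce", t)),
)
```

The key ordering property is that a token is never announced before its own streaming completes; the next token's streaming starts only after the previous announcement.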

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Legacy/Streaming and Messaging
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Low
>  Labels: dense-storage
> Fix For: 4.x
>
>
> Current bootstrapping involves (to my knowledge) picking tokens and streaming 
> data before the node is available for requests.  This can be problematic with 
> "fat nodes", since it may require 20TB of data to be streamed over before the 
> machine can be useful.  This can result in a massive window of time before 
> the machine can do anything useful.
> As a potential approach to mitigate the huge window of time before a node is 
> available, I suggest modifying the bootstrap process to only acquire a single 
> initial token before being marked UP.  This would likely be a configuration 
> parameter "incremental_bootstrap" or something similar.
> After the node is bootstrapped with this one token, it could go into UP 
> state, and could then acquire additional tokens (one or a handful at a time), 
> which would be streamed over while the node is active and serving requests.  
> The benefit here is that with the default 256 tokens a node could become an 
> active part of the cluster with less than 1% of its final data streamed over.
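A quick check of the numbers in the description, assuming the node's data is spread evenly across its 256 tokens (real per-token ownership varies) and decimal units for the 20TB example:

```python
NUM_TOKENS = 256   # default token count mentioned in the description
TOTAL_TB = 20.0    # "fat node" example from the description

# Share of the final data carried by the first token, under the even-spread assumption.
first_token_fraction = 1 / NUM_TOKENS
# Decimal units: 1 TB = 1000 GB.
first_token_gb = TOTAL_TB * 1000 * first_token_fraction

print(f"{first_token_fraction:.2%} of data, about {first_token_gb:.1f} GB")
# prints: 0.39% of data, about 78.1 GB
```

So under these assumptions the node could go UP after streaming roughly 78 GB instead of 20TB, comfortably under the 1% bound.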



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2017-03-01 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891516#comment-15891516
 ] 

Jon Haddad commented on CASSANDRA-8494:
---

It looks like the original thing won't be done because of the insanity due to 
shuffle - so I'm renaming this & marking as done.

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: dense-storage
> Fix For: 3.11.x


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2016-07-19 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384157#comment-15384157
 ] 

Jonathan Ellis commented on CASSANDRA-8494:
---

See comments on CASSANDRA-8943.  

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: dense-storage
> Fix For: 3.x


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2016-07-19 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384102#comment-15384102
 ] 

Jeremy Hanna commented on CASSANDRA-8494:
-

Where did we get with this apart from avoiding pitfalls?  This is a nice way to 
add capacity and have it lighten the load sooner rather than later, especially 
important with dense nodes.

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: dense-storage
> Fix For: 3.x


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2016-04-06 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228943#comment-15228943
 ] 

Jeremiah Jordan commented on CASSANDRA-8494:


Need to be careful here.  We got rid of taketoken in CASSANDRA-7601 because 
there were a lot of pointy things with it that caused issues, and this ticket 
wants to implement it again :).

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: dense-storage
> Fix For: 3.x


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2015-02-17 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324434#comment-14324434
 ] 

Yuki Morishita commented on CASSANDRA-8494:
---

Now I'm at the challenging part.

Bootstrapping each token as soon as it is ready can change replica placement or 
replica ranges in the middle of the bootstrapping process. We may succeed in 
bootstrapping a few vnodes, but if we have many, streaming is likely to fail. And 
even if it succeeds, we may have over-streamed (out-of-range) data.

So it may be easier to go Jake's route: proxy reads while bootstrapping. It is 
simpler for the bootstrapping node to decide what to do with a read request based 
on its bootstrapping progress than to coordinate (through gossip) which 
tokens are up/down among the nodes in the cluster.
To do that, I guess we need:

* Bootstrapping node - Announce bootstrapping tokens as usual. Receive data 
while keeping track of the completed ranges (this can also be used for resume). 
When a read request comes in for a completed range, just serve the data. 
Otherwise, forward the request to a current replica to answer instead.
* Existing node - When receiving a read request in the pending range, 
just do the extra work to prepare to receive the proxied response.
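The bootstrapping node's side of that plan can be sketched as follows. This is a hypothetical illustration, not Cassandra code: `BootstrapReadRouter`, `mark_streamed` and `route_read` are made-up names, and ranges are modeled as `(start, end]` integer pairs with no ring wraparound to keep it short.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end]: start exclusive, end inclusive; no wraparound


def in_range(token: int, r: Range) -> bool:
    start, end = r
    return start < token <= end


@dataclass
class BootstrapReadRouter:
    """Hypothetical sketch of the bootstrapping node's read decision."""
    pending: List[Range]                                  # ranges still being streamed
    completed: List[Range] = field(default_factory=list)  # fully streamed ranges

    def mark_streamed(self, r: Range) -> None:
        # Called when streaming for a whole range finishes; the same list
        # could be persisted to resume an interrupted bootstrap.
        if r not in self.pending:
            raise ValueError(f"{r} is not a pending range")
        self.pending.remove(r)
        self.completed.append(r)

    def route_read(self, token: int) -> str:
        # Serve locally only if the token is in a fully streamed range;
        # otherwise forward to a current replica of that range.
        if any(in_range(token, r) for r in self.completed):
            return "local"
        return "forward"


# Usage: two pending ranges, one finished.
router = BootstrapReadRouter(pending=[(0, 100), (100, 200)])
router.mark_streamed((0, 100))
local = router.route_read(50)
remote = router.route_read(150)
edge = router.route_read(100)
```

Tracking progress at whole-range granularity keeps the decision purely local to the joining node, which is the simplification over gossip-based coordination argued for above.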

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: density
> Fix For: 3.0


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2015-01-09 Thread Donald Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14271721#comment-14271721
 ] 

Donald Smith commented on CASSANDRA-8494:
-

Tunable consistency is related:  don't fail if a range is missing. Be fault 
tolerant and bootstrap as much as possible.
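A minimal sketch of that fault-tolerant behavior, assuming a hypothetical `stream` callable that raises on failure: every range is attempted, a failure is recorded instead of aborting the whole bootstrap, and the caller gets both lists back so the missing ranges can be retried or reconciled later.

```python
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # (start, end] token range


def stream_all(
    ranges: List[Range], stream: Callable[[Range], None]
) -> Tuple[List[Range], List[Range]]:
    """Attempt every range; return (completed, failed) instead of failing fast."""
    completed: List[Range] = []
    failed: List[Range] = []
    for r in ranges:
        try:
            stream(r)
        except Exception:  # hypothetical: any streaming error marks the range as missing
            failed.append(r)
        else:
            completed.append(r)
    return completed, failed


# Usage: a stub streamer whose source for one range is unavailable.
def flaky(r: Range) -> None:
    if r == (100, 200):
        raise ConnectionError("source replica down")


completed, failed = stream_all([(0, 100), (100, 200), (200, 300)], flaky)
```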

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: density
> Fix For: 3.0


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2015-01-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267006#comment-14267006
 ] 

Jonathan Ellis commented on CASSANDRA-8494:
---

ISTM that there is some commonality here with nodes telling the cluster "I've 
lost a disk so I don't have vnodes X Y and Z" post-CASSANDRA-6696.  So a 
bootstrapping node would tell the cluster that initially it has nothing, and 
would update as we add the different vnodes in.

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: density
> Fix For: 3.0


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2015-01-05 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265097#comment-14265097
 ] 

Jon Haddad commented on CASSANDRA-8494:
---

{quote}
It could I think. Assuming it stored the last range it received. Feels like a 
follow-on ticket though.
{quote}

Hmm... one of the points of the ticket was to reduce the impact of node failure, 
but I failed to point that out explicitly, my bad.  The state manager that Yuki 
had proposed does include that functionality.  It seems like once that follow-up 
ticket is written, you'll end up writing a state manager anyway? 

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: density
> Fix For: 3.0


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2015-01-05 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265087#comment-14265087
 ] 

T Jake Luciani commented on CASSANDRA-8494:
---

It could I think.  Assuming it stored the last range it received.  Feels like a 
follow-on ticket though.

The problem is that while it was down, any writes for the already-streamed 
ranges would be lost.
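A sketch of the resume idea, assuming progress is kept as a JSON file of completed ranges (the file name and format are hypothetical, not something Cassandra writes): on restart, only ranges not yet recorded are streamed again. As the comment above points out, this does not recover writes the node missed while it was down; those would still need to be reconciled by some other mechanism (for example repair).

```python
import json
import os
import tempfile
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end] token range


def save_progress(path: str, completed: List[Range]) -> None:
    # Write to a temp file and rename, so a crash mid-write cannot corrupt progress.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump([list(r) for r in completed], f)
    os.replace(tmp, path)


def load_progress(path: str) -> List[Range]:
    if not os.path.exists(path):
        return []  # no progress file: nothing streamed yet
    with open(path) as f:
        return [tuple(r) for r in json.load(f)]


def remaining_ranges(all_ranges: List[Range], path: str) -> List[Range]:
    done = set(load_progress(path))
    return [r for r in all_ranges if r not in done]


# Usage: one range finished before the restart.
with tempfile.TemporaryDirectory() as d:
    progress_file = os.path.join(d, "bootstrap_progress.json")  # hypothetical name
    before_any = remaining_ranges([(0, 100), (100, 200)], progress_file)
    save_progress(progress_file, [(0, 100)])
    left = remaining_ranges([(0, 100), (100, 200)], progress_file)
```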

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: density
> Fix For: 3.0


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2015-01-05 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265084#comment-14265084
 ] 

Jon Haddad commented on CASSANDRA-8494:
---

{quote}
 If the node dies nothing bad happens. 
{quote}

[~tjake] Using your approach, when the node is restarted, will it know where to 
resume bootstrapping? 

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: density
> Fix For: 3.0


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2015-01-05 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265078#comment-14265078
 ] 

T Jake Luciani commented on CASSANDRA-8494:
---

bq. Is it possible for a node to acquire a token after it's been bootstrapped?  

No.  Shuffle is fundamentally broken and was removed from the codebase.  

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: density
> Fix For: 3.0


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2015-01-05 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264970#comment-14264970
 ] 

Jon Haddad commented on CASSANDRA-8494:
---

Is it possible for a node to acquire a token after it's been bootstrapped?  If 
so, I feel like bootstrap can just be a single token, and the node can then acquire 
follow-up tokens after the initial bootstrap is finished.   It may need nodetool 
commands to start / stop this process; I haven't thought that through all the 
way.  I assumed this functionality was there because of shuffle.

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: density
> Fix For: 3.0


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2015-01-05 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264714#comment-14264714
 ] 

T Jake Luciani commented on CASSANDRA-8494:
---

I'm not suggesting we change the ring early, just that we include pending 
ranges when we do read requests.  But you are right: if node A proxies to 
joining node B, how do we keep B from sending back to A? 

Perhaps we can have B gossip the ranges it has completed, and A would 
only send to B when it sees there is data for that range.

If B dies along the way everything was still pending so nothing bad happens.
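On node A's side, that could look like the sketch below. It assumes a hypothetical in-memory map of the completed ranges each joining node has gossiped (`GossipView` is an illustrative name, not a Cassandra class): A routes a read to B only when B has advertised a completed range covering the token, and otherwise keeps it on an existing replica, so B is never handed a read it would have to send back.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (start, end], no wraparound


class GossipView:
    """Hypothetical sketch of what coordinator A knows about joining nodes."""

    def __init__(self) -> None:
        self.completed: Dict[str, List[Range]] = {}

    def on_gossip(self, node: str, ranges: List[Range]) -> None:
        # Each update replaces the node's previously advertised completed ranges.
        self.completed[node] = list(ranges)

    def choose_read_target(self, token: int, joining: str, existing: str) -> str:
        # Send to the joining node only if it has advertised this token's range as done.
        for start, end in self.completed.get(joining, []):
            if start < token <= end:
                return joining
        return existing


# Usage: B is joining, C is an existing replica.
view = GossipView()
before = view.choose_read_target(50, "B", "C")
view.on_gossip("B", [(0, 100)])
after = view.choose_read_target(50, "B", "C")
not_yet = view.choose_read_target(150, "B", "C")
```

If B dies, its entry simply stops covering anything new and reads keep going to existing replicas, which matches the "everything was still pending" argument.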

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: density
> Fix For: 3.0


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2015-01-05 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264688#comment-14264688
 ] 

Yuki Morishita commented on CASSANDRA-8494:
---

In that case, we need to distinguish a proxied request from a normal request, 
since the bootstrapping node has already joined and the token ranges have changed; 
otherwise, the proxied node will forward that request back to the bootstrapping node.

If that can be done easily, then this would be a good approach.
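One way to make that distinction, sketched with a hypothetical `proxied` flag on the read message (not an existing Cassandra message field): a node that receives a read already marked as proxied answers from its own data and never forwards it again, which breaks the B to A to B loop.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ReadRequest:
    token: int
    proxied: bool = False  # hypothetical flag, set when a bootstrapping node forwards a read


def handle_read(
    node: str, req: ReadRequest, range_streamed: bool, replica: str
) -> Tuple[str, str, ReadRequest]:
    """Return ("serve", node, req) or ("forward", replica, marked_req)."""
    if req.proxied or range_streamed:
        # A proxied read is always answered locally, even if this node's view
        # of the ring says the bootstrapping node now owns the token.
        return ("serve", node, req)
    # Forward exactly once, marking the request so the replica cannot bounce it back.
    return ("forward", replica, replace(req, proxied=True))


# Usage: joining node B has not streamed token 150 yet; A is an existing replica.
step1 = handle_read("B", ReadRequest(150), range_streamed=False, replica="A")
# A's ring view would route token 150 back to B, but the flag makes A serve it.
step2 = handle_read("A", step1[2], range_streamed=False, replica="B")
```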

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: density
> Fix For: 3.0


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2015-01-02 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263304#comment-14263304
 ] 

T Jake Luciani commented on CASSANDRA-8494:
---

Rather than add rich state management to bootstrap, why don't we consider 
joining nodes part of the ring right away and proxy non-streamed ranges to a 
known replica until all the data is streamed?  If the node dies, nothing bad 
happens.  We already send extra writes to joining nodes, so we would only need 
to add the ability for a joining node to track what data has been streamed so 
far. 
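Why "nothing bad happens" if the joining node dies can be seen from the write path. A minimal sketch with hypothetical node names and helpers (this is not Cassandra's replica-selection code): writes for a pending range go to the natural replicas plus the joining node, so removing the joining node still leaves every natural replica holding the write.

```python
from typing import List


def write_endpoints(natural: List[str], pending: List[str]) -> List[str]:
    """Hypothetical sketch: natural replicas plus any joining (pending) nodes."""
    # Pending endpoints are added on top of, never instead of, the natural replicas.
    return natural + [n for n in pending if n not in natural]


def replicas_left_if_dead(endpoints: List[str], dead: str) -> List[str]:
    # Endpoints that still hold the write if `dead` disappears mid-bootstrap.
    return [e for e in endpoints if e != dead]


# Usage: B is joining a range naturally replicated on A, C and D.
natural = ["A", "C", "D"]
targets = write_endpoints(natural, ["B"])
survivors = replicas_left_if_dead(targets, "B")
```

The only new state the proposal needs is on the joining node itself (which ranges are fully streamed), since the natural replicas already have everything.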

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: density
> Fix For: 3.0


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2014-12-16 Thread Tupshin Harper (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249114#comment-14249114
 ] 

Tupshin Harper commented on CASSANDRA-8494:
---

bq. I think the improved feedback will make a huge difference for people 
wondering if bootstrap is working!
So much this. Would make the ticket worthwhile by itself.

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: density
> Fix For: 3.0
>
>


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2014-12-16 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249115#comment-14249115
 ] 

Blake Eggleston commented on CASSANDRA-8494:


+1
There were a few times at my last job where we had to quickly add a few nodes 
because of unforeseen problems. Having to wait 1-2 days to be able to use them 
would have been a problem. Even at the smaller scale we were operating at, 
having the extra capacity available sooner would have been helpful.

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Assignee: Yuki Morishita
>Priority: Minor
>  Labels: density
> Fix For: 3.0
>
>


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2014-12-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249112#comment-14249112
 ] 

Jonathan Ellis commented on CASSANDRA-8494:
---

I think the improved feedback will make a huge difference for people wondering 
if bootstrap is working!

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Priority: Minor
>  Labels: density
>


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2014-12-16 Thread Albert P Tobey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249103#comment-14249103
 ] 

Albert P Tobey commented on CASSANDRA-8494:
---

Neat idea. I think this would make a lot of sense to operators and provide 
visibility into the rebuild process that's easy to understand (how many tokens 
are complete?).

Many of the customers I've talked to in the last few months will be very 
excited about this. In one case, they want to attach ~70TB of very fast SSD. I 
explained everything to them, and they're still going to try.

Another client has more than 100 remote sites that store time-series data. They 
want to store 10-15TB per node on 15K SAS RAID10. It's the gear they can get 
and they have limited ability to control power drops etc. in the remote sites, 
so density is really important to them.

My former employer was trying to run 8 x 3TB SATA. No matter how hard we fought 
for the right drives, the incentives from the HW vendors etc. drove them to buy 
the big SATA drives.

I think ops folks will like this and there's an opportunity to use this feature 
to improve the UX of bootstrap (by using token ranges to improve feedback to 
ops).
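Here is a sketch of the per-token feedback described above. It assumes the bootstrap keeps a hypothetical map from each token to a status string such as `COMPLETE` or `PENDING`. `bootstrap_progress` is an illustrative helper, not an existing nodetool command.

```python
from collections import Counter
from typing import Dict


def bootstrap_progress(token_states: Dict[int, str]) -> str:
    """Summarize how many of a joining node's tokens have finished streaming."""
    total = len(token_states)
    if total == 0:
        return "0/0 tokens complete"
    done = Counter(token_states.values()).get("COMPLETE", 0)
    return f"{done}/{total} tokens complete ({done / total:.1%})"


states = {t: ("COMPLETE" if t < 64 else "PENDING") for t in range(256)}
print(bootstrap_progress(states))  # 64/256 tokens complete (25.0%)
```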

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Priority: Minor
>  Labels: density
>


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2014-12-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249099#comment-14249099
 ] 

Jonathan Ellis commented on CASSANDRA-8494:
---

Can "retry partial bootstrap failure" be a separate ticket?

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Priority: Minor
>  Labels: density
>


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2014-12-16 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249082#comment-14249082
 ] 

Yuki Morishita commented on CASSANDRA-8494:
---

I think the streaming part is not a big deal. More work is needed for announcing 
available tokens.

Also, we need to define the behavior on bootstrap failure.
If one part of the bootstrap fails, it is better to be able to re-bootstrap only 
the failed part.
To do this, we need rich bootstrap state management and possibly a new API to 
resume bootstrapping.
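The sketch below shows the resumable behavior described above. It assumes bootstrap state is persisted as a JSON file that maps each range to `COMPLETE` or `FAILED`, and that a stream failure surfaces as an `OSError`. The file layout, `resume_bootstrap` and the `stream_fn` callback are all hypothetical. On a rerun, only ranges not already marked `COMPLETE` are streamed again.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Tuple

Range = Tuple[int, int]


def resume_bootstrap(state_file: Path,
                     ranges: List[Range],
                     stream_fn: Callable[[Range], None]) -> Dict[str, str]:
    """Stream every range not yet COMPLETE, persisting status after each one."""
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    for start, end in ranges:
        key = f"{start}:{end}"
        if state.get(key) == "COMPLETE":
            continue  # already streamed in an earlier attempt
        try:
            stream_fn((start, end))
            state[key] = "COMPLETE"
        except OSError:  # assumption: stream failures surface as OSError
            state[key] = "FAILED"  # retried on the next call
        state_file.write_text(json.dumps(state))
    return state


ranges = [(0, 100), (100, 200), (200, 300)]
retried: List[Range] = []


def flaky_stream(r: Range) -> None:
    if r == (100, 200):
        raise OSError("stream session failed")  # simulated failure


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "bootstrap_state.json"
    first = resume_bootstrap(path, ranges, flaky_stream)
    second = resume_bootstrap(path, ranges, retried.append)
    print(first["100:200"], retried)  # FAILED [(100, 200)]
```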


> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Priority: Minor
>  Labels: density
>


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2014-12-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249046#comment-14249046
 ] 

Jonathan Ellis commented on CASSANDRA-8494:
---

All right.  [~yukim] / [~brandon.williams], how feasible is this?

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Priority: Minor
>  Labels: density
>


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2014-12-16 Thread Ryan Svihla (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249040#comment-14249040
 ] 

Ryan Svihla commented on CASSANDRA-8494:


I think there is an overlap between users who want high density and very slow, 
almost glacial, planning. By the time they've requisitioned the hardware and 
gotten the nodes in place, their cluster may very well be far past overloaded. 
At the end of the day, this will help those who don't plan well the most.

I think it could be a much better new-user experience if we get this right.

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Priority: Minor
>


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2014-12-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249031#comment-14249031
 ] 

Jonathan Ellis commented on CASSANDRA-8494:
---

Why is it a big deal if bootstrap takes a day or two given reasonable capacity 
planning?

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Priority: Minor
>


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2014-12-16 Thread Ryan Svihla (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249029#comment-14249029
 ] 

Ryan Svihla commented on CASSANDRA-8494:


I'm mixed on the push for density. I get that people _really_ want it, and this 
would substantially help Cassandra in that space, but I'm also convinced, just 
by physics, that the story for high density will always be worse than the story 
for a bunch of cheap low-density nodes (i.e. total cost, not just data center 
space costs).

Regardless, I think even in the case of, say, 1TB nodes, this would be an 
impressive boost to handling overloaded clusters, where load can be moved off 
struggling nodes more quickly and gracefully. What we struggle with today in 
the field is people who don't monitor their clusters and don't realize until 
they're going OOM that they're in trouble. For those folks we always struggle 
to stream in new nodes as quickly as possible. I think this could really help 
in those scenarios, which are more common than you'd think.

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Priority: Minor
>


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2014-12-16 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249016#comment-14249016
 ] 

Jon Haddad commented on CASSANDRA-8494:
---

Well, it depends.  The issue is written on the assumption that we want to be 
able to increase node density, and that currently bootstrapping a 20TB node is 
problematic.  If we're not going to push node density, it might not be an 
issue, but I suspect sticking to "no more than 1TB per node" is going to fly 
less and less over time.  
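Here is some rough arithmetic for the 20TB case. It assumes, hypothetically, a sustained streaming throughput of 200 MB/s into the new node and 256 equal-sized token ranges. Real throughput depends on hardware, network and streaming settings. At that rate the whole node takes about a day to stream, while the first token's share takes a few minutes.

```python
TB = 10**12  # decimal units
MB = 10**6


def stream_hours(data_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to stream data_bytes at a constant throughput."""
    return data_bytes / throughput_bytes_per_s / 3600


node_bytes = 20 * TB
throughput = 200 * MB  # assumed sustained rate, not a measured figure
num_tokens = 256

full = stream_hours(node_bytes, throughput)
first_token = stream_hours(node_bytes / num_tokens, throughput)
print(f"full node: {full:.1f} h, first token: {first_token * 60:.1f} min")
# full node: 27.8 h, first token: 6.5 min
```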

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jon Haddad
>Priority: Minor
>


[jira] [Commented] (CASSANDRA-8494) incremental bootstrap

2014-12-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248999#comment-14248999
 ] 

Jonathan Ellis commented on CASSANDRA-8494:
---

Is this really a big problem in practice?  Yeah, it's aesthetically ugly, but 
is it worth rewriting and destabilizing bootstrap to address it?

> incremental bootstrap
> -
>
> Key: CASSANDRA-8494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8494
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Jon Haddad
>