[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2019-08-05 Thread Jeff Jirsa (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16900234#comment-16900234
 ] 

Jeff Jirsa commented on CASSANDRA-8838:
---

For the record, this patch almost certainly causes us to violate 
consistency/correctness for the reasons discussed in the 4 comments above 
(missing writes while restarting / may not store hints / may not deliver hints 
before timeout), and it is really unfortunate that it's enabled by default and 
can't be disabled. We need to be more aware of new features and correctness in 
the future. 

 

 

> Resumable bootstrap streaming
> -
>
> Key: CASSANDRA-8838
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8838
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Legacy/Streaming and Messaging
>Reporter: Yuki Morishita
>Assignee: Yuki Morishita
>Priority: Low
>  Labels: dense-storage
> Fix For: 2.2.0 beta 1
>
>
> This allows the bootstrapping node not to be streamed already received data.
> The bootstrapping node records received keyspace/ranges as one stream session 
> completes. When some sessions with other nodes fail, bootstrapping fails 
> also, though next time it re-bootstraps, already received keyspace/ranges are 
> skipped to be streamed.
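
As a rough illustration of the behaviour described above, here is a minimal Python sketch. It is not the patch itself: the class and method names are hypothetical, and an in-memory dictionary stands in for whatever the node actually persists. It only shows the idea of recording ranges when a session completes and skipping them on the next attempt.

```python
from collections import defaultdict


class AvailableRangeStore:
    """Hypothetical in-memory stand-in for the recorded keyspace/ranges."""

    def __init__(self):
        # keyspace name -> set of (start_token, end_token) ranges already received
        self._available = defaultdict(set)

    def record_session_complete(self, keyspace, ranges):
        # Called once a stream session with a source node finishes successfully.
        self._available[keyspace].update(ranges)

    def ranges_to_request(self, keyspace, wanted):
        # On re-bootstrap, only request ranges that were not received before.
        return set(wanted) - self._available[keyspace]

    def reset(self):
        # Forget all progress so the next bootstrap streams everything again.
        self._available.clear()


store = AvailableRangeStore()
wanted = {(0, 100), (100, 200), (200, 300)}

# First attempt: two ranges arrive, then another session fails.
store.record_session_complete("ks1", {(0, 100), (100, 200)})

# Second attempt: only the missing range needs to be streamed.
remaining = store.ranges_to_request("ks1", wanted)
```

After the first attempt only `(200, 300)` is left to stream; calling `reset()` makes the next attempt request all three ranges again.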



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-11-11 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000728#comment-15000728
 ] 

Yuki Morishita commented on CASSANDRA-8838:
---

bq. in that hints will not be stored to the bootstrapping node after 
RING_DELAY, since it will be evicted from the TMD pending ranges. Should we 
create a ticket to address this?

Let's discuss this in a new ticket.
I implemented this so that we enter the same state as "write survey mode" 
when bootstrap streaming fails. If the above analysis from Paulo is correct, 
doesn't that mean "write survey mode" also won't work as intended?

/cc [~brandon.williams]



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-11-11 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000741#comment-15000741
 ] 

Brandon Williams commented on CASSANDRA-8838:
-

Write survey mode works fine as long as the surveying node is alive... once 
it's gone, as Paulo noted, it will be removed after ring_delay because it was a 
fat client (all bootstrapping nodes are fat clients)
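
To make the timing argument concrete, here is a toy Python model. It is a sketch under stated assumptions, not Cassandra code: the class, the function, and the 30-second value used for ring_delay are all illustrative. It only encodes the claim made in this thread that a down bootstrapping node is dropped from the pending ranges once it has been gone longer than ring_delay, after which writes for it are no longer hinted.

```python
from dataclasses import dataclass

RING_DELAY_S = 30.0  # assumed value, for illustration only


@dataclass
class BootstrappingNode:
    went_down_at: float  # time in seconds at which the node died

    def in_pending_ranges(self, now: float, ring_delay: float = RING_DELAY_S) -> bool:
        # Model: the node stays in the pending ranges until it has been
        # down for longer than ring_delay, then it is evicted.
        return (now - self.went_down_at) <= ring_delay


def write_outcome(node: BootstrappingNode, now: float) -> str:
    # While the node is still pending, a write for it can be hinted;
    # after eviction, nothing is stored on its behalf.
    return "hinted" if node.in_pending_ranges(now) else "missed"


node = BootstrappingNode(went_down_at=0.0)
outcomes = [write_outcome(node, t) for t in (10, 30, 31, 120)]
```

In this model the writes at 10s and 30s are hinted, while the writes at 31s and 120s are missed, which is the gap a resumed bootstrap would not fill.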



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-11-11 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000771#comment-15000771
 ] 

Yuki Morishita commented on CASSANDRA-8838:
---

So the problem happens when the bootstrapping node goes down for some reason 
and stays down for more than RING_DELAY.
In that case, yes, hints are not stored because the node is evicted from TMD.

We have a flag to do a normal bootstrap in that case 
(-Dcassandra.reset_bootstrap_progress).



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-11-09 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997442#comment-14997442
 ] 

Paulo Motta commented on CASSANDRA-8838:


[~yukim] does this feature support resuming bootstrap after the bootstrapping 
node goes down? If so, I think [~brandon.williams]'s concern is valid, in that 
hints will not be stored to the bootstrapping node after RING_DELAY, since it 
will be evicted from the TMD pending ranges. Should we create a ticket to 
address this?



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-03-18 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14367511#comment-14367511
 ] 

Yuki Morishita commented on CASSANDRA-8838:
---

Committed, thanks for review!

I also made pull request for cassandra-dtest here 
(https://github.com/riptano/cassandra-dtest/pull/199).



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-03-17 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364788#comment-14364788
 ] 

Sylvain Lebresne commented on CASSANDRA-8838:
-

bq. The first issue is that starting a node with {{no_wait=True}} fails to 
detect the process pid and the test stops there, this applies to all four tests.

I've noticed that too and this is not specific to this issue: there are quite a 
few dtests that use this and are currently failing on trunk on my box for the 
reason you mention. It appears that on current trunk, on my box, the Cassandra 
process takes > 10 seconds to even get created (which is definitely something 
relatively new). I haven't investigated what made that happen, so it would be 
nice to bisect it so we at least know, but as you said, removing the 
{{no_wait}} flag at least fixes the tests.



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-03-17 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366406#comment-14366406
 ] 

Stefania commented on CASSANDRA-8838:
-

+1 on everything now, tests look great and pass without problems on my box.

I cannot commit myself so I did not resolve the ticket just yet but it can be 
resolved and committed at any time.



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-03-17 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365869#comment-14365869
 ] 

Yuki Morishita commented on CASSANDRA-8838:
---

Alright, I force pushed updates to dtests. This version is at least reliable on 
my machine.

https://github.com/yukim/cassandra-dtest/tree/CASSANDRA-8838

bq. With no_wait=True it waits for 2 seconds

That's not what I expected, I removed them all.

bq. I encountered another issue, in that the streaming completes before we kill 
node 1

{{node#start()}} sometimes goes too far before we want to kill the node. This 
version spawns a thread to watch the log and kill node 1 so it reacts as 
expected.

I also fixed my new tests in {{replace_address_test.py}}, which were not 
replacing properly.
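
The log-watching approach mentioned above could look roughly like the following Python sketch. It is not the code from the dtest branch: the helper name, its parameters, and the polling strategy are assumptions, and the demo uses a temporary file and a synthetic log line modelled on the excerpts in this thread. The point is that the watcher runs in its own thread, so the kill can fire while the main thread is still blocked starting the node.

```python
import os
import re
import tempfile
import threading
import time


def watch_log_and_call(path, pattern, action, timeout=5.0, poll=0.05):
    """Start a thread that polls `path` and calls `action` once `pattern` appears.

    Hypothetical helper: gives up silently after `timeout` seconds.
    """
    regex = re.compile(pattern)

    def run():
        deadline = time.time() + timeout
        while time.time() < deadline:
            with open(path) as f:
                if any(regex.search(line) for line in f):
                    action()
                    return
            time.sleep(poll)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


# Demo: the "kill" action is just an Event so the sketch stays self-contained.
killed = threading.Event()
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as log:
    log_path = log.name

watcher = watch_log_and_call(
    log_path, r"\[STREAM-IN-/127\.0\.0\.1\].* Prepare completed", killed.set
)

# Simulate the bootstrapping node writing the line we are waiting for.
with open(log_path, "a") as f:
    f.write("INFO  [STREAM-IN-/127.0.0.1] ... Prepare completed. Receiving 3 files\n")

watcher.join()
os.remove(log_path)
```

In a real test the action would stop node 1 instead of setting an event. Polling the whole file each time is simple but wasteful for large logs; remembering the last read offset would be the obvious refinement.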



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-03-17 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365081#comment-14365081
 ] 

Stefania commented on CASSANDRA-8838:
-

On my box it takes just under 4 seconds with both cassandra-2.1 and trunk. With 
cassandra-2.0 however, it takes just over 2 seconds. I'll try to spend more 
time investigating this.



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-03-16 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364459#comment-14364459
 ] 

Stefania commented on CASSANDRA-8838:
-

I'm sorry but all four tests have problems on my machine:
\\
\\
* The first issue is that starting a node with {{no_wait=True}} fails to detect 
the process pid and the test stops there, this applies to all four tests. Is 
there any reason for using this since then we are waiting to grep from the logs 
{{\[STREAM-IN-/127.0.0.1\].* Prepare completed}} anyway? With {{no_wait=True}} 
it waits for 2 seconds, I guess we could increase it but it would be best to 
remove that flag if possible.

* So I went ahead and tried to run {{resumable_bootstrap_test}} without 
{{no_wait=True}} and I encountered another issue, in that the streaming 
completes before we kill node 1. So the test fails when starting node 3 the 
second time, since it is already running. I increased the number of stress 
entries from 100,000 to 1M and the streaming took as long as 6 seconds, see 
below, yet node 1 was not killed until after the session completed, again logs 
below. I think the problem could be in {{watch_log_for}} but I did not look 
further.
\\
{code}
NODE 3

INFO  [STREAM-IN-/127.0.0.1] 2015-03-17 10:21:54,611 StreamResultFuture.java:167 - [Stream #60c0daf0-cc4c-11e4-95db-8df1f8b3509d ID#0] Prepare completed. Receiving 3 files(95252818 bytes), sending 0 files(0 bytes)
INFO  [STREAM-IN-/127.0.0.2] 2015-03-17 10:21:54,899 StreamResultFuture.java:167 - [Stream #60c0daf0-cc4c-11e4-95db-8df1f8b3509d ID#0] Prepare completed. Receiving 5 files(87661884 bytes), sending 0 files(0 bytes)
INFO  [StreamReceiveTask:2] 2015-03-17 10:22:00,382 StreamResultFuture.java:181 - [Stream #60c0daf0-cc4c-11e4-95db-8df1f8b3509d] Session with /127.0.0.1 is complete
INFO  [StreamReceiveTask:1] 2015-03-17 10:22:00,495 StreamResultFuture.java:181 - [Stream #60c0daf0-cc4c-11e4-95db-8df1f8b3509d] Session with /127.0.0.2 is complete
{code}
\\
{code}
NODE 1

INFO  [STREAM-IN-/127.0.0.3] 2015-03-17 10:22:00,376 StreamResultFuture.java:181 - [Stream #60c0daf0-cc4c-11e4-95db-8df1f8b3509d] Session with /127.0.0.3 is complete
INFO  [STREAM-IN-/127.0.0.3] 2015-03-17 10:22:00,377 StreamResultFuture.java:213 - [Stream #60c0daf0-cc4c-11e4-95db-8df1f8b3509d] All sessions completed
INFO  [main] 2015-03-17 10:22:13,857 YamlConfigurationLoader.java:92 - Loading settings from file:/tmp/dtest-8beQKM/test/node1/conf/cassandra.yaml
INFO  [main] 2015-03-17 10:22:13,929 YamlConfigurationLoader.java:135 - Node configuration:[authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_bootstrap=false
{code}



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-03-16 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364480#comment-14364480
 ] 

Stefania commented on CASSANDRA-8838:
-

Might be useful to compare the ccm last commit, mine is 
{{a9ea7f0c15866f80b2cf2a8144ef701d43030459}}.



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-03-16 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364109#comment-14364109
 ] 

Yuki Morishita commented on CASSANDRA-8838:
---

dtests added for replace_address_test and bootstrap_test here: 
https://github.com/yukim/cassandra-dtest/tree/CASSANDRA-8838

Also I updated my [branch|https://github.com/yukim/cassandra/commits/8838-2] 
with a commit that removes a check that could prevent re-bootstrapping the 
failed node.



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-03-15 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362635#comment-14362635
 ] 

Stefania commented on CASSANDRA-8838:
-

The code looks good.

{{replace_address_test.py}} and {{bootstrap_test.py}} ran successfully on my 
local machine as well.

Once we have a couple of dtests about the feature we should be good.



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-03-13 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360691#comment-14360691
 ] 

Yuki Morishita commented on CASSANDRA-8838:
---

You are right, I need the reset code for replacing as well. I updated the 
branch with the fix. This time I put the reset check right before starting 
bootstrap.

I tested manually with ccm and manual intervention.
I also ran dtest's {{replace_address_test.py}} and {{bootstrap_test.py}}, and 
both ran successfully on my local machine.

I think I can add some dtests about the feature, similar to what I've done in 
CASSANDRA-8942.



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-03-12 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359941#comment-14359941
 ] 

Stefania commented on CASSANDRA-8838:
-

Only one comment on the code: 
# We are not looking at the new flag when replacing. Why is that?

Let's focus on the tests:

# Do you think we should add a unit test for {{resetAvailableRanges()}} since 
this code path is the only code path that was added to {{SystemKeyspace}} and 
that is not covered by {{StreamStoreStateTest}}?
# Not sure how easy it would be, we'd have to rework {{RangeStreamer}} a bit 
probably, but we could add a unit test to {{BootStrapperTest}} that checks that 
the {{RangeStreamer}} is not requesting available ranges (since this would be 
hard to check in dtest). Up to you if you want to do this or not.
# The dtest {{bootstrap_test.py}} is currently failing (not related to your 
patch as it is failing on trunk as well). See the failures I got below. 
## It would be good to fix these failures (not necessarily yourself but someone 
in QA). 
## It would also be beneficial to add a couple more dtests to simulate a 
bootstrap failure and test the resume with and without 
{{cassandra.reset_bootstrap_progress}}. Not sure how easy it would be to 
simulate the failure, perhaps you'd have to add a cassandra.test flag.
## Perhaps we should have more tests for replacing a node (didn't see any in 
dtests) with and without a failed bootstrap.

What do you think, is this reasonable for testing or did you do more manual 
tests? If you tested manually with ccm and nodetool, perhaps we should simply 
list the tests here and let QA add them to dtest or the new tool for functional 
tests that Ariel is working on.

Dtest failure on trunk:
{code}
======================================================================
FAIL: simple_bootstrap_test (bootstrap_test.TestBootstrap)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/stefania/git/cstar/cassandra-dtest/bootstrap_test.py", line 66, in simple_bootstrap_test
    reader.check()
  File "/home/stefania/git/cstar/cassandra-dtest/dtest.py", line 115, in check
    raise self.__error
AssertionError: 
-------------------- >> begin captured logging << --------------------
dtest: DEBUG: cluster ccm directory: /tmp/dtest-510qhu
dtest: DEBUG: connecting...
dtest: DEBUG: reading...
--------------------- >> end captured logging << ---------------------
{code}

Dtest failure on 8838-2:
{code}
======================================================================
ERROR: simple_bootstrap_test (bootstrap_test.TestBootstrap)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/stefania/git/cstar/cassandra-dtest/bootstrap_test.py", line 74, in simple_bootstrap_test
    assert_almost_equal(size1, size2, error=0.3)
  File "/home/stefania/git/cstar/cassandra-dtest/assertions.py", line 56, in assert_almost_equal
    assert vmin > vmax * (1.0 - error) or vmin == vmax, "values not within %.2f%% of the max: %s" % (error * 100, args)
TypeError: unsupported operand type(s) for *: 'Decimal' and 'float'
-------------------- >> begin captured logging << --------------------
dtest: DEBUG: cluster ccm directory: /tmp/dtest-4sMR0c
dtest: DEBUG: connecting...
dtest: DEBUG: reading...
dtest: DEBUG: initial_size: 482.41
dtest: DEBUG: size1: 305.65
dtest: DEBUG: size2: 287.5
--------------------- >> end captured logging << ---------------------
{code}
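
The TypeError in the second failure comes from multiplying a Decimal by a float, which Python refuses to do implicitly. The sketch below reproduces that and shows one possible way around it, coercing every value to float before comparing. This is an illustration only, not necessarily how the dtest helper was actually fixed; the default tolerance of 0.16 is an arbitrary choice for the example.

```python
from decimal import Decimal

# Reproduce the failure: Decimal and float cannot be mixed in arithmetic.
try:
    Decimal("305.65") * (1.0 - 0.3)
    mixing_failed = False
except TypeError:
    mixing_failed = True


def assert_almost_equal(*args, error=0.16):
    """Assert that all values lie within `error` (a fraction) of the maximum."""
    # Coerce to float so Decimal, int and float inputs can be compared together.
    values = [float(a) for a in args]
    vmax, vmin = max(values), min(values)
    assert vmin > vmax * (1.0 - error) or vmin == vmax, \
        "values not within %.2f%% of the max: %s" % (error * 100, args)


# The sizes from the captured logging above, as Decimal values.
size1, size2 = Decimal("305.65"), Decimal("287.5")
assert_almost_equal(size1, size2, error=0.3)
```

Converting to float loses Decimal precision, which should not matter for a 30% tolerance check on data sizes. Doing the arithmetic in Decimal instead would be the stricter alternative.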



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-03-12 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14359347#comment-14359347
 ] 

Yuki Morishita commented on CASSANDRA-8838:
---

[~Stefania] I pushed a rebased version with an option to reset available ranges: 
https://github.com/yukim/cassandra/commits/8838-2

Users can pass {{-Dcassandra.reset_bootstrap_progress=true}} to reset available 
ranges when booting the node.
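
For a test that wants to exercise both paths, the flag above could be assembled with a small helper like this Python sketch. The helper name and its parameters are hypothetical; only the system property itself comes from this ticket. The comment about ccm assumes the `jvm_args` keyword of ccm's `Node.start()`.

```python
RESET_FLAG = "-Dcassandra.reset_bootstrap_progress=true"


def bootstrap_jvm_args(reset_progress, extra=()):
    """Build the JVM argument list for starting a bootstrapping node.

    Hypothetical helper: `extra` holds any other JVM options the test needs.
    """
    args = list(extra)
    if reset_progress:
        # Ask the node to discard recorded ranges and bootstrap from scratch.
        args.append(RESET_FLAG)
    return args


resume_args = bootstrap_jvm_args(reset_progress=False)
reset_args = bootstrap_jvm_args(reset_progress=True, extra=["-Xmx512m"])

# With ccm this might be passed along as (assumption, not run here):
#   node3.start(jvm_args=reset_args)
```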



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-03-10 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1439#comment-1439
 ] 

Yuki Morishita commented on CASSANDRA-8838:
---

Thanks for the review!

bq. StreamStateStore.isDataAvailable() doesn't seem to be used

Sorry, I plan to use this in CASSANDRA-8943, but I included it in this release 
for the unit test.

bq. In SystemKeyspace.getAvailableRanges() why do we need to copy the result 
into an ImmutableSet, just to make sure the caller cannot modify the result or 
is there more to it I should understand?

Just wanted to make sure it is not modifiable outside.
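
The same defensive-copy idea, shown as a short Python analogue (the patch itself is Java and uses {{ImmutableSet}}; the class and method names here are made up for the example). Returning an immutable snapshot means a caller can neither modify the internal set nor observe later changes through the value it was handed.

```python
class RangeRegistry:
    """Hypothetical holder of available ranges."""

    def __init__(self):
        self._ranges = set()

    def add(self, token_range):
        self._ranges.add(token_range)

    def get_available_ranges(self):
        # Return an immutable copy rather than the internal mutable set.
        return frozenset(self._ranges)


registry = RangeRegistry()
registry.add((0, 100))
snapshot = registry.get_available_ranges()

# Later internal changes do not leak into the snapshot already handed out.
registry.add((100, 200))
```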

bq. Do we ever reset the available ranges? If not, is this not going to cause 
issues if the node is down for a very long time, like a few days or do we just 
rely on deleting the whole node data in these cases?

Good point. We need to add that feature for sure. Let me work on adding a 
config entry to skip this functionality.




[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-03-10 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355956#comment-14355956
 ] 

Stefania commented on CASSANDRA-8838:
-

Sure, let us know when the config entry is ready but otherwise LGTM!



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-03-05 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14349709#comment-14349709
 ] 

Stefania commented on CASSANDRA-8838:
-

Hi Yuki, really nice code, here are my comments (mostly questions for my own 
benefit really):
\\
\\
* {{StreamStateStore.isDataAvailable()}} doesn't seem to be used

* In {{SystemKeyspace.getAvailableRanges()}} why do we need to copy the result 
into an {{ImmutableSet}}, just to make sure the caller cannot modify the result 
or is there more to it I should understand?

* Do we ever reset the available ranges? If not, is this not going to cause 
issues if the node is down for a very long time, like a few days or do we just 
rely on deleting the whole node data in these cases?



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-02-23 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334212#comment-14334212
 ] 

Yuki Morishita commented on CASSANDRA-8838:
---

Updated branch with new commit because I found one mistake: 
https://github.com/yukim/cassandra/tree/8838

IIRC hints are already stored even when the bootstrapping node goes down, as 
long as the downtime is within the max hint window.
In my opinion, bootstrap streaming fails more often because a streaming source 
goes down than because the bootstrapping node goes down.
When that happens, the whole bootstrap process fails in current versions.

I'm working on preventing the whole process from failing: instead, the node 
would go into the same state as 'write survey mode', and a nodetool command 
would resume the failed streaming from that state.
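
As a rough illustration of the hint window remark above, the check amounts to comparing a node's downtime with a configured maximum. The sketch below is an assumption-laden simplification, not Cassandra's actual code: {{HintWindow}} and {{shouldStoreHint}} are hypothetical names, and the boundary condition (inclusive here) is a guess.

```java
// Hypothetical helper; not Cassandra's hinted handoff implementation.
final class HintWindow {
    // A hint is stored only while the target has been down for no longer
    // than the max hint window. All times are in milliseconds.
    static boolean shouldStoreHint(long downSinceMillis, long nowMillis, long maxHintWindowMillis) {
        long downtime = nowMillis - downSinceMillis;
        return downtime <= maxHintWindowMillis;
    }

    public static void main(String[] args) {
        long threeHours = 3L * 60 * 60 * 1000; // assumed window, for illustration
        long oneHour = 60L * 60 * 1000;
        System.out.println(shouldStoreHint(0L, oneHour, threeHours));
        System.out.println(shouldStoreHint(0L, 4 * oneHour, threeHours));
    }
}
```

Under this model, writes that arrive after the window has passed are not hinted, which is the gap raised earlier in the thread about missed writes.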



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-02-20 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329104#comment-14329104
 ] 

Brandon Williams commented on CASSANDRA-8838:
-

Worth noting though that even if we resume streaming, the node won't be fully 
complete at the end, since it will have missed any writes for it while it was 
down. Perhaps people with large nodes may not care though depending on their CL.

 Resumable bootstrap streaming
 -

 Key: CASSANDRA-8838
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8838
 Project: Cassandra
  Issue Type: Sub-task
Reporter: Yuki Morishita
Assignee: Yuki Morishita
Priority: Minor
 Fix For: 3.0

 Attachments: 0001-Resumable-bootstrap-streaming.patch


 This lets a bootstrapping node avoid being streamed data it has already received.
 The bootstrapping node records the received keyspace/ranges as each stream 
 session completes. If some sessions with other nodes fail, bootstrapping still 
 fails, but on the next bootstrap attempt the keyspace/ranges that were already 
 received are not streamed again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-02-20 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329250#comment-14329250
 ] 

Jonathan Ellis commented on CASSANDRA-8838:
---

Do we not hint for bootstrapping nodes?  I imagine that should be an easy fix.



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-02-20 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329323#comment-14329323
 ] 

Brandon Williams commented on CASSANDRA-8838:
-

Well, then the question becomes for how long?  If they try again reasonably 
soon, the fat client timeout gives them RING_DELAY, which I guess there's no 
harm in tweaking as long as you know the consequence.



[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-02-20 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329459#comment-14329459
 ] 

T Jake Luciani commented on CASSANDRA-8838:
---

This is a major win, since failed bootstraps on dense nodes can cost many 
hours: if anything goes wrong, you need to wipe and restart.





[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming

2015-02-19 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328510#comment-14328510
 ] 

Yuki Morishita commented on CASSANDRA-8838:
---

Also here: https://github.com/yukim/cassandra/tree/8838

Bootstrap will still fail, and the node will stop, when streaming fails. The 
next step would be to bring the node up even if streaming failed and to 
provide a tool to resume the failed streaming.
