[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900234#comment-16900234 ]

Jeff Jirsa commented on CASSANDRA-8838:

For the record, this patch almost certainly causes us to violate consistency/correctness for the reasons discussed in the 4 comments above (missing writes while restarting / may not store hints / may not deliver hints before timeout). That it's enabled by default and can't be disabled is really unfortunate for people who want to disable this feature. We need to be more aware of new features and correctness in the future.

> Resumable bootstrap streaming
>
> Key: CASSANDRA-8838
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8838
> Project: Cassandra
> Issue Type: Sub-task
> Components: Legacy/Streaming and Messaging
> Reporter: Yuki Morishita
> Assignee: Yuki Morishita
> Priority: Low
> Labels: dense-storage
> Fix For: 2.2.0 beta 1
>
> This allows the bootstrapping node not to be streamed already received data.
> The bootstrapping node records received keyspace/ranges as one stream session
> completes. When some sessions with other nodes fail, bootstrapping fails
> also, though next time it re-bootstraps, already received keyspace/ranges are
> skipped to be streamed.

--
This message was sent by Atlassian JIRA (v7.6.14#76016)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000728#comment-15000728 ]

Yuki Morishita commented on CASSANDRA-8838:

bq. in that hints will not be stored to the bootstrapping node after RING_DELAY, since it will be evicted from the TMD pending ranges. Should we create a ticket to address this?

Let's discuss this in a new ticket. I implemented this so that we enter the same state as "write survey mode" when bootstrap streaming fails. If Paulo's analysis above is correct, does that mean "write survey mode" also won't work as intended? /cc [~brandon.williams]
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000741#comment-15000741 ]

Brandon Williams commented on CASSANDRA-8838:

Write survey mode works fine as long as the surveying node is alive... once it's gone, as Paulo noted, it will be removed after ring_delay because it was a fat client (all bootstrapping nodes are fat clients).
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15000771#comment-15000771 ]

Yuki Morishita commented on CASSANDRA-8838:

So the problem happens when the bootstrapping node goes down for some reason and stays down for longer than RING_DELAY. In that case, yes, hints are not stored because the node is evicted from TMD. We have a flag to do a normal bootstrap for that case (-Dcassandra.reset_bootstrap_progress).
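The rule of thumb in this comment can be written down as a toy model. This is only a sketch, not Cassandra code: `jvm_args_for_restart` and `RING_DELAY_SECONDS` are hypothetical names, and the 30-second value is an assumption standing in for whatever RING_DELAY is in effect on the cluster. Only the flag itself comes from the thread.

```python
# Toy model of the decision discussed above; not Cassandra code.
# RING_DELAY_SECONDS is an assumed value used purely for illustration.
RING_DELAY_SECONDS = 30.0

RESET_FLAG = "-Dcassandra.reset_bootstrap_progress=true"


def jvm_args_for_restart(down_seconds, ring_delay=RING_DELAY_SECONDS):
    """Suggest JVM arguments for restarting a node whose bootstrap was interrupted.

    If the node stayed down longer than RING_DELAY it was evicted from the
    pending ranges, so writes (and hints) for it may be missing. In that case
    the sketch asks for a normal bootstrap by resetting the recorded progress.
    """
    if down_seconds > ring_delay:
        return [RESET_FLAG]
    return []
```

The point of the sketch is only that resuming is attractive for short outages, while a longer outage calls for discarding the recorded progress.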
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997442#comment-14997442 ]

Paulo Motta commented on CASSANDRA-8838:

[~yukim] does this feature support resuming bootstrap after the bootstrapping node goes down? If so, I think [~brandon.williams]'s concern is valid, in that hints will not be stored for the bootstrapping node after RING_DELAY, since it will be evicted from the TMD pending ranges. Should we create a ticket to address this?
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367511#comment-14367511 ]

Yuki Morishita commented on CASSANDRA-8838:

Committed, thanks for the review! I also opened a pull request for cassandra-dtest here (https://github.com/riptano/cassandra-dtest/pull/199).
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364788#comment-14364788 ]

Sylvain Lebresne commented on CASSANDRA-8838:

bq. The first issue is that starting a node with {{no_wait=True}} fails to detect the process pid and the test stops there, this applies to all four tests.

I've noticed that too, and it is not specific to this issue: there are quite a few dtests that use this flag and are currently failing on trunk on my box for the reason you mention. It appears that on current trunk, on my box, the Cassandra process takes 10 seconds to even get created (which is definitely something relatively new). I haven't investigated what made that happen, so it would be nice to bisect it so we at least know, but as you said, removing the {{no_wait}} flag at least fixes the tests.
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366406#comment-14366406 ]

Stefania commented on CASSANDRA-8838:

+1 on everything now, tests look great and pass without problems on my box. I cannot commit myself so I did not resolve the ticket just yet but it can be resolved and committed at any time.
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365869#comment-14365869 ]

Yuki Morishita commented on CASSANDRA-8838:

Alright, I force pushed updates to dtests. This version is at least reliable on my machine.

https://github.com/yukim/cassandra-dtest/tree/CASSANDRA-8838

bq. With no_wait=True it waits for 2 seconds

That's not what I expected, so I removed them all.

bq. I encountered another issue, in that the streaming completes before we kill node 1

{{node#start()}} sometimes goes too far before we want to kill the node. This version spawns a thread to watch the log and kill node 1, so it reacts as expected.

I also fixed my new tests in {{replace_address_test.py}}, which were not replacing the node properly.
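The "spawn a thread to watch the log" approach can be sketched roughly as follows. This is not the code from the dtest branch; `watch_and_fire` and its parameters are hypothetical names, and the sketch simply polls a log file from a background thread and runs a callback on the first matching line, so the reaction does not have to wait for a blocking start call to return.

```python
import os
import re
import threading
import time


def watch_and_fire(log_path, pattern, on_match, timeout=10.0, poll=0.05):
    """Start a daemon thread that polls log_path and calls on_match() once,
    as soon as a complete line matching pattern appears. Returns the thread."""
    regex = re.compile(pattern)

    def run():
        deadline = time.monotonic() + timeout
        offset = 0  # byte offset of the first line not yet examined
        while time.monotonic() < deadline:
            if os.path.exists(log_path):
                with open(log_path, "rb") as log:
                    log.seek(offset)
                    chunk = log.read()
                # Consume complete lines only; a partial trailing line is
                # left in place and re-read on the next poll.
                end = chunk.rfind(b"\n") + 1
                offset += end
                for line in chunk[:end].decode("utf-8", "replace").splitlines():
                    if regex.search(line):
                        on_match()
                        return
            time.sleep(poll)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

In a test along the lines discussed here, the pattern would presumably be something like the `Prepare completed` expression quoted in this thread, and the callback would kill node 1. Both are assumptions about how the sketch would be wired up.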
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365081#comment-14365081 ]

Stefania commented on CASSANDRA-8838:

On my box it takes just under 4 seconds with both cassandra-2.1 and trunk. With cassandra-2.0 however, it takes just over 2 seconds. I'll try to spend more time investigating this.
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364459#comment-14364459 ]

Stefania commented on CASSANDRA-8838:

I'm sorry but all four tests have problems on my machine:

* The first issue is that starting a node with {{no_wait=True}} fails to detect the process pid and the test stops there; this applies to all four tests. Is there any reason for using this, since we are then waiting to grep {{\[STREAM-IN-/127.0.0.1\].* Prepare completed}} from the logs anyway? With {{no_wait=True}} it waits for 2 seconds. I guess we could increase that, but it would be best to remove the flag if possible.
* So I went ahead and tried to run {{resumable_bootstrap_test}} without {{no_wait=True}} and I encountered another issue, in that the streaming completes before we kill node 1. So the test fails when starting node 3 the second time, since it is already running. I increased the number of stress entries from 100,000 to 1M and the streaming took as long as 6 seconds, see below, yet node 1 was not killed until after the session completed, again logs below. I think the problem could be in {{watch_log_for}} but I did not look further.

{code}
NODE 3
INFO [STREAM-IN-/127.0.0.1] 2015-03-17 10:21:54,611 StreamResultFuture.java:167 - [Stream #60c0daf0-cc4c-11e4-95db-8df1f8b3509d ID#0] Prepare completed. Receiving 3 files(95252818 bytes), sending 0 files(0 bytes)
INFO [STREAM-IN-/127.0.0.2] 2015-03-17 10:21:54,899 StreamResultFuture.java:167 - [Stream #60c0daf0-cc4c-11e4-95db-8df1f8b3509d ID#0] Prepare completed. Receiving 5 files(87661884 bytes), sending 0 files(0 bytes)
INFO [StreamReceiveTask:2] 2015-03-17 10:22:00,382 StreamResultFuture.java:181 - [Stream #60c0daf0-cc4c-11e4-95db-8df1f8b3509d] Session with /127.0.0.1 is complete
INFO [StreamReceiveTask:1] 2015-03-17 10:22:00,495 StreamResultFuture.java:181 - [Stream #60c0daf0-cc4c-11e4-95db-8df1f8b3509d] Session with /127.0.0.2 is complete
{code}

{code}
NODE 1
INFO [STREAM-IN-/127.0.0.3] 2015-03-17 10:22:00,376 StreamResultFuture.java:181 - [Stream #60c0daf0-cc4c-11e4-95db-8df1f8b3509d] Session with /127.0.0.3 is complete
INFO [STREAM-IN-/127.0.0.3] 2015-03-17 10:22:00,377 StreamResultFuture.java:213 - [Stream #60c0daf0-cc4c-11e4-95db-8df1f8b3509d] All sessions completed
INFO [main] 2015-03-17 10:22:13,857 YamlConfigurationLoader.java:92 - Loading settings from file:/tmp/dtest-8beQKM/test/node1/conf/cassandra.yaml
INFO [main] 2015-03-17 10:22:13,929 YamlConfigurationLoader.java:135 - Node configuration:[authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; auto_bootstrap=false
{code}
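The "as long as 6 seconds" figure can be checked against the node 3 log by subtracting the `Prepare completed` timestamp from the matching `Session with ... is complete` timestamp for each peer. The following is a small sketch of that arithmetic. The function names and regular expressions are hypothetical, and it assumes one log entry per line in the layout shown in the node 3 excerpt.

```python
import re
from datetime import datetime

TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})"
PREPARED = re.compile(r"\[STREAM-IN-/([\d.]+)\] " + TS + r" .*Prepare completed")
COMPLETE = re.compile(TS + r" .*Session with /([\d.]+) is complete")

# The four node 3 entries quoted in the comment, one per line.
NODE3_LOG = """\
INFO [STREAM-IN-/127.0.0.1] 2015-03-17 10:21:54,611 StreamResultFuture.java:167 - [Stream #60c0daf0-cc4c-11e4-95db-8df1f8b3509d ID#0] Prepare completed. Receiving 3 files(95252818 bytes), sending 0 files(0 bytes)
INFO [STREAM-IN-/127.0.0.2] 2015-03-17 10:21:54,899 StreamResultFuture.java:167 - [Stream #60c0daf0-cc4c-11e4-95db-8df1f8b3509d ID#0] Prepare completed. Receiving 5 files(87661884 bytes), sending 0 files(0 bytes)
INFO [StreamReceiveTask:2] 2015-03-17 10:22:00,382 StreamResultFuture.java:181 - [Stream #60c0daf0-cc4c-11e4-95db-8df1f8b3509d] Session with /127.0.0.1 is complete
INFO [StreamReceiveTask:1] 2015-03-17 10:22:00,495 StreamResultFuture.java:181 - [Stream #60c0daf0-cc4c-11e4-95db-8df1f8b3509d] Session with /127.0.0.2 is complete
"""


def parse_ts(text):
    """Parse a log timestamp such as '2015-03-17 10:21:54,611'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def session_durations(log_text):
    """Map each peer address to the seconds between 'Prepare completed'
    and 'Session with /<peer> is complete'."""
    started, durations = {}, {}
    for line in log_text.splitlines():
        match = PREPARED.search(line)
        if match:
            started[match.group(1)] = parse_ts(match.group(2))
            continue
        match = COMPLETE.search(line)
        if match and match.group(2) in started:
            elapsed = parse_ts(match.group(1)) - started[match.group(2)]
            durations[match.group(2)] = elapsed.total_seconds()
    return durations
```

Applied to the excerpt, this gives roughly 5.8 seconds for the session with 127.0.0.1 and 5.6 seconds for 127.0.0.2, which is the window in which node 1 would have had to be killed.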
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364480#comment-14364480 ]

Stefania commented on CASSANDRA-8838:

It might be useful to compare the last ccm commit; mine is {{a9ea7f0c15866f80b2cf2a8144ef701d43030459}}.
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364109#comment-14364109 ]

Yuki Morishita commented on CASSANDRA-8838:

dtests added for replace_address_test and bootstrap_test here: https://github.com/yukim/cassandra-dtest/tree/CASSANDRA-8838

I also updated my [branch|https://github.com/yukim/cassandra/commits/8838-2] with a commit that removes a check that could prevent re-bootstrapping the failed node.
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362635#comment-14362635 ]

Stefania commented on CASSANDRA-8838:

The code looks good. {{replace_address_test.py}} and {{bootstrap_test.py}} ran successfully on my local machine as well. Once we have a couple of dtests about the feature we should be good.
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360691#comment-14360691 ]

Yuki Morishita commented on CASSANDRA-8838:

You are right, I need the reset code for replacing as well. I updated the branch with the fix. This time I put the reset check right before starting bootstrap.

I tested manually with ccm and manual intervention. I also ran dtest's {{replace_address_test.py}} and {{bootstrap_test.py}}, and both ran successfully on my local machine. I think I can add some dtests for the feature, similar to what I've done in CASSANDRA-8942.
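Placing the reset check "right before starting bootstrap" means the same decision covers both a plain bootstrap and a replace. A minimal sketch of that ordering follows. It is not the patch itself: `plan_bootstrap` is a hypothetical name, ranges are modelled as plain `(start, end)` tuples, and the store is a dict from keyspace to the set of ranges already received.

```python
def plan_bootstrap(wanted, store, reset_progress=False):
    """Return, per keyspace, the ranges that still have to be streamed.

    wanted: dict mapping keyspace -> set of (start, end) ranges the node needs.
    store:  dict mapping keyspace -> set of ranges recorded as already received.
    reset_progress: models -Dcassandra.reset_bootstrap_progress=true; when set,
                    recorded progress is discarded before planning.
    """
    if reset_progress:
        # The reset happens first, so nothing below can skip a range
        # on the basis of stale progress.
        store.clear()
    return {
        keyspace: set(ranges) - store.get(keyspace, set())
        for keyspace, ranges in wanted.items()
    }
```

With the check in one place, a caller that is replacing a node and a caller that is bootstrapping a new one would both go through the same function, which is the property the review comment asked about.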
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359941#comment-14359941 ]

Stefania commented on CASSANDRA-8838:

Only one comment on the code:

# We are not looking at the new flag when replacing; why is that?

Let's focus on the tests:

# Do you think we should add a unit test for {{resetAvailableRanges()}}, since this is the only code path that was added to {{SystemKeyspace}} and is not covered by {{StreamStoreStateTest}}?
# Not sure how easy it would be (we'd probably have to rework {{RangeStreamer}} a bit), but we could add a unit test to {{BootStrapperTest}} that checks that the {{RangeStreamer}} is not requesting available ranges, since this would be hard to check in dtest. Up to you if you want to do this or not.
# The dtest {{bootstrap_test.py}} is currently failing (not related to your patch, as it is failing on trunk as well). See the failures I got below.
## It would be good to fix these failures (not necessarily yourself but someone in QA).
## It would also be beneficial to add a couple more dtests to simulate a bootstrap failure and test the resume with and without {{cassandra.reset_bootstrap_progress}}. Not sure how easy it would be to simulate the failure; perhaps you'd have to add a cassandra.test flag.
## Perhaps we should have more tests for replacing a node (didn't see any in dtests) with and without a failed bootstrap.

What do you think, is this reasonable for testing, or did you do more manual tests? If you tested manually with ccm and nodetool, perhaps we should simply list the tests here and let QA add them to dtest or to the new tool for functional tests that Ariel is working on.

Dtest failure on trunk:

{code}
======================================================================
FAIL: simple_bootstrap_test (bootstrap_test.TestBootstrap)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/stefania/git/cstar/cassandra-dtest/bootstrap_test.py", line 66, in simple_bootstrap_test
    reader.check()
  File "/home/stefania/git/cstar/cassandra-dtest/dtest.py", line 115, in check
    raise self.__error
AssertionError:
-------------------- >> begin captured logging << --------------------
dtest: DEBUG: cluster ccm directory: /tmp/dtest-510qhu
dtest: DEBUG: connecting...
dtest: DEBUG: reading...
--------------------- >> end captured logging << ---------------------
{code}

Dtest failure on 8838-2:

{code}
======================================================================
ERROR: simple_bootstrap_test (bootstrap_test.TestBootstrap)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/stefania/git/cstar/cassandra-dtest/bootstrap_test.py", line 74, in simple_bootstrap_test
    assert_almost_equal(size1, size2, error=0.3)
  File "/home/stefania/git/cstar/cassandra-dtest/assertions.py", line 56, in assert_almost_equal
    assert vmin > vmax * (1.0 - error) or vmin == vmax, "values not within %.2f%% of the max: %s" % (error * 100, args)
TypeError: unsupported operand type(s) for *: 'Decimal' and 'float'
-------------------- >> begin captured logging << --------------------
dtest: DEBUG: cluster ccm directory: /tmp/dtest-4sMR0c
dtest: DEBUG: connecting...
dtest: DEBUG: reading...
dtest: DEBUG: initial_size: 482.41
dtest: DEBUG: size1: 305.65
dtest: DEBUG: size2: 287.5
--------------------- >> end captured logging << ---------------------
{code}
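The second failure is a plain Python type error: multiplying a `Decimal` by a `float` is not supported. One way to make such a comparison tolerant of mixed numeric types is sketched below. This is an illustration written for Python 3, not the dtest helper and not necessarily how it was fixed; the function only borrows the helper's name and the shape of its assertion, and converting to `float` is an assumption that the sizes being compared do not need exact decimal arithmetic.

```python
from decimal import Decimal


def assert_almost_equal(*values, error):
    """Assert that the smallest value is within `error` (a fraction, e.g. 0.3)
    of the largest one. Accepts int, float and Decimal inputs."""
    # Convert up front so that Decimal and float operands are never mixed.
    floats = [float(value) for value in values]
    vmax, vmin = max(floats), min(floats)
    assert vmin > vmax * (1.0 - error) or vmin == vmax, (
        "values not within %.2f%% of the max: %s" % (error * 100, values)
    )


# The sizes from the captured log, passed as Decimal values.
assert_almost_equal(Decimal("305.65"), Decimal("287.5"), error=0.3)
```

With the two sizes from the captured log, the check passes once the type mismatch is out of the way, which suggests the test logic itself was not the problem.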
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359347#comment-14359347 ]

Yuki Morishita commented on CASSANDRA-8838:

[~Stefania] I pushed a rebased version with an option to reset available ranges: https://github.com/yukim/cassandra/commits/8838-2

Users can pass {{-Dcassandra.reset_bootstrap_progress=true}} to reset available ranges when booting the node.
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1439#comment-1439 ]

Yuki Morishita commented on CASSANDRA-8838:

Thanks for the review!

bq. StreamStateStore.isDataAvailable() doesn't seem to be used

Sorry, I plan to use this in CASSANDRA-8943, but I included it in this release for the unit test.

bq. In SystemKeyspace.getAvailableRanges() why do we need to copy the result into an ImmutableSet, just to make sure the caller cannot modify the result or is there more to it I should understand?

I just wanted to make sure it cannot be modified from outside.

bq. Do we ever reset the available ranges? If not, is this not going to cause issues if the node is down for a very long time, like a few days or do we just rely on deleting the whole node data in these cases?

Good point. We need to add that feature for sure. Let me work on adding a config entry to skip this functionality.
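The reasoning behind returning an immutable copy can be illustrated outside of Java. The sketch below is a Python analogue, not the Cassandra classes: `AvailableRangeStore` and its methods are hypothetical names, ranges are modelled as non-wrapping `(start, end]` tuples of integers, and `frozenset` plays the role that `ImmutableSet` plays in the patch.

```python
class AvailableRangeStore:
    """Records which ranges of each keyspace have already been received."""

    def __init__(self):
        self._ranges = {}  # keyspace -> set of (start, end) tuples

    def record(self, keyspace, ranges):
        """Remember ranges received in a completed stream session."""
        self._ranges.setdefault(keyspace, set()).update(ranges)

    def available_ranges(self, keyspace):
        """Return a read-only snapshot; callers cannot change stored state."""
        return frozenset(self._ranges.get(keyspace, ()))

    def is_data_available(self, keyspace, token):
        """True if token falls inside a recorded range.

        Assumes non-wrapping ranges that exclude start and include end.
        """
        return any(
            start < token <= end
            for start, end in self._ranges.get(keyspace, ())
        )
```

A caller holding the returned set can neither add to it nor see it change under its feet when another session completes, which is the "not modifiable outside" property described in the reply above.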
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355956#comment-14355956 ]

Stefania commented on CASSANDRA-8838:

Sure, let us know when the config entry is ready but otherwise LGTM!
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349709#comment-14349709 ]

Stefania commented on CASSANDRA-8838:

Hi Yuki, really nice code, here are my comments (mostly questions for my own benefit really):

* {{StreamStateStore.isDataAvailable()}} doesn't seem to be used
* In {{SystemKeyspace.getAvailableRanges()}} why do we need to copy the result into an {{ImmutableSet}}, just to make sure the caller cannot modify the result or is there more to it I should understand?
* Do we ever reset the available ranges? If not, is this not going to cause issues if the node is down for a very long time, like a few days or do we just rely on deleting the whole node data in these cases?
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334212#comment-14334212 ]

Yuki Morishita commented on CASSANDRA-8838:

Updated the branch with a new commit because I found one mistake: https://github.com/yukim/cassandra/tree/8838

IIRC hints are already stored even when the bootstrapping node goes down, as long as the downtime is within the max hint window.

In my opinion, bootstrap streaming fails more often because a streaming source goes down than because the bootstrapping node goes down. When that happens, the whole bootstrap process fails in current versions. I'm working on preventing the whole process from failing and instead entering the same state as 'write survey mode', along with a nodetool command that resumes the failed streaming from that state.
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329104#comment-14329104 ]

Brandon Williams commented on CASSANDRA-8838:

Worth noting though that even if we resume streaming, the node won't be fully complete at the end, since it will have missed any writes for it while it was down. Perhaps people with large nodes may not care though depending on their CL.
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329250#comment-14329250 ] Jonathan Ellis commented on CASSANDRA-8838: --- Do we not hint for bootstrapping nodes? I imagine that should be an easy fix.
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329323#comment-14329323 ] Brandon Williams commented on CASSANDRA-8838: - Well, then the question becomes: for how long? If they try again reasonably soon, the fat client timeout gives them RING_DELAY, which I guess there's no harm in tweaking as long as you know the consequences.
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329459#comment-14329459 ] T Jake Luciani commented on CASSANDRA-8838: --- This is a major win, since a failed bootstrap on a dense node can cost many hours: if anything goes wrong, you need to wipe and restart.
[jira] [Commented] (CASSANDRA-8838) Resumable bootstrap streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328510#comment-14328510 ] Yuki Morishita commented on CASSANDRA-8838: --- Also here: https://github.com/yukim/cassandra/tree/8838 Still, bootstrap will fail and the node will stop when streaming fails. The next step would be to bring up the node even if streaming failed and provide a tool to resume the failed streaming.
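The issue description quoted in this thread says the bootstrapping node records received keyspace/ranges as each stream session completes, and skips them the next time it re-bootstraps. A minimal sketch of that bookkeeping might look like the following. All class and method names here are hypothetical and are not taken from the patch or from Cassandra's code base; the in-memory map merely stands in for whatever persistent storage a real implementation would need in order to survive a restart.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch; these names are not Cassandra's real classes. */
final class ResumableBootstrapSketch {

    /** A token range, compared by value so it can be stored in a set. */
    static final class Range {
        final long left;
        final long right;

        Range(long left, long right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Range)) return false;
            Range other = (Range) o;
            return left == other.left && right == other.right;
        }

        @Override
        public int hashCode() {
            return Objects.hash(left, right);
        }
    }

    // Assumption: stand-in for persistent storage (keyspace -> received ranges).
    private final Map<String, Set<Range>> received = new HashMap<>();

    /** Record ranges once the stream session that delivered them completes. */
    void onSessionComplete(String keyspace, List<Range> ranges) {
        received.computeIfAbsent(keyspace, k -> new HashSet<>()).addAll(ranges);
    }

    /** On re-bootstrap, return only the ranges that still need streaming. */
    List<Range> rangesToFetch(String keyspace, List<Range> wanted) {
        Set<Range> done = received.getOrDefault(keyspace, new HashSet<>());
        List<Range> remaining = new ArrayList<>();
        for (Range r : wanted) {
            if (!done.contains(r)) remaining.add(r);
        }
        return remaining;
    }
}
```

In this sketch a range counts as received only after its whole session completes, so a session that fails midway leaves its ranges in the to-fetch list. It does not address the concern raised in the thread about writes missed while the node was down.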