[ 
https://issues.apache.org/jira/browse/CASSANDRA-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402388#comment-15402388
 ] 

Paulo Motta commented on CASSANDRA-12008:
-----------------------------------------

Thanks for the update. This is looking better and we're nearly done; see the 
follow-up below:
* Code
** Fix indentation of {{logger.debug("DECOMMISSIONING")}} 
** The {{isDecommissioning.get()}} should use a {{compareAndSet}} to avoid 
starting simultaneous decommission sessions. See the {{isRebuilding}} check. 
Also, add a test verifying that it's not possible to start multiple 
decommissions simultaneously, based on the solution from CASSANDRA-11687 to 
avoid test flakiness.
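A minimal sketch of the suggested guard, mirroring the {{isRebuilding}} pattern in {{StorageService}} (the class and method names below are illustrative, not the actual patch):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the suggested guard, mirroring the isRebuilding pattern in
// StorageService; class and method names here are illustrative.
public class DecommissionGuard
{
    private final AtomicBoolean isDecommissioning = new AtomicBoolean(false);

    // compareAndSet checks and flips the flag atomically, so only one of any
    // number of concurrent callers can win it; a separate get() followed by
    // set() would let two threads pass the check simultaneously.
    public boolean tryStartDecommission()
    {
        return isDecommissioning.compareAndSet(false, true);
    }

    public void finishDecommission()
    {
        isDecommissioning.set(false);
    }
}
```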
** On {{SessionCompleteEvent}}, use {{Collections.unmodifiableMap}} when 
copying the {{transferredRangesPerKeyspace}} map to avoid modifications to the 
map.
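The defensive copy could look like the following sketch (the helper name and generic signature are illustrative, not the actual event code):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch of the defensive copy for SessionCompleteEvent; the helper name and
// generic signature are illustrative.
public class RangeSnapshot
{
    // Copy first, then wrap: the copy isolates the event from later changes
    // to the session's live map, and unmodifiableMap stops consumers of the
    // event from mutating the snapshot.
    public static <K, V> Map<K, V> snapshot(Map<K, V> source)
    {
        return Collections.unmodifiableMap(new HashMap<>(source));
    }
}
```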
** In order to avoid allocating a {{HashSet}} when it's not necessary, change 
this {noformat}
            Set<Range<Token>> toBeUpdated = new HashSet<>();
            if (transferredRangesPerKeyspace.containsKey(keyspace))
            {
                toBeUpdated = transferredRangesPerKeyspace.get(keyspace);
            }
{noformat} with this: {noformat}
            Set<Range<Token>> toBeUpdated = transferredRangesPerKeyspace.get(keyspace);
            if (toBeUpdated == null)
            {
                toBeUpdated = new HashSet<>();
            }
{noformat}
** {{Error while decommissioning node}} is never printed because the 
{{ExecutionException}} is wrapped in a {{RuntimeException}} in 
{{unbootstrap}}, so perhaps you can modify {{unbootstrap}} to throw 
{{ExecutionException | InterruptedException}} and catch those in 
{{decommission}} to wrap in a {{RuntimeException}}.
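The suggested exception flow, as a sketch: {{unbootstrap}} declares the checked exceptions instead of wrapping them, so {{decommission}} can log the error before rethrowing (method bodies and the boolean parameter below are illustrative stand-ins, not the actual {{StorageService}} code):

```java
import java.util.concurrent.ExecutionException;

// Sketch of the suggested exception flow; bodies and the boolean parameter
// are illustrative stand-ins for the real streaming work.
public class DecommissionErrorFlow
{
    static void unbootstrap(boolean failStreaming) throws ExecutionException, InterruptedException
    {
        // stands in for waiting on the streaming future
        if (failStreaming)
            throw new ExecutionException(new RuntimeException("Stream failed"));
    }

    static void decommission(boolean failStreaming)
    {
        try
        {
            unbootstrap(failStreaming);
        }
        catch (ExecutionException | InterruptedException e)
        {
            // this log line is now reachable, because the checked exception
            // is no longer pre-wrapped inside unbootstrap()
            System.err.println("Error while decommissioning node: " + e.getMessage());
            throw new RuntimeException(e);
        }
    }
}
```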

* dtests
** Simply running {{stress read}} will not fail if the keys are not there; you 
need to either compare the retrieved keys or check that there was no failure 
in the stress process (see {{bootstrap_test}} for examples).
** When verifying whether the retrieved data is correct in 
{{resumable_decommission_test}}, you need to stop either node1 or node3 when 
querying the other, otherwise the data may be present on only one of these 
nodes (while it must be on both, since RF=2 and N=2).
** Perhaps reduce the number of keys to 10k so the test will be faster.
** In {{resumable_decommission_test}}, set 
{{stream_throughput_outbound_megabits_per_sec}} to {{1}} so the streaming will 
be slower and allow more time for interrupting.
** Perhaps it's better for {{InterruptDecommission}} to watch for {{rebuild 
from dc}}, since this is printed before {{"Executing streaming plan for 
Unbootstrap"}}.
** Instead of counting occurrences of {{decommission_error}}, you can add a 
{{self.fail("second decommission should fail")}} after 
{{node2.nodetool('decommission')}}, and in the {{except}} part perhaps check 
that the following message is printed in the logs: {{Error while 
decommissioning node}} - see the new version of {{simple_rebuild_test}} from 
CASSANDRA-11687.
** bq. I found that streamed range skipping behaviour log check-up is not 
working
*** This is probably because the {{Range 
(-2556370087840976503,-2548250017122308073] already in /127.0.0.3, skipping}} 
message is only printed in {{debug.log}}, so you should pass 
{{filename='debug.log'}} to {{watch_log_for}}.

When you modify {{StreamStateStore}} to {{updateStreamedRanges}} for requested 
ranges (i.e. bootstrap), there could be a collision between received and 
transferred ranges for the same peer. While this collision will not show up in 
decommission, bootstrap, or rebuild, since we only transfer in one direction, 
it may be confusing and a source of problems in the future. So, to avoid 
creating another table to support that later, I think we can modify 
{{streamed_ranges}} to include an {{outgoing}} boolean primary key field 
indicating whether it's an incoming or outgoing transfer. WDYT [~yukim] 
[~kdmu]?
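For illustration only, one hypothetical shape the extended table could take (all column names and types below are assumed, not taken from the actual system schema):

{noformat}
CREATE TABLE system.streamed_ranges (
    operation text,
    peer inet,
    outgoing boolean,   -- assumed new field: direction of the transfer
    keyspace_name text,
    ranges set<blob>,
    PRIMARY KEY ((operation, peer, outgoing), keyspace_name)
);
{noformat}

Making {{outgoing}} part of the partition key keeps received and transferred ranges for the same peer from colliding on write.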

> Make decommission operations resumable
> --------------------------------------
>
>                 Key: CASSANDRA-12008
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12008
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Streaming and Messaging
>            Reporter: Tom van der Woerdt
>            Assignee: Kaide Mu
>            Priority: Minor
>
> We're dealing with large data sets (multiple terabytes per node) and 
> sometimes we need to add or remove nodes. These operations are very dependent 
> on the entire cluster being up, so while we're joining a new node (which 
> sometimes takes 6 hours or longer) a lot can go wrong and in a lot of cases 
> something does.
> It would be great if the ability to retry streams was implemented.
> Example to illustrate the problem :
> {code}
> 03:18 PM   ~ $ nodetool decommission
> error: Stream failed
> -- StackTrace --
> org.apache.cassandra.streaming.StreamException: Stream failed
>         at 
> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>         at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
>         at 
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>         at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>         at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>         at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>         at 
> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:210)
>         at 
> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:186)
>         at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:430)
>         at 
> org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:622)
>         at 
> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:486)
>         at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:274)
>         at java.lang.Thread.run(Thread.java:745)
> 08:04 PM   ~ $ nodetool decommission
> nodetool: Unsupported operation: Node in LEAVING state; wait for status to 
> become normal or restart
> See 'nodetool help' or 'nodetool help <command>'.
> {code}
> Streaming failed, probably due to load :
> {code}
> ERROR [STREAM-IN-/<ipaddr>] 2016-06-14 18:05:47,275 StreamSession.java:520 - 
> [Stream #<streamid>] Streaming error occurred
> java.net.SocketTimeoutException: null
>         at 
> sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:211) 
> ~[na:1.8.0_77]
>         at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) 
> ~[na:1.8.0_77]
>         at 
> java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) 
> ~[na:1.8.0_77]
>         at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54)
>  ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:268)
>  ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> {code}
> If implementing retries is not possible, can we have a 'nodetool decommission 
> resume'?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
