[
https://issues.apache.org/jira/browse/CASSANDRA-15437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18077444#comment-18077444
]
Matt Byrd commented on CASSANDRA-15437:
---------------------------------------
Ok, I see that in both the original patch and the linked PR we're reducing the
number of times we both log the problem and bubble up the exception. Sure, I
don't mind avoiding an additional error log message/stack trace by
not logging in an intermediate try/catch. However, that seems like a
minor side concern compared to "should failing to deliver hints fail the
decommission or not?" Both of your patches change
this correctness behavior, which needs its own justification (reasoning about
correctness, tradeoffs, etc.).
Making that configurable seems like a good compromise, in addition
to being able to configure skipping delivery entirely (which will be
available again after CASSANDRA-21341).
Skipping delivery entirely when there is nothing to deliver, by guarding with:
if (!catalog.hasFiles())
seems to solve the original problem statement of the ticket.
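To make the guard concrete, here is a minimal sketch of the idea, not the actual patch. Only the catalog.hasFiles() check comes from the discussion; the HintTransferSketch class, the Catalog interface, the Result enum and the target supplier are hypothetical stand-ins for the real hints classes. The point it illustrates is that the stream target is only resolved when there is something to stream, so "no live endpoints" cannot fail a decommission that has no hints.

```java
import java.util.Objects;
import java.util.function.Supplier;

/** Sketch only: hypothetical stand-ins, not Cassandra's real hints classes. */
final class HintTransferSketch {

    /** Hypothetical minimal view of a hints catalog; only hasFiles() comes from the comment. */
    interface Catalog {
        boolean hasFiles();
    }

    /** Outcome of a transfer attempt, so callers can tell "skipped" from "streamed". */
    enum Result { SKIPPED_NO_HINTS, TRANSFERRED }

    /**
     * Resolves the stream target only when there are hint files to send.
     * The supplier may throw (for example when no live endpoints are seen);
     * that exception still propagates when hints do exist.
     */
    static Result transferHints(Catalog catalog, Supplier<String> targetSupplier) {
        if (!catalog.hasFiles()) {
            // Nothing to deliver: never ask for a target, so nothing can fail here.
            return Result.SKIPPED_NO_HINTS;
        }
        String target = Objects.requireNonNull(targetSupplier.get(), "stream target");
        // Real code would dispatch the hint files to 'target' at this point.
        return Result.TRANSFERRED;
    }
}
```

With this shape, the partitioned-node case from the ticket returns SKIPPED_NO_HINTS, while a node that does hold hints still surfaces the original exception.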
I think there is some sense in allowing hint delivery failure to not block
decommission, but I'd lean towards that being configurable (defaulting to
fail-hard), similar to the configuration for
avoiding hint transfer entirely (CASSANDRA-21341).
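As a rough illustration of the configurable behavior, the sketch below assumes a hypothetical boolean setting (called failOnError here); no such option name is confirmed by the ticket, and DecommissionHintPolicySketch is not a real Cassandra class. It defaults to fail-hard and only lets decommission continue when the operator has explicitly opted in.

```java
/** Sketch only: a hypothetical policy wrapper, not an existing Cassandra API. */
final class DecommissionHintPolicySketch {

    /** Hypothetical default: a hint transfer failure fails the decommission. */
    static final boolean DEFAULT_FAIL_ON_HINT_TRANSFER_ERROR = true;

    /**
     * Runs the hint transfer step and applies the configured policy.
     *
     * @return true if hints were transferred, false if the failure was tolerated
     * @throws RuntimeException the original failure, when failOnError is true
     */
    static boolean transferOrContinue(Runnable transfer, boolean failOnError) {
        try {
            transfer.run();
            return true;
        } catch (RuntimeException e) {
            if (failOnError) {
                // Fail-hard: rethrow once, leaving logging to the caller.
                throw e;
            }
            // Tolerant mode: record the loss and let decommission proceed.
            System.err.println("Hint transfer failed, continuing decommission: " + e.getMessage());
            return false;
        }
    }
}
```

Rethrowing without logging in the fail-hard branch keeps the error reported once, which matches the goal of not logging in an intermediate try/catch.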
> Decommission fails with "Unable to stream hints since no live endpoints seen"
> even if no hints need to be sent
> --------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15437
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15437
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Cluster/Membership
> Reporter: YCozy
> Assignee: Stefan Miklosovic
> Priority: Normal
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Dear Cassandra developers, I was applying fault-injection to test Cassandra
> and noticed the following behavior. I think this may be a bug. Please let me
> know if I'm missing something.
>
> Steps to reproduce:
> # Start a two node cluster (node1 & node2) using {{ccm}}.
> # Add another node to the cluster (node3).
> # Partition node3 from the other two nodes.
> # Try to decommission node3 using {{nodetool decommission}}.
> # Notice that the decommission failed with the following error log:
>
> {code:java}
> ERROR [RMI TCP Connection(4)-127.0.0.1] 2019-11-25 22:45:27,716 StorageService.java:4198 - Error while decommissioning node
> java.lang.RuntimeException: Unable to stream hints since no live endpoints seen
> at org.apache.cassandra.service.StorageService.getPreferredHintsStreamTarget(StorageService.java:4281)
> at org.apache.cassandra.hints.HintsDispatchExecutor$TransferHintsTask.run(HintsDispatchExecutor.java:156)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.lang.Thread.run(Thread.java:748){code}
>
> Since I didn't write any data, there is no hint to be sent. In this case,
> shouldn't the decommission continue?