[ 
https://issues.apache.org/jira/browse/CASSANDRA-15437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18077444#comment-18077444
 ] 

Matt Byrd commented on CASSANDRA-15437:
---------------------------------------

OK, I see that in both the original patch and the linked PR we're reducing the 
number of times we both log the problem and bubble up the exception. Sure, I 
don't mind avoiding an additional error log message/stack trace by not logging 
in an intermediate try/catch. However, that seems like a minor side concern 
compared to "should the behavior upon not being able to deliver hints be to 
fail the decommission or not?" Both of your patches change this correctness 
behavior, which needs its own justification (reasoning about 
correctness/tradeoffs etc.).

Having that be configurable seems like a good compromise, in addition to 
being able to configure avoiding delivery entirely (which will be available 
again after CASSANDRA-21341).

Skipping delivery entirely when there is nothing to deliver, by checking
if (!catalog.hasFiles())
seems to solve the original problem statement of the ticket.

I think there is some sense in allowing hint delivery failure to not block 
decommission, but I'd lean towards that being configurable (defaulting to 
fail-hard), similar to the configuration for avoiding hint transfer entirely 
(CASSANDRA-21341).
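
To make the two behaviors concrete, here is a rough sketch of what I have in 
mind. It is not the actual patch: apart from the catalog.hasFiles() check, 
every name here (HintTransferSketch, Catalog, Outcome, transferHints, the 
failOnTransferError flag) is hypothetical and only stands in for whatever the 
real code and configuration end up being called.

```java
// Hypothetical sketch only: these types are NOT Cassandra's real classes.
final class HintTransferSketch {

    /** Stand-in for the hints catalog; only hasFiles() comes from the discussion. */
    interface Catalog {
        boolean hasFiles();
    }

    /** Possible results of the hint-transfer step of a decommission. */
    enum Outcome { SKIPPED_NO_HINTS, TRANSFERRED, FAILED_BUT_CONTINUED }

    /**
     * @param catalog             source of hint files on the leaving node
     * @param transfer            the actual streaming work; may throw
     * @param failOnTransferError hypothetical config flag; true means fail-hard
     */
    static Outcome transferHints(Catalog catalog, Runnable transfer, boolean failOnTransferError) {
        // Nothing to deliver: skip without ever looking for a live target.
        if (!catalog.hasFiles())
            return Outcome.SKIPPED_NO_HINTS;

        try {
            transfer.run();
            return Outcome.TRANSFERRED;
        } catch (RuntimeException e) {
            // Fail-hard: let the caller log once and abort the decommission.
            if (failOnTransferError)
                throw e;
            // Opt-in lenient mode: undelivered hints are lost, decommission goes on.
            return Outcome.FAILED_BUT_CONTINUED;
        }
    }

    public static void main(String[] args) {
        Runnable partitioned = () -> {
            throw new RuntimeException("Unable to stream hints since no live endpoints seen");
        };
        // The scenario from the ticket: no hints written, node partitioned.
        System.out.println(transferHints(() -> false, partitioned, true));
    }
}
```

With an empty catalog the failing transfer is never attempted, which covers 
the case in the ticket. The flag only matters when there really are hints to 
deliver, which is where the correctness tradeoff lives.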

> Decommission fails with "Unable to stream hints since no live endpoints seen" 
> even if no hints need to be sent
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15437
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15437
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Cluster/Membership
>            Reporter: YCozy
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Dear Cassandra developers, I was applying fault-injection to test Cassandra 
> and noticed the following behavior. I think this may be a bug. Please let me 
> know if I'm missing something.
>  
> Steps to reproduce:
>  # Start a two node cluster (node1 & node2) using {{ccm}}.
>  # Add another node to the cluster (node3).
>  # Partition node3 from the other two nodes.
>  # Try to decommission node3 using {{nodetool decommission}}.
>  # Notice that the decommission failed with the following error log:
>  
> {code:java}
> ERROR [RMI TCP Connection(4)-127.0.0.1] 2019-11-25 22:45:27,716 
> StorageService.java:4198 - Error while decommissioning node 
>  java.lang.RuntimeException: Unable to stream hints since no live endpoints 
> seen
>   at 
> org.apache.cassandra.service.StorageService.getPreferredHintsStreamTarget(StorageService.java:4281)
>   at 
> org.apache.cassandra.hints.HintsDispatchExecutor$TransferHintsTask.run(HintsDispatchExecutor.java:156)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748){code}
>  
> Since I didn't write any data, there is no hint to be sent. In this case, 
> shouldn't the decommission continue?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
