[ 
https://issues.apache.org/jira/browse/CASSANDRA-14215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16356241#comment-16356241
 ] 

Kurt Greaves commented on CASSANDRA-14215:
------------------------------------------

My understanding was that we'd keep hints for the first N hours, and then we'd 
stop storing them. We'd always replay whatever hints are stored and delete them 
after replaying. 

The code in {{org.apache.cassandra.service.StorageProxy#shouldHint}} seems to 
imply the above:
{code:java}
boolean hintWindowExpired = Gossiper.instance.getEndpointDowntime(ep) > DatabaseDescriptor.getMaxHintWindow();
if (hintWindowExpired)
{
    HintsService.instance.metrics.incrPastWindow(ep);
    Tracing.trace("Not hinting {} which has been down {} ms", ep, Gossiper.instance.getEndpointDowntime(ep));
}
return !hintWindowExpired;
{code}
This sounds like an issue to me. I'm pretty sure we shouldn't be storing 
anything that's past the hint window.
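To make the concern concrete, here is a minimal sketch of the two behaviours, using plain values rather than Cassandra's classes. The class {{HintGateSketch}}, the method {{willStoreHint}} and the {{forceHint}} flag are hypothetical names for illustration only; {{forceHint}} stands in for the {{true}} argument that the ticket describes being passed on the {{cas}} path.

```java
/** Hypothetical stand-in, not Cassandra code: models the hint-window decision with plain values. */
final class HintGateSketch
{
    /** Mirrors the check quoted above: hint only while the downtime is within the window. */
    static boolean shouldHint(long endpointDowntimeMs, long maxHintWindowMs)
    {
        boolean hintWindowExpired = endpointDowntimeMs > maxHintWindowMs;
        return !hintWindowExpired;
    }

    /**
     * Hypothetical write-path decision. A caller that forces hinting never consults
     * the window check, so a hint is stored however long the endpoint has been down.
     */
    static boolean willStoreHint(boolean forceHint, long endpointDowntimeMs, long maxHintWindowMs)
    {
        return forceHint || shouldHint(endpointDowntimeMs, maxHintWindowMs);
    }

    public static void main(String[] args)
    {
        long hourMs = 3_600_000L;
        long windowMs = 3 * hourMs; // the default 3 hour window mentioned in the ticket
        System.out.println("checked path, down 4h: " + willStoreHint(false, 4 * hourMs, windowMs));
        System.out.println("forced path, down 4h:  " + willStoreHint(true, 4 * hourMs, windowMs));
    }
}
```

If this model is right, the fix would be for every hint-producing path to go through the window check instead of bypassing it.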

bq. Finally, check out the bit around the node down timer being reset on 
Cassandra restarts. This does not matter if we are ok with generating hints 
beyond the max hint window, but I really don't think that we should be...

This is also not really intended. Because the check currently only looks at how 
long the node has been down, if the node comes up after 3 hours and then goes 
straight back down, we'll effectively store 6 hours of hints for it. It's 
probably reasonable to store at most {{max_hint_window_in_ms}} worth of hints. 
We might be able to get away with just looking at the timestamp of the earliest 
hint for the node and using that if it's prior to the start of the current 
downtime.
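A rough sketch of that idea, again with plain values instead of Cassandra's classes. Everything here is an assumption for illustration: the class {{HintSpanSketch}}, its parameters, and the premise that the timestamp of the earliest stored hint for the endpoint is cheaply available.

```java
import java.util.OptionalLong;

/** Hypothetical sketch, not Cassandra code: bounds hinting by the earliest stored hint. */
final class HintSpanSketch
{
    /**
     * Measures the window from the earlier of (a) the start of the current downtime and
     * (b) the earliest hint still stored for the endpoint, so that a node which flaps
     * cannot accumulate more than one window's worth of hints.
     */
    static boolean shouldHint(long nowMs, long downSinceMs, OptionalLong earliestStoredHintMs, long maxHintWindowMs)
    {
        long windowStartMs = downSinceMs;
        if (earliestStoredHintMs.isPresent() && earliestStoredHintMs.getAsLong() < downSinceMs)
            windowStartMs = earliestStoredHintMs.getAsLong();
        return nowMs - windowStartMs <= maxHintWindowMs;
    }

    public static void main(String[] args)
    {
        long hourMs = 3_600_000L;
        long windowMs = 3 * hourMs;
        // Node first went down at t=0, came back at t=3h and went straight down again; it is now t=4h.
        System.out.println("downtime only:      " + shouldHint(4 * hourMs, 3 * hourMs, OptionalLong.empty(), windowMs));
        System.out.println("with earliest hint: " + shouldHint(4 * hourMs, 3 * hourMs, OptionalLong.of(0L), windowMs));
    }
}
```

In the flapping scenario above, the downtime-only check sees one hour of downtime and keeps hinting, while the earliest-hint check sees a four hour span and stops.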

I'll have a look at these in the next few days.

{quote}There is a deleteAllHintsForEndpoint JMX target that would let you purge 
hints (manually), but it does perhaps seem like a missing feature that we don't 
more aggressively clean up hints that are expired.
{quote}
[~VincentWhite] was looking at this recently while trying to solve a problem 
where hint replay would get stuck, replaying a single hint over and over 
because it kept failing on the receiving side (usually due to the receiving 
node being overloaded and never acknowledging the hint in time). I recall that 
he also had problems because we have no service to clean up hints in this case. 
He's on leave this week but would probably have some input around this.

> Cassandra does not seem to be respecting max hint window
> --------------------------------------------------------
>
>                 Key: CASSANDRA-14215
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14215
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hints, Streaming and Messaging
>            Reporter: Arijit
>            Priority: Major
>
> On Cassandra 3.0.9, it was observed that Cassandra continues to write hints 
> even though a node remains down (and does not come up) for longer than the 
> default 3 hour window.
>  
> After doing "nodetool setlogginglevel org.apache.cassandra TRACE", we see the 
> following log line in cassandra (debug) logs:
>  StorageProxy.java:2625 - Adding hints for [/10.0.100.84]
>  
> One possible code path seems to be:
> cas -> commitPaxos(proposal, consistencyForCommit, true); -> submitHint (in 
> StorageProxy.java)
>  
> The "true" parameter above explicitly states that a hint should be recorded 
> and ignores the time window calculation performed by the shouldHint method 
> invoked in other code paths. Is there a reason for this behavior?
>  
> Edit: There are actually two stacks that seem to be producing hints, the 
> "cas" and "syncWriteBatchedMutations" methods. I have posted them below.
>  
> A third issue seems to be that Cassandra seems to reset the timer which 
> counts how long a node has been down after a restart. Thus if Cassandra is 
> restarted on a good node, it continues to accumulate hints for a down node 
> over the next three hours.
>  
> WARN [SharedPool-Worker-14] 2018-02-06 22:15:51,136 StorageProxy.java:2636 - 
> Adding hints for [/10.0.100.84] with stack trace: java.lang.Throwable: at 
> org.apache.cassandra.service.StorageProxy.stackTrace(StorageProxy.java:2608) 
> at 
> org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:2617) 
> at 
> org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:2603) 
> at 
> org.apache.cassandra.service.StorageProxy.commitPaxos(StorageProxy.java:540) 
> at org.apache.cassandra.service.StorageProxy.cas(StorageProxy.java:282) at 
> org.apache.cassandra.cql3.statements.ModificationStatement.executeWithCondition(ModificationStatement.java:432)
>  at 
> org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:407)
>  at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:206)
>  at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:237) 
> at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:222) 
> at 
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:115)
>  at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513)
>  at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407)
>  at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
>  at 
> io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>  at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) at 
> java.lang.Thread.run(Thread.java:748)
>  
>  
> WARN [SharedPool-Worker-8] 2018-02-06 22:15:51,153 StorageProxy.java:2636 - Adding 
> hints for [/10.0.100.84] with stack trace: java.lang.Throwable: at 
> org.apache.cassandra.service.StorageProxy.stackTrace(StorageProxy.java:2608) 
> at 
> org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:2617) 
> at 
> org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:1247)
>  at 
> org.apache.cassandra.service.StorageProxy.syncWriteBatchedMutations(StorageProxy.java:1014)
>  at 
> org.apache.cassandra.service.StorageProxy.mutateAtomically(StorageProxy.java:899)
>  at 
> org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:834)
>  at 
> org.apache.cassandra.cql3.statements.BatchStatement.executeWithoutConditions(BatchStatement.java:365)
>  at 
> org.apache.cassandra.cql3.statements.BatchStatement.execute(BatchStatement.java:343)
>  at 
> org.apache.cassandra.cql3.statements.BatchStatement.execute(BatchStatement.java:329)
>  at 
> org.apache.cassandra.cql3.statements.BatchStatement.execute(BatchStatement.java:324)
>  at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:206)
>  at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:237) 
> at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:222) 
> at 
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:115)
>  at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513)
>  at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407)
>  at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>  at 
> io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
>  at 
> io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>  at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) at 
> java.lang.Thread.run(Thread.java:748)
>  
>  


