[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15042198#comment-15042198
 ] 

Ariel Weisberg commented on CASSANDRA-10477:
--------------------------------------------

bq. Why isn't the added assertion in WriteCallbackInfo on 3.0 not using 
!shouldHint lie in the 2.1 patch?
This turns out to be because shouldHint() has additional stuff that the 
assertion doesn't want. The assertion doesn't care if hints are disabled along 
with several of the other things that are added.

I think I managed to shuffle everything correctly. Going to let the tests run 
and prognosticate on how I want to test OE. I grepped the dtests and unit tests 
for OverloadedException and didn't get a single hit!

|[2.1 
code|https://github.com/apache/cassandra/compare/cassandra-2.1...aweisberg:CASSANDRA-10477-test]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-dtest/]|
|[3.0 
code|https://github.com/apache/cassandra/compare/cassandra-3.0...aweisberg:CASSANDRA-10477-3.0]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-dtest/]|
|[Trunk 
code|https://github.com/apache/cassandra/compare/trunk...aweisberg:CASSANDRA-10477-trunk]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-trunk-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-trunk-dtest/]|

> java.lang.AssertionError in StorageProxy.submitHint
> ---------------------------------------------------
>
>                 Key: CASSANDRA-10477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10477
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>         Environment: CentOS 6, Oracle JVM 1.8.45
>            Reporter: Severin Leonhardt
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> A few days after updating from 2.0.15 to 2.1.9 we have the following log 
> entry on 2 of 5 machines:
> {noformat}
> ERROR [EXPIRING-MAP-REAPER:1] 2015-10-07 17:01:08,041 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[EXPIRING-MAP-REAPER:1,5,main]
> java.lang.AssertionError: /192.168.11.88
>         at 
> org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:949) 
> ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at 
> org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:383) 
> ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at 
> org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:363) 
> ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at org.apache.cassandra.utils.ExpiringMap$1.run(ExpiringMap.java:98) 
> ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
>  ~[apache-cassandra-2.1.9.jar:2.1.9]
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_45]
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
> [na:1.8.0_45]
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  [na:1.8.0_45]
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  [na:1.8.0_45]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_45]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_45]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> {noformat}
> 192.168.11.88 is the broadcast address of the local machine.
> When this is logged the read request latency of the whole cluster becomes 
> very bad, from 6 ms/op to more than 100 ms/op according to OpsCenter. Clients 
> get a lot of timeouts. We need to restart the affected Cassandra node to get 
> back normal read latencies. It seems write latency is not affected.
> Disabling hinted handoff using {{nodetool disablehandoff}} only prevents the 
> assert from being logged. At some point the read latency becomes bad again. 
> Restarting the node where hinted handoff was disabled results in the read 
> latency being better again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to