[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094149#comment-15094149 ]

Ariel Weisberg commented on CASSANDRA-10477:
--------------------------------------------

Fixed 3.0 compilation issue. Added 2.2 branch and updated test matrix.

|[2.1 code|https://github.com/apache/cassandra/compare/cassandra-2.1...aweisberg:CASSANDRA-10477-test]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-dtest/]|
|[2.2 code|https://github.com/apache/cassandra/compare/cassandra-2.2...aweisberg:CASSANDRA-10477-2.2]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-2.2-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-2.2-dtest/]|
|[3.0 code|https://github.com/apache/cassandra/compare/cassandra-3.0...aweisberg:CASSANDRA-10477-3.0]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-dtest/]|
|[3.3 code|https://github.com/apache/cassandra/compare/cassandra-3.3...aweisberg:CASSANDRA-10477-3.3]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.3-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.3-dtest/]|
|[Trunk code|https://github.com/apache/cassandra/compare/trunk...aweisberg:CASSANDRA-10477-trunk]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-trunk-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-trunk-dtest/]|

> java.lang.AssertionError in StorageProxy.submitHint
> ---------------------------------------------------
>
>                 Key: CASSANDRA-10477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10477
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local Write-Read Paths
>         Environment: CentOS 6, Oracle JVM 1.8.45
>            Reporter: Severin Leonhardt
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> A few days after updating from 2.0.15 to 2.1.9 we have the following log entry on 2 of 5 machines:
> {noformat}
> ERROR [EXPIRING-MAP-REAPER:1] 2015-10-07 17:01:08,041 CassandraDaemon.java:223 - Exception in thread Thread[EXPIRING-MAP-REAPER:1,5,main]
> java.lang.AssertionError: /192.168.11.88
>     at org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:949) ~[apache-cassandra-2.1.9.jar:2.1.9]
>     at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:383) ~[apache-cassandra-2.1.9.jar:2.1.9]
>     at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:363) ~[apache-cassandra-2.1.9.jar:2.1.9]
>     at org.apache.cassandra.utils.ExpiringMap$1.run(ExpiringMap.java:98) ~[apache-cassandra-2.1.9.jar:2.1.9]
>     at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) ~[apache-cassandra-2.1.9.jar:2.1.9]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45]
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_45]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_45]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_45]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> {noformat}
> 192.168.11.88 is the broadcast address of the local machine.
> When this is logged the read request latency of the whole cluster becomes very bad, from 6 ms/op to more than 100 ms/op according to OpsCenter. Clients get a lot of timeouts. We need to restart the affected Cassandra node to get back normal read latencies. It seems write latency is not affected.
> Disabling hinted handoff using {{nodetool disablehandoff}} only prevents the assert from being logged. At some point the read latency becomes bad again. Restarting the node where hinted handoff was disabled results in the read latency being better again.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
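The bare `/192.168.11.88` in the reported trace is consistent with how Java formats the detail of an `AssertionError`: the detail object is converted with `String.valueOf`, and an `InetAddress` without a resolved host name prints as an empty host part followed by `/` and the literal IP. The following is a minimal sketch of that behaviour, assuming the failing check compares the hint target with the node's own broadcast address. The class `HintTargetCheck` and the methods `literal`, `checkHintTarget` and `failureMessage` are hypothetical names used for illustration only; this is not Cassandra's code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HintTargetCheck {
    // Hypothetical helper: builds an IPv4 address from its four octets without a DNS lookup.
    public static InetAddress literal(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            // getByAddress only throws for arrays of illegal length, which cannot happen here.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical stand-in for a guard that refuses to hint to the local node.
    // It behaves like `assert !target.equals(localBroadcast) : target;` run with -ea,
    // but throws explicitly so the sketch does not depend on JVM flags.
    public static void checkHintTarget(InetAddress target, InetAddress localBroadcast) {
        if (target.equals(localBroadcast)) {
            throw new AssertionError(target);
        }
    }

    // Returns the AssertionError message, or null when the check passes.
    public static String failureMessage(InetAddress target, InetAddress localBroadcast) {
        try {
            checkHintTarget(target, localBroadcast);
            return null;
        } catch (AssertionError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        InetAddress local = literal(192, 168, 11, 88);
        InetAddress remote = literal(192, 168, 11, 89);
        System.out.println(failureMessage(local, local));   // prints "/192.168.11.88"
        System.out.println(failureMessage(remote, local));  // prints "null"
    }
}
```

Read this way, the log line says only that the hint target was the node's own address; it does not by itself show which code path produced that target.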
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094166#comment-15094166 ]

Jacques-Henri Berthemet commented on CASSANDRA-10477:
-----------------------------------------------------

Great, thank you.
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093874#comment-15093874 ]

Sylvain Lebresne commented on CASSANDRA-10477:
----------------------------------------------

[~aweisberg] I think you have a bad merge on 3.0 (though strangely the 3.3 and trunk branches seem fine), the test run failed at compilation time.

bq. Will it be fixed in 2.2.x too?

It will.
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093552#comment-15093552 ]

Jacques-Henri Berthemet commented on CASSANDRA-10477:
-----------------------------------------------------

Will it be fixed in 2.2.x too?
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092336#comment-15092336 ]

Ariel Weisberg commented on CASSANDRA-10477:
--------------------------------------------

Rebased, updated commit message, updated test matrix, started tests.

|[2.1 code|https://github.com/apache/cassandra/compare/cassandra-2.1...aweisberg:CASSANDRA-10477-test]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-dtest/]|
|[3.0 code|https://github.com/apache/cassandra/compare/cassandra-3.0...aweisberg:CASSANDRA-10477-3.0]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-dtest/]|
|[3.3 code|https://github.com/apache/cassandra/compare/cassandra-3.3...aweisberg:CASSANDRA-10477-3.3]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.3-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.3-dtest/]|
|[Trunk code|https://github.com/apache/cassandra/compare/trunk...aweisberg:CASSANDRA-10477-trunk]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-trunk-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-trunk-dtest/]|
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092221#comment-15092221 ]

Ariel Weisberg commented on CASSANDRA-10477:
--------------------------------------------

I agree the assertion should just be on the address. I already made the change back in December; I just need to get the tests done.
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092200#comment-15092200 ]

Sylvain Lebresne commented on CASSANDRA-10477:
----------------------------------------------

Can you also answer my comments on the assertion?
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092170#comment-15092170 ]

Ariel Weisberg commented on CASSANDRA-10477:
--------------------------------------------

The tests are passing, but enough time has passed that I should rebase and test again. Will do that today.
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091802#comment-15091802 ]

Sylvain Lebresne commented on CASSANDRA-10477:
----------------------------------------------

[~aweisberg] the ball is in your court I believe.
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091779#comment-15091779 ]

Jacques-Henri Berthemet commented on CASSANDRA-10477:
-----------------------------------------------------

I'm having the same problem on 2.2.3:
{code}
10:55:54.203 [ERROR] CassandraDaemon- Exception in thread Thread[EXPIRING-MAP-REAPER:1,5,UCS-Threads]
java.lang.AssertionError: /
    at org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:978)
    at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:399)
    at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:379)
    at org.apache.cassandra.utils.ExpiringMap$1.run(ExpiringMap.java:98)
    at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
What is the status of this issue? Can I expect this to be fixed in the 2.2.x branch?
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044588#comment-15044588 ]

Sylvain Lebresne commented on CASSANDRA-10477:

bq. The assertion doesn't care if hints are disabled along with several of the other things that are added.

First, I still don't understand why it's not consistent between 2.1 and 3.0. As far as I can tell, the {{WriteCallbackInfo.shouldHint()}} method mostly calls {{StorageProxy.shouldHint()}}, which does pretty much the same thing in both versions. Second, I'd argue the assertion _must_ use {{!shouldHint()}} because what we're trying to assert is that {{submitHint}} is never called for localhost on the expiration of a callback, and that depends on the result of {{shouldHint()}}. That said, I think it would almost be better to have the assertion just be {{!target.equals(FBUtilities.getBroadcastAddress())}} as we're basically saying a local write should always use the specific local path, not {{MessagingService}}. In any case, I think the assertion is worth a quick comment to explain why we're asserting that here.

The rest of the changes lgtm, but the unit tests on 3.0 don't seem to have run due to some problem with an {{@Override}}.

bq. and prognosticate on how I want to test OE

The lack of coverage of OE is certainly something we should fix (it's not trivial though), but I would suggest not blocking this fix on that since it's not directly related (meaning, we should probably open a separate ticket for it).
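To illustrate the simpler form of the assertion suggested above, here is a minimal, self-contained sketch. It is not the patch under review: {{HintTargetCheck}}, {{checkHintTarget}} and {{ipv4}} are hypothetical names, and the local broadcast address is passed in explicitly rather than read from {{FBUtilities}}.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative stand-in only: none of these names are Cassandra's real classes.
final class HintTargetCheck {
    private final InetAddress localBroadcastAddress;

    HintTargetCheck(InetAddress localBroadcastAddress) {
        this.localBroadcastAddress = localBroadcastAddress;
    }

    // Writes to the local node are expected to take the dedicated local path,
    // so a hint addressed to this node means a local write leaked into the
    // messaging path. Thrown explicitly so the check does not depend on -ea.
    void checkHintTarget(InetAddress target) {
        if (target.equals(localBroadcastAddress))
            throw new AssertionError(target);
    }

    // Helper (assumption for this sketch) that builds an IPv4 address without a DNS lookup.
    static InetAddress ipv4(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(new byte[] { (byte) a, (byte) b, (byte) c, (byte) d });
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

The point of comparing against the local address only, rather than reusing {{shouldHint()}}, is that the check then holds regardless of whether hints are enabled.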
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15042198#comment-15042198 ]

Ariel Weisberg commented on CASSANDRA-10477:

bq. Why isn't the added assertion in WriteCallbackInfo on 3.0 using !shouldHint like in the 2.1 patch?

This turns out to be because shouldHint() has additional stuff that the assertion doesn't want. The assertion doesn't care if hints are disabled along with several of the other things that are added.

I think I managed to shuffle everything correctly. Going to let the tests run and prognosticate on how I want to test OE. I grepped the dtests and unit tests for OverloadedException and didn't get a single hit!

|[2.1 code|https://github.com/apache/cassandra/compare/cassandra-2.1...aweisberg:CASSANDRA-10477-test]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-dtest/]|
|[3.0 code|https://github.com/apache/cassandra/compare/cassandra-3.0...aweisberg:CASSANDRA-10477-3.0]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-dtest/]|
|[Trunk code|https://github.com/apache/cassandra/compare/trunk...aweisberg:CASSANDRA-10477-trunk]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-trunk-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-trunk-dtest/]|
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041862#comment-15041862 ]

Sylvain Lebresne commented on CASSANDRA-10477:

bq. Which aspect of hint "overload" protection is missing? I see it increments a counter which I thought was the signal upstream.

This is about who is looking at said counter (to do something about it if it's too high). The normal write path is, and so incrementing the counter in CAS will potentially apply back-pressure on normal writes, but not on CAS requests themselves.

bq. Looking at it further is it because it doesn't throw OverloadedException? So a better behavior would be to have the check and exception in a helper method and use that in commitPaxos() so that it can now throw OverloadedException?

Exactly.

bq. I do wonder what the unforeseen consequences of having CAS capable of throwing OE is going to do that we haven't seen or tested before.

It's a good question, and to be honest I'm not sure we have any test that covers {{OverloadedException}} at all (but I could be wrong). But in general, the commit part of Paxos is not very sensitive: worst case, if not enough replicas get the commit, the next serial operation (including a read) on the partition will re-commit. So the main question is whether potentially throwing {{OverloadedException}} would surprise people. I would argue it shouldn't because normal writes can do so and we never specified it was any different for CAS. That said, if we're uncomfortable with it, I'm totally fine committing that part of the change only in 3.2 (aka trunk currently).

bq. the read path now throws OE where it didn't before

Right. That's probably more justification for keeping that part in 3.2 only.
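The helper-method idea discussed above could look roughly like the following sketch. Everything here is an assumption made for illustration: the class name, the threshold, the string destinations, and the unchecked {{OverloadedSketchException}} standing in for {{OverloadedException}}. The only point it tries to show is that both the normal write path and the Paxos commit path consult the same counter, so both can be rejected when too many hints are in flight.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: names and the threshold are hypothetical, not Cassandra's code.
final class HintBackPressure {
    // Unchecked stand-in for the OverloadedException discussed in the ticket.
    static final class OverloadedSketchException extends RuntimeException {
        OverloadedSketchException(String message) {
            super(message);
        }
    }

    private final AtomicInteger hintsInProgress = new AtomicInteger();
    private final int maxHintsInProgress;

    HintBackPressure(int maxHintsInProgress) {
        this.maxHintsInProgress = maxHintsInProgress;
    }

    void hintStarted()  { hintsInProgress.incrementAndGet(); }
    void hintFinished() { hintsInProgress.decrementAndGet(); }

    // The single place where the counter is inspected.
    void checkOverloaded(String destination) {
        int inProgress = hintsInProgress.get();
        if (inProgress > maxHintsInProgress)
            throw new OverloadedSketchException(
                "Too many in-flight hints (" + inProgress + ") for " + destination);
    }

    // Normal write path: checks before sending (sending itself is omitted).
    void normalWrite(String destination) {
        checkOverloaded(destination);
    }

    // Paxos commit path: same check, so CAS requests see the same back-pressure.
    void commitPaxos(String destination) {
        checkOverloaded(destination);
    }
}
```

With a shared check, incrementing the counter on the CAS path no longer penalizes only normal writes.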
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041786#comment-15041786 ]

Ariel Weisberg commented on CASSANDRA-10477:

bq. We're kind of dodging the hint "overload" protection on the paxos path as we don't use sendToHintedEndpoints (which in particular makes the comment on commitPaxosLocal misleading as it suggests otherwise). I think the simplest solution is to move the overload test from sendToHintedEndpoints to some checkOverloaded() method and call that in commitPaxos too.

Which aspect of hint "overload" protection is missing? [I see it increments a counter which I thought was the signal upstream.|https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/StorageProxy.java#L976]

Looking at it further is it because it doesn't throw {{OverloadedException}}? So a better behavior would be to have the check and exception in a helper method and use that in commitPaxos() so that it can now throw {{OverloadedException}}?

I do wonder what the unforeseen consequences of having {{CAS}} capable of throwing {{OE}} is going to do that we haven't seen or tested before. Where this gets interesting is that the read path now throws {{OE}} where it didn't before because apparently serial consistency reads can end up calling {{beginAndRepairPaxos}}. I need to take a close look at how we test this path to make sure it's going to behave well once exercised.

bq. In theory, we could still run into the problem of that ticket if OPTIMIZE_LOCAL_REQUESTS is false. And in fact, I believe this option is unsafe since at least CASSANDRA-4753 as we somewhat strongly assume writes to the localhost do not go through MessagingService. So I would suggest ditching that option. Not only is it unsafe, but it's not used anywhere by the code and it's hardcoded so you have to change the code and recompile to even use it (which means I doubt anyone has even tried it in a long long time). And if we end up needing it in the future, we'll have to figure out how to make it safe.

It's already removed from 2.2. Yeah I don't think anyone uses it.

bq. Why isn't the added assertion in WriteCallbackInfo on 3.0 using !shouldHint like in the 2.1 patch?

It's an oversight from merging.
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041410#comment-15041410 ]

Sylvain Lebresne commented on CASSANDRA-10477:

* The failure detector will never return false for the local host, so the changes in the 2nd branch of commitPaxos are unnecessary.
* We're kind of dodging the hint "overload" protection on the paxos path as we don't use {{sendToHintedEndpoints}} (which in particular makes the comment on {{commitPaxosLocal}} misleading as it suggests otherwise). I think the simplest solution is to move the overload test from {{sendToHintedEndpoints}} to some {{checkOverloaded()}} method and call that in {{commitPaxos}} too.
* Instead of adding the {{droppable()}} method to {{LocalMutationRunnable}}, we should probably use {{MessagingService.DROPPABLE_VERBS.contains(verb)}}.
* In theory, we could still run into the problem of that ticket if {{OPTIMIZE_LOCAL_REQUESTS}} is {{false}}. And in fact, I believe this option is unsafe since at least CASSANDRA-4753 as we somewhat strongly assume writes to the localhost do *not* go through {{MessagingService}}. So I would suggest ditching that option. Not only is it unsafe, but it's not used anywhere by the code and it's hardcoded so you have to change the code and recompile to even use it (which means I doubt anyone has even tried it in a long long time). And if we end up needing it in the future, we'll have to figure out how to make it safe.
* Why isn't the added assertion in {{WriteCallbackInfo}} on 3.0 using {{!shouldHint}} like in the 2.1 patch?
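As a rough illustration of the third bullet above (a shared set lookup instead of a per-runnable {{droppable()}} method), here is a small sketch. The enum values, the set membership, and the {{shouldDrop}} helper are all assumptions chosen for the example; they do not reflect which verbs Cassandra actually treats as droppable.

```java
import java.util.EnumSet;
import java.util.Set;

// Sketch only: the verbs and the set membership below are illustrative.
final class DroppableVerbs {
    enum Verb { MUTATION, READ, GOSSIP }

    // One shared definition of which verbs may be dropped once they time out.
    static final Set<Verb> DROPPABLE_VERBS = EnumSet.of(Verb.MUTATION, Verb.READ);

    // A queued task is dropped only if its verb is droppable and it has
    // waited longer than the timeout for that kind of request.
    static boolean shouldDrop(Verb verb, long queuedMillis, long timeoutMillis) {
        return DROPPABLE_VERBS.contains(verb) && queuedMillis > timeoutMillis;
    }
}
```

Keeping the decision in one set means a local runnable and a remote message for the same verb cannot disagree about whether they are droppable.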
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012348#comment-15012348 ]

Ariel Weisberg commented on CASSANDRA-10477:

Proposed fix

|[2.1 code|https://github.com/apache/cassandra/compare/cassandra-2.1...aweisberg:CASSANDRA-10477-test]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-dtest/]|
|[3.0 code|https://github.com/apache/cassandra/compare/cassandra-3.0...aweisberg:CASSANDRA-10477-3.0]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-dtest/]|
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011675#comment-15011675 ]

Blake Eggleston commented on CASSANDRA-10477:

It seems like adding a paxos commit equivalent of StorageProxy.insertLocal, and submitting local commits that way, would be the safest thing to do here. In theory, you should be able to add a check against the local address to StorageProxy.shouldHint and just drop the commit message if the node is overloaded; it should get back up to speed on the next paxos round. However, there may be subtleties and edge cases that I'm not thinking of, so I don't want to recommend that without giving this more thought.
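The first option in the comment above, routing the local replica through a dedicated local path instead of the messaging layer, can be sketched as follows. This is a simplified model under stated assumptions: {{CommitRouter}} and its fields are hypothetical, replicas are plain strings, and the two lists merely record which path each replica took.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a toy model of choosing between a local and a remote commit path.
final class CommitRouter {
    // Replicas whose commit was applied directly on this node.
    final List<String> appliedLocally = new ArrayList<>();
    // Replicas whose commit went through messaging with a callback that may
    // later turn into a hint if the callback expires.
    final List<String> sentWithHintableCallback = new ArrayList<>();

    private final String localAddress;

    CommitRouter(String localAddress) {
        this.localAddress = localAddress;
    }

    void commit(List<String> replicas) {
        for (String replica : replicas) {
            if (replica.equals(localAddress))
                appliedLocally.add(replica);            // no messaging callback, so nothing to hint on expiry
            else
                sentWithHintableCallback.add(replica);  // remote path keeps its hintable callback
        }
    }
}
```

Because the local address never receives a hintable callback in this model, the expiring-callback path cannot end up submitting a hint for the local node.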
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011402#comment-15011402 ]

Ariel Weisberg commented on CASSANDRA-10477:

[~bdeggleston] [~slebresne] can you chime in on whether I am on the right track here? Should {{[StorageProxy.commitPaxos|https://github.com/apache/cassandra/blob/cassandra-2.1.11/src/java/org/apache/cassandra/service/StorageProxy.java#L494]}} not be sending messages to the local node that are eligible for hinting on timeout?
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011365#comment-15011365 ]

Ariel Weisberg commented on CASSANDRA-10477:

Good news is that I am at least partially correct and PAXOS is heading down the road to submitting hints for the local node.

[New failing utests from this assertion|https://github.com/apache/cassandra/compare/cassandra-2.1...aweisberg:CASSANDRA-10477-test?expand=1#diff-5e7d892105f1fa0706dbedf919b5dd99L46]
http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-testall/1/testReport/junit/org.apache.cassandra.triggers/TriggersTest/executeTriggerOnCqlInsertWithConditions/
http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-testall/1/testReport/junit/org.apache.cassandra.triggers/TriggersTest/executeTriggerOnCqlBatchWithConditions/
http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-testall/1/testReport/junit/org.apache.cassandra.triggers/TriggersTest/executeTriggerOnThriftCASOperation/

Also several [failing dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-dtest/1/#showFailuresLink]

I'll try getting the PAXOS code to do something similar to the insertLocal where it doesn't submit a real hint.
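The insertLocal-style idea mentioned in the comment above (handle the local replica directly and keep hint-eligible messaging for remote replicas only) can be sketched as follows. This is a hedged illustration, not the actual patch: the class, both methods, and the returned labels are hypothetical, and addresses are plain strings to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of routing a commit per replica; not Cassandra code.
final class PaxosCommitRoutingSketch {

    // Splits replicas into local (key true) and remote (key false).
    static Map<Boolean, List<String>> splitByLocality(List<String> replicas, String localAddress) {
        return replicas.stream()
                       .collect(Collectors.partitioningBy(r -> r.equals(localAddress)));
    }

    // Describes how a replica would be handled in this sketch: the local node is
    // applied directly and never hinted, remote nodes get a hint-eligible message.
    static String route(String replica, String localAddress) {
        return replica.equals(localAddress)
               ? "apply-locally-no-hint"
               : "send-message-hint-on-timeout";
    }

    public static void main(String[] args) {
        String local = "192.168.11.88"; // address taken from the report
        // The two neighbor addresses are made up for illustration.
        List<String> replicas = Arrays.asList(local, "192.168.11.89", "192.168.11.90");
        System.out.println(splitByLocality(replicas, local));
        for (String replica : replicas) {
            System.out.println(replica + " -> " + route(replica, local));
        }
    }
}
```

The point of the split is that the local replica never enters the messaging callback map, so a local timeout cannot reach the hint path at all.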
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009725#comment-15009725 ]

Ariel Weisberg commented on CASSANDRA-10477:

Theory time. [There is a path that tasks going through the local hint process for inserts are supposed to use.|https://github.com/apache/cassandra/blob/cassandra-2.1.9/src/java/org/apache/cassandra/service/StorageProxy.java#L1027] Since we have a case where an insert does not go down this path, it implies that one of the other call sites for inserts is incorrect and is going through the remote messaging service path. It only happens when the node is overloaded and local inserts start timing out. The reason you don't normally see it is that local inserts probably don't time out most of the time.

One thing you could do is increase the mutation timeouts to see if you can get past the low performance period without timing out and hitting this. However, I think that the assertion is a symptom of a different problem and not the cause of the performance/availability issues. It's the canary in the coal mine letting you know this broken path is being taken due to timeouts of local mutations.

I think the thing to do is search the call hierarchy of {{[StorageProxy.submitHint|https://github.com/apache/cassandra/blob/cassandra-2.1.9/src/java/org/apache/cassandra/service/StorageProxy.java#L944]}} to find a path where it can be reached when timing out a local write. We know it's coming through MessagingService in this instance, which makes it a little trickier because the type of the callback isn't known. It looks like PAXOS might in some cases go down this path incorrectly. I am going to try running a few things locally with some assertions to see if I can get it to send a message with hint delivery to itself.
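To make the theory above concrete, here is a minimal sketch of how a timed-out, hint-eligible callback whose target is the local node would trip an assertion of this kind. It is an illustration under assumptions, not the real StorageProxy or MessagingService code: the class, both methods, and the returned strings are hypothetical, and only the local address is taken from the report.

```java
// Hypothetical model of the failure path; not Cassandra code.
final class TimeoutHintSketch {

    static final String LOCAL = "192.168.11.88"; // broadcast address from the report

    // Mirrors the idea behind the failing assertion: a hint must never target the local node.
    static String submitHint(String target) {
        if (target.equals(LOCAL)) {
            throw new AssertionError("/" + target);
        }
        return "hint stored for " + target;
    }

    // Stand-in for what happens when a pending callback expires: if the original
    // message was registered as hint-eligible, a hint is submitted for its target.
    static String onCallbackExpired(String target, boolean hintEligible) {
        return hintEligible ? submitHint(target) : "dropped";
    }

    public static void main(String[] args) {
        // A remote write that times out is hinted as intended (made-up neighbor address).
        System.out.println(onCallbackExpired("192.168.11.89", true));
        // A local write sent through the same path fails once it times out.
        try {
            onCallbackExpired(LOCAL, true);
        } catch (AssertionError e) {
            System.out.println("AssertionError: " + e.getMessage());
        }
    }
}
```

In this model the error only appears when the local write actually times out, which matches the observation that it shows up on overloaded nodes and not under normal load.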
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009335#comment-15009335 ]

Ariel Weisberg commented on CASSANDRA-10477:

Can you send me the yaml files you are using on each node?
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007923#comment-15007923 ]

Hao Bryan Cheng commented on CASSANDRA-10477:
---------------------------------------------

Just observed this issue again. The node was undergoing anticompaction when it occurred, and it once again brought the ring to a halt. I couldn't get all the required information due to the urgency of the situation, but did confirm that nodetool status (run on another node) reported the node as up with no issue.

I have fresh logs to offer out-of-band to anyone who is investigating this issue; feel free to email or ping here.
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1504#comment-1504 ]

Hao Bryan Cheng commented on CASSANDRA-10477:
---------------------------------------------

A few additional details:

Unfortunately, I didn't get any data while the issue was happening. Afterwards, netstat, nodetool status, etc. were all nominal.

During the period of time when this node was experiencing difficulty, no other nodes reported any unhealthy hosts. However, we do have our phi convict threshold tuned up from 8 to 10, due to running on AWS.

This event was localized to one node out of 12. Keyspace RF ranges from 3-5. Queries at LOCAL_QUORUM were timing out with insufficient responses.
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1462#comment-1462 ]

Hao Bryan Cheng commented on CASSANDRA-10477:
---------------------------------------------

Hello, we just observed this on a cluster running 2.1.11, Oracle Java 1.8.0_66. A single machine experienced this issue, causing our entire cluster to grind to a halt on any quorum operations.

Our logs feature an extremely large number of:
{code}
ERROR [EXPIRING-MAP-REAPER:1] 2015-11-11 05:10:22,894 CassandraDaemon.java:227 - Exception in thread Thread[EXPIRING-MAP-REAPER:1,5,main]
java.lang.AssertionError: /172.31.3.33
        at org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:949) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:383) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:363) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.utils.ExpiringMap$1.run(ExpiringMap.java:98) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) ~[apache-cassandra-2.1.11.jar:2.1.11]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_66]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_66]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_66]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_66]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_66]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_66]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
{code}

Additionally, this is interspersed with nearly every other neighbor node being marked down:
{code}
INFO  [GossipStage:1] 2015-11-11 04:14:25,369 Gossiper.java:1020 - Node /172.31.55.172 has restarted, now UP
INFO  [GossipStage:1] 2015-11-11 04:14:25,369 TokenMetadata.java:414 - Updating topology for /172.31.55.172
INFO  [GossipStage:1] 2015-11-11 04:14:25,369 TokenMetadata.java:414 - Updating topology for /172.31.55.172
INFO  [GossipStage:1] 2015-11-11 04:14:25,370 StorageService.java:1698 - Node /172.31.55.172 state jump to normal
INFO  [GossipStage:1] 2015-11-11 04:14:25,372 TokenMetadata.java:414 - Updating topology for /172.31.55.172
INFO  [GossipStage:1] 2015-11-11 04:14:25,372 TokenMetadata.java:414 - Updating topology for /172.31.55.172
INFO  [SharedPool-Worker-3] 2015-11-11 04:14:25,531 Gossiper.java:987 - InetAddress /172.31.55.172 is now UP
INFO  [SharedPool-Worker-5] 2015-11-11 04:14:25,536 Gossiper.java:987 - InetAddress /172.31.55.172 is now UP
INFO  [SharedPool-Worker-3] 2015-11-11 04:14:25,536 Gossiper.java:987 - InetAddress /172.31.55.172 is now UP
INFO  [SharedPool-Worker-1] 2015-11-11 04:14:25,536 Gossiper.java:987 - InetAddress /172.31.55.172 is now UP
INFO  [SharedPool-Worker-4] 2015-11-11 04:14:25,536 Gossiper.java:987 - InetAddress /172.31.55.172 is now UP
INFO  [HANDSHAKE-/172.31.55.172] 2015-11-11 04:14:25,537 OutboundTcpConnection.java:485 - Handshaking version with /172.31.55.172
[snipped]
WARN  [GossipTasks:1] 2015-11-11 04:18:26,379 Gossiper.java:747 - Gossip stage has 15 pending tasks; skipping status check (no nodes will be marked down)
WARN  [GossipTasks:1] 2015-11-11 04:18:27,480 Gossiper.java:747 - Gossip stage has 17 pending tasks; skipping status check (no nodes will be marked down)
WARN  [GossipTasks:1] 2015-11-11 04:18:28,580 Gossiper.java:747 - Gossip stage has 19 pending tasks; skipping status check (no nodes will be marked down)
WARN  [GossipTasks:1] 2015-11-11 04:18:29,681 Gossiper.java:747 - Gossip stage has 21 pending tasks; skipping status check (no nodes will be marked down)
WARN  [GossipTasks:1] 2015-11-11 04:18:30,781 Gossiper.java:747 - Gossip stage has 25 pending tasks; skipping status check (no nodes will be marked down)
...
{code}

No other nodes were restarted in this time frame.

Please let us know if there is any additional information we can provide.
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999379#comment-14999379 ]

Ariel Weisberg commented on CASSANDRA-10477:

This assertion makes it look like the node either thinks its broadcast address is that of another node, or it is connecting to itself, which causes it to submit hints to itself; that is what the assertion checks for. Neither condition should occur.

If you could also get me the output of "netstat -tlnp" along with nodetool status, ring, and netstats when the problem occurs, that would be helpful. You could do it now just to see if something shows up, but definitely when the problem occurs.
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999369#comment-14999369 ]

Ariel Weisberg commented on CASSANDRA-10477:

[~leonhardt] can you email me the log file to the address in my profile?
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953262#comment-14953262 ]

Philip Thompson commented on CASSANDRA-10477:

Once I find someone to work on this, they'll share their contact info with you. Thanks.
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953112#comment-14953112 ]

Severin Leonhardt commented on CASSANDRA-10477:

[~philipthompson] I can't make the logfile publicly accessible but I can share it by mail. Let me know to whom I should send it.
[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint
[ https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950826#comment-14950826 ]

Philip Thompson commented on CASSANDRA-10477:

[~iamaleksey], who should be assigned to this?
[~leonhardt], can you attach the system.log file from one of the affected nodes?