[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2016-01-12 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094149#comment-15094149
 ] 

Ariel Weisberg commented on CASSANDRA-10477:


Fixed 3.0 compilation issue. Added 2.2 branch and updated test matrix.
|[2.1 code|https://github.com/apache/cassandra/compare/cassandra-2.1...aweisberg:CASSANDRA-10477-test]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-dtest/]|
|[2.2 code|https://github.com/apache/cassandra/compare/cassandra-2.2...aweisberg:CASSANDRA-10477-2.2]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-2.2-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-2.2-dtest/]|
|[3.0 code|https://github.com/apache/cassandra/compare/cassandra-3.0...aweisberg:CASSANDRA-10477-3.0]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-dtest/]|
|[3.3 code|https://github.com/apache/cassandra/compare/cassandra-3.3...aweisberg:CASSANDRA-10477-3.3]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.3-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.3-dtest/]|
|[Trunk code|https://github.com/apache/cassandra/compare/trunk...aweisberg:CASSANDRA-10477-trunk]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-trunk-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-trunk-dtest/]|

> java.lang.AssertionError in StorageProxy.submitHint
> ---
>
> Key: CASSANDRA-10477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10477
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Environment: CentOS 6, Oracle JVM 1.8.45
> Reporter: Severin Leonhardt
> Assignee: Ariel Weisberg
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> A few days after updating from 2.0.15 to 2.1.9 we have the following log 
> entry on 2 of 5 machines:
> {noformat}
> ERROR [EXPIRING-MAP-REAPER:1] 2015-10-07 17:01:08,041 CassandraDaemon.java:223 - Exception in thread Thread[EXPIRING-MAP-REAPER:1,5,main]
> java.lang.AssertionError: /192.168.11.88
>     at org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:949) ~[apache-cassandra-2.1.9.jar:2.1.9]
>     at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:383) ~[apache-cassandra-2.1.9.jar:2.1.9]
>     at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:363) ~[apache-cassandra-2.1.9.jar:2.1.9]
>     at org.apache.cassandra.utils.ExpiringMap$1.run(ExpiringMap.java:98) ~[apache-cassandra-2.1.9.jar:2.1.9]
>     at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) ~[apache-cassandra-2.1.9.jar:2.1.9]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45]
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_45]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_45]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_45]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> {noformat}
> 192.168.11.88 is the broadcast address of the local machine.
> When this is logged, the read request latency of the whole cluster degrades 
> sharply, from 6 ms/op to more than 100 ms/op according to OpsCenter. Clients 
> get a lot of timeouts. We need to restart the affected Cassandra node to get 
> back to normal read latencies. Write latency does not seem to be affected.
> Disabling hinted handoff using {{nodetool disablehandoff}} only prevents the 
> assertion from being logged. At some point the read latency becomes bad again. 
> Restarting the node where hinted handoff was disabled brings the read 
> latency back down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2016-01-12 Thread Jacques-Henri Berthemet (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094166#comment-15094166
 ] 

Jacques-Henri Berthemet commented on CASSANDRA-10477:
-

Great, thank you.



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2016-01-12 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093874#comment-15093874
 ] 

Sylvain Lebresne commented on CASSANDRA-10477:
--

[~aweisberg] I think you have a bad merge on 3.0 (though, strangely, the 3.3 
and trunk branches seem fine): the test run failed at compilation time.

bq. Will it be fixed in 2.2.x too?

It will.



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2016-01-12 Thread Jacques-Henri Berthemet (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093552#comment-15093552
 ] 

Jacques-Henri Berthemet commented on CASSANDRA-10477:
-

Will it be fixed in 2.2.x too?



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2016-01-11 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092336#comment-15092336
 ] 

Ariel Weisberg commented on CASSANDRA-10477:


Rebased, updated commit message, updated test matrix, started tests.
|[2.1 code|https://github.com/apache/cassandra/compare/cassandra-2.1...aweisberg:CASSANDRA-10477-test]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-dtest/]|
|[3.0 code|https://github.com/apache/cassandra/compare/cassandra-3.0...aweisberg:CASSANDRA-10477-3.0]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-dtest/]|
|[3.3 code|https://github.com/apache/cassandra/compare/cassandra-3.3...aweisberg:CASSANDRA-10477-3.3]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.3-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.3-dtest/]|
|[Trunk code|https://github.com/apache/cassandra/compare/trunk...aweisberg:CASSANDRA-10477-trunk]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-trunk-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-trunk-dtest/]|



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2016-01-11 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092221#comment-15092221
 ] 

Ariel Weisberg commented on CASSANDRA-10477:


I agree the assertion should just be on the address. I already made the change 
back in December; I just need to get the tests done.
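
For readers following the thread, here is a minimal sketch of what an assertion "on the address" can look like, and why the reported log line reads {{java.lang.AssertionError: /192.168.11.88}}. It is not the code from StorageProxy or from the patch: the class {{HintTargetCheck}} and its methods are hypothetical names made up for illustration, and the check is written as an explicit throw so that it behaves the same whether or not the JVM runs with {{-ea}}. The only library behaviour it relies on is that {{AssertionError(Object)}} uses the object's string form as the detail message, and that an {{InetAddress}} with no host name prints as a slash followed by the IP.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration only; this is NOT Cassandra's StorageProxy code.
public class HintTargetCheck {

    // Builds an IPv4 InetAddress without any DNS lookup.
    static InetAddress ipv4(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(
                    new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            // Only thrown for an illegal array length, which cannot happen here.
            throw new IllegalArgumentException(e);
        }
    }

    // Equivalent in spirit to: assert !target.equals(localBroadcast) : target;
    // The failing address becomes the error's detail message.
    static void checkNotLocal(InetAddress target, InetAddress localBroadcast) {
        if (target.equals(localBroadcast)) {
            throw new AssertionError(target);
        }
    }

    // Returns the AssertionError message, or null if the check passes.
    static String failureMessage(InetAddress target, InetAddress localBroadcast) {
        try {
            checkNotLocal(target, localBroadcast);
            return null;
        } catch (AssertionError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        InetAddress local = ipv4(192, 168, 11, 88);
        InetAddress remote = ipv4(192, 168, 11, 89);

        // A hint aimed at the local node trips the check.
        System.out.println(failureMessage(local, local));   // prints "/192.168.11.88"
        // A hint aimed at another node passes.
        System.out.println(failureMessage(remote, local));  // prints "null"
    }
}
```

The leading slash in the message is just {{InetAddress.toString()}} with an empty host name, so the address in the log is the hint target that failed the check, which the reporter identifies as the node's own broadcast address.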



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2016-01-11 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092200#comment-15092200
 ] 

Sylvain Lebresne commented on CASSANDRA-10477:
--

Can you also answer my comments on the assertion?



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2016-01-11 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092170#comment-15092170
 ] 

Ariel Weisberg commented on CASSANDRA-10477:


The tests are passing, but enough time has passed that I should rebase and test 
again. Will do that today.



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2016-01-11 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091802#comment-15091802
 ] 

Sylvain Lebresne commented on CASSANDRA-10477:
--

[~aweisberg] the ball is in your court I believe.



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2016-01-11 Thread Jacques-Henri Berthemet (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091779#comment-15091779
 ] 

Jacques-Henri Berthemet commented on CASSANDRA-10477:
-

I'm having the same problem on 2.2.3:
{code}
10:55:54.203 [ERROR] CassandraDaemon- Exception in thread Thread[EXPIRING-MAP-REAPER:1,5,UCS-Threads] java.lang.AssertionError: /
    at org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:978)
    at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:399)
    at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:379)
    at org.apache.cassandra.utils.ExpiringMap$1.run(ExpiringMap.java:98)
    at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}

What is the status of this issue? Can I expect this to be fixed in the 2.2.x branch?

> java.lang.AssertionError in StorageProxy.submitHint
> ---
>
> Key: CASSANDRA-10477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
> Environment: CentOS 6, Oracle JVM 1.8.45
>Reporter: Severin Leonhardt
>Assignee: Ariel Weisberg
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> A few days after updating from 2.0.15 to 2.1.9 we have the following log 
> entry on 2 of 5 machines:
> {noformat}
> ERROR [EXPIRING-MAP-REAPER:1] 2015-10-07 17:01:08,041 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[EXPIRING-MAP-REAPER:1,5,main]
> java.lang.AssertionError: /192.168.11.88
> at 
> org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:949) 
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at 
> org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:383) 
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at 
> org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:363) 
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at org.apache.cassandra.utils.ExpiringMap$1.run(ExpiringMap.java:98) 
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
>  ~[apache-cassandra-2.1.9.jar:2.1.9]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_45]
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
> [na:1.8.0_45]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  [na:1.8.0_45]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  [na:1.8.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_45]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> {noformat}
> 192.168.11.88 is the broadcast address of the local machine.
> When this is logged the read request latency of the whole cluster becomes 
> very bad, from 6 ms/op to more than 100 ms/op according to OpsCenter. Clients 
> get a lot of timeouts. We need to restart the affected Cassandra node to get 
> back normal read latencies. It seems write latency is not affected.
> Disabling hinted handoff using {{nodetool disablehandoff}} only prevents the 
> assert from being logged. At some point the read latency becomes bad again. 
> Restarting the node where hinted handoff was disabled results in the read 
> latency being better again.





[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-12-07 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15044588#comment-15044588
 ] 

Sylvain Lebresne commented on CASSANDRA-10477:
--

bq. The assertion doesn't care if hints are disabled along with several of the 
other things that are added.

First, I still don't understand why it's not consistent between 2.1 and 3.0. As 
far as I can tell, the {{WriteCallbackInfo.shouldHint()}} method mostly calls 
{{StorageProxy.shouldHint()}}, which does pretty much the same thing in both 
versions.  Second, I'd argue the assertion _must_ use {{!shouldHint()}} because 
what we're trying to assert is that {{submitHint}} is never called for 
localhost on the expiration of a callback, and that depends on the result of 
{{shouldHint()}}. That said, I think it would almost be better to have the 
assertion just be {{!target.equals(FBUtilities.getBroadcastAddress())}} as 
we're basically saying a local write should always use the specific local path, 
not {{MessagingService}}. In any case, I think the assertion is worth a quick 
comment to explain why we're asserting that here.
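
The simpler form of the assertion suggested above can be sketched in isolation. This is only an illustration, not the patch: {{HintTargetCheck}} and {{requireRemoteTarget}} are hypothetical names, {{InetSocketAddress}} stands in for Cassandra's endpoint type and for the value returned by {{FBUtilities.getBroadcastAddress()}}, and the peer address and port are made up. The check throws {{AssertionError}} explicitly so that it runs even when JVM assertions are disabled, which is a choice made for this sketch.

```java
import java.net.InetSocketAddress;

final class HintTargetCheck {
    private HintTargetCheck() {}

    /** True when the hint target is a different node than the local one. */
    static boolean isRemote(InetSocketAddress target, InetSocketAddress local) {
        return !target.equals(local);
    }

    /**
     * Local writes are expected to take the dedicated local path, so a hint
     * should never be submitted for the local address when a callback expires.
     * The error message carries the offending address, as in the reported log.
     */
    static void requireRemoteTarget(InetSocketAddress target, InetSocketAddress local) {
        if (!isRemote(target, local)) {
            throw new AssertionError(String.valueOf(target));
        }
    }

    public static void main(String[] args) {
        // Hypothetical addresses; 192.168.11.88 is the local node in the report.
        InetSocketAddress local = new InetSocketAddress("192.168.11.88", 7000);
        InetSocketAddress peer = new InetSocketAddress("192.168.11.89", 7000);

        requireRemoteTarget(peer, local); // a remote target passes the check
        try {
            requireRemoteTarget(local, local);
        } catch (AssertionError e) {
            System.out.println("rejected hint for local target " + e.getMessage());
        }
    }
}
```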

The rest of the changes lgtm, but the unit tests on 3.0 don't seem to have run 
due to some problem with an {{@Override}}.

bq. and prognosticate on how I want to test OE

The lack of coverage of OE is certainly something we should fix (it's not 
trivial though), but I would suggest not blocking this fix on that since it's 
not directly related (meaning, we should probably open a separate ticket for 
it).




[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-12-04 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15042198#comment-15042198
 ] 

Ariel Weisberg commented on CASSANDRA-10477:


bq. Why isn't the added assertion in WriteCallbackInfo on 3.0 using 
!shouldHint like in the 2.1 patch?
This turns out to be because shouldHint() has additional stuff that the 
assertion doesn't want. The assertion doesn't care if hints are disabled along 
with several of the other things that are added.

I think I managed to shuffle everything correctly. Going to let the tests run 
and prognosticate on how I want to test OE. I grepped the dtests and unit tests 
for OverloadedException and didn't get a single hit!

|[2.1 
code|https://github.com/apache/cassandra/compare/cassandra-2.1...aweisberg:CASSANDRA-10477-test]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-dtest/]|
|[3.0 
code|https://github.com/apache/cassandra/compare/cassandra-3.0...aweisberg:CASSANDRA-10477-3.0]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-dtest/]|
|[Trunk 
code|https://github.com/apache/cassandra/compare/trunk...aweisberg:CASSANDRA-10477-trunk]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-trunk-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-trunk-dtest/]|



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-12-04 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041862#comment-15041862
 ] 

Sylvain Lebresne commented on CASSANDRA-10477:
--

bq. Which aspect of hint "overload" protection is missing? I see it increments 
a counter which I thought was the signal upstream.

This is about who is looking at said counter (to do something about it if it's 
too high). The normal write path is, and so incrementing the counter in CAS 
will potentially apply back-pressure on normal writes, but not on CAS requests 
themselves.

bq. Looking at it further is it because it doesn't throw OverloadedException? 
So a better behavior would be to have the check and exception in a helper 
method and use that in commitPaxos() so that it can now throw 
OverloadedException?

Exactly.
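
A rough sketch of that helper idea follows. It is not the actual {{StorageProxy}} code: {{HintOverloadGuard}}, its fields, and the single global threshold are assumptions for illustration, and the nested {{OverloadedException}} is an unchecked stand-in for Cassandra's exception. The point is only that the normal write path and {{commitPaxos}} go through the same check, so both can be rejected when too many hints are in flight.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class HintOverloadGuard {
    /** Stand-in for Cassandra's OverloadedException; unchecked here for brevity. */
    static final class OverloadedException extends RuntimeException {
        OverloadedException(String message) {
            super(message);
        }
    }

    // Counter of hints currently being written (the "signal" discussed above).
    private final AtomicInteger hintsInProgress = new AtomicInteger();
    private final int maxHintsInProgress; // hypothetical threshold

    HintOverloadGuard(int maxHintsInProgress) {
        this.maxHintsInProgress = maxHintsInProgress;
    }

    void hintStarted() {
        hintsInProgress.incrementAndGet();
    }

    void hintFinished() {
        hintsInProgress.decrementAndGet();
    }

    /** Shared check: reject new work while too many hints are in flight. */
    void checkOverloaded() {
        int inFlight = hintsInProgress.get();
        if (inFlight > maxHintsInProgress) {
            throw new OverloadedException("Too many in flight hints: " + inFlight);
        }
    }

    /** Normal write path: checks the counter before sending. */
    void write(Runnable send) {
        checkOverloaded();
        send.run();
    }

    /** Paxos commit path: uses the same check, so it can now be rejected too. */
    void commitPaxos(Runnable send) {
        checkOverloaded();
        send.run();
    }
}
```

With the check factored out this way, incrementing the counter from the CAS path applies back-pressure to CAS requests as well as to normal writes.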

bq. I do wonder what unforeseen consequences making CAS capable of 
throwing OE will have, since we haven't seen or tested that before.

It's a good question, and to be honest I'm not sure we have any test that covers 
{{OverloadedException}} at all (but I could be wrong). But in general, the commit 
part of Paxos is not very sensitive: worst case, if not enough replicas get the 
commit, the next serial operation (including a read) on the partition will 
re-commit. So the main question is whether potentially throwing 
{{OverloadedException}} would surprise people. I would argue it shouldn't 
because normal writes can do so and we never specified it was any different for 
CAS. That said, if we're uncomfortable with it, I'm totally fine committing 
that part of the change only in 3.2 (aka trunk currently).

bq. the read path now throws OE where it didn't before

Right. That's probably more justification for keeping that part in 3.2 only.





[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-12-04 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041786#comment-15041786
 ] 

Ariel Weisberg commented on CASSANDRA-10477:


bq. We're kind of dodging the hint "overload" protection on the paxos path as 
we don't use sendToHintedEndpoints (which in particular makes the comment on 
commitPaxosLocal misleading as it suggests otherwise). I think the simplest 
solution is to move the overload test from sendToHintedEndpoints to some 
checkOverloaded() method and call that in commitPaxos too.
Which aspect of hint "overload" protection is missing? [I see it increments a 
counter which I thought was the signal 
upstream.|https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/service/StorageProxy.java#L976]

Looking at it further is it because it doesn't throw {{OverloadedException}}? 
So a better behavior would be to have the check and exception in a helper 
method and use that in commitPaxos() so that it can now throw 
{{OverloadedException}}?

I do wonder what unforeseen consequences making {{CAS}} capable of 
throwing {{OE}} will have, since we haven't seen or tested that before. Where 
this gets interesting is that the read path now throws {{OE}} where it didn't 
before because apparently serial consistency reads can end up calling 
{{beginAndRepairPaxos}}. I need to take a close look at how we test this path 
to make sure it's going to behave well once exercised.

bq. In theory, we could still run into the problem of that ticket if 
OPTIMIZE_LOCAL_REQUESTS is false. And in fact, I believe this option is unsafe 
since at least CASSANDRA-4753 as we somewhat strongly assume writes to the 
localhost do not go through MessagingService. So I would suggest ditching that 
option. Not only is it unsafe, but it's not used anywhere by the code and it's 
hardcoded so you have to change the code and recompile to even use it (which 
means I doubt anyone has even tried it in a long long time). And if we end up 
needing it in the future, we'll have to figure out how to make it safe.
It's already removed from 2.2. Yeah I don't think anyone uses it.

bq. Why isn't the added assertion in WriteCallbackInfo on 3.0 using 
!shouldHint like in the 2.1 patch?
It's an oversight from merging.


[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-12-04 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041410#comment-15041410
 ] 

Sylvain Lebresne commented on CASSANDRA-10477:
--

* The failure detector will never return false for the local host, so the 
changes in the 2nd branch of commitPaxos are unnecessary.
* We're kind of dodging the hint "overload" protection on the paxos path as we 
don't use {{sendToHintedEndpoints}} (which in particular makes the comment on 
{{commitPaxosLocal}} misleading as it suggests otherwise). I think the simplest 
solution is to move the overload test from {{sendToHintedEndpoints}} to some 
{{checkOverloaded()}} method and call that in {{commitPaxos}} too.
* Instead of adding the {{droppable()}} method to {{LocalMutationRunnable}}, we 
should probably use {{MessagingService.DROPPABLE_VERBS.contains(verb)}}.
* In theory, we could still run into the problem of that ticket if 
{{OPTIMIZE_LOCAL_REQUESTS}} is {{false}}. And in fact, I believe this option is 
unsafe since at least CASSANDRA-4753 as we somewhat strongly assume writes to 
the localhost do *not* go through {{MessagingService}}. So I would suggest 
ditching that option. Not only is it unsafe, but it's not used anywhere by the 
code and it's hardcoded so you have to change the code and recompile to even 
use it (which means I doubt anyone has even tried it in a long long time). And 
if we end up needing it in the future, we'll have to figure out how to make it 
safe.
* Why isn't the added assertion in {{WriteCallbackInfo}} on 3.0 using 
{{!shouldHint}} like in the 2.1 patch?
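
To illustrate the third point in the list above, here is a minimal sketch of deciding droppability from a shared set of verbs rather than from a per-runnable method. Everything in it is hypothetical: the {{Verb}} enum is a tiny stand-in, the membership of {{DROPPABLE_VERBS}} is chosen for the example rather than copied from {{MessagingService}}, and {{shouldDrop}} is an invented helper.

```java
import java.util.EnumSet;
import java.util.Set;

final class DroppableSketch {
    private DroppableSketch() {}

    /** Tiny illustrative subset of message verbs. */
    enum Verb { MUTATION, READ, GOSSIP_DIGEST_SYN }

    /** Assumed membership, for illustration only. */
    static final Set<Verb> DROPPABLE_VERBS = EnumSet.of(Verb.MUTATION, Verb.READ);

    /**
     * A locally queued task is dropped only if its verb is droppable and it
     * has waited longer than the timeout before getting a chance to run.
     */
    static boolean shouldDrop(Verb verb, long waitedMillis, long timeoutMillis) {
        return DROPPABLE_VERBS.contains(verb) && waitedMillis > timeoutMillis;
    }
}
```

Keeping the decision in one set means a local runnable only needs to know its verb, and the droppable list cannot drift between the local and remote paths.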




[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-11-18 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012348#comment-15012348
 ] 

Ariel Weisberg commented on CASSANDRA-10477:


Proposed fix

|[2.1 
code|https://github.com/apache/cassandra/compare/cassandra-2.1...aweisberg:CASSANDRA-10477-test]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-dtest/]|
|[3.0 
code|https://github.com/apache/cassandra/compare/cassandra-3.0...aweisberg:CASSANDRA-10477-3.0]|[utests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-testall/]|[dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-3.0-dtest/]|



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-11-18 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011675#comment-15011675
 ] 

Blake Eggleston commented on CASSANDRA-10477:
-

It seems like adding a paxos commit equivalent of StorageProxy.insertLocal, and 
submitting local commits that way would be the safest thing to do here. In 
theory, you should be able to add a check against the local address to 
StorageProxy.shouldHint and just drop the commit message if the node is 
overloaded; it should get back up to speed on the next paxos round. However 
there may be subtleties and edge cases that I'm not thinking of, so I don't 
want to recommend that without giving this more thought.
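
The first option, a dedicated local path for paxos commits, might look roughly like the sketch below. It is an assumption-laden illustration, not the eventual fix: {{PaxosCommitRouter}} and its methods are hypothetical, {{InetSocketAddress}} stands in for Cassandra's endpoint type, and "sending" is modeled by recording the target in a list. The local replica is applied directly and never enters the messaging path, so no expiring callback (and no hint) can be created for the local address.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

final class PaxosCommitRouter {
    private final InetSocketAddress localAddress;
    // Stand-in for messages handed to the messaging layer.
    private final List<InetSocketAddress> sentViaMessaging = new ArrayList<>();
    private int appliedLocally;

    PaxosCommitRouter(InetSocketAddress localAddress) {
        this.localAddress = localAddress;
    }

    /** Applies the commit directly for the local replica, sends it to the others. */
    void commit(List<InetSocketAddress> replicas, Runnable applyLocally) {
        for (InetSocketAddress replica : replicas) {
            if (replica.equals(localAddress)) {
                applyLocally.run(); // local path: no callback registered, no hint possible
                appliedLocally++;
            } else {
                sentViaMessaging.add(replica); // remote path: would go through messaging
            }
        }
    }

    int appliedLocally() {
        return appliedLocally;
    }

    List<InetSocketAddress> sentViaMessaging() {
        return sentViaMessaging;
    }
}
```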

> java.lang.AssertionError in StorageProxy.submitHint
> ---
>
> Key: CASSANDRA-10477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10477
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
> Environment: CentOS 6, Oracle JVM 1.8.45
>Reporter: Severin Leonhardt
>Assignee: Ariel Weisberg
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> A few days after updating from 2.0.15 to 2.1.9 we have the following log 
> entry on 2 of 5 machines:
> {noformat}
> ERROR [EXPIRING-MAP-REAPER:1] 2015-10-07 17:01:08,041 
> CassandraDaemon.java:223 - Exception in thread 
> Thread[EXPIRING-MAP-REAPER:1,5,main]
> java.lang.AssertionError: /192.168.11.88
> at 
> org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:949) 
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at 
> org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:383) 
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at 
> org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:363) 
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at org.apache.cassandra.utils.ExpiringMap$1.run(ExpiringMap.java:98) 
> ~[apache-cassandra-2.1.9.jar:2.1.9]
> at 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
>  ~[apache-cassandra-2.1.9.jar:2.1.9]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_45]
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
> [na:1.8.0_45]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  [na:1.8.0_45]
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  [na:1.8.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_45]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_45]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> {noformat}
> 192.168.11.88 is the broadcast address of the local machine.
> When this is logged the read request latency of the whole cluster becomes 
> very bad, from 6 ms/op to more than 100 ms/op according to OpsCenter. Clients 
> get a lot of timeouts. We need to restart the affected Cassandra node to get 
> back normal read latencies. It seems write latency is not affected.
> Disabling hinted handoff using {{nodetool disablehandoff}} only prevents the 
> assert from being logged. At some point the read latency becomes bad again. 
> Restarting the node where hinted handoff was disabled results in the read 
> latency being better again.





[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-11-18 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011402#comment-15011402
 ] 

Ariel Weisberg commented on CASSANDRA-10477:


[~bdeggleston] [~slebresne] can you chime in on whether I am on the right track 
here?

Should 
{{[StorageProxy.commitPaxos|https://github.com/apache/cassandra/blob/cassandra-2.1.11/src/java/org/apache/cassandra/service/StorageProxy.java#L494]}}
 not be sending messages to the local node that are eligible for hinting on 
timeout?




[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-11-18 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011365#comment-15011365
 ] 

Ariel Weisberg commented on CASSANDRA-10477:


Good news is that I am at least partially correct and PAXOS is heading down the 
road to submitting hints for the local node.

[New failing utests from this 
assertion|https://github.com/apache/cassandra/compare/cassandra-2.1...aweisberg:CASSANDRA-10477-test?expand=1#diff-5e7d892105f1fa0706dbedf919b5dd99L46]
http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-testall/1/testReport/junit/org.apache.cassandra.triggers/TriggersTest/executeTriggerOnCqlInsertWithConditions/
http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-testall/1/testReport/junit/org.apache.cassandra.triggers/TriggersTest/executeTriggerOnCqlBatchWithConditions/
http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-testall/1/testReport/junit/org.apache.cassandra.triggers/TriggersTest/executeTriggerOnThriftCASOperation/

Also several [failing 
dtests|http://cassci.datastax.com/view/Dev/view/aweisberg/job/aweisberg-CASSANDRA-10477-test-dtest/1/#showFailuresLink]

I'll try getting the PAXOS code to do something similar to the insertLocal 
where it doesn't submit a real hint.




[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-11-17 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009725#comment-15009725
 ] 

Ariel Weisberg commented on CASSANDRA-10477:


Theory time. [There is a path that tasks are supposed to use when they go 
through the local hint process for 
inserts.|https://github.com/apache/cassandra/blob/cassandra-2.1.9/src/java/org/apache/cassandra/service/StorageProxy.java#L1027]
 Since we have a case where an insert does not go down this path, it implies 
that one of the other call sites for inserts is incorrect and is going through 
the remote messaging service path.

It only happens when the node is overloaded and local inserts start timing out. 
The reason you don't normally see it is that local inserts probably don't time 
out most of the time. One thing you could do is increase the mutation timeouts 
to see if you can get past the low performance period without timing out and 
hitting this.

However, I think that the assertion is a symptom of a different problem and not 
the cause of the performance/availability issues. It's the canary in the coal 
mine letting you know this broken path is being taken due to timeouts of local 
mutations.

I think the thing to do is search the call hierarchy of 
{{[StorageProxy.submitHint|https://github.com/apache/cassandra/blob/cassandra-2.1.9/src/java/org/apache/cassandra/service/StorageProxy.java#L944]}}
 to find a path where it can be reached when timing out a local write. We know 
it's coming through MessagingService in this instance, which makes it a little 
trickier because the type of the callback isn't known. It looks like PAXOS 
might in some cases go down this path incorrectly.

I am going to try running a few things locally with some assertions to see if I 
can get it to send a message with hint delivery to itself.
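
The theory above can be modeled in miniature. The sketch below is a toy, not Cassandra code: ReaperSketch, Callback, and their methods are hypothetical stand-ins for the expiring callback map and for the check that the stack trace suggests sits in StorageProxy.submitHint. It assumes a write to the local node was registered with a hint-on-timeout callback, as if it had been sent down the remote messaging path.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a callback reaper; all names here are hypothetical. */
final class ReaperSketch {
    /** A pending request that may be turned into a hint when it times out. */
    static final class Callback {
        final String target;
        final long expiresAtMillis;
        final boolean hintOnTimeout;

        Callback(String target, long expiresAtMillis, boolean hintOnTimeout) {
            this.target = target;
            this.expiresAtMillis = expiresAtMillis;
            this.hintOnTimeout = hintOnTimeout;
        }
    }

    private final String localAddress;
    private final Map<Integer, Callback> pending = new LinkedHashMap<>();
    private final List<String> hinted = new ArrayList<>();
    private int nextId;

    ReaperSketch(String localAddress) {
        this.localAddress = localAddress;
    }

    /** Registers a request sent through the (simulated) messaging path. */
    int register(String target, long expiresAtMillis, boolean hintOnTimeout) {
        int id = nextId++;
        pending.put(id, new Callback(target, expiresAtMillis, hintOnTimeout));
        return id;
    }

    /** Stand-in for the hint submission; a node must never hint for itself. */
    void submitHint(String target) {
        if (target.equals(localAddress)) {
            throw new AssertionError("/" + target);
        }
        hinted.add(target);
    }

    /** Expires overdue callbacks, hinting for those that asked for it. */
    void reap(long nowMillis) {
        Iterator<Callback> it = pending.values().iterator();
        while (it.hasNext()) {
            Callback cb = it.next();
            if (cb.expiresAtMillis <= nowMillis) {
                it.remove();
                if (cb.hintOnTimeout) {
                    submitHint(cb.target);
                }
            }
        }
    }

    List<String> hinted() {
        return hinted;
    }

    public static void main(String[] args) {
        ReaperSketch reaper = new ReaperSketch("192.168.11.88");
        reaper.register("192.168.11.89", 100L, true); // made-up remote replica
        reaper.register("192.168.11.88", 100L, true); // local write sent down the remote path
        try {
            reaper.reap(200L);
        } catch (AssertionError e) {
            System.out.println("assertion tripped for " + e.getMessage());
        }
    }
}
```

In this model the error only appears once the local write is slow enough to expire, which matches the observation that the assertion shows up when the node is overloaded.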



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-11-17 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009335#comment-15009335
 ] 

Ariel Weisberg commented on CASSANDRA-10477:


Can you send me the yamls you are using on each node?



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-11-16 Thread Hao Bryan Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007923#comment-15007923
 ] 

Hao Bryan Cheng commented on CASSANDRA-10477:
-

Just observed this issue again. The node was undergoing anticompaction when it 
occurred, and it once again brought the ring to a halt.

Couldn't get all the required information due to the urgency of the situation, 
but did confirm that nodetool status reported the node as up with no issue (on 
another node).

I have fresh logs to offer out-of-band to anyone who is investigating this 
issue; feel free to email or ping here.



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-11-10 Thread Hao Bryan Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1504#comment-1504
 ] 

Hao Bryan Cheng commented on CASSANDRA-10477:
-

A few additional details:

Unfortunately, I didn't get any data while the issue was happening. Afterwards, 
netstat, nodetool status, etc. are all nominal.

During the period of time when this node was experiencing difficulty, no other 
nodes reported any unhealthy hosts. However, we do have our phi convict 
threshold tuned up from 8 to 10, due to running on AWS.

This event was localized to one node out of 12. Keyspace RF ranges from 3-5. 
Queries at LOCAL_QUORUM were timing out with insufficient responses.
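
For reference, the quorum arithmetic behind those replication factors can be written out. This is a small sketch with hypothetical class and method names; it assumes the usual quorum formula of floor(RF / 2) + 1 applied to the local datacenter's replication factor, and it does not by itself explain why the queries timed out.

```java
/** Quorum arithmetic for the replication factors mentioned above. */
final class QuorumSketch {
    /** Replicas that must respond for a quorum: floor(rf / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** Replicas that can stay silent while a quorum is still reachable. */
    static int tolerated(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        for (int rf = 3; rf <= 5; rf++) {
            System.out.println("RF=" + rf + " quorum=" + quorum(rf) + " tolerated=" + tolerated(rf));
        }
    }
}
```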



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-11-10 Thread Hao Bryan Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1462#comment-1462
 ] 

Hao Bryan Cheng commented on CASSANDRA-10477:
-

Hello, we just observed this on a cluster running 2.1.11, Oracle Java 1.8.0_66.

A single machine experienced this issue, causing our entire cluster to grind to 
a halt on any quorum operations.

Our logs feature an extremely large number of:

{code}
ERROR [EXPIRING-MAP-REAPER:1] 2015-11-11 05:10:22,894 CassandraDaemon.java:227 - Exception in thread Thread[EXPIRING-MAP-REAPER:1,5,main]
java.lang.AssertionError: /172.31.3.33
    at org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:949) ~[apache-cassandra-2.1.11.jar:2.1.11]
    at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:383) ~[apache-cassandra-2.1.11.jar:2.1.11]
    at org.apache.cassandra.net.MessagingService$5.apply(MessagingService.java:363) ~[apache-cassandra-2.1.11.jar:2.1.11]
    at org.apache.cassandra.utils.ExpiringMap$1.run(ExpiringMap.java:98) ~[apache-cassandra-2.1.11.jar:2.1.11]
    at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) ~[apache-cassandra-2.1.11.jar:2.1.11]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_66]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_66]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_66]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_66]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_66]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_66]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]
{code}

Additionally, this is interspersed with nearly every other neighbor node 
being marked down:

{code}
INFO  [GossipStage:1] 2015-11-11 04:14:25,369 Gossiper.java:1020 - Node /172.31.55.172 has restarted, now UP
INFO  [GossipStage:1] 2015-11-11 04:14:25,369 TokenMetadata.java:414 - Updating topology for /172.31.55.172
INFO  [GossipStage:1] 2015-11-11 04:14:25,369 TokenMetadata.java:414 - Updating topology for /172.31.55.172
INFO  [GossipStage:1] 2015-11-11 04:14:25,370 StorageService.java:1698 - Node /172.31.55.172 state jump to normal
INFO  [GossipStage:1] 2015-11-11 04:14:25,372 TokenMetadata.java:414 - Updating topology for /172.31.55.172
INFO  [GossipStage:1] 2015-11-11 04:14:25,372 TokenMetadata.java:414 - Updating topology for /172.31.55.172
INFO  [SharedPool-Worker-3] 2015-11-11 04:14:25,531 Gossiper.java:987 - InetAddress /172.31.55.172 is now UP
INFO  [SharedPool-Worker-5] 2015-11-11 04:14:25,536 Gossiper.java:987 - InetAddress /172.31.55.172 is now UP
INFO  [SharedPool-Worker-3] 2015-11-11 04:14:25,536 Gossiper.java:987 - InetAddress /172.31.55.172 is now UP
INFO  [SharedPool-Worker-1] 2015-11-11 04:14:25,536 Gossiper.java:987 - InetAddress /172.31.55.172 is now UP
INFO  [SharedPool-Worker-4] 2015-11-11 04:14:25,536 Gossiper.java:987 - InetAddress /172.31.55.172 is now UP
INFO  [HANDSHAKE-/172.31.55.172] 2015-11-11 04:14:25,537 OutboundTcpConnection.java:485 - Handshaking version with /172.31.55.172
[snipped]
WARN  [GossipTasks:1] 2015-11-11 04:18:26,379 Gossiper.java:747 - Gossip stage has 15 pending tasks; skipping status check (no nodes will be marked down)
WARN  [GossipTasks:1] 2015-11-11 04:18:27,480 Gossiper.java:747 - Gossip stage has 17 pending tasks; skipping status check (no nodes will be marked down)
WARN  [GossipTasks:1] 2015-11-11 04:18:28,580 Gossiper.java:747 - Gossip stage has 19 pending tasks; skipping status check (no nodes will be marked down)
WARN  [GossipTasks:1] 2015-11-11 04:18:29,681 Gossiper.java:747 - Gossip stage has 21 pending tasks; skipping status check (no nodes will be marked down)
WARN  [GossipTasks:1] 2015-11-11 04:18:30,781 Gossiper.java:747 - Gossip stage has 25 pending tasks; skipping status check (no nodes will be marked down)
...
{code}

No other nodes were restarted in this time frame.

Please let us know if there is any additional information we can provide.



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-11-10 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999379#comment-14999379
 ] 

Ariel Weisberg commented on CASSANDRA-10477:


This assertion makes it look like the node either thinks its broadcast address 
is that of another node, or alternatively it is connecting to itself, which is 
causing it to submit hints to itself; that is what the assertion is checking 
for. Neither condition should occur.

If you could also get me the output of "netstat -tlnp" along with nodetool 
status, ring, and netstats when the problem occurs, that would be helpful. You 
could do it now just to see if something shows up, but definitely when the 
problem occurs.



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-11-10 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999369#comment-14999369
 ] 

Ariel Weisberg commented on CASSANDRA-10477:


[~leonhardt] can you email me the log file to the address in my profile?



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-10-12 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953262#comment-14953262
 ] 

Philip Thompson commented on CASSANDRA-10477:
-

Once I find someone to work on this, they'll share their contact info with you. 
Thanks.



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-10-12 Thread Severin Leonhardt (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14953112#comment-14953112
 ] 

Severin Leonhardt commented on CASSANDRA-10477:
---

[~philipthompson] I can't make the logfile publicly accessible but I can share 
it by mail. Let me know to whom I should send it.



[jira] [Commented] (CASSANDRA-10477) java.lang.AssertionError in StorageProxy.submitHint

2015-10-09 Thread Philip Thompson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950826#comment-14950826
 ] 

Philip Thompson commented on CASSANDRA-10477:
-

[~iamaleksey], who should be assigned to this?

[~leonhardt], can you attach the system.log file from one of the affected nodes?
