[jira] [Updated] (CASSANDRA-11723) Cassandra upgrade from 2.1.11 to 3.0.5 leads to unstable nodes

Stefano Ortolani (JIRA) Thu, 05 May 2016 15:37:40 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-11723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Stefano Ortolani updated CASSANDRA-11723:
-----------------------------------------
    Description: 
Upgrade seems fine, but any restart of the node might lead to a situation where 
the node just dies after 30 seconds / 1 minute. 

Nothing in the logs besides many "FailureDetector.java:456 - Ignoring interval 
time of 3000892567 for /10.12.a.x" output every second (against all other 
nodes) in debug.log plus some spurious GraphiteErrors/ReadRepair notifications:

{code:xml}
DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - 
Ignoring interval time of 2373187360 for /10.12.a.x
DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - 
Ignoring interval time of 2000276196 for /10.12.a.y
DEBUG [ReadRepairStage:24] 2016-05-05 22:29:03,990 ReadCallback.java:234 - 
Digest mismatch:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key 
DecoratedKey(-152946356843306763, e859fdd2f264485f42030ce261e4e12e) 
(d6e617ece3b7bec6138b52b8974b8cab vs 31becca666a62b3c4b2fc0bab9902718)
        at 
org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) 
~[apache-cassandra-3.0.5.jar:3.0.5]
        at 
org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225)
 ~[apache-cassandra-3.0.5.jar:3.0.5]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_60]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
DEBUG [GossipStage:1] 2016-05-05 22:29:04,841 FailureDetector.java:456 - 
Ignoring interval time of 3000299340 for /10.12.33.5
ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-05 22:29:05,692 
ScheduledReporter.java:119 - RuntimeException thrown from 
GraphiteReporter#report. Exception was suppressed.
java.lang.IllegalStateException: Unable to compute ceiling for max when 
histogram overflowed
        at 
org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:231)
 ~[apache-cassandra-3.0.5.jar:3.0.5]
        at 
org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getMean(EstimatedHistogramReservoir.java:103)
 ~[apache-cassandra-3.0.5.jar:3.0.5]
        at 
com.codahale.metrics.graphite.GraphiteReporter.reportHistogram(GraphiteReporter.java:252)
 ~[metrics-graphite-3.1.0.jar:3.1.0]
        at 
com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:166)
 ~[metrics-graphite-3.1.0.jar:3.1.0]
        at 
com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) 
~[metrics-core-3.1.0.jar:3.1.0]
        at 
com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) 
~[metrics-core-3.1.0.jar:3.1.0]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_60]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
[na:1.8.0_60]
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 [na:1.8.0_60]
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 [na:1.8.0_60]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_60]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
{code}

I know this is not much but nothing else gets to dmesg or to any other log. Any 
suggestion how to debug this further?

I upgraded two nodes so far, and it happened on both nodes.


  was:
Upgrade seems fine, but any restart of the node might lead to a situation where 
the node just dies after 30 seconds / 1 minute. 

Nothing in the logs besides many "FailureDetector.java:456 - Ignoring interval 
time of 3000892567 for /10.12.a.x" output every second (against all other 
nodes) in debug.log plus some spurious GraphiteErrors/ReadRepair notifications:

{{
DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - 
Ignoring interval time of 2373187360 for /10.12.a.x
DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - 
Ignoring interval time of 2000276196 for /10.12.a.y
DEBUG [ReadRepairStage:24] 2016-05-05 22:29:03,990 ReadCallback.java:234 - 
Digest mismatch:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key 
DecoratedKey(-152946356843306763, e859fdd2f264485f42030ce261e4e12e) 
(d6e617ece3b7bec6138b52b8974b8cab vs 31becca666a62b3c4b2fc0bab9902718)
        at 
org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) 
~[apache-cassandra-3.0.5.jar:3.0.5]
        at 
org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225)
 ~[apache-cassandra-3.0.5.jar:3.0.5]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_60]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
DEBUG [GossipStage:1] 2016-05-05 22:29:04,841 FailureDetector.java:456 - 
Ignoring interval time of 3000299340 for /10.12.33.5
ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-05 22:29:05,692 
ScheduledReporter.java:119 - RuntimeException thrown from 
GraphiteReporter#report. Exception was suppressed.
java.lang.IllegalStateException: Unable to compute ceiling for max when 
histogram overflowed
        at 
org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:231)
 ~[apache-cassandra-3.0.5.jar:3.0.5]
        at 
org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getMean(EstimatedHistogramReservoir.java:103)
 ~[apache-cassandra-3.0.5.jar:3.0.5]
        at 
com.codahale.metrics.graphite.GraphiteReporter.reportHistogram(GraphiteReporter.java:252)
 ~[metrics-graphite-3.1.0.jar:3.1.0]
        at 
com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:166)
 ~[metrics-graphite-3.1.0.jar:3.1.0]
        at 
com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) 
~[metrics-core-3.1.0.jar:3.1.0]
        at 
com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) 
~[metrics-core-3.1.0.jar:3.1.0]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_60]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
[na:1.8.0_60]
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
 [na:1.8.0_60]
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
 [na:1.8.0_60]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_60]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
}}

I know this is not much but nothing else gets to dmesg or to any other log. Any 
suggestion how to debug this further?

I upgraded two nodes so far, and it happened on both nodes.



> Cassandra upgrade from 2.1.11 to 3.0.5 leads to unstable nodes
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-11723
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11723
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stefano Ortolani
>            Priority: Critical
>
> Upgrade seems fine, but any restart of the node might lead to a situation 
> where the node just dies after 30 seconds / 1 minute. 
> Nothing in the logs besides many "FailureDetector.java:456 - Ignoring 
> interval time of 3000892567 for /10.12.a.x" output every second (against all 
> other nodes) in debug.log plus some spurious GraphiteErrors/ReadRepair 
> notifications:
> {code:xml}
> DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - 
> Ignoring interval time of 2373187360 for /10.12.a.x
> DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - 
> Ignoring interval time of 2000276196 for /10.12.a.y
> DEBUG [ReadRepairStage:24] 2016-05-05 22:29:03,990 ReadCallback.java:234 - 
> Digest mismatch:
> org.apache.cassandra.service.DigestMismatchException: Mismatch for key 
> DecoratedKey(-152946356843306763, e859fdd2f264485f42030ce261e4e12e) 
> (d6e617ece3b7bec6138b52b8974b8cab vs 31becca666a62b3c4b2fc0bab9902718)
>       at 
> org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
>       at 
> org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_60]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_60]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> DEBUG [GossipStage:1] 2016-05-05 22:29:04,841 FailureDetector.java:456 - 
> Ignoring interval time of 3000299340 for /10.12.33.5
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-05 22:29:05,692 
> ScheduledReporter.java:119 - RuntimeException thrown from 
> GraphiteReporter#report. Exception was suppressed.
> java.lang.IllegalStateException: Unable to compute ceiling for max when 
> histogram overflowed
>       at 
> org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:231)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
>       at 
> org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getMean(EstimatedHistogramReservoir.java:103)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
>       at 
> com.codahale.metrics.graphite.GraphiteReporter.reportHistogram(GraphiteReporter.java:252)
>  ~[metrics-graphite-3.1.0.jar:3.1.0]
>       at 
> com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:166)
>  ~[metrics-graphite-3.1.0.jar:3.1.0]
>       at 
> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) 
> ~[metrics-core-3.1.0.jar:3.1.0]
>       at 
> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) 
> ~[metrics-core-3.1.0.jar:3.1.0]
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_60]
>       at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
> [na:1.8.0_60]
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  [na:1.8.0_60]
>       at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  [na:1.8.0_60]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_60]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_60]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> {code}
> I know this is not much but nothing else gets to dmesg or to any other log. 
> Any suggestion how to debug this further?
> I upgraded two nodes so far, and it happened on both nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (CASSANDRA-11723) Cassandra upgrade from 2.1.11 to 3.0.5 leads to unstable nodes

Reply via email to