[ https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744732#comment-15744732 ]
Stefan Podkowinski commented on CASSANDRA-9625:
-----------------------------------------------

I think [~ruoranwang] is right to address the {{getEstimatedRemainingTasks}} call, as it delegates to the {{LeveledManifest}} version, which is synchronized and causes the reporter thread to block. At some point the reporter must get stuck after waiting too long. I'm not certain about the exact reasons for this, but having the reporter thread compete for compaction locks doesn't seem like a good idea in general to me, so I'd suggest using a cached value of the remaining tasks count instead. This should also improve performance a bit by avoiding continuous level size calculation on unchanged sets of sstables.

||2.1||2.2||3.0||3.x||
|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-9625-2.1]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-9625-2.2]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-9625-3.0]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-9625-3.x]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-3.x-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-2.1-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-3.x-testall/]|

Does anyone want to give this a try by running a patched node?
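
For illustration only, the cached-value idea described above could look roughly like the sketch below. This is not the code on the linked branches. The class {{CachedEstimateSketch}}, the method {{onSSTablesChanged}}, and the estimate formula are hypothetical placeholders; only the name {{getEstimatedRemainingTasks}} comes from the discussion. The point is that the estimate is recomputed under the lock when the sstable set changes, while the reporter thread reads a volatile field and never waits on that lock.

```java
// Minimal sketch of caching an estimate so readers never take the compaction lock.
// All names except getEstimatedRemainingTasks are hypothetical.
class CachedEstimateSketch {
    private final Object lock = new Object();

    // Guarded by lock: stands in for the state the real estimate would be derived from.
    private int pendingWork;

    // Written under the lock, read without it; volatile makes updates visible to readers.
    private volatile int cachedRemainingTasks;

    // Called by whoever mutates the sstable set (assumed to hold or take the lock anyway).
    void onSSTablesChanged(int newPendingWork) {
        synchronized (lock) {
            pendingWork = newPendingWork;
            cachedRemainingTasks = computeRemainingTasks();
        }
    }

    // Placeholder for the expensive level size calculation; caller must hold the lock.
    private int computeRemainingTasks() {
        return (pendingWork + 3) / 4; // ceiling of pendingWork / 4, purely illustrative
    }

    // Called by the metrics reporter thread: no locking, returns the last cached value.
    int getEstimatedRemainingTasks() {
        return cachedRemainingTasks;
    }
}
```

With this shape, a reporter polling {{getEstimatedRemainingTasks}} may see a slightly stale count, but it cannot block behind a long-running compaction holding the lock, and the calculation only runs when the input actually changes.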
Test results look OK except for the failing 2.1 {{LeveledCompactionStrategyTest.testMutateLevel}}, which always times out on CI but works fine locally. Any idea what can be done about that?

> GraphiteReporter not reporting
> ------------------------------
>
>                 Key: CASSANDRA-9625
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
>            Reporter: Eric Evans
>            Assignee: T Jake Luciani
>         Attachments: Screen Shot 2016-04-13 at 10.40.58 AM.png, metrics.yaml, thread-dump.log
>
>
> When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops
> working. The usual startup is logged, and one batch of samples is sent, but
> the reporting interval comes and goes, and no other samples are ever sent.
> The logs are free from errors.
> Frustratingly, metrics reporting works in our smaller (staging) environment
> on 2.1.6; we are able to reproduce this on all 6 of our production nodes, but not
> on a 3 node (otherwise identical) staging cluster (maybe it takes a certain
> level of concurrency?).
> Attached is a thread dump, and our metrics.yaml.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)