[ 
https://issues.apache.org/jira/browse/CASSANDRA-9625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744732#comment-15744732
 ] 

Stefan Podkowinski commented on CASSANDRA-9625:
-----------------------------------------------

I think [~ruoranwang] is right to address the {{getEstimatedRemainingTasks}} 
call, as it delegates to the {{LeveledManifest}} version, which is 
synchronized and causes the reporter thread to block. At some point the 
reporter must get stuck after waiting too long. I'm not certain about the exact 
reasons for this, but having the reporter thread compete for compaction locks 
doesn't seem like a good idea in general, so I'd suggest using a cached 
value of the remaining tasks count instead. This should also improve 
performance a bit by avoiding repeated level size calculations on unchanged 
sets of sstables.
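
To illustrate the idea, here is a minimal sketch of the caching approach. It is not the code from the branches below: {{ManifestSketch}}, {{replaceLevelSizes}} and the simplified task formula are hypothetical stand-ins, assuming the estimate only needs to be recomputed when the set of sstables changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a leveled manifest; not Cassandra's actual class.
class ManifestSketch {
    private final long maxBytesPerLevel;  // assumed: one size limit for every level
    private final long maxSSTableBytes;   // assumed: target sstable size
    private final List<Long> levelSizes = new ArrayList<>();

    // Cached estimate. volatile lets the reporter thread read the latest
    // value without acquiring the manifest lock.
    private volatile int cachedRemainingTasks = 0;

    ManifestSketch(long maxBytesPerLevel, long maxSSTableBytes) {
        this.maxBytesPerLevel = maxBytesPerLevel;
        this.maxSSTableBytes = maxSSTableBytes;
    }

    // Called by the compaction side whenever the sstable set changes.
    // The estimate is recomputed once here, while the lock is already held.
    synchronized void replaceLevelSizes(List<Long> newSizes) {
        levelSizes.clear();
        levelSizes.addAll(newSizes);
        cachedRemainingTasks = computeRemainingTasks();
    }

    // Simplified estimate (an assumption for this sketch): for each level,
    // the bytes above the limit divided by the sstable size, rounded up.
    private int computeRemainingTasks() {
        long tasks = 0;
        for (long size : levelSizes) {
            long overflow = Math.max(0L, size - maxBytesPerLevel);
            tasks += (overflow + maxSSTableBytes - 1) / maxSSTableBytes;
        }
        return (int) Math.min(Integer.MAX_VALUE, tasks);
    }

    // Called by the metrics reporter. Not synchronized, so it never waits
    // on a thread that holds the manifest lock.
    int getEstimatedRemainingTasks() {
        return cachedRemainingTasks;
    }
}
```

The trade-off in this sketch is that the reporter may briefly see a slightly stale count, which seems acceptable for a metric, in exchange for never blocking on the compaction lock.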


||2.1||2.2||3.0||3.x||
|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-9625-2.1]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-9625-2.2]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-9625-3.0]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-9625-3.x]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-2.1-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-3.x-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-2.1-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-9625-3.x-testall/]|

Does anyone want to give this a try by running a patched node? Test results 
look OK except for the failing 2.1 
{{LeveledCompactionStrategyTest.testMutateLevel}}, which always times out but 
works fine locally. Any idea what can be done about that?

> GraphiteReporter not reporting
> ------------------------------
>
>                 Key: CASSANDRA-9625
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9625
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Debian Jessie, 7u79-2.5.5-1~deb8u1, Cassandra 2.1.3
>            Reporter: Eric Evans
>            Assignee: T Jake Luciani
>         Attachments: Screen Shot 2016-04-13 at 10.40.58 AM.png, metrics.yaml, 
> thread-dump.log
>
>
> When upgrading from 2.1.3 to 2.1.6, the Graphite metrics reporter stops 
> working.  The usual startup is logged, and one batch of samples is sent, but 
> the reporting interval comes and goes, and no other samples are ever sent.  
> The logs are free from errors.
> Frustratingly, metrics reporting works in our smaller (staging) environment 
> on 2.1.6; we are able to reproduce this on all 6 of our production nodes, 
> but not on a 3-node (otherwise identical) staging cluster (maybe it takes a 
> certain level of concurrency?).
> Attached is a thread dump, and our metrics.yaml.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
