[ https://issues.apache.org/jira/browse/CASSANDRA-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Edward Ribeiro updated CASSANDRA-11823:
---------------------------------------
    Attachment: CASSANDRA-11823.patch

Hi [~ostefano] and [~Stefania],

I took a stab at this issue, and I think I've found the root cause of the problem. I am providing a patch for the cassandra-3.0 branch.

*IMHO*, it looks like when a table is created, the metrics {{Set}} stored under a specific key in {{TableMetrics.allTableMetrics}} is updated while that same {{Set}} is being iterated to compute a summarized value to be passed to {{GraphiteReporter}}, as below, for example:

{code}
public Long getValue()
{
    long total = 0;
    for (Metric cfGauge : allTableMetrics.get(name))
    {
        total = total + ((Gauge<? extends Number>) cfGauge).getValue().longValue();
    }
    return total;
}
{code}

Even though {{allTableMetrics}} is a thread-safe {{ConcurrentMap}}, *the {{Set}} iterated in the for-loop above is not!* Oddly enough, the {{ConcurrentModificationException}} stack trace points at a {{Map}} instead of the {{Set}} inside the {{Map}} that is actually being iterated (I guess that is because a {{HashSet}} is backed by a {{HashMap}} internally, so its iterator is a {{HashMap$KeyIterator}}).

*If this is the case*, the solution is to create a thread-safe {{Set}}. {{Collections#synchronizedSet}} will not work on its own, because iterating over it still requires the caller to synchronize manually, but fortunately, we can also create a thread-safe {{Set}} backed by a {{ConcurrentHashMap}}. Before Java 8, we could do this as shown here: http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#newSetFromMap%28java.util.Map%29

But as C* uses Java 8, this can be done as shown here: http://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html#newKeySet--

Of course, I may be chasing my own tail (it would not be the first time, lol) and the problem may have *nothing* to do with what I described above, so, please, let me know what you think.
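To illustrate the difference between the two kinds of {{Set}} discussed above, here is a minimal, single-threaded sketch. It is not the attached patch and does not use Cassandra's classes: the class name {{MetricSetSketch}} and the method {{survivesAddDuringIteration}} are hypothetical, and the set holds plain {{Long}} values instead of metrics. Adding an element in the middle of a for-each loop stands in for a table being registered while the reporter is summing.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class MetricSetSketch {
    // Iterates the given set and adds one element during the first step,
    // imitating a registration that happens while a reporter is iterating.
    // Returns true if the loop completes, false if the iterator throws
    // ConcurrentModificationException.
    public static boolean survivesAddDuringIteration(Set<Long> gauges) {
        gauges.add(1L);
        gauges.add(2L);
        gauges.add(3L);
        try {
            boolean added = false;
            for (Long ignored : gauges) {
                if (!added) {
                    gauges.add(100L); // structural modification mid-iteration
                    added = true;
                }
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // HashSet has a fail-fast iterator.
        System.out.println("HashSet: "
                + survivesAddDuringIteration(new HashSet<>()));
        // ConcurrentHashMap.newKeySet() has a weakly consistent iterator.
        System.out.println("newKeySet: "
                + survivesAddDuringIteration(ConcurrentHashMap.newKeySet()));
    }
}
```

The {{HashSet}} case returns {{false}} because its fail-fast iterator detects the modification on the next call to {{next()}}, while the set from {{ConcurrentHashMap.newKeySet()}} returns {{true}} because its iterator is weakly consistent and never throws {{ConcurrentModificationException}}. Whether the reporter then sees the newly added element during that pass is not guaranteed, which seems acceptable for a periodically reported total.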
:)

> Creating a table leads to a race with GraphiteReporter
> ------------------------------------------------------
>
>                 Key: CASSANDRA-11823
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11823
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stefano Ortolani
>            Priority: Minor
>              Labels: lhf
>         Attachments: CASSANDRA-11823.patch
>
>
> Happened only on 3/4 nodes out of 13.
> {code:xml}
> INFO  [MigrationStage:1] 2016-05-18 00:34:11,566 ColumnFamilyStore.java:381 - Initializing schema.table
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-18 00:34:11,569 ScheduledReporter.java:119 - RuntimeException thrown from GraphiteReporter#report. Exception was suppressed.
> java.util.ConcurrentModificationException: null
> 	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1429) ~[na:1.8.0_91]
> 	at java.util.HashMap$KeyIterator.next(HashMap.java:1453) ~[na:1.8.0_91]
> 	at org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:690) ~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at org.apache.cassandra.metrics.TableMetrics$33.getValue(TableMetrics.java:686) ~[apache-cassandra-3.0.6.jar:3.0.6]
> 	at com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281) ~[metrics-graphite-3.1.0.jar:3.1.0]
> 	at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:158) ~[metrics-graphite-3.1.0.jar:3.1.0]
> 	at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
> 	at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
> 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_91]
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_91]
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_91]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
> 	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)