[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed
[ https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333412#comment-15333412 ]

Heiko Sommer commented on CASSANDRA-10389:
--

I'm getting the same problem with Cassandra 2.2.5, cluster of 6 nodes, RF=2. As a workaround I must restart all nodes before running a repair. For sure I do not start multiple repairs simultaneously.

Here is what happened the last time I tried it out:

The previous incremental repair ("nodetool repair --partitioner-range -- mykeyspace"), started on a single node after a rolling cluster restart, finished nicely, with the expected number of "Session completed successfully" logs. There were no more repair tasks or anticompaction tasks running, and the cluster was stable.
I restarted C* on 4 nodes, but left it running on 2 nodes.
On one of the restarted nodes I ran an incremental repair again, this time also with the "--sequential" option.
On the repairing node I get failure logs such as
{noformat}
java.lang.RuntimeException: Could not create snapshot at /10.195.62.171
    at org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:79) ~[apache-cassandra-2.2.5.jar:2.2.5]
ERROR [Repair#1:16] 2016-06-16 07:10:29,239 CassandraDaemon.java:185 - Exception in thread Thread[Repair#1:16,5,RMI Runtime]
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Could not create snapshot at /10.195.62.171
    at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1387) ~[guava-16.0.jar:na]
{noformat}
while on the failing target nodes (those that were not restarted before the repair) I get logs such as
{noformat}
ERROR [AntiEntropyStage:1] 2016-06-16 07:10:29,237 RepairMessageVerbHandler.java:108 - Cannot start multiple repair sessions over the same sstables
{noformat}
Before that, I also tried a full repair, and got the impression that the problem is the same for full and incremental repairs.
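As I can reproduce the issue, I would be glad to provide more logs or to run some experiments if that would help resolve it.

> Repair session exception Validation failed
> --
>
>                 Key: CASSANDRA-10389
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10389
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Debian 8, Java 1.8.0_60, Cassandra 2.2.1 (datastax compilation)
>            Reporter: Jędrzej Sieracki
>             Fix For: 2.2.x
>
>
> I'm running a repair on a ring of nodes that was recently extended from 3 to 13 nodes. The extension was done two days ago, the repair was attempted yesterday.
> {quote}
> [2015-09-22 11:55:55,266] Starting repair command #9, repairing keyspace perspectiv with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 517)
> [2015-09-22 11:55:58,043] Repair session 1f7c50c0-6110-11e5-b992-9f13fa8664c8 for range (-5927186132136652665,-5917344746039874798] failed with error [repair #1f7c50c0-6110-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]] Validation failed in cblade1.XXX/XXX (progress: 0%)
> {quote}
> BTW, I am ignoring the LEAK errors for now, that's outside of the scope of the main issue:
> {quote}
> ERROR [Reference-Reaper:1] 2015-09-22 11:58:27,843 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4d25ad8f) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@896826067:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-73-big was not released before the reference was garbage collected
> {quote}
> I scrubbed the sstable with failed validation on cblade1 with nodetool scrub perspectiv stock_increment_agg:
> {quote}
> INFO [CompactionExecutor:1704] 2015-09-22 12:05:31,615 OutputHandler.java:42 - Scrubbing BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db') (345466609 bytes)
> INFO [CompactionExecutor:1703] 2015-09-22 12:05:31,615 OutputHandler.java:42 - Scrubbing BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db') (60496378 bytes)
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4ca8951e) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@114161559:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-48-big was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@eeb6383) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1612685364:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@1de90543) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@2058626950:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-49-big was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@15616385) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1386628428:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-47-big was not released before the reference was garbage collected
> INFO [CompactionExecutor:1703] 2015-09-22 12:05:35,098 OutputHandler.java:42 - Scrub of BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db') complete: 51397 rows in new sstable and 0 empty (tombstoned) rows dropped
> INFO [CompactionExecutor:1704] 2015-09-22 12:05:47,605 OutputHandler.java:42 - Scrub of BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db') complete: 292600 rows in new sstable and 0 empty (tombstoned) rows dropped
> {quote}
> Now, after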
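The "Starting repair command" line quoted in the issue description records which options the repair actually ran with (here: parallel, incremental, not restricted to the primary range). When comparing reports like the ones in this thread, it can help to pull those options out of the nodetool output mechanically. The following is only a sketch: the function name `parse_repair_options` is made up for illustration, and the parsing assumes the line format exactly as quoted above, which may differ in other Cassandra versions.

```python
import re


def parse_repair_options(line):
    """Extract the 'repair options (...)' key/value pairs from a nodetool progress line.

    Hypothetical helper; assumes the line format quoted in the issue description.
    Values are returned as plain strings, without any type conversion.
    """
    match = re.search(r"repair options \((.*)\)", line)
    if match is None:
        return {}
    options = {}
    # Split only on commas that introduce a new "key:"; commas inside [...] lists are kept.
    for part in re.split(r",\s*(?=[^,\[\]]+:)", match.group(1)):
        key, _, value = part.partition(":")
        options[key.strip()] = value.strip()
    return options


line = (
    "[2015-09-22 11:55:55,266] Starting repair command #9, repairing keyspace "
    "perspectiv with repair options (parallelism: parallel, primary range: false, "
    "incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], "
    "hosts: [], # of ranges: 517)"
)
options = parse_repair_options(line)
print(options["incremental"], options["primary range"], options["# of ranges"])
```

For the quoted line, the three printed values are the strings for `incremental`, `primary range` and `# of ranges` as they appear in the log.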
[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed
[ https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151180#comment-15151180 ]

Dominik Keil commented on CASSANDRA-10389:
--

I think we're seeing this issue as well. Running Cassandra 2.2.5. Haven't tried restarting all nodes but will do that now.

We're running incremental repairs (now default, eh?) and while testing this before we put that into production we already found that repairing a whole keyspace will create a massive amount of open filehandles / "anti-compacted" sstables even though the repair will still only work one CF at a time. This caused some problems so we're now running repairs one CF at a time and on only one node at a time. We did not have this issue in our testing but are seeing it in production now, nevertheless.

What's interesting is that the node on which the repair runs at some point suddenly thrashes its heap (i.e. full heap usage, 65%-85% GC!!!) while at the same time producing huge amounts of tiny, concurrent reads, leading to really bad read latency from disk and a lot of I/O wait.

The bad thing is: This (Cassandra) node becomes so unresponsive that it significantly impacts the performance of the whole cluster (a total of 9 machines, rf 5 / quorum for most reads/writes, rf 2 / one for less important bulk data). So neither the java driver nor the other nodes, when being coordinator, manage to just leave this node alone for a while. As soon as I disable gossip on this node, the rest of the cluster is fine again.

[~slebresne]: I applaud you for your very useful comment.
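The "one CF at a time and on only one node at a time" approach described in the comment above can be sketched as a small scheduler that builds one `nodetool repair <keyspace> <table>` invocation per node and table and runs them strictly in sequence. This is an illustration only, not the commenter's actual tooling: the function names, host names and table names are hypothetical, and the command runner is passed in so the sketch can be tried without a cluster.

```python
from itertools import product


def build_repair_plan(hosts, keyspace, tables):
    """Return one nodetool argument list per (host, table), node by node.

    Hypothetical helper: every table is repaired on the first host before
    the plan moves on to the next host.
    """
    return [
        ["nodetool", "-h", host, "repair", keyspace, table]
        for host, table in product(hosts, tables)
    ]


def run_plan(plan, runner):
    """Run the commands one at a time and stop at the first failure.

    `runner` takes an argument list and returns an exit code, so at most
    one repair started by this script is in flight at any moment.
    """
    completed = []
    for argv in plan:
        if runner(argv) != 0:
            break  # do not start further repairs after a failure
        completed.append(argv)
    return completed


if __name__ == "__main__":
    # Assumed example values; replace with real hosts and tables.
    plan = build_repair_plan(["node1", "node2"], "mykeyspace", ["table_a", "table_b"])
    for argv in plan:
        print(" ".join(argv))
```

To execute the plan for real, `runner` could be `lambda argv: subprocess.run(argv).returncode` (with `import subprocess`). Whether serializing repairs this way avoids the validation failure is not established in this thread.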
[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed
[ https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108643#comment-15108643 ]

Jeff Gardner commented on CASSANDRA-10389:
--

We are also experiencing this issue in 2.2.3; and yes we have restarted all nodes.

Our config:
Cassandra 2.2.3
AWS west-2 and east-1 regions with 6 nodes per region [2/AZ]
[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed
[ https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108763#comment-15108763 ]

Sylvain Lebresne commented on CASSANDRA-10389:
--

bq. is there a known work around?

yes, stop starting multiple competing repair simultaneously.
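One way to follow the advice above when repairs are launched from scripts or cron jobs is to guard each launch with an exclusive lock, so a second repair refuses to start while the first is still running. The sketch below is an assumption-laden illustration: `repair_lock`, `RepairAlreadyRunning` and the lock file path are invented names, the lock only covers repairs started through this wrapper from hosts that share the lock file's filesystem, and a crashed process leaves a stale lock file behind. Note also that other commenters in this thread report the error without having started concurrent repairs, so such a guard would not cover their case.

```python
import os
from contextlib import contextmanager


class RepairAlreadyRunning(RuntimeError):
    """Raised when the lock file already exists (hypothetical exception type)."""


@contextmanager
def repair_lock(path):
    """Hold an exclusive lock file for the duration of one repair.

    O_CREAT | O_EXCL makes the creation atomic: exactly one caller can
    create the file, every other caller gets FileExistsError.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise RepairAlreadyRunning(
            f"lock file {path} exists; another repair may be running"
        ) from None
    try:
        # Record the owner's PID to help an operator diagnose a stale lock.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.remove(path)
```

A caller would wrap the repair invocation, for example `with repair_lock("/var/run/cassandra-repair.lock"): ...`, where the path is only an example.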
[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed
[ https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108758#comment-15108758 ]

Jeff Gardner commented on CASSANDRA-10389:
--

this is becoming a significant problem in our production environment... can this be escalated or is there a known work around?
[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed
[ https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103940#comment-15103940 ]

Kai Wang commented on CASSANDRA-10389:
--

[~jedrzej.sieracki] not sure if you still have this problem. But have you tried to restart all the C* instances?
[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed
[ https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906018#comment-14906018 ]

Jędrzej Sieracki commented on CASSANDRA-10389:
--

After checking the logs more thoroughly, the issue seems to be "Cannot start multiple repair sessions over the same sstables". The interesting log portions from the repair session run on cblade1:

{quote}
INFO [Repair#24:1] 2015-09-24 09:58:37,480 RepairJob.java:107 - [repair #0fc98340-6292-11e5-b992-9f13fa8664c8] requesting merkle trees for stock_increment_agg (to [/cblade10, cblade1])
INFO [Repair#24:1] 2015-09-24 09:58:37,480 RepairJob.java:181 - [repair #0fc98340-6292-11e5-b992-9f13fa8664c8] Requesting merkle trees for stock_increment_agg (to [/cblade10, cblade1])
ERROR [ValidationExecutor:28] 2015-09-24 09:58:37,481 CompactionManager.java:1070 - Cannot start multiple repair sessions over the same sstables
ERROR [ValidationExecutor:28] 2015-09-24 09:58:37,481 Validator.java:246 - Failed creating a merkle tree for [repair #0fc98340-6292-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]], /cblade1(see log for details)
INFO [AntiEntropyStage:1] 2015-09-24 09:58:37,481 RepairSession.java:181 - [repair #0fc98340-6292-11e5-b992-9f13fa8664c8] Received merkle tree for stock_increment_agg from /cblade1
ERROR [ValidationExecutor:28] 2015-09-24 09:58:37,481 CassandraDaemon.java:183 - Exception in thread Thread[ValidationExecutor:28,1,main]
java.lang.RuntimeException: Cannot start multiple repair sessions over the same sstables
    at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1071) ~[apache-cassandra-2.2.1.jar:2.2.1]
    at org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:94) ~[apache-cassandra-2.2.1.jar:2.2.1]
    at org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:669) ~[apache-cassandra-2.2.1.jar:2.2.1]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
WARN [RepairJobTask:1] 2015-09-24 09:58:37,481 RepairJob.java:162 - [repair #0fc98340-6292-11e5-b992-9f13fa8664c8] stock_increment_agg sync failed
ERROR [RepairJobTask:2] 2015-09-24 09:58:37,482 CassandraDaemon.java:183 - Exception in thread Thread[RepairJobTask:2,5,RMI Runtime]
org.apache.cassandra.exceptions.RepairException: [repair #0fc98340-6292-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]] Validation failed in cblade1.dforcom.localdomain/cblade1
    at org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) ~[apache-cassandra-2.2.1.jar:2.2.1]
    at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-2.2.1.jar:2.2.1]
    at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:399) ~[apache-cassandra-2.2.1.jar:2.2.1]
    at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:158) ~[apache-cassandra-2.2.1.jar:2.2.1]
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[apache-cassandra-2.2.1.jar:2.2.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
INFO [Repair#24:2] 2015-09-24 09:58:37,482 RepairJob.java:107 - [repair #0fc98340-6292-11e5-b992-9f13fa8664c8] requesting merkle trees for receipt_agg_total (to [/cblade10, cblade1.dforcom.localdomain/cblade1])
ERROR [Repair#24:1] 2015-09-24 09:58:37,482 CassandraDaemon.java:183 - Exception in thread Thread[Repair#24:1,5,RMI Runtime]
com.google.common.util.concurrent.UncheckedExecutionException: org.apache.cassandra.exceptions.RepairException: [repair #0fc98340-6292-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]] Validation failed in cblade1.dforcom.localdomain/cblade1
    at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1387) ~[guava-16.0.jar:na]
    at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1373) ~[guava-16.0.jar:na]
    at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:169) ~[apache-cassandra-2.2.1.jar:2.2.1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
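
For anyone triaging a similar system.log, here is a minimal sketch (not part of Cassandra or nodetool) that scans log lines like those quoted above and reports where the "Cannot start multiple repair sessions over the same sstables" error fired and which repair sessions failed to build a merkle tree. The function name `summarize_repair_errors` and both regular expressions are my own assumptions, derived only from the line layout visible in this excerpt; other Cassandra versions or logback patterns may need a different expression.

```python
import re

# Assumed line layout, taken from the excerpt in this thread:
#   LEVEL [thread] YYYY-MM-DD HH:MM:SS,mmm Source.java:NNN - message
LINE_RE = re.compile(
    r"^(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<source>\S+:\d+)\s+-\s+(?P<msg>.*)$"
)

# Repair session ids appear as "[repair #<uuid> ..." in the messages.
SESSION_RE = re.compile(r"\[repair #(?P<id>[0-9a-f-]{36})")

CONFLICT = "Cannot start multiple repair sessions over the same sstables"
MERKLE_FAILURE = "Failed creating a merkle tree"


def summarize_repair_errors(lines):
    """Return (conflicts, failed_sessions) for an iterable of log lines.

    conflicts       -- list of (timestamp, thread) for each conflict ERROR line
    failed_sessions -- set of repair session ids whose merkle tree failed
    """
    conflicts = []
    failed_sessions = set()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            # Stack-trace frames and exception headers have no log prefix.
            continue
        if match["level"] != "ERROR":
            continue
        message = match["msg"]
        if CONFLICT in message:
            conflicts.append((match["ts"], match["thread"]))
        if MERKLE_FAILURE in message:
            session = SESSION_RE.search(message)
            if session is not None:
                failed_sessions.add(session["id"])
    return conflicts, failed_sessions
```

Running it over the log of each replica named in the failed session (for example with `summarize_repair_errors(open(path))`, where `path` is wherever that node writes its system.log) shows quickly whether the validation failure on that node coincides with the sstable conflict.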
[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed
[ https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904474#comment-14904474 ]

Yuki Morishita commented on CASSANDRA-10389:

What kind of error do you see in replica nodes (in cblade1 or other nodes that failed to validate)?

> Repair session exception Validation failed
> --
>
> Key: CASSANDRA-10389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10389
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Debian 8, Java 1.8.0_60, Cassandra 2.2.1 (datastax
> compilation)
>Reporter: Jędrzej Sieracki
>
> I'm running a repair on a ring of nodes that was recently extended from 3 to
> 13 nodes. The extension was done two days ago, the repair was attempted
> yesterday.
> {quote}
> [2015-09-22 11:55:55,266] Starting repair command #9, repairing keyspace
> perspectiv with repair options (parallelism: parallel, primary range: false,
> incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [],
> hosts: [], # of ranges: 517)
> [2015-09-22 11:55:58,043] Repair session 1f7c50c0-6110-11e5-b992-9f13fa8664c8
> for range (-5927186132136652665,-5917344746039874798] failed with error
> [repair #1f7c50c0-6110-11e5-b992-9f13fa8664c8 on
> perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]]
> Validation failed in cblade1.XXX/XXX (progress: 0%)
> {quote}
> BTW, I am ignoring the LEAK errors for now, that's outside of the scope of
> the main issue:
> {quote}
> ERROR [Reference-Reaper:1] 2015-09-22 11:58:27,843 Ref.java:187 - LEAK
> DETECTED: a reference
> (org.apache.cassandra.utils.concurrent.Ref$State@4d25ad8f) to class
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@896826067:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-73-big
> was not released before the reference was garbage collected
> {quote}
> I scrubbed the sstable with failed validation on cblade1 with nodetool scrub
> perspectiv stock_increment_agg:
> {quote}
> INFO [CompactionExecutor:1704] 2015-09-22 12:05:31,615 OutputHandler.java:42
> - Scrubbing
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
> (345466609 bytes)
> INFO [CompactionExecutor:1703] 2015-09-22 12:05:31,615 OutputHandler.java:42
> - Scrubbing
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
> (60496378 bytes)
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK
> DETECTED: a reference
> (org.apache.cassandra.utils.concurrent.Ref$State@4ca8951e) to class
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@114161559:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-48-big
> was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK
> DETECTED: a reference
> (org.apache.cassandra.utils.concurrent.Ref$State@eeb6383) to class
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1612685364:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big
> was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK
> DETECTED: a reference
> (org.apache.cassandra.utils.concurrent.Ref$State@1de90543) to class
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@2058626950:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-49-big
> was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK
> DETECTED: a reference
> (org.apache.cassandra.utils.concurrent.Ref$State@15616385) to class
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1386628428:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-47-big
> was not released before the reference was garbage collected
> INFO [CompactionExecutor:1703] 2015-09-22 12:05:35,098 OutputHandler.java:42
> - Scrub of
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
> complete: 51397 rows in new sstable and 0 empty (tombstoned) rows dropped
> INFO [CompactionExecutor:1704] 2015-09-22 12:05:47,605 OutputHandler.java:42
> - Scrub of
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
> complete: 292600 rows in new sstable and 0 empty (tombstoned) rows dropped
> {quote}
> Now, after