[ https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108758#comment-15108758 ]
Jeff Gardner commented on CASSANDRA-10389:
------------------------------------------

This is becoming a significant problem in our production environment. Can this be escalated, or is there a known workaround?

> Repair session exception Validation failed
> ------------------------------------------
>
>                 Key: CASSANDRA-10389
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10389
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Debian 8, Java 1.8.0_60, Cassandra 2.2.1 (datastax compilation)
>            Reporter: Jędrzej Sieracki
>             Fix For: 2.2.x
>
>
> I'm running a repair on a ring of nodes that was recently extended from 3 to 13 nodes. The extension was done two days ago; the repair was attempted yesterday.
> {quote}
> [2015-09-22 11:55:55,266] Starting repair command #9, repairing keyspace perspectiv with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], hosts: [], # of ranges: 517)
> [2015-09-22 11:55:58,043] Repair session 1f7c50c0-6110-11e5-b992-9f13fa8664c8 for range (-5927186132136652665,-5917344746039874798] failed with error [repair #1f7c50c0-6110-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]] Validation failed in cblade1.XXX/XXX (progress: 0%)
> {quote}
> BTW, I am ignoring the LEAK errors for now; they are outside the scope of the main issue:
> {quote}
> ERROR [Reference-Reaper:1] 2015-09-22 11:58:27,843 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4d25ad8f) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@896826067:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-73-big was not released before the reference was garbage collected
> {quote}
> I scrubbed the sstable with failed validation on cblade1 with nodetool scrub perspectiv stock_increment_agg:
> {quote}
> INFO [CompactionExecutor:1704] 2015-09-22 12:05:31,615 OutputHandler.java:42 - Scrubbing BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db') (345466609 bytes)
> INFO [CompactionExecutor:1703] 2015-09-22 12:05:31,615 OutputHandler.java:42 - Scrubbing BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db') (60496378 bytes)
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4ca8951e) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@114161559:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-48-big was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@eeb6383) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1612685364:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@1de90543) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@2058626950:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-49-big was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@15616385) to class org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1386628428:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-47-big was not released before the reference was garbage collected
> INFO [CompactionExecutor:1703] 2015-09-22 12:05:35,098 OutputHandler.java:42 - Scrub of BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db') complete: 51397 rows in new sstable and 0 empty (tombstoned) rows dropped
> INFO [CompactionExecutor:1704] 2015-09-22 12:05:47,605 OutputHandler.java:42 - Scrub of BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db') complete: 292600 rows in new sstable and 0 empty (tombstoned) rows dropped
> {quote}
> After scrubbing, I attempted another repair. It did finish, but with lots of errors from other nodes:
> {quote}
> [2015-09-22 12:01:18,020] Repair session db476b51-6110-11e5-b992-9f13fa8664c8 for range (5019296454787813261,5021512586040808168] failed with error [repair #db476b51-6110-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, (5019296454787813261,5021512586040808168]] Validation failed in /10.YYY (progress: 91%)
> [2015-09-22 12:01:18,079] Repair session db482ea1-6110-11e5-b992-9f13fa8664c8 for range (-3660233266780784242,-3638577078894365342] failed with error [repair #db482ea1-6110-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, (-3660233266780784242,-3638577078894365342]] Validation failed in /10.XXX (progress: 92%)
> [2015-09-22 12:01:18,276] Repair session db4a0361-6110-11e5-b992-9f13fa8664c8 for range (9158857758535272856,9167427882441871745] failed with error [repair #db4a0361-6110-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, (9158857758535272856,9167427882441871745]] Validation failed in /10.YYY (progress: 95%)
> {quote}
> After scrubbing stock_increment_agg on all nodes, just to be sure, the repair still failed, this time with the following exception:
> {quote}
> INFO [Repair#16:50] 2015-09-22 12:08:47,471 RepairJob.java:181 - [repair #ea123bf3-6111-11e5-b992-9f13fa8664c8] Requesting merkle trees for stock_increment_agg (to [/10.60.77.202, cblade1.XXX/XXX])
> ERROR [RepairJobTask:1] 2015-09-22 12:08:47,471 RepairSession.java:290 - [repair #ea123bf0-6111-11e5-b992-9f13fa8664c8] Session completed with the following error
> org.apache.cassandra.exceptions.RepairException: [repair #ea123bf0-6111-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, (355657753119264326,366309649129068298]] Validation failed in cblade1.
>         at org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) ~[apache-cassandra-2.2.1.jar:2.2.1]
>         at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-2.2.1.jar:2.2.1]
>         at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:399) ~[apache-cassandra-2.2.1.jar:2.2.1]
>         at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:158) ~[apache-cassandra-2.2.1.jar:2.2.1]
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[apache-cassandra-2.2.1.jar:2.2.1]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> {quote}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)