[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed

2016-06-16 Thread Heiko Sommer (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333412#comment-15333412
 ] 

Heiko Sommer commented on CASSANDRA-10389:
--

I'm getting the same problem with Cassandra 2.2.5, cluster of 6 nodes, RF=2. 
As a workaround I must restart all nodes before running a repair. 

For sure I do not start multiple repairs simultaneously. Here is what happened 
the last time I tried it out: The previous incremental repair ("nodetool repair 
--partitioner-range -- mykeyspace") started on a single node after rolling 
cluster restart finished nicely, with the expected number of "Session completed 
successfully" logs. There were no more repair tasks or anticompaction tasks 
running, the cluster was stable. I restarted C* on 4 nodes, but left it running 
on 2 nodes. On one of the restarted nodes I ran an incremental repair again, 
this time also with the "--sequential" option. 
On the repairing node I get failure logs such as
{noformat}
java.lang.RuntimeException: Could not create snapshot at /10.195.62.171
at 
org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:79)
 ~[apache-cassandra-2.2.5.jar:2.2.5]
ERROR [Repair#1:16] 2016-06-16 07:10:29,239 CassandraDaemon.java:185 - 
Exception in thread Thread[Repair#1:16,5,RMI Runtime]
com.google.common.util.concurrent.UncheckedExecutionException: 
java.lang.RuntimeException: Could not create snapshot at /10.195.62.171
at 
com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1387)
 ~[guava-16.0.jar:na]
{noformat}
while on the failing target nodes (those that were not restarted before the 
repair) I get logs such as
{noformat}
ERROR [AntiEntropyStage:1] 2016-06-16 07:10:29,237 
RepairMessageVerbHandler.java:108 - Cannot start multiple repair sessions over 
the same sstables
{noformat}

Before that, I also tried with full repair, and got the impression that it is 
the same problem for full or incremental repairs. 
As I can reproduce the issue, I would be glad to provide you with more logs or 
some experimenting if that would help resolve the issue. 

> Repair session exception Validation failed
> --
>
> Key: CASSANDRA-10389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10389
> Project: Cassandra
>  Issue Type: Bug
> Environment: Debian 8, Java 1.8.0_60, Cassandra 2.2.1 (datastax 
> compilation)
>Reporter: Jędrzej Sieracki
> Fix For: 2.2.x
>
>
> I'm running a repair on a ring of nodes, that was recently extented from 3 to 
> 13 nodes. The extension was done two days ago, the repair was attempted 
> yesterday.
> {quote}
> [2015-09-22 11:55:55,266] Starting repair command #9, repairing keyspace 
> perspectiv with repair options (parallelism: parallel, primary range: false, 
> incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], 
> hosts: [], # of ranges: 517)
> [2015-09-22 11:55:58,043] Repair session 1f7c50c0-6110-11e5-b992-9f13fa8664c8 
> for range (-5927186132136652665,-5917344746039874798] failed with error 
> [repair #1f7c50c0-6110-11e5-b992-9f13fa8664c8 on 
> perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]] 
> Validation failed in cblade1.XXX/XXX (progress: 0%)
> {quote}
> BTW, I am ignoring the LEAK errors for now, that's outside of the scope of 
> the main issue:
> {quote}
> ERROR [Reference-Reaper:1] 2015-09-22 11:58:27,843 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4d25ad8f) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@896826067:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-73-big
>  was not released before the reference was garbage collected
> {quote}
> I scrubbed the sstable with failed validation on cblade1 with nodetool scrub 
> perspectiv stock_increment_agg:
> {quote}
> INFO  [CompactionExecutor:1704] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
>  (345466609 bytes)
> INFO  [CompactionExecutor:1703] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
>  (60496378 bytes)
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4ca8951e) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@114161559:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-48-big
>  was not released before the reference was garbage collected
> ERROR 

[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed

2016-02-17 Thread Dominik Keil (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151180#comment-15151180
 ] 

Dominik Keil commented on CASSANDRA-10389:
--

I think we're seeing this issue as well. Running Cassandra 2.2.5. Haven't tried 
restarting all nodes but will do that now.

We're running incremental repairs (now default, eh?) and while testing this 
before we put that into production we already found that repairing a whole 
keyspace will create a massive amount of open filehandles / "anti-compacted" 
sstables even though the repair will still only work one CF at a time. This 
caused some problems so we're now running repairs one CF at a time and on only 
one node at a time.

We did not have this issue in our testing but seing it in production now, 
nevertheless. What's interesting is that the node, on which the repair runs, at 
some point suddenly thrashes its heap (i.e. full heap usage, 65%-85% GC!!!) 
while at the same time produces huge amounts of tiny, concurrent reads, leading 
to really bad read latency from disk and a lot of I/O wait.

The bad thing is: This (Cassandra) node becomes so unresponsive that it 
significantly impacts the performance of the whole cluster (a total of 9 
machines, rf 5 / quorum for most reads/writes, rf 2 / one for less important 
bulk data). So neither the java driver nor the other nodes, when being 
coordinator, manage to just leave this node alone for a while. As soon as I 
disable gossip on this node, the rest of the cluster is fine again.

[~slebresne]: I applaud you for your very useful comment.

> Repair session exception Validation failed
> --
>
> Key: CASSANDRA-10389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10389
> Project: Cassandra
>  Issue Type: Bug
> Environment: Debian 8, Java 1.8.0_60, Cassandra 2.2.1 (datastax 
> compilation)
>Reporter: Jędrzej Sieracki
> Fix For: 2.2.x
>
>
> I'm running a repair on a ring of nodes, that was recently extented from 3 to 
> 13 nodes. The extension was done two days ago, the repair was attempted 
> yesterday.
> {quote}
> [2015-09-22 11:55:55,266] Starting repair command #9, repairing keyspace 
> perspectiv with repair options (parallelism: parallel, primary range: false, 
> incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], 
> hosts: [], # of ranges: 517)
> [2015-09-22 11:55:58,043] Repair session 1f7c50c0-6110-11e5-b992-9f13fa8664c8 
> for range (-5927186132136652665,-5917344746039874798] failed with error 
> [repair #1f7c50c0-6110-11e5-b992-9f13fa8664c8 on 
> perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]] 
> Validation failed in cblade1.XXX/XXX (progress: 0%)
> {quote}
> BTW, I am ignoring the LEAK errors for now, that's outside of the scope of 
> the main issue:
> {quote}
> ERROR [Reference-Reaper:1] 2015-09-22 11:58:27,843 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4d25ad8f) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@896826067:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-73-big
>  was not released before the reference was garbage collected
> {quote}
> I scrubbed the sstable with failed validation on cblade1 with nodetool scrub 
> perspectiv stock_increment_agg:
> {quote}
> INFO  [CompactionExecutor:1704] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
>  (345466609 bytes)
> INFO  [CompactionExecutor:1703] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
>  (60496378 bytes)
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4ca8951e) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@114161559:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-48-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@eeb6383) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1612685364:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> 

[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed

2016-01-20 Thread Jeff Gardner (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108643#comment-15108643
 ] 

Jeff Gardner commented on CASSANDRA-10389:
--

We are also experiencing this issue in 2.2.3; and yes we have restarted all 
nodes.

Our config:

Cassandra 2.2.3
AWS west-2 and east-1 regions with 6 nodes per region [2/AZ]

> Repair session exception Validation failed
> --
>
> Key: CASSANDRA-10389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10389
> Project: Cassandra
>  Issue Type: Bug
> Environment: Debian 8, Java 1.8.0_60, Cassandra 2.2.1 (datastax 
> compilation)
>Reporter: Jędrzej Sieracki
> Fix For: 2.2.x
>
>
> I'm running a repair on a ring of nodes, that was recently extented from 3 to 
> 13 nodes. The extension was done two days ago, the repair was attempted 
> yesterday.
> {quote}
> [2015-09-22 11:55:55,266] Starting repair command #9, repairing keyspace 
> perspectiv with repair options (parallelism: parallel, primary range: false, 
> incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], 
> hosts: [], # of ranges: 517)
> [2015-09-22 11:55:58,043] Repair session 1f7c50c0-6110-11e5-b992-9f13fa8664c8 
> for range (-5927186132136652665,-5917344746039874798] failed with error 
> [repair #1f7c50c0-6110-11e5-b992-9f13fa8664c8 on 
> perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]] 
> Validation failed in cblade1.XXX/XXX (progress: 0%)
> {quote}
> BTW, I am ignoring the LEAK errors for now, that's outside of the scope of 
> the main issue:
> {quote}
> ERROR [Reference-Reaper:1] 2015-09-22 11:58:27,843 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4d25ad8f) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@896826067:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-73-big
>  was not released before the reference was garbage collected
> {quote}
> I scrubbed the sstable with failed validation on cblade1 with nodetool scrub 
> perspectiv stock_increment_agg:
> {quote}
> INFO  [CompactionExecutor:1704] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
>  (345466609 bytes)
> INFO  [CompactionExecutor:1703] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
>  (60496378 bytes)
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4ca8951e) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@114161559:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-48-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@eeb6383) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1612685364:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@1de90543) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@2058626950:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-49-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@15616385) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1386628428:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-47-big
>  was not released before the reference was garbage collected
> INFO  [CompactionExecutor:1703] 2015-09-22 12:05:35,098 OutputHandler.java:42 
> - Scrub of 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
>  complete: 51397 rows in new sstable and 0 empty (tombstoned) rows dropped
> INFO  [CompactionExecutor:1704] 2015-09-22 12:05:47,605 OutputHandler.java:42 
> - Scrub of 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
>  complete: 292600 rows in new 

[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed

2016-01-20 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108763#comment-15108763
 ] 

Sylvain Lebresne commented on CASSANDRA-10389:
--

bq. is there a known work around?

yes, stop starting multiple competing repair simultaneously.

> Repair session exception Validation failed
> --
>
> Key: CASSANDRA-10389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10389
> Project: Cassandra
>  Issue Type: Bug
> Environment: Debian 8, Java 1.8.0_60, Cassandra 2.2.1 (datastax 
> compilation)
>Reporter: Jędrzej Sieracki
> Fix For: 2.2.x
>
>
> I'm running a repair on a ring of nodes, that was recently extented from 3 to 
> 13 nodes. The extension was done two days ago, the repair was attempted 
> yesterday.
> {quote}
> [2015-09-22 11:55:55,266] Starting repair command #9, repairing keyspace 
> perspectiv with repair options (parallelism: parallel, primary range: false, 
> incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], 
> hosts: [], # of ranges: 517)
> [2015-09-22 11:55:58,043] Repair session 1f7c50c0-6110-11e5-b992-9f13fa8664c8 
> for range (-5927186132136652665,-5917344746039874798] failed with error 
> [repair #1f7c50c0-6110-11e5-b992-9f13fa8664c8 on 
> perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]] 
> Validation failed in cblade1.XXX/XXX (progress: 0%)
> {quote}
> BTW, I am ignoring the LEAK errors for now, that's outside of the scope of 
> the main issue:
> {quote}
> ERROR [Reference-Reaper:1] 2015-09-22 11:58:27,843 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4d25ad8f) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@896826067:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-73-big
>  was not released before the reference was garbage collected
> {quote}
> I scrubbed the sstable with failed validation on cblade1 with nodetool scrub 
> perspectiv stock_increment_agg:
> {quote}
> INFO  [CompactionExecutor:1704] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
>  (345466609 bytes)
> INFO  [CompactionExecutor:1703] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
>  (60496378 bytes)
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4ca8951e) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@114161559:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-48-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@eeb6383) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1612685364:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@1de90543) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@2058626950:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-49-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@15616385) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1386628428:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-47-big
>  was not released before the reference was garbage collected
> INFO  [CompactionExecutor:1703] 2015-09-22 12:05:35,098 OutputHandler.java:42 
> - Scrub of 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
>  complete: 51397 rows in new sstable and 0 empty (tombstoned) rows dropped
> INFO  [CompactionExecutor:1704] 2015-09-22 12:05:47,605 OutputHandler.java:42 
> - Scrub of 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
>  complete: 292600 rows in new sstable and 0 empty (tombstoned) rows dropped
> {quote}
> Now, after 

[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed

2016-01-20 Thread Jeff Gardner (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15108758#comment-15108758
 ] 

Jeff Gardner commented on CASSANDRA-10389:
--

this is becoming a significant problem in our production environment... can 
this be escalated or is there a known work around?


> Repair session exception Validation failed
> --
>
> Key: CASSANDRA-10389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10389
> Project: Cassandra
>  Issue Type: Bug
> Environment: Debian 8, Java 1.8.0_60, Cassandra 2.2.1 (datastax 
> compilation)
>Reporter: Jędrzej Sieracki
> Fix For: 2.2.x
>
>
> I'm running a repair on a ring of nodes, that was recently extented from 3 to 
> 13 nodes. The extension was done two days ago, the repair was attempted 
> yesterday.
> {quote}
> [2015-09-22 11:55:55,266] Starting repair command #9, repairing keyspace 
> perspectiv with repair options (parallelism: parallel, primary range: false, 
> incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], 
> hosts: [], # of ranges: 517)
> [2015-09-22 11:55:58,043] Repair session 1f7c50c0-6110-11e5-b992-9f13fa8664c8 
> for range (-5927186132136652665,-5917344746039874798] failed with error 
> [repair #1f7c50c0-6110-11e5-b992-9f13fa8664c8 on 
> perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]] 
> Validation failed in cblade1.XXX/XXX (progress: 0%)
> {quote}
> BTW, I am ignoring the LEAK errors for now, that's outside of the scope of 
> the main issue:
> {quote}
> ERROR [Reference-Reaper:1] 2015-09-22 11:58:27,843 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4d25ad8f) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@896826067:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-73-big
>  was not released before the reference was garbage collected
> {quote}
> I scrubbed the sstable with failed validation on cblade1 with nodetool scrub 
> perspectiv stock_increment_agg:
> {quote}
> INFO  [CompactionExecutor:1704] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
>  (345466609 bytes)
> INFO  [CompactionExecutor:1703] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
>  (60496378 bytes)
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4ca8951e) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@114161559:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-48-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@eeb6383) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1612685364:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@1de90543) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@2058626950:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-49-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@15616385) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1386628428:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-47-big
>  was not released before the reference was garbage collected
> INFO  [CompactionExecutor:1703] 2015-09-22 12:05:35,098 OutputHandler.java:42 
> - Scrub of 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
>  complete: 51397 rows in new sstable and 0 empty (tombstoned) rows dropped
> INFO  [CompactionExecutor:1704] 2015-09-22 12:05:47,605 OutputHandler.java:42 
> - Scrub of 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
>  complete: 292600 rows in new sstable and 0 empty (tombstoned) rows dropped
> 

[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed

2016-01-17 Thread Kai Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103940#comment-15103940
 ] 

Kai Wang commented on CASSANDRA-10389:
--

[~jedrzej.sieracki] not sure if you still have this problem. But have you tried 
to restart all the C* instances?

> Repair session exception Validation failed
> --
>
> Key: CASSANDRA-10389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10389
> Project: Cassandra
>  Issue Type: Bug
> Environment: Debian 8, Java 1.8.0_60, Cassandra 2.2.1 (datastax 
> compilation)
>Reporter: Jędrzej Sieracki
> Fix For: 2.2.x
>
>
> I'm running a repair on a ring of nodes, that was recently extented from 3 to 
> 13 nodes. The extension was done two days ago, the repair was attempted 
> yesterday.
> {quote}
> [2015-09-22 11:55:55,266] Starting repair command #9, repairing keyspace 
> perspectiv with repair options (parallelism: parallel, primary range: false, 
> incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], 
> hosts: [], # of ranges: 517)
> [2015-09-22 11:55:58,043] Repair session 1f7c50c0-6110-11e5-b992-9f13fa8664c8 
> for range (-5927186132136652665,-5917344746039874798] failed with error 
> [repair #1f7c50c0-6110-11e5-b992-9f13fa8664c8 on 
> perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]] 
> Validation failed in cblade1.XXX/XXX (progress: 0%)
> {quote}
> BTW, I am ignoring the LEAK errors for now, that's outside of the scope of 
> the main issue:
> {quote}
> ERROR [Reference-Reaper:1] 2015-09-22 11:58:27,843 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4d25ad8f) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@896826067:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-73-big
>  was not released before the reference was garbage collected
> {quote}
> I scrubbed the sstable with failed validation on cblade1 with nodetool scrub 
> perspectiv stock_increment_agg:
> {quote}
> INFO  [CompactionExecutor:1704] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
>  (345466609 bytes)
> INFO  [CompactionExecutor:1703] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
>  (60496378 bytes)
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4ca8951e) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@114161559:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-48-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@eeb6383) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1612685364:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@1de90543) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@2058626950:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-49-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@15616385) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1386628428:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-47-big
>  was not released before the reference was garbage collected
> INFO  [CompactionExecutor:1703] 2015-09-22 12:05:35,098 OutputHandler.java:42 
> - Scrub of 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
>  complete: 51397 rows in new sstable and 0 empty (tombstoned) rows dropped
> INFO  [CompactionExecutor:1704] 2015-09-22 12:05:47,605 OutputHandler.java:42 
> - Scrub of 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
>  complete: 292600 rows in new sstable and 0 empty (tombstoned) rows dropped
> {quote}
> Now, after 

[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed

2015-09-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14906018#comment-14906018
 ] 

Jędrzej Sieracki commented on CASSANDRA-10389:
--

After checking the logs more thoroughly, the issue seems to be "Cannot start 
multiple repair sessions over the same sstables".

The interesting log portions from repair session run on cblade1:

{quote}
INFO  [Repair#24:1] 2015-09-24 09:58:37,480 RepairJob.java:107 - [repair 
#0fc98340-6292-11e5-b992-9f13fa8664c8] requesting merkle trees for 
stock_increment_agg (to [/cblade10, cblade1])
INFO  [Repair#24:1] 2015-09-24 09:58:37,480 RepairJob.java:181 - [repair 
#0fc98340-6292-11e5-b992-9f13fa8664c8] Requesting merkle trees for 
stock_increment_agg (to [/cblade10, cblade1])
ERROR [ValidationExecutor:28] 2015-09-24 09:58:37,481 
CompactionManager.java:1070 - Cannot start multiple repair sessions over the 
same sstables
ERROR [ValidationExecutor:28] 2015-09-24 09:58:37,481 Validator.java:246 - 
Failed creating a merkle tree for [repair #0fc98340-6292-11e5-b992-9f13fa8664c8 
on perspectiv/stock_increment_agg, 
(-5927186132136652665,-5917344746039874798]], /cblade1(see log for details)
INFO  [AntiEntropyStage:1] 2015-09-24 09:58:37,481 RepairSession.java:181 - 
[repair #0fc98340-6292-11e5-b992-9f13fa8664c8] Received merkle tree for 
stock_increment_agg from /cblade1
ERROR [ValidationExecutor:28] 2015-09-24 09:58:37,481 CassandraDaemon.java:183 
- Exception in thread Thread[ValidationExecutor:28,1,main]
java.lang.RuntimeException: Cannot start multiple repair sessions over the same 
sstables
at 
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:1071)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
at 
org.apache.cassandra.db.compaction.CompactionManager.access$700(CompactionManager.java:94)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
at 
org.apache.cassandra.db.compaction.CompactionManager$10.call(CompactionManager.java:669)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[na:1.8.0_60]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_60]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_60]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
WARN  [RepairJobTask:1] 2015-09-24 09:58:37,481 RepairJob.java:162 - [repair 
#0fc98340-6292-11e5-b992-9f13fa8664c8] stock_increment_agg sync failed
ERROR [RepairJobTask:2] 2015-09-24 09:58:37,482 CassandraDaemon.java:183 - 
Exception in thread Thread[RepairJobTask:2,5,RMI Runtime]
org.apache.cassandra.exceptions.RepairException: [repair 
#0fc98340-6292-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, 
(-5927186132136652665,-5917344746039874798]] Validation failed in 
cblade1.dforcom.localdomain/cblade1
at 
org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) 
~[apache-cassandra-2.2.1.jar:2.2.1]
at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:399)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:158)
 ~[apache-cassandra-2.2.1.jar:2.2.1]
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
~[apache-cassandra-2.2.1.jar:2.2.1]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[na:1.8.0_60]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_60]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
INFO  [Repair#24:2] 2015-09-24 09:58:37,482 RepairJob.java:107 - [repair 
#0fc98340-6292-11e5-b992-9f13fa8664c8] requesting merkle trees for 
receipt_agg_total (to [/cblade10, cblade1.dforcom.localdomain/cblade1])
ERROR [Repair#24:1] 2015-09-24 09:58:37,482 CassandraDaemon.java:183 - 
Exception in thread Thread[Repair#24:1,5,RMI Runtime]
com.google.common.util.concurrent.UncheckedExecutionException: 
org.apache.cassandra.exceptions.RepairException: [repair 
#0fc98340-6292-11e5-b992-9f13fa8664c8 on perspectiv/stock_increment_agg, 
(-5927186132136652665,-5917344746039874798]] Validation failed in 
cblade1.dforcom.localdomain/cblade1
at 
com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1387)
 ~[guava-16.0.jar:na]
at 
com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1373) 
~[guava-16.0.jar:na]
at org.apache.cassandra.repair.RepairJob.run(RepairJob.java:169) 
~[apache-cassandra-2.2.1.jar:2.2.1]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 

[jira] [Commented] (CASSANDRA-10389) Repair session exception Validation failed

2015-09-23 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14904474#comment-14904474
 ] 

Yuki Morishita commented on CASSANDRA-10389:


What kind of error do you see in replica nodes (in cblade1 or other nodes that 
failed to validate)?

> Repair session exception Validation failed
> --
>
> Key: CASSANDRA-10389
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10389
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: Debian 8, Java 1.8.0_60, Cassandra 2.2.1 (datastax 
> compilation)
>Reporter: Jędrzej Sieracki
>
> I'm running a repair on a ring of nodes, that was recently extented from 3 to 
> 13 nodes. The extension was done two days ago, the repair was attempted 
> yesterday.
> {quote}
> [2015-09-22 11:55:55,266] Starting repair command #9, repairing keyspace 
> perspectiv with repair options (parallelism: parallel, primary range: false, 
> incremental: true, job threads: 1, ColumnFamilies: [], dataCenters: [], 
> hosts: [], # of ranges: 517)
> [2015-09-22 11:55:58,043] Repair session 1f7c50c0-6110-11e5-b992-9f13fa8664c8 
> for range (-5927186132136652665,-5917344746039874798] failed with error 
> [repair #1f7c50c0-6110-11e5-b992-9f13fa8664c8 on 
> perspectiv/stock_increment_agg, (-5927186132136652665,-5917344746039874798]] 
> Validation failed in cblade1.XXX/XXX (progress: 0%)
> {quote}
> BTW, I am ignoring the LEAK errors for now, that's outside of the scope of 
> the main issue:
> {quote}
> ERROR [Reference-Reaper:1] 2015-09-22 11:58:27,843 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4d25ad8f) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@896826067:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-73-big
>  was not released before the reference was garbage collected
> {quote}
> I scrubbed the sstable with failed validation on cblade1 with nodetool scrub 
> perspectiv stock_increment_agg:
> {quote}
> INFO  [CompactionExecutor:1704] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
>  (345466609 bytes)
> INFO  [CompactionExecutor:1703] 2015-09-22 12:05:31,615 OutputHandler.java:42 
> - Scrubbing 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
>  (60496378 bytes)
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4ca8951e) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@114161559:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-48-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@eeb6383) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1612685364:/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@1de90543) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@2058626950:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-49-big
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2015-09-22 12:05:31,676 Ref.java:187 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@15616385) to class 
> org.apache.cassandra.io.sstable.format.SSTableReader$InstanceTidier@1386628428:/var/lib/cassandra/data/perspectiv/receipt_agg_total-76abb0625de711e59f6e0b7d98a25b6e/la-47-big
>  was not released before the reference was garbage collected
> INFO  [CompactionExecutor:1703] 2015-09-22 12:05:35,098 OutputHandler.java:42 
> - Scrub of 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-82-big-Data.db')
>  complete: 51397 rows in new sstable and 0 empty (tombstoned) rows dropped
> INFO  [CompactionExecutor:1704] 2015-09-22 12:05:47,605 OutputHandler.java:42 
> - Scrub of 
> BigTableReader(path='/var/lib/cassandra/data/perspectiv/stock_increment_agg-840cad405de711e5b9929f13fa8664c8/la-83-big-Data.db')
>  complete: 292600 rows in new sstable and 0 empty (tombstoned) rows dropped
> {quote}
> Now, after