[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-07-04 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406844#comment-13406844
 ] 

Anton Winter commented on CASSANDRA-4321:
-

New issue raised as requested: CASSANDRA-4411

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.1.1
> Reporter: Anton Winter
> Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 0001-Fix-overlapping-computation-v7.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v7.txt, 
> 0003-Create-standalone-scrub-v7.txt, 
> 0004-Add-manifest-integrity-check-v7.txt, cleanup.txt, 
> ooyala-hastur-stacktrace.txt
>
>
> After upgrading to 1.1.1 (from 1.1.0) I have started experiencing 
> StackOverflowErrors, resulting in a compaction backlog and failure to restart. 
> The ring currently consists of 6 DCs and 22 nodes using LCS & compression. 
> This issue was first noted on 2 nodes in one DC and then appears to have 
> spread to various other nodes in the other DCs.
>
> When the first occurrence of this was found I restarted the instance, but it 
> failed to start, so I cleared its data and treated it as a replacement node 
> for the token it was previously responsible for. This node successfully 
> streamed all the relevant data back but failed again a number of hours later 
> with the same StackOverflowError and again was unable to restart.
>
> The initial stack overflow error on a running instance looks like this:
> ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:314,1,main]
> java.lang.StackOverflowError
>     at java.util.Arrays.mergeSort(Arrays.java:1157)
>     at java.util.Arrays.sort(Arrays.java:1092)
>     at java.util.Collections.sort(Collections.java:134)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:49)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> [snip - this repeats until stack overflow. Compactions stop from this point onwards]
>
> I restarted this failing instance with DEBUG logging enabled and it throws 
> the following exception part way through startup:
>
> ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main]
> java.lang.StackOverflowError
>     at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307)
>     at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276)
>     at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230)
>     at org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124)
>     at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:45)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> [snip - this repeats until stack overflow]
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>     at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>     at org.apache.cassandra.utils.IntervalTree.IntervalTree.<init>(IntervalTree.java:39)
>     at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:560)
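For readers following the trace: the repeated IntervalNode constructor frames are what a recursive, centered interval-tree build looks like on the stack. The sketch below is not Cassandra's code; IntervalTreeSketch, Interval, Node and build are hypothetical names, and the failure mode shown is an assumption about how such a build can fail to terminate, not a confirmed diagnosis of this ticket. With well-formed intervals (min <= max) each level of recursion keeps at least one interval at the current node, so the recursion ends. An inverted interval such as [10, 1] falls entirely into the left half again and again, which would recurse until the stack overflows, so the sketch rejects such input up front.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for a centered interval tree; not Cassandra's classes.
final class IntervalTreeSketch {
    static final class Interval {
        final long min, max;
        Interval(long min, long max) { this.min = min; this.max = max; }
    }

    static final class Node {
        final long center;                                // median of all endpoints at this level
        final List<Interval> here = new ArrayList<>();    // intervals that contain the center
        Node left, right;
        Node(long center) { this.center = center; }
    }

    // Validates the input first: an inverted interval would make bisect() recurse forever.
    static Node build(List<Interval> intervals) {
        for (Interval i : intervals) {
            if (i.min > i.max) {
                throw new IllegalArgumentException("inverted interval: [" + i.min + ", " + i.max + "]");
            }
        }
        return bisect(intervals);
    }

    private static Node bisect(List<Interval> toBisect) {
        if (toBisect.isEmpty()) {
            return null;
        }
        // Collect and sort every endpoint, then split on the median endpoint.
        List<Long> endpoints = new ArrayList<>();
        for (Interval i : toBisect) {
            endpoints.add(i.min);
            endpoints.add(i.max);
        }
        Collections.sort(endpoints);
        long center = endpoints.get(endpoints.size() / 2);

        Node node = new Node(center);
        List<Interval> left = new ArrayList<>();
        List<Interval> right = new ArrayList<>();
        for (Interval i : toBisect) {
            if (i.max < center) {
                left.add(i);          // entirely below the center
            } else if (i.min > center) {
                right.add(i);         // entirely above the center
            } else {
                node.here.add(i);     // contains the center, so it stays at this node
            }
        }
        // For valid input the interval owning the median endpoint stays in 'here',
        // so both child lists are strictly smaller and the recursion terminates.
        node.left = bisect(left);
        node.right = bisect(right);
        return node;
    }
}
```

A check of this kind only turns a stack overflow into an immediate, descriptive failure; it does not repair whatever produced the bad interval.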

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-07-04 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406395#comment-13406395
 ] 

Sylvain Lebresne commented on CASSANDRA-4321:
-

Damn. Ok, since this has been released with 1.1.2 already, would you mind 
opening a new one?


[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-07-03 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406254#comment-13406254
 ] 

Anton Winter commented on CASSANDRA-4321:
-

I have repeatedly run sstablescrub across all my nodes, and the exceptions do 
not occur as frequently now; however, the integrity check still throws 
exceptions. compactionstats shows a large number of pending tasks but no 
progress after this error.

Should this ticket be reopened or a new one raised?

{code}
ERROR [CompactionExecutor:912] 2012-07-04 01:07:16,470 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:912,1,main]
java.lang.AssertionError
    at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
    at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
    at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
    at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
    at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:978)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
{code}



[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-29 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404338#comment-13404338
 ] 

Anton Winter commented on CASSANDRA-4321:
-

1.1 dev branch + patches


[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-29 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404336#comment-13404336
 ] 

Jonathan Ellis commented on CASSANDRA-4321:
---

To clarify, is this 1.1.1 release + patches, or 1.1 dev branch + patches?


[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-29 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404333#comment-13404333
 ] 

Anton Winter commented on CASSANDRA-4321:
-

Maybe I spoke too soon.  Overnight I've seen the exceptions happen again on 
nodes that were v7 patched & scrubbed.  

{code}
ERROR [CompactionExecutor:1301] 2012-06-29 21:54:12,078 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:1301,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(116816802911061669023614481109871014436, 4faa631ca88ef85b8e26ddeb) >= current key DecoratedKey(115179899219377463875853982254751557438, 4fa892bf42d3f24479f627b6) writing into /var/lib//data/cassandra/KS/CF/KS-CF-tmp-hd-837655-Data.db
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
{code}
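
The "Last written key >= current key" message reflects an ordering invariant: rows must be appended to an sstable in strictly increasing key order. The following is a minimal sketch of that kind of check, not the actual SSTableWriter code; OrderedAppendSketch and its members are hypothetical names, keys are reduced to their token only, and IllegalStateException stands in for the RuntimeException seen in the log.

```java
import java.math.BigInteger;

// Hypothetical illustration of a strictly-increasing append check; not Cassandra's SSTableWriter.
final class OrderedAppendSketch {
    private BigInteger lastWrittenToken; // null until the first append
    private int appended;

    // Rejects any token that is not strictly greater than the previously written one.
    void append(BigInteger token) {
        if (lastWrittenToken != null && lastWrittenToken.compareTo(token) >= 0) {
            throw new IllegalStateException(
                "Last written key " + lastWrittenToken + " >= current key " + token);
        }
        lastWrittenToken = token;
        appended++;
    }

    int appendedCount() {
        return appended;
    }
}
```

Feeding it the two tokens from the log in the order they appear (116816802911061669023614481109871014436, then 115179899219377463875853982254751557438) trips the check, because the second token is smaller than the first.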


[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-29 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403764#comment-13403764
 ] 

Anton Winter commented on CASSANDRA-4321:
-

I've applied the v7 patches and have successfully offline scrubbed & reinserted 
a number of nodes in my ring without further occurrence of the previous issues. 
 Thanks :)


[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-28 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402995#comment-13402995
 ] 

Sylvain Lebresne commented on CASSANDRA-4321:
-

bq. CompactionsTest.testBlacklistingWithLeveledCompactionStrategy is currently 
failing in trunk because of a similar integrity check that I added to promote()

That's in fact a bug in the integrity check: it should be skipped if 
newLevel=0 (since we can have newLevel=0 following CASSANDRA-4341). With that 
fixed, the unit test no longer fails (I've committed the fix as 
4725bf71e18550ac60f9).
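
For the ticket's reference, here is a minimal, self-contained sketch of what such a check with a level-0 skip could look like. It is not the committed code: KeySpan and LevelIntegrityCheck are hypothetical stand-ins (plain int keys instead of DecoratedKey, no SSTableReader), and it only assumes that sstables in level 0 are allowed to overlap while higher levels are not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an sstable: only its first and last keys matter here.
final class KeySpan
{
    final int first;
    final int last;

    KeySpan(int first, int last)
    {
        this.first = first;
        this.last = last;
    }
}

final class LevelIntegrityCheck
{
    // Returns true if the level passes the check.
    // Level 0 is skipped: its spans are allowed to overlap.
    static boolean isConsistent(int newLevel, List<KeySpan> level)
    {
        if (newLevel == 0)
            return true;

        // Sort a copy by first key so neighbours can be compared pairwise.
        List<KeySpan> sorted = new ArrayList<>(level);
        sorted.sort(Comparator.comparingInt((KeySpan s) -> s.first));

        KeySpan previous = null;
        for (KeySpan span : sorted)
        {
            // Each span must start strictly after the previous one ends.
            if (previous != null && span.first <= previous.last)
                return false;
            previous = span;
        }
        return true;
    }

    public static void main(String[] args)
    {
        List<KeySpan> overlapping = Arrays.asList(new KeySpan(1, 5), new KeySpan(4, 9));
        System.out.println("level 0: " + isConsistent(0, overlapping));
        System.out.println("level 1: " + isConsistent(1, overlapping));
    }
}
```

The sketch returns a boolean rather than using assert so that it behaves the same whether or not assertions are enabled in the JVM.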

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v6.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v6.txt, 
> 0003-Create-standalone-scrub-v6.txt, 
> 0004-Add-manifest-integrity-check-v6.txt, ooyala-hastur-stacktrace.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-27 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402518#comment-13402518
 ] 

Jonathan Ellis commented on CASSANDRA-4321:
---

CompactionsTest.testBlacklistingWithLeveledCompactionStrategy is currently 
failing in trunk because of a similar integrity check that I added to promote() 
for CASSANDRA-1991:

{code}
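// Integrity check: once sorted, sstables within a level must not overlap.
DecoratedKey last = null;
Collections.sort(generations[newLevel], SSTable.sstableComparator);
for (SSTableReader sstable : generations[newLevel])
{
    // each sstable must start strictly after the previous one ends
    assert last == null || sstable.first.compareTo(last) > 0;
    last = sstable.last;
}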
{code}

Patch 0001 does not fix that test failure.

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v6.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v6.txt, 
> 0003-Create-standalone-scrub-v6.txt, 
> 0004-Add-manifest-integrity-check-v6.txt, ooyala-hastur-stacktrace.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-26 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401496#comment-13401496
 ] 

Sylvain Lebresne commented on CASSANDRA-4321:
-

Hum, yes that's possible. I figured this wouldn't be the problem since they 
were using 1.1.1 but you are right that if they tried against the 1.1 branch 
that could have been it. Worth checking on current 1.1 branch I guess.

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v6.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v6.txt, 
> 0003-Create-standalone-scrub-v6.txt, 
> 0004-Add-manifest-integrity-check-v6.txt, ooyala-hastur-stacktrace.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-26 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401494#comment-13401494
 ] 

Jonathan Ellis commented on CASSANDRA-4321:
---

Could the CASSANDRA-4341 regression have caused what Anton saw, if he was 
running from the 1.1 branch?

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v6.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v6.txt, 
> 0003-Create-standalone-scrub-v6.txt, 
> 0004-Add-manifest-integrity-check-v6.txt, ooyala-hastur-stacktrace.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-22 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399760#comment-13399760
 ] 

Anton Winter commented on CASSANDRA-4321:
-

bq. Was I lucky? Are you guys able to reproduce those steps and still get more 
errors?

As discussed, but repeated here just for the ticket's reference: I was patching 
and scrubbing in the same way as described above.  Once the scrubbed nodes were 
restarted in the cluster they were under normal read/write load and 
experienced the exceptions again.  The fact that sstablescrub and the 
subsequent compactions run fine in Sylvain's test, using my out-of-order 
sstables, suggests that the sstablescrub command does its job.  The root 
cause, originally expected to be resolved with the 0001 patch, still appears 
to be occurring, so Sylvain was going to investigate the code further.

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt, 
> 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-22 Thread Omid Aladini (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399299#comment-13399299
 ] 

Omid Aladini commented on CASSANDRA-4321:
-

I had offline-scrubbed the live Cassandra node and copied the sstables that 
participated in one of the failed compactions. Assuming those sstables had 
already been offline-scrubbed, I skipped step 2 above locally, so unfortunately 
I can't yet reproduce it locally with a limited set of data.

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt, 
> 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-22 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399268#comment-13399268
 ] 

Sylvain Lebresne commented on CASSANDRA-4321:
-

bq. which leads me to suspect if it's due to the "Range" (vs. Bounds) used in 
LeveledCompactionStrategy::getScanners. Any ideas?

No, that Range is correct because it corresponds to repair ranges and is 
correctly interpreted. In fact, when doing compaction that range is 
actually null, so that cannot be the problem.
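
For readers following the Range vs. Bounds distinction, here is a small sketch of the difference. It assumes the usual Cassandra convention that a Range excludes its left endpoint and includes its right one, while Bounds includes both; the class IntervalKinds and the plain int tokens are hypothetical simplifications (no wrap-around, no real Token type).

```java
final class IntervalKinds
{
    // Range-like interval (left, right]: excludes left, includes right.
    static boolean rangeContains(int left, int right, int token)
    {
        return token > left && token <= right;
    }

    // Bounds-like interval [left, right]: includes both endpoints.
    static boolean boundsContains(int left, int right, int token)
    {
        return token >= left && token <= right;
    }

    public static void main(String[] args)
    {
        // A key sitting exactly on the left endpoint is the case where the two differ.
        System.out.println("range contains 10:  " + rangeContains(10, 20, 10));
        System.out.println("bounds contains 10: " + boundsContains(10, 20, 10));
    }
}
```

The two only disagree for a key that sits exactly on the left endpoint, which is why the choice matters when the interval is built from an sstable's first and last keys rather than from repair ranges.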


So Anton sent me some sstables that triggered an out-of-order exception when 
compacting them. What I did with them is:
1) Applied the last version of the v5 patch on this issue on top of the current 
git cassandra-1.1 branch (using the 1.1.1 release would probably work too, 
because I don't think any other related fix has gone into the git branch since 
1.1.1, but anyway, that's what I used).
2) Ran the offline scrub *while the node was stopped*. I insist on that last 
part because having the node running during the offline scrub would mess 
things up. I'd actually like to make the offline scrub check whether the node 
is running and refuse to work in that case, but I'm not sure what the best 
way to do that is.
3) Restarted the node once the scrub was done.

I was then able to compact the node fully (i.e., I ran compaction until there 
was nothing more to compact) without hitting any more errors.

Was I lucky? Are you guys able to reproduce those steps and still get more 
errors?
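
To make "out-of-order" concrete for anyone reading the ticket later, here is a rough sketch of how misplaced keys can be spotted while scanning rows in on-disk order. It is not the code from the 0002 patch: OutOfOrderScan is a hypothetical name, and plain Integer keys stand in for decorated row keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OutOfOrderScan
{
    // Scans keys in on-disk order and returns those that are not strictly
    // greater than the last key that was accepted as in order.
    static List<Integer> outOfOrder(List<Integer> keys)
    {
        List<Integer> misplaced = new ArrayList<>();
        Integer lastInOrder = null;
        for (Integer key : keys)
        {
            if (lastInOrder != null && key.compareTo(lastInOrder) <= 0)
                misplaced.add(key);   // set aside; lastInOrder is not advanced
            else
                lastInOrder = key;
        }
        return misplaced;
    }

    public static void main(String[] args)
    {
        System.out.println(outOfOrder(Arrays.asList(1, 3, 2, 5, 4, 6)));
    }
}
```

In this sketch the misplaced keys are only collected; a repair step would still have to rewrite them in sorted position, which is outside what the example covers.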

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt, 
> 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt
>
>
> After upgrading to 1.1.1 (from 1.1.0) I have started experiencing 
> StackOverflowError's resulting in compaction backlog and failure to restart. 
> The ring currently consists of 6 DC's and 22 nodes using LCS & compression.  
> This issue was first noted on 2 nodes in one DC and then appears to have 
> spread to various other nodes in the other DC's.  
> When the first occurrence of this was found I restarted the instance but it 
> failed to start so I cleared its data and treated it as a replacement node 
> for the token it was previously responsible for.  This node successfully 
> streamed all the relevant data back but failed again a number of hours later 
> with the same StackOverflowError and again was unable to restart. 
> The initial stack overflow error on a running instance looks like this:
> ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 
> AbstractCassandraDaemon.java (line 134) Exception in thread 
> Thread[CompactionExecutor:314,1,main]
> java.lang.StackOverflowError
> at java.util.Arrays.mergeSort(Arrays.java:1157)
> at java.util.Arrays.sort(Arrays.java:1092)
> at java.util.Collections.sort(Collections.java:134)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:49)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> [snip - this repeats until stack overflow.  Compactions stop from this point 
> onwards]
> I restarted this failing instance with DEBUG logging enabled and it throws 
> the following exception part way through startup:
> ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main]
> java.lang.StackOverflowError
> at 
> org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307)
> at 
> org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276)
> at 
> org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230)
> at 
> org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124)
> at 
> org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:45)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> at 
> org.apache.cassandra.util

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-21 Thread Omid Aladini (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398563#comment-13398563
 ] 

Omid Aladini commented on CASSANDRA-4321:
-

I experienced the same as Anton. One observation is that the out-of-order key 
wrongly iterated by CompactionIterable's MergeIterator, which causes the 
exception, happens to be the start of an interval:

DEBUG 18:10:41,693 Creating IntervalNode from [... 
Interval(DecoratedKey(33736808147257072541807562080745136438, ... ), 

which leads me to suspect that it's due to the "Range" (vs. Bounds) used in 
LeveledCompactionStrategy::getScanners. Any ideas?
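
To illustrate the Range vs. Bounds suspicion: a Range is start-exclusive, 
(left, right], while Bounds are inclusive on both ends, [left, right]. The 
sketch below uses plain long values in place of tokens and ignores ring 
wrap-around; RangeVsBounds and its methods are made-up names for 
illustration, not Cassandra's classes. It shows how two intervals that share 
a boundary key are reported as disjoint when treated as Ranges.

```java
// Illustrative only: long values stand in for tokens, no wrap-around.
class RangeVsBounds {
    // Start-exclusive intervals (l, r]: they share a point only if the
    // larger left end is strictly below the smaller right end.
    static boolean rangesIntersect(long l1, long r1, long l2, long r2) {
        return l1 < r2 && l2 < r1;
    }

    // Inclusive intervals [l, r]: touching end points count as overlap.
    static boolean boundsIntersect(long l1, long r1, long l2, long r2) {
        return l1 <= r2 && l2 <= r1;
    }

    public static void main(String[] args) {
        // One sstable covers keys 3..5, another covers 5..10; both hold key 5.
        System.out.println(rangesIntersect(3, 5, 5, 10)); // false: overlap missed
        System.out.println(boundsIntersect(3, 5, 5, 10)); // true
    }
}
```

If the first key of an sstable is dropped from the interval this way, an 
overlap on exactly that key goes unnoticed, which would match the out-of-order 
key being the start of an interval.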

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-21 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398326#comment-13398326
 ] 

Sylvain Lebresne commented on CASSANDRA-4321:
-

bq. The above error is due to the %d in StandaloneScrubber.java:179's 
interpolation.

Yes, sorry for that typo. I've updated the v5 patch to fix it.

bq. Got "java.lang.OutOfMemoryError: Java heap space" with -Xmx256M

The latest version of the offline scrub "loads" all sstable readers, which 
means in particular that it loads the summary of the key index and the sstable 
bloom filter. In other words, it does use a bit more memory, so it's not 
necessarily surprising that -Xmx256M is not enough.

bq. LeveledCompactionStrategyTest:testValidationMultipleSSTablePerLevel fails 
because of junit timeout when I run it together with all other suites, but 
passes when I only run LeveledCompactionStrategyTest suite. Is it related?

I doubt it. I've already seen tests time out when run with the full suite but 
not alone. I wouldn't worry too much about that. At least that test is working 
fine on my machine.

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-20 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398206#comment-13398206
 ] 

Anton Winter commented on CASSANDRA-4321:
-

After working around the issue with the 0003 v5 patch that Omid refers to, I've 
had an sstablescrub complete on one of my servers.  sstablescrub did detect 
several overlapping sstables, resetting them to L0, but no out-of-order keys.
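
For readers unfamiliar with what "detecting overlapping sstables and 
resetting them to L0" amounts to, here is a rough sketch of such a check. It 
is not the code from the attached patches: LevelOverlapCheck, Table and 
findOverlapping are hypothetical names, and long values stand in for keys. 
Within a level above L0 the sstables are expected to cover disjoint key 
intervals, so any sstable whose first key is not past the previous sstable's 
last key is flagged.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not the actual manifest check from the patches.
class LevelOverlapCheck {
    static final class Table {
        final String name;
        final long first; // first key in the sstable (inclusive)
        final long last;  // last key in the sstable (inclusive)

        Table(String name, long first, long last) {
            this.name = name;
            this.first = first;
            this.last = last;
        }
    }

    // Returns the names of sstables that overlap the previously kept one
    // when the level is walked in order of first key.
    static List<String> findOverlapping(List<Table> level) {
        List<Table> sorted = new ArrayList<>(level);
        sorted.sort(Comparator.comparingLong((Table t) -> t.first));
        List<String> toLevelZero = new ArrayList<>();
        Table previous = null;
        for (Table current : sorted) {
            if (previous != null && current.first <= previous.last) {
                toLevelZero.add(current.name); // would be sent back to L0
            } else {
                previous = current;
            }
        }
        return toLevelZero;
    }

    public static void main(String[] args) {
        List<Table> level = new ArrayList<>();
        level.add(new Table("a", 0, 10));
        level.add(new Table("b", 10, 20)); // shares key 10 with "a"
        level.add(new Table("c", 21, 30));
        System.out.println(findOverlapping(level)); // prints [b]
    }
}
```

Note the inclusive comparison (<=): two sstables that share a single boundary 
key already count as overlapping.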

The "Last written key DecoratedKey >= current key" exception, however, 
resurfaces after the first set of compactions, 5 minutes after startup, in the 
exact same manner as before.  The same exception occurs for various CFs until 
compactions stop completely.  compactionstats still shows a large number of 
pending compaction tasks after this event.
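
As background on that exception: my understanding is that it comes from a 
writer-side check that rows are appended in strictly increasing key order. 
The sketch below shows the general shape of such a guard. It is an 
illustration with long keys and hypothetical names (OrderedAppendGuard, 
beforeAppend), not the actual Cassandra writer code.

```java
// Illustrative guard: rejects any key that is not strictly greater than
// the previously written one.
class OrderedAppendGuard {
    private Long lastWrittenKey = null;

    void beforeAppend(long key) {
        if (lastWrittenKey != null && lastWrittenKey >= key) {
            throw new IllegalStateException(
                    "Last written key " + lastWrittenKey + " >= current key " + key);
        }
        lastWrittenKey = key;
    }

    public static void main(String[] args) {
        OrderedAppendGuard guard = new OrderedAppendGuard();
        guard.beforeAppend(1);
        guard.beforeAppend(2);
        try {
            guard.beforeAppend(2); // duplicate or out-of-order key
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A merge over inputs that were wrongly assumed not to overlap can hand the 
writer keys out of order, which is how an undetected overlap would surface as 
this error during compaction.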

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-20 Thread Omid Aladini (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397814#comment-13397814
 ] 

Omid Aladini commented on CASSANDRA-4321:
-

The above error is due to the %d in StandaloneScrubber.java:179's 
interpolation. Will fix and try again.

Jonathan: Sun Java 6, 1.6.0_26
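
For reference, the %d failure is easy to reproduce outside Cassandra: %d only 
accepts integral arguments, so passing any other object makes String.format 
throw IllegalFormatConversionException. The class and method names below 
(FormatConversionDemo, describeBroken, describeFixed) are made up for the 
demonstration, and a plain Object stands in for the SSTableReader.

```java
import java.util.IllegalFormatConversionException;

class FormatConversionDemo {
    // Mirrors the bug: %d applied to a non-integral argument.
    static String describeBroken(Object sstable) {
        return String.format("Checking %d", sstable);
    }

    // %s accepts any object and uses its toString().
    static String describeFixed(Object sstable) {
        return String.format("Checking %s", sstable);
    }

    public static void main(String[] args) {
        Object sstable = new Object(); // stand-in for an SSTableReader
        try {
            describeBroken(sstable);
        } catch (IllegalFormatConversionException e) {
            System.out.println("conversion '" + e.getConversion()
                    + "' rejected for " + e.getArgumentClass().getName());
        }
        System.out.println(describeFixed(sstable));
    }
}
```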

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-20 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397808#comment-13397808
 ] 

Jonathan Ellis commented on CASSANDRA-4321:
---

Are you using Java6 or Java7?

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-20 Thread Omid Aladini (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397805#comment-13397805
 ] 

Omid Aladini commented on CASSANDRA-4321:
-

Got "java.lang.OutOfMemoryError: Java heap space" with -Xmx256M.

Tried with -Xmx512M and the scrub failed with:

{code}
Checking leveled manifest
d != org.apache.cassandra.io.sstable.SSTableReader
java.util.IllegalFormatConversionException: d != 
org.apache.cassandra.io.sstable.SSTableReader
at 
java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:3999)
at java.util.Formatter$FormatSpecifier.printInteger(Formatter.java:2709)
at java.util.Formatter$FormatSpecifier.print(Formatter.java:2661)
at java.util.Formatter.format(Formatter.java:2433)
at java.util.Formatter.format(Formatter.java:2367)
at java.lang.String.format(String.java:2769)
at 
org.apache.cassandra.tools.StandaloneScrubber.checkManifest(StandaloneScrubber.java:179)
at 
org.apache.cassandra.tools.StandaloneScrubber.main(StandaloneScrubber.java:148)
{code}

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-20 Thread Omid Aladini (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397632#comment-13397632
 ] 

Omid Aladini commented on CASSANDRA-4321:
-

Will try it again. 
LeveledCompactionStrategyTest:testValidationMultipleSSTablePerLevel fails 
because of a junit timeout when I run it together with all other suites, but 
passes when I only run the LeveledCompactionStrategyTest suite. Is it related?
{code}
[junit] Testsuite: 
org.apache.cassandra.db.compaction.LeveledCompactionStrategyTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
[junit] 
[junit] Testcase: 
org.apache.cassandra.db.compaction.LeveledCompactionStrategyTest:testValidationMultipleSSTablePerLevel:
   Caused an ERROR
[junit] Timeout occurred. Please note the time in the report does not 
reflect the time until the timeout.
[junit] junit.framework.AssertionFailedError: Timeout occurred. Please note 
the time in the report does not reflect the time until the timeout.
[junit] 
[junit] 
[junit] Test 
org.apache.cassandra.db.compaction.LeveledCompactionStrategyTest FAILED 
(timeout)
{code}

> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:45)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> [snip - this repeats until stack overflow]
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:64)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> at 
> org.apa

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-20 Thread Omid Aladini (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397437#comment-13397437
 ] 

Omid Aladini commented on CASSANDRA-4321:
-

The exception does not happen right away, but after some rounds of 
compaction on the CF. I had re-bootstrapped, so there are tons (~1500) of 
pending compaction tasks. If I restart the node and compact, the problem happens 
again, so I can reproduce it. I'll send you an email about the data.

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 
> 0003-Create-standalone-scrub-v3.txt, 0003-Create-standalone-scrub-v4.txt, 
> ooyala-hastur-stacktrace.txt

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-19 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396906#comment-13396906
 ] 

Sylvain Lebresne commented on CASSANDRA-4321:
-

Did the new exception happen quickly after you started the node with the 
scrubbed files? Are you able to reproduce it easily (i.e. if you restart the node 
and compact, do you still get the same error)? If you are able to reproduce it, 
would you be at liberty to provide a set of sstables that produce the error 
during compaction (in private, for instance)? It would be much easier to track 
this down with an easy way to reproduce it.

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 
> 0003-Create-standalone-scrub-v3.txt, 0003-Create-standalone-scrub-v4.txt, 
> ooyala-hastur-stacktrace.txt

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-19 Thread Omid Aladini (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396903#comment-13396903
 ] 

Omid Aladini commented on CASSANDRA-4321:
-

Let me know if I can provide more data.

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 
> 0003-Create-standalone-scrub-v3.txt, 0003-Create-standalone-scrub-v4.txt, 
> ooyala-hastur-stacktrace.txt

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-19 Thread Omid Aladini (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396895#comment-13396895
 ] 

Omid Aladini commented on CASSANDRA-4321:
-

Exactly.

- Applied v3
- Ran offline scrub and it failed because of tmp files.
- Started Cassandra and saw failures.
- Applied v4 to cassandra-1.1 branch.
- Ran offline scrub successfully.
- Started Cassandra successfully.
- Compaction failed because of the above error.

All done on the same instance.

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 
> 0003-Create-standalone-scrub-v3.txt, 0003-Create-standalone-scrub-v4.txt, 
> ooyala-hastur-stacktrace.txt

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-19 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396862#comment-13396862
 ] 

Sylvain Lebresne commented on CASSANDRA-4321:
-

Just to make sure: you did apply 
0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt before 
restarting the server after having run the offline scrub, right?

If so, that would mean we have yet another bug that generates out of order keys 
during compaction.
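
One way out-of-order keys could lead to the StackOverflowError in the issue title can be sketched as follows. This is a simplified, hypothetical interval-tree build (IntervalBuildSketch, build and MAX_DEPTH are illustration-only names, not the Cassandra IntervalNode code), and it assumes that an sstable with out-of-order rows can end up described by an interval whose start is greater than its end. Such an interval always falls "left" of the median of its own endpoints, so the recursion never shrinks. A depth guard stands in for the real stack overflow.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not the Cassandra implementation.
final class IntervalBuildSketch {
    // Guard so the demo fails fast instead of overflowing the stack.
    static final int MAX_DEPTH = 1000;

    /** Recursively partitions intervals {start, end} around the median endpoint. */
    static int build(List<long[]> intervals, int depth) {
        if (intervals.isEmpty()) {
            return depth;
        }
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException("recursion does not terminate");
        }
        // Collect and sort all endpoints, then pick the middle one.
        List<Long> points = new ArrayList<>();
        for (long[] iv : intervals) {
            points.add(iv[0]);
            points.add(iv[1]);
        }
        Collections.sort(points);
        long median = points.get(points.size() / 2);

        List<long[]> left = new ArrayList<>();
        List<long[]> right = new ArrayList<>();
        for (long[] iv : intervals) {
            if (iv[1] < median) {
                left.add(iv);      // ends before the median
            } else if (iv[0] > median) {
                right.add(iv);     // starts after the median
            }
            // otherwise the interval contains the median and stays at this node
        }
        return Math.max(build(left, depth + 1), build(right, depth + 1));
    }

    public static void main(String[] args) {
        List<long[]> ok = new ArrayList<>();
        ok.add(new long[] {1, 5});
        ok.add(new long[] {6, 10});
        System.out.println("well-formed depth: " + build(ok, 0));

        List<long[]> inverted = new ArrayList<>();
        inverted.add(new long[] {10, 5}); // start > end
        try {
            build(inverted, 0);
        } catch (IllegalStateException e) {
            System.out.println("inverted interval: " + e.getMessage());
        }
    }
}
```

With well-formed intervals every level removes at least the interval that contains the median, so the build terminates; the inverted interval is handed to the left child unchanged at every level.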

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 
> 0003-Create-standalone-scrub-v3.txt, 0003-Create-standalone-scrub-v4.txt, 
> ooyala-hastur-stacktrace.txt

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-19 Thread Omid Aladini (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396804#comment-13396804
 ] 

Omid Aladini commented on CASSANDRA-4321:
-

Tried the v4 patch and the offline scrub went through completely. Cassandra started 
without any error, but compaction halted again:

{code}
2012-06-19_14:01:03.47432  INFO 14:01:03,474 Compacting [
 SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-67792-Data.db'),
 SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-65607-Data.db'),
 SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-63279-Data.db'),
 SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-65491-Data.db'),
 SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-68332-Data.db'),
 SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-64720-Data.db'),
 SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-65322-Data.db'),
 SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-66557-Data.db'),
 SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-64504-Data.db'),
 SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-68179-Data.db'),
 SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-65005-Data.db')]
2012-06-19_14:01:08.73528 ERROR 14:01:08,733 Exception in thread Thread[CompactionExecutor:11,1,main]
2012-06-19_14:01:08.73538 java.lang.RuntimeException: Last written key DecoratedKey(42351003983459534782466386414991462257, 313632303432347c3130303632313432) >= current key DecoratedKey(38276735926421753773204634663641518108, 31343638373735327c3439343837333932) writing into /var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-tmp-hd-68399-Data.db
2012-06-19_14:01:08.73572   at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
2012-06-19_14:01:08.73581   at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
2012-06-19_14:01:08.73590   at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
2012-06-19_14:01:08.73600   at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
2012-06-19_14:01:08.73611   at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
2012-06-19_14:01:08.73622   at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
2012-06-19_14:01:08.73633   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
2012-06-19_14:01:08.73642   at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
2012-06-19_14:01:08.73650   at java.util.concurrent.FutureTask.run(Unknown Source)
2012-06-19_14:01:08.73657   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
2012-06-19_14:01:08.73665   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
2012-06-19_14:01:08.73672   at java.lang.Thread.run(Unknown Source)
{code}

All SSTables that participated in the compaction were new ones written by the 
offline scrub (according to their timestamps and id range), although the first 
one no longer existed (already promoted before the exception?).
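
To read the "Last written key ... >= current key" message above, it can help to compare the two tokens and decode the hex row keys by hand. The following is a minimal sketch, assuming the first field of each DecoratedKey is a numeric token that determines write order and the second field is the hex-encoded row key; KeyOrderCheck, decodeHex and outOfOrder are hypothetical names for illustration, not Cassandra classes.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for reading the log line; not part of Cassandra.
final class KeyOrderCheck {

    /** Decodes a hex string (as printed in DecoratedKey) into ASCII text. */
    static String decodeHex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    /** True when writing currentToken after lastToken would break ascending order. */
    static boolean outOfOrder(BigInteger lastToken, BigInteger currentToken) {
        return lastToken.compareTo(currentToken) >= 0;
    }

    public static void main(String[] args) {
        // Values copied from the exception message in the log.
        BigInteger last = new BigInteger("42351003983459534782466386414991462257");
        BigInteger current = new BigInteger("38276735926421753773204634663641518108");

        System.out.println("last key:    " + decodeHex("313632303432347c3130303632313432"));
        System.out.println("current key: " + decodeHex("31343638373735327c3439343837333932"));
        System.out.println("out of order: " + outOfOrder(last, current));
    }
}
```

Under those assumptions the last written token is larger than the current one, which matches the check reported from SSTableWriter.beforeAppend in the trace.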

{quote}This is not really a new bug, but I believe that prior to 
CASSANDRA-4142, *this had less consequences*.{quote}

Sylvain, could you please elaborate on this? I'd like to know how pre-1.1.1 
data is affected by the Range-vs-Bounds bug. Is the only effect 
overlapping/duplicate sstables on the same level, leading to slower reads 
because of unneeded sstable lookups?
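
The Range-vs-Bounds distinction asked about above can be sketched as follows. This assumes, going by the patch name rather than anything stated in this thread, that a Range is start-exclusive and end-inclusive while Bounds are inclusive on both ends; OverlapSketch and its methods are hypothetical, and plain long values stand in for keys.

```java
// Hypothetical sketch of the two overlap tests; not the LeveledManifest code.
final class OverlapSketch {

    /** Overlap test when each sstable is treated as (first, last]. */
    static boolean rangeOverlaps(long aFirst, long aLast, long bFirst, long bLast) {
        return aFirst < bLast && bFirst < aLast;
    }

    /** Overlap test when each sstable is treated as [first, last]. */
    static boolean boundsOverlaps(long aFirst, long aLast, long bFirst, long bLast) {
        return aFirst <= bLast && bFirst <= aLast;
    }

    public static void main(String[] args) {
        // sstable A holds keys 10..20, sstable B holds keys 5..10: both contain key 10.
        System.out.println("range:  " + rangeOverlaps(10, 20, 5, 10));
        System.out.println("bounds: " + boundsOverlaps(10, 20, 5, 10));
    }
}
```

Under that assumption, two sstables that share only a boundary key are reported as overlapping by the inclusive test but not by the start-exclusive one, so they could be left side by side on the same level.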


> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 
> 0003-Create-standalone-scrub-v3.txt, 0003-Create-standalone-scrub-v4.txt, 
> ooyala-hastur-stacktrace.txt

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-18 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396433#comment-13396433
 ] 

Anton Winter commented on CASSANDRA-4321:
-

I can confirm that I also experienced the "Unexpected empty index file" errors on 
some of the nodes that I have run sstablescrub on.

On some other nodes the sstablescrub command appears to complete successfully, 
but compaction still stops at the "java.lang.RuntimeException: Last written 
key DecoratedKey" error.

Is there any further information we can supply to help debug?

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 
> 0003-Create-standalone-scrub-v3.txt, ooyala-hastur-stacktrace.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-18 Thread Al Tobey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396319#comment-13396319
 ] 

Al Tobey commented on CASSANDRA-4321:
-

Offline scrub ran fine for me.  I downgraded to 1.1.0 and ran a compaction and 
it looks fine.

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 
> 0003-Create-standalone-scrub-v3.txt, ooyala-hastur-stacktrace.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-18 Thread Omid Aladini (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396246#comment-13396246
 ] 

Omid Aladini commented on CASSANDRA-4321:
-

Thanks for the patch. Offline scrub is indeed very useful.

I tried the v3 patches, but the scrub didn't complete. It failed, possibly 
because of a different issue, with the following assertion error:

{code}
Exception in thread "main" java.lang.AssertionError: Unexpected empty index 
file: 
RandomAccessReader(filePath='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-tmp-hd-33827-Index.db',
 skipIOCache=true)
at 
org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:221)
at 
org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:376)
at 
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:203)
at 
org.apache.cassandra.io.sstable.SSTableReader.openNoValidation(SSTableReader.java:143)
at 
org.apache.cassandra.tools.StandaloneScrubber.main(StandaloneScrubber.java:79)
{code}

As a consequence, Cassandra encountered a corrupt SSTable during start-up:

{code}
2012-06-18_20:36:19.89543  INFO 20:36:19,895 Opening 
/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-24984 (1941993 bytes)
2012-06-18_20:36:19.90217 ERROR 20:36:19,900 Exception in thread 
Thread[SSTableBatchOpen:9,5,main]
2012-06-18_20:36:19.90222 java.lang.IllegalStateException: SSTable first key 
DecoratedKey(41255474878128469814942789647212295629, 
31303132393937357c3337313730333536) > last key 
DecoratedKey(41219536226656199861610796307350537953, 
31303234323538397c3331383436373338)
2012-06-18_20:36:19.90261   at 
org.apache.cassandra.io.sstable.SSTableReader.validate(SSTableReader.java:441)
2012-06-18_20:36:19.90275   at 
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:208)
2012-06-18_20:36:19.90291   at 
org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:153)
2012-06-18_20:36:19.90309   at 
org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:245)
2012-06-18_20:36:19.90324   at 
java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
2012-06-18_20:36:19.90389   at 
java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
2012-06-18_20:36:19.90391   at java.util.concurrent.FutureTask.run(Unknown 
Source)
2012-06-18_20:36:19.90391   at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
2012-06-18_20:36:19.90392   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
2012-06-18_20:36:19.90392   at java.lang.Thread.run(Unknown Source)
{code}

This didn't prevent Cassandra from starting up, but compaction subsequently 
failed:

{code}
2012-06-18_20:51:41.79122 ERROR 20:51:41,790 Exception in thread 
Thread[CompactionExecutor:81,1,main]
2012-06-18_20:51:41.79131 java.lang.RuntimeException: Last written key 
DecoratedKey(12341204629749023303706929560940823070, 33363037353338) >= current 
key DecoratedKey(12167298275958419273792070792442127650, 31363431343537) 
writing into 
/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-tmp-hd-40992-Data.db
2012-06-18_20:51:41.79161   at 
org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
2012-06-18_20:51:41.79169   at 
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
2012-06-18_20:51:41.79180   at 
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
2012-06-18_20:51:41.79189   at 
org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
2012-06-18_20:51:41.79199   at 
org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
2012-06-18_20:51:41.79210   at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
2012-06-18_20:51:41.79218   at 
java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
2012-06-18_20:51:41.79227   at 
java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
2012-06-18_20:51:41.79235   at java.util.concurrent.FutureTask.run(Unknown 
Source)
2012-06-18_20:51:41.79242   at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
2012-06-18_20:51:41.79250   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
2012-06-18_20:51:41.79259   at java.lang.Thread.run(Unknown Source)
{code}
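
The two ordering errors above can be reproduced in isolation. The following is a minimal sketch, assuming that the first number in each DecoratedKey is the partitioner token and that keys are ordered by comparing those tokens numerically. The class and method names are hypothetical and are not taken from the Cassandra source; only the token values come from the logs above.

```java
import java.math.BigInteger;

// Illustrative sketch only; these helpers are hypothetical and are not Cassandra's code.
class KeyOrderSketch {

    // True when an SSTable's first token sorts after its last token,
    // the condition reported by the IllegalStateException above.
    static boolean firstAfterLast(BigInteger firstToken, BigInteger lastToken) {
        return firstToken.compareTo(lastToken) > 0;
    }

    // True when appending currentToken would break strictly increasing order,
    // the condition reported by the "Last written key ... >= current key" error.
    static boolean breaksAppendOrder(BigInteger lastWrittenToken, BigInteger currentToken) {
        return lastWrittenToken != null && lastWrittenToken.compareTo(currentToken) >= 0;
    }

    public static void main(String[] args) {
        // Tokens from the start-up error (first key vs. last key).
        BigInteger first = new BigInteger("41255474878128469814942789647212295629");
        BigInteger last = new BigInteger("41219536226656199861610796307350537953");
        System.out.println("first key after last key: " + firstAfterLast(first, last));

        // Tokens from the compaction error (last written key vs. current key).
        BigInteger lastWritten = new BigInteger("12341204629749023303706929560940823070");
        BigInteger current = new BigInteger("12167298275958419273792070792442127650");
        System.out.println("append would be out of order: " + breaksAppendOrder(lastWritten, current));
    }
}
```

Under that assumption, both pairs of tokens from the logs are out of order, which is consistent with the SSTables containing rows that are not sorted by token.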


[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-15 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296042#comment-13296042
 ] 

Jonathan Ellis commented on CASSANDRA-4321:
---

cassandra-1.1 branch

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 
> 0003-Create-standalone-scrub-v3.txt, ooyala-hastur-stacktrace.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-15 Thread Al Tobey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296040#comment-13296040
 ] 

Al Tobey commented on CASSANDRA-4321:
-

What SHA / tag should these patches apply against? I've tried trunk, 1.1.1 and 
1.1.0 and can't get a clean apply. I'll try a manual merge tomorrow.

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 
> 0003-Create-standalone-scrub-v3.txt, ooyala-hastur-stacktrace.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-14 Thread Al Tobey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295290#comment-13295290
 ] 

Al Tobey commented on CASSANDRA-4321:
-

BTW if somebody points me to a build, tag, or commit ID to test, I'll push it 
out right away. It's only a 3-node cluster and I can easily take a filesystem 
snapshot before running.

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v2.txt, 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows.txt, 
> ooyala-hastur-stacktrace.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-14 Thread Al Tobey (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295286#comment-13295286
 ] 

Al Tobey commented on CASSANDRA-4321:
-

I think I just hit the same thing. We're using reverse comparator with 
bytescomparator on the CF's that seem to be having trouble if that's relevant 
at all.

Cluster is 1.1.1 on Ubuntu 12.04 and only has about 7GB per node at the moment.

Stacktrace attached.

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v2.txt, 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows.txt, 
> ooyala-hastur-stacktrace.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-14 Thread Omid Aladini (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295171#comment-13295171
 ] 

Omid Aladini commented on CASSANDRA-4321:
-

I scrubbed the column family on a node that had booted with assertions enabled, 
and there were still corrupt sstables.

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v2.txt, 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows.txt
>
>
> After upgrading to 1.1.1 (from 1.1.0) I have started experiencing 
> StackOverflowError's resulting in compaction backlog and failure to restart. 
> The ring currently consists of 6 DC's and 22 nodes using LCS & compression.  
> This issue was first noted on 2 nodes in one DC and then appears to have 
> spread to various other nodes in the other DC's.  
> When the first occurrence of this was found I restarted the instance but it 
> failed to start so I cleared its data and treated it as a replacement node 
> for the token it was previously responsible for.  This node successfully 
> streamed all the relevant data back but failed again a number of hours later 
> with the same StackOverflowError and again was unable to restart. 
> The initial stack overflow error on a running instance looks like this:
> ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 
> AbstractCassandraDaemon.java (line 134) Exception in thread 
> Thread[CompactionExecutor:314,1,main]
> java.lang.StackOverflowError
> at java.util.Arrays.mergeSort(Arrays.java:1157)
> at java.util.Arrays.sort(Arrays.java:1092)
> at java.util.Collections.sort(Collections.java:134)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:49)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> [snip - this repeats until stack overflow.  Compactions stop from this point 
> onwards]
> I restarted this failing instance with DEBUG logging enabled and it throws 
> the following exception part way through startup:
> ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main]
> java.lang.StackOverflowError
> at 
> org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307)
> at 
> org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276)
> at 
> org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230)
> at 
> org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124)
> at 
> org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:45)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> [snip - this repeats until stack overflow]
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:64)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:64)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:64)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)
> at 
> org.apache.cassandra.utils.IntervalTree.IntervalTree.(Inte

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-12 Thread Omid Aladini (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293762#comment-13293762
 ] 

Omid Aladini commented on CASSANDRA-4321:
-

I tried the patch, but the server still doesn't start. The StackOverflowError that 
gets thrown causes an already loaded column family to be loaded again:

Load CF1:

{code}
reading saved cache /var/lib/cassandra/abcd/saved_caches/SOMEKSP-CF1-KeyCache
2012-06-12_16:18:04.12387  INFO 16:18:04,123 Opening 
/var/lib/cassandra/abcd/data/SOMEKSP/CF1/SOMEKSP-CF1-hd-2248
...
{code}

Load CF3, which has the corrupted sstables:
{code}
2012-06-12_15:31:20.56185  INFO 15:31:20,561 Opening 
/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-7924 (2372910 bytes)
2012-06-12_15:31:20.81897 ERROR 15:31:20,811 Exception in thread 
Thread[OptionalTasks:1,5,main]
2012-06-12_15:31:20.81901 java.lang.StackOverflowError
2012-06-12_15:31:20.81901   at 
org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:90)
2012-06-12_15:31:20.81906   at 
org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:38)
2012-06-12_15:31:20.81918   at java.util.Arrays.mergeSort(Unknown Source)
2012-06-12_15:31:20.81927   at java.util.Arrays.mergeSort(Unknown Source)
2012-06-12_15:31:20.81934   at java.util.Arrays.mergeSort(Unknown Source)
2012-06-12_15:31:20.81940   at java.util.Arrays.mergeSort(Unknown Source)
2012-06-12_15:31:20.81946   at java.util.Arrays.mergeSort(Unknown Source)
2012-06-12_15:31:20.81954   at java.util.Arrays.sort(Unknown Source)
2012-06-12_15:31:20.81960   at java.util.Collections.sort(Unknown Source)
2012-06-12_15:31:20.81980   at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114)
2012-06-12_15:31:20.81981   at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:49)
2012-06-12_15:31:20.81990   at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62)

// stacktrace goes on

2012-06-12_15:31:20.88633   at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:64)
2012-06-12_15:31:20.88643   at 
org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:64)
2012-06-12_15:31:20.88654   at 
org.apache.cassandra.utils.IntervalTree.IntervalTree.(IntervalTree.java:39)
2012-06-12_15:31:20.88664   at 
org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:560)
2012-06-12_15:31:20.88673   at 
org.apache.cassandra.db.DataTracker$View.replace(DataTracker.java:617)
2012-06-12_15:31:20.88683   at 
org.apache.cassandra.db.DataTracker.replace(DataTracker.java:320)
2012-06-12_15:31:20.88692   at 
org.apache.cassandra.db.DataTracker.addInitialSSTables(DataTracker.java:259)
2012-06-12_15:31:20.88702   at 
org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:234)
2012-06-12_15:31:20.88712   at 
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:331)
2012-06-12_15:31:20.88723   at 
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:309)
2012-06-12_15:31:20.88734   at 
org.apache.cassandra.db.Table.initCf(Table.java:367)
2012-06-12_15:31:20.88742   at 
org.apache.cassandra.db.Table.(Table.java:299)
2012-06-12_15:31:20.88750   at 
org.apache.cassandra.db.Table.open(Table.java:114)
2012-06-12_15:31:20.88758   at 
org.apache.cassandra.db.Table.open(Table.java:97)
2012-06-12_15:31:20.88766   at 
org.apache.cassandra.db.Table$2.apply(Table.java:574)
2012-06-12_15:31:20.88773   at 
org.apache.cassandra.db.Table$2.apply(Table.java:571)
2012-06-12_15:31:20.88782   at 
com.google.common.collect.Iterators$8.next(Iterators.java:751)
2012-06-12_15:31:20.88790   at 
org.apache.cassandra.db.ColumnFamilyStore.all(ColumnFamilyStore.java:1625)
2012-06-12_15:31:20.88800   at 
org.apache.cassandra.db.MeteredFlusher.countFlushingBytes(MeteredFlusher.java:118)
2012-06-12_15:31:20.88810   at 
org.apache.cassandra.db.MeteredFlusher.run(MeteredFlusher.java:45)
2012-06-12_15:31:20.88818   at 
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:79)
2012-06-12_15:31:20.88833   at 
java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
2012-06-12_15:31:20.88842   at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
2012-06-12_15:31:20.88851   at 
java.util.concurrent.FutureTask.runAndReset(Unknown Source)
2012-06-12_15:31:20.88860   at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown
 Source)
2012-06-12_15:31:20.88870   at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown
 Source)
2012-06-12_15:31:20.2   at 
java.util.concurrent.S

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-12 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293412#comment-13293412
 ] 

Anton Winter commented on CASSANDRA-4321:
-

If I use the v2 patch, startup stops with the following:
{code}
 INFO [main] 2012-06-12 14:23:33,899 ColumnFamilyStore.java (line 633) 
Enqueuing flush of Memtable-LocationInfo@1141455324(41/51 serialized/live 
bytes, 1 ops)
 INFO [FlushWriter:2] 2012-06-12 14:23:33,903 Memtable.java (line 266) Writing 
Memtable-LocationInfo@1141455324(41/51 serialized/live bytes, 1 ops)
ERROR [FlushWriter:2] 2012-06-12 14:23:33,953 AbstractCassandraDaemon.java 
(line 134) Exception in thread Thread[FlushWriter:2,5,main]
java.lang.RuntimeException: Last written key null >= current key 
DecoratedKey(61078635599166706937511052402724559481, 4c) writing into 
/var/lib//cassandra/system/LocationInfo/system-LocationInfo-tmp-hd-65597-Data.db
{code}

Given the above, I scrubbed the system keyspace, which removed all sstables and 
left only the snapshots, e.g.:

{code}
 WARN [CompactionExecutor:5] 2012-06-12 14:29:41,672 CompactionManager.java 
(line 651) Row at 100 is unreadable; skipping to next
 WARN [CompactionExecutor:5] 2012-06-12 14:29:41,672 CompactionManager.java 
(line 602) Non-fatal error reading row (stacktrace follows)
java.lang.RuntimeException: Last written key null >= current key 
DecoratedKey(135285944860343992175601105924967452217, 63716c) writing into 
/var/lib//data/cassandra/system/Versions/system-Versions-tmp-hd-37-Data.db
{code}
...eventually resulting in:
{code}
WARN [CompactionExecutor:5] 2012-06-12 14:29:41,674 CompactionManager.java 
(line 692) No valid rows found while scrubbing 
SSTableReader(path='/var/lib//data/cassandra/system/Versions/system-Versions-hd-35-Data.db');
 it is marked for deletion now. If you want to attempt manual recovery, you can 
find a copy in the pre-scrub snapshot
{code}

A clean bootstrap also stops with similar errors:
{code}
java.lang.RuntimeException: Last written key null >= current key 
DecoratedKey(61078635599166706937511052402724559481, 4c) writing into 
/var/lib//data/cassandra/system/LocationInfo/system-LocationInfo-tmp-hd-1-Data.db
{code}
and 
{code}
java.lang.RuntimeException: Last written key null >= current key 
DecoratedKey(93220794208128599841715671226150005828, 746872696674) writing into 
/var/lib//data/cassandra/system/Versions/system-Versions-tmp-hd-1-Data.db
{code}


> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v2.txt, 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-11 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293005#comment-13293005
 ] 

Jonathan Ellis commented on CASSANDRA-4321:
---

v2 LGTM (nit: would like to rename variables to xBounds instead of xRange)

Re 0002: does this actually work with LCR?

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v2.txt, 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-11 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292856#comment-13292856
 ] 

Sylvain Lebresne commented on CASSANDRA-4321:
-

bq. Is that likely, given that Range vs Bounds is basically an off-by-one

The thing is, even an off-by-one is enough to end up with essentially two 
identical sstables on the same level. If so, our new by-level iterator will 
happily write a new sstable that is a concatenation of both, and we'll end up 
with half of the resulting sstable being out of order.

That being said, now that you mention it, leveled compaction limits the size of 
sstables, so we should be good on that front.
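
To make the off-by-one concrete, here is a minimal sketch. It is not the 
LeveledManifest code: it assumes the usual convention that a Range is 
start-exclusive, (left, right], while a Bounds is inclusive on both ends, 
[left, right], and it uses plain long values in place of tokens. The class and 
method names are hypothetical.

```java
// Hypothetical illustration only; not the Cassandra implementation.
final class RangeVsBoundsSketch {
    // Treats each span as a Range (left, right]: the left endpoint is excluded.
    static boolean overlapsAsRange(long aFirst, long aLast, long bFirst, long bLast) {
        return aFirst < bLast && bFirst < aLast;
    }

    // Treats each span as a Bounds [left, right]: both endpoints are included.
    static boolean overlapsAsBounds(long aFirst, long aLast, long bFirst, long bLast) {
        return aFirst <= bLast && bFirst <= aLast;
    }

    public static void main(String[] args) {
        // sstable A holds keys 10..20 and sstable B holds keys 20..30.
        // First and last keys are both present in the file, so key 20 is shared.
        System.out.println(overlapsAsRange(10, 20, 20, 30));  // false: the overlap is missed
        System.out.println(overlapsAsBounds(10, 20, 20, 30)); // true
    }
}
```

In this sketch the two sstables share exactly one key, and only the inclusive 
comparison reports them as overlapping.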

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-11 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292858#comment-13292858
 ] 

Jonathan Ellis commented on CASSANDRA-4321:
---

Comment on the patch itself: intersects is missing the case of {{that}} 
entirely containing {{this}}.
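
A minimal sketch of the containment case, assuming inclusive bounds over plain 
long values. The patch itself is not reproduced here, so the endpoint-only 
variant below is only one plausible way such a case can be missed, and all 
names are hypothetical.

```java
// Hypothetical illustration only; not the patch under review.
final class BoundsSketch {
    final long left;   // inclusive
    final long right;  // inclusive

    BoundsSketch(long left, long right) {
        if (left > right) throw new IllegalArgumentException("left must be <= right");
        this.left = left;
        this.right = right;
    }

    boolean contains(long point) {
        return left <= point && point <= right;
    }

    // Checks only whether one of that's endpoints falls inside this.
    // Returns false when that strictly contains this.
    boolean intersectsEndpointsOnly(BoundsSketch that) {
        return contains(that.left) || contains(that.right);
    }

    // Covers partial overlap and containment in either direction.
    boolean intersects(BoundsSketch that) {
        return this.left <= that.right && that.left <= this.right;
    }
}
```

With this = [5, 6] and that = [1, 10], neither endpoint of that lies inside 
this, so the endpoint-only test reports no intersection while the two-sided 
comparison does.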

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-11 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292855#comment-13292855
 ] 

Jonathan Ellis commented on CASSANDRA-4321:
---

Backing up: how do incorrect overlapping sets result in out-of-order key 
writes (that pass the last-written-key check)?
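
For reference, a minimal sketch of what a last-written-key check looks like, 
assuming plain long keys. This is not the Cassandra writer; the class, method 
names and exception type are hypothetical, and the message only mirrors the 
wording seen in the logs on this ticket.

```java
// Hypothetical illustration only; not the Cassandra writer.
final class OrderedWriterSketch {
    private Long lastWrittenKey = null; // null until the first append

    // Accepts keys only in strictly ascending order.
    void append(long key) {
        if (lastWrittenKey != null && lastWrittenKey >= key) {
            throw new IllegalStateException(
                "Last written key " + lastWrittenKey + " >= current key " + key);
        }
        lastWrittenKey = key;
    }

    // Returns true if every key was accepted, false if the order check tripped.
    static boolean accepts(long... keys) {
        OrderedWriterSketch writer = new OrderedWriterSketch();
        try {
            for (long key : keys) {
                writer.append(key);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

In this sketch, appending two sorted runs back to back, such as 1, 5, 9 
followed by 4, 8, is rejected at the boundary between the runs, which is why 
the question of how such writes get past the check matters.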

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter
>Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows.txt
>
>

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-11 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292851#comment-13292851
 ] 

Jonathan Ellis commented on CASSANDRA-4321:
---

bq. this can easily OOM if the sstable has lots of out of order rows

Is that likely, given that Range vs Bounds is basically an off-by-one?
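
For context on the Range vs Bounds distinction: in Cassandra's dht package a Range is half-open, (left, right], while a Bounds is closed, [left, right]. The sketch below is illustrative only and is not the code from the attached patches. The class and method names are hypothetical, tokens are simplified to long values, and wrapping ranges are ignored.

```java
public class RangeVsBounds {
    // Half-open interval (left, right]: the left endpoint is excluded.
    public static boolean inRange(long left, long right, long token) {
        return left < token && token <= right;
    }

    // Closed interval [left, right]: both endpoints are included.
    public static boolean inBounds(long left, long right, long token) {
        return left <= token && token <= right;
    }

    public static void main(String[] args) {
        long left = 10, right = 20;
        // A key sitting exactly on the left endpoint is where the two disagree.
        System.out.println("Range contains left endpoint:  " + inRange(left, right, left));
        System.out.println("Bounds contains left endpoint: " + inBounds(left, right, left));
        // Both agree on the right endpoint and on interior tokens.
        System.out.println("Range contains right endpoint: " + inRange(left, right, right));
        System.out.println("Bounds contains 15:            " + inBounds(left, right, 15));
    }
}
```

The two tests differ only for a key that falls exactly on the left endpoint, which is presumably the off-by-one referred to above.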

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.1.1
> Reporter: Anton Winter
> Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 
> 0001-Change-Range-Bounds-in-LeveledManifest.overlapping.txt, 
> 0002-Scrub-detects-and-repair-outOfOrder-rows.txt
>
>
> After upgrading to 1.1.1 (from 1.1.0) I have started experiencing 
> StackOverflowError's resulting in compaction backlog and failure to restart. 
> The ring currently consists of 6 DC's and 22 nodes using LCS & compression.  
> This issue was first noted on 2 nodes in one DC and then appears to have 
> spread to various other nodes in the other DC's.  
> When the first occurrence of this was found I restarted the instance but it 
> failed to start so I cleared its data and treated it as a replacement node 
> for the token it was previously responsible for.  This node successfully 
> streamed all the relevant data back but failed again a number of hours later 
> with the same StackOverflowError and again was unable to restart. 
> The initial stack overflow error on a running instance looks like this:
> ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 
> AbstractCassandraDaemon.java (line 134) Exception in thread 
> Thread[CompactionExecutor:314,1,main]
> java.lang.StackOverflowError
> at java.util.Arrays.mergeSort(Arrays.java:1157)
> at java.util.Arrays.sort(Arrays.java:1092)
> at java.util.Collections.sort(Collections.java:134)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:49)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> [snip - this repeats until stack overflow.  Compactions stop from this point 
> onwards]
> I restarted this failing instance with DEBUG logging enabled and it throws 
> the following exception part way through startup:
> ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main]
> java.lang.StackOverflowError
> at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307)
> at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276)
> at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230)
> at org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124)
> at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:45)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> [snip - this repeats until stack overflow]
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> at org.apache.cassandra.utils.IntervalTree.IntervalTree.<init>(IntervalTree.java:39)
> at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:56

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-11 Thread Omid Aladini (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292779#comment-13292779
 ] 

Omid Aladini commented on CASSANDRA-4321:
-

Jonathan, yes I use LeveledCompactionStrategy with non-default 
sstable_size_in_mb = 10

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-11 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292773#comment-13292773
 ] 

Jonathan Ellis commented on CASSANDRA-4321:
---

Omid and Javier, are you also using LeveledCompactionStrategy?

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-08 Thread Omid Aladini (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291958#comment-13291958
 ] 

Omid Aladini commented on CASSANDRA-4321:
-

We're seeing the same issue after upgrading from 1.0.9 to 1.1.1, on only a single 
node in a 16 node cluster. Wiping the data off and bootstrapping again didn't 
help. Compaction does not appear to be progressing (according to compactionstats) 
and I can reproduce this on every "nodetool flush".

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-08 Thread Javier Sotelo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291846#comment-13291846
 ] 

Javier Sotelo commented on CASSANDRA-4321:
--

Our stack overflow is a little different though:


ERROR [CompactionExecutor:35] 2012-06-08 15:52:42,086 
AbstractCassandraDaemon.java (line 134) Exception in thread 
Thread[CompactionExecutor:35,1,main]
java.lang.StackOverflowError
at java.util.AbstractList$Itr.hasNext(Unknown Source)
at com.google.common.collect.Iterators$5.hasNext(Iterators.java:517)
at com.google.common.collect.Iterators$3.hasNext(Iterators.java:114)

[repeats]

at com.google.common.collect.Iterators$5.hasNext(Iterators.java:517)
at com.google.common.collect.Iterators$3.hasNext(Iterators.java:114)
at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at com.google.common.collect.Iterators.size(Iterators.java:129)
at com.google.common.collect.Sets$3.size(Sets.java:670)
at com.google.common.collect.Iterables.size(Iterables.java:80)
at 
org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:557)
at 
org.apache.cassandra.db.compaction.CompactionController.<init>(CompactionController.java:79)
at 
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:105)
at 
org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
at 
org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)



[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-08 Thread Javier Sotelo (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291841#comment-13291841
 ] 

Javier Sotelo commented on CASSANDRA-4321:
--

We are also seeing this. We also upgraded from 1.1.0 to 1.1.1. This problem 
only started after the upgrade. Our cluster is smaller, two DCs of 3 nodes each.

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-07 Thread Anton Winter (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291520#comment-13291520
 ] 

Anton Winter commented on CASSANDRA-4321:
-

The partitioner (RP) was not changed.

[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions

2012-06-07 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291511#comment-13291511
 ] 

Jonathan Ellis commented on CASSANDRA-4321:
---

bq. SSTable first key DecoratedKey(100294972947100949193477090306072672386, 
4fcf051ef5067d7f17d9fc35) > last key 
DecoratedKey(90250429663386465697464050082134975058, 4fce996e3c1eed8c4b17dd66)

Cassandra checks key ordering for correctness with every row that is added at 
write time, so unless you changed your partitioner (which is emphatically Not 
Supported), I'm not sure how this can happen.  Whatever it is, it's unlikely to 
be related to the 1.1.1 upgrade.
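
To make the quoted message concrete, here is a minimal sketch (not Cassandra 
code) that compares the two token values from the log line above.  The class 
and method names are hypothetical, and it assumes the first field of each 
DecoratedKey is a token that can be compared as a plain integer:

```java
import java.math.BigInteger;

// Hypothetical helper, not part of Cassandra.
class KeyOrderCheck {
    // True when the sstable's first token sorts at or before its last token.
    static boolean boundsInOrder(BigInteger firstToken, BigInteger lastToken) {
        return firstToken.compareTo(lastToken) <= 0;
    }

    public static void main(String[] args) {
        // Token values copied from the quoted log message.
        BigInteger first = new BigInteger("100294972947100949193477090306072672386");
        BigInteger last = new BigInteger("90250429663386465697464050082134975058");
        // Prints "bounds in order: false" because the first token is larger.
        System.out.println("bounds in order: " + boundsInOrder(first, last));
    }
}
```

The first token has 39 digits and the last has 38, so the first key sorts 
after the last key, which is the inconsistency the message reports.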

Scrub checks that the sstable content is well-formed and readable.  It doesn't 
check for out-of-order rows.  You can use a tool like sstablekeys to do that.
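
A rough sketch of what such an out-of-order check could look like, under 
several assumptions that are mine rather than anything stated in this ticket: 
the keys arrive on stdin as one hex string per line in on-disk order (which I 
believe is what sstablekeys prints), the cluster uses RandomPartitioner, and 
the token is the absolute value of the MD5 digest read as a signed integer.  
The class and method names are hypothetical.  It compares tokens only, 
without the key-byte tie-break a real comparison would need:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Cassandra.
class OutOfOrderScan {
    // Decode a hex string such as "4fcf05..." into raw key bytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++)
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        return out;
    }

    // Assumed RandomPartitioner-style token: abs(MD5(key) as a signed integer).
    static BigInteger md5Token(byte[] key) {
        try {
            return new BigInteger(MessageDigest.getInstance("MD5").digest(key)).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Index of the first token smaller than its predecessor, or -1 if none.
    static int firstOutOfOrder(List<BigInteger> tokens) {
        for (int i = 1; i < tokens.size(); i++)
            if (tokens.get(i - 1).compareTo(tokens.get(i)) > 0)
                return i;
        return -1;
    }

    public static void main(String[] args) throws Exception {
        List<BigInteger> tokens = new ArrayList<>();
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        for (String line; (line = in.readLine()) != null; ) {
            line = line.trim();
            if (!line.isEmpty())
                tokens.add(md5Token(hexToBytes(line)));
        }
        int bad = firstOutOfOrder(tokens);
        System.out.println(bad < 0 ? "keys are in token order"
                                   : "first out-of-order key at line " + (bad + 1));
    }
}
```

It would be run with the key listing piped to its standard input, one sstable 
at a time.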

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.1.1
>Reporter: Anton Winter