[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406844#comment-13406844 ]

Anton Winter commented on CASSANDRA-4321:
-----------------------------------------

New issue raised as requested: CASSANDRA-4411

> stackoverflow building interval tree & possible sstable corruptions
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-4321
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.1
>            Reporter: Anton Winter
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1.2
>
>         Attachments: 0001-Fix-overlapping-computation-v7.txt,
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v7.txt,
> 0003-Create-standalone-scrub-v7.txt,
> 0004-Add-manifest-integrity-check-v7.txt, cleanup.txt,
> ooyala-hastur-stacktrace.txt
>
>
> After upgrading to 1.1.1 (from 1.1.0) I have started experiencing
> StackOverflowErrors resulting in compaction backlog and failure to restart.
> The ring currently consists of 6 DCs and 22 nodes using LCS & compression.
> This issue was first noted on 2 nodes in one DC and then appears to have
> spread to various other nodes in the other DCs.
> When the first occurrence of this was found I restarted the instance but it
> failed to start, so I cleared its data and treated it as a replacement node
> for the token it was previously responsible for. This node successfully
> streamed all the relevant data back but failed again a number of hours later
> with the same StackOverflowError and again was unable to restart.
> The initial stack overflow error on a running instance looks like this:
> ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:314,1,main]
> java.lang.StackOverflowError
>         at java.util.Arrays.mergeSort(Arrays.java:1157)
>         at java.util.Arrays.sort(Arrays.java:1092)
>         at java.util.Collections.sort(Collections.java:134)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:49)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> [snip - this repeats until stack overflow. Compactions stop from this point onwards]
> I restarted this failing instance with DEBUG logging enabled and it throws
> the following exception part way through startup:
> ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main]
> java.lang.StackOverflowError
>         at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307)
>         at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276)
>         at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230)
>         at org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124)
>         at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:45)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> [snip - this repeats until stack overflow]
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalTree.<init>(IntervalTree.java:39)
>         at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:560)
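The traces above show the IntervalNode constructor calling itself until the stack is exhausted. The following is a minimal sketch of a recursively built centered interval tree. It is not Cassandra's IntervalNode, and every name in it (`SketchInterval`, `SketchNode`, `IntervalTreeSketch`) is hypothetical. It illustrates one way such a build can fail to terminate, under the assumption that an interval whose min is greater than its max reaches the builder: in this sketch such an interval can never cross the center point, so it is always handed to a child. Whether that is what happened on the affected nodes is not established by the traces alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical closed interval [min, max]; not Cassandra's Interval class. */
final class SketchInterval {
    final long min;
    final long max;

    SketchInterval(long min, long max) {
        this.min = min;
        this.max = max;
    }
}

/** Hypothetical node of a centered interval tree, built recursively. */
final class SketchNode {
    final long center;
    final List<SketchInterval> crossing = new ArrayList<>();
    final SketchNode left;
    final SketchNode right;

    SketchNode(List<SketchInterval> intervals) {
        if (intervals.isEmpty())
            throw new IllegalArgumentException("no intervals");

        List<Long> endpoints = new ArrayList<>();
        for (SketchInterval i : intervals) {
            // Guard (an assumption of this sketch): an interval with min > max can
            // never satisfy "min <= center <= max", so without this check it would
            // be pushed down to a child at every level and the recursion would not end.
            if (i.min > i.max)
                throw new IllegalArgumentException("inverted interval [" + i.min + ", " + i.max + "]");
            endpoints.add(i.min);
            endpoints.add(i.max);
        }
        Collections.sort(endpoints);
        center = endpoints.get(endpoints.size() / 2);

        // Partition: entirely below the center, entirely above it, or crossing it.
        List<SketchInterval> lower = new ArrayList<>();
        List<SketchInterval> upper = new ArrayList<>();
        for (SketchInterval i : intervals) {
            if (i.max < center)
                lower.add(i);
            else if (i.min > center)
                upper.add(i);
            else
                crossing.add(i);
        }
        left = lower.isEmpty() ? null : new SketchNode(lower);
        right = upper.isEmpty() ? null : new SketchNode(upper);
    }
}

final class IntervalTreeSketch {
    /** Number of levels in the tree rooted at node (0 for null). */
    static int depth(SketchNode node) {
        return node == null ? 0 : 1 + Math.max(depth(node.left), depth(node.right));
    }

    public static void main(String[] args) {
        SketchNode root = new SketchNode(Arrays.asList(
                new SketchInterval(1, 5), new SketchInterval(6, 9), new SketchInterval(2, 7)));
        System.out.println("depth = " + depth(root));
        try {
            new SketchNode(Arrays.asList(new SketchInterval(10, 1)));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With well-formed intervals the center is always an endpoint of some interval, so at least one interval stays at each node and every child receives a strictly smaller list. Removing the guard and passing the single interval (10, 1) makes this sketch recurse until a StackOverflowError.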
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406395#comment-13406395 ]

Sylvain Lebresne commented on CASSANDRA-4321:
---------------------------------------------

Damn. Ok, since this has been released with 1.1.2 already, would you mind opening a new one?
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406254#comment-13406254 ]

Anton Winter commented on CASSANDRA-4321:
-----------------------------------------

I have repeatedly run sstablescrub across all my nodes and the exceptions do not occur as frequently now; however, the integrity check still throws exceptions. compactionstats shows a large number of pending tasks but no progress after this error. Should this ticket be reopened or a new one raised?

{code}
ERROR [CompactionExecutor:912] 2012-07-04 01:07:16,470 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:912,1,main]
java.lang.AssertionError
        at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
        at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
        at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
        at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
        at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:978)
        at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
        at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
        at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
{code}
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404338#comment-13404338 ]

Anton Winter commented on CASSANDRA-4321:
-----------------------------------------

1.1 dev branch + patches
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404336#comment-13404336 ]

Jonathan Ellis commented on CASSANDRA-4321:
-------------------------------------------

To clarify, is this 1.1.1 release + patches, or 1.1 dev branch + patches?
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404333#comment-13404333 ]

Anton Winter commented on CASSANDRA-4321:
-----------------------------------------

Maybe I spoke too soon. Overnight I've seen the exceptions happen again on nodes that were v7 patched & scrubbed.

{code}
ERROR [CompactionExecutor:1301] 2012-06-29 21:54:12,078 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:1301,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(116816802911061669023614481109871014436, 4faa631ca88ef85b8e26ddeb) >= current key DecoratedKey(115179899219377463875853982254751557438, 4fa892bf42d3f24479f627b6) writing into /var/lib//data/cassandra/KS/CF/KS-CF-tmp-hd-837655-Data.db
        at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
        at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
        at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
        at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
{code}
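The "Last written key ... >= current key" error in the comment above comes from a check that rows are appended in strictly increasing key order. A minimal sketch of that kind of check follows. It is not Cassandra's SSTableWriter: the class `OrderedAppendSketch` is hypothetical, and it compares only the token part of each key, which is a simplifying assumption. The two token values are the ones from the log message.

```java
import java.math.BigInteger;

/** Hypothetical writer that only accepts keys in strictly increasing token order. */
final class OrderedAppendSketch {
    private BigInteger lastWritten; // null until the first append

    /** Records the token, or throws if it is not greater than the previous one. */
    void append(BigInteger token) {
        if (lastWritten != null && lastWritten.compareTo(token) >= 0)
            throw new RuntimeException(
                    "Last written key " + lastWritten + " >= current key " + token);
        lastWritten = token;
    }

    public static void main(String[] args) {
        OrderedAppendSketch writer = new OrderedAppendSketch();
        // Token values taken from the log message; writing the larger one first
        // reproduces the ordering violation.
        writer.append(new BigInteger("116816802911061669023614481109871014436"));
        try {
            writer.append(new BigInteger("115179899219377463875853982254751557438"));
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A compaction that merges inputs it assumes are sorted would trip such a check if one input contains out-of-order rows, which is consistent with the scrub patch attached to this ticket being described as detecting and repairing out-of-order rows.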
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403764#comment-13403764 ]

Anton Winter commented on CASSANDRA-4321:
-----------------------------------------

I've applied the v7 patches and have successfully offline scrubbed & reinserted a number of nodes in my ring without further occurrence of the previous issues. Thanks :)
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
    [ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402995#comment-13402995 ]

Sylvain Lebresne commented on CASSANDRA-4321:
---------------------------------------------

bq. CompactionsTest.testBlacklistingWithLeveledCompactionStrategy is currently failing in trunk because of a similar integrity check that I added to promote()

That's in fact a bug in the integrity check, which should be skipped if newLevel=0 (since we can have newLevel=0 following CASSANDRA-4341). With that fixed, that unit test no longer fails (I've committed the fix as 4725bf71e18550ac60f9).

> Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v6.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v6.txt, 0003-Create-standalone-scrub-v6.txt, 0004-Add-manifest-integrity-check-v6.txt, ooyala-hastur-stacktrace.txt
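The level-0 exception described in the comment above can be sketched as follows. This is a simplified illustration under stated assumptions, not the committed fix: keys are plain longs rather than DecoratedKey, an sstable is reduced to its first and last key, and the names ManifestCheckSketch, Table and levelIsConsistent are hypothetical. The non-overlap test mirrors the assertion quoted elsewhere in this thread (each sstable's first key must be strictly greater than the previous sstable's last key).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins: long keys, and an "sstable" that is just its first/last key.
final class ManifestCheckSketch {
    static final class Table {
        final long first, last;
        Table(long first, long last) { this.first = first; this.last = last; }
    }

    /** True if the level passes the check. Level 0 is skipped; higher levels must not overlap. */
    static boolean levelIsConsistent(int level, List<Table> tables) {
        if (level == 0) return true; // skip the check for newLevel=0, as the comment suggests

        // Sort a copy by first key, then compare each table with its predecessor.
        List<Table> sorted = new ArrayList<>(tables);
        sorted.sort(Comparator.comparingLong((Table t) -> t.first));
        Table previous = null;
        for (Table t : sorted) {
            if (previous != null && t.first <= previous.last) return false;
            previous = t;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Table> disjoint = Arrays.asList(new Table(0, 10), new Table(11, 20));
        List<Table> touching = Arrays.asList(new Table(0, 10), new Table(10, 20));
        System.out.println(levelIsConsistent(1, disjoint));
        System.out.println(levelIsConsistent(1, touching));
        System.out.println(levelIsConsistent(0, touching));
    }
}
```

Returning a boolean instead of asserting keeps the sketch usable without the -ea JVM flag; the quoted check in the thread uses assert.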
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
    [ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402518#comment-13402518 ]

Jonathan Ellis commented on CASSANDRA-4321:
-------------------------------------------

CompactionsTest.testBlacklistingWithLeveledCompactionStrategy is currently failing in trunk because of a similar integrity check that I added to promote() for CASSANDRA-1991:

{code}
.       DecoratedKey last = null;
        Collections.sort(generations[newLevel], SSTable.sstableComparator);
        for (SSTableReader sstable : generations[newLevel])
        {
            assert last == null || sstable.first.compareTo(last) > 0;
            last = sstable.last;
        }
{code}

Patch 0001 does not fix that test failure.

> Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v6.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v6.txt, 0003-Create-standalone-scrub-v6.txt, 0004-Add-manifest-integrity-check-v6.txt, ooyala-hastur-stacktrace.txt
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
    [ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401496#comment-13401496 ]

Sylvain Lebresne commented on CASSANDRA-4321:
---------------------------------------------

Hum, yes that's possible. I figured this wouldn't be the problem since they were using 1.1.1 but you are right that if they tried against the 1.1 branch that could have been it. Worth checking on current 1.1 branch I guess.

> Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v6.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v6.txt, 0003-Create-standalone-scrub-v6.txt, 0004-Add-manifest-integrity-check-v6.txt, ooyala-hastur-stacktrace.txt
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
    [ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401494#comment-13401494 ]

Jonathan Ellis commented on CASSANDRA-4321:
-------------------------------------------

Could the CASSANDRA-4341 regression have caused what Anton saw, if he was running from the 1.1 branch?

> Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v6.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v6.txt, 0003-Create-standalone-scrub-v6.txt, 0004-Add-manifest-integrity-check-v6.txt, ooyala-hastur-stacktrace.txt
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
    [ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399760#comment-13399760 ]

Anton Winter commented on CASSANDRA-4321:
-----------------------------------------

bq. Was I lucky? Are you guys able to reproduce those steps and still get more errors?

As discussed, but repeated here for the ticket's reference: I was patching and scrubbing in the same way as described above. Once the scrubbed nodes were restarted in the cluster they were under normal read/write load and experienced the exceptions again.

Given that sstablescrub and the subsequent compactions run fine in Sylvain's test using my out-of-order sstables, the sstablescrub command appears to do its job. The root cause, originally expected to be resolved by the 0001 patch, still appears to be occurring, so Sylvain was going to investigate the code further.

> Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt, 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt
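The out-of-order sstables discussed above matter because compaction merges several inputs that are each assumed to be sorted by key. The sketch below is a generic k-way merge over lists of long keys, not Cassandra's MergeIterator or CompactionIterable. The class name MergeOrderSketch and both methods are hypothetical. It shows that a single misplaced key in one input is enough for the merged stream to come out unsorted, which is the condition an out-of-order check would then reject.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Generic k-way merge sketch; NOT Cassandra's MergeIterator.
final class MergeOrderSketch {
    // One cursor per input, ordered in the queue by the key it currently points at.
    private static final class Cursor {
        final Iterator<Long> rest;
        long head;
        Cursor(Iterator<Long> it) { this.rest = it; this.head = it.next(); }
    }

    /** Merges the inputs, trusting (without verifying) that each one is already sorted. */
    static List<Long> merge(List<List<Long>> inputs) {
        PriorityQueue<Cursor> queue =
                new PriorityQueue<>((a, b) -> Long.compare(a.head, b.head));
        for (List<Long> input : inputs)
            if (!input.isEmpty()) queue.add(new Cursor(input.iterator()));

        List<Long> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor c = queue.poll();          // smallest current head across all inputs
            out.add(c.head);
            if (c.rest.hasNext()) {           // advance the cursor and put it back
                c.head = c.rest.next();
                queue.add(c);
            }
        }
        return out;
    }

    /** Index of the first key smaller than its predecessor, or -1 if the list is sorted. */
    static int firstOutOfOrder(List<Long> keys) {
        for (int i = 1; i < keys.size(); i++)
            if (keys.get(i) < keys.get(i - 1)) return i;
        return -1;
    }

    public static void main(String[] args) {
        List<Long> sortedInput = Arrays.asList(2L, 5L, 8L);
        List<Long> brokenInput = Arrays.asList(1L, 9L, 3L); // 3 is out of order
        List<Long> merged = merge(Arrays.asList(brokenInput, sortedInput));
        System.out.println(merged + " first out-of-order index: " + firstOutOfOrder(merged));
    }
}
```

The merge itself cannot repair the input: it only ever compares the current heads, so the misplaced key surfaces late in the output rather than being reordered.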
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
    [ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399299#comment-13399299 ]

Omid Aladini commented on CASSANDRA-4321:
-----------------------------------------

So I had offline-scrubbed the live Cassandra node and I had copied the sstables that participated in one of the failed compactions. Assuming the sstables had been offline-scrubbed, I had skipped step 2 above locally, so unfortunately I can't yet reproduce it locally with a limited set of data.

> Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt, 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
    [ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399268#comment-13399268 ]

Sylvain Lebresne commented on CASSANDRA-4321:
---------------------------------------------

bq. which leads me to suspect if it's due to the "Range" (vs. Bounds) used in LeveledCompactionStrategy::getScanners. Any ideas?

No, that Range is correct because it corresponds to repair ranges and is correctly interpreted. And in fact, when doing compaction that range is actually null, so that cannot be the problem.

So Anton sent me some sstables that triggered an out-of-order exception when compacting them. What I did with that is:
1) I applied the last version of the v5 patch on this issue on top of the current git cassandra-1.1 branch (using the release of CASSANDRA 1.1.1 would probably work too because I don't think there is any other related fix since 1.1.1 that went into the git branch but anyway, that's what I used).
2) I ran the offline scrub *while the node was stopped*. I insist on that last part because having the node running during the offline scrub would mess things up. I'd actually like to make the offline scrub check if the node is running and refuse to work in that case, but I'm not sure what the best way to do that is.
3) I restarted the node once the scrub was done.

I was then able to compact the node fully (i.e., I ran compaction until there was nothing more to compact) without hitting any more errors.

Was I lucky? Are you guys able to reproduce those steps and still get more errors?

> Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt, 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt
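To make the Range versus Bounds distinction raised in this exchange concrete: in Cassandra a Range excludes its left endpoint, (left, right], while a Bounds includes both endpoints, [left, right]. The sketch below is only an illustration under simplifying assumptions: plain long keys, left strictly less than right, no ring wrap-around, and hypothetical helper names. It is not the code of the attached 0001 patch. It shows how the two interpretations disagree when two sstables share exactly one boundary key.

```java
// Illustration only: long keys, left < right, no wrap-around; helper names are hypothetical.
final class RangeVsBoundsSketch {
    /** Overlap test with both intervals read as half-open: (left, right]. */
    static boolean overlapsAsRange(long l1, long r1, long l2, long r2) {
        return l1 < r2 && l2 < r1;
    }

    /** Overlap test with both intervals read as closed: [left, right]. */
    static boolean overlapsAsBounds(long l1, long r1, long l2, long r2) {
        return l1 <= r2 && l2 <= r1;
    }

    public static void main(String[] args) {
        // sstable A holds keys 10..20, sstable B holds keys 5..10: both contain key 10.
        System.out.println("as Range:  " + overlapsAsRange(10, 20, 5, 10));
        System.out.println("as Bounds: " + overlapsAsBounds(10, 20, 5, 10));
    }
}
```

Since an sstable really does contain its first key, the closed interpretation is the one that matches the data when deciding which sstables overlap. Per the reply above, this concern does not apply to the Range used in getScanners.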
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398563#comment-13398563 ]

Omid Aladini commented on CASSANDRA-4321:

I experienced the same issue as Anton. One observation: the out-of-order key that CompactionIterable's MergeIterator wrongly iterates, and which causes the exception, happens to be the start of an interval:

DEBUG 18:10:41,693 Creating IntervalNode from [... Interval(DecoratedKey(33736808147257072541807562080745136438, ... ),

which leads me to suspect that it is due to the "Range" (vs. Bounds) used in LeveledCompactionStrategy::getScanners. Any ideas?

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.1.1
> Reporter: Anton Winter
> Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt,
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt,
> 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt
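To make the "Range" vs. Bounds suspicion concrete, here is a minimal sketch. It assumes that a Range excludes its left endpoint, (left, right], while Bounds includes both endpoints, [left, right]. The class and method names are hypothetical stand-ins using plain long tokens with no wrap-around; they are not Cassandra's actual classes.

```java
// Hypothetical illustration only: simplified intervals over long tokens.
class IntervalInclusivityDemo {
    // Assumed Range semantics: start-exclusive, end-inclusive, i.e. (left, right].
    static boolean rangeContains(long left, long right, long key) {
        return key > left && key <= right;
    }

    // Assumed Bounds semantics: both endpoints inclusive, i.e. [left, right].
    static boolean boundsContains(long left, long right, long key) {
        return key >= left && key <= right;
    }

    public static void main(String[] args) {
        long firstKey = 10; // first key of a made-up sstable
        long lastKey = 20;  // last key of the same made-up sstable

        // The first key itself falls outside the start-exclusive interval.
        System.out.println(rangeContains(firstKey, lastKey, firstKey));  // prints false
        // The closed interval covers it.
        System.out.println(boundsContains(firstKey, lastKey, firstKey)); // prints true
    }
}
```

If the interval describing an sstable were built as (first, last], a row sitting exactly on the first key would be missed by any overlap check based on it, which would be consistent with the observation that the offending key is the start of an interval.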
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398326#comment-13398326 ]

Sylvain Lebresne commented on CASSANDRA-4321:

bq. The above error is due to the %d in StandaloneScrubber.java:179's interpolation.

Yes, sorry for that typo. I've updated the v5 patch to fix it.

bq. Got "java.lang.OutOfMemoryError: Java heap space" with -Xmx256M

The last version of the offline scrub "loads" all sstable readers, which means in particular that it loads the summary of the key index and the sstable bloom filter. In other words, it does use a bit more memory, so it's not necessarily surprising that -Xmx256M is not enough.

bq. LeveledCompactionStrategyTest:testValidationMultipleSSTablePerLevel fails because of junit timeout when I run it together with all other suites, but passes when I only run the LeveledCompactionStrategyTest suite. Is it related?

I doubt it. I've already seen tests time out when run with the full suite but not alone. I wouldn't worry too much about that. At least that test is working fine on my machine.

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.1.1
> Reporter: Anton Winter
> Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt,
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt,
> 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398206#comment-13398206 ]

Anton Winter commented on CASSANDRA-4321:

After working around the issue with the 0003 v5 patch that Omid refers to, I've had an sstablescrub complete on one of my servers. sstablescrub did detect several overlapping sstables, resetting them to L0, but no out-of-order keys.

The "Last written key DecoratedKey >= current key" exception, however, resurfaces after the first set of compactions, 5 minutes after startup, in exactly the same manner as before. The same exception occurs for various CFs until compactions stop completely. compactionstats still shows a large number of pending compaction tasks after this event.

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.1.1
> Reporter: Anton Winter
> Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt,
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt,
> 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397814#comment-13397814 ]

Omid Aladini commented on CASSANDRA-4321:

The above error is due to the %d in the format string at StandaloneScrubber.java:179. Will fix and try again.

Jonathan: Sun Java 6, 1.6.0_26

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.1.1
> Reporter: Anton Winter
> Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt,
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt,
> 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt
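For readers unfamiliar with this failure mode, the following is a minimal sketch of how a %d specifier behaves when it is handed an arbitrary object rather than an integer. FakeReader is a hypothetical stand-in for an object such as an SSTableReader, and the format strings are made up; this is not the code from StandaloneScrubber.

```java
import java.util.IllegalFormatConversionException;

// Hypothetical illustration only: not the actual StandaloneScrubber code.
class FormatSpecifierDemo {
    // Made-up stand-in for a non-numeric object passed to String.format.
    static final class FakeReader {
        private final String name;
        FakeReader(String name) { this.name = name; }
        @Override public String toString() { return "FakeReader(" + name + ")"; }
    }

    // %d accepts only integral arguments, so a non-numeric object makes
    // String.format throw IllegalFormatConversionException.
    static String withPercentD(Object arg) {
        try {
            return String.format("sstable %d", arg);
        } catch (IllegalFormatConversionException e) {
            return "failed: " + e.getMessage();
        }
    }

    // %s formats any object via its toString() method.
    static String withPercentS(Object arg) {
        return String.format("sstable %s", arg);
    }

    public static void main(String[] args) {
        FakeReader reader = new FakeReader("example");
        System.out.println(withPercentD(reader)); // reports the conversion failure
        System.out.println(withPercentS(reader)); // prints: sstable FakeReader(example)
    }
}
```

Switching the specifier to %s, or passing an integer field of the object instead, are the usual ways to avoid the exception.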
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397808#comment-13397808 ]

Jonathan Ellis commented on CASSANDRA-4321:

Are you using Java6 or Java7?

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.1.1
> Reporter: Anton Winter
> Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt,
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt,
> 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397805#comment-13397805 ]

Omid Aladini commented on CASSANDRA-4321:

Got "java.lang.OutOfMemoryError: Java heap space" with -Xmx256M. Tried with -Xmx512M and the scrub failed with:

{code}
Checking leveled manifest
d != org.apache.cassandra.io.sstable.SSTableReader
java.util.IllegalFormatConversionException: d != org.apache.cassandra.io.sstable.SSTableReader
    at java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:3999)
    at java.util.Formatter$FormatSpecifier.printInteger(Formatter.java:2709)
    at java.util.Formatter$FormatSpecifier.print(Formatter.java:2661)
    at java.util.Formatter.format(Formatter.java:2433)
    at java.util.Formatter.format(Formatter.java:2367)
    at java.lang.String.format(String.java:2769)
    at org.apache.cassandra.tools.StandaloneScrubber.checkManifest(StandaloneScrubber.java:179)
    at org.apache.cassandra.tools.StandaloneScrubber.main(StandaloneScrubber.java:148)
{code}

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.1.1
> Reporter: Anton Winter
> Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt,
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt,
> 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397632#comment-13397632 ]

Omid Aladini commented on CASSANDRA-4321:

Will try it again.

LeveledCompactionStrategyTest:testValidationMultipleSSTablePerLevel fails because of a junit timeout when I run it together with all other suites, but passes when I only run the LeveledCompactionStrategyTest suite. Is it related?

{code}
[junit] Testsuite: org.apache.cassandra.db.compaction.LeveledCompactionStrategyTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
[junit]
[junit] Testcase: org.apache.cassandra.db.compaction.LeveledCompactionStrategyTest:testValidationMultipleSSTablePerLevel: Caused an ERROR
[junit] Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
[junit] junit.framework.AssertionFailedError: Timeout occurred. Please note the time in the report does not reflect the time until the timeout.
[junit]
[junit]
[junit] Test org.apache.cassandra.db.compaction.LeveledCompactionStrategyTest FAILED (timeout)
{code}

> stackoverflow building interval tree & possible sstable corruptions
> ---
>
> Key: CASSANDRA-4321
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.1.1
> Reporter: Anton Winter
> Assignee: Sylvain Lebresne
> Fix For: 1.1.2
>
> Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v5.txt,
> 0002-Scrub-detects-and-repair-outOfOrder-rows-v5.txt,
> 0003-Create-standalone-scrub-v5.txt, ooyala-hastur-stacktrace.txt
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397437#comment-13397437 ]

Omid Aladini commented on CASSANDRA-4321:
-----------------------------------------

The exception does not happen right away, but after some rounds of compaction on the CF. I had re-bootstrapped, so there are tons (~1500) of pending compaction tasks. If I restart the node and compact, the problem happens again, so I can reproduce it. I'll send you an email about the data.

> stackoverflow building interval tree & possible sstable corruptions
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-4321
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.1
>            Reporter: Anton Winter
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1.2
>
>         Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 0003-Create-standalone-scrub-v3.txt, 0003-Create-standalone-scrub-v4.txt, ooyala-hastur-stacktrace.txt
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396906#comment-13396906 ]

Sylvain Lebresne commented on CASSANDRA-4321:
---------------------------------------------

Did the new exception happen quickly after starting the node with the scrubbed files? Are you able to reproduce it easily (i.e. if you restart the node and compact, do you still get the same error)? If you are able to reproduce it, would you be at liberty to provide a set of sstables that produce the error during compaction (in private, for instance)? It would be much easier to track this down with an easy way to reproduce.

> stackoverflow building interval tree & possible sstable corruptions
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-4321
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.1
>            Reporter: Anton Winter
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1.2
>
>         Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 0003-Create-standalone-scrub-v3.txt, 0003-Create-standalone-scrub-v4.txt, ooyala-hastur-stacktrace.txt
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396903#comment-13396903 ]

Omid Aladini commented on CASSANDRA-4321:
-----------------------------------------

Let me know if I can provide more data.

> stackoverflow building interval tree & possible sstable corruptions
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-4321
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.1
>            Reporter: Anton Winter
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1.2
>
>         Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 0003-Create-standalone-scrub-v3.txt, 0003-Create-standalone-scrub-v4.txt, ooyala-hastur-stacktrace.txt
>
>
> After upgrading to 1.1.1 (from 1.1.0) I have started experiencing StackOverflowError's resulting in compaction backlog and failure to restart.
> The ring currently consists of 6 DC's and 22 nodes using LCS & compression. This issue was first noted on 2 nodes in one DC and then appears to have spread to various other nodes in the other DC's.
> When the first occurrence of this was found I restarted the instance but it failed to start so I cleared its data and treated it as a replacement node for the token it was previously responsible for. This node successfully streamed all the relevant data back but failed again a number of hours later with the same StackOverflowError and again was unable to restart.
> The initial stack overflow error on a running instance looks like this:
> ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:314,1,main]
> java.lang.StackOverflowError
>         at java.util.Arrays.mergeSort(Arrays.java:1157)
>         at java.util.Arrays.sort(Arrays.java:1092)
>         at java.util.Collections.sort(Collections.java:134)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:49)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> [snip - this repeats until stack overflow. Compactions stop from this point onwards]
> I restarted this failing instance with DEBUG logging enabled and it throws the following exception part way through startup:
> ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main]
> java.lang.StackOverflowError
>         at org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307)
>         at org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276)
>         at org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230)
>         at org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124)
>         at org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:45)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
> [snip - this repeats until stack overflow]
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
>         at org.apache.cassandra.utils.IntervalTree.IntervalTree.<init>(IntervalTree.java:39)
>         at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTrac
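
The repeated IntervalNode constructor frames in the quoted trace are what a recursive tree build looks like when it never reaches its base case. The following is a minimal sketch of that failure mode, not Cassandra's actual IntervalNode code: the class and method names, the median choice, and the depth guard are all assumptions made for illustration. It builds a centered interval tree recursively. A well-formed input terminates, while a hypothetical inverted interval (min greater than max, as an sstable whose first key sorts after its last key might produce) is handed unchanged to the same side at every level, so only the guard stops it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a centered interval tree builder; illustration only.
final class IntervalTreeSketch {
    static final class Interval {
        final long min, max;
        Interval(long min, long max) { this.min = min; this.max = max; }
    }

    // Assumed safety limit; a real unguarded build would recurse until StackOverflowError.
    static final int MAX_DEPTH = 64;

    /** Recursively partitions the intervals and returns the depth reached. */
    static int build(List<Interval> intervals, int depth) {
        if (intervals.isEmpty()) return depth;          // base case
        if (depth > MAX_DEPTH)
            throw new IllegalStateException("recursion is not making progress at depth " + depth);

        // Pick a median among all endpoints.
        List<Long> endpoints = new ArrayList<>();
        for (Interval i : intervals) { endpoints.add(i.min); endpoints.add(i.max); }
        Collections.sort(endpoints);
        long median = endpoints.get(endpoints.size() / 2);

        List<Interval> left = new ArrayList<>();
        List<Interval> right = new ArrayList<>();
        for (Interval i : intervals) {
            if (i.max < median) left.add(i);            // entirely left of the median
            else if (i.min > median) right.add(i);      // entirely right of the median
            // otherwise the interval contains the median and stays at this node
        }
        return Math.max(build(left, depth + 1), build(right, depth + 1));
    }

    public static void main(String[] args) {
        List<Interval> ok = List.of(new Interval(1, 3), new Interval(2, 6),
                                    new Interval(5, 9), new Interval(8, 12));
        System.out.println("well-formed input, depth reached: " + build(ok, 0));
        try {
            build(List.of(new Interval(10, 5)), 0);     // inverted: min > max
        } catch (IllegalStateException e) {
            System.out.println("inverted input: " + e.getMessage());
        }
    }
}
```

With well-formed intervals, the interval that owns the median endpoint always stays at the current node, so each recursive call receives strictly fewer intervals. The inverted interval breaks that argument, which is why the sketch needs the guard.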
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396895#comment-13396895 ]

Omid Aladini commented on CASSANDRA-4321:
-----------------------------------------

Exactly.

- Applied v3.
- Ran offline scrub and it failed because of tmp files.
- Started Cassandra and saw failures.
- Applied v4 to the cassandra-1.1 branch.
- Ran offline scrub successfully.
- Started Cassandra successfully.
- Compaction failed because of the above error.

All done on the same instance.

> stackoverflow building interval tree & possible sstable corruptions
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-4321
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.1
>            Reporter: Anton Winter
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1.2
>
>         Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 0003-Create-standalone-scrub-v3.txt, 0003-Create-standalone-scrub-v4.txt, ooyala-hastur-stacktrace.txt
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396862#comment-13396862 ]

Sylvain Lebresne commented on CASSANDRA-4321:
---------------------------------------------

Just to make sure: you did apply 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt before restarting the server after having run the offline scrub, right? If so, that would mean we have yet another bug that generates out-of-order keys during compaction.

> stackoverflow building interval tree & possible sstable corruptions
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-4321
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.1
>            Reporter: Anton Winter
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1.2
>
>         Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 0003-Create-standalone-scrub-v3.txt, 0003-Create-standalone-scrub-v4.txt, ooyala-hastur-stacktrace.txt
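
The patch named in the comment above switches LeveledManifest.overlapping from a Range to a Bounds. As a rough illustration of why that distinction can matter, here is a minimal sketch under stated assumptions: Range is taken to be start-exclusive and end-inclusive, (left, right], and Bounds to be inclusive at both ends, [left, right]. The class and method names are hypothetical stand-ins, tokens are plain longs, and wrapping intervals are ignored; this is not the code from the patch.

```java
// Hypothetical overlap checks on non-wrapping token intervals; illustration only.
final class OverlapSketch {
    /** Start-exclusive, end-inclusive intervals: (l1, r1] and (l2, r2], assuming l < r. */
    static boolean rangeIntersects(long l1, long r1, long l2, long r2) {
        return l1 < r2 && l2 < r1;
    }

    /** Fully inclusive intervals: [l1, r1] and [l2, r2], assuming l <= r. */
    static boolean boundsIntersects(long l1, long r1, long l2, long r2) {
        return l1 <= r2 && l2 <= r1;
    }

    public static void main(String[] args) {
        // sstable A holds tokens 10..20 and sstable B holds tokens 5..10,
        // so both contain token 10.
        System.out.println("Range:  " + rangeIntersects(10, 20, 5, 10));
        System.out.println("Bounds: " + boundsIntersects(10, 20, 5, 10));
    }
}
```

Under these assumptions the start-exclusive check reports no overlap for two sstables that share token 10, because A's first token is excluded from (10, 20]. The inclusive check reports the overlap.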
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396804#comment-13396804 ]

Omid Aladini commented on CASSANDRA-4321:
-----------------------------------------

Tried the v4 patch and offline scrub went through completely. Cassandra started without any error, but compaction halted again:

{code}
2012-06-19_14:01:03.47432  INFO 14:01:03,474 Compacting [SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-67792-Data.db'), SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-65607-Data.db'), SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-63279-Data.db'), SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-65491-Data.db'), SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-68332-Data.db'), SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-64720-Data.db'), SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-65322-Data.db'), SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-66557-Data.db'), SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-64504-Data.db'), SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-68179-Data.db'), SSTableReader(path='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-65005-Data.db')]
2012-06-19_14:01:08.73528 ERROR 14:01:08,733 Exception in thread Thread[CompactionExecutor:11,1,main]
2012-06-19_14:01:08.73538 java.lang.RuntimeException: Last written key DecoratedKey(42351003983459534782466386414991462257, 313632303432347c3130303632313432) >= current key DecoratedKey(38276735926421753773204634663641518108, 31343638373735327c3439343837333932) writing into /var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-tmp-hd-68399-Data.db
2012-06-19_14:01:08.73572       at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
2012-06-19_14:01:08.73581       at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
2012-06-19_14:01:08.73590       at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
2012-06-19_14:01:08.73600       at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
2012-06-19_14:01:08.73611       at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
2012-06-19_14:01:08.73622       at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
2012-06-19_14:01:08.73633       at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
2012-06-19_14:01:08.73642       at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
2012-06-19_14:01:08.73650       at java.util.concurrent.FutureTask.run(Unknown Source)
2012-06-19_14:01:08.73657       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
2012-06-19_14:01:08.73665       at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
2012-06-19_14:01:08.73672       at java.lang.Thread.run(Unknown Source)
{code}

All SSTables that participated in the compaction were new ones written by the offline scrub (according to their timestamps and id range), although the first one didn't exist any more (already promoted before the exception?).

{quote}This is not really a new bug, but I believe that prior to CASSANDRA-4142, *this had less consequences*.{quote}

Sylvain, could you please elaborate on this? I'd like to know how pre-1.1.1 data is affected by the Range-vs-Bounds bug. Does it only cause overlapping/duplicate sstables on the same level, leading to slower reads because of unneeded sstable lookups?

> stackoverflow building interval tree & possible sstable corruptions
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-4321
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.1
>            Reporter: Anton Winter
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1.2
>
>         Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 0003-Create-standalone-scrub-v3.txt, 0003-Create-standalone-scrub-v4.txt, ooyala-hastur-stacktrace.txt
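
The RuntimeException in the comment above comes from an ordering check made before each row is appended to the sstable being written. The sketch below illustrates that kind of invariant. It is a hypothetical stand-in rather than Cassandra's SSTableWriter: the class name is invented, and keys are compared by token only, which is a simplifying assumption. The two token values are the ones printed in the log.

```java
import java.math.BigInteger;

// Hypothetical writer that enforces strictly increasing tokens; illustration only.
final class OrderedAppendSketch {
    private BigInteger lastWritten;   // null until the first append

    /** Accepts a token only if it sorts strictly after the previous one. */
    void append(BigInteger token) {
        if (lastWritten != null && lastWritten.compareTo(token) >= 0)
            throw new IllegalStateException(
                "Last written key " + lastWritten + " >= current key " + token);
        lastWritten = token;
    }

    public static void main(String[] args) {
        OrderedAppendSketch writer = new OrderedAppendSketch();
        // Tokens in the order reported by the log: the larger one was written first.
        writer.append(new BigInteger("42351003983459534782466386414991462257"));
        try {
            writer.append(new BigInteger("38276735926421753773204634663641518108"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check like this turns out-of-order input into an immediate failure instead of a silently mis-sorted file, which fits the symptom reported here: compaction halts rather than producing another bad sstable.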
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396433#comment-13396433 ]

Anton Winter commented on CASSANDRA-4321:
-----------------------------------------

I can confirm I also experienced the "Unexpected empty index file" errors on some of the nodes that I have run sstablescrub on. On some other nodes the sstablescrub command appears to complete successfully, but compaction still stops at the "java.lang.RuntimeException: Last written key DecoratedKey" error.

Is there any further information we can supply to help debug?

> stackoverflow building interval tree & possible sstable corruptions
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-4321
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4321
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1.1
>            Reporter: Anton Winter
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1.2
>
>         Attachments: 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, 0003-Create-standalone-scrub-v3.txt, ooyala-hastur-stacktrace.txt
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396319#comment-13396319 ] Al Tobey commented on CASSANDRA-4321: - Offline scrub ran fine for me. I downgraded to 1.1.0 and ran a compaction and it looks fine. > stackoverflow building interval tree & possible sstable corruptions > --- > > Key: CASSANDRA-4321 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4321 > Project: Cassandra > Issue Type: Bug > Components: Core >Affects Versions: 1.1.1 >Reporter: Anton Winter >Assignee: Sylvain Lebresne > Fix For: 1.1.2 > > Attachments: > 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, > 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, > 0003-Create-standalone-scrub-v3.txt, ooyala-hastur-stacktrace.txt > > > After upgrading to 1.1.1 (from 1.1.0) I have started experiencing > StackOverflowError's resulting in compaction backlog and failure to restart. > The ring currently consists of 6 DC's and 22 nodes using LCS & compression. > This issue was first noted on 2 nodes in one DC and then appears to have > spread to various other nodes in the other DC's. > When the first occurrence of this was found I restarted the instance but it > failed to start so I cleared its data and treated it as a replacement node > for the token it was previously responsible for. This node successfully > streamed all the relevant data back but failed again a number of hours later > with the same StackOverflowError and again was unable to restart. 
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396246#comment-13396246 ]

Omid Aladini commented on CASSANDRA-4321:

Thanks for the patch. Offline scrub is indeed very useful.

I tried the v3 patches and the scrub didn't complete, possibly because of a different issue. It failed with the following assertion:

{code}
Exception in thread "main" java.lang.AssertionError: Unexpected empty index file: RandomAccessReader(filePath='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-tmp-hd-33827-Index.db', skipIOCache=true)
    at org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:221)
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:376)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:203)
    at org.apache.cassandra.io.sstable.SSTableReader.openNoValidation(SSTableReader.java:143)
    at org.apache.cassandra.tools.StandaloneScrubber.main(StandaloneScrubber.java:79)
{code}

Consequently, Cassandra encountered a corrupt SSTable during start-up:

{code}
2012-06-18_20:36:19.89543 INFO 20:36:19,895 Opening /var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-24984 (1941993 bytes)
2012-06-18_20:36:19.90217 ERROR 20:36:19,900 Exception in thread Thread[SSTableBatchOpen:9,5,main]
2012-06-18_20:36:19.90222 java.lang.IllegalStateException: SSTable first key DecoratedKey(41255474878128469814942789647212295629, 31303132393937357c3337313730333536) > last key DecoratedKey(41219536226656199861610796307350537953, 31303234323538397c3331383436373338)
2012-06-18_20:36:19.90261     at org.apache.cassandra.io.sstable.SSTableReader.validate(SSTableReader.java:441)
2012-06-18_20:36:19.90275     at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:208)
2012-06-18_20:36:19.90291     at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:153)
2012-06-18_20:36:19.90309     at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:245)
2012-06-18_20:36:19.90324     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
2012-06-18_20:36:19.90389     at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
2012-06-18_20:36:19.90391     at java.util.concurrent.FutureTask.run(Unknown Source)
2012-06-18_20:36:19.90391     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
2012-06-18_20:36:19.90392     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
2012-06-18_20:36:19.90392     at java.lang.Thread.run(Unknown Source)
{code}

This didn't prevent Cassandra from starting up, but compaction subsequently failed:

{code}
2012-06-18_20:51:41.79122 ERROR 20:51:41,790 Exception in thread Thread[CompactionExecutor:81,1,main]
2012-06-18_20:51:41.79131 java.lang.RuntimeException: Last written key DecoratedKey(12341204629749023303706929560940823070, 33363037353338) >= current key DecoratedKey(12167298275958419273792070792442127650, 31363431343537) writing into /var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-tmp-hd-40992-Data.db
2012-06-18_20:51:41.79161     at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
2012-06-18_20:51:41.79169     at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
2012-06-18_20:51:41.79180     at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
2012-06-18_20:51:41.79189     at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
2012-06-18_20:51:41.79199     at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
2012-06-18_20:51:41.79210     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
2012-06-18_20:51:41.79218     at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
2012-06-18_20:51:41.79227     at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
2012-06-18_20:51:41.79235     at java.util.concurrent.FutureTask.run(Unknown Source)
2012-06-18_20:51:41.79242     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
2012-06-18_20:51:41.79250     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
2012-06-18_20:51:41.79259     at java.lang.Thread.run(Unknown Source)
{code}

> stackoverflow building interval tree & possible sstable corruptions > --- > > Key: CASSANDRA-4321 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4321 > Project: Cassandra > Issue Type: Bug > Components: Core >Affects Versions: 1.1.1 >Reporter: Anton Winter >A
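Both the start-up and the compaction exceptions in the comment above report keys that are out of order. As a rough illustration of the invariant being violated, here is a minimal sketch that scans a sequence of tokens and reports the first one that is not strictly greater than its predecessor. The class and method names are hypothetical and this is not the code in `SSTableWriter.beforeAppend` or in the scrub patch. It also compares tokens only, on the assumption that the tokens are plain big integers as printed in the log, whereas a real key comparison may take the key bytes into account as well.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper illustrating a strict key-ordering check; not Cassandra code. */
final class KeyOrderSketch {
    /** Index of the first token that is not strictly greater than its predecessor, or -1. */
    static int firstOutOfOrder(List<BigInteger> tokens) {
        for (int i = 1; i < tokens.size(); i++) {
            // Mirrors the "last written key >= current key" condition from the log message.
            if (tokens.get(i - 1).compareTo(tokens.get(i)) >= 0) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Token values copied from the compaction error in the comment above.
        List<BigInteger> fromLog = Arrays.asList(
            new BigInteger("12341204629749023303706929560940823070"),
            new BigInteger("12167298275958419273792070792442127650"));
        System.out.println("first out-of-order index: " + firstOutOfOrder(fromLog));
    }
}
```

A check of this shape only detects the problem; deciding what to do with the offending rows (skip, re-sort, or rewrite) is a separate question that the scrub patches attached to the issue address.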
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296042#comment-13296042 ] Jonathan Ellis commented on CASSANDRA-4321: --- cassandra-1.1 branch > stackoverflow building interval tree & possible sstable corruptions > --- > > Key: CASSANDRA-4321 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4321 > Project: Cassandra > Issue Type: Bug > Components: Core >Affects Versions: 1.1.1 >Reporter: Anton Winter >Assignee: Sylvain Lebresne > Fix For: 1.1.2 > > Attachments: > 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, > 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, > 0003-Create-standalone-scrub-v3.txt, ooyala-hastur-stacktrace.txt > > > After upgrading to 1.1.1 (from 1.1.0) I have started experiencing > StackOverflowError's resulting in compaction backlog and failure to restart. > The ring currently consists of 6 DC's and 22 nodes using LCS & compression. > This issue was first noted on 2 nodes in one DC and then appears to have > spread to various other nodes in the other DC's. > When the first occurrence of this was found I restarted the instance but it > failed to start so I cleared its data and treated it as a replacement node > for the token it was previously responsible for. This node successfully > streamed all the relevant data back but failed again a number of hours later > with the same StackOverflowError and again was unable to restart. 
> The initial stack overflow error on a running instance looks like this: > ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 > AbstractCassandraDaemon.java (line 134) Exception in thread > Thread[CompactionExecutor:314,1,main] > java.lang.StackOverflowError > at java.util.Arrays.mergeSort(Arrays.java:1157) > at java.util.Arrays.sort(Arrays.java:1092) > at java.util.Collections.sort(Collections.java:134) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:49) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62) > [snip - this repeats until stack overflow. Compactions stop from this point > onwards] > I restarted this failing instance with DEBUG logging enabled and it throws > the following exception part way through startup: > ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main] > java.lang.StackOverflowError > at > org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307) > at > org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276) > at > org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230) > at > org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124) > at > org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:45) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62) > [snip - this repeats until stack overflow] > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62) > at > 
org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:64) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:64) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:64) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalTree.(IntervalTree.java:39) > at > org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:560) > at > org.apache.cassandra.db.D
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296040#comment-13296040 ] Al Tobey commented on CASSANDRA-4321: - What SHA / tag should these patches apply against? I've tried trunk, 1.1.1 and 1.1.0 and can't get a clean apply. I'll try a manual merge tomorrow. > stackoverflow building interval tree & possible sstable corruptions > --- > > Key: CASSANDRA-4321 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4321 > Project: Cassandra > Issue Type: Bug > Components: Core >Affects Versions: 1.1.1 >Reporter: Anton Winter >Assignee: Sylvain Lebresne > Fix For: 1.1.2 > > Attachments: > 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, > 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, > 0003-Create-standalone-scrub-v3.txt, ooyala-hastur-stacktrace.txt > > > After upgrading to 1.1.1 (from 1.1.0) I have started experiencing > StackOverflowError's resulting in compaction backlog and failure to restart. > The ring currently consists of 6 DC's and 22 nodes using LCS & compression. > This issue was first noted on 2 nodes in one DC and then appears to have > spread to various other nodes in the other DC's. > When the first occurrence of this was found I restarted the instance but it > failed to start so I cleared its data and treated it as a replacement node > for the token it was previously responsible for. This node successfully > streamed all the relevant data back but failed again a number of hours later > with the same StackOverflowError and again was unable to restart. 
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295290#comment-13295290 ] Al Tobey commented on CASSANDRA-4321: - BTW if somebody points me to a build, tag, or commit ID to test, I'll push it out right away. It's only a 3-node cluster and I can easily take a filesystem snapshot before running. > stackoverflow building interval tree & possible sstable corruptions > --- > > Key: CASSANDRA-4321 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4321 > Project: Cassandra > Issue Type: Bug > Components: Core >Affects Versions: 1.1.1 >Reporter: Anton Winter >Assignee: Sylvain Lebresne > Fix For: 1.1.2 > > Attachments: > 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v2.txt, > 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, > 0001-Change-Range-Bounds-in-LeveledManifest.overlapping.txt, > 0002-Scrub-detects-and-repair-outOfOrder-rows.txt, > ooyala-hastur-stacktrace.txt > > > After upgrading to 1.1.1 (from 1.1.0) I have started experiencing > StackOverflowError's resulting in compaction backlog and failure to restart. > The ring currently consists of 6 DC's and 22 nodes using LCS & compression. > This issue was first noted on 2 nodes in one DC and then appears to have > spread to various other nodes in the other DC's. > When the first occurrence of this was found I restarted the instance but it > failed to start so I cleared its data and treated it as a replacement node > for the token it was previously responsible for. This node successfully > streamed all the relevant data back but failed again a number of hours later > with the same StackOverflowError and again was unable to restart. 
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295286#comment-13295286 ] Al Tobey commented on CASSANDRA-4321: - I think I just hit the same thing. We're using reverse comparator with bytescomparator on the CF's that seem to be having trouble if that's relevant at all. Cluster is 1.1.1 on Ubuntu 12.04 and only has about 7GB per node at the moment. Stacktrace attached. > stackoverflow building interval tree & possible sstable corruptions > --- > > Key: CASSANDRA-4321 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4321 > Project: Cassandra > Issue Type: Bug > Components: Core >Affects Versions: 1.1.1 >Reporter: Anton Winter >Assignee: Sylvain Lebresne > Fix For: 1.1.2 > > Attachments: > 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v2.txt, > 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, > 0001-Change-Range-Bounds-in-LeveledManifest.overlapping.txt, > 0002-Scrub-detects-and-repair-outOfOrder-rows.txt, > ooyala-hastur-stacktrace.txt > > > After upgrading to 1.1.1 (from 1.1.0) I have started experiencing > StackOverflowError's resulting in compaction backlog and failure to restart. > The ring currently consists of 6 DC's and 22 nodes using LCS & compression. > This issue was first noted on 2 nodes in one DC and then appears to have > spread to various other nodes in the other DC's. > When the first occurrence of this was found I restarted the instance but it > failed to start so I cleared its data and treated it as a replacement node > for the token it was previously responsible for. This node successfully > streamed all the relevant data back but failed again a number of hours later > with the same StackOverflowError and again was unable to restart. 
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295171#comment-13295171 ] Omid Aladini commented on CASSANDRA-4321: - Scrubbed the column family on a node which had booted up with assertions `on` and there were still corrupt sstables. > stackoverflow building interval tree & possible sstable corruptions > --- > > Key: CASSANDRA-4321 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4321 > Project: Cassandra > Issue Type: Bug > Components: Core >Affects Versions: 1.1.1 >Reporter: Anton Winter >Assignee: Sylvain Lebresne > Fix For: 1.1.2 > > Attachments: > 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v2.txt, > 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, > 0001-Change-Range-Bounds-in-LeveledManifest.overlapping.txt, > 0002-Scrub-detects-and-repair-outOfOrder-rows.txt > > > After upgrading to 1.1.1 (from 1.1.0) I have started experiencing > StackOverflowError's resulting in compaction backlog and failure to restart. > The ring currently consists of 6 DC's and 22 nodes using LCS & compression. > This issue was first noted on 2 nodes in one DC and then appears to have > spread to various other nodes in the other DC's. > When the first occurrence of this was found I restarted the instance but it > failed to start so I cleared its data and treated it as a replacement node > for the token it was previously responsible for. This node successfully > streamed all the relevant data back but failed again a number of hours later > with the same StackOverflowError and again was unable to restart. 
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293762#comment-13293762 ] Omid Aladini commented on CASSANDRA-4321: -

Tried the patch, but the server still doesn't start. The StackOverflowError that gets thrown causes an already loaded column family to be loaded again:

Load CF1:
{code}
reading saved cache /var/lib/cassandra/abcd/saved_caches/SOMEKSP-CF1-KeyCache
2012-06-12_16:18:04.12387 INFO 16:18:04,123 Opening /var/lib/cassandra/abcd/data/SOMEKSP/CF1/SOMEKSP-CF1-hd-2248 ...
{code}

Load CF3, which has the corrupted sstables:
{code}
2012-06-12_15:31:20.56185 INFO 15:31:20,561 Opening /var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-7924 (2372910 bytes)
2012-06-12_15:31:20.81897 ERROR 15:31:20,811 Exception in thread Thread[OptionalTasks:1,5,main]
2012-06-12_15:31:20.81901 java.lang.StackOverflowError
2012-06-12_15:31:20.81901 at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:90)
2012-06-12_15:31:20.81906 at org.apache.cassandra.db.DecoratedKey.compareTo(DecoratedKey.java:38)
2012-06-12_15:31:20.81918 at java.util.Arrays.mergeSort(Unknown Source)
2012-06-12_15:31:20.81927 at java.util.Arrays.mergeSort(Unknown Source)
2012-06-12_15:31:20.81934 at java.util.Arrays.mergeSort(Unknown Source)
2012-06-12_15:31:20.81940 at java.util.Arrays.mergeSort(Unknown Source)
2012-06-12_15:31:20.81946 at java.util.Arrays.mergeSort(Unknown Source)
2012-06-12_15:31:20.81954 at java.util.Arrays.sort(Unknown Source)
2012-06-12_15:31:20.81960 at java.util.Collections.sort(Unknown Source)
2012-06-12_15:31:20.81980 at org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114)
2012-06-12_15:31:20.81981 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:49)
2012-06-12_15:31:20.81990 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62)
// stacktrace goes on
2012-06-12_15:31:20.88633 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
2012-06-12_15:31:20.88643 at org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64)
2012-06-12_15:31:20.88654 at org.apache.cassandra.utils.IntervalTree.IntervalTree.<init>(IntervalTree.java:39)
2012-06-12_15:31:20.88664 at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:560)
2012-06-12_15:31:20.88673 at org.apache.cassandra.db.DataTracker$View.replace(DataTracker.java:617)
2012-06-12_15:31:20.88683 at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:320)
2012-06-12_15:31:20.88692 at org.apache.cassandra.db.DataTracker.addInitialSSTables(DataTracker.java:259)
2012-06-12_15:31:20.88702 at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:234)
2012-06-12_15:31:20.88712 at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:331)
2012-06-12_15:31:20.88723 at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:309)
2012-06-12_15:31:20.88734 at org.apache.cassandra.db.Table.initCf(Table.java:367)
2012-06-12_15:31:20.88742 at org.apache.cassandra.db.Table.<init>(Table.java:299)
2012-06-12_15:31:20.88750 at org.apache.cassandra.db.Table.open(Table.java:114)
2012-06-12_15:31:20.88758 at org.apache.cassandra.db.Table.open(Table.java:97)
2012-06-12_15:31:20.88766 at org.apache.cassandra.db.Table$2.apply(Table.java:574)
2012-06-12_15:31:20.88773 at org.apache.cassandra.db.Table$2.apply(Table.java:571)
2012-06-12_15:31:20.88782 at com.google.common.collect.Iterators$8.next(Iterators.java:751)
2012-06-12_15:31:20.88790 at org.apache.cassandra.db.ColumnFamilyStore.all(ColumnFamilyStore.java:1625)
2012-06-12_15:31:20.88800 at org.apache.cassandra.db.MeteredFlusher.countFlushingBytes(MeteredFlusher.java:118)
2012-06-12_15:31:20.88810 at org.apache.cassandra.db.MeteredFlusher.run(MeteredFlusher.java:45)
2012-06-12_15:31:20.88818 at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:79)
2012-06-12_15:31:20.88833 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
2012-06-12_15:31:20.88842 at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
2012-06-12_15:31:20.88851 at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
2012-06-12_15:31:20.88860 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown Source)
2012-06-12_15:31:20.88870 at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown Source)
2012-06-12_15:31:20.2 at java.util.concurrent.S
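The repeating IntervalNode constructor frames in the trace above are what a recursive tree build looks like when its input never shrinks. The following is a minimal sketch of one way that could happen, not Cassandra's actual IntervalNode code. It assumes a tree that splits on the median endpoint and an input interval whose start is greater than its end, as an sstable with out-of-order keys might report. The class name, the `long[]{min, max}` representation and the `MAX_DEPTH` guard are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class IntervalTreeDepthSketch {
    // Hypothetical guard so the demo fails fast instead of overflowing the stack.
    static final int MAX_DEPTH = 1000;

    /** Builds the tree shape recursively and returns its depth. Each interval is {min, max}. */
    static int build(List<long[]> intervals, int depth) {
        if (intervals.isEmpty()) return depth;
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException("input is not shrinking; an interval has min > max?");
        }
        // Median of all endpoints, chosen as the split point for this node.
        List<Long> endpoints = new ArrayList<>();
        for (long[] iv : intervals) {
            endpoints.add(iv[0]);
            endpoints.add(iv[1]);
        }
        Collections.sort(endpoints);
        long median = endpoints.get(endpoints.size() / 2);

        List<long[]> left = new ArrayList<>();
        List<long[]> right = new ArrayList<>();
        for (long[] iv : intervals) {
            if (iv[1] < median) left.add(iv);        // entirely below the split point
            else if (iv[0] > median) right.add(iv);  // entirely above the split point
            // otherwise the interval contains the split point and stays at this node
        }
        return Math.max(build(left, depth + 1), build(right, depth + 1));
    }
}
```

With well-formed intervals, the interval that owns the median endpoint always stays at the current node, so both sub-lists are strictly smaller and the recursion ends. A single inverted interval such as `{10, 5}` is pushed into the left sub-list unchanged on every call, so in this sketch only the guard stops it.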
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293412#comment-13293412 ] Anton Winter commented on CASSANDRA-4321: -

If I use the v2 patch, startup stops with the following:
{code}
INFO [main] 2012-06-12 14:23:33,899 ColumnFamilyStore.java (line 633) Enqueuing flush of Memtable-LocationInfo@1141455324(41/51 serialized/live bytes, 1 ops)
INFO [FlushWriter:2] 2012-06-12 14:23:33,903 Memtable.java (line 266) Writing Memtable-LocationInfo@1141455324(41/51 serialized/live bytes, 1 ops)
ERROR [FlushWriter:2] 2012-06-12 14:23:33,953 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[FlushWriter:2,5,main]
java.lang.RuntimeException: Last written key null >= current key DecoratedKey(61078635599166706937511052402724559481, 4c) writing into /var/lib//cassandra/system/LocationInfo/system-LocationInfo-tmp-hd-65597-Data.db
{code}

Given the above, I scrubbed the system keyspace, which removed all sstables, leaving only the snapshots, e.g.:
{code}
WARN [CompactionExecutor:5] 2012-06-12 14:29:41,672 CompactionManager.java (line 651) Row at 100 is unreadable; skipping to next
WARN [CompactionExecutor:5] 2012-06-12 14:29:41,672 CompactionManager.java (line 602) Non-fatal error reading row (stacktrace follows)
java.lang.RuntimeException: Last written key null >= current key DecoratedKey(135285944860343992175601105924967452217, 63716c) writing into /var/lib//data/cassandra/system/Versions/system-Versions-tmp-hd-37-Data.db
{code}

...eventually resulting in:
{code}
WARN [CompactionExecutor:5] 2012-06-12 14:29:41,674 CompactionManager.java (line 692) No valid rows found while scrubbing SSTableReader(path='/var/lib//data/cassandra/system/Versions/system-Versions-hd-35-Data.db'); it is marked for deletion now. If you want to attempt manual recovery, you can find a copy in the pre-scrub snapshot
{code}

A clean bootstrap also stops with similar errors:
{code}
java.lang.RuntimeException: Last written key null >= current key DecoratedKey(61078635599166706937511052402724559481, 4c) writing into /var/lib//data/cassandra/system/LocationInfo/system-LocationInfo-tmp-hd-1-Data.db
{code}
and
{code}
java.lang.RuntimeException: Last written key null >= current key DecoratedKey(93220794208128599841715671226150005828, 746872696674) writing into /var/lib//data/cassandra/system/Versions/system-Versions-tmp-hd-1-Data.db
{code}
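The "Last written key null >= current key" messages above show an ordering check firing before any key has been written. As an illustration only, here is how a check of this kind usually treats a missing previous key. `OrderedWriterSketch` is a hypothetical class, not Cassandra's writer, and the sketch makes no claim about what the v2 patch actually changed.

```java
final class OrderedWriterSketch<K extends Comparable<K>> {
    private K lastWrittenKey; // stays null until the first append

    /** Accepts keys in strictly increasing order and rejects anything else. */
    void append(K key) {
        // A null lastWrittenKey means nothing has been written yet,
        // so the first key must always pass the ordering check.
        if (lastWrittenKey != null && lastWrittenKey.compareTo(key) >= 0) {
            throw new IllegalStateException(
                "Last written key " + lastWrittenKey + " >= current key " + key);
        }
        lastWrittenKey = key;
    }

    K lastWrittenKey() {
        return lastWrittenKey;
    }
}
```

In this sketch a rejected key leaves `lastWrittenKey` unchanged, so a caller could skip the offending row and continue with the next one.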
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293005#comment-13293005 ] Jonathan Ellis commented on CASSANDRA-4321: --- v2 LGTM (nit: would like to rename variables to xBounds instead of xRange) re 0002, does this actually work w/ LCR?
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292856#comment-13292856 ] Sylvain Lebresne commented on CASSANDRA-4321: -

bq. Is that likely, given that Range vs Bounds is basically an off-by-one

The thing is, even an off-by-one is enough to have essentially 2 identical sstables on the same level. If so, our new by-level iterator will happily write a new sstable that is a concatenation of both of those, and we'll end up with half of the resulting sstable being out of order. That being said, now that you mention it, leveled compaction limits the size of sstables, so we should be good on that front.
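The concatenation effect described in the comment above can be illustrated with plain sorted lists standing in for sstables. This is a simplification with hypothetical names, not the by-level iterator itself. When two runs on the same level overlap, reading them back to back yields out-of-order keys, whereas a real merge does not.

```java
import java.util.ArrayList;
import java.util.List;

final class LevelConcatSketch {
    /** Reads the runs one after the other; only valid if the runs do not overlap. */
    static List<Integer> concatenate(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    /** Standard two-way merge of two sorted runs. */
    static List<Integer> merge(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    /** True if the keys are in non-decreasing order. */
    static boolean isSorted(List<Integer> keys) {
        for (int k = 1; k < keys.size(); k++) {
            if (keys.get(k - 1) > keys.get(k)) return false;
        }
        return true;
    }
}
```

For two overlapping runs such as `[1, 4, 9]` and `[2, 3, 10]`, `concatenate` produces a sequence that fails `isSorted`, while `merge` produces one that passes.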
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292858#comment-13292858 ] Jonathan Ellis commented on CASSANDRA-4321: --- Comment on the patch itself: intersects is missing the case of {{that}} entirely containing {{this}}.
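The containment case raised in the review comment above can be made concrete with a small sketch. It assumes closed intervals over `long` values, which is a simplification, and `BoundsSketch` is a hypothetical class, not the patch under review. One way an intersects check can miss the case is by testing only whether `this` contains an endpoint of `that`; comparing the two ends against each other covers every case.

```java
final class BoundsSketch {
    final long left;  // inclusive
    final long right; // inclusive, with left <= right

    BoundsSketch(long left, long right) {
        if (left > right) throw new IllegalArgumentException("left > right");
        this.left = left;
        this.right = right;
    }

    boolean contains(long point) {
        return left <= point && point <= right;
    }

    /** Incomplete: misses the case where that entirely contains this. */
    boolean intersectsEndpointOnly(BoundsSketch that) {
        return contains(that.left) || contains(that.right);
    }

    /** Complete: two closed intervals overlap iff each starts no later than the other ends. */
    boolean intersects(BoundsSketch that) {
        return this.left <= that.right && that.left <= this.right;
    }
}
```

With `this = [5, 6]` and `that = [1, 10]`, neither endpoint of `that` lies inside `this`, so the endpoint-only check reports no intersection while the complete check reports one.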
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292855#comment-13292855 ] Jonathan Ellis commented on CASSANDRA-4321: --- Backing up: how do incorrect overlapping sets result in out-of-order key writes (that pass the last-written-key check)?
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292851#comment-13292851 ] Jonathan Ellis commented on CASSANDRA-4321: ---

bq. this can easily OOM if the sstable has lots of out of order rows

Is that likely, given that Range vs Bounds is basically an off-by-one?
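To make the off-by-one concrete, here is a minimal sketch. It assumes a Range is left-exclusive and right-inclusive, (left, right], and a Bounds is inclusive on both ends, [left, right]. That is stated here as an assumption rather than taken from the thread, and the class and method names are hypothetical. Under it, the two differ only for a key that sits exactly on the left endpoint.

```java
final class RangeVsBoundsSketch {
    /** (left, right]: the left endpoint itself is not covered. */
    static boolean rangeContains(long left, long right, long key) {
        return left < key && key <= right;
    }

    /** [left, right]: both endpoints are covered. */
    static boolean boundsContains(long left, long right, long key) {
        return left <= key && key <= right;
    }
}
```

If a compaction candidate spans keys 10 to 20 and another sstable's last key is exactly 10, the range-style test does not see key 10 as covered, so the overlap goes undetected. The bounds-style test catches it.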
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292779#comment-13292779 ]

Omid Aladini commented on CASSANDRA-4321:
-----------------------------------------

Jonathan, yes I use LeveledCompactionStrategy with non-default sstable_size_in_mb = 10.
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292773#comment-13292773 ]

Jonathan Ellis commented on CASSANDRA-4321:
-------------------------------------------

Omid and Javier, are you also using LeveledCompactionStrategy?
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291958#comment-13291958 ]

Omid Aladini commented on CASSANDRA-4321:
-----------------------------------------

We're seeing the same issue after upgrading from 1.0.9 to 1.1.1, on only a single node in a 16-node cluster. Wiping the data and bootstrapping again didn't help. Compaction does not appear to be progressing (according to compactionstats), and I can reproduce this on every "nodetool flush".
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291846#comment-13291846 ]

Javier Sotelo commented on CASSANDRA-4321:
------------------------------------------

Our stack overflow is a little different though:

ERROR [CompactionExecutor:35] 2012-06-08 15:52:42,086 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:35,1,main]
java.lang.StackOverflowError
    at java.util.AbstractList$Itr.hasNext(Unknown Source)
    at com.google.common.collect.Iterators$5.hasNext(Iterators.java:517)
    at com.google.common.collect.Iterators$3.hasNext(Iterators.java:114)
    [repeats]
    at com.google.common.collect.Iterators$5.hasNext(Iterators.java:517)
    at com.google.common.collect.Iterators$3.hasNext(Iterators.java:114)
    at com.google.common.collect.Iterators$7.computeNext(Iterators.java:614)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
    at com.google.common.collect.Iterators.size(Iterators.java:129)
    at com.google.common.collect.Sets$3.size(Sets.java:670)
    at com.google.common.collect.Iterables.size(Iterables.java:80)
    at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:557)
    at org.apache.cassandra.db.compaction.CompactionController.<init>(CompactionController.java:79)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:105)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291841#comment-13291841 ]

Javier Sotelo commented on CASSANDRA-4321:
------------------------------------------

We are also seeing this. We also upgraded from 1.1.0 to 1.1.1. This problem only started after the upgrade. Our cluster is smaller, two DCs of 3 nodes each.
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291520#comment-13291520 ]

Anton Winter commented on CASSANDRA-4321:
-----------------------------------------

The partitioner (RP) was not changed.
[jira] [Commented] (CASSANDRA-4321) stackoverflow building interval tree & possible sstable corruptions
[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291511#comment-13291511 ]

Jonathan Ellis commented on CASSANDRA-4321:
-------------------------------------------

bq. SSTable first key DecoratedKey(100294972947100949193477090306072672386, 4fcf051ef5067d7f17d9fc35) > last key DecoratedKey(90250429663386465697464050082134975058, 4fce996e3c1eed8c4b17dd66)

Cassandra checks key ordering for correctness with every row that is added at write time, so unless you changed your partitioner (which is emphatically Not Supported), I'm not sure how this can happen. Whatever it is, it's unlikely to be related to the 1.1.1 upgrade.

Scrub checks that the sstable content is well-formed and readable. It doesn't check for out-of-order rows. You can use a tool like sstablekeys to do that.
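For anyone who wants to run the out-of-order check mentioned above by hand, here is a minimal sketch of the comparison. It assumes you have already obtained the tokens of an sstable's rows in file order as decimal numbers (how you get them is up to you; sstablekeys itself prints keys, not tokens). The class and method names are made up for illustration and are not part of Cassandra.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not Cassandra code: scans tokens in file order and
// reports the first position where a token is smaller than the one before it.
public class TokenOrderCheck {

    /**
     * Returns the index of the first token that is smaller than its
     * predecessor, or -1 if the tokens never decrease.
     */
    public static int firstOutOfOrder(List<BigInteger> tokens) {
        for (int i = 1; i < tokens.size(); i++) {
            if (tokens.get(i).compareTo(tokens.get(i - 1)) < 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The two tokens from the quoted message: the sstable's first key
        // and last key. In a well-formed sstable the first must not exceed the last.
        List<BigInteger> tokens = Arrays.asList(
            new BigInteger("100294972947100949193477090306072672386"),
            new BigInteger("90250429663386465697464050082134975058"));
        System.out.println("first out-of-order index: " + firstOutOfOrder(tokens));
    }
}
```

A result other than -1 means the file has at least one row whose token sorts before the previous row's token, which is the condition the quoted first-key/last-key message describes.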