[ https://issues.apache.org/jira/browse/CASSANDRA-4321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396246#comment-13396246 ]
Omid Aladini edited comment on CASSANDRA-4321 at 6/18/12 9:52 PM: ------------------------------------------------------------------ Thanks for the patch. Offline scrub is indeed very useful. Tried the v3 patches and the scrub didn't complete, possibly because of a different issue, with the following failed assertion: {code} Exception in thread "main" java.lang.AssertionError: Unexpected empty index file: RandomAccessReader(filePath='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-tmp-hd-33827-Index.db', skipIOCache=true) at org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:221) at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:376) at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:203) at org.apache.cassandra.io.sstable.SSTableReader.openNoValidation(SSTableReader.java:143) at org.apache.cassandra.tools.StandaloneScrubber.main(StandaloneScrubber.java:79) {code} which consequently, encountered corrupt SSTables during start-up: {code} 2012-06-18_20:36:19.89543 INFO 20:36:19,895 Opening /var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-24984 (1941993 bytes) 2012-06-18_20:36:19.90217 ERROR 20:36:19,900 Exception in thread Thread[SSTableBatchOpen:9,5,main] 2012-06-18_20:36:19.90222 java.lang.IllegalStateException: SSTable first key DecoratedKey(41255474878128469814942789647212295629, 31303132393937357c3337313730333536) > last key DecoratedKey(41219536226656199861610796307350537953, 31303234323538397c3331383436373338) 2012-06-18_20:36:19.90261 at org.apache.cassandra.io.sstable.SSTableReader.validate(SSTableReader.java:441) 2012-06-18_20:36:19.90275 at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:208) 2012-06-18_20:36:19.90291 at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:153) 2012-06-18_20:36:19.90309 at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:245) 2012-06-18_20:36:19.90324 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) 2012-06-18_20:36:19.90389 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) 2012-06-18_20:36:19.90391 at java.util.concurrent.FutureTask.run(Unknown Source) 2012-06-18_20:36:19.90391 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) 2012-06-18_20:36:19.90392 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 2012-06-18_20:36:19.90392 at java.lang.Thread.run(Unknown Source) {code} although didn't prevent Cassandra from starting up, but compaction failed subsequently: {code} 2012-06-18_20:51:41.79122 ERROR 20:51:41,790 Exception in thread Thread[CompactionExecutor:81,1,main] 2012-06-18_20:51:41.79131 java.lang.RuntimeException: Last written key DecoratedKey(12341204629749023303706929560940823070, 33363037353338) >= current key DecoratedKey(12167298275958419273792070792442127650, 31363431343537) writing into /var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-tmp-hd-40992-Data.db 2012-06-18_20:51:41.79161 at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134) 2012-06-18_20:51:41.79169 at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153) 2012-06-18_20:51:41.79180 at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159) 2012-06-18_20:51:41.79189 at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50) 2012-06-18_20:51:41.79199 at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150) 2012-06-18_20:51:41.79210 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) 2012-06-18_20:51:41.79218 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) 2012-06-18_20:51:41.79227 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) 2012-06-18_20:51:41.79235 at java.util.concurrent.FutureTask.run(Unknown Source) 2012-06-18_20:51:41.79242 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) 2012-06-18_20:51:41.79250 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 2012-06-18_20:51:41.79259 at java.lang.Thread.run(Unknown Source) {code} was (Author: omid): Thanks for the patch. Offline scrub is indeed very useful. Tried the v3 patches and the scrub didn't complete, possibly because of a different issue, with the following failed assertion: {code} Exception in thread "main" java.lang.AssertionError: Unexpected empty index file: RandomAccessReader(filePath='/var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-tmp-hd-33827-Index.db', skipIOCache=true) at org.apache.cassandra.io.sstable.SSTable.estimateRowsFromIndex(SSTable.java:221) at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:376) at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:203) at org.apache.cassandra.io.sstable.SSTableReader.openNoValidation(SSTableReader.java:143) at org.apache.cassandra.tools.StandaloneScrubber.main(StandaloneScrubber.java:79) {code} which consequently, it encountered a corrupt SSTable during start-up: {code} 2012-06-18_20:36:19.89543 INFO 20:36:19,895 Opening /var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-hd-24984 (1941993 bytes) 2012-06-18_20:36:19.90217 ERROR 20:36:19,900 Exception in thread Thread[SSTableBatchOpen:9,5,main] 2012-06-18_20:36:19.90222 java.lang.IllegalStateException: SSTable first key DecoratedKey(41255474878128469814942789647212295629, 31303132393937357c3337313730333536) > last key DecoratedKey(41219536226656199861610796307350537953, 31303234323538397c3331383436373338) 2012-06-18_20:36:19.90261 at org.apache.cassandra.io.sstable.SSTableReader.validate(SSTableReader.java:441) 2012-06-18_20:36:19.90275 at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:208) 2012-06-18_20:36:19.90291 at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:153) 2012-06-18_20:36:19.90309 at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:245) 2012-06-18_20:36:19.90324 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) 2012-06-18_20:36:19.90389 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) 2012-06-18_20:36:19.90391 at java.util.concurrent.FutureTask.run(Unknown Source) 2012-06-18_20:36:19.90391 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) 2012-06-18_20:36:19.90392 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 2012-06-18_20:36:19.90392 at java.lang.Thread.run(Unknown Source) {code} although didn't prevent Cassandra from starting up, but compaction failed subsequently: {code} 2012-06-18_20:51:41.79122 ERROR 20:51:41,790 Exception in thread Thread[CompactionExecutor:81,1,main] 2012-06-18_20:51:41.79131 java.lang.RuntimeException: Last written key DecoratedKey(12341204629749023303706929560940823070, 33363037353338) >= current key DecoratedKey(12167298275958419273792070792442127650, 31363431343537) writing into /var/lib/cassandra/abcd/data/SOMEKSP/CF3/SOMEKSP-CF3-tmp-hd-40992-Data.db 2012-06-18_20:51:41.79161 at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134) 2012-06-18_20:51:41.79169 at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153) 2012-06-18_20:51:41.79180 at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159) 2012-06-18_20:51:41.79189 at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50) 2012-06-18_20:51:41.79199 at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150) 2012-06-18_20:51:41.79210 at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) 2012-06-18_20:51:41.79218 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) 2012-06-18_20:51:41.79227 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) 2012-06-18_20:51:41.79235 at java.util.concurrent.FutureTask.run(Unknown Source) 2012-06-18_20:51:41.79242 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) 2012-06-18_20:51:41.79250 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 2012-06-18_20:51:41.79259 at java.lang.Thread.run(Unknown Source) {code} > stackoverflow building interval tree & possible sstable corruptions > ------------------------------------------------------------------- > > Key: CASSANDRA-4321 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4321 > Project: Cassandra > Issue Type: Bug > Components: Core > Affects Versions: 1.1.1 > Reporter: Anton Winter > Assignee: Sylvain Lebresne > Fix For: 1.1.2 > > Attachments: > 0001-Change-Range-Bounds-in-LeveledManifest.overlapping-v3.txt, > 0002-Scrub-detects-and-repair-outOfOrder-rows-v3.txt, > 0003-Create-standalone-scrub-v3.txt, ooyala-hastur-stacktrace.txt > > > After upgrading to 1.1.1 (from 1.1.0) I have started experiencing > StackOverflowError's resulting in compaction backlog and failure to restart. > The ring currently consists of 6 DC's and 22 nodes using LCS & compression. > This issue was first noted on 2 nodes in one DC and then appears to have > spread to various other nodes in the other DC's. > When the first occurrence of this was found I restarted the instance but it > failed to start so I cleared its data and treated it as a replacement node > for the token it was previously responsible for. This node successfully > streamed all the relevant data back but failed again a number of hours later > with the same StackOverflowError and again was unable to restart. > The initial stack overflow error on a running instance looks like this: > ERROR [CompactionExecutor:314] 2012-06-07 09:59:43,017 > AbstractCassandraDaemon.java (line 134) Exception in thread > Thread[CompactionExecutor:314,1,main] > java.lang.StackOverflowError > at java.util.Arrays.mergeSort(Arrays.java:1157) > at java.util.Arrays.sort(Arrays.java:1092) > at java.util.Collections.sort(Collections.java:134) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.findMinMedianMax(IntervalNode.java:114) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:49) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > [snip - this repeats until stack overflow. Compactions stop from this point > onwards] > I restarted this failing instance with DEBUG logging enabled and it throws > the following exception part way through startup: > ERROR 11:37:51,046 Exception in thread Thread[OptionalTasks:1,5,main] > java.lang.StackOverflowError > at > org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307) > at > org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276) > at > org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230) > at > org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124) > at > org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:45) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > [snip - this repeats until stack overflow] > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalTree.<init>(IntervalTree.java:39) > at > org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:560) > at > org.apache.cassandra.db.DataTracker$View.replace(DataTracker.java:617) > at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:320) > at > org.apache.cassandra.db.DataTracker.addInitialSSTables(DataTracker.java:259) > at > org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:234) > at > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:331) > at > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:309) > at org.apache.cassandra.db.Table.initCf(Table.java:367) > at org.apache.cassandra.db.Table.<init>(Table.java:299) > at org.apache.cassandra.db.Table.open(Table.java:114) > at org.apache.cassandra.db.Table.open(Table.java:97) > at org.apache.cassandra.db.Table$2.apply(Table.java:574) > at org.apache.cassandra.db.Table$2.apply(Table.java:571) > at com.google.common.collect.Iterators$8.next(Iterators.java:751) > at > org.apache.cassandra.db.ColumnFamilyStore.all(ColumnFamilyStore.java:1625) > at > org.apache.cassandra.db.MeteredFlusher.countFlushingBytes(MeteredFlusher.java:118) > at org.apache.cassandra.db.MeteredFlusher.run(MeteredFlusher.java:45) > at > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:79) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at > java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:636) > DEBUG 11:37:51,052 Initializing ksU.cfS > And then finally fails with the following: > DEBUG 11:49:03,752 Creating IntervalNode from > [Interval(DecoratedKey(104860264640932324846851821824650966808, > 4fcc88eb0218216164673394), > DecoratedKey(93975306025956344620001177071135439009, > 4fc8fb042c98458c7a58bc3b)), > Interval(DecoratedKey(104860264640932324846851821824650966808, > 4fcc88eb0218216164673394), > DecoratedKey(93975306025956344620001177071135439009, > 4fc8fb042c98458c7a58bc3b)), > Interval(DecoratedKey(104860264640932324846851821824650966808, > 4fcc88eb0218216164673394), > DecoratedKey(93975306025956344620001177071135439009, > 4fc8fb042c98458c7a58bc3b)), > Interval(DecoratedKey(104860264640932324846851821824650966808, > 4fcc88eb0218216164673394), > DecoratedKey(93975306025956344620001177071135439009, > 4fc8fb042c98458c7a58bc3b)), > Interval(DecoratedKey(104860264640932324846851821824650966808, > 4fcc88eb0218216164673394), > DecoratedKey(93975306025956344620001177071135439009, > 4fc8fb042c98458c7a58bc3b)), > Interval(DecoratedKey(104860264640932324846851821824650966808, > 4fcc88eb0218216164673394), > DecoratedKey(93975306025956344620001177071135439009, > 4fc8fb042c98458c7a58bc3b))] > java.lang.reflect.InvocationTargetException > DEBUG 11:49:03,753 Configured datacenter replicas are dc1:2, dc2:2, dc3:2, > dc4:2, dc5:0, dc6:2, dc7:0, dc8:0, dc9:2 > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:616) > at > org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160) > Caused by: java.lang.StackOverflowError > at > org.slf4j.helpers.MessageFormatter.safeObjectAppend(MessageFormatter.java:307) > at > org.slf4j.helpers.MessageFormatter.deeplyAppendParameter(MessageFormatter.java:276) > at > org.slf4j.helpers.MessageFormatter.arrayFormat(MessageFormatter.java:230) > at > org.slf4j.helpers.MessageFormatter.format(MessageFormatter.java:124) > at > org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:228) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:45) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > [snip - this repeats until stack overflow] > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:64) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalNode.<init>(IntervalNode.java:62) > at > org.apache.cassandra.utils.IntervalTree.IntervalTree.<init>(IntervalTree.java:39) > at > org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:560) > at > org.apache.cassandra.db.DataTracker$View.replace(DataTracker.java:617) > at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:320) > at > org.apache.cassandra.db.DataTracker.addInitialSSTables(DataTracker.java:259) > at > org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:234) > at > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:331) > at > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:309) > at org.apache.cassandra.db.Table.initCf(Table.java:367) > at org.apache.cassandra.db.Table.<init>(Table.java:299) > at org.apache.cassandra.db.Table.open(Table.java:114) > at org.apache.cassandra.db.Table.open(Table.java:97) > at > org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:204) > at > org.apache.cassandra.service.AbstractCassandraDaemon.init(AbstractCassandraDaemon.java:254) > ... 5 more > Cannot load daemon > Service exit with a return value of 3 > Running with assertions enabled allows me to start the instance but when > doing so I get errors such as: > ERROR 01:22:22,753 Exception in thread > Thread[SSTableBatchOpen:2,5,main]java.lang.AssertionError: SSTable first key > DecoratedKey(100294972947100949193477090306072672386, > 4fcf051ef5067d7f17d9fc35) > last key > DecoratedKey(90250429663386465697464050082134975058, 4fce996e3c1eed8c4b17dd66) > at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:412) > at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:187) > at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:225) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:636) > and: > ERROR 01:27:58,946 Exception in thread Thread[CompactionExecutor:9,1,main] > java.lang.AssertionError: Last written key > DecoratedKey(81958437188197992567937826278457419048, > 4fa1aebad23f81e4321d344d) >= current key > DecoratedKey(64546479828744423263742604083767363606, > 4fcafc0f19f6a8092d4d4f94) writing into > /var/lib/XX/data/cassandra/ks1/cf1/ks1-cf1-tmp-hd-657317-Data.db > at > org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134) > at > org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153) > at > org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159) > at > org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50) > at > org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:150) > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) > at java.util.concurrent.FutureTask.run(FutureTask.java:166) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:636) > Just like the initial errors compactions appear to stop occurring after this > point. > Given the above this looks like sstables are getting corrupted. By > restarting nodes I am able to identify several hundred sstables exhibiting > the same problem and this appears to be growing. > I have tried scrubbing those affected nodes but the problem continues to > occur. If this is due to sstable corruptions is there another way of > validating sstables for correctness? Given that it has spread to various > servers in other DC's it looks like this is directly related to the 1.1.1 > upgrade recently performed on the ring. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira