[jira] [Comment Edited] (CASSANDRA-6285) 2.0 HSHA server introduces corrupt data

Viktor Kuzmin (JIRA) Mon, 10 Mar 2014 02:41:23 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925591#comment-13925591
 ]


Viktor Kuzmin edited comment on CASSANDRA-6285 at 3/10/14 9:40 AM:
-------------------------------------------------------------------

Attached patches for on-heap disruptor.

First pacth (disruptor-high-cpu.patch) turns off any key interests in case 
we're waiting for message to be processed. We need that cause processing may be 
delayed in case of high load and there may be something available to read from 
stream. In that case we'll have 100% cpu core usage.

Second patch (disruptor-memory-corruption.patch) makes copy from off-heap 
ByteBuffer when reading binary data. This binary data may be stored inside 
cassandra as is even after message processing. And binary data can be corrupted 
- cause it's memory may be already deallocated.


was (Author: kvaster):
Attached patches for on-heap disruptor.

First pacth turns off any key interests in case we're waiting for message to be 
processed. We need that cause processing may be delayed in case of high load 
and there may be something available to read from stream. In that case we'll 
have 100% cpu core usage.

Second patch makes copy from off-heap ByteBuffer when reading binary data. This 
binary data may be stored inside cassandra as is even after message processing. 
And binary data can be corrupted - cause it's memory may be already deallocated.

> 2.0 HSHA server introduces corrupt data
> ---------------------------------------
>
>                 Key: CASSANDRA-6285
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6285
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 4 nodes, shortly updated from 1.2.11 to 2.0.2
>            Reporter: David Sauer
>            Assignee: Pavel Yaskevich
>            Priority: Critical
>             Fix For: 2.0.6
>
>         Attachments: 6285_testnotes1.txt, 
> CASSANDRA-6285-disruptor-heap.patch, compaction_test.py, 
> disruptor-high-cpu.patch, disruptor-memory-corruption.patch
>
>
> After altering everything to LCS the table OpsCenter.rollups60 amd one other 
> none OpsCenter-Table got stuck with everything hanging around in L0.
> The compaction started and ran until the logs showed this:
> ERROR [CompactionExecutor:111] 2013-11-01 19:14:53,865 CassandraDaemon.java 
> (line 187) Exception in thread Thread[CompactionExecutor:111,1,RMI Runtime]
> java.lang.RuntimeException: Last written key 
> DecoratedKey(1326283851463420237, 
> 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574426c6f6f6d46696c746572537061636555736564)
>  >= current key DecoratedKey(954210699457429663, 
> 37382e34362e3132382e3139382d6a7576616c69735f6e6f72785f696e6465785f323031335f31305f30382d63616368655f646f63756d656e74736c6f6f6b75702d676574546f74616c4469736b5370616365557365640b0f)
>  writing into 
> /var/lib/cassandra/data/OpsCenter/rollups60/OpsCenter-rollups60-tmp-jb-58656-Data.db
>       at 
> org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:141)
>       at 
> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:164)
>       at 
> org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:160)
>       at 
> org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>       at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>       at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
>       at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
>       at 
> org.apache.cassandra.db.compaction.CompactionManager$6.runMayThrow(CompactionManager.java:296)
>       at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:724)
> Moving back to STC worked to keep the compactions running.
> Especialy my own Table i would like to move to LCS.
> After a major compaction with STC the move to LCS fails with the same 
> Exception.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Comment Edited] (CASSANDRA-6285) 2.0 HSHA server introduces corrupt data

Reply via email to