[ 
https://issues.apache.org/jira/browse/CASSANDRA-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218701#comment-14218701
 ] 

Alexander Sterligov commented on CASSANDRA-8337:
------------------------------------------------

This cluster has always run 2.1: first 2.1.0, then 2.1.2.

Unfortunately I had to downgrade those 18 replicas to 2.0.11 today, and the 
problem went away. I have not managed to reproduce the problem on a smaller 
setup.

Some time ago I tried incremental repairs, but hit the very same problems even 
sooner than with full repair. I then unset the repairedAt flag as described in 
the documentation.

I also truncated some tables with consistency ALL, but after some write load 
the exceptions started to appear again for those tables.

I ran sstablescrub on all sstables with the cluster completely offline, and it 
found no problems. Parallel repair then failed again with the same exceptions. 
So this is not caused by corrupted sstables.

If I retry parallel repair over and over, it eventually finishes with no 
errors. The same happens if I run a sequential repair and then a parallel 
repair right after it finishes. Repair also finishes successfully when the 
number of streams is not high.

There's no tendency toward degradation, just spontaneous repair failures and 
annoying hung streams.

Heavy write or read load doesn't directly cause these problems. But repair 
fails after heavy write load at low consistency.

Overall, it looks like the reproduction scenario is a large number of short 
streams (~30 per node for several seconds), and it seems 2.1 streaming has a 
race condition.


> mmap underflow during validation compaction
> -------------------------------------------
>
>                 Key: CASSANDRA-8337
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8337
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Alexander Sterligov
>            Assignee: Joshua McKenzie
>         Attachments: thread_dump
>
>
> During full parallel repair I often get errors like the following:
> {quote}
> [2014-11-19 01:02:39,355] Repair session 116beaf0-6f66-11e4-afbb-c1c082008cbe 
> for range (3074457345618263602,-9223372036854775808] failed with error 
> org.apache.cassandra.exceptions.RepairException: [repair 
> #116beaf0-6f66-11e4-afbb-c1c082008cbe on iss/target_state_history, 
> (3074457345618263602,-9223372036854775808]] Validation failed in 
> /95.108.242.19
> {quote}
> In the node's log there are always the same exceptions:
> {quote}
> ERROR [ValidationExecutor:2] 2014-11-19 01:02:10,847 
> JVMStabilityInspector.java:94 - JVM state determined to be unstable.  Exiting 
> forcefully due to:
> org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: 
> mmap segment underflow; remaining is 15 but 47 requested
>         at 
> org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1518)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at 
> org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1385)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at 
> org.apache.cassandra.io.sstable.SSTableReader.getPositionsForRanges(SSTableReader.java:1315)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at 
> org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1706)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at 
> org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:1694)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at 
> org.apache.cassandra.db.compaction.AbstractCompactionStrategy.getScanners(AbstractCompactionStrategy.java:276)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at 
> org.apache.cassandra.db.compaction.WrappingCompactionStrategy.getScanners(WrappingCompactionStrategy.java:320)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at 
> org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:917)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at 
> org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:97)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at 
> org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:557)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
> ~[na:1.7.0_51]
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_51]
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_51]
>         at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
> Caused by: java.io.IOException: mmap segment underflow; remaining is 15 but 
> 47 requested
>         at 
> org.apache.cassandra.io.util.MappedFileDataInput.readBytes(MappedFileDataInput.java:135)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at 
> org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:348) 
> ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at 
> org.apache.cassandra.utils.ByteBufferUtil.readWithShortLength(ByteBufferUtil.java:327)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
>         at 
> org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:1460)
>  ~[apache-cassandra-2.1.2.jar:2.1.2]
>         ... 13 common frames omitted
> {quote}
> Now I'm using the die disk_failure_policy to detect such conditions faster, 
> but I get them even with the stop policy.
> Streams related to the host with such an exception hang. A thread dump is 
> attached. Only a restart helps.
> After a retry I get errors from other nodes.
> scrub doesn't help and reports that the sstables are ok.
> Sequential repairs don't cause such exceptions.
> Load is about 1000 write rps and 50 read rps per node.
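
For readers unfamiliar with the "remaining is 15 but 47 requested" message in 
the trace above, here is a minimal sketch of how a length-prefixed read can 
underflow. It is not Cassandra's source: the class UnderflowSketch and its 
method are hypothetical, and a plain java.nio.ByteBuffer stands in for the 
mmapped segment. The assumption is that a 2-byte length is read first and the 
reader then asks for that many bytes, so a wrong read position, or a buffer 
shorter than the reader expects, yields a length larger than what is left.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from the Cassandra code base.
class UnderflowSketch {
    // Reads an unsigned 2-byte length, then that many bytes.
    static byte[] readWithShortLength(ByteBuffer in) throws IOException {
        if (in.remaining() < 2)
            throw new IOException("underflow; remaining is " + in.remaining()
                                  + " but 2 requested");
        int length = in.getShort() & 0xFFFF;
        if (length > in.remaining())
            throw new IOException("underflow; remaining is " + in.remaining()
                                  + " but " + length + " requested");
        byte[] out = new byte[length];
        in.get(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Well-formed entry: length 3 followed by exactly 3 bytes.
        ByteBuffer good = ByteBuffer.allocate(5);
        good.putShort((short) 3).put("key".getBytes(StandardCharsets.US_ASCII));
        good.flip();
        System.out.println(new String(readWithShortLength(good),
                                      StandardCharsets.US_ASCII));

        // The prefix announces 47 bytes, but only 15 bytes follow it.
        ByteBuffer bad = ByteBuffer.allocate(17);
        bad.putShort((short) 47).put(new byte[15]);
        bad.flip();
        try {
            readWithShortLength(bad);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The sketch only shows the shape of the check. It says nothing about why the 
reader ends up at a bad position in the first place, which is the open 
question in this ticket.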



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
