[ https://issues.apache.org/jira/browse/CASSANDRA-15304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gianluigi Tiesi updated CASSANDRA-15304:
----------------------------------------
    Platform: Java8,OpenJDK,Linux,x86,HDD  (was: All)

> Timout in receiveng streams while repairing causes corruption
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-15304
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15304
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair, Consistency/Streaming
>            Reporter: Gianluigi Tiesi
>            Priority: Normal
>
> I have a 4-node cluster. During a repair, node3 streams sstables to node1.
> If node3 hangs for some reason (in my case I/O, CPU, or GC) and times out
> (this can also happen because of a network problem), node1 leaves corrupted
> sstable files behind without any notice.
> When node1 starts compacting, it reports a corruption error:
> {noformat}
> org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /var/lib/cassandra/data/keyspace/cf-28189320ff8211e7961c1fd53f574685/md-15146-big-Data.db
>         at org.apache.cassandra.io.util.CompressedChunkReader$Mmap.readChunk(CompressedChunkReader.java:227) ~[apache-cassandra-3.11.4.jar:3.11.4]
>         at org.apache.cassandra.cache.ChunkCache.load(ChunkCache.java:158) ~[apache-cassandra-3.11.4.jar:3.11.4]
>         at org.apache.cassandra.cache.ChunkCache.load(ChunkCache.java:39) ~[apache-cassandra-3.11.4.jar:3.11.4]
>         at com.github.benmanes.caffeine.cache.BoundedLocalCache$BoundedLocalLoadingCache.lambda$new$0(BoundedLocalCache
> {noformat}
> Going back through the log, I found a timeout at the time node1 was writing
> that file.
> At the same time, node3 logged:
> {noformat}
> ERROR [ReadRepairStage:60] 2019-09-06 09:28:02,918 CassandraDaemon.java:228 - Exception in thread Thread[ReadRepairStage:60,5,main]
> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 1 responses.
>         at org.apache.cassandra.service.DataResolver$RepairMergeListener.close(DataResolver.java:202) ~[apache-cassandra-3.11.4.jar:3.11.4]
>         at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.close(UnfilteredPartitionIterators.java:175) ~[apache-cassandra-3.11.4.jar:3.11.4]
>         at org.apache.cassandra.db.transform.BaseIterator.close(BaseIterator.java:92) ~[apache-cassandra-3.11.4.jar:3.11.4]
>         at org.apache.cassandra.service.DataResolver.compareResponses(DataResolver.java:79) ~[apache-cassandra-3.11.4.jar:3.11.4]
>         at org.apache.cassandra.service.AsyncRepairCallback$1.runMayThrow(AsyncRepairCallback.java:50) ~[apache-cassandra-3.11.4.jar:3.11.4]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.4.jar:3.11.4]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_222]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_222]
>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) ~[apache-cassandra-3.11.4.jar:3.11.4]
>         at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_222]
> INFO  [ScheduledTasks:1] 2019-09-06 09:28:02,920 MessagingService.java:1236 - MUTATION messages were dropped in last 5000 ms: 12 internal and 0 cross node. Mean internal dropped latency: 4085 ms and Mean cross-node dropped latency: 4030 ms
> INFO  [ScheduledTasks:1] 2019-09-06 09:28:02,920 StatusLogger.java:47 - Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
> INFO  [ScheduledTasks:1] 2019-09-06 09:28:02,924 StatusLogger.java:51 - MutationStage                    16      2475      595615310         0                 0
> ...
> ...
> INFO  [ScheduledTasks:1] 2019-09-06 09:28:07,930 MessagingService.java:1236 - MUTATION messages were dropped in last 5000 ms: 621 internal and 783 cross node. Mean internal dropped latency: 3176 ms and Mean cross-node dropped latency: 2801 ms
> ....
> INFO  [ScheduledTasks:1] 2019-09-06 09:28:12,939 MessagingService.java:1236 - READ messages were dropped in last 5000 ms: 0 internal and 8 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 7247 ms
> ...
> {noformat}
> This happens almost every time, and I'm often unable to scrub the sstable
> because of
> [CASSANDRA-15284|https://issues.apache.org/jira/projects/CASSANDRA/issues/CASSANDRA-15284]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
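
The "messages were dropped" summaries in the node3 log quoted above are a quick way to see how overloaded the sender was around the time of the stream timeout. The following is a minimal sketch for tallying them, assuming the log lines look like the 3.11.4 entries in this report; `summarize_dropped`, `DROPPED`, and `SAMPLE` are illustrative names and not part of Cassandra.

```python
import re
from collections import defaultdict

# Matches the MessagingService summary lines quoted in the report, e.g.
# "MUTATION messages were dropped in last 5000 ms: 12 internal and 0 cross node."
# The pattern is an assumption based only on those quoted lines.
DROPPED = re.compile(
    r"(?P<verb>[A-Z_]+) messages were dropped in last (?P<window>\d+) ms: "
    r"(?P<internal>\d+) internal and (?P<cross>\d+) cross[ -]node"
)

# Shortened copies of the three summary lines from the node3 log.
SAMPLE = """\
INFO [ScheduledTasks:1] 2019-09-06 09:28:02,920 MessagingService.java:1236 - MUTATION messages were dropped in last 5000 ms: 12 internal and 0 cross node.
INFO [ScheduledTasks:1] 2019-09-06 09:28:07,930 MessagingService.java:1236 - MUTATION messages were dropped in last 5000 ms: 621 internal and 783 cross node.
INFO [ScheduledTasks:1] 2019-09-06 09:28:12,939 MessagingService.java:1236 - READ messages were dropped in last 5000 ms: 0 internal and 8 cross node.
"""


def summarize_dropped(log_text):
    """Return {verb: (internal_total, cross_node_total)} over all summary lines."""
    # Collapse whitespace so entries wrapped across lines (as in mail archives)
    # still match the single-line pattern.
    flat = " ".join(log_text.split())
    totals = defaultdict(lambda: [0, 0])
    for match in DROPPED.finditer(flat):
        totals[match["verb"]][0] += int(match["internal"])
        totals[match["verb"]][1] += int(match["cross"])
    return {verb: tuple(counts) for verb, counts in totals.items()}


if __name__ == "__main__":
    print(summarize_dropped(SAMPLE))
    # prints {'MUTATION': (633, 783), 'READ': (0, 8)}
```

The script only counts what the log already reports. It does not detect the corrupted sstable itself, which in this report surfaced later, during compaction on node1.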