[ https://issues.apache.org/jira/browse/CASSANDRA-13696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093491#comment-16093491 ]
Jay Zhuang commented on CASSANDRA-13696:
----------------------------------------

{quote}
Question: does this happen in mixed version cluster or all the nodes actually have the same protocol version?
{quote}
All the nodes are on the same messagingVersion {{[VERSION_3014|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/net/MessagingService.java#L95]}}.
It can be reproduced with a new 3.0.14 cluster.

> Digest mismatch Exception if hints file has UnknownColumnFamily
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-13696
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13696
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jay Zhuang
>            Assignee: Jay Zhuang
>            Priority: Blocker
>             Fix For: 3.0.x, 3.11.x, 4.x
>
>
> {noformat}
> WARN  [HintsDispatcher:2] 2017-07-16 22:00:32,579 HintsReader.java:235 - Failed to read a hint for /127.0.0.2: a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0 - table with id 3882bbb0-6a71-11e7-9bca-2759083e3964 is unknown in file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints
> ERROR [HintsDispatcher:2] 2017-07-16 22:00:32,580 HintsDispatchExecutor.java:234 - Failed to dispatch hints file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints: file is corrupted ({})
> org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception
>     at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:199) ~[main/:na]
>     at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:164) ~[main/:na]
>     at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na]
>     at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:157) ~[main/:na]
>     at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:139) ~[main/:na]
>     at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:123) ~[main/:na]
>     at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:95) ~[main/:na]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:268) [main/:na]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:251) [main/:na]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:229) [main/:na]
>     at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:208) [main/:na]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [main/:na]
>     at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111]
> Caused by: java.io.IOException: Digest mismatch exception
>     at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNextInternal(HintsReader.java:216) ~[main/:na]
>     at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:190) ~[main/:na]
>     ... 16 common frames omitted
> {noformat}
> It causes multiple Cassandra nodes to stop [by default|https://github.com/apache/cassandra/blob/cassandra-3.0/conf/cassandra.yaml#L188].
> Here are the steps to reproduce on a 3-node cluster with RF=3:
> 1. Stop node1.
> 2. Write some data with consistency level QUORUM (or ONE); this generates hints files on node2/node3.
> 3. Drop the table.
> 4. Start node1.
> node2/node3 will report "corrupted hints file" and stop.
> The impact is very bad for a large cluster: when it happens, almost all the nodes go down at the same time, and we have to remove all the hints files (which contain the dropped table) to bring the nodes back.
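
To help with the cleanup described above, here is a minimal sketch of how the affected hints files could be picked out of a node's log before they are removed by hand. It is not part of Cassandra: the class name {{CorruptedHintsLogScanner}} and its method are hypothetical, and the pattern is based only on the ERROR line quoted in this report ("Failed to dispatch hints file ... : file is corrupted"), so other versions may word the message differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not a Cassandra class): collects the names of hints
 * files that the log reports as corrupted during dispatch.
 */
final class CorruptedHintsLogScanner {

    // Assumption: the message format matches the ERROR line quoted in this report.
    private static final Pattern CORRUPTED_HINTS = Pattern.compile(
            "Failed to dispatch hints file (\\S+\\.hints): file is corrupted");

    private CorruptedHintsLogScanner() {
    }

    /** Returns each reported file name once, in order of first appearance. */
    static List<String> corruptedHintsFiles(List<String> logLines) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = CORRUPTED_HINTS.matcher(line);
            if (m.find()) {
                names.add(m.group(1));
            }
        }
        return new ArrayList<>(names);
    }
}
```

Passing the lines of a node's system log to {{CorruptedHintsLogScanner.corruptedHintsFiles(...)}} yields the file names to look for in that node's hints directory. The sketch only reports names and deliberately deletes nothing, so an operator can review the list first.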