[ https://issues.apache.org/jira/browse/CASSANDRA-13696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-13696:
-----------------------------------
    Description: 
{noformat}
WARN  [HintsDispatcher:2] 2017-07-16 22:00:32,579 HintsReader.java:235 - Failed to read a hint for /127.0.0.2: a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0 - table with id 3882bbb0-6a71-11e7-9bca-2759083e3964 is unknown in file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints
ERROR [HintsDispatcher:2] 2017-07-16 22:00:32,580 HintsDispatchExecutor.java:234 - Failed to dispatch hints file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints: file is corrupted ({})
org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception
    at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:199) ~[main/:na]
    at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:164) ~[main/:na]
    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na]
    at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:157) ~[main/:na]
    at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:139) ~[main/:na]
    at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:123) ~[main/:na]
    at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:95) ~[main/:na]
    at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:268) [main/:na]
    at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:251) [main/:na]
    at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:229) [main/:na]
    at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:208) [main/:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [main/:na]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111]
Caused by: java.io.IOException: Digest mismatch exception
    at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNextInternal(HintsReader.java:216) ~[main/:na]
    at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:190) ~[main/:na]
    ... 16 common frames omitted
{noformat}

This causes multiple Cassandra nodes to stop [by default|https://github.com/apache/cassandra/blob/cassandra-3.0/conf/cassandra.yaml#L188].

Steps to reproduce on a 3-node cluster with RF=3:
1. Stop node1.
2. Write some data at consistency level QUORUM (or ONE); this generates hints files on node2/node3.
3. Drop the table.
4. Start node1.

node2 and node3 then report a "corrupted hints file" and stop. The impact is severe on a large cluster: when this happens, almost all nodes go down at the same time, and we have to remove every hints file that contains the dropped table to bring the nodes back.
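
As an aid for the manual cleanup described above, here is a minimal Java sketch that lists the hints files destined for one target node. It is not Cassandra code: the class `HintsFileName` and the method `hintsFilesFor` are hypothetical names, and the `<hostId>-<timestamp>-<number>.hints` layout is an assumption read off the file name in the log above (the timestamp is assumed to be in milliseconds and the trailing number is assumed to be a format version). The file name alone does not say which tables a hints file contains, so the sketch only narrows the candidates by target host ID and deletes nothing.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Cassandra) for inspecting hints file names. */
final class HintsFileName {
    // Assumed layout, inferred from the log: <hostId UUID>-<timestamp>-<number>.hints
    private static final Pattern NAME = Pattern.compile(
        "^([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})-(\\d+)-(\\d+)\\.hints$");

    final UUID hostId;          // ID of the node the hints are destined for
    final long timestampMillis; // assumed: creation time in epoch milliseconds
    final int version;          // assumed: hints format version

    private HintsFileName(UUID hostId, long timestampMillis, int version) {
        this.hostId = hostId;
        this.timestampMillis = timestampMillis;
        this.version = version;
    }

    /** Parses a file name; returns empty if it does not match the assumed layout. */
    static Optional<HintsFileName> parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        try {
            return Optional.of(new HintsFileName(
                UUID.fromString(m.group(1)),
                Long.parseLong(m.group(2)),
                Integer.parseInt(m.group(3))));
        } catch (NumberFormatException e) {
            // Numeric part too large for long/int: treat as not matching.
            return Optional.empty();
        }
    }

    /** Lists (without deleting) the *.hints files in hintsDir addressed to hostId. */
    static List<Path> hintsFilesFor(Path hintsDir, UUID hostId) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(hintsDir, "*.hints")) {
            for (Path p : stream) {
                Optional<HintsFileName> parsed = parse(p.getFileName().toString());
                if (parsed.isPresent() && parsed.get().hostId.equals(hostId)) {
                    result.add(p);
                }
            }
        }
        return result;
    }
}
```

For example, `HintsFileName.parse("a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints")` yields the host ID from the WARN line, and `hintsFilesFor(dir, hostId)` can be pointed at the node's hints directory to review the candidate files before anything is removed by hand.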


  was:
{noformat}
WARN  [HintsDispatcher:2] 2017-07-16 22:00:32,579 HintsReader.java:235 - Failed to read a hint for /127.0.0.2: a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0 - table with id 3882bbb0-6a71-11e7-9bca-2759083e3964 is unknown in file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints
ERROR [HintsDispatcher:2] 2017-07-16 22:00:32,580 HintsDispatchExecutor.java:234 - Failed to dispatch hints file a2b7daf1-a6a4-4dfc-89de-32d12d2d48b0-1500242103097-1.hints: file is corrupted ({})
org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception
    at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:199) ~[main/:na]
    at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:164) ~[main/:na]
    at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na]
    at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:157) ~[main/:na]
    at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:139) ~[main/:na]
    at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:123) ~[main/:na]
    at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:95) ~[main/:na]
    at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:268) [main/:na]
    at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:251) [main/:na]
    at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:229) [main/:na]
    at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:208) [main/:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_111]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_111]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [main/:na]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_111]
Caused by: java.io.IOException: Digest mismatch exception
    at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNextInternal(HintsReader.java:216) ~[main/:na]
    at org.apache.cassandra.hints.HintsReader$HintsIterator.computeNext(HintsReader.java:190) ~[main/:na]
    ... 16 common frames omitted
{noformat}

It causes multiple cassandra nodes stop [by 
default|https://github.com/apache/cassandra/blob/cassandra-3.0/conf/cassandra.yaml#L188]:

Here is the reproduce steps on a 3 nodes cluster, RF=3:
1. stop node1
2. send some data with quorum (or one)
3. drop the table
4. start node1

node2 and node3 will report "corrupted hints file" and stop. The impact is very 
bad for a large cluster, when it happens, almost all the nodes are down at the 
same time.



> Digest mismatch Exception if hints file has UnknownColumnFamily
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-13696
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13696
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jay Zhuang
>            Assignee: Jay Zhuang
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
