[ https://issues.apache.org/jira/browse/CASSANDRA-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Stupp reopened CASSANDRA-6716:
-------------------------------------

I have now seen this issue more than once on 2.1 and 2.2 clusters. Restarting nodes sometimes seems to help. I hope to get my hands on one of these clusters soon.

There was no "bad" activity on these clusters (such as someone removing sstables manually). All snapshot-related operations failed (i.e. nodetool snapshot/scrub/repair).

From looking at the code, it seems (just a wild guess!) that the {{View}} for the snapshot contains sstables that have been removed. So far I don't know whether these were (recently) compacted or whether removal of sstables from the view occasionally fails.

> nodetool scrub constantly fails with RuntimeException (Tried to hard link to file that does not exist)
> ------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6716
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6716
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 2.0.5 (built from source), Linux, 6 nodes, JDK 1.7
>            Reporter: Nikolai Grigoriev
>         Attachments: system.log.gz
>
>
> It seems that since recently I have started getting a number of exceptions
> like "File not found" on all Cassandra nodes. Currently I am getting an
> exception like this every couple of seconds on each node, for different
> keyspaces and CFs.
> I have tried to restart the nodes, tried to scrub them. No luck so far. It
> seems that scrub cannot complete on any of these nodes, at some point it
> fails because of the file that it can't find.
> On one of the nodes currently the "nodetool scrub" command fails instantly
> and consistently with this exception:
> {code}
> # /opt/cassandra/bin/nodetool scrub
> Exception in thread "main" java.lang.RuntimeException: Tried to hard link to file that does not exist /mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28049-Data.db
>         at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:75)
>         at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1215)
>         at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1826)
>         at org.apache.cassandra.db.ColumnFamilyStore.scrub(ColumnFamilyStore.java:1122)
>         at org.apache.cassandra.service.StorageService.scrub(StorageService.java:2159)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
>         at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
>         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
>         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
>         at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
>         at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
>         at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
>         at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>         at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
>         at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
>         at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
>         at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
>         at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
>         at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
>         at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
>         at sun.rmi.transport.Transport$1.run(Transport.java:177)
>         at sun.rmi.transport.Transport$1.run(Transport.java:174)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
>         at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
> {code}
> Also I have noticed that the files that are missing are often (or maybe always?)
> referred to in the log as follows:
> {quote}
>  WARN 00:06:10,597 At level 3,
> SSTableReader(path='/mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-26776-Data.db')
> [DecoratedKey(-9053060597280257896, 0010f582cddaca974d7198ae30f194ccfd0c00001000000000004b818d00000000000000010000100000000000004000000000000000000300),
> DecoratedKey(-8855915848970248008, 00103ce153dfeeb547fb881a51adf611f6cf0000100000000000f04f4700000000000000010000100000000000004000000000000000000500)]
> overlaps
> SSTableReader(path='/mnt/disk2/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28022-Data.db')
> [DecoratedKey(-8964446543595889729, 001043214a8bdcfd46a3b8ea71da2d57bb9a0000100000000001117c0d00000000000000000000100000000000004000000000000000000100),
> DecoratedKey(-8848132752710859808, 0010d1f6de8039d54218bf5b1e184335df5f000010000000000062e52600000000000000010000100000000000004000000000000000000400)].
> This could be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the
> fact that you have dropped sstables from another node into the data
> directory. Sending back to L0. If you didn't drop in sstables, and have not
> yet run scrub, you should do so since you may also have rows out-of-order
> within an sstable
>  WARN [RMI TCP Connection(2)-10.3.45.158] 2014-02-18 00:06:10,597 LeveledManifest.java (line 171) At level 3,
> SSTableReader(path='/mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-26776-Data.db')
> [DecoratedKey(-9053060597280257896, 0010f582cddaca974d7198ae30f194ccfd0c00001000000000004b818d00000000000000010000100000000000004000000000000000000300),
> DecoratedKey(-8855915848970248008, 00103ce153dfeeb547fb881a51adf611f6cf0000100000000000f04f4700000000000000010000100000000000004000000000000000000500)]
> overlaps
> SSTableReader(path='/mnt/disk2/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28022-Data.db')
> [DecoratedKey(-8964446543595889729, 001043214a8bdcfd46a3b8ea71da2d57bb9a0000100000000001117c0d00000000000000000000100000000000004000000000000000000100),
> DecoratedKey(-8848132752710859808, 0010d1f6de8039d54218bf5b1e184335df5f000010000000000062e52600000000000000010000100000000000004000000000000000000400)].
> This could be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the
> fact that you have dropped sstables from another node into the data
> directory. Sending back to L0. If you didn't drop in sstables, and have not
> yet run scrub, you should do so since you may also have rows out-of-order
> within an sstable
> {quote}
> I never had anything but Cassandra 2.0 on these systems. Also I have
> recreated my test data from scratch with 2.0.4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
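
To illustrate the guess in the comment above (a view that still lists sstables whose files are already gone), here is a minimal Java sketch. It is not Cassandra's code: the class StaleViewSnapshotSketch, the method snapshot, and the file names are made up for illustration, and a plain list of paths stands in for the {{View}}. It only shows that creating a hard link fails when the source file no longer exists, and one possible way to collect the missing files instead of aborting on the first one.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Cassandra.
class StaleViewSnapshotSketch {

    // Hard-links every tracked file into snapshotDir and returns the
    // tracked files that were missing on disk.
    static List<Path> snapshot(List<Path> trackedFiles, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        List<Path> missing = new ArrayList<>();
        for (Path source : trackedFiles) {
            Path link = snapshotDir.resolve(source.getFileName());
            try {
                // Arguments are (link to create, existing file).
                Files.createLink(link, source);
            } catch (NoSuchFileException e) {
                // The list still referenced a file that is gone from disk.
                missing.add(source);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path dataDir = Files.createTempDirectory("sketch-data");
        Path live = Files.createFile(dataDir.resolve("ks-cf-jb-1-Data.db"));
        Path stale = Files.createFile(dataDir.resolve("ks-cf-jb-2-Data.db"));

        // Assumption: the in-memory list is built while both files exist ...
        List<Path> view = Arrays.asList(live, stale);
        // ... and one file is removed afterwards without updating the list.
        Files.delete(stale);

        List<Path> missing = snapshot(view, dataDir.resolve("snapshots"));
        System.out.println("missing from disk: " + missing);
    }
}
```

Run as is, the sketch links the first file into the snapshots directory and reports the deleted one as missing. Whether the real failure comes from compaction or from a failed view update is exactly the open question in this ticket; the sketch does not answer it.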