[ https://issues.apache.org/jira/browse/CASSANDRA-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235276#comment-15235276 ]
Robert Stupp commented on CASSANDRA-6716:
-----------------------------------------

One other issue is that {{nodetool netstats}} shows exactly one pending message for the large pool for one endpoint (except the node itself). It doesn't time out and get removed from the outgoing-tcp-pool, nor does it get removed when the endpoint is restarted.

> snapshots constantly fail with "Tried to hard link to file that does not exist"
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6716
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6716
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra 2.0.5 (built from source), Linux, 6 nodes, JDK 1.7
>            Reporter: Nikolai Grigoriev
>         Attachments: system.log.gz
>
>
> It seems that I have recently started getting a number of exceptions like "File not found" on all Cassandra nodes. Currently I am getting an exception like this every couple of seconds on each node, for different keyspaces and CFs.
> I have tried to restart the nodes, tried to scrub them. No luck so far. It seems that scrub cannot complete on any of these nodes; at some point it fails because of the file that it can't find.
> On one of the nodes, the "nodetool scrub" command currently fails instantly and consistently with this exception:
> {code}
> # /opt/cassandra/bin/nodetool scrub
> Exception in thread "main" java.lang.RuntimeException: Tried to hard link to file that does not exist /mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28049-Data.db
>         at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:75)
>         at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1215)
>         at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1826)
>         at org.apache.cassandra.db.ColumnFamilyStore.scrub(ColumnFamilyStore.java:1122)
>         at org.apache.cassandra.service.StorageService.scrub(StorageService.java:2159)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
>         at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
>         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
>         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
>         at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
>         at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
>         at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
>         at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>         at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
>         at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
>         at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
>         at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
>         at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
>         at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
>         at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
>         at sun.rmi.transport.Transport$1.run(Transport.java:177)
>         at sun.rmi.transport.Transport$1.run(Transport.java:174)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
>         at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
>         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:724)
> {code}
> Also I have noticed that the files that are missing are often (or maybe always?) referred to in the log as follows:
> {quote}
> WARN 00:06:10,597 At level 3, SSTableReader(path='/mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-26776-Data.db') [DecoratedKey(-9053060597280257896, 0010f582cddaca974d7198ae30f194ccfd0c00001000000000004b818d00000000000000010000100000000000004000000000000000000300), DecoratedKey(-8855915848970248008, 00103ce153dfeeb547fb881a51adf611f6cf0000100000000000f04f4700000000000000010000100000000000004000000000000000000500)] overlaps SSTableReader(path='/mnt/disk2/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28022-Data.db') [DecoratedKey(-8964446543595889729, 001043214a8bdcfd46a3b8ea71da2d57bb9a0000100000000001117c0d00000000000000000000100000000000004000000000000000000100), DecoratedKey(-8848132752710859808, 0010d1f6de8039d54218bf5b1e184335df5f000010000000000062e52600000000000000010000100000000000004000000000000000000400)]. This could be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the fact that you have dropped sstables from another node into the data directory. Sending back to L0. If you didn't drop in sstables, and have not yet run scrub, you should do so since you may also have rows out-of-order within an sstable
> WARN [RMI TCP Connection(2)-10.3.45.158] 2014-02-18 00:06:10,597 LeveledManifest.java (line 171) At level 3, SSTableReader(path='/mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-26776-Data.db') [DecoratedKey(-9053060597280257896, 0010f582cddaca974d7198ae30f194ccfd0c00001000000000004b818d00000000000000010000100000000000004000000000000000000300), DecoratedKey(-8855915848970248008, 00103ce153dfeeb547fb881a51adf611f6cf0000100000000000f04f4700000000000000010000100000000000004000000000000000000500)] overlaps SSTableReader(path='/mnt/disk2/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28022-Data.db') [DecoratedKey(-8964446543595889729, 001043214a8bdcfd46a3b8ea71da2d57bb9a0000100000000001117c0d00000000000000000000100000000000004000000000000000000100), DecoratedKey(-8848132752710859808, 0010d1f6de8039d54218bf5b1e184335df5f000010000000000062e52600000000000000010000100000000000004000000000000000000400)]. This could be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the fact that you have dropped sstables from another node into the data directory. Sending back to L0. If you didn't drop in sstables, and have not yet run scrub, you should do so since you may also have rows out-of-order within an sstable
> {quote}
> I never had anything but Cassandra 2.0 on these systems. Also I have recreated my test data from scratch with 2.0.4.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)