[ 
https://issues.apache.org/jira/browse/CASSANDRA-6716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005971#comment-14005971
 ] 

Nikolai Grigoriev commented on CASSANDRA-6716:
----------------------------------------------

I have made two more observations; one of them may be unrelated, but still:

1. I had tons of these exceptions when doing compaction or scrubbing on some of 
the nodes. Disabling the DataStax agent on them and restarting the nodes 
eliminated the exceptions completely. All of this was under heavy load.

2. Just started having these exceptions again on one of the nodes after a minor 
configuration change (compaction throughput) and restarting the node. Restarted 
again - same thing, several exceptions per second, all FileNotFoundException 
when compacting. Stopped the node. Removed the caches stored in 
/var/lib/cassandra/saved_caches. Started the node. Not a single exception in 
~1.5 hours. Again, all this under heavy load.
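
For anyone who wants to repeat the second experiment without losing the cache files, here is a minimal Python sketch that moves them aside instead of deleting them. The function name and the backup location are made up for illustration, and it assumes the Cassandra process on the node has already been stopped; the cache directory would be /var/lib/cassandra/saved_caches as in my setup.

```python
import shutil
import time
from pathlib import Path


def set_aside_saved_caches(cache_dir, backup_root):
    """Move every file in cache_dir into a new timestamped folder under backup_root.

    Hypothetical helper: assumes the Cassandra node is already stopped.
    Returns the names of the files that were moved, in sorted order.
    """
    cache_dir = Path(cache_dir)
    backup_dir = Path(backup_root) / ("saved_caches-" + time.strftime("%Y%m%d-%H%M%S"))
    # Fail loudly rather than mixing two backups in one folder.
    backup_dir.mkdir(parents=True, exist_ok=False)
    moved = []
    for entry in sorted(cache_dir.iterdir()):
        if entry.is_file():
            shutil.move(str(entry), str(backup_dir / entry.name))
            moved.append(entry.name)
    return moved
```

Keeping the old files around makes it possible to put them back later and check whether the exceptions return with them.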

Now I am wondering - where else could a reference to a non-existent sstable 
live, other than the cache? If a simple restart does not help and the 
filesystem really does not have the file the server tries to access, then it 
cannot be an in-memory cache being out of sync, so it has to be the persistent 
one.
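
To confirm that the files named in the exceptions are really absent from disk, something like the following Python sketch could be run against the text of system.log. It is only an illustration: the function name and the regular expression are my own assumptions, based on the "-Data.db" paths that appear in the messages quoted below.

```python
import os
import re

# Absolute paths ending in "-Data.db", as they appear in the quoted log messages.
SSTABLE_PATH = re.compile(r"(/[^\s'\"]+-Data\.db)")


def missing_sstables(log_text, exists=os.path.exists):
    """Return the sorted, de-duplicated sstable paths named in log_text that are absent on disk.

    Hypothetical helper; `exists` can be swapped out for testing.
    """
    paths = set(SSTABLE_PATH.findall(log_text))
    return sorted(p for p in paths if not exists(p))
```

A path that shows up in this list after a restart is being referenced by something that survives the restart, which is the reasoning behind suspecting the saved caches.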

> nodetool scrub constantly fails with RuntimeException (Tried to hard link to 
> file that does not exist)
> ------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6716
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6716
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra 2.0.5 (built from source), Linux, 6 nodes, JDK 1.7
>            Reporter: Nikolai Grigoriev
>         Attachments: system.log.gz
>
>
> Recently I have started getting a number of "File not found" exceptions on 
> all Cassandra nodes. Currently I am getting an exception like this every 
> couple of seconds on each node, for different keyspaces and CFs.
> I have tried restarting the nodes and scrubbing them. No luck so far. It 
> seems that scrub cannot complete on any of these nodes; at some point it 
> fails because of a file that it can't find.
> On one of the nodes, the "nodetool scrub" command currently fails instantly 
> and consistently with this exception:
> {code}
> # /opt/cassandra/bin/nodetool scrub 
> Exception in thread "main" java.lang.RuntimeException: Tried to hard link to file that does not exist /mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28049-Data.db
>       at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:75)
>       at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:1215)
>       at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1826)
>       at org.apache.cassandra.db.ColumnFamilyStore.scrub(ColumnFamilyStore.java:1122)
>       at org.apache.cassandra.service.StorageService.scrub(StorageService.java:2159)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
>       at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
>       at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
>       at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
>       at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
>       at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
>       at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
>       at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
>       at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
>       at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
>       at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
>       at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
>       at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
>       at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
>       at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
>       at sun.rmi.transport.Transport$1.run(Transport.java:177)
>       at sun.rmi.transport.Transport$1.run(Transport.java:174)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
>       at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
>       at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
>       at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:724)
> {code}
> Also I have noticed that the files that are missing are often (or maybe 
> always?) referred to in the log as follows:
> {quote}
>  WARN 00:06:10,597 At level 3, 
> SSTableReader(path='/mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-26776-Data.db')
>  [DecoratedKey(-9053060597280257896, 
> 0010f582cddaca974d7198ae30f194ccfd0c00001000000000004b818d00000000000000010000100000000000004000000000000000000300),
>  DecoratedKey(-8855915848970248008, 
> 00103ce153dfeeb547fb881a51adf611f6cf0000100000000000f04f4700000000000000010000100000000000004000000000000000000500)]
>  overlaps 
> SSTableReader(path='/mnt/disk2/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28022-Data.db')
>  [DecoratedKey(-8964446543595889729, 
> 001043214a8bdcfd46a3b8ea71da2d57bb9a0000100000000001117c0d00000000000000000000100000000000004000000000000000000100),
>  DecoratedKey(-8848132752710859808, 
> 0010d1f6de8039d54218bf5b1e184335df5f000010000000000062e52600000000000000010000100000000000004000000000000000000400)].
>   This could be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the 
> fact that you have dropped sstables from another node into the data 
> directory. Sending back to L0.  If you didn't drop in sstables, and have not 
> yet run scrub, you should do so since you may also have rows out-of-order 
> within an sstable
>  WARN [RMI TCP Connection(2)-10.3.45.158] 2014-02-18 00:06:10,597 
> LeveledManifest.java (line 171) At level 3, 
> SSTableReader(path='/mnt/disk5/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-26776-Data.db')
>  [DecoratedKey(-9053060597280257896, 
> 0010f582cddaca974d7198ae30f194ccfd0c00001000000000004b818d00000000000000010000100000000000004000000000000000000300),
>  DecoratedKey(-8855915848970248008, 
> 00103ce153dfeeb547fb881a51adf611f6cf0000100000000000f04f4700000000000000010000100000000000004000000000000000000500)]
>  overlaps 
> SSTableReader(path='/mnt/disk2/cassandra/data/mykeyspace_jmeter/test_contacts/mykeyspace_jmeter-test_contacts-jb-28022-Data.db')
>  [DecoratedKey(-8964446543595889729, 
> 001043214a8bdcfd46a3b8ea71da2d57bb9a0000100000000001117c0d00000000000000000000100000000000004000000000000000000100),
>  DecoratedKey(-8848132752710859808, 
> 0010d1f6de8039d54218bf5b1e184335df5f000010000000000062e52600000000000000010000100000000000004000000000000000000400)].
>   This could be caused by a bug in Cassandra 1.1.0 .. 1.1.3 or due to the 
> fact that you have dropped sstables from another node into the data 
> directory. Sending back to L0.  If you didn't drop in sstables, and have not 
> yet run scrub, you should do so since you may also have rows out-of-order 
> within an sstable
> {quote}
> I never had anything but Cassandra 2.0 on these systems. Also I have 
> recreated my test data from scratch with 2.0.4.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
