[ https://issues.apache.org/jira/browse/ACCUMULO-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846621#comment-13846621 ]

Keith Turner commented on ACCUMULO-1998:
----------------------------------------

[~supermallen] were you able to reproduce this yesterday? How many tservers 
did you have? Were tservers restarted during this test? Some other things that 
would be useful to look at are when and where the root tablet was assigned. 
You could also look in the root tablet's walogs and try to find the mutations 
related to these files.
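
For reference, one quick way to see which files the metadata (!0) tablets currently 
reference is to scan the root table for "file" entries and compare that list against 
what is actually in HDFS. A minimal sketch against the 1.6 client API; the instance 
name, zookeeper host, and credentials below are placeholders:

{noformat}
import java.util.Map.Entry;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

public class RootTableFileCheck {
  public static void main(String[] args) throws Exception {
    // Placeholders: substitute the real instance name, zookeepers, and credentials.
    Connector conn = new ZooKeeperInstance("instance", "10.10.1.115:2181")
        .getConnector("root", new PasswordToken("secret"));

    // In 1.6 the root table (accumulo.root) holds the tablet entries for the
    // metadata table (!0); the "file" column family names the RFiles each
    // metadata tablet references.
    Scanner scanner = conn.createScanner("accumulo.root", Authorizations.EMPTY);
    scanner.fetchColumnFamily(new Text("file"));

    // Print tablet row and referenced file path for each entry.
    for (Entry<Key,Value> entry : scanner)
      System.out.println(entry.getKey().getRow() + " " + entry.getKey().getColumnQualifier());
  }
}
{noformat}

If A000042x.rf still shows up in that output but is gone from HDFS, that would suggest 
the metadata is referencing a file that was removed, rather than a problem on the scan 
side.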

> Exception while attempting to bring up 1.6.0 TabletServer after moderate 
> amount of work
> ---------------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-1998
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1998
>             Project: Accumulo
>          Issue Type: Bug
>            Reporter: Michael Allen
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> The reproduction steps around this are a little bit fuzzy but basically we 
> ran a moderate workload against a 1.6.0 server.  Encryption happened to be 
> turned on but that doesn't seem to be germane to the problem.  After doing a 
> moderate amount of work, Accumulo is refusing to start up, spewing this error 
> over and over to the log:
> {noformat}
> 2013-12-10 10:23:02,529 [tserver.TabletServer] WARN : exception while doing multi-scan 
> java.lang.RuntimeException: java.io.IOException: Failed to open hdfs://10.10.1.115:9000/accumulo/tables/!0/table_info/A000042x.rf
>       at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler$LookupTask.run(TabletServer.java:1125)
>       at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>       at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>       at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>       at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Failed to open hdfs://10.10.1.115:9000/accumulo/tables/!0/table_info/A000042x.rf
>       at org.apache.accumulo.tserver.FileManager.reserveReaders(FileManager.java:333)
>       at org.apache.accumulo.tserver.FileManager.access$500(FileManager.java:58)
>       at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFiles(FileManager.java:478)
>       at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFileRefs(FileManager.java:466)
>       at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFiles(FileManager.java:486)
>       at org.apache.accumulo.tserver.Tablet$ScanDataSource.createIterator(Tablet.java:2027)
>       at org.apache.accumulo.tserver.Tablet$ScanDataSource.iterator(Tablet.java:1989)
>       at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:163)
>       at org.apache.accumulo.tserver.Tablet.lookup(Tablet.java:1565)
>       at org.apache.accumulo.tserver.Tablet.lookup(Tablet.java:1672)
>       at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler$LookupTask.run(TabletServer.java:1114)
>       ... 6 more
> Caused by: java.io.FileNotFoundException: File does not exist: /accumulo/tables/!0/table_info/A000042x.rf
>       at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchLocatedBlocks(DFSClient.java:2006)
>       at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1975)
>       at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1967)
>       at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:735)
>       at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:165)
>       at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
>       at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:256)
>       at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$000(CachableBlockFile.java:143)
>       at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:212)
>       at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:313)
>       at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:367)
>       at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:143)
>       at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:825)
>       at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:79)
>       at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(FileOperations.java:119)
>       at org.apache.accumulo.tserver.FileManager.reserveReaders(FileManager.java:314)
>       ... 16 more
> {noformat}
> Here are some other pieces of context:
> HDFS contents:
> {noformat}
> ubuntu@ip-10-10-1-115:/data0/logs/accumulo$ hadoop fs -lsr /accumulo/tables/
> drwxr-xr-x   - accumulo hadoop          0 2013-12-10 00:32 /accumulo/tables/!0
> drwxr-xr-x   - accumulo hadoop          0 2013-12-10 01:06 /accumulo/tables/!0/default_tablet
> drwxr-xr-x   - accumulo hadoop          0 2013-12-10 10:49 /accumulo/tables/!0/table_info
> -rw-r--r--   5 accumulo hadoop       1698 2013-12-10 00:34 /accumulo/tables/!0/table_info/F0000000.rf
> -rw-r--r--   5 accumulo hadoop      43524 2013-12-10 01:53 /accumulo/tables/!0/table_info/F000062q.rf
> drwxr-xr-x   - accumulo hadoop          0 2013-12-10 00:32 /accumulo/tables/+r
> drwxr-xr-x   - accumulo hadoop          0 2013-12-10 10:45 /accumulo/tables/+r/root_tablet
> -rw-r--r--   5 accumulo hadoop       2070 2013-12-10 10:45 /accumulo/tables/+r/root_tablet/A0000738.rf
> drwxr-xr-x   - accumulo hadoop          0 2013-12-10 00:33 /accumulo/tables/1
> drwxr-xr-x   - accumulo hadoop          0 2013-12-10 00:33 /accumulo/tables/1/default_tablet
> {noformat}
> ZooKeeper entries:
> {noformat}
> [zk: localhost:2181(CONNECTED) 6] get /accumulo/371cfa3e-fe96-4a50-92e9-da7572589ffa/root_tablet/dir
> hdfs://10.10.1.115:9000/accumulo/tables/+r/root_tablet
> cZxid = 0x1b
> ctime = Tue Dec 10 00:32:56 EST 2013
> mZxid = 0x1b
> mtime = Tue Dec 10 00:32:56 EST 2013
> pZxid = 0x1b
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 54
> numChildren = 0
> {noformat}
> I'm going to preserve the state of this machine in HDFS for a while but not 
> forever, so if there are other pieces of context people need, let me know.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
