[ https://issues.apache.org/jira/browse/ACCUMULO-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032391#comment-15032391 ]
ASF GitHub Bot commented on ACCUMULO-4065:
------------------------------------------

GitHub user joshelser commented on the pull request:

    https://github.com/apache/accumulo/pull/54#issuecomment-160749591

    As far as testing goes, I haven't been able to recreate the original situation inside of Accumulo. I have been able to verify that what we were doing does not work. I've done some testing locally and found some bugs along the way. I am presently running some CI, and will try to kick off a CI run against a cluster with more than one node later today.

> Strange temporary errors in Master after upgrade
> ------------------------------------------------
>
> Key: ACCUMULO-4065
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4065
> Project: Accumulo
> Issue Type: Bug
> Components: master
> Affects Versions: 1.6.4, 1.7.0
> Reporter: Josh Elser
> Assignee: Josh Elser
> Fix For: 1.6.5, 1.7.1, 1.8.0
>
>
> I'm running into a problem that I saw quite a while back in ACCUMULO-3653.
> I'm still trying to understand what happened, but what I understand so far is
> that Accumulo was running, a newer version was installed beside the running
> version, Accumulo was stopped, the symlink was changed, and the new version was
> started. After this, we started seeing a number of errors in the Master.
> Shortly after that, the cluster was restarted and the errors stopped
> happening.
> This is what I can extract from the logs: > {noformat} > 2015-11-19 22:42:47,115 [rpc.TServerUtils] DEBUG: Instantiating default, > unsecure custom half-async Thrift server > 2015-11-19 22:42:47,122 [master.Master] INFO : Started replication > coordinator service at host3:10001 > 2015-11-19 22:42:47,158 [master.Master] ERROR: Error processing table state > for store Normal Tablets > java.lang.RuntimeException: java.lang.RuntimeException: Failed to create > iterator > at > org.apache.accumulo.server.master.state.MetaDataTableScanner.<init>(MetaDataTableScanner.java:72) > at > org.apache.accumulo.server.master.state.MetaDataTableScanner.<init>(MetaDataTableScanner.java:56) > at > org.apache.accumulo.server.master.state.MetaDataStateStore.iterator(MetaDataStateStore.java:62) > at > org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:172) > Caused by: java.lang.RuntimeException: Failed to create iterator > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.<init>(TabletServerBatchReaderIterator.java:158) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReader.iterator(TabletServerBatchReader.java:115) > at > org.apache.accumulo.server.master.state.MetaDataTableScanner.<init>(MetaDataTableScanner.java:66) > ... 
3 more > Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: > Error on server host3:9997 > at > org.apache.accumulo.core.client.impl.ThriftScanner.getBatchFromServer(ThriftScanner.java:116) > at > org.apache.accumulo.core.metadata.MetadataLocationObtainer.lookupTablet(MetadataLocationObtainer.java:95) > at > org.apache.accumulo.core.client.impl.TabletLocatorImpl.lookupTabletLocation(TabletLocatorImpl.java:463) > at > org.apache.accumulo.core.client.impl.TabletLocatorImpl.lookupTabletLocationAndCheckLock(TabletLocatorImpl.java:634) > at > org.apache.accumulo.core.client.impl.TabletLocatorImpl._locateTablet(TabletLocatorImpl.java:625) > at > org.apache.accumulo.core.client.impl.TabletLocatorImpl.binRanges(TabletLocatorImpl.java:280) > at > org.apache.accumulo.core.client.impl.TabletLocatorImpl.binRanges(TabletLocatorImpl.java:355) > at > org.apache.accumulo.core.client.impl.TimeoutTabletLocator.binRanges(TimeoutTabletLocator.java:100) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.binRanges(TabletServerBatchReaderIterator.java:233) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.lookup(TabletServerBatchReaderIterator.java:220) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.<init>(TabletServerBatchReaderIterator.java:154) > ... 5 more > Caused by: org.apache.thrift.TApplicationException: Internal error processing > flush > at > org.apache.thrift.TApplicationException.read(TApplicationException.java:111) > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startScan(TabletClientService.java:232) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startScan(TabletClientService.java:208) > at > org.apache.accumulo.core.client.impl.ThriftScanner.getBatchFromServer(ThriftScanner.java:98) > ... 
15 more > 2015-11-19 22:42:47,178 [impl.ThriftScanner] DEBUG: Scan failed, not serving > tablet (+r<<,host4:9997,35121a475360010) > 2015-11-19 22:42:47,202 [impl.ThriftScanner] DEBUG: Error getting transport > to host4:9997 : NotServingTabletException(extent:TKeyExtent(table:2B 72, > endRow:null, prevEndRow:null)) > 2015-11-19 22:42:47,283 [impl.ThriftScanner] DEBUG: Scan failed, not serving > tablet (+r<<,host4:9997,35121a475360010) > 2015-11-19 22:42:47,372 [impl.TabletServerBatchReaderIterator] DEBUG: Server > : host4:9997 msg : startMultiScan failed: unknown result > org.apache.thrift.TApplicationException: startMultiScan failed: unknown result > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:324) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > 2015-11-19 22:42:47,373 [impl.TabletServerBatchReaderIterator] WARN : Error > on server host4:9997 > org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server > host4:9997 > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:695) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at 
org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.thrift.TApplicationException: startMultiScan failed: > unknown result > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:324) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634) > ... 6 more > 2015-11-19 22:42:47,376 [master.Master] ERROR: Error processing table state > for store Metadata Tablets > java.lang.RuntimeException: > org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server > host4:9997 > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.hasNext(TabletServerBatchReaderIterator.java:181) > at > org.apache.accumulo.server.master.state.MetaDataTableScanner.hasNext(MetaDataTableScanner.java:121) > at > org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:173) > Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: > Error on server host4:9997 > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:695) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.thrift.TApplicationException: startMultiScan failed: > unknown result > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:324) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634) > ... 6 more > {noformat} > A bit later: > {noformat} > 2015-11-19 22:43:04,572 [recovery.RecoveryManager] DEBUG: Recovering > hdfs://mycluster/apps/accumulo/data/wal/host4+9997/a2831ffa-c980-47bf-9f33-14716a0df6ec > to > hdfs://mycluster/apps/accumulo/data/recovery/a2831ffa-c980-47bf-9f33-14716a0df6ec > 2015-11-19 22:43:04,575 [impl.TabletServerBatchReaderIterator] DEBUG: Server > : host4:9997 msg : closeMultiScan failed: out of sequence response > org.apache.thrift.TApplicationException: closeMultiScan failed: out of > sequence response > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_closeMultiScan(TabletClientService.java:371) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.closeMultiScan(TabletClientService.java:357) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:681) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > 2015-11-19 22:43:04,575 [impl.TabletServerBatchReaderIterator] WARN : Error > on server host4:9997 > org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server > host4:9997 > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:695) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.thrift.TApplicationException: closeMultiScan failed: > out of sequence response > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_closeMultiScan(TabletClientService.java:371) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.closeMultiScan(TabletClientService.java:357) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:681) > ... 
6 more > 2015-11-19 22:43:04,576 [master.Master] ERROR: Error processing table state > for store Metadata Tablets > java.lang.RuntimeException: > org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server > host4:9997 > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.hasNext(TabletServerBatchReaderIterator.java:181) > at > org.apache.accumulo.server.master.state.MetaDataTableScanner.hasNext(MetaDataTableScanner.java:121) > at > org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:173) > Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: > Error on server host4:9997 > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:695) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.thrift.TApplicationException: closeMultiScan failed: > out of sequence response > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:76) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_closeMultiScan(TabletClientService.java:371) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.closeMultiScan(TabletClientService.java:357) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:681) > ... 
6 more > 2015-11-19 22:43:04,882 [impl.ThriftScanner] DEBUG: Error getting transport > to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected > protocol id ffffff82 but got c > 2015-11-19 22:43:04,985 [impl.ThriftScanner] DEBUG: Error getting transport > to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected > protocol id ffffff82 but got 0 > 2015-11-19 22:43:05,089 [impl.ThriftScanner] DEBUG: Error getting transport > to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected > protocol id ffffff82 but got 16 > 2015-11-19 22:43:05,192 [impl.ThriftScanner] DEBUG: Error getting transport > to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected > protocol id ffffff82 but got ffffffd6 > 2015-11-19 22:43:05,296 [impl.ThriftScanner] DEBUG: Error getting transport > to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected > protocol id ffffff82 but got fffffff1 > 2015-11-19 22:43:05,399 [impl.ThriftScanner] DEBUG: Error getting transport > to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected > protocol id ffffff82 but got ffffffb7 > 2015-11-19 22:43:05,502 [impl.ThriftScanner] DEBUG: Error getting transport > to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected > protocol id ffffff82 but got ffffffe4 > 2015-11-19 22:43:05,605 [impl.ThriftScanner] DEBUG: Error getting transport > to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected > protocol id ffffff82 but got ffffff98 > 2015-11-19 22:43:05,687 [impl.TabletServerBatchReaderIterator] DEBUG: Server > : host4:9997 msg : Expected protocol id ffffff82 but got fffffff7 > org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 > but got fffffff7 > at > org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472) > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) > at > 
org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > 2015-11-19 22:43:05,688 [impl.TabletServerBatchReaderIterator] DEBUG: > org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 > but got fffffff7 > java.io.IOException: org.apache.thrift.protocol.TProtocolException: Expected > protocol id ffffff82 but got fffffff7 > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:702) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.thrift.protocol.TProtocolException: Expected protocol > id ffffff82 but got fffffff7 > at > 
org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472) > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634) > ... 6 more > 2015-11-19 22:43:05,708 [impl.ThriftScanner] DEBUG: Error getting transport > to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected > protocol id ffffff82 but got ffffffcf > 2015-11-19 22:43:05,793 [impl.TabletServerBatchReaderIterator] DEBUG: Server > : host4:9997 msg : Expected protocol id ffffff82 but got ffffffc6 > org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 > but got ffffffc6 > at > org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472) > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > 
org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > 2015-11-19 22:43:05,794 [impl.TabletServerBatchReaderIterator] DEBUG: > org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 > but got ffffffc6 > java.io.IOException: org.apache.thrift.protocol.TProtocolException: Expected > protocol id ffffff82 but got ffffffc6 > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:702) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.thrift.protocol.TProtocolException: Expected protocol > id ffffff82 but got ffffffc6 > at > org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472) > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634) > ... 
6 more > 2015-11-19 22:43:05,810 [impl.ThriftScanner] DEBUG: Error getting transport > to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected > protocol id ffffff82 but got ffffffd4 > 2015-11-19 22:43:05,913 [impl.ThriftScanner] DEBUG: Error getting transport > to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected > protocol id ffffff82 but got 1 > 2015-11-19 22:43:05,960 [impl.ThriftScanner] DEBUG: Error getting transport > to host4:9997 : org.apache.thrift.protocol.TProtocolException: Expected > protocol id ffffff82 but got 1c > 2015-11-19 22:43:05,997 [impl.TabletServerBatchReaderIterator] DEBUG: Server > : host4:9997 msg : Expected protocol id ffffff82 but got 19 > org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 > but got 19 > at > org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472) > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > 2015-11-19 22:43:05,998 [impl.TabletServerBatchReaderIterator] DEBUG: > org.apache.thrift.protocol.TProtocolException: Expected protocol id 
ffffff82 > but got 19 > java.io.IOException: org.apache.thrift.protocol.TProtocolException: Expected > protocol id ffffff82 but got 19 > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:702) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.thrift.protocol.TProtocolException: Expected protocol > id ffffff82 but got 19 > at > org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472) > at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:317) > at > org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:297) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:634) > ... 
6 more > 2015-11-19 22:43:06,006 [master.Master] WARN : Lost servers > [host5:9997[25121a475480008]] > {noformat} > And even later > {noformat} > 2015-11-19 22:43:41,810 [tracer.ZooTraceClient] DEBUG: Processing event for > trace server zk watch > 2015-11-19 22:43:41,812 [tracer.ZooTraceClient] DEBUG: Scanning trace hosts > in zookeeper: /tracers > 2015-11-19 22:43:41,813 [tracer.ZooTraceClient] DEBUG: Trace hosts: > [10.240.0.76:12234, 10.240.0.76:12234] > 2015-11-19 22:43:42,066 [impl.TabletServerBatchReaderIterator] WARN : null > column family > java.lang.IllegalArgumentException: null column family > at org.apache.accumulo.core.data.Key.<init>(Key.java:391) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:647) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > 2015-11-19 22:43:42,070 [master.Master] ERROR: Error processing table state > for store Metadata Tablets > java.lang.IllegalArgumentException: null column family > at org.apache.accumulo.core.data.Key.<init>(Key.java:391) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:647) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > 2015-11-19 22:43:43,178 [impl.TabletServerBatchReaderIterator] WARN : null > column family > java.lang.IllegalArgumentException: null column family > at org.apache.accumulo.core.data.Key.<init>(Key.java:391) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:647) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > 2015-11-19 22:43:43,178 [master.Master] ERROR: Error processing table state > for store Metadata Tablets > java.lang.IllegalArgumentException: null column family > at org.apache.accumulo.core.data.Key.<init>(Key.java:391) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:647) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > 2015-11-19 22:43:44,284 [impl.TabletServerBatchReaderIterator] WARN : null > column family > 
java.lang.IllegalArgumentException: null column family > at org.apache.accumulo.core.data.Key.<init>(Key.java:391) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:647) > at > org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:349) > at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35) > at java.lang.Thread.run(Thread.java:745) > {noformat} > And even more > {noformat} > 2015-11-19 22:44:05,375 [recovery.RecoveryManager] DEBUG: Recovering > hdfs://mycluster/apps/accumulo/data/wal/host4+9997/a2831ffa-c980-47bf-9f33-14716a0df6ec > to > hdfs://mycluster/apps/accumulo/data/recovery/a2831ffa-c980-47bf-9f33-14716a0df6ec > 2015-11-19 22:44:05,385 [master.Master] DEBUG: 2 assigned to dead servers: > [!0;~<@(null,host4:9997[35121a475360010],host4:9997[35121a475360010]), > !0<;~@(null,host5:9997[25121a475480008],host5:9997[25121a475480008])]... 
> 2015-11-19 22:44:05,405 [impl.TabletServerBatchWriter] ERROR: Server side error on host4:9997: org.apache.thrift.TApplicationException: startUpdate failed: unknown result
> 2015-11-19 22:44:05,405 [master.Master] ERROR: Error processing table state for store Metadata Tablets
> org.apache.accumulo.server.master.state.DistributedStoreException: org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0 security codes: {} # server errors 1 # exceptions 0
>     at org.apache.accumulo.server.master.state.MetaDataStateStore.unassign(MetaDataStateStore.java:139)
>     at org.apache.accumulo.master.TabletGroupWatcher.flushChanges(TabletGroupWatcher.java:738)
>     at org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:295)
> Caused by: org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0 security codes: {} # server errors 1 # exceptions 0
>     at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.checkForFailures(TabletServerBatchWriter.java:550)
>     at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.close(TabletServerBatchWriter.java:361)
>     at org.apache.accumulo.core.client.impl.BatchWriterImpl.close(BatchWriterImpl.java:54)
>     at org.apache.accumulo.server.master.state.MetaDataStateStore.unassign(MetaDataStateStore.java:137)
>     ... 2 more
> 2015-11-19 22:44:05,406 [impl.TabletServerBatchWriter] ERROR: Failed to send tablet server host4:9997 its batch : Error on server host4:9997
> org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host4:9997
>     at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.sendMutationsToTabletServer(TabletServerBatchWriter.java:950)
>     at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.access$1900(TabletServerBatchWriter.java:629)
>     at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter$SendTask.send(TabletServerBatchWriter.java:816)
>     at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter$SendTask.run(TabletServerBatchWriter.java:780)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.TApplicationException: startUpdate failed: unknown result
>     at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startUpdate(TabletClientService.java:403)
>     at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startUpdate(TabletClientService.java:381)
>     at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.sendMutationsToTabletServer(TabletServerBatchWriter.java:893)
>     ... 9 more
> {noformat}
> And, curiously, after this exception, things seem to get happy:
> {noformat}
> 2015-11-19 22:46:35,247 [transport.TIOStreamTransport] WARN : Error closing output stream.
> java.io.IOException: The stream is closed
>     at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:118)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>     at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
>     at org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110)
>     at org.apache.thrift.transport.TFramedTransport.close(TFramedTransport.java:89)
>     at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.close(ThriftTransportPool.java:309)
>     at org.apache.accumulo.core.client.impl.ThriftTransportPool.returnTransport(ThriftTransportPool.java:571)
>     at org.apache.accumulo.core.rpc.ThriftUtil.returnClient(ThriftUtil.java:147)
>     at org.apache.accumulo.core.client.impl.ThriftScanner.getBatchFromServer(ThriftScanner.java:113)
>     at org.apache.accumulo.core.metadata.MetadataLocationObtainer.lookupTablet(MetadataLocationObtainer.java:95)
>     at org.apache.accumulo.core.client.impl.TabletLocatorImpl.lookupTabletLocation(TabletLocatorImpl.java:463)
>     at org.apache.accumulo.core.client.impl.TabletLocatorImpl.lookupTabletLocationAndCheckLock(TabletLocatorImpl.java:634)
>     at org.apache.accumulo.core.client.impl.TabletLocatorImpl._locateTablet(TabletLocatorImpl.java:620)
>     at org.apache.accumulo.core.client.impl.TabletLocatorImpl.locateTablet(TabletLocatorImpl.java:439)
>     at org.apache.accumulo.core.client.impl.Writer.update(Writer.java:88)
>     at org.apache.accumulo.server.util.MetadataTableUtil.update(MetadataTableUtil.java:153)
>     at org.apache.accumulo.server.util.MetadataTableUtil.update(MetadataTableUtil.java:145)
>     at org.apache.accumulo.server.util.MetadataTableUtil.addTablet(MetadataTableUtil.java:211)
>     at org.apache.accumulo.master.tableOps.PopulateMetadata.call(PopulateMetadata.java:43)
>     at org.apache.accumulo.master.tableOps.PopulateMetadata.call(PopulateMetadata.java:25)
>     at org.apache.accumulo.master.tableOps.TraceRepo.call(TraceRepo.java:57)
>     at org.apache.accumulo.fate.Fate$TransactionRunner.run(Fate.java:72)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-11-19 22:46:35,249 [impl.ThriftScanner] DEBUG: Error getting transport to host4:9997 : org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.240.0.76:40610 remote=host4/10.240.0.77:9997]
> 2015-11-19 22:46:35,258 [replication.ReplicationDriver] ERROR: Caught Exception trying to create Replication status records
> java.lang.RuntimeException: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host5:9997
>     at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:161)
>     at org.apache.accumulo.master.replication.StatusMaker.run(StatusMaker.java:94)
>     at org.apache.accumulo.master.replication.ReplicationDriver.run(ReplicationDriver.java:87)
> Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server host5:9997
>     at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:293)
>     at org.apache.accumulo.core.client.impl.ScannerIterator$Reader.run(ScannerIterator.java:80)
>     at org.apache.accumulo.core.client.impl.ScannerIterator.hasNext(ScannerIterator.java:151)
>     ... 2 more
> Caused by: org.apache.thrift.TApplicationException: Internal error processing flush
>     at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
>     at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
>     at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startScan(TabletClientService.java:232)
>     at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startScan(TabletClientService.java:208)
>     at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:410)
>     at org.apache.accumulo.core.client.impl.ThriftScanner.scan(ThriftScanner.java:285)
>     ... 4 more
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)