[ https://issues.apache.org/jira/browse/ACCUMULO-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212874#comment-13212874 ]
Keith Turner commented on ACCUMULO-422:
---------------------------------------

Found another case where bulk import was failing because a tablet server died.

{noformat}
21 19:21:18,386 [fate.Fate] WARN : Failed to execute Repo, tid=7427f4b91dc2fbb0
java.lang.NullPointerException
    at org.apache.accumulo.server.master.tableOps.CleanUpBulkImport.isReady(BulkImport.java:286)
    at org.apache.accumulo.server.master.tableOps.CleanUpBulkImport.isReady(BulkImport.java:262)
    at org.apache.accumulo.server.master.tableOps.TraceRepo.isReady(TraceRepo.java:50)
    at org.apache.accumulo.server.fate.Fate$TransactionRunner.run(Fate.java:62)
    at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
    at java.lang.Thread.run(Thread.java:662)
21 19:21:18,398 [thrift.MasterClientService$Processor] ERROR: Internal error processing waitForTableOperation
java.lang.NullPointerException
    at org.apache.accumulo.server.master.tableOps.CleanUpBulkImport.isReady(BulkImport.java:286)
    at org.apache.accumulo.server.master.tableOps.CleanUpBulkImport.isReady(BulkImport.java:262)
    at org.apache.accumulo.server.master.tableOps.TraceRepo.isReady(TraceRepo.java:50)
    at org.apache.accumulo.server.fate.Fate$TransactionRunner.run(Fate.java:62)
    at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
    at java.lang.Thread.run(Thread.java:662)
{noformat}

This error propagated to the randomwalk test client, causing it to die.

{noformat}
21 19:21:18,411 [randomwalk.Framework] ERROR: Error during random walk
java.lang.Exception: Error running node Concurrent.xml
    at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:259)
    at org.apache.accumulo.server.test.randomwalk.Framework.run(Framework.java:61)
    at org.apache.accumulo.server.test.randomwalk.Framework.main(Framework.java:114)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.accumulo.start.Main$1.run(Main.java:89)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.Exception: Error running node ct.BulkImport
    at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:259)
    at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:251)
    ... 8 more
Caused by: org.apache.accumulo.core.client.AccumuloException: Internal error processing waitForTableOperation
    at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:293)
    at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:261)
    at org.apache.accumulo.core.client.admin.TableOperationsImpl.importDirectory(TableOperationsImpl.java:938)
    at org.apache.accumulo.server.test.randomwalk.concurrent.BulkImport.visit(BulkImport.java:132)
    at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:251)
    ... 9 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing waitForTableOperation
    at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
    at org.apache.accumulo.core.master.thrift.MasterClientService$Client.recv_waitForTableOperation(MasterClientService.java:684)
    at org.apache.accumulo.core.master.thrift.MasterClientService$Client.waitForTableOperation(MasterClientService.java:665)
    at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at cloudtrace.instrument.thrift.TraceWrap$2.invoke(TraceWrap.java:83)
    at $Proxy1.waitForTableOperation(Unknown Source)
    at org.apache.accumulo.core.client.admin.TableOperationsImpl.waitForTableOperation(TableOperationsImpl.java:233)
    at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:275)
    ... 13 more
{noformat}

> Bulk import failing when tablet server dies
> -------------------------------------------
>
>                 Key: ACCUMULO-422
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-422
>             Project: Accumulo
>          Issue Type: Bug
>         Environment: 10 node cluster running 1.4.0-SNAPSHOT
>            Reporter: Keith Turner
>              Labels: 14_qa_bug
>             Fix For: 1.4.0
>
>
> Saw this issue while running random walk test w/ agitation. The bulk import
> code picks random tablet servers and asks them to bulk load files. If a
> tablet server dies it takes 30 seconds for the master to see the zookeeper
> lock was lost. During this 30 second period the bulk import code will still
> try to use the tserver and fail. After it fails three times it will mark the
> file as a failure. This all happens within a second.
> The bulk import code should probably catch TTransportException and blacklist
> the tablet server for that bulk import transaction.

--
This message is automatically generated by JIRA.
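
The per-transaction blacklist suggested in the issue description could look roughly like the sketch below. This is only an illustration of the idea, not the Accumulo implementation: the class `BulkImportBlacklistSketch`, the `TabletServerClient` interface, and the `TransportFailure` exception (a stand-in for Thrift's `TTransportException`) are all hypothetical names introduced here, and the retry limit is passed in as a parameter rather than taken from the real code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the Accumulo code base. */
final class BulkImportBlacklistSketch {

    /** Stand-in for org.apache.thrift.transport.TTransportException (assumption). */
    static final class TransportFailure extends Exception {
        TransportFailure(String message) {
            super(message);
        }
    }

    /** Hypothetical abstraction of "ask a tablet server to bulk load a file". */
    interface TabletServerClient {
        void bulkLoad(String server, String file) throws TransportFailure;
    }

    // Servers that failed with a transport error during this one transaction.
    private final Set<String> blacklist = new HashSet<>();
    private final TabletServerClient client;
    private final Random random;

    BulkImportBlacklistSketch(TabletServerClient client, Random random) {
        this.client = client;
        this.random = random;
    }

    /**
     * Tries to load one file, at most maxAttempts times, picking a random
     * server each time but never one that is already blacklisted.
     *
     * @return the server that accepted the file, or null if the file
     *         should be marked as a failure
     */
    String loadFile(String file, List<String> servers, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            List<String> candidates = new ArrayList<>(servers);
            candidates.removeAll(blacklist);
            if (candidates.isEmpty()) {
                return null; // every known server has failed in this transaction
            }
            String server = candidates.get(random.nextInt(candidates.size()));
            try {
                client.bulkLoad(server, file);
                return server;
            } catch (TransportFailure e) {
                // Likely a dead tserver the master has not noticed yet:
                // skip it for the remainder of this transaction.
                blacklist.add(server);
            }
        }
        return null;
    }

    /** Returns a copy of the servers blacklisted so far. */
    Set<String> blacklisted() {
        return new HashSet<>(blacklist);
    }
}
```

Keeping the blacklist scoped to one transaction means a server that was only briefly unreachable is eligible again for the next bulk import, while a dead server cannot use up all three attempts for a file during the 30 seconds before the master notices the lost lock.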