[ 
https://issues.apache.org/jira/browse/ACCUMULO-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212874#comment-13212874
 ] 

Keith Turner commented on ACCUMULO-422:
---------------------------------------

Found another case where bulk import was failing because a tablet server died.  

{noformat}
21 19:21:18,386 [fate.Fate] WARN : Failed to execute Repo, tid=7427f4b91dc2fbb0
java.lang.NullPointerException
        at org.apache.accumulo.server.master.tableOps.CleanUpBulkImport.isReady(BulkImport.java:286)
        at org.apache.accumulo.server.master.tableOps.CleanUpBulkImport.isReady(BulkImport.java:262)
        at org.apache.accumulo.server.master.tableOps.TraceRepo.isReady(TraceRepo.java:50)
        at org.apache.accumulo.server.fate.Fate$TransactionRunner.run(Fate.java:62)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Thread.java:662)

21 19:21:18,398 [thrift.MasterClientService$Processor] ERROR: Internal error processing waitForTableOperation
java.lang.NullPointerException
        at org.apache.accumulo.server.master.tableOps.CleanUpBulkImport.isReady(BulkImport.java:286)
        at org.apache.accumulo.server.master.tableOps.CleanUpBulkImport.isReady(BulkImport.java:262)
        at org.apache.accumulo.server.master.tableOps.TraceRepo.isReady(TraceRepo.java:50)
        at org.apache.accumulo.server.fate.Fate$TransactionRunner.run(Fate.java:62)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Thread.java:662)
{noformat}

This error propagated to the randomwalk test client, causing it to die.

{noformat}
21 19:21:18,411 [randomwalk.Framework] ERROR: Error during random walk
java.lang.Exception: Error running node Concurrent.xml
        at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:259)
        at org.apache.accumulo.server.test.randomwalk.Framework.run(Framework.java:61)
        at org.apache.accumulo.server.test.randomwalk.Framework.main(Framework.java:114)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.accumulo.start.Main$1.run(Main.java:89)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.Exception: Error running node ct.BulkImport
        at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:259)
        at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:251)
        ... 8 more
Caused by: org.apache.accumulo.core.client.AccumuloException: Internal error processing waitForTableOperation
        at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:293)
        at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:261)
        at org.apache.accumulo.core.client.admin.TableOperationsImpl.importDirectory(TableOperationsImpl.java:938)
        at org.apache.accumulo.server.test.randomwalk.concurrent.BulkImport.visit(BulkImport.java:132)
        at org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:251)
        ... 9 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing waitForTableOperation
        at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
        at org.apache.accumulo.core.master.thrift.MasterClientService$Client.recv_waitForTableOperation(MasterClientService.java:684)
        at org.apache.accumulo.core.master.thrift.MasterClientService$Client.waitForTableOperation(MasterClientService.java:665)
        at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at cloudtrace.instrument.thrift.TraceWrap$2.invoke(TraceWrap.java:83)
        at $Proxy1.waitForTableOperation(Unknown Source)
        at org.apache.accumulo.core.client.admin.TableOperationsImpl.waitForTableOperation(TableOperationsImpl.java:233)
        at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:275)
        ... 13 more

{noformat}
                
> Bulk import failing when tablet server dies
> -------------------------------------------
>
>                 Key: ACCUMULO-422
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-422
>             Project: Accumulo
>          Issue Type: Bug
>         Environment: 10 node cluster running 1.4.0-SNAPSHOT
>            Reporter: Keith Turner
>              Labels: 14_qa_bug
>             Fix For: 1.4.0
>
>
> Saw this issue while running the random walk test with agitation.  The bulk 
> import code picks random tablet servers and asks them to bulk load files.  
> If a tablet server dies, it takes 30 seconds for the master to see that its 
> ZooKeeper lock was lost.  During this 30 second period the bulk import code 
> will still try to use the dead tserver and fail.  After three failures it 
> marks the file as a failure.  All three attempts happen within a second.
> The bulk import code should probably catch TTransportException and blacklist 
> the tablet server for that bulk import transaction.
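
The blacklisting idea suggested in the description could look roughly like the sketch below. This is not the Accumulo implementation: the class name BulkLoadBlacklistSketch, the Loader interface, and the loadWithBlacklist method are hypothetical names made up for illustration, and TransportFailure is a stand-in for TTransportException so the example compiles without Thrift on the classpath. It assumes one blacklist set is created per bulk import transaction and shared across all files in that transaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class BulkLoadBlacklistSketch {

    /** Stand-in for Thrift's TTransportException (assumption: keeps the sketch dependency-free). */
    static class TransportFailure extends Exception {
        TransportFailure(String msg) {
            super(msg);
        }
    }

    /** Hypothetical hook that asks one tablet server to bulk load one file. */
    interface Loader {
        void load(String server, String file) throws TransportFailure;
    }

    /**
     * Tries to load a file, picking a random server on each attempt and
     * skipping servers already blacklisted for this transaction.
     * Returns true on success, false once attempts or candidate servers run out.
     */
    static boolean loadWithBlacklist(String file, List<String> servers, Set<String> blacklist,
                                     Loader loader, int maxAttempts, Random random) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Only consider servers that have not failed during this transaction.
            List<String> candidates = new ArrayList<>(servers);
            candidates.removeAll(blacklist);
            if (candidates.isEmpty()) {
                return false;
            }
            String server = candidates.get(random.nextInt(candidates.size()));
            try {
                loader.load(server, file);
                return true;
            } catch (TransportFailure e) {
                // The server is probably dead; do not pick it again in this transaction.
                blacklist.add(server);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("tserver1", "tserver2", "tserver3");
        Set<String> blacklist = new HashSet<>(); // one set per bulk import transaction
        // Simulate a dead tserver2 that the master has not noticed yet.
        Loader loader = (server, file) -> {
            if (server.equals("tserver2")) {
                throw new TransportFailure("connection refused: " + server);
            }
        };
        boolean ok = loadWithBlacklist("f1.rf", servers, blacklist, loader, 3, new Random());
        System.out.println("loaded=" + ok + " blacklist=" + blacklist);
    }
}
```

Because a failed server is removed from the candidate list, the three attempts are no longer wasted on the same dead tserver during the 30 seconds before the master notices the lost lock.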
