Siddharth Seth created HIVE-15722:
-------------------------------------

             Summary: LLAP: Avoid marking a query as complete if the AMReporter runs into an error
                 Key: HIVE-15722
                 URL: https://issues.apache.org/jira/browse/HIVE-15722
             Project: Hive
          Issue Type: Bug
            Reporter: Siddharth Seth
            Assignee: Siddharth Seth


When the AMReporter runs into an error (typically intermittent), we end up 
killing all fragments on the daemon. This is done by marking the query as 
complete.
The AM continues to schedule work on this node, which leads to task failures, 
since the daemon structures have already been updated to mark the query complete.

Instead of clearing the structures, it's better to kill the fragments and let 
a queryComplete call come in from the AM.
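To make the difference between the two behaviours concrete, here is a minimal sketch under stated assumptions. `QueryStateSketch` and all of its methods are hypothetical, heavily simplified names for illustration only; they are not Hive's `QueryTracker` or `AMReporter` API. The rejection in `registerFragment` mirrors the "already complete. Rejecting fragment" error reported in this issue.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of per-daemon query state (not Hive's API).
final class QueryStateSketch {
    // Queries the daemon considers finished; new fragments for them are rejected.
    private final Set<String> completedQueries = new HashSet<>();
    // Currently running fragment ids, keyed by query id.
    private final Map<String, Set<String>> runningFragments = new HashMap<>();

    // Reject fragments for queries already marked complete, as in the reported failure.
    void registerFragment(String queryId, String fragmentId) {
        if (completedQueries.contains(queryId)) {
            throw new IllegalStateException(
                "Dag " + queryId + " already complete. Rejecting fragment " + fragmentId);
        }
        runningFragments.computeIfAbsent(queryId, k -> new HashSet<>()).add(fragmentId);
    }

    // Current behaviour: a reporter error kills the fragments AND marks the query
    // complete, so later submissions from the AM for this query are rejected.
    void onReporterErrorMarkComplete(String queryId) {
        runningFragments.remove(queryId);
        completedQueries.add(queryId);
    }

    // Proposed behaviour: kill the running fragments but keep the query open,
    // so the AM can still schedule fragments for it on this daemon.
    void onReporterErrorKillOnly(String queryId) {
        runningFragments.remove(queryId);
    }

    // Only an explicit queryComplete from the AM marks the query as finished.
    void queryComplete(String queryId) {
        runningFragments.remove(queryId);
        completedQueries.add(queryId);
    }

    // Number of fragments currently tracked as running for the query.
    int runningCount(String queryId) {
        Set<String> fragments = runningFragments.get(queryId);
        return fragments == null ? 0 : fragments.size();
    }
}
{code}

With the kill-only path, a fragment submitted after the reporter error is accepted, and only the AM's queryComplete call causes later submissions to be rejected.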

Later, we could enhance the AM to avoid such nodes. That's not simple, 
though, since the AM never finds out what happened: the failure is in the 
daemon's communication with the AM.

This leads to:
{code}
org.apache.hadoop.ipc.RemoteException(java.lang.RuntimeException): Dag query16 already complete. Rejecting fragment [Map 7, 29, 0]
        at org.apache.hadoop.hive.llap.daemon.impl.QueryTracker.registerFragment(QueryTracker.java:149)
        at org.apache.hadoop.hive.llap.daemon.impl.ContainerRunnerImpl.submitWork(ContainerRunnerImpl.java:226)
        at org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.submitWork(LlapDaemon.java:487)
        at org.apache.hadoop.hive.llap.daemon.impl.LlapProtocolServerImpl.submitWork(LlapProtocolServerImpl.java:101)
        at org.apache.hadoop.hive.llap.daemon.rpc.LlapDaemonProtocolProtos$LlapDaemonProtocol$2.callBlockingMethod(LlapDaemonProtocolProtos.java:16728)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)