Dong0829 created TEZ-4638:
-----------------------------
Summary: Client authenticate failure when using Kerberos if there
is big DAG plan needed HDFS
Key: TEZ-4638
URL: https://issues.apache.org/jira/browse/TEZ-4638
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.10.2
Reporter: Dong0829
When the DAG plan is large and exceeds the size limit, the plan is uploaded to
HDFS instead of being sent inline. When the Tez AM receives the submit request,
it has to read the plan back from HDFS, but in a Kerberos cluster that read
fails with the error below:
{quote}{{10.239.88.12:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
....
org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:172)
at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:8519)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.processCall(ProtobufRpcEngine.java:484)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:595)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1226)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1145)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
at java.base/javax.security.auth.Subject.doAs(Subject.java:439)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3388)}}
{quote}
Root cause: the submitDAG request is handled by the RPC server, and the Hadoop
IPC server runs the handler inside doAs with the remote RPC client user as the
current UGI (see the Subject.doAs / UserGroupInformation.doAs frames in the
stack above).
That remote UGI does not carry the Tez AM's own context, which is where the
tokens (KMS, HDFS and so on) live, so when the handler talks to HDFS under the
remote UGI, authentication fails.
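
To make the mechanism concrete, here is a minimal toy model of the problem, not a patch. Everything in it is a hypothetical stand-in: User plays the role of a UGI, the ThreadLocal plays the role of the "current UGI" that the IPC server sets through doAs, readPlanFromHdfs plays the role of the HDFS read, and the path is made up. None of these are Hadoop or Tez classes. It only shows why the read fails under the remote user and why switching back to the AM's own user for that one read would succeed.

```java
import java.util.Set;
import java.util.function.Supplier;

/** Toy model only: none of these types are Hadoop or Tez classes. */
final class UgiContextSketch {

    /** Hypothetical stand-in for a UGI: a user name plus the token kinds it holds. */
    static final class User {
        final String name;
        final Set<String> tokenKinds;

        User(String name, Set<String> tokenKinds) {
            this.name = name;
            this.tokenKinds = tokenKinds;
        }
    }

    /** Stand-in for the "current UGI" that the RPC layer installs per call. */
    private static final ThreadLocal<User> CURRENT = new ThreadLocal<>();

    /** Runs the action with the given user as current, then restores the previous user. */
    static <T> T doAs(User user, Supplier<T> action) {
        User previous = CURRENT.get();
        CURRENT.set(user);
        try {
            return action.get();
        } finally {
            CURRENT.set(previous);
        }
    }

    /** Stand-in for the HDFS read: succeeds only if the current user holds an HDFS token. */
    static String readPlanFromHdfs(String path) {
        User user = CURRENT.get();
        if (user == null || !user.tokenKinds.contains("HDFS")) {
            throw new SecurityException("Client cannot authenticate via:[TOKEN, KERBEROS]");
        }
        return "plan@" + path;
    }

    /** Handler as described in the report: reads as whoever the RPC layer made current. */
    static String submitDagAsCaller(String path) {
        return readPlanFromHdfs(path);
    }

    /** One possible remedy (an assumption): do the read under the AM's own user. */
    static String submitDagAsAm(User amUser, String path) {
        return doAs(amUser, () -> readPlanFromHdfs(path));
    }

    public static void main(String[] args) {
        User am = new User("tez-am", Set.of("HDFS", "KMS"));
        User remote = new User("remote-client", Set.of());
        String path = "/staging/dag.plan"; // hypothetical path

        try {
            doAs(remote, () -> submitDagAsCaller(path));
        } catch (SecurityException e) {
            System.out.println("as remote user: " + e.getMessage());
        }
        System.out.println("as AM user: " + doAs(remote, () -> submitDagAsAm(am, path)));
    }
}
```

In real Hadoop code the analogous switch would presumably go through UserGroupInformation.doAs on a UGI that holds the AM's credentials; which UGI to use and where to wrap the read is a design decision for the actual fix and is not settled by this sketch.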