Andrew Olson created OOZIE-3723:
-----------------------------------
Summary: Oozie service permanently caches workflow-supplied
FileSystem connectivity configuration properties for obtaining HDFS Credentials
until restarted
Key: OOZIE-3723
URL: https://issues.apache.org/jira/browse/OOZIE-3723
Project: Oozie
Issue Type: Bug
Components: workflow
Reporter: Andrew Olson
We have recently encountered two separate issues that both required an Oozie
service restart to resolve. In both situations it was apparent that incorrect
workflow-supplied configuration properties for remote FileSystem connectivity,
used to obtain HDFS credentials for remote clusters (via
{{mapreduce.job.hdfs-servers}}), are being retained permanently within some
kind of cache in the Oozie service or the underlying Hadoop code. These cached
values supersede the corrected values after the workflow configuration is
fixed, leaving us no known way to resolve the problem other than restarting the
Oozie service. We confirmed that the {{hdfs-site.xml}} and {{oozie-site.xml}}
files on the host where Oozie is running had not been updated since the prior
restart, so this is not a basic case of stale configuration.
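The behavior described above is consistent with a client cache whose key does
not include the workflow-supplied configuration. Hadoop's FileSystem cache, for
instance, is keyed on scheme, authority and user, but whether that cache (or
some other one) is the actual cause here has not been confirmed. The following
is only a minimal, self-contained model of that failure mode: the class
{{ConfigBlindCacheSketch}}, its methods and the user name {{workflow-user}} are
hypothetical and are not Oozie or Hadoop APIs. The property name and host names
are taken from Issue 2 below.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a cache that ignores configuration changes; not Oozie or Hadoop code. */
public class ConfigBlindCacheSketch {
    static final String RPC_KEY = "dfs.namenode.rpc-address.cluster.nn1";

    /** Stand-in for a client that copies its settings once, when it is created. */
    static final class Client {
        private final Map<String, String> settings;

        Client(Map<String, String> conf) {
            this.settings = new HashMap<>(conf);
        }

        String setting(String name) {
            return settings.get(name);
        }
    }

    // The key is (scheme, authority, user) only; the configuration is NOT part of it.
    private final Map<List<String>, Client> cache = new HashMap<>();
    private final boolean cacheEnabled;

    ConfigBlindCacheSketch(boolean cacheEnabled) {
        this.cacheEnabled = cacheEnabled;
    }

    Client get(String scheme, String authority, String user, Map<String, String> conf) {
        if (!cacheEnabled) {
            return new Client(conf); // always built from the configuration passed in
        }
        List<String> key = Arrays.asList(scheme, authority, user);
        // On a cache hit the conf argument is ignored entirely.
        return cache.computeIfAbsent(key, k -> new Client(conf));
    }

    /** Submits a bad configuration, then a fixed one, and returns the address the second lookup sees. */
    static String addressSeenAfterFix(boolean cacheEnabled) {
        ConfigBlindCacheSketch clients = new ConfigBlindCacheSketch(cacheEnabled);

        Map<String, String> bad = new HashMap<>();
        bad.put(RPC_KEY, "hostname.another.domain.com:8020");
        clients.get("hdfs", "cluster", "workflow-user", bad);

        Map<String, String> fixed = new HashMap<>();
        fixed.put(RPC_KEY, "hostname.some.domain.com:8020");
        return clients.get("hdfs", "cluster", "workflow-user", fixed).setting(RPC_KEY);
    }

    public static void main(String[] args) {
        System.out.println("cache on:  " + addressSeenAfterFix(true));
        System.out.println("cache off: " + addressSeenAfterFix(false));
    }
}
```

With the cache enabled, the second lookup still returns the client built from
the first, incorrect configuration; with it disabled, the corrected address is
used. Hadoop's FileSystem cache has a per-scheme switch
({{fs.hdfs.impl.disable.cache}}), but whether it affects this code path is
untested.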
We are running Oozie version 5.2.0 in this environment.
Complete stack traces are provided below.
Issue 1:
A workflow incorrectly set {{dfs.namenode.kerberos.principal.pattern}} to
{{*/*@*}}, but our system default is {{*}}.
{noformat}
org.apache.oozie.action.ActionExecutorException: JA009: Couldn't set up IO streams: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: nn/[email protected], doesn't match the pattern: '*/*@*'
    at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
    at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:443)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1134)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1644)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:243)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:68)
    at org.apache.oozie.command.XCommand.call(XCommand.java:290)
    at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:363)
    at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:292)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:210)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Couldn't set up IO streams: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: nn/[email protected], doesn't match the pattern: '*/*@*'
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:894)
    at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:414)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1677)
    at org.apache.hadoop.ipc.Client.call(Client.java:1502)
    at org.apache.hadoop.ipc.Client.call(Client.java:1455)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
    at com.sun.proxy.$Proxy35.getDelegationToken(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:1134)
    at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy36.getDelegationToken(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:734)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1996)
    at org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer.java:95)
    at org.apache.hadoop.security.token.DelegationTokenIssuer.addDelegationTokens(DelegationTokenIssuer.java:76)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:143)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:102)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81)
    at org.apache.oozie.action.hadoop.HDFSCredentials$1.run(HDFSCredentials.java:103)
    at org.apache.oozie.action.hadoop.HDFSCredentials$1.run(HDFSCredentials.java:100)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
    at org.apache.oozie.action.hadoop.HDFSCredentials.obtainTokensForNamenodes(HDFSCredentials.java:99)
    at org.apache.oozie.action.hadoop.HDFSCredentials.updateCredentials(HDFSCredentials.java:65)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.setCredentialTokens(JavaActionExecutor.java:1546)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1082)
    ... 11 more
Caused by: java.lang.IllegalArgumentException: Server has invalid Kerberos principal: nn/[email protected], doesn't match the pattern: '*/*@*'
    at org.apache.hadoop.security.SaslRpcClient.getServerPrincipal(SaslRpcClient.java:319)
    at org.apache.hadoop.security.SaslRpcClient.createSaslClient(SaslRpcClient.java:240)
    at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:166)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:392)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:623)
    at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:414)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:843)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:839)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:839)
    ... 44 more
{noformat}
Issue 2:
A workflow mixed up FQDNs, incorrectly setting
{{dfs.namenode.rpc-address.cluster.nn1}} to {{hostname.another.domain.com:8020}}
instead of {{hostname.some.domain.com:8020}}.
{noformat}
org.apache.oozie.action.ActionExecutorException: JA001: Invalid host name: local host is: "hostname.some.domain.com/10.1.2.3"; destination host is: "hostname.another.domain.com":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost
    at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:463)
    at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:443)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1134)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1644)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:243)
    at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:68)
    at org.apache.oozie.command.XCommand.call(XCommand.java:290)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:210)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.net.UnknownHostException: Invalid host name: local host is: "hostname.some.domain.com/10.1.2.3"; destination host is: "hostname.another.domain.com":8020; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost
    at sun.reflect.GeneratedConstructorAccessor185.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:913)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:841)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:662)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:833)
    at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:414)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1677)
    at org.apache.hadoop.ipc.Client.call(Client.java:1502)
    at org.apache.hadoop.ipc.Client.call(Client.java:1455)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
    at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
    at com.sun.proxy.$Proxy35.getDelegationToken(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:1134)
    at sun.reflect.GeneratedMethodAccessor59.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
    at com.sun.proxy.$Proxy36.getDelegationToken(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:734)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1996)
    at org.apache.hadoop.security.token.DelegationTokenIssuer.collectDelegationTokens(DelegationTokenIssuer.java:95)
    at org.apache.hadoop.security.token.DelegationTokenIssuer.addDelegationTokens(DelegationTokenIssuer.java:76)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:143)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:102)
    at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:81)
    at org.apache.oozie.action.hadoop.HDFSCredentials$1.run(HDFSCredentials.java:103)
    at org.apache.oozie.action.hadoop.HDFSCredentials$1.run(HDFSCredentials.java:100)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
    at org.apache.oozie.action.hadoop.HDFSCredentials.obtainTokensForNamenodes(HDFSCredentials.java:99)
    at org.apache.oozie.action.hadoop.HDFSCredentials.updateCredentials(HDFSCredentials.java:65)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.setCredentialTokens(JavaActionExecutor.java:1546)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:1082)
    ... 9 more
Caused by: java.net.UnknownHostException
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:664)
    ... 43 more
{noformat}