[jira] [Created] (HADOOP-11329) should add HADOOP_HOME as part of kms's startup options
Dian Fu created HADOOP-11329:
---------------------------------

             Summary: should add HADOOP_HOME as part of kms's startup options
                 Key: HADOOP-11329
                 URL: https://issues.apache.org/jira/browse/HADOOP-11329
             Project: Hadoop Common
          Issue Type: Bug
          Components: kms, security
            Reporter: Dian Fu

Currently, HADOOP_HOME isn't part of the startup options of KMS. If I add the following configuration to the core-site.xml of KMS,
{code}
<property>
  <name>hadoop.security.crypto.codec.classes.aes.ctr.nopadding</name>
  <value>org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec</value>
</property>
{code}
the KMS server will throw the following exception when it receives a "generateEncryptedKey" request:
{code}
2014-11-24 10:23:18,189 DEBUG org.apache.hadoop.crypto.OpensslCipher: Failed to load OpenSSL Cipher.
java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsOpenssl()Z
	at org.apache.hadoop.util.NativeCodeLoader.buildSupportsOpenssl(Native Method)
	at org.apache.hadoop.crypto.OpensslCipher.<clinit>(OpensslCipher.java:85)
	at org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec.<init>(OpensslAesCtrCryptoCodec.java:50)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
	at org.apache.hadoop.crypto.CryptoCodec.getInstance(CryptoCodec.java:67)
	at org.apache.hadoop.crypto.CryptoCodec.getInstance(CryptoCodec.java:100)
	at org.apache.hadoop.crypto.key.KeyProviderCryptoExtension$DefaultCryptoExtension.generateEncryptedKey(KeyProviderCryptoExtension.java:256)
	at org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.generateEncryptedKey(KeyProviderCryptoExtension.java:371)
	at org.apache.hadoop.crypto.key.kms.server.EagerKeyGeneratorKeyProviderCryptoExtension$CryptoExtension$EncryptedQueueRefiller.fillQueueForKey(EagerKeyGeneratorKeyProviderCryptoExtension.java:77)
	at org.apache.hadoop.crypto.key.kms.ValueQueue$1.load(ValueQueue.java:181)
	at org.apache.hadoop.crypto.key.kms.ValueQueue$1.load(ValueQueue.java:175)
	at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
	at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
	at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
	at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
	at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
	at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
	at org.apache.hadoop.crypto.key.kms.ValueQueue.getAtMost(ValueQueue.java:256)
	at org.apache.hadoop.crypto.key.kms.ValueQueue.getNext(ValueQueue.java:226)
	at org.apache.hadoop.crypto.key.kms.server.EagerKeyGeneratorKeyProviderCryptoExtension$CryptoExtension.generateEncryptedKey(EagerKeyGeneratorKeyProviderCryptoExtension.java:126)
	at org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.generateEncryptedKey(KeyProviderCryptoExtension.java:371)
	at org.apache.hadoop.crypto.key.kms.server.KeyAuthorizationKeyProvider.generateEncryptedKey(KeyAuthorizationKeyProvider.java:192)
	at org.apache.hadoop.crypto.key.kms.server.KMS$9.run(KMS.java:379)
	at org.apache.hadoop.crypto.key.kms.server.KMS$9.run(KMS.java:375)
{code}
The reason is that the KMS JVM cannot find libhadoop.so.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
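The {{UnsatisfiedLinkError}} above is the JVM's standard behavior when a declared {{native}} method is invoked before its library has been resolved via {{java.library.path}} (which is what happens when the KMS startup script never sees HADOOP_HOME and therefore never points the JVM at libhadoop.so). A minimal, self-contained sketch of the mechanism; the native method here is a hypothetical stand-in for {{NativeCodeLoader.buildSupportsOpenssl()}}, not actual Hadoop code:

```java
// Sketch of the failure mode: a native method whose backing library was
// never loaded throws UnsatisfiedLinkError at its first invocation,
// exactly as buildSupportsOpenssl() does when libhadoop.so is missing.
public class NativeProbe {
    // Hypothetical stand-in for NativeCodeLoader.buildSupportsOpenssl();
    // no System.loadLibrary() call is made, so invoking it must fail.
    private static native boolean buildSupportsOpenssl();

    public static void main(String[] args) {
        // This is the search path the JVM uses to resolve libhadoop.so.
        System.out.println("java.library.path = "
                + System.getProperty("java.library.path"));
        try {
            buildSupportsOpenssl();
        } catch (UnsatisfiedLinkError e) {
            System.out.println("Native call failed: " + e.getMessage());
        }
    }
}
```

Passing HADOOP_HOME (or {{-Djava.library.path=$HADOOP_HOME/lib/native}}) through the KMS startup options is what lets this resolution succeed.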
[jira] [Created] (HADOOP-11328) ZKFailoverController.java does not log Exception and causes latent problems during failover
Tianyin Xu created HADOOP-11328:
---------------------------------

             Summary: ZKFailoverController.java does not log Exception and causes latent problems during failover
                 Key: HADOOP-11328
                 URL: https://issues.apache.org/jira/browse/HADOOP-11328
             Project: Hadoop Common
          Issue Type: Bug
          Components: ha
    Affects Versions: 2.5.1
            Reporter: Tianyin Xu

In _ZKFailoverController.java_, the _Exception_ caught by the _run()_ method is not logged by a single error log. This causes latent problems that only manifest during failover.

h5. The problem we encountered

An _Exception_ is thrown from the _doRun()_ method during _initHM()_ (caused by a configuration error). To reproduce, set "_ha.health-monitor.connect-retry-interval.ms_" to any nonsensical value.
{code:title=ZKFailoverController.java|borderStyle=solid}
private int doRun(String[] args) {
  ...
  initRPC();
  initHM();
  startRPC();
  ...
}
{code}
The Exception is caught in the _run()_ method, as follows:
{code:title=ZKFailoverController.java|borderStyle=solid}
public int run(final String[] args) throws Exception {
  ...
  try {
    ...
      @Override
      public Integer run() {
        try {
          return doRun(args);
        } catch (Exception t) {
          throw new RuntimeException(t);
        } finally {
          if (elector != null) {
            elector.terminateConnection();
          }
        }
      }
    });
  } catch (RuntimeException rte) {
    throw (Exception)rte.getCause();
  }
}
{code}
Unfortunately, the Exception (which causes the shutdown of the process) is *not logged at all*. This causes latent errors that only manifest during failover (because ZKFC is dead). The tricky thing here is that everything looks perfectly fine: the _jps_ command shows a running DFSZKFailoverController process, and the two NameNodes (active and standby) work fine.

h5. Patch

We strongly suggest adding an error log to record the caught error, such as:

--- hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java	(revision 1641307)
+++ hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java	(working copy)
{code:title=@@ -178,6 +178,7 @@|borderStyle=solid}
      }
    });
  } catch (RuntimeException rte) {
+   LOG.fatal("The failover controller encountered a runtime error: " + rte);
    throw (Exception)rte.getCause();
  }
}
{code}
Thanks!

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
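The suggested pattern, log the wrapped cause before unwrapping and rethrowing it, can be sketched in a self-contained form. This is an illustration only, not the actual ZKFailoverController code: it uses {{java.util.logging}} in place of the Commons Logging {{LOG}} field, and {{doRun()}} throws a stand-in configuration error:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: log the RuntimeException (and its cause) before rethrowing,
// so a fatal startup error is never silently swallowed.
public class RethrowWithLog {
    private static final Logger LOG =
            Logger.getLogger(RethrowWithLog.class.getName());

    // Stand-in for ZKFailoverController.doRun() failing in initHM()
    // because of a nonsensical config value.
    static int doRun() {
        throw new IllegalArgumentException(
                "bad ha.health-monitor.connect-retry-interval.ms");
    }

    public static int run() throws Exception {
        try {
            try {
                return doRun();
            } catch (Exception t) {
                // mirrors the inner wrapping in ZKFailoverController.run()
                throw new RuntimeException(t);
            }
        } catch (RuntimeException rte) {
            // the one-line fix: record the error before it leaves run()
            LOG.log(Level.SEVERE,
                    "The failover controller encountered a runtime error", rte);
            throw (Exception) rte.getCause();
        }
    }

    public static void main(String[] args) {
        try {
            run();
        } catch (Exception e) {
            System.out.println("Rethrown cause: " + e.getClass().getSimpleName());
        }
    }
}
```

With the log line in place, the process still shuts down the same way, but the operator now has a record of why ZKFC died instead of discovering it only at failover time.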