[jira] [Created] (HADOOP-11329) should add HADOOP_HOME as part of kms's startup options

2014-11-23 Thread Dian Fu (JIRA)
Dian Fu created HADOOP-11329:


 Summary: should add HADOOP_HOME as part of kms's startup options
 Key: HADOOP-11329
 URL: https://issues.apache.org/jira/browse/HADOOP-11329
 Project: Hadoop Common
  Issue Type: Bug
  Components: kms, security
Reporter: Dian Fu


Currently, HADOOP_HOME isn't part of the startup options of KMS. If I add the 
following configuration to the core-site.xml of KMS,
{code}
<property>
  <name>hadoop.security.crypto.codec.classes.aes.ctr.nopadding</name>
  <value>org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec</value>
</property>
{code}
the KMS server throws the following exception when it receives a 
"generateEncryptedKey" request:
{code}
2014-11-24 10:23:18,189 DEBUG org.apache.hadoop.crypto.OpensslCipher: Failed to 
load OpenSSL Cipher.
java.lang.UnsatisfiedLinkError: 
org.apache.hadoop.util.NativeCodeLoader.buildSupportsOpenssl()Z
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsOpenssl(Native 
Method)
at 
org.apache.hadoop.crypto.OpensslCipher.<clinit>(OpensslCipher.java:85)
at 
org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec.<init>(OpensslAesCtrCryptoCodec.java:50)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
at org.apache.hadoop.crypto.CryptoCodec.getInstance(CryptoCodec.java:67)
at 
org.apache.hadoop.crypto.CryptoCodec.getInstance(CryptoCodec.java:100)
at 
org.apache.hadoop.crypto.key.KeyProviderCryptoExtension$DefaultCryptoExtension.generateEncryptedKey(KeyProviderCryptoExtension.java:256)
at 
org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.generateEncryptedKey(KeyProviderCryptoExtension.java:371)
at 
org.apache.hadoop.crypto.key.kms.server.EagerKeyGeneratorKeyProviderCryptoExtension$CryptoExtension$EncryptedQueueRefiller.fillQueueForKey(EagerKeyGeneratorKeyProviderCryptoExtension.java:77)
at 
org.apache.hadoop.crypto.key.kms.ValueQueue$1.load(ValueQueue.java:181)
at 
org.apache.hadoop.crypto.key.kms.ValueQueue$1.load(ValueQueue.java:175)
at 
com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
at 
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
at 
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3969)
at 
com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4829)
at 
org.apache.hadoop.crypto.key.kms.ValueQueue.getAtMost(ValueQueue.java:256)
at 
org.apache.hadoop.crypto.key.kms.ValueQueue.getNext(ValueQueue.java:226)
at 
org.apache.hadoop.crypto.key.kms.server.EagerKeyGeneratorKeyProviderCryptoExtension$CryptoExtension.generateEncryptedKey(EagerKeyGeneratorKeyProviderCryptoExtension.java:126)
at 
org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.generateEncryptedKey(KeyProviderCryptoExtension.java:371)
at 
org.apache.hadoop.crypto.key.kms.server.KeyAuthorizationKeyProvider.generateEncryptedKey(KeyAuthorizationKeyProvider.java:192)
at org.apache.hadoop.crypto.key.kms.server.KMS$9.run(KMS.java:379)
at org.apache.hadoop.crypto.key.kms.server.KMS$9.run(KMS.java:375
{code}
The reason is that the KMS server cannot find libhadoop.so.
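
One way to confirm this diagnosis is to check whether the native library is present in any directory the JVM searches. The sketch below is a hypothetical stand-alone helper (the class name NativeLibraryLocator is not part of Hadoop). It assumes the library is loaded by name through System.loadLibrary, which resolves names against the java.library.path system property.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical diagnostic helper; not part of Hadoop. */
final class NativeLibraryLocator {

  /**
   * Searches each directory in a path-separator-delimited list for the
   * platform-specific file name of the library (libhadoop.so on Linux).
   */
  static Optional<Path> find(String libName, String libraryPath) {
    if (libraryPath == null || libraryPath.isEmpty()) {
      return Optional.empty();
    }
    String fileName = System.mapLibraryName(libName);
    for (String dir : libraryPath.split(Pattern.quote(File.pathSeparator))) {
      if (dir.isEmpty()) {
        continue;
      }
      Path candidate = Paths.get(dir, fileName);
      if (Files.isRegularFile(candidate)) {
        return Optional.of(candidate);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Run with the same JVM options as the server being diagnosed.
    String libraryPath = System.getProperty("java.library.path");
    Optional<Path> hit = find("hadoop", libraryPath);
    System.out.println(hit.map(p -> "found " + p)
        .orElse("hadoop native library not found on java.library.path=" + libraryPath));
  }
}
```

If the helper reports that the library is missing when run with the KMS startup options, that is consistent with the UnsatisfiedLinkError above.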



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11328) ZKFailoverController.java does not log Exception and causes latent problems during failover

2014-11-23 Thread Tianyin Xu (JIRA)
Tianyin Xu created HADOOP-11328:
---

 Summary: ZKFailoverController.java does not log Exception and 
causes latent problems during failover
 Key: HADOOP-11328
 URL: https://issues.apache.org/jira/browse/HADOOP-11328
 Project: Hadoop Common
  Issue Type: Bug
  Components: ha
Affects Versions: 2.5.1
Reporter: Tianyin Xu


In _ZKFailoverController.java_, the _Exception_ caught by the _run()_ method 
is never logged. This causes latent problems that only manifest during 
failover.

h5. The problem we encountered

An _Exception_ is thrown from the _doRun()_ method during _initHM()_ (caused by 
a configuration error). To reproduce, set 
"_ha.health-monitor.connect-retry-interval.ms_" to any nonsensical value.
{code:title=ZKFailoverController.java|borderStyle=solid}
  private int doRun(String[] args)
    ...
    initRPC();
    initHM();
    startRPC();
  }
{code}

The Exception is caught in the _run()_ method, as follows,
{code:title=ZKFailoverController.java|borderStyle=solid}
  public int run(final String[] args) throws Exception {
    ...
    try {
      ...
          @Override
          public Integer run() {
            try {
              return doRun(args);
            } catch (Exception t) {
              throw new RuntimeException(t);
            } finally {
              if (elector != null) {
                elector.terminateConnection();
              }
            }
          }
        });
    } catch (RuntimeException rte) {
      throw (Exception)rte.getCause();
    }
  }
{code}

Unfortunately, the Exception (which causes the process to shut down) is *not 
logged at all*. This causes latent errors that only manifest during 
failover (because ZKFC is dead). The tricky part is that everything looks 
perfectly fine: the _jps_ command shows a running DFSZKFailoverController 
process and the two NameNodes (active and standby) work fine.

h5. Patch

We strongly suggest adding an error log for the caught exception, for example:

--- hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java (revision 1641307)
+++ hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/ZKFailoverController.java (working copy)
{code:title=@@ -178,6 +178,7 @@|borderStyle=solid}
 }
   });
 } catch (RuntimeException rte) {
+  LOG.fatal("The failover controller encounters runtime error: " + rte);
   throw (Exception)rte.getCause();
 }
   }
{code}
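
The log-then-rethrow pattern in the patch can be tried in isolation. The sketch below is a simplified stand-in, not the Hadoop class: LoggedRethrow, runWrapped, and the Callable parameter are hypothetical names, and java.util.logging replaces the LOG object used in ZKFailoverController so that the example runs on its own. It differs from the patch in two ways: it passes the cause to the logger as a Throwable, so the stack trace is recorded too, and it guards against a RuntimeException that has no cause.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for the run()/doRun() pair; not the Hadoop class. */
final class LoggedRethrow {
  private static final Logger LOG = Logger.getLogger(LoggedRethrow.class.getName());

  /** Mirrors the inner run(): wraps any failure in a RuntimeException. */
  static int runWrapped(Callable<Integer> doRun) {
    try {
      return doRun.call();
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  /** Mirrors the outer run(): logs the failure, then rethrows the original cause. */
  static int run(Callable<Integer> doRun) throws Exception {
    try {
      return runWrapped(doRun);
    } catch (RuntimeException rte) {
      // Fall back to the wrapper itself if it carries no cause.
      Throwable cause = rte.getCause() != null ? rte.getCause() : rte;
      // Passing the Throwable keeps the stack trace in the log record.
      LOG.log(Level.SEVERE, "The failover controller encountered a fatal error", cause);
      if (cause instanceof Exception) {
        throw (Exception) cause;
      }
      throw rte;
    }
  }
}
```

With this shape, a configuration error that aborts startup leaves a SEVERE record behind before the exception propagates to the caller.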

Thanks!



