[jira] [Updated] (HDFS-7340) rollingUpgrade prepare command does not have retry cache support
[ https://issues.apache.org/jira/browse/HDFS-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Gupta updated HDFS-7340:
------------------------------

    Assignee: Jing Zhao

> rollingUpgrade prepare command does not have retry cache support
> ----------------------------------------------------------------
>
>                 Key: HDFS-7340
>                 URL: https://issues.apache.org/jira/browse/HDFS-7340
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.6.0
>            Reporter: Arpit Gupta
>            Assignee: Jing Zhao
>
> I was running this on an HA cluster with dfs.client.test.drop.namenode.response.number
> set to 1, so the first request goes through but the response is dropped. That triggers
> a second request, which then fails saying a request is already in progress. We should
> add retry cache support for this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HDFS-7340) rollingUpgrade prepare command does not have retry cache support
Arpit Gupta created HDFS-7340:
------------------------------

             Summary: rollingUpgrade prepare command does not have retry cache support
                 Key: HDFS-7340
                 URL: https://issues.apache.org/jira/browse/HDFS-7340
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ha
    Affects Versions: 2.6.0
            Reporter: Arpit Gupta

I was running this on an HA cluster with dfs.client.test.drop.namenode.response.number set to 1, so the first request goes through but the response is dropped. That triggers a second request, which then fails saying a request is already in progress. We should add retry cache support for this.
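The retry-cache idea requested above can be sketched generically. This is an illustrative model only, not the actual NameNode `RetryCache` implementation: the class, method, and key names below are made up. The point is that a non-idempotent operation remembers its outcome keyed by the call's identity, so a retry caused by a dropped response replays the cached outcome instead of failing with "a request is already in progress".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a retry cache (illustrative names, not Hadoop code):
// cache the outcome of a non-idempotent call keyed by client/call id, so a
// retried request after a dropped response gets the original result back.
public class RetryCacheSketch {
    private final Map<String, String> completed = new HashMap<>();

    public String prepareRollingUpgrade(String callId) {
        String cached = completed.get(callId);
        if (cached != null) {
            return cached;                       // retried call: replay outcome
        }
        String result = "upgrade-prepared";      // perform the real work once
        completed.put(callId, result);
        return result;
    }

    public static void main(String[] args) {
        RetryCacheSketch cache = new RetryCacheSketch();
        String first = cache.prepareRollingUpgrade("client-1:call-42");
        // Simulate the client retrying because the first response was dropped.
        String retry = cache.prepareRollingUpgrade("client-1:call-42");
        System.out.println(first.equals(retry));
    }
}
```

With such a cache in place, the scenario in the report (response dropped, client retries) would succeed transparently instead of surfacing an "already in progress" error.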
[jira] [Created] (HDFS-7305) NPE seen in webhdfs FS while running SLive
Arpit Gupta created HDFS-7305:
------------------------------

             Summary: NPE seen in webhdfs FS while running SLive
                 Key: HDFS-7305
                 URL: https://issues.apache.org/jira/browse/HDFS-7305
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.6.0
            Reporter: Arpit Gupta
            Priority: Critical

{code}
2014-10-23 10:22:49,066 WARN [main] org.apache.hadoop.mapred.Task: Task status: "Failed at running due to java.lang.NullPointerException
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:529)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:605)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:458)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:487)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:483)
	at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.append(WebHdfsFileSystem.java:1154)
	at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1163)
	at org.apache.hadoop.fs.slive.AppendOp.run(AppendOp.java:80)
	at org.apache.hadoop.fs.slive.ObserveableOp.run(ObserveableOp.java:63)
	at org.apache.hadoop.fs.slive.SliveMapper.runOperation(SliveMapper.java:122)
	at org.apache.hadoop.fs.slive.SliveMapper.map(SliveMapper.java:168)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
" truncated to max limit (512 characters)
{code}
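An NPE thrown from a response-validation method usually means a server response was dereferenced without a null check, so the caller sees a bare `NullPointerException` instead of a meaningful `IOException`. The guard below is a generic sketch of the fix pattern, with illustrative names; it is not the actual WebHdfsFileSystem patch.

```java
import java.io.IOException;
import java.util.Map;

// Hypothetical sketch: validate a (possibly absent) JSON response defensively,
// converting "no response" into a descriptive IOException rather than letting
// a later dereference blow up with NullPointerException.
public class ResponseGuard {
    static void validateResponse(Map<String, Object> json) throws IOException {
        if (json == null) {
            // Without this guard, json.get(...) below would throw NPE,
            // as in the SLive stack trace above.
            throw new IOException("Unexpected empty response from server");
        }
        Object remoteException = json.get("RemoteException");
        if (remoteException != null) {
            throw new IOException("Server returned: " + remoteException);
        }
    }

    public static void main(String[] args) throws IOException {
        validateResponse(Map.of("ok", Boolean.TRUE));  // passes silently
    }
}
```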
[jira] [Created] (HDFS-6715) webhdfs won't fail over when it gets java.io.IOException: Namenode is in startup mode
Arpit Gupta created HDFS-6715:
------------------------------

             Summary: webhdfs won't fail over when it gets java.io.IOException: Namenode is in startup mode
                 Key: HDFS-6715
                 URL: https://issues.apache.org/jira/browse/HDFS-6715
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ha
    Affects Versions: 2.2.0
            Reporter: Arpit Gupta

Noticed in our HA testing: when we run an MR job with the webhdfs file system, we sometimes run into

{code}
2014-04-17 05:08:06,346 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1397710493213_0001_r_08_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
2014-04-17 05:08:10,205 ERROR [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Could not commit job
java.io.IOException: Namenode is in startup mode
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
{code}
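The failover decision the report asks for can be sketched as a predicate over the exception: "Namenode is in startup mode" is a transient condition, so the client should try the other NameNode rather than fail the job. This is an illustrative sketch, not the actual WebHDFS retry policy; the class and method names are made up.

```java
import java.io.IOException;

// Hypothetical sketch of a failover predicate: treat "startup mode" as a
// retriable condition that should trigger a try against the other NN,
// instead of surfacing the IOException to the job committer.
public class FailoverDecision {
    static boolean shouldFailOver(IOException e) {
        String msg = e.getMessage();
        // Startup/standby states are transient; a healthy peer may exist.
        return msg != null && msg.contains("Namenode is in startup mode");
    }

    public static void main(String[] args) {
        System.out.println(shouldFailOver(new IOException("Namenode is in startup mode")));
        System.out.println(shouldFailOver(new IOException("Permission denied")));
    }
}
```

Matching on exception message strings is fragile in general; a real fix would key off a typed exception, but the sketch captures the missing behavior described above.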
[jira] [Commented] (HDFS-6354) NN startup does not fail when it fails to login with the spnego principal
[ https://issues.apache.org/jira/browse/HDFS-6354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006562#comment-14006562 ]

Arpit Gupta commented on HDFS-6354:
-----------------------------------

[~daryn] I was testing this manually and cannot recall which JDK version was in use at the time. This was tested on 2.4.

> NN startup does not fail when it fails to login with the spnego principal
> -------------------------------------------------------------------------
>
>                 Key: HDFS-6354
>                 URL: https://issues.apache.org/jira/browse/HDFS-6354
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Arpit Gupta
>
> I have noticed cases where NN startup does not report any issues even though the login
> fails because the keytab is wrong, the principal does not exist, etc. This can be
> misleading and lead to authentication failures when a client tries to authenticate
> with the spnego principal.
[jira] [Commented] (HDFS-6312) WebHdfs HA failover is broken on secure clusters
[ https://issues.apache.org/jira/browse/HDFS-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985788#comment-13985788 ]

Arpit Gupta commented on HDFS-6312:
-----------------------------------

Ah, I see. I was just curious whether we could add more tests to reach this issue :).

> WebHdfs HA failover is broken on secure clusters
> ------------------------------------------------
>
>                 Key: HDFS-6312
>                 URL: https://issues.apache.org/jira/browse/HDFS-6312
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: Daryn Sharp
>            Priority: Blocker
>
> When webhdfs does a failover, it blanks out the delegation token. This will cause
> subsequent operations against the other NN to acquire a new token. Tasks cannot
> acquire a token (no kerberos credentials), so jobs will fail.
[jira] [Commented] (HDFS-6312) WebHdfs HA failover is broken on secure clusters
[ https://issues.apache.org/jira/browse/HDFS-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985657#comment-13985657 ]

Arpit Gupta commented on HDFS-6312:
-----------------------------------

[~daryn] in our testing with webhdfs + HA on a secure cluster we hit HADOOP-10519. I am curious what kind of job you ran that actually started running tasks.

> WebHdfs HA failover is broken on secure clusters
> ------------------------------------------------
>
>                 Key: HDFS-6312
>                 URL: https://issues.apache.org/jira/browse/HDFS-6312
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: webhdfs
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: Daryn Sharp
>            Priority: Blocker
>
> When webhdfs does a failover, it blanks out the delegation token. This will cause
> subsequent operations against the other NN to acquire a new token. Tasks cannot
> acquire a token (no kerberos credentials), so jobs will fail.
[jira] [Commented] (HDFS-6245) datanode fails to start with a bad disk even when failed volumes is set
[ https://issues.apache.org/jira/browse/HDFS-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968983#comment-13968983 ]

Arpit Gupta commented on HDFS-6245:
-----------------------------------

Here is the stack trace:

{code}
2014-04-14 22:17:23,688 INFO datanode.DataNode (SignalLogger.java:register(91)) - registered UNIX signal handlers for [TERM, HUP, INT]
2014-04-14 22:17:23,750 WARN common.Util (Util.java:stringAsURI(56)) - Path /grid/0/hdp/hdfs/data should be specified as a URI in configuration files. Please update hdfs configuration.
2014-04-14 22:17:23,751 WARN common.Util (Util.java:stringAsURI(56)) - Path /grid/1/hdp/hdfs/data should be specified as a URI in configuration files. Please update hdfs configuration.
2014-04-14 22:17:23,751 WARN common.Util (Util.java:stringAsURI(56)) - Path /grid/2/hdp/hdfs/data should be specified as a URI in configuration files. Please update hdfs configuration.
2014-04-14 22:17:23,751 WARN common.Util (Util.java:stringAsURI(56)) - Path /grid/3/hdp/hdfs/data should be specified as a URI in configuration files. Please update hdfs configuration.
2014-04-14 22:17:23,751 WARN common.Util (Util.java:stringAsURI(56)) - Path /grid/4/hdp/hdfs/data should be specified as a URI in configuration files. Please update hdfs configuration.
2014-04-14 22:17:23,752 WARN common.Util (Util.java:stringAsURI(56)) - Path /grid/5/hdp/hdfs/data should be specified as a URI in configuration files. Please update hdfs configuration.
2014-04-14 22:17:23,769 FATAL datanode.DataNode (DataNode.java:secureMain(1995)) - Exception in secureMain
java.lang.IllegalArgumentException: Failed to parse conf property dfs.datanode.data.dir: /grid/5/hdp/hdfs/data
	at org.apache.hadoop.hdfs.server.datanode.DataNode.getStorageLocations(DataNode.java:1786)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1768)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1812)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1988)
	at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:78)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
Caused by: java.io.IOException: Input/output error
	at java.io.UnixFileSystem.canonicalize0(Native Method)
	at java.io.UnixFileSystem.canonicalize(UnixFileSystem.java:157)
	at java.io.File.getCanonicalPath(File.java:559)
	at java.io.File.getCanonicalFile(File.java:583)
	at org.apache.hadoop.hdfs.server.common.Util.fileAsURI(Util.java:73)
	at org.apache.hadoop.hdfs.server.common.Util.stringAsURI(Util.java:58)
	at org.apache.hadoop.hdfs.server.datanode.StorageLocation.parse(StorageLocation.java:94)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.getStorageLocations(DataNode.java:1784)
	... 9 more
2014-04-14 22:17:23,772 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2014-04-14 22:17:23,774 INFO datanode.DataNode (StringUtils.java:run(640)) - SHUTDOWN_MSG:
{code}

> datanode fails to start with a bad disk even when failed volumes is set
> -----------------------------------------------------------------------
>
>                 Key: HDFS-6245
>                 URL: https://issues.apache.org/jira/browse/HDFS-6245
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Arpit Agarwal
>
> DataNode startup failed even though failed volumes was set. Had to remove the
> bad disk from the config to get it to boot.
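The stack trace shows the parse/canonicalize error escaping before any volume-failure tolerance (the dfs.datanode.failed.volumes.tolerated setting) can be consulted. The desired behavior can be sketched as follows; the class and method names are illustrative, not the real DataNode code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the desired startup behavior: collect per-volume
// failures instead of aborting on the first bad disk, and only fail startup
// when the failure count exceeds the tolerated threshold.
public class VolumeStartupSketch {
    static List<String> usableVolumes(List<String> dataDirs, int toleratedFailures)
            throws IOException {
        List<String> usable = new ArrayList<>();
        int failed = 0;
        for (String dir : dataDirs) {
            try {
                usable.add(canonicalize(dir));   // may throw on a bad disk
            } catch (IOException e) {
                failed++;                        // record, don't abort yet
            }
        }
        if (failed > toleratedFailures) {
            throw new IOException(failed + " volumes failed, tolerated: " + toleratedFailures);
        }
        return usable;
    }

    // Stand-in for File.getCanonicalFile(), which threw "Input/output error"
    // in the trace above; here any dir containing "bad" simulates a bad disk.
    static String canonicalize(String dir) throws IOException {
        if (dir.contains("bad")) {
            throw new IOException("Input/output error: " + dir);
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(usableVolumes(List.of("/grid/0/data", "/grid/bad/data", "/grid/2/data"), 1));
    }
}
```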
[jira] [Created] (HDFS-6245) datanode fails to start with a bad disk even when failed volumes is set
Arpit Gupta created HDFS-6245:
------------------------------

             Summary: datanode fails to start with a bad disk even when failed volumes is set
                 Key: HDFS-6245
                 URL: https://issues.apache.org/jira/browse/HDFS-6245
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.4.0
            Reporter: Arpit Gupta
            Assignee: Arpit Agarwal

DataNode startup failed even though failed volumes was set. Had to remove the bad disk from the config to get it to boot.
[jira] [Updated] (HDFS-6207) ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken()
[ https://issues.apache.org/jira/browse/HDFS-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Gupta updated HDFS-6207:
------------------------------

    Assignee: Jing Zhao

> ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken()
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-6207
>                 URL: https://issues.apache.org/jira/browse/HDFS-6207
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Jing Zhao
>
> While running a hive job on an HA cluster, saw ConcurrentModificationException
> in AbstractDelegationTokenSelector.selectToken()
[jira] [Commented] (HDFS-6207) ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken()
[ https://issues.apache.org/jira/browse/HDFS-6207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963550#comment-13963550 ]

Arpit Gupta commented on HDFS-6207:
-----------------------------------

{code}
Caused by: java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextEntry(HashMap.java:894)
	at java.util.HashMap$ValueIterator.next(HashMap.java:922)
	at java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1067)
	at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSelector.selectToken(AbstractDelegationTokenSelector.java:53)
	at org.apache.hadoop.hdfs.HAUtil.cloneDelegationTokenForLogicalUri(HAUtil.java:260)
	at org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.
{code}

> ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken()
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-6207
>                 URL: https://issues.apache.org/jira/browse/HDFS-6207
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>
> While running a hive job on an HA cluster, saw ConcurrentModificationException
> in AbstractDelegationTokenSelector.selectToken()
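The failure mode in the stack trace can be reproduced in isolation. The frames show a `HashMap` value iterator wrapped in `Collections.unmodifiableCollection` — but an unmodifiable *view* only blocks writes through the view; it does not protect the iteration against another code path mutating the backing map, which trips HashMap's fail-fast iterator:

```java
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Minimal reproduction: iterating an unmodifiable view of a HashMap's values
// while the backing map is structurally modified throws
// ConcurrentModificationException, exactly as in the selectToken() trace.
public class CmeDemo {
    public static boolean triggersCme() {
        Map<String, String> tokens = new HashMap<>();
        tokens.put("token-1", "a");
        tokens.put("token-2", "b");
        try {
            for (String t : Collections.unmodifiableCollection(tokens.values())) {
                tokens.put("token-3", "c");   // mutation of the backing map mid-iteration
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(triggersCme());
    }
}
```

The usual fixes are to iterate over a defensive copy of the collection or to synchronize the readers and writers on the same lock.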
[jira] [Created] (HDFS-6207) ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken()
Arpit Gupta created HDFS-6207:
------------------------------

             Summary: ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken()
                 Key: HDFS-6207
                 URL: https://issues.apache.org/jira/browse/HDFS-6207
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ha
    Affects Versions: 2.4.0
            Reporter: Arpit Gupta

While running a hive job on an HA cluster, saw ConcurrentModificationException in AbstractDelegationTokenSelector.selectToken()
[jira] [Commented] (HDFS-6127) sLive with webhdfs fails on secure HA cluster with "does not contain a valid host:port authority" error
[ https://issues.apache.org/jira/browse/HDFS-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940837#comment-13940837 ]

Arpit Gupta commented on HDFS-6127:
-----------------------------------

Here is the console output:

{code}
/usr/lib/hadoop/bin/hadoop org.apache.hadoop.fs.slive.SliveTest -rename 14,uniform -packetSize 65536 -baseDir webhdfs://ha-2-secure/user/user/ha-slive -seed 12345678 -sleep 100,1000 -duration 600 -append 14,uniform -blockSize 16777216,33554432 -create 16,uniform -mkdir 14,uniform -maps 12 -ls 14,uniform -writeSize 1,134217728 -files 1024 -ops 1 -read 14,uniform -replication 1,3 -appendSize 1,134217728 -reduces 6 -resFile /grid/0/tmp/hwqe/artifacts/ha-slive-2-namenode2-1395127484.out -readSize 1,4294967295 -dirSize 16 -delete 14,uniform
INFO|Initial wait for Service namenode2: 60
14/03/18 07:24:44 INFO slive.SliveTest: Running with option list -rename 14,uniform -packetSize 65536 -baseDir webhdfs://ha-2-secure/user/user/ha-slive -seed 12345678 -sleep 100,1000 -duration 600 -append 14,uniform -blockSize 16777216,33554432 -create 16,uniform -mkdir 14,uniform -maps 12 -ls 14,uniform -writeSize 1,134217728 -files 1024 -ops 1 -read 14,uniform -replication 1,3 -appendSize 1,134217728 -reduces 6 -resFile /grid/0/tmp/hwqe/artifacts/ha-slive-2-namenode2-1395127484.out -readSize 1,4294967295 -dirSize 16 -delete 14,uniform
14/03/18 07:24:44 INFO slive.SliveTest: Options are:
14/03/18 07:24:44 INFO slive.ConfigExtractor: Base directory = webhdfs://ha-2-secure/user/user/ha-slive/slive
14/03/18 07:24:44 INFO slive.ConfigExtractor: Data directory = webhdfs://ha-2-secure/user/user/ha-slive/slive/data
14/03/18 07:24:44 INFO slive.ConfigExtractor: Output directory = webhdfs://ha-2-secure/user/user/ha-slive/slive/output
14/03/18 07:24:44 INFO slive.ConfigExtractor: Result file = /grid/0/tmp/hwqe/artifacts/ha-slive-2-namenode2-1395127484.out
14/03/18 07:24:44 INFO slive.ConfigExtractor: Grid queue = default
14/03/18 07:24:44 INFO slive.ConfigExtractor: Should exit on first error = false
14/03/18 07:24:44 INFO slive.ConfigExtractor: Duration = 60 milliseconds
14/03/18 07:24:44 INFO slive.ConfigExtractor: Map amount = 12
14/03/18 07:24:44 INFO slive.ConfigExtractor: Reducer amount = 6
14/03/18 07:24:44 INFO slive.ConfigExtractor: Operation amount = 1
14/03/18 07:24:44 INFO slive.ConfigExtractor: Total file limit = 1024
14/03/18 07:24:44 INFO slive.ConfigExtractor: Total dir file limit = 16
14/03/18 07:24:44 INFO slive.ConfigExtractor: Read size = 1,4294967295 bytes
14/03/18 07:24:44 INFO slive.ConfigExtractor: Write size = 1,134217728 bytes
14/03/18 07:24:44 INFO slive.ConfigExtractor: Append size = 1,134217728 bytes
14/03/18 07:24:44 INFO slive.ConfigExtractor: Block size = 16777216,33554432 bytes
14/03/18 07:24:44 INFO slive.ConfigExtractor: Random seed = 12345678
14/03/18 07:24:44 INFO slive.ConfigExtractor: Sleep range = 100,1000 milliseconds
14/03/18 07:24:44 INFO slive.ConfigExtractor: Replication amount = 1,3
14/03/18 07:24:44 INFO slive.ConfigExtractor: Operations are:
14/03/18 07:24:44 INFO slive.ConfigExtractor: READ
14/03/18 07:24:44 INFO slive.ConfigExtractor: UNIFORM
14/03/18 07:24:44 INFO slive.ConfigExtractor: 14%
14/03/18 07:24:44 INFO slive.ConfigExtractor: APPEND
14/03/18 07:24:44 INFO slive.ConfigExtractor: UNIFORM
14/03/18 07:24:44 INFO slive.ConfigExtractor: 14%
14/03/18 07:24:44 INFO slive.ConfigExtractor: MKDIR
14/03/18 07:24:44 INFO slive.ConfigExtractor: UNIFORM
14/03/18 07:24:44 INFO slive.ConfigExtractor: 14%
14/03/18 07:24:44 INFO slive.ConfigExtractor: LS
14/03/18 07:24:44 INFO slive.ConfigExtractor: UNIFORM
14/03/18 07:24:44 INFO slive.ConfigExtractor: 14%
14/03/18 07:24:44 INFO slive.ConfigExtractor: DELETE
14/03/18 07:24:44 INFO slive.ConfigExtractor: UNIFORM
14/03/18 07:24:44 INFO slive.ConfigExtractor: 14%
14/03/18 07:24:44 INFO slive.ConfigExtractor: RENAME
14/03/18 07:24:44 INFO slive.ConfigExtractor: UNIFORM
14/03/18 07:24:44 INFO slive.ConfigExtractor: 14%
14/03/18 07:24:44 INFO slive.ConfigExtractor: CREATE
14/03/18 07:24:44 INFO slive.ConfigExtractor: UNIFORM
14/03/18 07:24:44 INFO slive.ConfigExtractor: 16%
14/03/18 07:24:44 INFO slive.SliveTest: Running job:
14/03/18 07:24:44 WARN hdfs.DFSClient: dfs.client.test.drop.namenode.response.number is set to 1, this hacked client will proactively drop responses
14/03/18 07:24:45 WARN hdfs.DFSClient: dfs.client.test.drop.namenode.response.number is set to 1, this hacked client will proactively drop responses
14/03/18 07:24:45 WARN hdfs.DFSClient: dfs.client.test.drop.namenode.response.number is set to 1, this hacked client will proactively drop responses
14/03/18 07:24:48 WARN token.Token: Cannot find class for token kind WEBHDFS delegation
14/03/18 07:24:48 INFO security.TokenCache: Got dt for webhdfs://ha-2-secure; Kind: WEBHDFS delegation, Service: ha-hdfs:ha-2-secure, Ident: 00 06 68 72 74 5f 71 61
{code}
[jira] [Created] (HDFS-6127) sLive with webhdfs fails on secure HA cluster with "does not contain a valid host:port authority" error
Arpit Gupta created HDFS-6127:
------------------------------

             Summary: sLive with webhdfs fails on secure HA cluster with "does not contain a valid host:port authority" error
                 Key: HDFS-6127
                 URL: https://issues.apache.org/jira/browse/HDFS-6127
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ha
    Affects Versions: 2.4.0
            Reporter: Arpit Gupta
            Assignee: Haohui Mai
[jira] [Created] (HDFS-6100) webhdfs filesystem does not failover in HA mode
Arpit Gupta created HDFS-6100:
------------------------------

             Summary: webhdfs filesystem does not failover in HA mode
                 Key: HDFS-6100
                 URL: https://issues.apache.org/jira/browse/HDFS-6100
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ha
    Affects Versions: 2.4.0
            Reporter: Arpit Gupta
            Assignee: Haohui Mai

While running SLive with a webhdfs file system, reducers fail as they keep trying to write to the standby namenode.
[jira] [Updated] (HDFS-6089) Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
[ https://issues.apache.org/jira/browse/HDFS-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Gupta updated HDFS-6089:
------------------------------

    Description:
The following scenario was tested:
* Determine the active NN and suspend the process (kill -19)
* Wait about 60s to let the standby transition to active
* Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active.

What was noticed is that sometimes the call to get the service state of nn2 got a socket timeout exception.

  was:
The following scenario was tested:
* Determine Active NN and suspend the process (kill -19)
* Wait about 60s to let the standby transition to active
* Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active.

What was noticed that some times the call to get the service state of nn2 got a socket time out connection.

> Standby NN while transitioning to active throws a connection refused error
> when the prior active NN process is suspended
> --------------------------------------------------------------------------
>
>                 Key: HDFS-6089
>                 URL: https://issues.apache.org/jira/browse/HDFS-6089
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Jing Zhao
>
> The following scenario was tested:
> * Determine the active NN and suspend the process (kill -19)
> * Wait about 60s to let the standby transition to active
> * Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active.
>
> What was noticed is that sometimes the call to get the service state of nn2 got
> a socket timeout exception.
[jira] [Commented] (HDFS-6089) Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
[ https://issues.apache.org/jira/browse/HDFS-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930954#comment-13930954 ]

Arpit Gupta commented on HDFS-6089:
-----------------------------------

Here is the console log

{code}
sudo su - -c "/usr/bin/hdfs haadmin -getServiceState nn1" hdfs
active
exit code = 0
sudo su - -c "/usr/bin/hdfs haadmin -getServiceState nn2" hdfs
standby
exit code = 0
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null hostname "sudo su - -c \"cat /grid/0/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid | xargs kill -19\" hdfs"
sudo su - -c "/usr/bin/hdfs haadmin -getServiceState nn1" hdfs
Operation failed: Call From host1/ip to host1:8020 failed on socket timeout exception: java.net.SocketTimeoutException: 2 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=host1/ip:35192 remote=host1/ip:8020]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
exit code = 255
sudo su - -c "/usr/bin/hdfs haadmin -getServiceState nn2" hdfs
Operation failed: Call From host2/ip to host2:8020 failed on socket timeout exception: java.net.SocketTimeoutException: 2 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=host2/ip:37640 remote=host2/68.142.247.217:8020]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout
exit code = 255
{code}

> Standby NN while transitioning to active throws a connection refused error
> when the prior active NN process is suspended
> --------------------------------------------------------------------------
>
>                 Key: HDFS-6089
>                 URL: https://issues.apache.org/jira/browse/HDFS-6089
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Jing Zhao
>
> The following scenario was tested:
> * Determine Active NN and suspend the process (kill -19)
> * Wait about 60s to let the standby transition to active
> * Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active.
>
> What was noticed that some times the call to get the service state of nn2 got
> a socket time out connection.
[jira] [Created] (HDFS-6089) Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
Arpit Gupta created HDFS-6089:
------------------------------

             Summary: Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
                 Key: HDFS-6089
                 URL: https://issues.apache.org/jira/browse/HDFS-6089
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ha
    Affects Versions: 2.4.0
            Reporter: Arpit Gupta
            Assignee: Jing Zhao

The following scenario was tested:
* Determine the active NN and suspend the process (kill -19)
* Wait about 60s to let the standby transition to active
* Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active.

What was noticed is that sometimes the call to get the service state of nn2 got a socket timeout exception.
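The timeouts in this scenario follow from how kill -19 (SIGSTOP) behaves at the TCP level: the kernel still completes the handshake on behalf of a stopped process, so the client connects successfully and then hangs waiting for a reply until its read timeout fires — a socket timeout, not "connection refused". This can be demonstrated self-contained (the listener never serves the connection, standing in for a suspended NN):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Demonstration: connecting to a listening-but-unresponsive peer succeeds at
// the TCP level, and the failure only surfaces later as a read timeout.
public class SuspendedPeerDemo {
    public static boolean timesOut() throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {       // listens, never replies
            try (Socket client = new Socket("localhost", server.getLocalPort())) {
                client.setSoTimeout(200);                       // short read timeout (ms)
                try {
                    client.getInputStream().read();             // blocks: peer is "suspended"
                } catch (SocketTimeoutException e) {
                    return true;                                // timeout, not refusal
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(timesOut());
    }
}
```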
[jira] [Commented] (HDFS-6077) running slive with webhdfs on secure HA cluster fails with unknown host exception
[ https://issues.apache.org/jira/browse/HDFS-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924433#comment-13924433 ]

Arpit Gupta commented on HDFS-6077:
-----------------------------------

{code}
RUNNING: /usr/lib/hadoop/bin/hadoop org.apache.hadoop.fs.slive.SliveTest -rename 14,uniform -packetSize 65536 -baseDir webhdfs://ha-2-secure:50070/user/hrt_qa/ha-slive -seed 12345678 -sleep 100,1000 -duration 600 -append 14,uniform -blockSize 16777216,33554432 -create 16,uniform -mkdir 14,uniform -maps 15 -ls 14,uniform -writeSize 1,134217728 -files 1024 -ops 1 -read 14,uniform -replication 1,3 -appendSize 1,134217728 -reduces 8 -resFile /grid/0/tmp/hwqe/artifacts/ha-slive-6-namenode2-1394091404.out -readSize 1,4294967295 -dirSize 16 -delete 14,uniform
INFO|Initial wait for Service namenode2: 60
14/03/06 07:36:44 INFO slive.SliveTest: Running with option list -rename 14,uniform -packetSize 65536 -baseDir webhdfs://ha-2-secure:50070/user/hrt_qa/ha-slive -seed 12345678 -sleep 100,1000 -duration 600 -append 14,uniform -blockSize 16777216,33554432 -create 16,uniform -mkdir 14,uniform -maps 15 -ls 14,uniform -writeSize 1,134217728 -files 1024 -ops 1 -read 14,uniform -replication 1,3 -appendSize 1,134217728 -reduces 8 -resFile /grid/0/tmp/hwqe/artifacts/ha-slive-6-namenode2-1394091404.out -readSize 1,4294967295 -dirSize 16 -delete 14,uniform
14/03/06 07:36:44 INFO slive.SliveTest: Options are:
14/03/06 07:36:44 INFO slive.ConfigExtractor: Base directory = webhdfs://ha-2-secure:50070/user/hrt_qa/ha-slive/slive
14/03/06 07:36:44 INFO slive.ConfigExtractor: Data directory = webhdfs://ha-2-secure:50070/user/hrt_qa/ha-slive/slive/data
14/03/06 07:36:44 INFO slive.ConfigExtractor: Output directory = webhdfs://ha-2-secure:50070/user/hrt_qa/ha-slive/slive/output
14/03/06 07:36:44 INFO slive.ConfigExtractor: Result file = /grid/0/tmp/hwqe/artifacts/ha-slive-6-namenode2-1394091404.out
14/03/06 07:36:44 INFO slive.ConfigExtractor: Grid queue = default
14/03/06 07:36:44 INFO slive.ConfigExtractor: Should exit on first error = false
14/03/06 07:36:44 INFO slive.ConfigExtractor: Duration = 60 milliseconds
14/03/06 07:36:44 INFO slive.ConfigExtractor: Map amount = 15
14/03/06 07:36:44 INFO slive.ConfigExtractor: Reducer amount = 8
14/03/06 07:36:44 INFO slive.ConfigExtractor: Operation amount = 1
14/03/06 07:36:44 INFO slive.ConfigExtractor: Total file limit = 1024
14/03/06 07:36:44 INFO slive.ConfigExtractor: Total dir file limit = 16
14/03/06 07:36:44 INFO slive.ConfigExtractor: Read size = 1,4294967295 bytes
14/03/06 07:36:44 INFO slive.ConfigExtractor: Write size = 1,134217728 bytes
14/03/06 07:36:44 INFO slive.ConfigExtractor: Append size = 1,134217728 bytes
14/03/06 07:36:44 INFO slive.ConfigExtractor: Block size = 16777216,33554432 bytes
14/03/06 07:36:44 INFO slive.ConfigExtractor: Random seed = 12345678
14/03/06 07:36:44 INFO slive.ConfigExtractor: Sleep range = 100,1000 milliseconds
14/03/06 07:36:44 INFO slive.ConfigExtractor: Replication amount = 1,3
14/03/06 07:36:44 INFO slive.ConfigExtractor: Operations are:
14/03/06 07:36:44 INFO slive.ConfigExtractor: LS
14/03/06 07:36:44 INFO slive.ConfigExtractor: UNIFORM
14/03/06 07:36:44 INFO slive.ConfigExtractor: 14%
14/03/06 07:36:44 INFO slive.ConfigExtractor: READ
14/03/06 07:36:44 INFO slive.ConfigExtractor: UNIFORM
14/03/06 07:36:44 INFO slive.ConfigExtractor: 14%
14/03/06 07:36:44 INFO slive.ConfigExtractor: APPEND
14/03/06 07:36:44 INFO slive.ConfigExtractor: UNIFORM
14/03/06 07:36:44 INFO slive.ConfigExtractor: 14%
14/03/06 07:36:44 INFO slive.ConfigExtractor: CREATE
14/03/06 07:36:44 INFO slive.ConfigExtractor: UNIFORM
14/03/06 07:36:44 INFO slive.ConfigExtractor: 16%
14/03/06 07:36:44 INFO slive.ConfigExtractor: RENAME
14/03/06 07:36:44 INFO slive.ConfigExtractor: UNIFORM
14/03/06 07:36:44 INFO slive.ConfigExtractor: 14%
14/03/06 07:36:44 INFO slive.ConfigExtractor: DELETE
14/03/06 07:36:44 INFO slive.ConfigExtractor: UNIFORM
14/03/06 07:36:44 INFO slive.ConfigExtractor: 14%
14/03/06 07:36:44 INFO slive.ConfigExtractor: MKDIR
14/03/06 07:36:44 INFO slive.ConfigExtractor: UNIFORM
14/03/06 07:36:44 INFO slive.ConfigExtractor: 14%
14/03/06 07:36:44 INFO slive.SliveTest: Running job:
14/03/06 07:36:44 WARN hdfs.DFSClient: dfs.client.test.drop.namenode.response.number is set to 1, this hacked client will proactively drop responses
14/03/06 07:36:45 WARN hdfs.DFSClient: dfs.client.test.drop.namenode.response.number is set to 1, this hacked client will proactively drop responses
14/03/06 07:36:45 WARN hdfs.DFSClient: dfs.client.test.drop.namenode.response.number is set to 1, this hacked client will proactively drop responses
14/03/06 07:36:45 ERROR slive.SliveTest: Unable to run job due to error: java.lang.IllegalArgumentException: java.net.UnknownHostException: ha-2-secure
	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.ja
{code}
[jira] [Created] (HDFS-6077) running slive with webhdfs on secure HA cluster fails with unknown host exception
Arpit Gupta created HDFS-6077:
------------------------------

             Summary: running slive with webhdfs on secure HA cluster fails with unknown host exception
                 Key: HDFS-6077
                 URL: https://issues.apache.org/jira/browse/HDFS-6077
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.3.0
            Reporter: Arpit Gupta
            Assignee: Jing Zhao
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13891074#comment-13891074 ] Arpit Gupta commented on HDFS-5399: --- [~atm] bq. Am I correct in assuming that the test you were running did not manually cause the NN to enter or leave safemode? Yes that is correct. > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
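The "always wrap the SafeModeException in the RetriableException" proposal at the end of the description can be illustrated with a small standalone sketch. The classes below are simplified stand-ins for the Hadoop ones, not the actual HDFS implementation: automatic safe mode is wrapped so retry policies recognize it, while manual (admin-initiated) safe mode surfaces directly.

```java
// Simplified stand-in for the Hadoop exception (hypothetical fields).
class SafeModeException extends Exception {
    final boolean manual; // true if safe mode was started by an admin via CLI
    SafeModeException(String msg, boolean manual) {
        super(msg);
        this.manual = manual;
    }
}

// Simplified stand-in for org.apache.hadoop.ipc.RetriableException.
class RetriableException extends Exception {
    RetriableException(Throwable cause) {
        super(cause);
    }
}

public class SafeModeRetryDemo {
    // The proposed server-side mapping: wrap only when retrying can help.
    public static Exception toClientException(SafeModeException e) {
        return e.manual ? e : new RetriableException(e);
    }

    public static void main(String[] args) {
        Exception auto = toClientException(
            new SafeModeException("NameNode is in startup safe mode", false));
        Exception manual = toClientException(
            new SafeModeException("NameNode is in manual safe mode", true));
        System.out.println(auto.getClass().getSimpleName());   // RetriableException
        System.out.println(manual.getClass().getSimpleName()); // SafeModeException
    }
}
```

With this mapping, a retry policy would only need to recognize RetriableException, giving the single generic strategy the description asks for in both HA and non-HA setups.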
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887320#comment-13887320 ] Arpit Gupta commented on HDFS-5399: --- bq. Can you comment on how frequently/quickly the active NN is killed and restarted in this test? The tests were killing the active namenode every 5 minutes. bq. I'm guessing you meant "NOT a flaw in the test" here? Or do I misunderstand your point? Yes, you are correct, I meant "not" :). bq. I'm specifically curious about whether or not the standby NN was given enough time to get out of startup safemode before a failover to it was attempted. I wanted to make sure I understand this scenario. To me this would happen if the current standby namenode (nn2) was active before and was recently (a few seconds ago) killed and restarted, causing it to be in safemode, and then the active (nn1) was killed at the same time, causing the client to go to nn2 while it is still in safemode. Did I understand it right? I don't believe we hit this scenario, as we restarted the active NN every 5 minutes. However, I can see the need for client retries to make sure that even in the above scenario the DFSClient is able to retry and wait for the NN to come out of safemode. > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. 
Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5532) Enable the webhdfs by default to support new HDFS web UI
[ https://issues.apache.org/jira/browse/HDFS-5532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886842#comment-13886842 ] Arpit Gupta commented on HDFS-5532: --- A property already exists for this: dfs.web.authentication.kerberos.principal. And there is another property that defines where your keytab is. I am not sure we should set defaults for these, as we don't do so for any other principal and keytab properties. We should probably update any documentation we have regarding secure setup. > Enable the webhdfs by default to support new HDFS web UI > > > Key: HDFS-5532 > URL: https://issues.apache.org/jira/browse/HDFS-5532 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 3.0.0 >Reporter: Vinay >Assignee: Vinay > Fix For: 2.3.0 > > Attachments: HDFS-5532.patch, HDFS-5532.patch > > > Recently in HDFS-5444, the new HDFS web UI was made the default, > but this needs webhdfs to be enabled. > WebHDFS is disabled by default. Let's enable it by default to support the new > really cool web UI. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886208#comment-13886208 ] Arpit Gupta commented on HDFS-5399: --- bq. The test you observed this issue in didn't run long enough for the standby NN to leave startup safemode on its own before the failover was attempted. The NN will delay processing block reports for block IDs it doesn't recognize (because they're created in edits that the NN hasn't read yet) and then only on transition to active do we fully catch up by reading all the edits, and then re-process the delayed block reports, triggering the NN to leave startup safemode. It's not the test that directly fails. We see exceptions in the RM when it's trying to talk to HDFS, or in the RS when it's trying to talk to HDFS, which causes the actual MR job etc. to fail. So it's not something that the test can control. For example, we are running an MR job and periodically killing the active NN, and the job eventually fails because the tasks that want to talk to HDFS fail, or the RM runs into this exception, causing the application to fail. Hence I would argue that it's a flaw in the test :). > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. 
Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5399) Revisit SafeModeException and corresponding retry policies
[ https://issues.apache.org/jira/browse/HDFS-5399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885916#comment-13885916 ] Arpit Gupta commented on HDFS-5399: --- We had run into this issue while testing HA. You can see in HDFS-5291 that the standby NN after transitioning to active went into safemode. We saw issues where Resource Manager and Region Servers would crash/complain because of this. We ran into this frequently before HDFS-5291 was fixed. > Revisit SafeModeException and corresponding retry policies > -- > > Key: HDFS-5399 > URL: https://issues.apache.org/jira/browse/HDFS-5399 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao > > Currently for NN SafeMode, we have the following corresponding retry policies: > # In non-HA setup, for certain API call ("create"), the client will retry if > the NN is in SafeMode. Specifically, the client side's RPC adopts > MultipleLinearRandomRetry policy for a wrapped SafeModeException when retry > is enabled. > # In HA setup, the client will retry if the NN is Active and in SafeMode. > Specifically, the SafeModeException is wrapped as a RetriableException in the > server side. Client side's RPC uses FailoverOnNetworkExceptionRetry policy > which recognizes RetriableException (see HDFS-5291). > There are several possible issues in the current implementation: > # The NN SafeMode can be a "Manual" SafeMode (i.e., started by administrator > through CLI), and the clients may not want to retry on this type of SafeMode. > # Client may want to retry on other API calls in non-HA setup. > # We should have a single generic strategy to address the mapping between > SafeMode and retry policy for both HA and non-HA setup. A possible > straightforward solution is to always wrap the SafeModeException in the > RetriableException to indicate that the clients should retry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5653) Log namenode hostname in various exceptions being thrown in a HA setup
[ https://issues.apache.org/jira/browse/HDFS-5653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-5653: -- Priority: Minor (was: Major) > Log namenode hostname in various exceptions being thrown in a HA setup > -- > > Key: HDFS-5653 > URL: https://issues.apache.org/jira/browse/HDFS-5653 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ha >Affects Versions: 2.2.0 >Reporter: Arpit Gupta >Priority: Minor > > In an HA setup, any time we see an exception such as safemode or namenode in > standby etc., we don't know which namenode it came from. The user has to go to > the logs of the namenodes and determine which one was active and/or standby > around the same time. > I think it would help with debugging if any such exceptions could include the > namenode hostname so the user could know exactly which namenode served the > request. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HDFS-5653) Log namenode hostname in various exceptions being thrown in a HA setup
Arpit Gupta created HDFS-5653: - Summary: Log namenode hostname in various exceptions being thrown in a HA setup Key: HDFS-5653 URL: https://issues.apache.org/jira/browse/HDFS-5653 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 2.2.0 Reporter: Arpit Gupta In an HA setup, any time we see an exception such as safemode or namenode in standby etc., we don't know which namenode it came from. The user has to go to the logs of the namenodes and determine which one was active and/or standby around the same time. I think it would help with debugging if any such exceptions could include the namenode hostname so the user could know exactly which namenode served the request. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
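The improvement requested above can be sketched minimally: prefix the client-visible exception message with the hostname of the namenode that served the request. The class below is hypothetical (HDFS would presumably do this in its RPC/exception plumbing rather than in a dedicated exception type):

```java
// Hypothetical helper: tag an exception message with the serving NN's
// hostname so an HA client can tell which namenode (active or standby)
// raised it, without digging through both namenodes' logs.
public class NamenodeTaggedException extends Exception {
    private final String namenodeHost;

    public NamenodeTaggedException(String namenodeHost, String message) {
        super("NameNode " + namenodeHost + ": " + message);
        this.namenodeHost = namenodeHost;
    }

    public String getNamenodeHost() {
        return namenodeHost;
    }

    public static void main(String[] args) {
        NamenodeTaggedException e = new NamenodeTaggedException(
            "nn2.example.com",
            "Operation category WRITE is not supported in state standby");
        // The hostname now appears directly in the client-visible message.
        System.out.println(e.getMessage());
    }
}
```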
[jira] [Commented] (HDFS-5382) Implement the UI of browsing filesystems in HTML 5 page
[ https://issues.apache.org/jira/browse/HDFS-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798221#comment-13798221 ] Arpit Gupta commented on HDFS-5382: --- Will this also handle when webhdfs is not configured and when security is on? > Implement the UI of browsing filesystems in HTML 5 page > --- > > Key: HDFS-5382 > URL: https://issues.apache.org/jira/browse/HDFS-5382 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-5382.000.patch > > > The UI of browsing filesystems can be implemented as an HTML 5 application. > The UI can pull the data from WebHDFS. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5322) HDFS delegation token not found in cache errors seen on secure HA clusters
[ https://issues.apache.org/jira/browse/HDFS-5322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793061#comment-13793061 ] Arpit Gupta commented on HDFS-5322: --- A bunch of secure HA tests were run last night with this change, and we did not see test failures because of this. +1 > HDFS delegation token not found in cache errors seen on secure HA clusters > -- > > Key: HDFS-5322 > URL: https://issues.apache.org/jira/browse/HDFS-5322 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.1.1-beta >Reporter: Arpit Gupta >Assignee: Jing Zhao > Attachments: HDFS-5322.000.patch, HDFS-5322.000.patch, > HDFS-5322.001.patch, HDFS-5322.002.patch, HDFS-5322.003.patch, > HDFS-5322.004.patch, HDFS-5322.005.patch, HDFS-5322.006.patch > > > While running HA tests we have seen issues where we see HDFS delegation token > not found in cache errors causing running jobs to fail. > {code} > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157) > |2013-10-06 20:14:51,193 INFO [main] mapreduce.Job: Task Id : > attempt_1381090351344_0001_m_07_0, Status : FAILED > Error: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (HDFS_DELEGATION_TOKEN token 11 for hrt_qa) can't be found in cache > at org.apache.hadoop.ipc.Client.call(Client.java:1347) > at org.apache.hadoop.ipc.Client.call(Client.java:1300) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) > at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5335) Hive query failed with possible race in dfs output stream
Arpit Gupta created HDFS-5335: - Summary: Hive query failed with possible race in dfs output stream Key: HDFS-5335 URL: https://issues.apache.org/jira/browse/HDFS-5335 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Arpit Gupta Assignee: Haohui Mai Here is the stack trace from the client {code} java.nio.channels.ClosedChannelException at org.apache.hadoop.hdfs.DFSOutputStream.checkClosed(DFSOutputStream.java:1317) at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:1810) at org.apache.hadoop.hdfs.DFSOutputStream.flushInternal(DFSOutputStream.java:1789) at org.apache.hadoop.hdfs.DFSOutputStream.close(DFSOutputStream.java:1877) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:54) at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289) at org.apache.hadoop.mapreduce.JobSubmitter.copyRemoteFiles(JobSubmitter.java:139) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:212) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:300) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:387) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) 
at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:737) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Job Submission failed with exception 'java.nio.channels.ClosedChannelException(null)' FAILED: Execution Error, return code 
1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (HDFS-5322) HDFS delegation token not found in cache errors seen on secure HA clusters
Arpit Gupta created HDFS-5322: - Summary: HDFS delegation token not found in cache errors seen on secure HA clusters Key: HDFS-5322 URL: https://issues.apache.org/jira/browse/HDFS-5322 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.1.1-beta Reporter: Arpit Gupta Assignee: Jing Zhao While running HA tests we have seen issues where we see HDFS delegation token not found in cache errors causing running jobs to fail. {code} at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157) |2013-10-06 20:14:51,193 INFO [main] mapreduce.Job: Task Id : attempt_1381090351344_0001_m_07_0, Status : FAILED Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 11 for hrt_qa) can't be found in cache at org.apache.hadoop.ipc.Client.call(Client.java:1347) at org.apache.hadoop.ipc.Client.call(Client.java:1300) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source) {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HDFS-5291) Standby namenode after transition to active goes into safemode
[ https://issues.apache.org/jira/browse/HDFS-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13784647#comment-13784647 ] Arpit Gupta commented on HDFS-5291: --- This is seen in our nightlies where we see other services being impacted by namenode being in safemode. In our tests we are killing the active namenode every 5 minutes, and sometimes we see that after the transition from standby to active the namenode goes into safemode. > Standby namenode after transition to active goes into safemode > -- > > Key: HDFS-5291 > URL: https://issues.apache.org/jira/browse/HDFS-5291 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.1.1-beta >Reporter: Arpit Gupta >Assignee: Jing Zhao >Priority: Critical > Attachments: nn.log > > > Some log snippets > standby state to active transition > {code} > 2013-10-02 00:13:49,482 INFO ipc.Server (Server.java:run(2068)) - IPC Server > handler 69 on 8020, call > org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from IP:33911 > Call#1483 Retry#1: error: org.apache.hadoop.ipc.StandbyException: Operation > category WRITE is not supported in state standby > 2013-10-02 00:13:49,689 INFO ipc.Server (Server.java:saslProcess(1342)) - > Auth successful for nn/hostn...@example.com (auth:SIMPLE) > 2013-10-02 00:13:49,696 INFO authorize.ServiceAuthorizationManager > (ServiceAuthorizationManager.java:authorize(111)) - Authorization successful > for nn/hostn...@example.com (auth:KERBEROS) for protocol=interface > org.apache.hadoop.ha.HAServiceProtocol > 2013-10-02 00:13:49,700 INFO namenode.FSNamesystem > (FSNamesystem.java:stopStandbyServices(1013)) - Stopping services started for > standby state > 2013-10-02 00:13:49,701 WARN ha.EditLogTailer > (EditLogTailer.java:doWork(336)) - Edit log tailer interrupted > java.lang.InterruptedException: sleep interrupted > at java.lang.Thread.sleep(Native Method) > at > 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:334) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1463) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:454) > at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTail > 2013-10-02 00:13:49,704 INFO namenode.FSNamesystem > (FSNamesystem.java:startActiveServices(885)) - Starting services required for > active state > 2013-10-02 00:13:49,719 INFO client.QuorumJournalManager > (QuorumJournalManager.java:recoverUnfinalizedSegments(419)) - Starting > recovery process for unclosed journal segments... > 2013-10-02 00:13:49,755 INFO ipc.Server (Server.java:saslProcess(1342)) - > Auth successful for hbase/hostn...@example.com (auth:SIMPLE) > 2013-10-02 00:13:49,761 INFO authorize.ServiceAuthorizationManager > (ServiceAuthorizationManager.java:authorize(111)) - Authorization successful > for hbase/hostn...@example.com (auth:KERBEROS) for protocol=interface > org.apache.hadoop.hdfs.protocol.ClientProtocol > 2013-10-02 00:13:49,839 INFO client.QuorumJournalManager > (QuorumJournalManager.java:recoverUnfinalizedSegments(421)) - Successfully > started new epoch 85 > 2013-10-02 00:13:49,839 INFO client.QuorumJournalManager > (QuorumJournalManager.java:recoverUnclosedSegment(249)) - Beginning recovery > of unclosed segment starting at txid 887112 > 2013-10-02 00:13:49,874 INFO client.QuorumJournalManager > (QuorumJournalManager.java:recoverUnclosedSegment(258)) - Recovery prepare > phase complete. 
Responses: > IP:8485: segmentState { startTxId: 887112 endTxId: 887531 isInProgress: true > } lastWriterEpoch: 84 lastCommittedTxId: 887530 > 172.18.145.97:8485: segmentState { startTxId: 887112 endTxId: 887531 > isInProgress: true } lastWriterEpoch: 84 lastCommittedTxId: 887530 > 2013-10-02 00:13:49,875 INFO client.QuorumJournalManager > (QuorumJournalManager.java:recover > {code} > And then we get into safemode > {code} > Construction[IP:1019|RBW]]} size 0 > 2013-10-02 00:13:50,277 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: blockMap > updated: IP:1019 is added to blk_IP157{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[IP:1019|RBW], > ReplicaUnderConstruction[172.18.145.96:10
[jira] [Updated] (HDFS-5291) Standby namenode after transition to active goes into safemode
[ https://issues.apache.org/jira/browse/HDFS-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-5291: -- Attachment: nn.log > Standby namenode after transition to active goes into safemode > -- > > Key: HDFS-5291 > URL: https://issues.apache.org/jira/browse/HDFS-5291 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.1.1-beta >Reporter: Arpit Gupta >Assignee: Jing Zhao >Priority: Critical > Attachments: nn.log > > > Some log snippets > standby state to active transition > {code} > 2013-10-02 00:13:49,482 INFO ipc.Server (Server.java:run(2068)) - IPC Server > handler 69 on 8020, call > org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from IP:33911 > Call#1483 Retry#1: error: org.apache.hadoop.ipc.StandbyException: Operation > category WRITE is not supported in state standby > 2013-10-02 00:13:49,689 INFO ipc.Server (Server.java:saslProcess(1342)) - > Auth successful for nn/hostn...@example.com (auth:SIMPLE) > 2013-10-02 00:13:49,696 INFO authorize.ServiceAuthorizationManager > (ServiceAuthorizationManager.java:authorize(111)) - Authorization successful > for nn/hostn...@example.com (auth:KERBEROS) for protocol=interface > org.apache.hadoop.ha.HAServiceProtocol > 2013-10-02 00:13:49,700 INFO namenode.FSNamesystem > (FSNamesystem.java:stopStandbyServices(1013)) - Stopping services started for > standby state > 2013-10-02 00:13:49,701 WARN ha.EditLogTailer > (EditLogTailer.java:doWork(336)) - Edit log tailer interrupted > java.lang.InterruptedException: sleep interrupted > at java.lang.Thread.sleep(Native Method) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:334) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1463) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:454) > at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTail > 2013-10-02 00:13:49,704 INFO namenode.FSNamesystem > (FSNamesystem.java:startActiveServices(885)) - Starting services required for > active state > 2013-10-02 00:13:49,719 INFO client.QuorumJournalManager > (QuorumJournalManager.java:recoverUnfinalizedSegments(419)) - Starting > recovery process for unclosed journal segments... > 2013-10-02 00:13:49,755 INFO ipc.Server (Server.java:saslProcess(1342)) - > Auth successful for hbase/hostn...@example.com (auth:SIMPLE) > 2013-10-02 00:13:49,761 INFO authorize.ServiceAuthorizationManager > (ServiceAuthorizationManager.java:authorize(111)) - Authorization successful > for hbase/hostn...@example.com (auth:KERBEROS) for protocol=interface > org.apache.hadoop.hdfs.protocol.ClientProtocol > 2013-10-02 00:13:49,839 INFO client.QuorumJournalManager > (QuorumJournalManager.java:recoverUnfinalizedSegments(421)) - Successfully > started new epoch 85 > 2013-10-02 00:13:49,839 INFO client.QuorumJournalManager > (QuorumJournalManager.java:recoverUnclosedSegment(249)) - Beginning recovery > of unclosed segment starting at txid 887112 > 2013-10-02 00:13:49,874 INFO client.QuorumJournalManager > (QuorumJournalManager.java:recoverUnclosedSegment(258)) - Recovery prepare > phase complete. 
Responses: > IP:8485: segmentState { startTxId: 887112 endTxId: 887531 isInProgress: true > } lastWriterEpoch: 84 lastCommittedTxId: 887530 > 172.18.145.97:8485: segmentState { startTxId: 887112 endTxId: 887531 > isInProgress: true } lastWriterEpoch: 84 lastCommittedTxId: 887530 > 2013-10-02 00:13:49,875 INFO client.QuorumJournalManager > (QuorumJournalManager.java:recover > {code} > And then we get into safemode > {code} > Construction[IP:1019|RBW]]} size 0 > 2013-10-02 00:13:50,277 INFO BlockStateChange > (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: blockMap > updated: IP:1019 is added to blk_IP157{blockUCState=UNDER_CONSTRUCTION, > primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[IP:1019|RBW], > ReplicaUnderConstruction[172.18.145.96:1019|RBW], ReplicaUnde > rConstruction[IP:1019|RBW]]} size 0 > 2013-10-02 00:13:50,279 INFO hdfs.StateChange > (FSNamesystem.java:reportStatus(4703)) - STATE* Safe mode ON. > The reported blocks 1071 needs additional 5 blocks to reach the threshold > 1. of total blocks 1075. > Safe mode will be turne
[jira] [Created] (HDFS-5291) Standby namenode after transition to active goes into safemode
Arpit Gupta created HDFS-5291: - Summary: Standby namenode after transition to active goes into safemode Key: HDFS-5291 URL: https://issues.apache.org/jira/browse/HDFS-5291 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.1.1-beta Reporter: Arpit Gupta Assignee: Jing Zhao Priority: Critical Attachments: nn.log Some log snippets standby state to active transition {code} 2013-10-02 00:13:49,482 INFO ipc.Server (Server.java:run(2068)) - IPC Server handler 69 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.renewLease from IP:33911 Call#1483 Retry#1: error: org.apache.hadoop.ipc.StandbyException: Operation category WRITE is not supported in state standby 2013-10-02 00:13:49,689 INFO ipc.Server (Server.java:saslProcess(1342)) - Auth successful for nn/hostn...@example.com (auth:SIMPLE) 2013-10-02 00:13:49,696 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(111)) - Authorization successful for nn/hostn...@example.com (auth:KERBEROS) for protocol=interface org.apache.hadoop.ha.HAServiceProtocol 2013-10-02 00:13:49,700 INFO namenode.FSNamesystem (FSNamesystem.java:stopStandbyServices(1013)) - Stopping services started for standby state 2013-10-02 00:13:49,701 WARN ha.EditLogTailer (EditLogTailer.java:doWork(336)) - Edit log tailer interrupted java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:334) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:356) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1463) at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:454) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTail 2013-10-02 00:13:49,704 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(885)) - Starting services required for active state 2013-10-02 00:13:49,719 INFO client.QuorumJournalManager (QuorumJournalManager.java:recoverUnfinalizedSegments(419)) - Starting recovery process for unclosed journal segments... 2013-10-02 00:13:49,755 INFO ipc.Server (Server.java:saslProcess(1342)) - Auth successful for hbase/hostn...@example.com (auth:SIMPLE) 2013-10-02 00:13:49,761 INFO authorize.ServiceAuthorizationManager (ServiceAuthorizationManager.java:authorize(111)) - Authorization successful for hbase/hostn...@example.com (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol 2013-10-02 00:13:49,839 INFO client.QuorumJournalManager (QuorumJournalManager.java:recoverUnfinalizedSegments(421)) - Successfully started new epoch 85 2013-10-02 00:13:49,839 INFO client.QuorumJournalManager (QuorumJournalManager.java:recoverUnclosedSegment(249)) - Beginning recovery of unclosed segment starting at txid 887112 2013-10-02 00:13:49,874 INFO client.QuorumJournalManager (QuorumJournalManager.java:recoverUnclosedSegment(258)) - Recovery prepare phase complete. 
Responses: IP:8485: segmentState { startTxId: 887112 endTxId: 887531 isInProgress: true } lastWriterEpoch: 84 lastCommittedTxId: 887530 172.18.145.97:8485: segmentState { startTxId: 887112 endTxId: 887531 isInProgress: true } lastWriterEpoch: 84 lastCommittedTxId: 887530 2013-10-02 00:13:49,875 INFO client.QuorumJournalManager (QuorumJournalManager.java:recover {code} And then we get into safemode {code} Construction[IP:1019|RBW]]} size 0 2013-10-02 00:13:50,277 INFO BlockStateChange (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: blockMap updated: IP:1019 is added to blk_IP157{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[IP:1019|RBW], ReplicaUnderConstruction[172.18.145.96:1019|RBW], ReplicaUnde rConstruction[IP:1019|RBW]]} size 0 2013-10-02 00:13:50,279 INFO hdfs.StateChange (FSNamesystem.java:reportStatus(4703)) - STATE* Safe mode ON. The reported blocks 1071 needs additional 5 blocks to reach the threshold 1. of total blocks 1075. Safe mode will be turned off automatically 2013-10-02 00:13:50,279 INFO BlockStateChange (BlockManager.java:logAddStoredBlock(2237)) - BLOCK* addStoredBlock: blockMap updated: IP:1019 is added to blk_IP158{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.18.145.99:1019|RBW], ReplicaUnderConstruction[172.18.145.97:1019|RBW], R
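The "needs additional N blocks" line in the log above comes from the safe-mode threshold check. A rough sketch of that condition follows; this is a hypothetical simplification (the real check lives in FSNamesystem's SafeModeInfo and also considers the live-datanode minimum), with `threshold` standing in for dfs.namenode.safemode.threshold-pct:

```python
import math

def safemode_blocks_needed(reported, total, threshold=1.0):
    # The namenode stays in safe mode until the number of reported
    # blocks reaches ceil(total * threshold); the log's "needs
    # additional N blocks" is the remaining shortfall.
    needed = int(math.ceil(total * threshold))
    return max(0, needed - reported)
```

With threshold 1.0, every block must be reported before safe mode can be left automatically, which is why a newly transitioned active NN sits in safe mode until block reports catch up.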
[jira] [Commented] (HDFS-5221) hftp: does not work with HA NN configuration
[ https://issues.apache.org/jira/browse/HDFS-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770140#comment-13770140 ] Arpit Gupta commented on HDFS-5221: --- This might be a dup of HDFS-5123 > hftp: does not work with HA NN configuration > > > Key: HDFS-5221 > URL: https://issues.apache.org/jira/browse/HDFS-5221 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, hdfs-client >Affects Versions: 2.0.5-alpha >Reporter: Joep Rottinghuis >Priority: Blocker > > When copying data between clusters of significant different version (say from > Hadoop 1.x equivalent to Hadoop 2.x) we have to use hftp. > When HA is configured, you have to point to a single (active) NN. > Now, when the active NN becomes standby, the the hftp: addresses will fail. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5176) WebHDFS should support logical service names in URIs
[ https://issues.apache.org/jira/browse/HDFS-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762428#comment-13762428 ] Arpit Gupta commented on HDFS-5176: --- Is this a dup of HDFS-5122? > WebHDFS should support logical service names in URIs > > > Key: HDFS-5176 > URL: https://issues.apache.org/jira/browse/HDFS-5176 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins > > Having WebHDFS support logical URIs would allow users to eg distcp from one > system to another (using a webhdfs source) w/o having to first figure out the > hostname for the active NameNode on the source. Eventually we can make > WebHdfsFileSystem fully support HA (eg failover) but this would be a useful > intermediate point. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5147) Certain dfsadmin commands such as safemode do not interact with the active namenode in ha setup
[ https://issues.apache.org/jira/browse/HDFS-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756728#comment-13756728 ] Arpit Gupta commented on HDFS-5147: --- Ah thanks Konstantin. Yes by default we should always go to the active NN. > Certain dfsadmin commands such as safemode do not interact with the active > namenode in ha setup > --- > > Key: HDFS-5147 > URL: https://issues.apache.org/jira/browse/HDFS-5147 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.1.0-beta >Reporter: Arpit Gupta >Assignee: Jing Zhao > > There are certain commands in dfsadmin return the status of the first > namenode specified in the configs rather than interacting with the active > namenode > For example. Issue > hdfs dfsadmin -safemode get > and it will return the status of the first namenode in the configs rather > than the active namenode. > I think all dfsadmin commands should determine which is the active namenode > do the operation on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5147) Certain dfsadmin commands such as safemode do not interact with the active namenode in ha setup
[ https://issues.apache.org/jira/browse/HDFS-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754933#comment-13754933 ] Arpit Gupta commented on HDFS-5147: --- Also we should add an optional argument to take in the namenode hostname or dfs.ha.namenodes.${dfs.nameservices} value so the user can do admin operation on any namenode. > Certain dfsadmin commands such as safemode do not interact with the active > namenode in ha setup > --- > > Key: HDFS-5147 > URL: https://issues.apache.org/jira/browse/HDFS-5147 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.1.0-beta >Reporter: Arpit Gupta > > There are certain commands in dfsadmin return the status of the first > namenode specified in the configs rather than interacting with the active > namenode > For example. Issue > hdfs dfsadmin -safemode get > and it will return the status of the first namenode in the configs rather > than the active namenode. > I think all dfsadmin commands should determine which is the active namenode > do the operation on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5147) Certain dfsadmin commands such as safemode do not interact with the active namenode in ha setup
[ https://issues.apache.org/jira/browse/HDFS-5147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754930#comment-13754930 ] Arpit Gupta commented on HDFS-5147: --- Currently dfsadmin has the following commands {code} Note: Administrative commands can only be run as the HDFS superuser. [-report] [-safemode enter | leave | get | wait] [-allowSnapshot ] [-disallowSnapshot ] [-saveNamespace] [-rollEdits] [-restoreFailedStorage true|false|check] [-refreshNodes] [-finalizeUpgrade] [-metasave filename] [-refreshServiceAcl] [-refreshUserToGroupsMappings] [-refreshSuperUserGroupsConfiguration] [-printTopology] [-refreshNamenodes datanodehost:port] [-deleteBlockPool datanode-host:port blockpoolId [force]] [-setQuota ...] [-clrQuota ...] [-setSpaceQuota ...] [-clrSpaceQuota ...] [-setBalancerBandwidth ] [-fetchImage ] {code} > Certain dfsadmin commands such as safemode do not interact with the active > namenode in ha setup > --- > > Key: HDFS-5147 > URL: https://issues.apache.org/jira/browse/HDFS-5147 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.1.0-beta >Reporter: Arpit Gupta > > There are certain commands in dfsadmin return the status of the first > namenode specified in the configs rather than interacting with the active > namenode > For example. Issue > hdfs dfsadmin -safemode get > and it will return the status of the first namenode in the configs rather > than the active namenode. > I think all dfsadmin commands should determine which is the active namenode > do the operation on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5147) Certain dfsadmin commands such as safemode do not interact with the active namenode in ha setup
Arpit Gupta created HDFS-5147: - Summary: Certain dfsadmin commands such as safemode do not interact with the active namenode in ha setup Key: HDFS-5147 URL: https://issues.apache.org/jira/browse/HDFS-5147 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.1.0-beta Reporter: Arpit Gupta There are certain dfsadmin commands that return the status of the first namenode specified in the configs rather than interacting with the active namenode. For example, issue hdfs dfsadmin -safemode get and it will return the status of the first namenode in the configs rather than the active namenode. I think all dfsadmin commands should determine which namenode is active and do the operation on it.
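The fix direction the report suggests can be sketched as follows. This is a hypothetical outline: `get_ha_state` stands in for querying each configured namenode's HA state (as `hdfs haadmin -getServiceState` does), and a dfsadmin command would then target the namenode it returns:

```python
def pick_active(namenode_ids, get_ha_state):
    # Iterate the namenodes listed in dfs.ha.namenodes.<nameservice>
    # and return the first one whose HA state is "active"; None if no
    # namenode is currently active (e.g. mid-failover).
    for nn in namenode_ids:
        if get_ha_state(nn) == "active":
            return nn
    return None
```

A command like -safemode get would run this selection first instead of always talking to the first namenode in the configuration.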
[jira] [Commented] (HDFS-5140) Too many safemode monitor threads being created in the standby namenode
[ https://issues.apache.org/jira/browse/HDFS-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752639#comment-13752639 ] Arpit Gupta commented on HDFS-5140: --- Here is the stack trace from the standby namenode {code} 2013-08-28 08:58:45,519 INFO hdfs.StateChange (FSNamesystem.java:reportStatus(4677)) - STATE* Safe mode extension entered. The reported blocks 833 has reached the threshold 1. of total blocks 833. The number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically in 29 seconds. 2013-08-28 08:58:45,524 ERROR namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(203)) - Encountered exception on operation CloseOp [length=0, inodeId=0, path=/user/hrt_qa/ha-loadgenerator/100-threads/dir3/dir2/dir5/dir4/dir2/dir1/hostname63, replication=3, mtime=1377680236411, atime=1377680236320, blockSize=134217728, blocks=[blk_1073940431_205511], permissions=hrt_qa:hrt_qa:rw-r--r--, clientName=, clientMachine=, opCode=OP_CLOSE, txid=1141116] java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:640) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.checkMode(FSNamesystem.java:4521) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.incrementSafeBlockCount(FSNamesystem.java:4568) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$SafeModeInfo.access$1900(FSNamesystem.java:4275) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.incrementSafeBlockCount(FSNamesystem.java:4854) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.completeBlock(BlockManager.java:596) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.completeBlock(BlockManager.java:608) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.forceCompleteBlock(BlockManager.java:621) at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:696) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:372) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:198) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:111) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296) at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292) 2013-08-28 08:58:45,597 FATAL ha.EditLogTailer (EditLogTailer.java:doWork(328)) - Unknown error encountered while tailing edits. Shutting down standby NN. 
java.io.IOException: Failed to apply edit log operation CloseOp [length=0, inodeId=0, path=/user/hrt_qa/ha-loadgenerator/100-threads/dir3/dir2/dir5/dir4/dir2/dir1/hostname63, replication=3, mtime=1377680236411, atime=1377680236320, blockSize=134217728, blocks=[blk_1073940431_205511], permissions=hrt_qa:hrt_qa:rw-r--r--, clientName=, clientMachine=, opCode=OP_CLOSE, txid=1141116]: error unable to create new native thread at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:204) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:111) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296) at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292) 2013-08-28 08:58:45,636 INFO util.ExitUtil (ExitUtil.java:terminate(1
[jira] [Updated] (HDFS-5140) Too many safemode monitor threads being created in the standby namenode causing it to fail with out of memory error
[ https://issues.apache.org/jira/browse/HDFS-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-5140: -- Summary: Too many safemode monitor threads being created in the standby namenode causing it to fail with out of memory error (was: Too many safemode monitor threads being created in the standby namenode) > Too many safemode monitor threads being created in the standby namenode > causing it to fail with out of memory error > --- > > Key: HDFS-5140 > URL: https://issues.apache.org/jira/browse/HDFS-5140 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.1.0-beta >Reporter: Arpit Gupta >Assignee: Jing Zhao >Priority: Blocker > > While running namenode load generator with 100 threads for 10 mins namenode > was being failed over ever 2 mins. > The standby namenode shut itself down as it ran out of memory and was not > able to create another thread. > When we searched for 'Safe mode extension entered' in the standby log it was > present 55000+ times -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5140) Too many safemode monitor threads being created in the standby namenode
Arpit Gupta created HDFS-5140: - Summary: Too many safemode monitor threads being created in the standby namenode Key: HDFS-5140 URL: https://issues.apache.org/jira/browse/HDFS-5140 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.1.0-beta Reporter: Arpit Gupta Assignee: Jing Zhao Priority: Blocker While running the namenode load generator with 100 threads for 10 mins, the namenode was being failed over every 2 mins. The standby namenode shut itself down as it ran out of memory and was not able to create another thread. When we searched for 'Safe mode extension entered' in the standby log it was present 55000+ times.
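The 55000+ "Safe mode extension entered" lines and the "unable to create new native thread" OOM point at the monitor thread being spawned on every checkMode() call while tailing edits. A minimal sketch of the likely fix shape (hypothetical, not the actual HDFS patch) is to guard the thread start with a flag:

```python
class SafeModeMonitorGuard:
    """Start the safe-mode monitor at most once per safe-mode entry."""

    def __init__(self):
        self.monitor_started = False
        self.threads_spawned = 0

    def check_mode(self):
        # Called on every incrementSafeBlockCount(); without the guard,
        # each call would launch another monitor thread.
        if not self.monitor_started:
            self.monitor_started = True
            self.threads_spawned += 1  # real code would start the thread here
```

Under a load-generator workload that triggers checkMode() tens of thousands of times, the guard caps thread creation at one.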
[jira] [Updated] (HDFS-5132) Deadlock in namenode while running load generator with 15 threads
[ https://issues.apache.org/jira/browse/HDFS-5132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-5132: -- Attachment: jstack.log Attaching the output of jstack > Deadlock in namenode while running load generator with 15 threads > - > > Key: HDFS-5132 > URL: https://issues.apache.org/jira/browse/HDFS-5132 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.1.0-beta >Reporter: Arpit Gupta > Attachments: jstack.log > > > While running nn load generator with 15 threads for 20 mins the standby > namenode deadlocked. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5132) Deadlock in namenode while running load generator with 15 threads
Arpit Gupta created HDFS-5132: - Summary: Deadlock in namenode while running load generator with 15 threads Key: HDFS-5132 URL: https://issues.apache.org/jira/browse/HDFS-5132 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.1.0-beta Reporter: Arpit Gupta While running nn load generator with 15 threads for 20 mins the standby namenode deadlocked. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5122) webhdfs paths on an ha cluster still require the use of the active nn address rather than using the nameservice
[ https://issues.apache.org/jira/browse/HDFS-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746948#comment-13746948 ] Arpit Gupta commented on HDFS-5122: --- This is similar to HDFS-5123 > webhdfs paths on an ha cluster still require the use of the active nn address > rather than using the nameservice > --- > > Key: HDFS-5122 > URL: https://issues.apache.org/jira/browse/HDFS-5122 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.1.0-beta >Reporter: Arpit Gupta > > For example if the dfs.nameservices is set to arpit > {code} > hdfs dfs -ls webhdfs://arpit:50070/tmp > or > hdfs dfs -ls webhdfs://arpit/tmp > {code} > does not work > You have to provide the exact active namenode hostname. On an HA cluster > using dfs client one should not need to provide the active nn hostname -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-5123) hftp paths on an ha cluster still require the use of the active nn address rather than using the nameservice
[ https://issues.apache.org/jira/browse/HDFS-5123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13746949#comment-13746949 ] Arpit Gupta commented on HDFS-5123: --- This is similar to HDFS-5122 > hftp paths on an ha cluster still require the use of the active nn address > rather than using the nameservice > > > Key: HDFS-5123 > URL: https://issues.apache.org/jira/browse/HDFS-5123 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.1.0-beta >Reporter: Arpit Gupta > > For example if the dfs.nameservices is set to arpit > {code} > hdfs dfs -ls hftp://arpit:50070/tmp > or > hdfs dfs -ls hftp://arpit/tmp > {code} > does not work > You have to provide the exact active namenode hostname. On an HA cluster > using dfs client one should not need to provide the active nn hostname -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5123) hftp paths on an ha cluster still require the use of the active nn address rather than using the nameservice
Arpit Gupta created HDFS-5123: - Summary: hftp paths on an ha cluster still require the use of the active nn address rather than using the nameservice Key: HDFS-5123 URL: https://issues.apache.org/jira/browse/HDFS-5123 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.1.0-beta Reporter: Arpit Gupta For example if the dfs.nameservices is set to arpit {code} hdfs dfs -ls hftp://arpit:50070/tmp or hdfs dfs -ls hftp://arpit/tmp {code} does not work You have to provide the exact active namenode hostname. On an HA cluster using dfs client one should not need to provide the active nn hostname -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-5122) webhdfs paths on an ha cluster still require the use of the active nn address rather than using the nameservice
[ https://issues.apache.org/jira/browse/HDFS-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-5122: -- Description: For example if the dfs.nameservices is set to arpit {code} hdfs dfs -ls webhdfs://arpit:50070/tmp or hdfs dfs -ls webhdfs://arpit/tmp {code} does not work You have to provide the exact active namenode hostname. On an HA cluster using dfs client one should not need to provide the active nn hostname was: For example if the dfs.nameservices is set to arpit {code} hdfs dfs -ls webhdfs://arpit:50070/tmp or hdfs dfs -ls webhdfs://arpit/tmp {code} does not work You have to provide the exact active namenode hostname > webhdfs paths on an ha cluster still require the use of the active nn address > rather than using the nameservice > --- > > Key: HDFS-5122 > URL: https://issues.apache.org/jira/browse/HDFS-5122 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha >Affects Versions: 2.1.0-beta >Reporter: Arpit Gupta > > For example if the dfs.nameservices is set to arpit > {code} > hdfs dfs -ls webhdfs://arpit:50070/tmp > or > hdfs dfs -ls webhdfs://arpit/tmp > {code} > does not work > You have to provide the exact active namenode hostname. On an HA cluster > using dfs client one should not need to provide the active nn hostname -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-5122) webhdfs paths on an ha cluster still require the use of the active nn address rather than using the nameservice
Arpit Gupta created HDFS-5122: - Summary: webhdfs paths on an ha cluster still require the use of the active nn address rather than using the nameservice Key: HDFS-5122 URL: https://issues.apache.org/jira/browse/HDFS-5122 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.1.0-beta Reporter: Arpit Gupta For example if the dfs.nameservices is set to arpit {code} hdfs dfs -ls webhdfs://arpit:50070/tmp or hdfs dfs -ls webhdfs://arpit/tmp {code} does not work You have to provide the exact active namenode hostname -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
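Resolving a logical nameservice like webhdfs://arpit/tmp would mean expanding the standard HA configuration keys into candidate namenode HTTP addresses and then probing for the active one. A sketch of that expansion, using the real key names (dfs.ha.namenodes.<nameservice> and dfs.namenode.http-address.<nameservice>.<nnid>) over a plain dict standing in for the Hadoop Configuration object:

```python
def resolve_http_addresses(conf, nameservice):
    # Expand a logical nameservice into the HTTP addresses of its
    # configured namenodes; a client could then try each address (or
    # probe HA state) instead of requiring the active hostname up front.
    nn_ids = conf["dfs.ha.namenodes.%s" % nameservice].split(",")
    return [conf["dfs.namenode.http-address.%s.%s" % (nameservice, nn)]
            for nn in nn_ids]
```

This is only the address-expansion half; full HA support in WebHdfsFileSystem would also need failover/retry on top of it.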
[jira] [Created] (HDFS-4594) WebHDFS open sets Content-Length header to what is specified by length parameter rather than how much data is actually returned.
Arpit Gupta created HDFS-4594: - Summary: WebHDFS open sets Content-Length header to what is specified by length parameter rather than how much data is actually returned. Key: HDFS-4594 URL: https://issues.apache.org/jira/browse/HDFS-4594 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.3-alpha Reporter: Arpit Gupta This was noticed on 2.0.3-alpha. Let's say we have a file of length x. We make a webhdfs open call specifying length=x+1. The response of the call redirected to the datanode sets the Content-Length header to value x+1 rather than x. This causes an error when the client tries to read the data. For the test I was using HttpResponse.getEntity().getContent(). This failed with the message "Premature end of Content-Length delimited message body (expected: 71898; received: 71897)". This was not seen in hadoop 1 as we did not set the Content-Length header.
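The underlying arithmetic of the bug is simple: the Content-Length header must reflect the bytes actually returned, so a requested length that runs past end-of-file has to be clamped to what remains. A sketch of the correct computation (names are illustrative, not the HDFS code):

```python
def content_length(file_size, offset=0, length=None):
    # Bytes actually available from `offset` to EOF.
    remaining = max(0, file_size - offset)
    if length is None:
        return remaining
    # Clamp the caller's requested length to what the file can supply;
    # echoing the raw request (the reported bug) overstates the body size.
    return min(length, remaining)
```

With the numbers from the error message, a request for 71898 bytes against a 71897-byte file should yield a Content-Length of 71897, not 71898.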
[jira] [Commented] (HDFS-4565) use DFSUtil.getSpnegoKeytabKey() to get the spnego keytab key in secondary namenode and namenode http server
[ https://issues.apache.org/jira/browse/HDFS-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13596224#comment-13596224 ] Arpit Gupta commented on HDFS-4565: --- No tests added as method being used already has tests. Test failure is unrelated to this patch. > use DFSUtil.getSpnegoKeytabKey() to get the spnego keytab key in secondary > namenode and namenode http server > > > Key: HDFS-4565 > URL: https://issues.apache.org/jira/browse/HDFS-4565 > Project: Hadoop HDFS > Issue Type: Improvement > Components: security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta >Priority: Minor > Attachments: HDFS-4565.patch > > > use the method introduced by HDFS-4540 to the spengo keytab key. Better as we > have unit test coverage for the new method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4565) use DFSUtil.getSpnegoKeytabKey() to get the spnego keytab key in secondary namenode and namenode http server
[ https://issues.apache.org/jira/browse/HDFS-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4565: -- Summary: use DFSUtil.getSpnegoKeytabKey() to get the spnego keytab key in secondary namenode and namenode http server (was: use DFSUtil.getSpnegoKeytabKey to get the key in secondary namenode and namenode http server) > use DFSUtil.getSpnegoKeytabKey() to get the spnego keytab key in secondary > namenode and namenode http server > > > Key: HDFS-4565 > URL: https://issues.apache.org/jira/browse/HDFS-4565 > Project: Hadoop HDFS > Issue Type: Improvement > Components: security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta >Priority: Minor > Attachments: HDFS-4565.patch > > > use the method introduced by HDFS-4540 to the spengo keytab key. Better as we > have unit test coverage for the new method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4565) use DFSUtil.getSpnegoKeytabKey to get the key in secondary namenode and namenode http server
[ https://issues.apache.org/jira/browse/HDFS-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4565: -- Status: Patch Available (was: Open) > use DFSUtil.getSpnegoKeytabKey to get the key in secondary namenode and > namenode http server > > > Key: HDFS-4565 > URL: https://issues.apache.org/jira/browse/HDFS-4565 > Project: Hadoop HDFS > Issue Type: Improvement > Components: security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta >Priority: Minor > Attachments: HDFS-4565.patch > > > use the method introduced by HDFS-4540 to the spengo keytab key. Better as we > have unit test coverage for the new method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4565) use DFSUtil.getSpnegoKeytabKey to get the key in secondary namenode and namenode http server
[ https://issues.apache.org/jira/browse/HDFS-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4565: -- Attachment: HDFS-4565.patch > use DFSUtil.getSpnegoKeytabKey to get the key in secondary namenode and > namenode http server > > > Key: HDFS-4565 > URL: https://issues.apache.org/jira/browse/HDFS-4565 > Project: Hadoop HDFS > Issue Type: Improvement > Components: security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta >Priority: Minor > Attachments: HDFS-4565.patch > > > use the method introduced by HDFS-4540 to the spengo keytab key. Better as we > have unit test coverage for the new method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4565) use DFSUtil.getSpnegoKeytabKey to get the key in secondary namenode and namenode http server
Arpit Gupta created HDFS-4565: - Summary: use DFSUtil.getSpnegoKeytabKey to get the key in secondary namenode and namenode http server Key: HDFS-4565 URL: https://issues.apache.org/jira/browse/HDFS-4565 Project: Hadoop HDFS Issue Type: Improvement Components: security Affects Versions: 2.0.3-alpha Reporter: Arpit Gupta Assignee: Arpit Gupta Priority: Minor Use the method introduced by HDFS-4540 to get the spnego keytab key. Better as we have unit test coverage for the new method.
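The intent of the HDFS-4540 helper, as described in these issues, is a fallback: prefer the web-authentication keytab key when it is configured, otherwise use the caller's default key. A sketch of that selection logic (a plain-Python approximation of DFSUtil.getSpnegoKeytabKey's described behavior, not its exact signature):

```python
WEB_AUTH_KEYTAB_KEY = "dfs.web.authentication.kerberos.keytab"

def get_spnego_keytab_key(conf, default_key):
    # If the SPNEGO-specific keytab is configured, HTTP servers should
    # use it; otherwise fall back to the service keytab key supplied by
    # the caller (e.g. dfs.namenode.keytab.file).
    if conf.get(WEB_AUTH_KEYTAB_KEY):
        return WEB_AUTH_KEYTAB_KEY
    return default_key
```

Centralizing the fallback in one tested method is the "better as we have unit test coverage" point: each HTTP server (namenode, secondary namenode) reuses it instead of reimplementing the check.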
[jira] [Updated] (HDFS-4540) namenode http server should use the web authentication keytab for spnego principal
[ https://issues.apache.org/jira/browse/HDFS-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4540: -- Attachment: HDFS-4540.patch Minor updates, add a check and test for null > namenode http server should use the web authentication keytab for spnego > principal > -- > > Key: HDFS-4540 > URL: https://issues.apache.org/jira/browse/HDFS-4540 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4540.patch, HDFS-4540.patch, HDFS-4540.patch, > HDFS-4540.patch, HDFS-4540.patch, HDFS-4540.patch > > > This is similar to HDFS-4105. in the NameNodeHttpServer.start() initSpnego > should look for dfs.web.authentication.kerberos.keytab before using > dfs.namenode.keytab.file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4540) namenode http server should use the web authentication keytab for spnego principal
[ https://issues.apache.org/jira/browse/HDFS-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4540: -- Attachment: HDFS-4540.patch added timeout to the test. > namenode http server should use the web authentication keytab for spnego > principal > -- > > Key: HDFS-4540 > URL: https://issues.apache.org/jira/browse/HDFS-4540 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4540.patch, HDFS-4540.patch, HDFS-4540.patch, > HDFS-4540.patch, HDFS-4540.patch > > > This is similar to HDFS-4105. in the NameNodeHttpServer.start() initSpnego > should look for dfs.web.authentication.kerberos.keytab before using > dfs.namenode.keytab.file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4540) namenode http server should use the web authentication keytab for spnego principal
[ https://issues.apache.org/jira/browse/HDFS-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4540: -- Attachment: HDFS-4540.patch generate the patch from the correct branch duh! > namenode http server should use the web authentication keytab for spnego > principal > -- > > Key: HDFS-4540 > URL: https://issues.apache.org/jira/browse/HDFS-4540 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4540.patch, HDFS-4540.patch, HDFS-4540.patch, > HDFS-4540.patch > > > This is similar to HDFS-4105. in the NameNodeHttpServer.start() initSpnego > should look for dfs.web.authentication.kerberos.keytab before using > dfs.namenode.keytab.file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4540) namenode http server should use the web authentication keytab for spnego principal
[ https://issues.apache.org/jira/browse/HDFS-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4540: -- Attachment: HDFS-4540.patch Attached a patch where the code is moved to DFSUtil and added a test. Also removed unused imports. > namenode http server should use the web authentication keytab for spnego > principal > -- > > Key: HDFS-4540 > URL: https://issues.apache.org/jira/browse/HDFS-4540 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4540.patch, HDFS-4540.patch, HDFS-4540.patch > > > This is similar to HDFS-4105. in the NameNodeHttpServer.start() initSpnego > should look for dfs.web.authentication.kerberos.keytab before using > dfs.namenode.keytab.file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4541) set hadoop.log.dir and hadoop.id.str when starting secure datanode so it writes the logs to the correct dir by default
[ https://issues.apache.org/jira/browse/HDFS-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590926#comment-13590926 ] Arpit Gupta commented on HDFS-4541: --- No tests added as this is a change to shell scripts. Manually verified that the secure datanode logs are being written to the appropriate directory. Test failure is unrelated. > set hadoop.log.dir and hadoop.id.str when starting secure datanode so it > writes the logs to the correct dir by default > -- > > Key: HDFS-4541 > URL: https://issues.apache.org/jira/browse/HDFS-4541 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4541.patch, HDFS-4541.patch > > > currently in hadoop-config.sh we set the following > {code} > HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR" > HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING" > {code} > however when this file is sourced we dont know whether we are starting a > secure data node. > In the hdfs script when we determine whether we are starting secure data node > or not we should also update HADOOP_OPTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4541) set hadoop.log.dir and hadoop.id.str when starting secure datanode so it writes the logs to the correct dir by default
[ https://issues.apache.org/jira/browse/HDFS-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4541: -- Status: Patch Available (was: Open) > set hadoop.log.dir and hadoop.id.str when starting secure datanode so it > writes the logs to the correct dir by default > -- > > Key: HDFS-4541 > URL: https://issues.apache.org/jira/browse/HDFS-4541 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4541.patch, HDFS-4541.patch > > > currently in hadoop-config.sh we set the following > {code} > HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR" > HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING" > {code} > however when this file is sourced we dont know whether we are starting a > secure data node. > In the hdfs script when we determine whether we are starting secure data node > or not we should also update HADOOP_OPTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4541) set hadoop.log.dir and hadoop.id.str when starting secure datanode so it writes the logs to the correct dir by default
[ https://issues.apache.org/jira/browse/HDFS-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590819#comment-13590819 ] Arpit Gupta commented on HDFS-4541: --- @Chris Actually it will appear 3 times after this change for secure datanode. It appears twice for any hdfs service right now. hadoop-daemon.sh -> hadoop (sources hadoop-config.sh) -> hdfs (sources hdfs-config.sh which sources hadoop-config.sh) and then we set it again. I think the problem of duplicates in OPTS is an issue that should be solved in a different jira. > set hadoop.log.dir and hadoop.id.str when starting secure datanode so it > writes the logs to the correct dir by default > -- > > Key: HDFS-4541 > URL: https://issues.apache.org/jira/browse/HDFS-4541 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4541.patch, HDFS-4541.patch > > > currently in hadoop-config.sh we set the following > {code} > HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR" > HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING" > {code} > however when this file is sourced we don't know whether we are starting a > secure data node. > In the hdfs script when we determine whether we are starting secure data node > or not we should also update HADOOP_OPTS
[jira] [Updated] (HDFS-4541) set hadoop.log.dir and hadoop.id.str when starting secure datanode so it writes the logs to the correct dir by default
[ https://issues.apache.org/jira/browse/HDFS-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4541: -- Attachment: HDFS-4541.patch regenerated the patch --no-prefix > set hadoop.log.dir and hadoop.id.str when starting secure datanode so it > writes the logs to the correct dir by default > -- > > Key: HDFS-4541 > URL: https://issues.apache.org/jira/browse/HDFS-4541 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4541.patch, HDFS-4541.patch > > > currently in hadoop-config.sh we set the following > {code} > HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR" > HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING" > {code} > however when this file is sourced we dont know whether we are starting a > secure data node. > In the hdfs script when we determine whether we are starting secure data node > or not we should also update HADOOP_OPTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4540) namenode http server should use the web authentication keytab for spnego principal
[ https://issues.apache.org/jira/browse/HDFS-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4540: -- Attachment: HDFS-4540.patch updated the patch with the check. > namenode http server should use the web authentication keytab for spnego > principal > -- > > Key: HDFS-4540 > URL: https://issues.apache.org/jira/browse/HDFS-4540 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4540.patch, HDFS-4540.patch > > > This is similar to HDFS-4105. in the NameNodeHttpServer.start() initSpnego > should look for dfs.web.authentication.kerberos.keytab before using > dfs.namenode.keytab.file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4540) namenode http server should use the web authentication keytab for spnego principal
[ https://issues.apache.org/jira/browse/HDFS-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590763#comment-13590763 ] Arpit Gupta commented on HDFS-4540: --- Good point Suresh. Let me update the patch with more checks. > namenode http server should use the web authentication keytab for spnego > principal > -- > > Key: HDFS-4540 > URL: https://issues.apache.org/jira/browse/HDFS-4540 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4540.patch > > > This is similar to HDFS-4105. in the NameNodeHttpServer.start() initSpnego > should look for dfs.web.authentication.kerberos.keytab before using > dfs.namenode.keytab.file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4541) set hadoop.log.dir and hadoop.id.str when starting secure datanode so it writes the logs to the correct dir by default
[ https://issues.apache.org/jira/browse/HDFS-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590735#comment-13590735 ] Arpit Gupta commented on HDFS-4541: --- Attached a patch; with it applied, secure datanode logs will be written to the correct dir by default. Without it, the datanode tries to write its logs to the your_log_dir/root dir. > set hadoop.log.dir and hadoop.id.str when starting secure datanode so it > writes the logs to the correct dir by default > -- > > Key: HDFS-4541 > URL: https://issues.apache.org/jira/browse/HDFS-4541 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4541.patch > > > currently in hadoop-config.sh we set the following > {code} > HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR" > HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING" > {code} > however when this file is sourced we don't know whether we are starting a > secure data node. > In the hdfs script when we determine whether we are starting secure data node > or not we should also update HADOOP_OPTS
[jira] [Updated] (HDFS-4541) set hadoop.log.dir and hadoop.id.str when starting secure datanode so it writes the logs to the correct dir by default
[ https://issues.apache.org/jira/browse/HDFS-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4541: -- Attachment: HDFS-4541.patch > set hadoop.log.dir and hadoop.id.str when starting secure datanode so it > writes the logs to the correct dir by default > -- > > Key: HDFS-4541 > URL: https://issues.apache.org/jira/browse/HDFS-4541 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4541.patch > > > currently in hadoop-config.sh we set the following > {code} > HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR" > HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING" > {code} > however when this file is sourced we dont know whether we are starting a > secure data node. > In the hdfs script when we determine whether we are starting secure data node > or not we should also update HADOOP_OPTS -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4541) set hadoop.log.dir and hadoop.id.str when starting secure datanode so it writes the logs to the correct dir by default
Arpit Gupta created HDFS-4541: - Summary: set hadoop.log.dir and hadoop.id.str when starting secure datanode so it writes the logs to the correct dir by default Key: HDFS-4541 URL: https://issues.apache.org/jira/browse/HDFS-4541 Project: Hadoop HDFS Issue Type: Bug Components: datanode, security Affects Versions: 2.0.3-alpha Reporter: Arpit Gupta Assignee: Arpit Gupta Currently in hadoop-config.sh we set the following:
{code}
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"
HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"
{code}
However, when this file is sourced we don't know whether we are starting a secure data node. In the hdfs script, when we determine whether or not we are starting a secure data node, we should also update HADOOP_OPTS.
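A sketch of the change this description proposes: once the hdfs launcher script has determined it is starting a secure datanode, re-append the two properties to HADOOP_OPTS. This is illustrative only, not the actual patch; the `starting_secure_dn` flag name, the directories, and the ident string below are made up, while `HADOOP_OPTS` and the two `-D` property names come from the issue.

```shell
# Hypothetical excerpt modeled on the hdfs launcher script.
HADOOP_OPTS=""
HADOOP_LOG_DIR="/var/log/hadoop"
HADOOP_IDENT_STRING="hdfs"
starting_secure_dn="true"   # set wherever the script detects a secure datanode

if [ "$starting_secure_dn" = "true" ]; then
  # Re-apply the log dir and ident string here, because hadoop-config.sh was
  # sourced before we knew a secure datanode was being started; without this
  # the secure datanode logs land under a default such as your_log_dir/root.
  HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"
  HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"
fi
echo "$HADOOP_OPTS"
```

Appending rather than resetting keeps whatever options hadoop-config.sh already placed in HADOOP_OPTS, at the cost of the duplicate -D entries discussed in the comments (the last occurrence of a system property wins for the JVM).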
[jira] [Updated] (HDFS-4540) namenode http server should use the web authentication keytab for spnego principal
[ https://issues.apache.org/jira/browse/HDFS-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4540: -- Target Version/s: 2.0.4-beta > namenode http server should use the web authentication keytab for spnego > principal > -- > > Key: HDFS-4540 > URL: https://issues.apache.org/jira/browse/HDFS-4540 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4540.patch > > > This is similar to HDFS-4105. in the NameNodeHttpServer.start() initSpnego > should look for dfs.web.authentication.kerberos.keytab before using > dfs.namenode.keytab.file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4540) namenode http server should use the web authentication keytab for spnego principal
[ https://issues.apache.org/jira/browse/HDFS-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4540: -- Fix Version/s: (was: 2.0.4-beta) > namenode http server should use the web authentication keytab for spnego > principal > -- > > Key: HDFS-4540 > URL: https://issues.apache.org/jira/browse/HDFS-4540 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4540.patch > > > This is similar to HDFS-4105. in the NameNodeHttpServer.start() initSpnego > should look for dfs.web.authentication.kerberos.keytab before using > dfs.namenode.keytab.file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4540) namenode http server should use the web authentication keytab for spnego principal
[ https://issues.apache.org/jira/browse/HDFS-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13590240#comment-13590240 ] Arpit Gupta commented on HDFS-4540: --- No new tests added as this is a security related change. Confirmed namenode starts up with this change when using a different keytab for spnego principal. > namenode http server should use the web authentication keytab for spnego > principal > -- > > Key: HDFS-4540 > URL: https://issues.apache.org/jira/browse/HDFS-4540 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Fix For: 2.0.4-beta > > Attachments: HDFS-4540.patch > > > This is similar to HDFS-4105. in the NameNodeHttpServer.start() initSpnego > should look for dfs.web.authentication.kerberos.keytab before using > dfs.namenode.keytab.file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4540) namenode http server should use the web authentication keytab for spnego principal
[ https://issues.apache.org/jira/browse/HDFS-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4540: -- Summary: namenode http server should use the web authentication keytab for spnego principal (was: Spnego principal should be looked up in the web authentication kerberos keytab before the namenode's keytab) > namenode http server should use the web authentication keytab for spnego > principal > -- > > Key: HDFS-4540 > URL: https://issues.apache.org/jira/browse/HDFS-4540 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Fix For: 2.0.4-beta > > Attachments: HDFS-4540.patch > > > This is similar to HDFS-4105. in the NameNodeHttpServer.start() initSpnego > should look for dfs.web.authentication.kerberos.keytab before using > dfs.namenode.keytab.file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4540) Spnego principal should be looked up in the web authentication kerberos keytab before the namenode's keytab
[ https://issues.apache.org/jira/browse/HDFS-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4540: -- Attachment: HDFS-4540.patch Patch that uses the correct config if available > Spnego principal should be looked up in the web authentication kerberos > keytab before the namenode's keytab > --- > > Key: HDFS-4540 > URL: https://issues.apache.org/jira/browse/HDFS-4540 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Fix For: 2.0.4-beta > > Attachments: HDFS-4540.patch > > > This is similar to HDFS-4105. in the NameNodeHttpServer.start() initSpnego > should look for dfs.web.authentication.kerberos.keytab before using > dfs.namenode.keytab.file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4540) Spnego principal should be looked up in the web authentication kerberos keytab before the namenode's keytab
[ https://issues.apache.org/jira/browse/HDFS-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4540: -- Status: Patch Available (was: Open) > Spnego principal should be looked up in the web authentication kerberos > keytab before the namenode's keytab > --- > > Key: HDFS-4540 > URL: https://issues.apache.org/jira/browse/HDFS-4540 > Project: Hadoop HDFS > Issue Type: Bug > Components: security >Affects Versions: 2.0.3-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Fix For: 2.0.4-beta > > Attachments: HDFS-4540.patch > > > This is similar to HDFS-4105. in the NameNodeHttpServer.start() initSpnego > should look for dfs.web.authentication.kerberos.keytab before using > dfs.namenode.keytab.file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4540) Spnego principal should be looked up in the web authentication kerberos keytab before the namenode's keytab
Arpit Gupta created HDFS-4540: - Summary: Spnego principal should be looked up in the web authentication kerberos keytab before the namenode's keytab Key: HDFS-4540 URL: https://issues.apache.org/jira/browse/HDFS-4540 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 2.0.3-alpha Reporter: Arpit Gupta Assignee: Arpit Gupta Fix For: 2.0.4-beta This is similar to HDFS-4105. in the NameNodeHttpServer.start() initSpnego should look for dfs.web.authentication.kerberos.keytab before using dfs.namenode.keytab.file. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3727) When using SPNEGO, NN should not try to log in using KSSL principal
[ https://issues.apache.org/jira/browse/HDFS-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-3727: -- Affects Version/s: 1.1.1 > When using SPNEGO, NN should not try to log in using KSSL principal > --- > > Key: HDFS-3727 > URL: https://issues.apache.org/jira/browse/HDFS-3727 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 1.1.0, 1.1.1, 1.2.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Fix For: 1.2.0 > > Attachments: HDFS-3727.patch > > > When performing a checkpoint with security enabled, the NN will attempt to > relogin from its keytab before making an HTTP request back to the 2NN to > fetch the newly-merged image. However, it always attempts to log in using the > KSSL principal, even if SPNEGO is configured to be used. > This issue was discovered by Stephen Chu. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3727) When using SPNEGO, NN should not try to log in using KSSL principal
[ https://issues.apache.org/jira/browse/HDFS-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-3727: -- Affects Version/s: 1.1.0 > When using SPNEGO, NN should not try to log in using KSSL principal > --- > > Key: HDFS-3727 > URL: https://issues.apache.org/jira/browse/HDFS-3727 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 1.1.0, 1.1.1, 1.2.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Fix For: 1.2.0 > > Attachments: HDFS-3727.patch > > > When performing a checkpoint with security enabled, the NN will attempt to > relogin from its keytab before making an HTTP request back to the 2NN to > fetch the newly-merged image. However, it always attempts to log in using the > KSSL principal, even if SPNEGO is configured to be used. > This issue was discovered by Stephen Chu. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (HDFS-3727) When using SPNEGO, NN should not try to log in using KSSL principal
[ https://issues.apache.org/jira/browse/HDFS-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta reopened HDFS-3727: --- Can we commit this to branch 1.1 so that the next release can pull it in? Also, a couple of unused imports got left in the class after this patch. > When using SPNEGO, NN should not try to log in using KSSL principal > --- > > Key: HDFS-3727 > URL: https://issues.apache.org/jira/browse/HDFS-3727 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 1.1.0, 1.1.1, 1.2.0 >Reporter: Aaron T. Myers >Assignee: Aaron T. Myers > Fix For: 1.2.0 > > Attachments: HDFS-3727.patch > > > When performing a checkpoint with security enabled, the NN will attempt to > relogin from its keytab before making an HTTP request back to the 2NN to > fetch the newly-merged image. However, it always attempts to log in using the > KSSL principal, even if SPNEGO is configured to be used. > This issue was discovered by Stephen Chu.
[jira] [Commented] (HDFS-4219) Port slive to branch-1
[ https://issues.apache.org/jira/browse/HDFS-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501711#comment-13501711 ] Arpit Gupta commented on HDFS-4219: --- Here is the output from test-patch:
{code}
[exec]
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 78 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] -1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings.
[exec]
[exec] ==
[exec] ==
[exec] Finished build.
[exec] ==
[exec] ==
[exec]
{code}
The Findbugs warnings are not related to this patch. > Port slive to branch-1 > -- > > Key: HDFS-4219 > URL: https://issues.apache.org/jira/browse/HDFS-4219 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 1.1.0 >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4219.branch-1.patch > > > Originally it was committed in HDFS-708 and MAPREDUCE-1804
[jira] [Commented] (HDFS-4219) Port slive to branch-1
[ https://issues.apache.org/jira/browse/HDFS-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501671#comment-13501671 ] Arpit Gupta commented on HDFS-4219: --- I will update the jira with the results of test patch when done. > Port slive to branch-1 > -- > > Key: HDFS-4219 > URL: https://issues.apache.org/jira/browse/HDFS-4219 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 1.1.0 >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4219.branch-1.patch > > > Originally it was committed in HDFS-708 and MAPREDUCE-1804 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4219) Port slive to branch-1
[ https://issues.apache.org/jira/browse/HDFS-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501670#comment-13501670 ] Arpit Gupta commented on HDFS-4219: --- It was a straightforward port, taking the code from trunk (hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/fs/slive) to branch-1. Had to change SliveMapper.java from:
{code}
if (conf.get(MRJobConfig.TASK_ATTEMPT_ID) != null) {
  this.taskId = TaskAttemptID.forName(conf.get(MRJobConfig.TASK_ATTEMPT_ID))
      .getTaskID().getId();
} else {
  // So that branch-1/0.20 can run this same code as well
  this.taskId = TaskAttemptID.forName(conf.get("mapred.task.id"))
      .getTaskID().getId();
}
{code}
removing the if/else block and just making it:
{code}
this.taskId = TaskAttemptID.forName(conf.get("mapred.task.id"))
    .getTaskID().getId();
{code}
since MRJobConfig is not available in branch-1. > Port slive to branch-1 > -- > > Key: HDFS-4219 > URL: https://issues.apache.org/jira/browse/HDFS-4219 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 1.1.0 >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4219.branch-1.patch > > > Originally it was committed in HDFS-708 and MAPREDUCE-1804
[jira] [Updated] (HDFS-4219) Port slive to branch-1
[ https://issues.apache.org/jira/browse/HDFS-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4219: -- Description: Originally it was committed in HDFS-708 and MAPREDUCE-1804 > Port slive to branch-1 > -- > > Key: HDFS-4219 > URL: https://issues.apache.org/jira/browse/HDFS-4219 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 1.1.0 >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4219.branch-1.patch > > > Originally it was committed in HDFS-708 and MAPREDUCE-1804 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4219) Port slive to branch-1
[ https://issues.apache.org/jira/browse/HDFS-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4219: -- Attachment: HDFS-4219.branch-1.patch > Port slive to branch-1 > -- > > Key: HDFS-4219 > URL: https://issues.apache.org/jira/browse/HDFS-4219 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 1.1.0 >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4219.branch-1.patch > > > Originally it was committed in HDFS-708 and MAPREDUCE-1804 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4219) Port slive to branch-1
Arpit Gupta created HDFS-4219: - Summary: Port slive to branch-1 Key: HDFS-4219 URL: https://issues.apache.org/jira/browse/HDFS-4219 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 1.1.0 Reporter: Arpit Gupta
[jira] [Assigned] (HDFS-4219) Port slive to branch-1
[ https://issues.apache.org/jira/browse/HDFS-4219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta reassigned HDFS-4219: - Assignee: Arpit Gupta > Port slive to branch-1 > -- > > Key: HDFS-4219 > URL: https://issues.apache.org/jira/browse/HDFS-4219 > Project: Hadoop HDFS > Issue Type: New Feature >Affects Versions: 1.1.0 >Reporter: Arpit Gupta >Assignee: Arpit Gupta >
[jira] [Commented] (HDFS-4105) the SPNEGO user for secondary namenode should use the web keytab
[ https://issues.apache.org/jira/browse/HDFS-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487237#comment-13487237 ] Arpit Gupta commented on HDFS-4105: --- Patched a secure Hadoop 1.1.0 deployment with the patch and now the secondary namenode is able to log in. Question: if the HTTP principal fails to log in, should we not stop the secondary namenode server? I think we should, as the image calls would fail if the HTTP principal was not available. Let me know and I can log a different JIRA for it. > the SPNEGO user for secondary namenode should use the web keytab > > > Key: HDFS-4105 > URL: https://issues.apache.org/jira/browse/HDFS-4105 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 1.1.0, 2.0.2-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4105.branch-1.patch, HDFS-4105.patch > > > This is similar to HDFS-3466 where we made sure the namenode checks for the > web keytab before it uses the namenode keytab. > The same needs to be done for secondary namenode as well. > {code} > String httpKeytab = > conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY); > if (httpKeytab != null && !httpKeytab.isEmpty()) { > params.put("kerberos.keytab", httpKeytab); > } > {code}
[jira] [Commented] (HDFS-4108) In a secure cluster, in the HDFS WEBUI , clicking on a datanode in the node list , gives an error
[ https://issues.apache.org/jira/browse/HDFS-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484224#comment-13484224 ] Arpit Gupta commented on HDFS-4108: --- What user would they get the delegation token for? Should they not be using SPNEGO and making the client provide Kerberos credentials? > In a secure cluster, in the HDFS WEBUI , clicking on a datanode in the node > list , gives an error > - > > Key: HDFS-4108 > URL: https://issues.apache.org/jira/browse/HDFS-4108 > Project: Hadoop HDFS > Issue Type: Bug > Components: security, webhdfs >Affects Versions: 1.1.0 >Reporter: Benoy Antony >Assignee: Benoy Antony >Priority: Minor > Attachments: HDFS-4108-1-1.patch > > > This issue happens in secure cluster. > To reproduce : > Go to the NameNode WEB UI. (dfshealth.jsp) > Click to bring up the list of LiveNodes (dfsnodelist.jsp) > Click on a datanode to bring up the filesystem web page ( > browsedirectory.jsp) > The page containing the directory listing does not come up.
[jira] [Commented] (HDFS-4105) the SPNEGO user for secondary namenode should use the web keytab
[ https://issues.apache.org/jira/browse/HDFS-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482755#comment-13482755 ] Arpit Gupta commented on HDFS-4105: --- No tests are added as the changes are related to a secure setup. Here is the test-patch output for branch-1:
{code}
[exec] BUILD SUCCESSFUL
[exec] Total time: 5 minutes 0 seconds
[exec]
[exec] -1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no tests are needed for this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] -1 findbugs. The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings.
{code}
Findbugs warnings are not related to this patch. > the SPNEGO user for secondary namenode should use the web keytab > > > Key: HDFS-4105 > URL: https://issues.apache.org/jira/browse/HDFS-4105 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 1.1.0, 2.0.2-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4105.branch-1.patch, HDFS-4105.patch > > > This is similar to HDFS-3466 where we made sure the namenode checks for the > web keytab before it uses the namenode keytab. > The same needs to be done for secondary namenode as well. > {code} > String httpKeytab = > conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY); > if (httpKeytab != null && !httpKeytab.isEmpty()) { > params.put("kerberos.keytab", httpKeytab); > } > {code}
[jira] [Updated] (HDFS-4105) the SPNEGO user for secondary namenode should use the web keytab
[ https://issues.apache.org/jira/browse/HDFS-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4105: -- Attachment: HDFS-4105.patch patch for trunk. > the SPNEGO user for secondary namenode should use the web keytab > > > Key: HDFS-4105 > URL: https://issues.apache.org/jira/browse/HDFS-4105 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 1.1.0, 2.0.2-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4105.branch-1.patch, HDFS-4105.patch > > > This is similar to HDFS-3466 where we made sure the namenode checks for the > web keytab before it uses the namenode keytab. > The same needs to be done for secondary namenode as well. > {code} > String httpKeytab = > conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY); > if (httpKeytab != null && !httpKeytab.isEmpty()) { > params.put("kerberos.keytab", httpKeytab); > } > {code}
[jira] [Updated] (HDFS-4105) the SPNEGO user for secondary namenode should use the web keytab
[ https://issues.apache.org/jira/browse/HDFS-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4105: -- Status: Patch Available (was: Open) > the SPNEGO user for secondary namenode should use the web keytab > > > Key: HDFS-4105 > URL: https://issues.apache.org/jira/browse/HDFS-4105 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.2-alpha, 1.1.0 >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4105.branch-1.patch, HDFS-4105.patch > > > This is similar to HDFS-3466 where we made sure the namenode checks for the > web keytab before it uses the namenode keytab. > The same needs to be done for secondary namenode as well. > {code} > String httpKeytab = > conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY); > if (httpKeytab != null && !httpKeytab.isEmpty()) { > params.put("kerberos.keytab", httpKeytab); > } > {code}
[jira] [Updated] (HDFS-4105) the SPNEGO user for secondary namenode should use the web keytab
[ https://issues.apache.org/jira/browse/HDFS-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4105: -- Attachment: HDFS-4105.branch-1.patch patch for branch-1 > the SPNEGO user for secondary namenode should use the web keytab > > > Key: HDFS-4105 > URL: https://issues.apache.org/jira/browse/HDFS-4105 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 1.1.0, 2.0.2-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4105.branch-1.patch > > > This is similar to HDFS-3466 where we made sure the namenode checks for the > web keytab before it uses the namenode keytab. > The same needs to be done for secondary namenode as well. > {code} > String httpKeytab = > conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY); > if (httpKeytab != null && !httpKeytab.isEmpty()) { > params.put("kerberos.keytab", httpKeytab); > } > {code}
[jira] [Created] (HDFS-4105) the SPNEGO user for secondary namenode should use the web keytab
Arpit Gupta created HDFS-4105: - Summary: the SPNEGO user for secondary namenode should use the web keytab Key: HDFS-4105 URL: https://issues.apache.org/jira/browse/HDFS-4105 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.2-alpha, 1.1.0 Reporter: Arpit Gupta Assignee: Arpit Gupta This is similar to HDFS-3466 where we made sure the namenode checks for the web keytab before it uses the namenode keytab. The same needs to be done for secondary namenode as well.
{code}
String httpKeytab =
    conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY);
if (httpKeytab != null && !httpKeytab.isEmpty()) {
  params.put("kerberos.keytab", httpKeytab);
}
{code}
[jira] [Commented] (HDFS-4084) provide CLI support for allow and disallow snapshot on a directory
[ https://issues.apache.org/jira/browse/HDFS-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480256#comment-13480256 ] Arpit Gupta commented on HDFS-4084: --- @Brandon Can we make the new commands case-insensitive? We can log a different JIRA to make the existing commands case-insensitive as well. > provide CLI support for allow and disallow snapshot on a directory > -- > > Key: HDFS-4084 > URL: https://issues.apache.org/jira/browse/HDFS-4084 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs client, name-node, tools >Affects Versions: HDFS-2802 >Reporter: Brandon Li >Assignee: Brandon Li > Attachments: HDFS-4084.patch > > > To provide CLI support to allow snapshot, disallow snapshot on a directory.
[jira] [Commented] (HDFS-4063) Unable to change JAVA_HOME directory in hadoop-setup-conf.sh script.
[ https://issues.apache.org/jira/browse/HDFS-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477149#comment-13477149 ] Arpit Gupta commented on HDFS-4063: --- These scripts were written to help with setup for the RPMs being generated. Given the discussion on HADOOP-8925, which talks about removing packaging from Hadoop, does it make sense to wait for resolution and close this as won't fix after that? > Unable to change JAVA_HOME directory in hadoop-setup-conf.sh script. > > > Key: HDFS-4063 > URL: https://issues.apache.org/jira/browse/HDFS-4063 > Project: Hadoop HDFS > Issue Type: Bug > Components: scripts, tools >Affects Versions: 1.0.3, 1.1.0, 2.0.2-alpha > Environment: Fedora 17 3.3.4-5.fc17.x86_64t, java version > "1.7.0_06-icedtea", Rackspace Cloud (NextGen) >Reporter: Haoquan Wang >Priority: Minor > Labels: patch > Original Estimate: 1h > Remaining Estimate: 1h > > The JAVA_HOME directory remains unchanged no matter what you enter when you > run hadoop-setup-conf.sh to generate hadoop configurations. Please see below > example: > * > [root@hadoop-slave ~]# /sbin/hadoop-setup-conf.sh > Setup Hadoop Configuration > Where would you like to put config directory? (/etc/hadoop) > Where would you like to put log directory? (/var/log/hadoop) > Where would you like to put pid directory? (/var/run/hadoop) > What is the host of the namenode? (hadoop-slave) > Where would you like to put namenode data directory? > (/var/lib/hadoop/hdfs/namenode) > Where would you like to put datanode data directory? > (/var/lib/hadoop/hdfs/datanode) > What is the host of the jobtracker? (hadoop-slave) > Where would you like to put jobtracker/tasktracker data directory? > (/var/lib/hadoop/mapred) > Where is JAVA_HOME directory? (/usr/java/default) *+/usr/lib/jvm/jre+* > Would you like to create directories/copy conf files to localhost? 
(Y/n) > Review your choices: > Config directory: /etc/hadoop > Log directory : /var/log/hadoop > PID directory : /var/run/hadoop > Namenode host : hadoop-slave > Namenode directory : /var/lib/hadoop/hdfs/namenode > Datanode directory : /var/lib/hadoop/hdfs/datanode > Jobtracker host : hadoop-slave > Mapreduce directory : /var/lib/hadoop/mapred > Task scheduler : org.apache.hadoop.mapred.JobQueueTaskScheduler > JAVA_HOME directory : *+/usr/java/default+* > Create dirs/copy conf files : y > Proceed with generate configuration? (y/N) n > User aborted setup, exiting... > * > Resolution: > Amend line 509 in file /sbin/hadoop-setup-conf.sh > from: > JAVA_HOME=${USER_USER_JAVA_HOME:-$JAVA_HOME} > to: > JAVA_HOME=${USER_JAVA_HOME:-$JAVA_HOME} > will resolve this issue.
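The proposed fix hinges on shell default expansion: `${VAR:-fallback}` yields the fallback whenever `VAR` is unset or empty, and the misspelled `USER_USER_JAVA_HOME` is never set, so the user's answer at the prompt is silently dropped. A minimal sketch of the before/after behavior (the values are illustrative, mirroring the example session above):

```shell
#!/bin/sh
# Illustrative values mirroring the hadoop-setup-conf.sh prompt session.
JAVA_HOME=/usr/java/default      # the script's built-in default
USER_JAVA_HOME=/usr/lib/jvm/jre  # what the user typed at the prompt

# Buggy line 509: expands the misspelled (and therefore always unset)
# USER_USER_JAVA_HOME, so the :- fallback is taken unconditionally.
BUGGY=${USER_USER_JAVA_HOME:-$JAVA_HOME}

# Fixed line: expands the variable the prompt actually populates.
FIXED=${USER_JAVA_HOME:-$JAVA_HOME}

echo "buggy: $BUGGY"   # buggy: /usr/java/default
echo "fixed: $FIXED"   # fixed: /usr/lib/jvm/jre
```

With the one-word fix, an empty answer at the prompt still falls back to `/usr/java/default`, which is exactly what the `:-` operator is for.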
[jira] [Resolved] (HDFS-3977) Incompatible change between hadoop-1 and hadoop-2 when the dfs.hosts and dfs.hosts.exclude files are not present
[ https://issues.apache.org/jira/browse/HDFS-3977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta resolved HDFS-3977. --- Resolution: Invalid Thanks Todd. Resolving it as invalid. > Incompatible change between hadoop-1 and hadoop-2 when the dfs.hosts and > dfs.hosts.exclude files are not present > > > Key: HDFS-3977 > URL: https://issues.apache.org/jira/browse/HDFS-3977 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.2-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > > While testing hadoop-1 and hadoop-2 the following was noticed > if the files in the properties dfs.hosts and dfs.hosts.exclude do not exist > in hadoop-1 namenode format and start went through successfully. > in hadoop-2 we get a file not found exception and both the format and the > namenode start commands fail. > We should be logging a warning in the case when the file is not found so that > we are compatible with hadoop-1
[jira] [Updated] (HDFS-3978) Document backward incompatible changes between hadoop-1.x and 2.x
[ https://issues.apache.org/jira/browse/HDFS-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-3978: -- Description: We should create a new site document to explicitly list down the known incompatible changes between hadoop 1.x and 2.x This will make it easier for users to determine these differences was: The following incompatible changes were noticed between branch-1 and branch-2 caused by HADOOP-8551 1. mkdir would create parent directories in branch-1 if they did not exist. In branch-2 users have to explicitly send mkdir -p 2. Create a multi level dir in branch 1 something like mkdir /test/1 /test would get permissions 755 and /test/1 would get the permissions based on your umask settings however if you run the command in branch-2 mkdir -p /test/1 both /test and /test/1 will get the permissions based on your umask. These are significant changes that we should document. > Document backward incompatible changes between hadoop-1.x and 2.x > - > > Key: HDFS-3978 > URL: https://issues.apache.org/jira/browse/HDFS-3978 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.0.2-alpha >Reporter: Arpit Gupta > > We should create a new site document to explicitly list down the known > incompatible changes between hadoop 1.x and 2.x > This will make it easier for users to determine these differences
[jira] [Updated] (HDFS-3978) Document backward incompatible changes between hadoop-1.x and 2.x
[ https://issues.apache.org/jira/browse/HDFS-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-3978: -- Description: We should create a new site document to explicitly list down the known incompatible changes between hadoop 1.x and 2.x I believe this will make it easier for users to determine all the changes one needs to make when moving from 1.x to 2.x was: We should create a new site document to explicitly list down the known incompatible changes between hadoop 1.x and 2.x This will make it easier for users to determine these differences > Document backward incompatible changes between hadoop-1.x and 2.x > - > > Key: HDFS-3978 > URL: https://issues.apache.org/jira/browse/HDFS-3978 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.0.2-alpha >Reporter: Arpit Gupta > > We should create a new site document to explicitly list down the known > incompatible changes between hadoop 1.x and 2.x > I believe this will make it easier for users to determine all the changes one > needs to make when moving from 1.x to 2.x
[jira] [Commented] (HDFS-3978) Document backward incompatible changes between hadoop-1.x and 2.x
[ https://issues.apache.org/jira/browse/HDFS-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463917#comment-13463917 ] Arpit Gupta commented on HDFS-3978: --- The following incompatible changes were noticed between branch-1 and branch-2, caused by HADOOP-8551:
1. mkdir would create parent directories in branch-1 if they did not exist. In branch-2 users have to explicitly pass mkdir -p.
2. Creating a multi-level dir in branch-1, something like
mkdir /test/1
/test would get permissions 755 and /test/1 would get the permissions based on your umask settings. However, if you run the command in branch-2,
mkdir -p /test/1
both /test and /test/1 will get the permissions based on your umask.
These are significant changes that we should document. > Document backward incompatible changes between hadoop-1.x and 2.x > - > > Key: HDFS-3978 > URL: https://issues.apache.org/jira/browse/HDFS-3978 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.0.2-alpha >Reporter: Arpit Gupta > > The following incompatible changes were noticed between branch-1 and branch-2 > caused by HADOOP-8551 > 1. mkdir would create parent directories in branch-1 if they did not exist. > In branch-2 users have to explicitly send mkdir -p > 2. Create a multi level dir in branch 1 something like > mkdir /test/1 > /test would get permissions 755 and /test/1 would get the permissions based > on your umask settings > however if you run the command in branch-2 > mkdir -p /test/1 > both /test and /test/1 will get the permissions based on your umask. > These are significant changes that we should document.
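The permission difference can be made concrete with the usual umask arithmetic (a sketch, not HDFS code; the helper function is illustrative): a directory created without an explicit mode ends up with 0777 with the umask bits cleared, which is why /test came out as 755 in branch-1 under the common default umask of 022, while in branch-2 every level created by mkdir -p gets this umask-derived mode.

```shell
#!/bin/sh
# Illustrative helper: compute the mode a directory receives for a given
# umask, i.e. 0777 with the umask bits cleared. With the default umask of
# 022 this yields 755 (the mode /test got in branch-1); with a stricter
# umask such as 077, branch-2's behavior would tighten /test to 700 too.
umask_to_dir_mode() {
  # $1: umask in octal, e.g. 022
  printf '%03o\n' $(( 0777 & ~0$1 ))
}

umask_to_dir_mode 022   # 755
umask_to_dir_mode 077   # 700
```

The practical consequence documented above follows directly: under branch-1 only the leaf directory's mode depended on the caller's umask, while under branch-2 every intermediate directory created by mkdir -p does.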
[jira] [Updated] (HDFS-3978) Document backward incompatible changes between hadoop-1.x and 2.x
[ https://issues.apache.org/jira/browse/HDFS-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-3978: -- Summary: Document backward incompatible changes between hadoop-1.x and 2.x (was: Document backward incompatible changes introduced by HADOOP-8551) > Document backward incompatible changes between hadoop-1.x and 2.x > - > > Key: HDFS-3978 > URL: https://issues.apache.org/jira/browse/HDFS-3978 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 2.0.2-alpha >Reporter: Arpit Gupta > > The following incompatible changes were noticed between branch-1 and branch-2 > caused by HADOOP-8551 > 1. mkdir would create parent directories in branch-1 if they did not exist. > In branch-2 users have to explicitly send mkdir -p > 2. Create a multi level dir in branch 1 something like > mkdir /test/1 > /test would get permissions 755 and /test/1 would get the permissions based > on your umask settings > however if you run the command in branch-2 > mkdir -p /test/1 > both /test and /test/1 will get the permissions based on your umask. > These are significant changes that we should document.