[jira] [Commented] (HDFS-4105) the SPNEGO user for secondary namenode should use the web keytab
[ https://issues.apache.org/jira/browse/HDFS-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482158#comment-13482158 ] Hadoop QA commented on HDFS-4105: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550378/HDFS-4105.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3384//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3384//console This message is automatically generated. 
> the SPNEGO user for secondary namenode should use the web keytab > > > Key: HDFS-4105 > URL: https://issues.apache.org/jira/browse/HDFS-4105 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 1.1.0, 2.0.2-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4105.branch-1.patch, HDFS-4105.patch > > > This is similar to HDFS-3466 where we made sure the namenode checks for the > web keytab before it uses the namenode keytab. > The same needs to be done for secondary namenode as well. > {code} > String httpKeytab = > conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY); > if (httpKeytab != null && !httpKeytab.isEmpty()) { > params.put("kerberos.keytab", httpKeytab); > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
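[Editorial note] The quoted snippet only reads the secondary namenode's service keytab key. The HDFS-3466-style fix the issue asks for is to check a web/SPNEGO keytab setting first and fall back to the service keytab. Below is a minimal illustrative sketch of that selection pattern; it uses a plain Map in place of Hadoop's Configuration, and the key names are stand-ins for the real DFSConfigKeys constants, not the actual Hadoop code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch (not the actual Hadoop code): prefer the SPNEGO/web
// keytab setting over the service keytab when both may be configured.
public class KeytabSelection {
    // Hypothetical key names standing in for the DFSConfigKeys constants.
    static final String WEB_KEYTAB_KEY = "dfs.web.authentication.kerberos.keytab";
    static final String SNN_KEYTAB_KEY = "dfs.secondary.namenode.keytab.file";

    static String selectSpnegoKeytab(Map<String, String> conf) {
        // Check the web keytab first, as HDFS-3466 did for the namenode...
        String httpKeytab = conf.get(WEB_KEYTAB_KEY);
        if (httpKeytab != null && !httpKeytab.isEmpty()) {
            return httpKeytab;
        }
        // ...and only fall back to the secondary namenode's service keytab.
        return conf.get(SNN_KEYTAB_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(SNN_KEYTAB_KEY, "/etc/security/snn.keytab");
        // Only the service keytab is set, so it is used as the fallback.
        System.out.println(selectSpnegoKeytab(conf));
        conf.put(WEB_KEYTAB_KEY, "/etc/security/spnego.keytab");
        // Once the web keytab is configured, it takes precedence.
        System.out.println(selectSpnegoKeytab(conf));
    }
}
```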
[jira] [Commented] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
[ https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482131#comment-13482131 ] Jing Zhao commented on HDFS-4067: - testcase failure reported in HDFS-3948 before. Will run TestUnderReplicatedBlocks in loop later. > TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException > --- > > Key: HDFS-4067 > URL: https://issues.apache.org/jira/browse/HDFS-4067 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Jing Zhao > Labels: test-fail > Attachments: HDFS-4067.trunk.001.patch > > > After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061 we can see > the root cause of the failure is ReplicaAlreadyExistsException: > {noformat} > org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block > BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 > already exists in state FINALIZED and thus cannot be created. > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:155) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
[ https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482127#comment-13482127 ] Hadoop QA commented on HDFS-4067: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550126/HDFS-4067.trunk.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHDFS {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3383//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3383//console This message is automatically generated. 
> TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException > --- > > Key: HDFS-4067 > URL: https://issues.apache.org/jira/browse/HDFS-4067 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Jing Zhao > Labels: test-fail > Attachments: HDFS-4067.trunk.001.patch > > > After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061 we can see > the root cause of the failure is ReplicaAlreadyExistsException: > {noformat} > org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block > BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 > already exists in state FINALIZED and thus cannot be created. > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:155) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (HDFS-4097) provide CLI support for create/delete/list snapshots
[ https://issues.apache.org/jira/browse/HDFS-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482123#comment-13482123 ] Suresh Srinivas edited comment on HDFS-4097 at 10/23/12 5:31 AM: - The design document already covers admin operation for allowing snapshots and marking it as snapshottable. Here are the Admin CLIs for that: * Making a directory snapshottable: {{hadoop dfsadmin -allowSnapshot }} * Making a directory not snapshottable: {{hadoop dfsadmin -disallowSnapshot }} User/Admin CLIs for creating, deleting and listing snapshots: * Creating a snapshot: {{hadoop dfs -createSnapshot }} * Deleting a snapshot: {{hadoop dfs –deleteSnapshot }} * Listing of the snapshots: {{hadoop dfs -listSnapshots }} Will try to get an updated document posted by tomorrow with this information. was (Author: sureshms): The design document already covers admin operation for allowing snapshots and marking it as snapshottable. Here are the Admin CLIs for that: * Making a directory snapshottable: {{hadoop dfsadmin -allowSnapshot }} * Making a directory not snapshottable: {{hadoop dfsadmin -disallowSnapshot }} User/Admin CLIs for creating, deleting and listing snapshots: * Creating a snapshot: {{hadoop dfs -createSnapshot }} * Deleting a snapshot: {{hadoop dfs –deleteSnapshot }} * Listing of the snapshots: {{hadoop dfs -listSnapshots}} Will try to get an updated document posted by tomorrow with this information. > provide CLI support for create/delete/list snapshots > > > Key: HDFS-4097 > URL: https://issues.apache.org/jira/browse/HDFS-4097 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs client, name-node >Affects Versions: Snapshot (HDFS-2802) >Reporter: Brandon Li >Assignee: Brandon Li > Labels: needs-test > Attachments: HDFS-4097.patch, HDFS-4097.patch > > > provide CLI support for create/delete/list snapshots -- This message is automatically generated by JIRA. 
[jira] [Comment Edited] (HDFS-4097) provide CLI support for create/delete/list snapshots
[ https://issues.apache.org/jira/browse/HDFS-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482123#comment-13482123 ] Suresh Srinivas edited comment on HDFS-4097 at 10/23/12 5:30 AM: - The design document already covers admin operation for allowing snapshots and marking it as snapshottable. Here are the Admin CLIs for that: * Making a directory snapshottable: {{hadoop dfsadmin -allowSnapshot }} * Making a directory not snapshottable: {{hadoop dfsadmin -disallowSnapshot }} User/Admin CLIs for creating, deleting and listing snapshots: * Creating a snapshot: {{hadoop dfs -createSnapshot }} * Deleting a snapshot: {{hadoop dfs –deleteSnapshot }} * Listing of the snapshots: {{hadoop dfs -listSnapshots}} Will try to get an updated document posted by tomorrow with this information. was (Author: sureshms): The design document already covers admin operation for allowing snapshots and marking it as snapshottable. Here are the Admin CLIs for that: * Making a directory snapshottable: {{hadoop dfsadmin -allowSnapshot }} * Making a directory not snapshottable: {{hadoop dfsadmin -disallowSnapshot }} User/Admin CLIs for creating, deleting and listing snapshots: * Creating a snapshot: {{hadoop dfs -createSnapshot }} * Deleting a snapshot: {{hadoop dfs –deleteSnapshot }} * Listing of the snapshots: {{hadoop dfs -listSnapshot}} Will try to get an updated document posted by tomorrow with this information. > provide CLI support for create/delete/list snapshots > > > Key: HDFS-4097 > URL: https://issues.apache.org/jira/browse/HDFS-4097 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs client, name-node >Affects Versions: Snapshot (HDFS-2802) >Reporter: Brandon Li >Assignee: Brandon Li > Labels: needs-test > Attachments: HDFS-4097.patch, HDFS-4097.patch > > > provide CLI support for create/delete/list snapshots -- This message is automatically generated by JIRA. 
[jira] [Updated] (HDFS-4084) provide CLI support for allow and disallow snapshot on a directory
[ https://issues.apache.org/jira/browse/HDFS-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-4084: -- Labels: needs-test (was: ) > provide CLI support for allow and disallow snapshot on a directory > -- > > Key: HDFS-4084 > URL: https://issues.apache.org/jira/browse/HDFS-4084 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs client, name-node, tools >Affects Versions: Snapshot (HDFS-2802) >Reporter: Brandon Li >Assignee: Brandon Li > Labels: needs-test > Attachments: HDFS-4084.patch, HDFS-4084.patch > > > To provide CLI support to allow snapshot, disallow snapshot on a directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4084) provide CLI support for allow and disallow snapshot on a directory
[ https://issues.apache.org/jira/browse/HDFS-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482125#comment-13482125 ] Suresh Srinivas commented on HDFS-4084: --- Comments: # When you are overriding, there is no need to add javadoc to the method if you plan on inheriting changes from the super class. So the javadoc for the DistributedFileSystem methods you have added can be deleted. # Why are you adding @VisibleForTesting for snapshot-related methods in FSNamesystem? Also please add proper javadoc to the methods. # FSNamesystem.java is unnecessarily importing SnapshotInfo? # NamenodeRpcServer.java: remove the comment "Client Protocol" for overridden protocol methods > provide CLI support for allow and disallow snapshot on a directory > -- > > Key: HDFS-4084 > URL: https://issues.apache.org/jira/browse/HDFS-4084 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs client, name-node, tools >Affects Versions: Snapshot (HDFS-2802) >Reporter: Brandon Li >Assignee: Brandon Li > Labels: needs-test > Attachments: HDFS-4084.patch, HDFS-4084.patch > > > To provide CLI support to allow snapshot, disallow snapshot on a directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4097) provide CLI support for create/delete/list snapshots
[ https://issues.apache.org/jira/browse/HDFS-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482123#comment-13482123 ] Suresh Srinivas commented on HDFS-4097: --- The design document already covers the admin operations for allowing snapshots and marking a directory as snapshottable. Here are the Admin CLIs for that: * Making a directory snapshottable: {{hadoop dfsadmin -allowSnapshot }} * Making a directory not snapshottable: {{hadoop dfsadmin -disallowSnapshot }} User/Admin CLIs for creating, deleting and listing snapshots: * Creating a snapshot: {{hadoop dfs -createSnapshot }} * Deleting a snapshot: {{hadoop dfs -deleteSnapshot }} * Listing of the snapshots: {{hadoop dfs -listSnapshot}} Will try to get an updated document posted by tomorrow with this information. > provide CLI support for create/delete/list snapshots > > > Key: HDFS-4097 > URL: https://issues.apache.org/jira/browse/HDFS-4097 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs client, name-node >Affects Versions: Snapshot (HDFS-2802) >Reporter: Brandon Li >Assignee: Brandon Li > Labels: needs-test > Attachments: HDFS-4097.patch, HDFS-4097.patch > > > provide CLI support for create/delete/list snapshots -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4097) provide CLI support for create/delete/list snapshots
[ https://issues.apache.org/jira/browse/HDFS-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482119#comment-13482119 ] Aaron T. Myers commented on HDFS-4097: -- bq. Aaron, a user should be able create snapshots for the directories he is allowed to create snapshots on. Right, but the question is what directories can a user create snapshots on? Is it a super-user only operation? This is why I asked the following in HDFS-2802: {quote} One question regarding the user experience that I don't see described in the document: will creating a snapshot require super user privileges? Or can any user create a snapshot of a subdirectory? If the latter, what permissions are required to create a snapshot? What if the user doesn't have permissions on some files under the subtree of the snapshot target? Does this result in an incomplete snapshot? Or a completely failed snapshot? {quote} To which you responded: {quote} Will add usecases {quote} So, what's the answer to my question? > provide CLI support for create/delete/list snapshots > > > Key: HDFS-4097 > URL: https://issues.apache.org/jira/browse/HDFS-4097 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs client, name-node >Affects Versions: Snapshot (HDFS-2802) >Reporter: Brandon Li >Assignee: Brandon Li > Labels: needs-test > Attachments: HDFS-4097.patch, HDFS-4097.patch > > > provide CLI support for create/delete/list snapshots -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4097) provide CLI support for create/delete/list snapshots
[ https://issues.apache.org/jira/browse/HDFS-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482116#comment-13482116 ] Suresh Srinivas commented on HDFS-4097: --- Aaron, a user should be able to create snapshots for the directories he is allowed to create snapshots on. Please see HDFS-4084, where admin related commands are being added. > provide CLI support for create/delete/list snapshots > > > Key: HDFS-4097 > URL: https://issues.apache.org/jira/browse/HDFS-4097 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs client, name-node >Affects Versions: Snapshot (HDFS-2802) >Reporter: Brandon Li >Assignee: Brandon Li > Labels: needs-test > Attachments: HDFS-4097.patch, HDFS-4097.patch > > > provide CLI support for create/delete/list snapshots -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4097) provide CLI support for create/delete/list snapshots
[ https://issues.apache.org/jira/browse/HDFS-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482115#comment-13482115 ] Aaron T. Myers commented on HDFS-4097: -- I'm a little skeptical that this should be part of the FileSystem class. Is it expected that creating a snapshot is something that will be done by ordinary users? In a comment in HDFS-2802, I recommended that we make creating and deleting a snapshot a super-user only operation. If that's the case, then I think we should move this functionality to the HdfsAdmin class. > provide CLI support for create/delete/list snapshots > > > Key: HDFS-4097 > URL: https://issues.apache.org/jira/browse/HDFS-4097 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs client, name-node >Affects Versions: Snapshot (HDFS-2802) >Reporter: Brandon Li >Assignee: Brandon Li > Labels: needs-test > Attachments: HDFS-4097.patch, HDFS-4097.patch > > > provide CLI support for create/delete/list snapshots -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4097) provide CLI support for create/delete/list snapshots
[ https://issues.apache.org/jira/browse/HDFS-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482111#comment-13482111 ] Suresh Srinivas commented on HDFS-4097: --- Comments: # FileSystem.java - "doesn't support listSnapshot" - change listSnapshot to listSnapshots # SnapshotInfo.java - this class cannot be private and evolving since it is exposed in FileSystem.java, a class that has audience public. # SnapshotCommand.java #* there are a couple of places where you're using "_" instead of "-" in registerCommand() #* Instead of printing "Snap Name" you could just print "Name" #* I would format the printing based on the "ls" command output. At least the snapshot name should be printed at the end (given its length could vary). What format are you printing the date in? Can you post an example output? # javadoc {{@see ClientProtocol#deleteSnap(String snapshotName, String snapshotRoot)}} - method should be deleteSnapshot # When you are overriding, there is no need to add javadoc to the method if you plan on inheriting changes from the super class. So the javadoc for the DistributedFileSystem methods you have added can be deleted. # FSNamesystem.java #* Why are you adding @VisibleForTesting for snapshot-related methods in FSNamesystem? Also please add proper javadoc to the methods. #* Why not just return new SnapshotInfo[0] from #listSnapshots() > provide CLI support for create/delete/list snapshots > > > Key: HDFS-4097 > URL: https://issues.apache.org/jira/browse/HDFS-4097 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs client, name-node >Affects Versions: Snapshot (HDFS-2802) >Reporter: Brandon Li >Assignee: Brandon Li > Labels: needs-test > Attachments: HDFS-4097.patch, HDFS-4097.patch > > > provide CLI support for create/delete/list snapshots -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile
[ https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482108#comment-13482108 ] Jing Zhao commented on HDFS-4106: - Failing testcases are related to HDFS-3616 (TestWebHdfsWithMultipleNameNodes) and HDFS-4067 (TestUnderReplicatedBlocks). > BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be > declared as volatile > -- > > Key: HDFS-4106 > URL: https://issues.apache.org/jira/browse/HDFS-4106 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Attachments: HDFS-4106-trunk.001.patch > > > All these variables may be assigned/read by a testing thread (through > BPServiceActor#triggerXXX) while also assigned/read by the actor thread. Thus > they should be declared as volatile to make sure the "happens-before" > consistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown
[ https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao reassigned HDFS-3616: --- Assignee: Jing Zhao > TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException > in DN shutdown > -- > > Key: HDFS-3616 > URL: https://issues.apache.org/jira/browse/HDFS-3616 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Jing Zhao > > I have seen this in precommit build #2743 > {noformat} > java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793) > at java.util.HashMap$EntryIterator.next(HashMap.java:834) > at java.util.HashMap$EntryIterator.next(HashMap.java:832) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105) > at > org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324) > at > org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304) > at > org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3616) TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException in DN shutdown
[ https://issues.apache.org/jira/browse/HDFS-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482105#comment-13482105 ] Jing Zhao commented on HDFS-3616: - Also got this exception in HDFS-4106. Seems like the exception happens because a thread is iterating the HashMap bpSlices (FsVolumeImpl#shutdown) while another thread is removing entries from the same HashMap (FsVolumeImpl#shutdownBlockPool). A quick fix would be changing bpSlices from a HashMap to a ConcurrentHashMap. > TestWebHdfsWithMultipleNameNodes fails with ConcurrentModificationException > in DN shutdown > -- > > Key: HDFS-3616 > URL: https://issues.apache.org/jira/browse/HDFS-3616 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node >Affects Versions: 2.0.0-alpha, 3.0.0 >Reporter: Uma Maheswara Rao G > > I have seen this in precommit build #2743 > {noformat} > java.util.ConcurrentModificationException > at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793) > at java.util.HashMap$EntryIterator.next(HashMap.java:834) > at java.util.HashMap$EntryIterator.next(HashMap.java:832) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.shutdown(FsVolumeImpl.java:209) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.shutdown(FsVolumeList.java:168) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.shutdown(FsDatasetImpl.java:1214) > at > org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:1105) > at > org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:1324) > at > org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1304) > at > org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.shutdownCluster(TestWebHdfsWithMultipleNameNodes.java:100) > {noformat} -- This message is automatically generated by JIRA. 
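[Editorial note] The failure mode Jing describes, and the proposed fix, can be shown in a small self-contained sketch. HashMap's iterator is fail-fast: if the map is structurally modified through the map itself mid-iteration, the next call to next() throws ConcurrentModificationException. ConcurrentHashMap's iterators are weakly consistent and tolerate concurrent removal. The names below are illustrative, not the real FsVolumeImpl fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Demonstrates why iterating a plain HashMap while entries are removed
// through the map itself fails fast, while ConcurrentHashMap's weakly
// consistent iterator tolerates the removal.
public class BpSlicesIteration {
    // Returns true if iteration survives an entry removal, false if the
    // fail-fast iterator throws ConcurrentModificationException.
    static boolean iterateWhileRemoving(Map<String, String> bpSlices) {
        bpSlices.put("bp-1", "slice1");
        bpSlices.put("bp-2", "slice2");
        try {
            for (Map.Entry<String, String> e : bpSlices.entrySet()) {
                // Simulates shutdownBlockPool() removing an entry while
                // shutdown() is still iterating over the same map.
                bpSlices.remove("bp-2");
            }
            return true;
        } catch (java.util.ConcurrentModificationException cme) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(iterateWhileRemoving(new HashMap<>()));           // false
        System.out.println(iterateWhileRemoving(new ConcurrentHashMap<>())); // true
    }
}
```

The one-line fix in the comment above amounts to swapping the field's construction from `new HashMap<>()` to `new ConcurrentHashMap<>()`.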
[jira] [Resolved] (HDFS-4062) In branch-1, FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock should print logs outside of the namesystem lock
[ https://issues.apache.org/jira/browse/HDFS-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas resolved HDFS-4062. --- Resolution: Fixed Hadoop Flags: Reviewed I committed the patch to branch-1. Thank you Jing for fixing this. > In branch-1, FSNameSystem#invalidateWorkForOneNode and > FSNameSystem#computeReplicationWorkForBlock should print logs outside of the > namesystem lock > --- > > Key: HDFS-4062 > URL: https://issues.apache.org/jira/browse/HDFS-4062 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Attachments: HDFS-4062.b1.001.patch > > > Similar to HDFS-4052 for trunk, both FSNameSystem#invalidateWorkForOneNode > and FSNameSystem#computeReplicationWorkForBlock in branch-1 should print long > log info level information outside of the namesystem lock. We create this > separate jira since the description and code is different for 1.x. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4062) In branch-1, FSNameSystem#invalidateWorkForOneNode and FSNameSystem#computeReplicationWorkForBlock should print logs outside of the namesystem lock
[ https://issues.apache.org/jira/browse/HDFS-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482101#comment-13482101 ] Suresh Srinivas commented on HDFS-4062: --- test-patch is broken in branch-1. Tests not needed because this patch just moves the logs outside the lock. +1 for the patch. > In branch-1, FSNameSystem#invalidateWorkForOneNode and > FSNameSystem#computeReplicationWorkForBlock should print logs outside of the > namesystem lock > --- > > Key: HDFS-4062 > URL: https://issues.apache.org/jira/browse/HDFS-4062 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Attachments: HDFS-4062.b1.001.patch > > > Similar to HDFS-4052 for trunk, both FSNameSystem#invalidateWorkForOneNode > and FSNameSystem#computeReplicationWorkForBlock in branch-1 should print long > log info level information outside of the namesystem lock. We create this > separate jira since the description and code is different for 1.x. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
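[Editorial note] The pattern this patch applies - doing slow logging I/O only after releasing the lock - can be sketched as follows. This is a minimal illustration, not the branch-1 FSNameSystem code; the class, field, and message strings are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: record log messages while holding the lock, but emit
// them only after releasing it, so logging I/O never runs inside the
// critical section.
public class LogOutsideLock {
    private final Object namesystemLock = new Object();
    private int pendingInvalidations = 3; // stand-in for real shared state

    List<String> computeWork() {
        List<String> deferredLogs = new ArrayList<>();
        synchronized (namesystemLock) {
            // Do the real work under the lock, but only *record* messages.
            for (int i = 0; i < pendingInvalidations; i++) {
                deferredLogs.add("ask datanode to delete block " + i);
            }
        }
        // Lock released: now it is safe to do slow logging I/O.
        for (String msg : deferredLogs) {
            System.out.println("INFO: " + msg);
        }
        return deferredLogs;
    }

    public static void main(String[] args) {
        new LogOutsideLock().computeWork();
    }
}
```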
[jira] [Commented] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile
[ https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482098#comment-13482098 ] Hadoop QA commented on HDFS-4106: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550382/HDFS-4106-trunk.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes org.apache.hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3382//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3382//console This message is automatically generated. 
> BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be > declared as volatile > -- > > Key: HDFS-4106 > URL: https://issues.apache.org/jira/browse/HDFS-4106 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Attachments: HDFS-4106-trunk.001.patch > > > All these variables may be assigned/read by a testing thread (through > BPServiceActor#triggerXXX) while also assigned/read by the actor thread. Thus > they should be declared as volatile to ensure "happens-before" > consistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
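As a hedged illustration of why the patch matters (the class below is an invented stand-in, not the real BPServiceActor), the pattern is a plain field written by one thread and read by another; without volatile, the Java memory model gives the reader no visibility guarantee:

```java
// Sketch of the fix discussed above: without "volatile", a value written by
// the testing thread (via a triggerXxx() call) has no guaranteed visibility
// to the actor thread, and vice versa. This class only mirrors the pattern.
class ActorState {
    volatile long lastHeartbeat;     // read by the actor loop, written by tests
    volatile long lastBlockReport;
    volatile long lastDeletedReport;

    // A testing thread can force an immediate heartbeat by zeroing the stamp;
    // volatile guarantees the actor thread observes the new value.
    void triggerHeartbeat() {
        lastHeartbeat = 0L;
    }
}
```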
[jira] [Updated] (HDFS-4107) Add utility methods to cast INode to INodeFile and INodeFileUnderConstruction
[ https://issues.apache.org/jira/browse/HDFS-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-4107: - Status: Patch Available (was: Open) > Add utility methods to cast INode to INodeFile and INodeFileUnderConstruction > - > > Key: HDFS-4107 > URL: https://issues.apache.org/jira/browse/HDFS-4107 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: h4107_20121022.patch > > > In the namenode code, there are many individual routines checking whether an > inode is null and whether it could be cast to > INodeFile/INodeFileUnderConstruction. Let's add utility methods for such > checks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4107) Add utility methods to cast INode to INodeFile and INodeFileUnderConstruction
[ https://issues.apache.org/jira/browse/HDFS-4107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-4107: - Attachment: h4107_20121022.patch h4107_20121022.patch: adds INodeFile.valueOf(..) and INodeFileUnderConstruction.valueOf(..). > Add utility methods to cast INode to INodeFile and INodeFileUnderConstruction > - > > Key: HDFS-4107 > URL: https://issues.apache.org/jira/browse/HDFS-4107 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: h4107_20121022.patch > > > In the namenode code, there are many individual routines checking whether an > inode is null and whether it could be cast to > INodeFile/INodeFileUnderConstruction. Let's add utility methods for such > checks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4107) Add utility methods to cast INode to INodeFile and INodeFileUnderConstruction
Tsz Wo (Nicholas), SZE created HDFS-4107: Summary: Add utility methods to cast INode to INodeFile and INodeFileUnderConstruction Key: HDFS-4107 URL: https://issues.apache.org/jira/browse/HDFS-4107 Project: Hadoop HDFS Issue Type: Bug Components: name-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE In the namenode code, there are many individual routines checking whether an inode is null and whether it could be cast to INodeFile/INodeFileUnderConstruction. Let's add utility methods for such checks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
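The utility methods being proposed can be sketched roughly as below; the inode types here are simplified stand-ins, and the real signatures in the attached patch may differ:

```java
import java.io.FileNotFoundException;

// Simplified stand-ins for the real inode hierarchy, used only to illustrate
// the valueOf(..) null-check-plus-downcast pattern proposed above.
class INode {}

class INodeFile extends INode {
    // Replaces scattered "inode == null || !(inode instanceof INodeFile)"
    // checks with one utility that throws a descriptive exception.
    static INodeFile valueOf(INode inode, String path) throws FileNotFoundException {
        if (inode == null) {
            throw new FileNotFoundException("File does not exist: " + path);
        }
        if (!(inode instanceof INodeFile)) {
            throw new FileNotFoundException("Path is not a file: " + path);
        }
        return (INodeFile) inode;
    }
}
```

Callers then get one-line validation plus cast, and the error message carries the path instead of a bare NullPointerException or ClassCastException.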
[jira] [Commented] (HDFS-2434) TestNameNodeMetrics.testCorruptBlock fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482076#comment-13482076 ] Suresh Srinivas commented on HDFS-2434: --- Did you run test in a loop to ensure it does not fail? > TestNameNodeMetrics.testCorruptBlock fails intermittently > - > > Key: HDFS-2434 > URL: https://issues.apache.org/jira/browse/HDFS-2434 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 >Reporter: Uma Maheswara Rao G >Assignee: Jing Zhao > Labels: test-fail > Attachments: HDFS-2434.001.patch, HDFS-2434.002.patch, > HDFS-2434.trunk.003.patch, HDFS-2434.trunk.004.patch, > HDFS-2434.trunk.005.patch > > > java.lang.AssertionError: Bad value for metric CorruptBlocks expected:<1> but > was:<0> > at org.junit.Assert.fail(Assert.java:91) > at org.junit.Assert.failNotEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:126) > at org.junit.Assert.assertEquals(Assert.java:470) > at > org.apache.hadoop.test.MetricsAsserts.assertGauge(MetricsAsserts.java:185) > at > org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.__CLR3_0_2t8sh531i1k(TestNameNodeMetrics.java:175) > at > org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics.testCorruptBlock(TestNameNodeMetrics.java:164) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at junit.framework.TestCase.runTest(TestCase.java:168) > at junit.framework.TestCase.runBare(TestCase.java:134) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
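One common way to make such an assertion robust against background work (a block report racing the metric update, in this case) is to poll with a bounded timeout instead of asserting once. A generic, hedged sketch, not the actual test code from the attached patches:

```java
import java.util.function.BooleanSupplier;

// Generic poll-until-true helper of the sort used to deflake assertions that
// race against background work (e.g. block reports updating CorruptBlocks).
class WaitFor {
    static boolean waitFor(BooleanSupplier check, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (check.getAsBoolean()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        return check.getAsBoolean(); // one last try at the deadline
    }
}
```

Looping the whole test, as asked above, catches a flake; polling inside the test removes it.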
[jira] [Commented] (HDFS-4104) dfs -test -d prints inappropriate error on nonexistent directory
[ https://issues.apache.org/jira/browse/HDFS-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482058#comment-13482058 ] Hadoop QA commented on HDFS-4104: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550383/hdfs-4104.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3381//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3381//console This message is automatically generated. > dfs -test -d prints inappropriate error on nonexistent directory > > > Key: HDFS-4104 > URL: https://issues.apache.org/jira/browse/HDFS-4104 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.2-alpha >Reporter: Andy Isaacson >Assignee: Andy Isaacson >Priority: Minor > Attachments: hdfs-4104.txt > > > Running {{hdfs dfs -test -d foo}} should return 0 or 1 as appropriate. It > should not generate any output due to missing files. Alas, it prints an > error message when {{foo}} does not exist. > {code} > $ hdfs dfs -test -d foo; echo $? 
> test: `foo': No such file or directory > 1 > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
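For comparison, /usr/bin/test -d reports purely through its exit status and prints nothing for a missing path. That contract can be illustrated with plain java.io; this is an analogy, not the FsShell implementation:

```java
import java.io.File;

// /usr/bin/test -d semantics: report whether the path is an existing
// directory purely through the return value, printing nothing for a
// missing path.
class TestDir {
    static int testD(String path) {
        return new File(path).isDirectory() ? 0 : 1;
    }
}
```

The patch removes the stderr message so scripts can branch on the exit status without capturing output.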
[jira] [Commented] (HDFS-4045) SecondaryNameNode cannot read from QuorumJournal URI
[ https://issues.apache.org/jira/browse/HDFS-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482022#comment-13482022 ] Hadoop QA commented on HDFS-4045: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550359/hdfs-4045.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3380//console This message is automatically generated. > SecondaryNameNode cannot read from QuorumJournal URI > > > Key: HDFS-4045 > URL: https://issues.apache.org/jira/browse/HDFS-4045 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 3.0.0 >Reporter: Vinithra Varadharajan >Assignee: Andy Isaacson > Attachments: hdfs-4045.txt > > > If HDFS is set up in basic mode (non-HA) with QuorumJournal, and the > dfs.namenode.edits.dir is set to only the QuorumJournal URI and no local dir, > the SecondaryNameNode is unable to do a checkpoint. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482013#comment-13482013 ] Tsz Wo (Nicholas), SZE commented on HDFS-3540: -- > After the default is changed, all existing tests except TestEditLogToleration > ... I should say "except for a few" instead since some Recovery Mode tests run with DFS_NAMENODE_EDITS_TOLERATION_LENGTH_KEY == -1. > Further improvement on recovery mode and edit log toleration in branch-1 > > > Key: HDFS-3540 > URL: https://issues.apache.org/jira/browse/HDFS-3540 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 1.2.0 >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Fix For: 1.2.0 > > Attachments: h3540_20120925.patch, h3540_20120926.patch, > h3540_20120927.patch, h3540_20121009.patch, HDFS-3540-b1.004.patch > > > *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1. However, the > recovery mode feature in branch-1 is dramatically different from the recovery > mode in trunk since the edit log implementations in these two branches are > different. For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not > in trunk. > *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy > UNCHECKED_REGION_LENGTH and to tolerate edit log corruption. > There are overlaps between these two features. We study potential further > improvement in this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE resolved HDFS-3540. -- Resolution: Fixed Fix Version/s: 1.2.0 Hadoop Flags: Reviewed I have committed this. > Further improvement on recovery mode and edit log toleration in branch-1 > > > Key: HDFS-3540 > URL: https://issues.apache.org/jira/browse/HDFS-3540 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 1.2.0 >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Fix For: 1.2.0 > > Attachments: h3540_20120925.patch, h3540_20120926.patch, > h3540_20120927.patch, h3540_20121009.patch, HDFS-3540-b1.004.patch > > > *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1. However, the > recovery mode feature in branch-1 is dramatically different from the recovery > mode in trunk since the edit log implementations in these two branch are > different. For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not > in trunk. > *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy > UNCHECKED_REGION_LENGTH and to tolerate edit log corruption. > There are overlaps between these two features. We study potential further > improvement in this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481992#comment-13481992 ] Tsz Wo (Nicholas), SZE commented on HDFS-3540: -- After the default is changed, all existing tests except TestEditLogToleration run with DFS_NAMENODE_EDITS_TOLERATION_LENGTH_KEY==0. > Further improvement on recovery mode and edit log toleration in branch-1 > > > Key: HDFS-3540 > URL: https://issues.apache.org/jira/browse/HDFS-3540 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 1.2.0 >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: h3540_20120925.patch, h3540_20120926.patch, > h3540_20120927.patch, h3540_20121009.patch, HDFS-3540-b1.004.patch > > > *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1. However, the > recovery mode feature in branch-1 is dramatically different from the recovery > mode in trunk since the edit log implementations in these two branch are > different. For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not > in trunk. > *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy > UNCHECKED_REGION_LENGTH and to tolerate edit log corruption. > There are overlaps between these two features. We study potential further > improvement in this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
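For reference, the constant discussed here corresponds to an hdfs-site.xml property in branch-1. Assuming the property name behind DFS_NAMENODE_EDITS_TOLERATION_LENGTH_KEY is dfs.namenode.edits.toleration.length (an assumption; check DFSConfigKeys in branch-1), the two values mentioned in this thread look like:

```xml
<!-- hdfs-site.xml (branch-1); property name assumed from the config key above -->
<property>
  <name>dfs.namenode.edits.toleration.length</name>
  <!-- 0: the default discussed in this comment;
       -1: the value some Recovery Mode tests use instead -->
  <value>0</value>
</property>
```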
[jira] [Updated] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile
[ https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4106: Status: Patch Available (was: Open) > BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be > declared as volatile > -- > > Key: HDFS-4106 > URL: https://issues.apache.org/jira/browse/HDFS-4106 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Attachments: HDFS-4106-trunk.001.patch > > > All these variables may be assigned/read by a testing thread (through > BPServiceActor#triggerXXX) while also assigned/read by the actor thread. Thus > they should be declared as volatile to make sure the "happens-before" > consistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
[ https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4067: Status: Patch Available (was: Open) > TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException > --- > > Key: HDFS-4067 > URL: https://issues.apache.org/jira/browse/HDFS-4067 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Jing Zhao > Labels: test-fail > Attachments: HDFS-4067.trunk.001.patch > > > After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061 we can see > the root cause of the failure is ReplicaAlreadyExistsException: > {noformat} > org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block > BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 > already exists in state FINALIZED and thus cannot be created. > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:155) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1
[ https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481983#comment-13481983 ] Suresh Srinivas commented on HDFS-3540: --- +1 for the patch. Are there any tests for DFS_NAMENODE_EDITS_TOLERATION_LENGTH_KEY set to 0? > Further improvement on recovery mode and edit log toleration in branch-1 > > > Key: HDFS-3540 > URL: https://issues.apache.org/jira/browse/HDFS-3540 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 1.2.0 >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: h3540_20120925.patch, h3540_20120926.patch, > h3540_20120927.patch, h3540_20121009.patch, HDFS-3540-b1.004.patch > > > *Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1. However, the > recovery mode feature in branch-1 is dramatically different from the recovery > mode in trunk since the edit log implementations in these two branch are > different. For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not > in trunk. > *Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy > UNCHECKED_REGION_LENGTH and to tolerate edit log corruption. > There are overlaps between these two features. We study potential further > improvement in this issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4104) dfs -test -d prints inappropriate error on nonexistent directory
[ https://issues.apache.org/jira/browse/HDFS-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Isaacson updated HDFS-4104: Status: Patch Available (was: Open) > dfs -test -d prints inappropriate error on nonexistent directory > > > Key: HDFS-4104 > URL: https://issues.apache.org/jira/browse/HDFS-4104 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.2-alpha >Reporter: Andy Isaacson >Assignee: Andy Isaacson >Priority: Minor > Attachments: hdfs-4104.txt > > > Running {{hdfs dfs -test -d foo}} should return 0 or 1 as appropriate. It > should not generate any output due to missing files. Alas, it prints an > error message when {{foo}} does not exist. > {code} > $ hdfs dfs -test -d foo; echo $? > test: `foo': No such file or directory > 1 > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4104) dfs -test -d prints inappropriate error on nonexistent directory
[ https://issues.apache.org/jira/browse/HDFS-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Isaacson updated HDFS-4104: Attachment: hdfs-4104.txt Remove code to print errors on file not found from {{dfs -test}}, for feature parity with /usr/bin/test. The existing test frameworks don't provide a way to verify that stderr is clean, so no tests updated. > dfs -test -d prints inappropriate error on nonexistent directory > > > Key: HDFS-4104 > URL: https://issues.apache.org/jira/browse/HDFS-4104 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.2-alpha >Reporter: Andy Isaacson >Assignee: Andy Isaacson >Priority: Minor > Attachments: hdfs-4104.txt > > > Running {{hdfs dfs -test -d foo}} should return 0 or 1 as appropriate. It > should not generate any output due to missing files. Alas, it prints an > error message when {{foo}} does not exist. > {code} > $ hdfs dfs -test -d foo; echo $? > test: `foo': No such file or directory > 1 > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4101) ZKFC should implement zookeeper.recovery.retry like HBase to connect to ZooKeeper
[ https://issues.apache.org/jira/browse/HDFS-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481978#comment-13481978 ] Hadoop QA commented on HDFS-4101: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550352/HDFS-4101-1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3379//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3379//console This message is automatically generated. > ZKFC should implement zookeeper.recovery.retry like HBase to connect to > ZooKeeper > - > > Key: HDFS-4101 > URL: https://issues.apache.org/jira/browse/HDFS-4101 > Project: Hadoop HDFS > Issue Type: Improvement > Components: auto-failover, ha >Affects Versions: 2.0.0-alpha, 3.0.0 > Environment: running CDH4.1.1 >Reporter: Damien Hardy >Assignee: Damien Hardy >Priority: Minor > Labels: newbie > Attachments: HDFS-4101-1.patch > > > When ZKFC starts and ZooKeeper is not yet started, ZKFC fails and stops immediately. 
> Maybe ZKFC should allow some retries on the ZooKeeper service, like HBase does > with zookeeper.recovery.retry. > This particularly happens when I start my whole cluster on VirtualBox, for > example (every component starting at nearly the same time): ZKFC is the only one that fails > and stops. > All the other components can wait for each other for some time, independently of the start order > (NameNode/DataNode/JournalNode/Zookeeper/HBaseMaster/HBaseRS), so that the > system can settle and become stable in a few seconds. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
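What is being requested is a bounded connect-retry loop of the kind HBase drives with zookeeper.recovery.retry. A hedged, ZooKeeper-free sketch of such a loop, where the Callable stands in for the real session setup and all names are invented:

```java
import java.util.concurrent.Callable;

// Retry a connection attempt a fixed number of times before giving up,
// mirroring the zookeeper.recovery.retry behavior described above.
class RetryConnect {
    static <T> T connectWithRetries(Callable<T> connect, int maxRetries, long sleepMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return connect.call();
            } catch (Exception e) {
                last = e;            // e.g. ZooKeeper not up yet at cluster start
                Thread.sleep(sleepMs);
            }
        }
        throw last;                  // exhausted the retry budget
    }
}
```

With such a loop, a ZKFC started alongside ZooKeeper simply waits out the few seconds until the ensemble is reachable instead of exiting.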
[jira] [Updated] (HDFS-4104) dfs -test -d prints inappropriate error on nonexistent directory
[ https://issues.apache.org/jira/browse/HDFS-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Isaacson updated HDFS-4104: Summary: dfs -test -d prints inappropriate error on nonexistent directory (was: dfs -test -d prints inappropriate error on nonexistent directory, and -z should be -s) > dfs -test -d prints inappropriate error on nonexistent directory > > > Key: HDFS-4104 > URL: https://issues.apache.org/jira/browse/HDFS-4104 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.2-alpha >Reporter: Andy Isaacson >Assignee: Andy Isaacson >Priority: Minor > > Running {{hdfs dfs -test -d foo}} should return 0 or 1 as appropriate. It > should not generate any output due to missing files. Alas, it prints an > error message when {{foo}} does not exist. > {code} > $ hdfs dfs -test -d foo; echo $? > test: `foo': No such file or directory > 1 > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile
[ https://issues.apache.org/jira/browse/HDFS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-4106: Attachment: HDFS-4106-trunk.001.patch > BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be > declared as volatile > -- > > Key: HDFS-4106 > URL: https://issues.apache.org/jira/browse/HDFS-4106 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Minor > Attachments: HDFS-4106-trunk.001.patch > > > All these variables may be assigned/read by a testing thread (through > BPServiceActor#triggerXXX) while also assigned/read by the actor thread. Thus > they should be declared as volatile to make sure the "happens-before" > consistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4106) BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile
Jing Zhao created HDFS-4106: --- Summary: BPServiceActor#lastHeartbeat, lastBlockReport and lastDeletedReport should be declared as volatile Key: HDFS-4106 URL: https://issues.apache.org/jira/browse/HDFS-4106 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-4106-trunk.001.patch All these variables may be assigned/read by a testing thread (through BPServiceActor#triggerXXX) while also assigned/read by the actor thread. Thus they should be declared as volatile to make sure the "happens-before" consistency. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4080) Add an option to disable block-level state change logging
[ https://issues.apache.org/jira/browse/HDFS-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481970#comment-13481970 ] Tsz Wo (Nicholas), SZE commented on HDFS-4080: -- Let's put the block-level log in a new BlockManager state change log. Then, it can be set independently of the NN state change log. > Add an option to disable block-level state change logging > - > > Key: HDFS-4080 > URL: https://issues.apache.org/jira/browse/HDFS-4080 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 3.0.0, 2.0.3-alpha >Reporter: Kihwal Lee > > Although the block-level logging in the namenode is useful for debugging, it can > add a significant overhead to busy hdfs clusters since the logging is done while > the namespace write lock is held. One example is shown in HDFS-4075. In this > example, the write lock was held for 5 minutes while logging 11 million log > messages for 5.5 million block invalidation events. > It will be useful if we have an option to disable these block-level log > messages and keep other state change messages going. If others feel that > they can be turned into DEBUG (with the addition of isDebugEnabled() checks), that > may also work, but there might be people depending on the messages. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
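The suggestion above, a block-level state-change log separate from the NN state-change log, can be sketched with java.util.logging; the real Hadoop code uses commons-logging, and the logger names below are only illustrative:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Two independent loggers: silencing the block-level one leaves the
// namespace-level state-change log untouched.
class StateChangeLogs {
    static final Logger stateChangeLog = Logger.getLogger("NameNode.stateChange");
    static final Logger blockStateChangeLog = Logger.getLogger("BlockStateChange");

    static void quietBlockLogging() {
        // Turn off only the block-level messages (e.g. the 11 million
        // invalidation lines from HDFS-4075) without losing other
        // state-change logging.
        blockStateChangeLog.setLevel(Level.OFF);
    }
}
```

Because the two loggers have distinct names, operators can tune them independently in the logging configuration, which is exactly the "set independently" property Nicholas asks for.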
[jira] [Updated] (HDFS-4105) the SPNEGO user for secondary namenode should use the web keytab
[ https://issues.apache.org/jira/browse/HDFS-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4105: -- Attachment: HDFS-4105.patch patch for trunk. > the SPNEGO user for secondary namenode should use the web keytab > > > Key: HDFS-4105 > URL: https://issues.apache.org/jira/browse/HDFS-4105 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 1.1.0, 2.0.2-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4105.branch-1.patch, HDFS-4105.patch > > > This is similar to HDFS-3466 where we made sure the namenode checks for the > web keytab before it uses the namenode keytab. > The same needs to be done for secondary namenode as well. > {code} > String httpKeytab = > conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY); > if (httpKeytab != null && !httpKeytab.isEmpty()) { > params.put("kerberos.keytab", httpKeytab); > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4105) the SPNEGO user for secondary namenode should use the web keytab
[ https://issues.apache.org/jira/browse/HDFS-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4105: -- Status: Patch Available (was: Open) > the SPNEGO user for secondary namenode should use the web keytab > > > Key: HDFS-4105 > URL: https://issues.apache.org/jira/browse/HDFS-4105 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.2-alpha, 1.1.0 >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4105.branch-1.patch, HDFS-4105.patch > > > This is similar to HDFS-3466 where we made sure the namenode checks for the > web keytab before it uses the namenode keytab. > The same needs to be done for secondary namenode as well. > {code} > String httpKeytab = > conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY); > if (httpKeytab != null && !httpKeytab.isEmpty()) { > params.put("kerberos.keytab", httpKeytab); > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4104) dfs -test -d prints inappropriate error on nonexistent directory, and -z should be -s
[ https://issues.apache.org/jira/browse/HDFS-4104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Isaacson updated HDFS-4104: Summary: dfs -test -d prints inappropriate error on nonexistent directory, and -z should be -s (was: dfs -test -d prints inappropriate error on nonexistent directory) > dfs -test -d prints inappropriate error on nonexistent directory, and -z > should be -s > - > > Key: HDFS-4104 > URL: https://issues.apache.org/jira/browse/HDFS-4104 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.0.2-alpha >Reporter: Andy Isaacson >Assignee: Andy Isaacson >Priority: Minor > > Running {{hdfs dfs -test -d foo}} should return 0 or 1 as appropriate. It > should not generate any output due to missing files. Alas, it prints an > error message when {{foo}} does not exist. > {code} > $ hdfs dfs -test -d foo; echo $? > test: `foo': No such file or directory > 1 > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4056) Always start the NN's SecretManager
[ https://issues.apache.org/jira/browse/HDFS-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481942#comment-13481942 ] Kan Zhang commented on HDFS-4056: - {quote} bq. I don't see a use case where SIMPLE + SIMPLE and SIMPLE + TOKEN need to be enabled simultaneously {quote} My above comment should read "I don't see a use case where SIMPLE + SIMPLE and SIMPLE + TOKEN need to be CONFIGURED simultaneously." bq. In the absence of a new config key, the ambiguity introduced by SIMPLE effectively allows token-free operation. That's why I suggested earlier that there should be 2 config keys, one for the initial auth method and one for the subsequent one. And if we have those configs, we can avoid the unnecessary issuing of NN tokens when SIMPLE + SIMPLE is configured. > Always start the NN's SecretManager > --- > > Key: HDFS-4056 > URL: https://issues.apache.org/jira/browse/HDFS-4056 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-4056.patch > > > To support the ability to use tokens regardless of whether kerberos is > enabled, the NN's secret manager should always be started. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4105) the SPNEGO user for secondary namenode should use the web keytab
[ https://issues.apache.org/jira/browse/HDFS-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-4105: -- Attachment: HDFS-4105.branch-1.patch patch for branch-1 > the SPNEGO user for secondary namenode should use the web keytab > > > Key: HDFS-4105 > URL: https://issues.apache.org/jira/browse/HDFS-4105 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 1.1.0, 2.0.2-alpha >Reporter: Arpit Gupta >Assignee: Arpit Gupta > Attachments: HDFS-4105.branch-1.patch > > > This is similar to HDFS-3466 where we made sure the namenode checks for the > web keytab before it uses the namenode keytab. > The same needs to be done for secondary namenode as well. > {code} > String httpKeytab = > conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY); > if (httpKeytab != null && !httpKeytab.isEmpty()) { > params.put("kerberos.keytab", httpKeytab); > } > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4105) the SPNEGO user for secondary namenode should use the web keytab
Arpit Gupta created HDFS-4105: - Summary: the SPNEGO user for secondary namenode should use the web keytab Key: HDFS-4105 URL: https://issues.apache.org/jira/browse/HDFS-4105 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.2-alpha, 1.1.0 Reporter: Arpit Gupta Assignee: Arpit Gupta This is similar to HDFS-3466 where we made sure the namenode checks for the web keytab before it uses the namenode keytab. The same needs to be done for secondary namenode as well. {code} String httpKeytab = conf.get(DFSConfigKeys.DFS_SECONDARY_NAMENODE_KEYTAB_FILE_KEY); if (httpKeytab != null && !httpKeytab.isEmpty()) { params.put("kerberos.keytab", httpKeytab); } {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
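The shape of an HDFS-3466-style fix — prefer the web/SPNEGO keytab when one is configured, and fall back to the daemon's own keytab otherwise — can be sketched as follows. This is a hypothetical helper over plain Strings; the real patch would read the corresponding DFSConfigKeys entries from the Configuration:

```java
class SpnegoKeytabChoice {
    /**
     * Prefer the web (SPNEGO) keytab when it is set; otherwise fall back
     * to the daemon's service keytab. Mirrors the precedence HDFS-3466
     * gave the NameNode; the helper itself is illustrative.
     */
    static String choose(String webKeytab, String serviceKeytab) {
        if (webKeytab != null && !webKeytab.isEmpty()) {
            return webKeytab;
        }
        return serviceKeytab;
    }
}
```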
[jira] [Commented] (HDFS-4056) Always start the NN's SecretManager
[ https://issues.apache.org/jira/browse/HDFS-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481913#comment-13481913 ] Daryn Sharp commented on HDFS-4056: --- bq. {quote}The NN does not allow a UGI auth of token to issue, renew, or cancel tokens.{quote} bq. Since only connections authenticated using the initial auth method(s) are allowed to fetch tokens (I assume we keep that behavior) [...] Yes, that behavior has not changed. bq. [...] the server needs to be able to make a determination on whether a connection is authenticated as an initial connection or a subsequent one. I completely understand the point you are trying to make here. With a secure cluster, a task (subsequent connection) must use DIGEST-MD5 with a token, else it will fail because it lacks a TGT for KERBEROS. The distinction between initial and subsequent connection is unambiguous based on KERBEROS/DIGEST-MD5. That distinction will hold true for SIMPLE/DIGEST-MD5. bq. I don't see a use case where SIMPLE + SIMPLE and SIMPLE + TOKEN need to be enabled simultaneously SIMPLE is a special case where it's ambiguous whether it's an initial or subsequent connection. The server has no way to know, so it's up to the client to "do the right thing". This is where a conf setting, that the job submitter adds, would instruct the RPC client to only use tokens, which would enforce SIMPLE + TOKEN. bq. it is desirable to be able to turn off any token related stuff (we can do that today) In the absence of a new config key, the ambiguity introduced by SIMPLE effectively allows token-free operation. 
> Always start the NN's SecretManager > --- > > Key: HDFS-4056 > URL: https://issues.apache.org/jira/browse/HDFS-4056 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-4056.patch > > > To support the ability to use tokens regardless of whether kerberos is > enabled, the NN's secret manager should always be started. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-4042) send Cache-Control header on JSP pages
[ https://issues.apache.org/jira/browse/HDFS-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur reassigned HDFS-4042: Assignee: Alejandro Abdelnur > send Cache-Control header on JSP pages > -- > > Key: HDFS-4042 > URL: https://issues.apache.org/jira/browse/HDFS-4042 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, name-node >Affects Versions: 2.0.2-alpha >Reporter: Andy Isaacson >Assignee: Alejandro Abdelnur >Priority: Minor > > We should send a Cache-Control header on JSP pages so that HTTP/1.1 compliant > caches can properly manage cached data. > Currently our JSPs send: > {noformat} > % curl -v http://nn1:50070/dfshealth.jsp > ... > < HTTP/1.1 200 OK > < Content-Type: text/html; charset=utf-8 > < Expires: Thu, 01-Jan-1970 00:00:00 GMT > < Set-Cookie: JSESSIONID=xtblchjm7o7j1y1f33r0mpmqp;Path=/ > < Content-Length: 3651 > < Server: Jetty(6.1.26) > {noformat} > Based on a quick reading of RFC 2616 > http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html I think we want to > send {{Cache-Control: private, no-cache}} but I could be wrong. The Jetty > docs http://docs.codehaus.org/display/JETTY/LastModifiedCacheControl indicate > this is fairly straightforward. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4042) send Cache-Control header on JSP pages
[ https://issues.apache.org/jira/browse/HDFS-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481909#comment-13481909 ] Alejandro Abdelnur commented on HDFS-4042: -- IMO all HTTP responses by Hadoop (web UI, JSON, WebHDFS, HFTP, etc) should set headers to disable caching because all these resources are dynamic by nature. On WebHDFS and HFTP specifically, it could be argued that file contents could be cached, but I'd say that proxies will most likely skip caching those resources due to their size. Not to mention security implications, like permissions. > send Cache-Control header on JSP pages > -- > > Key: HDFS-4042 > URL: https://issues.apache.org/jira/browse/HDFS-4042 > Project: Hadoop HDFS > Issue Type: Bug > Components: data-node, name-node >Affects Versions: 2.0.2-alpha >Reporter: Andy Isaacson >Priority: Minor > > We should send a Cache-Control header on JSP pages so that HTTP/1.1 compliant > caches can properly manage cached data. > Currently our JSPs send: > {noformat} > % curl -v http://nn1:50070/dfshealth.jsp > ... > < HTTP/1.1 200 OK > < Content-Type: text/html; charset=utf-8 > < Expires: Thu, 01-Jan-1970 00:00:00 GMT > < Set-Cookie: JSESSIONID=xtblchjm7o7j1y1f33r0mpmqp;Path=/ > < Content-Length: 3651 > < Server: Jetty(6.1.26) > {noformat} > Based on a quick reading of RFC 2616 > http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html I think we want to > send {{Cache-Control: private, no-cache}} but I could be wrong. The Jetty > docs http://docs.codehaus.org/display/JETTY/LastModifiedCacheControl indicate > this is fairly straightforward. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
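Under that reading of RFC 2616, a response filter would add headers along these lines. The values follow the {{Cache-Control: private, no-cache}} suggestion above and are a sketch of the discussed behavior, not a committed change:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NoCacheHeaders {
    // Headers a no-cache response filter might set on dynamic pages such
    // as dfshealth.jsp; the exact directives are still being discussed
    // in the JIRA, so treat these values as provisional.
    static Map<String, String> headers() {
        Map<String, String> h = new LinkedHashMap<>();
        h.put("Cache-Control", "private, no-cache");
        h.put("Expires", "Thu, 01 Jan 1970 00:00:00 GMT"); // Jetty already sends this
        return h;
    }
}
```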
[jira] [Updated] (HDFS-4084) provide CLI support for allow and disallow snapshot on a directory
[ https://issues.apache.org/jira/browse/HDFS-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-4084: - Attachment: HDFS-4084.patch Re-based the patch and addressed Arpit's comment. > provide CLI support for allow and disallow snapshot on a directory > -- > > Key: HDFS-4084 > URL: https://issues.apache.org/jira/browse/HDFS-4084 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs client, name-node, tools >Affects Versions: Snapshot (HDFS-2802) >Reporter: Brandon Li >Assignee: Brandon Li > Attachments: HDFS-4084.patch, HDFS-4084.patch > > > To provide CLI support to allow snapshot, disallow snapshot on a directory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481906#comment-13481906 ] Tsz Wo (Nicholas), SZE commented on HDFS-2802: -- {quote} > I don't think that a solution which copies all of the files/directories > that are being snapshotted should be merged to trunk. I disagree. But we may just end up doing optimization prior to merge. {quote} I understand that O(1) snapshot creation is desirable but I don't think it is a requirement. Currently, the snapshot feature is missing in HDFS. Having snapshots with O(N) snapshot creation already opens the door of opportunity to many applications. Only the applications requiring O(1) snapshot creation should wait for the improvement, but not everyone has to wait for it. > Support for RW/RO snapshots in HDFS > --- > > Key: HDFS-2802 > URL: https://issues.apache.org/jira/browse/HDFS-2802 > Project: Hadoop HDFS > Issue Type: New Feature > Components: data-node, name-node >Reporter: Hari Mankude >Assignee: Hari Mankude > Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf > > > Snapshots are point in time images of parts of the filesystem or the entire > filesystem. Snapshots can be a read-only or a read-write point in time copy > of the filesystem. There are several use cases for snapshots in HDFS. I will > post a detailed write-up soon with more information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4097) provide CLI support for create/delete/list snapshots
[ https://issues.apache.org/jira/browse/HDFS-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-4097: - Attachment: HDFS-4097.patch Thanks. New patch is uploaded. > provide CLI support for create/delete/list snapshots > > > Key: HDFS-4097 > URL: https://issues.apache.org/jira/browse/HDFS-4097 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs client, name-node >Affects Versions: Snapshot (HDFS-2802) >Reporter: Brandon Li >Assignee: Brandon Li > Labels: needs-test > Attachments: HDFS-4097.patch, HDFS-4097.patch > > > provide CLI support for create/delete/list snapshots -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481853#comment-13481853 ] Suresh Srinivas commented on HDFS-2802: --- bq. Great, I'm glad you agree. I have agreed many times :-) [here|https://issues.apache.org/jira/browse/HDFS-2802?focusedCommentId=13480547&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13480547] and [here|https://issues.apache.org/jira/browse/HDFS-2802?focusedCommentId=13481632&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13481632]. I am happy to get beyond that issue :-) Thanks for the offer to help. I think for now we are on a path to get it done very soon. bq. To be completely clear, if it makes sense to implement a copying solution as an interim development stage Sounds good. I misunderstood you to be saying the opposite of that [here|https://issues.apache.org/jira/browse/HDFS-2802?focusedCommentId=13481682&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13481682] bq. I don't think that a solution which copies all of the files/directories that are being snapshotted should be merged to trunk. I disagree. But we may just end up doing optimization prior to merge. > Support for RW/RO snapshots in HDFS > --- > > Key: HDFS-2802 > URL: https://issues.apache.org/jira/browse/HDFS-2802 > Project: Hadoop HDFS > Issue Type: New Feature > Components: data-node, name-node >Reporter: Hari Mankude >Assignee: Hari Mankude > Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf > > > Snapshots are point in time images of parts of the filesystem or the entire > filesystem. Snapshots can be a read-only or a read-write point in time copy > of the filesystem. There are several use cases for snapshots in HDFS. I will > post a detailed write-up soon with more information. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4045) SecondaryNameNode cannot read from QuorumJournal URI
[ https://issues.apache.org/jira/browse/HDFS-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Isaacson updated HDFS-4045: Status: Patch Available (was: Open) It took a fair bit of work to generalize the getRemoteEdits and streaming code, but this implementation seems to work OK and passes my initial sniff test. I'm skeptical that streaming the edits from the JournalNode through the NameNode to the SNN is a robust solution both in error-resilience and performance, but I think it's time to post a rough cut and see how it looks. > SecondaryNameNode cannot read from QuorumJournal URI > > > Key: HDFS-4045 > URL: https://issues.apache.org/jira/browse/HDFS-4045 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 3.0.0 >Reporter: Vinithra Varadharajan >Assignee: Andy Isaacson > Attachments: hdfs-4045.txt > > > If HDFS is set up in basic mode (non-HA) with QuorumJournal, and the > dfs.namenode.edits.dir is set to only the QuorumJournal URI and no local dir, > the SecondaryNameNode is unable to do a checkpoint. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4045) SecondaryNameNode cannot read from QuorumJournal URI
[ https://issues.apache.org/jira/browse/HDFS-4045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Isaacson updated HDFS-4045: Attachment: hdfs-4045.txt > SecondaryNameNode cannot read from QuorumJournal URI > > > Key: HDFS-4045 > URL: https://issues.apache.org/jira/browse/HDFS-4045 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 3.0.0 >Reporter: Vinithra Varadharajan >Assignee: Andy Isaacson > Attachments: hdfs-4045.txt > > > If HDFS is set up in basic mode (non-HA) with QuorumJournal, and the > dfs.namenode.edits.dir is set to only the QuorumJournal URI and no local dir, > the SecondaryNameNode is unable to do a checkpoint. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-4104) dfs -test -d prints inappropriate error on nonexistent directory
Andy Isaacson created HDFS-4104: --- Summary: dfs -test -d prints inappropriate error on nonexistent directory Key: HDFS-4104 URL: https://issues.apache.org/jira/browse/HDFS-4104 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.2-alpha Reporter: Andy Isaacson Assignee: Andy Isaacson Priority: Minor Running {{hdfs dfs -test -d foo}} should return 0 or 1 as appropriate. It should not generate any output due to missing files. Alas, it prints an error message when {{foo}} does not exist. {code} $ hdfs dfs -test -d foo; echo $? test: `foo': No such file or directory 1 {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-4101) ZKFC should implement zookeeper.recovery.retry like HBase to connect to ZooKeeper
[ https://issues.apache.org/jira/browse/HDFS-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers reassigned HDFS-4101: Assignee: Damien Hardy > ZKFC should implement zookeeper.recovery.retry like HBase to connect to > ZooKeeper > - > > Key: HDFS-4101 > URL: https://issues.apache.org/jira/browse/HDFS-4101 > Project: Hadoop HDFS > Issue Type: Improvement > Components: auto-failover, ha >Affects Versions: 2.0.0-alpha, 3.0.0 > Environment: running CDH4.1.1 >Reporter: Damien Hardy >Assignee: Damien Hardy >Priority: Minor > Labels: newbie > Attachments: HDFS-4101-1.patch > > > When ZKFC starts and ZooKeeper is not yet started, ZKFC fails and stops directly. > Maybe ZKFC should allow some retries on ZooKeeper services like HBase does > with zookeeper.recovery.retry > This particularly happens when I start my whole cluster on VirtualBox, for > example (every component nearly at the same time); ZKFC is the only one that fails > and stops ... > All the others can wait for each other for some time independently of the start order, > like NameNode/DataNode/JournalNode/Zookeeper/HBaseMaster/HBaseRS, so that the > system can settle and be stable in a few seconds -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481840#comment-13481840 ] Aaron T. Myers commented on HDFS-2802: -- bq. That is the goal. Great, I'm glad you agree. I'd be happy to personally work on a more time/space efficient snapshot solution. bq. Where I disagree is with the assertion that we cannot get to it starting with a simple implementation. I would like to get the simple implementation done ASAP with all the tests developed in parallel. Then we will change the implementation for better efficiency both in time and space. Tests will help ensure the correctness of optimizations. I suspect you'll find that implementing the copying solution will not be much easier than a zero-copy/COW solution, and so it seems to me that implementing the copying solution will result in a lot of wasted development/testing work. To be completely clear, if it makes sense to implement a copying solution as an interim development stage, then that's fine, but I don't think that a solution which copies all of the files/directories that are being snapshotted should be merged to trunk. > Support for RW/RO snapshots in HDFS > --- > > Key: HDFS-2802 > URL: https://issues.apache.org/jira/browse/HDFS-2802 > Project: Hadoop HDFS > Issue Type: New Feature > Components: data-node, name-node >Reporter: Hari Mankude >Assignee: Hari Mankude > Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf > > > Snapshots are point in time images of parts of the filesystem or the entire > filesystem. Snapshots can be a read-only or a read-write point in time copy > of the filesystem. There are several use cases for snapshots in HDFS. I will > post a detailed write-up soon with more information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3649) Port HDFS-385 to branch-1-win
[ https://issues.apache.org/jira/browse/HDFS-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-3649: -- Fix Version/s: 1-win > Port HDFS-385 to branch-1-win > - > > Key: HDFS-3649 > URL: https://issues.apache.org/jira/browse/HDFS-3649 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1-win >Reporter: Sumadhur Reddy Bolli >Assignee: Sumadhur Reddy Bolli > Fix For: 1-win > > > Added a patch to HDFS-385 to port the existing pluggable placement policy to > branch-1-win -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4101) ZKFC should implement zookeeper.recovery.retry like HBase to connect to ZooKeeper
[ https://issues.apache.org/jira/browse/HDFS-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Hardy updated HDFS-4101: --- Release Note: Make ZK connection retries also available at startup. Status: Patch Available (was: Open) > ZKFC should implement zookeeper.recovery.retry like HBase to connect to > ZooKeeper > - > > Key: HDFS-4101 > URL: https://issues.apache.org/jira/browse/HDFS-4101 > Project: Hadoop HDFS > Issue Type: Improvement > Components: auto-failover, ha >Affects Versions: 2.0.0-alpha, 3.0.0 > Environment: running CDH4.1.1 >Reporter: Damien Hardy >Priority: Minor > Labels: newbie > Attachments: HDFS-4101-1.patch > > > When ZKFC starts and ZooKeeper is not yet started, ZKFC fails and stops directly. > Maybe ZKFC should allow some retries on ZooKeeper services like HBase does > with zookeeper.recovery.retry > This particularly happens when I start my whole cluster on VirtualBox, for > example (every component nearly at the same time); ZKFC is the only one that fails > and stops ... > All the others can wait for each other for some time independently of the start order, > like NameNode/DataNode/JournalNode/Zookeeper/HBaseMaster/HBaseRS, so that the > system can settle and be stable in a few seconds -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4101) ZKFC should implement zookeeper.recovery.retry like HBase to connect to ZooKeeper
[ https://issues.apache.org/jira/browse/HDFS-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Hardy updated HDFS-4101: --- Attachment: HDFS-4101-1.patch First proposed patch: * uses the existing code implementing retries * keeps the existing hardcoded limit of NUM_RETRIES = 3 > ZKFC should implement zookeeper.recovery.retry like HBase to connect to > ZooKeeper > - > > Key: HDFS-4101 > URL: https://issues.apache.org/jira/browse/HDFS-4101 > Project: Hadoop HDFS > Issue Type: Improvement > Components: auto-failover, ha >Affects Versions: 2.0.0-alpha, 3.0.0 > Environment: running CDH4.1.1 >Reporter: Damien Hardy >Priority: Minor > Labels: newbie > Attachments: HDFS-4101-1.patch > > > When ZKFC starts and ZooKeeper is not yet started, ZKFC fails and stops directly. > Maybe ZKFC should allow some retries on ZooKeeper services like HBase does > with zookeeper.recovery.retry > This particularly happens when I start my whole cluster on VirtualBox, for > example (every component nearly at the same time); ZKFC is the only one that fails > and stops ... > All the others can wait for each other for some time independently of the start order, > like NameNode/DataNode/JournalNode/Zookeeper/HBaseMaster/HBaseRS, so that the > system can settle and be stable in a few seconds -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
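The bounded-retry behavior the patch describes — reuse of existing retry code with a hardcoded NUM_RETRIES = 3 — can be sketched like this. The connection logic is stubbed out with a BooleanSupplier and all names are illustrative, not the actual ZKFC code:

```java
import java.util.function.BooleanSupplier;

class StartupRetry {
    static final int NUM_RETRIES = 3; // hardcoded limit, as in the patch

    /**
     * Attempt to connect up to NUM_RETRIES times, so a ZKFC-like daemon
     * can survive ZooKeeper starting slightly later than it does.
     * Returns true as soon as one attempt succeeds, false when all fail.
     */
    static boolean connectWithRetries(BooleanSupplier connect) {
        for (int attempt = 1; attempt <= NUM_RETRIES; attempt++) {
            if (connect.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }
}
```

A production version would also sleep between attempts; the loop above keeps only the retry-count behavior the JIRA discusses.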
[jira] [Commented] (HDFS-3990) NN's health report has severe performance problems
[ https://issues.apache.org/jira/browse/HDFS-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481804#comment-13481804 ] Hadoop QA commented on HDFS-3990: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550340/HDFS-3990.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/3378//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3378//console This message is automatically generated. 
> NN's health report has severe performance problems > -- > > Key: HDFS-3990 > URL: https://issues.apache.org/jira/browse/HDFS-3990 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Attachments: HDFS-3990.branch-0.23.patch, HDFS-3990.patch, > HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, > HDFS-3990.patch, HDFS-3990.patch, hdfs-3990.txt, hdfs-3990.txt > > > The dfshealth page will place a read lock on the namespace while it does a > dns lookup for every DN. On a multi-thousand node cluster, this often > results in 10s+ load time for the health page. 10 concurrent requests were > found to cause 7m+ load times during which time write operations blocked. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4056) Always start the NN's SecretManager
[ https://issues.apache.org/jira/browse/HDFS-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481799#comment-13481799 ] Kan Zhang commented on HDFS-4056: - {quote} bq. Bottom line is the server should always be able to figure out by itself whether a connection is an initial connection or a subsequent one, based on the auth method (and type of credentials) used, since it needs to decide on whether tokens can be issued for that connection. The server already uses the auth the client sends in the rpc connection header to determine the sasl method the client wants to use. The auth to the server then determines the UGI's auth. The NN does not allow a UGI auth of token to issue, renew, or cancel tokens. {quote} I don't think you get my point. It was a general comment. Since only connections authenticated using the initial auth method(s) are allowed to fetch tokens (I assume we keep that behavior), the server needs to be able to make a determination on whether a connection is authenticated as an initial connection or a subsequent one. For example, if we were to support SIMPLE + TOKEN and SIMPLE + SIMPLE simultaneously (I think not), how could the server decide whether a connection authenticated with SIMPLE is an initial connection or not? bq. If we want to allow compatibility with older clients, then both SIMPLE + SIMPLE and SIMPLE + TOKEN must both be supported. Enabling the option of SIMPLE + TOKEN means we need the secret manager enabled which is the aim of this patch. I don't see a use case where SIMPLE + SIMPLE and SIMPLE + TOKEN need to be enabled simultaneously. Can you elaborate? On the other hand, in the SIMPLE + SIMPLE use case I explained above, it is desirable to be able to turn off any token related stuff (we can do that today). 
> Always start the NN's SecretManager > --- > > Key: HDFS-4056 > URL: https://issues.apache.org/jira/browse/HDFS-4056 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-4056.patch > > > To support the ability to use tokens regardless of whether kerberos is > enabled, the NN's secret manager should always be started. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4091) Add snapshot quota to limit the number of snapshots
[ https://issues.apache.org/jira/browse/HDFS-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-4091: - Description: Once a directory has been set to snapshottable, users could create snapshots of the directories. In this JIRA, we add a quota to snapshottable directories. The quota is set by admin in order to limit the number of snapshots allowed. (was: For each snapshottable directory, add a quota to limit the number of snapshots of the directory.) > Add snapshot quota to limit the number of snapshots > --- > > Key: HDFS-4091 > URL: https://issues.apache.org/jira/browse/HDFS-4091 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: h4091_20121021.patch > > > Once a directory has been set to snapshottable, users could create snapshots > of the directories. In this JIRA, we add a quota to snapshottable > directories. The quota is set by admin in order to limit the number of > snapshots allowed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
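The quota described here amounts to a simple guard at snapshot-creation time: reject creation once the admin-set limit is reached. A hypothetical sketch (names and the exception type are illustrative, not the HDFS-2802 branch code):

```java
class SnapshotQuotaSketch {
    /**
     * Reject snapshot creation once a snapshottable directory already
     * holds as many snapshots as its admin-set quota allows.
     */
    static void checkQuota(int existingSnapshots, int quota) {
        if (existingSnapshots >= quota) {
            throw new IllegalStateException(
                "snapshot quota of " + quota + " exceeded");
        }
    }
}
```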
[jira] [Commented] (HDFS-4091) Add snapshot quota to limit the number of snapshots
[ https://issues.apache.org/jira/browse/HDFS-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481787#comment-13481787 ] Tsz Wo (Nicholas), SZE commented on HDFS-4091: -- Sure, updated the description.
[jira] [Commented] (HDFS-4091) Add snapshot quota to limit the number of snapshots
[ https://issues.apache.org/jira/browse/HDFS-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481765#comment-13481765 ] Suresh Srinivas commented on HDFS-4091: --- Can you please add more details to the description? > Add snapshot quota to limit the number of snapshots > --- > > Key: HDFS-4091 > URL: https://issues.apache.org/jira/browse/HDFS-4091 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Attachments: h4091_20121021.patch > > > For each snapshottable directory, add a quota to limit the number of > snapshots of the directory.
[jira] [Updated] (HDFS-4091) Add snapshot quota to limit the number of snapshots
[ https://issues.apache.org/jira/browse/HDFS-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-4091: -- Description: For each snapshottable directory, add a quota to limit the number of snapshots of the directory. (was: For each snapshottable directory, add a quote to limit the number of snapshots of the directory.)
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481758#comment-13481758 ] Suresh Srinivas commented on HDFS-2802: --- bq. This sort of fundamental design decision is not something that can be easily improved incrementally. Copying huge portions of the working set, and then making that copying fast and space efficient, should not be the goal. The goal should be to entirely avoid copying huge portions of the working set. That is the goal. Where I disagree is with the assertion that we cannot get to it starting with a simple implementation. I would like to get the simple implementation done ASAP with all the tests developed in parallel. Then we will change the implementation for better efficiency in both time and space. Tests will help ensure the correctness of optimizations. We already have a bunch of ideas on how to do it. Some of those ideas are in the early prototype that Hari has uploaded. HDFS-4103 has been created to allay these concerns, which have been repeatedly brought up, and to convey that it is the goal of HDFS-2802 to have optimized snapshots. Will update the design document as a part of that jira. > Support for RW/RO snapshots in HDFS > --- > > Key: HDFS-2802 > URL: https://issues.apache.org/jira/browse/HDFS-2802 > Project: Hadoop HDFS > Issue Type: New Feature > Components: data-node, name-node >Reporter: Hari Mankude >Assignee: Hari Mankude > Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf > > > Snapshots are point in time images of parts of the filesystem or the entire > filesystem. Snapshots can be a read-only or a read-write point in time copy > of the filesystem. There are several use cases for snapshots in HDFS. I will > post a detailed write-up soon with more information.
[jira] [Assigned] (HDFS-4103) Support O(1) snapshot creation
[ https://issues.apache.org/jira/browse/HDFS-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE reassigned HDFS-4103: Assignee: Tsz Wo (Nicholas), SZE > Support O(1) snapshot creation > -- > > Key: HDFS-4103 > URL: https://issues.apache.org/jira/browse/HDFS-4103 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > > In our first snapshot implementation, snapshot creation runs in O(N) and > occupies O(N) memory space, where N = # files + # directories + # symlinks in > the snapshot. The advantages of the implementation are that there is no > additional cost for the modifications after snapshots are created, and it > leads to a simple implementation. > In this JIRA, we optimize snapshot creation to O(1) although it introduces > additional cost in the modifications after snapshots are created.
[jira] [Created] (HDFS-4103) Support O(1) snapshot creation
Tsz Wo (Nicholas), SZE created HDFS-4103: Summary: Support O(1) snapshot creation Key: HDFS-4103 URL: https://issues.apache.org/jira/browse/HDFS-4103 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Reporter: Tsz Wo (Nicholas), SZE In our first snapshot implementation, snapshot creation runs in O(N) and occupies O(N) memory space, where N = # files + # directories + # symlinks in the snapshot. The advantages of the implementation are that there is no additional cost for the modifications after snapshots are created, and it leads to a simple implementation. In this JIRA, we optimize snapshot creation to O(1) although it introduces additional cost in the modifications after snapshots are created.
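The O(N)-versus-O(1) trade-off described above can be illustrated with a toy copy-on-write sketch. This is not HDFS code and all names are hypothetical: snapshot creation records only an empty diff (O(1)), and an inode's old value is copied into the latest diff the first time it is modified afterwards, shifting the cost from creation to modification.

```java
// Toy copy-on-write snapshot model (illustrative only, not HDFS code):
// createSnapshot() is O(1); total snapshot memory is O(M), where M is
// the number of paths modified after the snapshot was taken.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LazySnapshotFs {
    private final Map<String, String> current = new HashMap<>();  // path -> data
    // Per-snapshot diffs: only inodes modified after the snapshot are stored.
    private final List<Map<String, String>> snapshotDiffs = new ArrayList<>();

    void write(String path, String data) {
        // Copy-on-write: preserve the pre-modification value in the
        // latest snapshot's diff before overwriting it.
        if (!snapshotDiffs.isEmpty()) {
            Map<String, String> diff = snapshotDiffs.get(snapshotDiffs.size() - 1);
            if (!diff.containsKey(path) && current.containsKey(path)) {
                diff.put(path, current.get(path));
            }
        }
        current.put(path, data);
    }

    // O(1): no per-inode work at creation time.
    int createSnapshot() {
        snapshotDiffs.add(new HashMap<>());
        return snapshotDiffs.size() - 1;
    }

    // Read a path as of snapshot `id`: the first diff at or after `id`
    // that recorded the path holds its value at snapshot time.
    String readAt(int id, String path) {
        for (int i = id; i < snapshotDiffs.size(); i++) {
            if (snapshotDiffs.get(i).containsKey(path)) {
                return snapshotDiffs.get(i).get(path);
            }
        }
        return current.get(path);
    }
}
```

An eager implementation would instead deep-copy all N inodes inside createSnapshot(), which is the O(N) first-phase behavior the JIRA wants to optimize away.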
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481747#comment-13481747 ] Tsz Wo (Nicholas), SZE commented on HDFS-2802: -- > ... but it is certainly possible to have an O(1) solution in terms of the > number of files/directories that are not modified. ... I think the meaning of O(1) above is a very special use of the big-O notation. I would rather keep saying - O(N), where N = # files + # directories + # symlinks; and - O(M), where M = # modified files + # modified directories + # modified symlinks. I agree that an O(M) implementation (i.e. including O(1) snapshot creation) is our end goal.
[jira] [Updated] (HDFS-3990) NN's health report has severe performance problems
[ https://issues.apache.org/jira/browse/HDFS-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-3990: -- Attachment: HDFS-3990.patch No longer init {{peerHostName}} to the DN's registration hostname. Check for null when building the list of node names to filter. I again looked into removing the null check on {{Server.getRemoteAddress}}. The tests that call directly into the rpc server object, rather than via a connection, appear to be passing mock DN registrations. So the majority of functional tests are matching real cluster behavior. I tried having the rpc server set the ip/peerHostName, but some of the tests are verifying that the layout and version checks work. So I tried to push those checks down into {{FSNamesystem#registerDatanode}}, but that method isn't exposed for the tests to call. If this patch is ok, I'll update the 23 patch. > NN's health report has severe performance problems > -- > > Key: HDFS-3990 > URL: https://issues.apache.org/jira/browse/HDFS-3990 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Attachments: HDFS-3990.branch-0.23.patch, HDFS-3990.patch, > HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, HDFS-3990.patch, > HDFS-3990.patch, HDFS-3990.patch, hdfs-3990.txt, hdfs-3990.txt > > > The dfshealth page will place a read lock on the namespace while it does a > DNS lookup for every DN. On a multi-thousand node cluster, this often > results in 10s+ load time for the health page. 10 concurrent requests were > found to cause 7m+ load times during which time write operations blocked.
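The performance problem in the description comes from doing a slow DNS lookup per datanode while holding the namespace read lock, which blocks writers. A minimal sketch of the mitigation pattern, with hypothetical types (this is not the actual NameNode code): copy the node list under the lock, then resolve hostnames after releasing it.

```java
// Hypothetical sketch of the locking pattern: hold the namespace read
// lock only long enough to copy the datanode list; do the slow per-node
// work (e.g. reverse DNS) with no lock held.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class HealthReport {
    private final ReadWriteLock namespaceLock = new ReentrantReadWriteLock();
    private final List<String> datanodeAddrs = new ArrayList<>();

    void registerDatanode(String addr) {
        namespaceLock.writeLock().lock();
        try {
            datanodeAddrs.add(addr);
        } finally {
            namespaceLock.writeLock().unlock();
        }
    }

    List<String> buildReport() {
        List<String> snapshot;
        namespaceLock.readLock().lock();
        try {
            snapshot = new ArrayList<>(datanodeAddrs);  // cheap copy under lock
        } finally {
            namespaceLock.readLock().unlock();
        }
        // Slow lookups happen unlocked, so writers are never blocked here.
        List<String> report = new ArrayList<>();
        for (String addr : snapshot) {
            report.add(resolve(addr));
        }
        return report;
    }

    // Stand-in for the expensive reverse-DNS lookup.
    private String resolve(String addr) { return "host-for-" + addr; }
}
```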
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481682#comment-13481682 ] Aaron T. Myers commented on HDFS-2802: -- bq. Why is starting with a simple implementation and then optimizing it later not a choice? This sort of fundamental design decision is not something that can be easily improved incrementally. Copying huge portions of the working set, and then making that copying fast and space efficient, should not be the goal. The goal should be to entirely avoid copying huge portions of the working set. bq. O(1) memory usage in general does not seem possible since the original files/directories could be modified. So the best case is O(N) memory usage in general. However, it is possible to have O(1) memory usage at snapshot creation. I agree that it's not possible to have an O(1) solution in terms of the number of files/directories that are modified, but it is certainly possible to have an O(1) solution in terms of the number of files/directories that are _not_ modified. That's the issue I'm concerned about, and as far as I can tell that is not what is proposed by this design document. bq. For small subtrees, i.e. when N is small, it does not matter if it is O(1) or O(N). Such a snapshot feature already benefits many applications. So we are going to implement O(N) snapshot creation in the first phase and then optimize it later. What are the use cases for taking snapshots of small subtrees? An implementation that is suitable only for small subtrees is probably impractical for snapshotting an HBase root directory, a Hive warehouse, or most /user directories that I'm aware of. You'd also presumably want to keep at least a handful (10s?) of snapshots available, so any small subtree that could be snapshotted must be multiplied by ~10 to estimate its snapshot size. Note that the design document also explicitly states that it should be possible to take a snapshot of the root of the file system. bq. Then, we could have the snapshot feature out early instead of spending a long time coming up with a complicated design and implementation. A complicated design also increases the risk of bugs in the implementation. I'm all for a simple design, but the design must also meet the stated requirements. The design document states that it should be possible to create a snapshot of the root of the file system, but I don't think the proposed design can do such a thing. Suresh had previously said that "May be the design document is fairly early and might have misled you. That is not the goal. The goal is to have efficient implementation." If we're on the same page that the end goal is an O(1) implementation, in terms of the number of files that are not modified between snapshots, in this branch, then we can move on.
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481669#comment-13481669 ] Hari Mankude commented on HDFS-2802: Nicholas is right in that we do start off with O(1) memory usage. But depending on writes and updates on the base filesystem, memory usage for snapshots will increase. The worst case is when an application updates all the files in the snapshotted subtree. Even in this scenario, the snap inodes are minimized versions of the actual file inode and retain only the relevant information for snapshots. Additionally (in the prototype), if multiple snapshots are taken of the same subtree, then significant optimizations are done to reduce the memory footprint by representing more than one snapshot in a single snapINode.
[jira] [Commented] (HDFS-4087) Protocol changes for listSnapshots functionality
[ https://issues.apache.org/jira/browse/HDFS-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481667#comment-13481667 ] Brandon Li commented on HDFS-4087: -- Thanks Aaron! I will update the javadoc along with the next patch. > Protocol changes for listSnapshots functionality > > > Key: HDFS-4087 > URL: https://issues.apache.org/jira/browse/HDFS-4087 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Affects Versions: Snapshot (HDFS-2802) >Reporter: Brandon Li >Assignee: Brandon Li > Labels: needs-test > Fix For: Snapshot (HDFS-2802) > > Attachments: HDFS-4087.patch, HDFS-4087.patch, HDFS-4087.patch, > HDFS-4087.patch > > > SnapInfo saves information about a snapshot. This jira also updates the java > protocol classes and translation for the listSnapshot operation. > Given a snapshot root, the snapshots created under it can be listed.
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481643#comment-13481643 ] Tsz Wo (Nicholas), SZE commented on HDFS-2802: -- O(1) memory usage in general does not seem possible since the original files/directories could be modified. So the best case is O(N) memory usage in general. However, it is possible to have O(1) memory usage at snapshot creation. For small subtrees, i.e. when N is small, it does not matter if it is O(1) or O(N). Such a snapshot feature already benefits many applications. So we are going to implement O(N) snapshot creation in the first phase and then optimize it later. Then, we could have the snapshot feature out early instead of spending a long time coming up with a complicated design and implementation. A complicated design also increases the risk of bugs in the implementation.
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481632#comment-13481632 ] Suresh Srinivas commented on HDFS-2802: --- bq. Instead of implementing an inefficient design and then optimizing it, we should come up with and implement an efficient design. Why is starting with a simple implementation and then optimizing it later not a choice?
[jira] [Commented] (HDFS-4078) Handle replication in snapshots
[ https://issues.apache.org/jira/browse/HDFS-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481622#comment-13481622 ] Aaron T. Myers commented on HDFS-4078: -- Ah, didn't notice that label, and wasn't aware of its use. Usually we just comment that tests will be added in a subsequent JIRA and link to said JIRA. > Handle replication in snapshots > --- > > Key: HDFS-4078 > URL: https://issues.apache.org/jira/browse/HDFS-4078 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Labels: needs-test > Fix For: Snapshot (HDFS-2802) > > Attachments: h4078_20121021.patch > > > Without snapshots, file replication is the same as block replication. > With snapshots, the file replication R_o of the original file and the file > replication R_s of the snapshot file could possibly be different. Since the > blocks are shared between the original file and the snapshot file, block > replication is max(R_o, R_s). If there is more than one snapshot, block > replication is the max file replication of the original file and all snapshot > files.
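The replication rule quoted in the issue description — block replication is the max of the file replications of every file sharing the block — is simple enough to state directly in code. The helper below is illustrative only, not part of the actual HDFS-4078 patch.

```java
// Illustrative helper for the HDFS-4078 rule: a block shared by the
// original file and its snapshot copies is replicated at the maximum
// of all of their file replication factors.
class SnapshotReplication {
    static short blockReplication(short original, short... snapshotReplications) {
        short max = original;
        for (short r : snapshotReplications) {
            if (r > max) {
                max = r;
            }
        }
        return max;
    }
}
```

For example, with the original file at replication 3 and one snapshot copy pinned at 5, the shared blocks must be kept at replication 5.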
[jira] [Commented] (HDFS-4078) Handle replication in snapshots
[ https://issues.apache.org/jira/browse/HDFS-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481620#comment-13481620 ] Suresh Srinivas commented on HDFS-4078: --- bq. I would really have preferred to see some automated tests included with this change, as an incorrect implementation of this could result in data loss from snapshots. At least, a follow-up JIRA should be filed to add some automated tests for this functionality. Aaron, please see the label "needs-test"; all these jiras will need subsequent tests.
[jira] [Commented] (HDFS-4046) ChecksumTypeProto use NULL as enum value which is illegal in C/C++
[ https://issues.apache.org/jira/browse/HDFS-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481617#comment-13481617 ] Suresh Srinivas commented on HDFS-4046: --- @Binglin are you planning to contribute the C library based on this work to Apache Hadoop? > ChecksumTypeProto use NULL as enum value which is illegal in C/C++ > -- > > Key: HDFS-4046 > URL: https://issues.apache.org/jira/browse/HDFS-4046 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Binglin Chang >Assignee: Binglin Chang >Priority: Minor > Attachments: HDFS-4046-ChecksumType-NULL-and-TestAuditLogs-bug.patch, > HDFS-4046-ChecksumType-NULL.patch > > > I tried to write a native hdfs client using the protobuf based protocol; when I > generate C++ code from hdfs.proto, the generated file does not compile, > because NULL is an already defined macro. > I am thinking of two solutions: > 1. refactor all DataChecksum.Type.NULL references to NONE, which should be > fine for all languages, but this may break compatibility. > 2. only change the protobuf definition ChecksumTypeProto.NULL to NONE, and use > the enum integer value (DataChecksum.Type.id) to convert between ChecksumTypeProto > and DataChecksum.Type, making sure the enum integer values match (they currently > already match). > I can make a patch for solution 2.
[jira] [Commented] (HDFS-4046) ChecksumTypeProto use NULL as enum value which is illegal in C/C++
[ https://issues.apache.org/jira/browse/HDFS-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481615#comment-13481615 ] Suresh Srinivas commented on HDFS-4046: --- bq. It shouldn't as long as the enum ordering does not change. Kihwal, my question was meant to say that the change is backward compatible :-) I would suggest adding a context-based prefix at the beginning of each enum name, e.g. STATUS_SUCCESS.
[jira] [Updated] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2802: - Tags: (was: 1) Release Note: (was: 1)
[jira] [Created] (HDFS-4102) Test replication with snapshots
Tsz Wo (Nicholas), SZE created HDFS-4102: Summary: Test replication with snapshots Key: HDFS-4102 URL: https://issues.apache.org/jira/browse/HDFS-4102 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Replication becomes more tricky with snapshots since the original file and snapshot files could possibly be set to different replications while the blocks are shared among those files. This JIRA is to add more tests for replication with snapshots.
[jira] [Commented] (HDFS-4078) Handle replication in snapshots
[ https://issues.apache.org/jira/browse/HDFS-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481610#comment-13481610 ] Aaron T. Myers commented on HDFS-4078: -- Sounds good. Thanks, Nicholas.
[jira] [Commented] (HDFS-4078) Handle replication in snapshots
[ https://issues.apache.org/jira/browse/HDFS-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481605#comment-13481605 ] Tsz Wo (Nicholas), SZE commented on HDFS-4078: -- Hi Aaron, we are going to add tests in some future JIRAs. Let me create one for replication now.
[jira] [Commented] (HDFS-4046) ChecksumTypeProto use NULL as enum value which is illegal in C/C++
[ https://issues.apache.org/jira/browse/HDFS-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481602#comment-13481602 ] Tsz Wo (Nicholas), SZE commented on HDFS-4046: -- Would NONE be a reserved word in some other languages? How about adding a prefix such as PB_NULL or HADOOP_NULL? Then the same prefix can be used for other values.
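Solution 2 from the issue description — converting between the wire enum and the internal enum by numeric id rather than by name — might be sketched as below. The enums here are simplified stand-ins for the generated ChecksumTypeProto and DataChecksum.Type, not the real classes.

```java
// Sketch of id-based enum conversion (illustrative stand-in enums, not
// the generated protobuf classes): renaming NULL to NONE on the wire
// side stays compatible as long as the numeric ids are unchanged.
class ChecksumEnums {
    enum WireChecksumType {            // stands in for ChecksumTypeProto
        NONE(0), CRC32(1), CRC32C(2);  // NULL renamed to NONE for C/C++
        final int id;
        WireChecksumType(int id) { this.id = id; }
    }

    enum InternalChecksumType {        // stands in for DataChecksum.Type
        NULL(0), CRC32(1), CRC32C(2);
        final int id;
        InternalChecksumType(int id) { this.id = id; }
    }

    // Look up by id rather than by name; Enum.valueOf(name) would break
    // once the two sides disagree on NULL vs NONE.
    static InternalChecksumType fromWire(WireChecksumType wire) {
        for (InternalChecksumType t : InternalChecksumType.values()) {
            if (t.id == wire.id) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown checksum id: " + wire.id);
    }
}
```

Because only the integer id crosses the wire, either side can rename its constants freely; name-based conversion would have coupled the two namespaces.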
[jira] [Commented] (HDFS-4099) Clean up replication code and add more javadoc
[ https://issues.apache.org/jira/browse/HDFS-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481597#comment-13481597 ] Hudson commented on HDFS-4099: -- Integrated in Hadoop-trunk-Commit #2907 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/2907/]) HDFS-4099. Clean up replication code and add more javadoc. (Revision 1400986) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1400986 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java > Clean up replication code and add more javadoc > -- > > Key: HDFS-4099 > URL: https://issues.apache.org/jira/browse/HDFS-4099 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE >Priority: Minor > Fix For: 2.0.3-alpha > > Attachments: h4099_20121021.patch > > > - FSNamesystem.checkReplicationFactor(..) should be combined with > BlockManager.checkReplication(..). > - Add javadoc to the replication related method in BlockManager. > - Also clean up those methods. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4099) Clean up replication code and add more javadoc
[ https://issues.apache.org/jira/browse/HDFS-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-4099: - Resolution: Fixed Fix Version/s: 2.0.3-alpha Status: Resolved (was: Patch Available) I did not add new tests since the changes are simple. I have committed this. > Clean up replication code and add more javadoc > -- > > Key: HDFS-4099 > URL: https://issues.apache.org/jira/browse/HDFS-4099 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE >Priority: Minor > Fix For: 2.0.3-alpha > > Attachments: h4099_20121021.patch > > > - FSNamesystem.checkReplicationFactor(..) should be combined with > BlockManager.checkReplication(..). > - Add javadoc to the replication related method in BlockManager. > - Also clean up those methods. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4087) Protocol changes for listSnapshots functionality
[ https://issues.apache.org/jira/browse/HDFS-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481593#comment-13481593 ] Aaron T. Myers commented on HDFS-4087: -- The patch committed also includes a copy/pasted comment that isn't accurate/relevant: {code} +/** + * Interface that represents the over the wire information for a file. + */ +@InterfaceAudience.Private +@InterfaceStability.Evolving +public class SnapshotInfo { {code} > Protocol changes for listSnapshots functionality > > > Key: HDFS-4087 > URL: https://issues.apache.org/jira/browse/HDFS-4087 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Affects Versions: Snapshot (HDFS-2802) >Reporter: Brandon Li >Assignee: Brandon Li > Labels: needs-test > Fix For: Snapshot (HDFS-2802) > > Attachments: HDFS-4087.patch, HDFS-4087.patch, HDFS-4087.patch, > HDFS-4087.patch > > > SnapInfo saves information about a snapshot. This jira also updates the Java > protocol classes and translation for the listSnapshot operation. > Given a snapshot root, the snapshots created under it can be listed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4075) Reduce recommissioning overhead
[ https://issues.apache.org/jira/browse/HDFS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481587#comment-13481587 ] Ravi Prakash commented on HDFS-4075: Hi Kihwal! The log message is bq. LOG.info("Invalidated" + numOverReplicated + " over-replicated blocks on " + bq.srcNode + " during recommissioning"); which might mislead me to believe that the invalidated blocks were on srcNode, when they could be on any one of the 4 nodes. Maybe something to the effect of "Recommissioning of srcNode led to numOverReplicated over-replicated blocks being invalidated"? Can you please also explain the change in DatanodeManager.java in this patch? node.isAlive will be updated only when the node heartbeats in. So when will blockManager.processOverReplicatedBlocksOnReCommission(node); be called? > Reduce recommissioning overhead > --- > > Key: HDFS-4075 > URL: https://issues.apache.org/jira/browse/HDFS-4075 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 0.23.4, 2.0.2-alpha >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: hdfs-4075.patch > > > When datanodes are recommissioned, > {BlockManager#processOverReplicatedBlocksOnReCommission()} is called for each > rejoined node and excess blocks are added to the invalidate list. The problem > is that this is done while the namesystem write lock is held. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4099) Clean up replication code and add more javadoc
[ https://issues.apache.org/jira/browse/HDFS-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-4099: - Priority: Minor (was: Major) Hadoop Flags: Reviewed Thanks Suresh and Uma for reviewing the patch. > Clean up replication code and add more javadoc > -- > > Key: HDFS-4099 > URL: https://issues.apache.org/jira/browse/HDFS-4099 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE >Priority: Minor > Attachments: h4099_20121021.patch > > > - FSNamesystem.checkReplicationFactor(..) should be combined with > BlockManager.checkReplication(..). > - Add javadoc to the replication related method in BlockManager. > - Also clean up those methods. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2802) Support for RW/RO snapshots in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481571#comment-13481571 ] Aaron T. Myers commented on HDFS-2802: -- bq. Aaron, by O(N) where N=# of files + # of directories, I guess you mean O(N) snapshot creation time and O(N) memory usage at snapshot creation. Snapshot creation can be optimized by lazy INode creation. No INode is created at snapshot creation time. Only the INode modified after snapshot will be created. Then it becomes O(1) snapshot creation time and O(1) memory usage at snapshot creation. The design does not exclude this optimization. On page 7 the design document says the following: {quote} The memory usage is linear to the number of INode snapped because it copies all INodes when a snapshot is created. {quote} Then later on page 7 the design document goes on to discuss how this might be optimized by supporting offline snapshots on disk (i.e. get the snapshot metadata out of the NN's heap), and performing the copying of all the INodes in parallel using several threads. This is what I am referring to. I fully support a design which implements O(1) snapshot creation time and O(1) memory usage, but the current proposed design does not describe such a thing. Instead of implementing an inefficient design and then optimizing it, we should come up with and implement an efficient design. > Support for RW/RO snapshots in HDFS > --- > > Key: HDFS-2802 > URL: https://issues.apache.org/jira/browse/HDFS-2802 > Project: Hadoop HDFS > Issue Type: New Feature > Components: data-node, name-node >Reporter: Hari Mankude >Assignee: Hari Mankude > Attachments: snap.patch, snapshot-one-pager.pdf, Snapshots20121018.pdf > > > Snapshots are point in time images of parts of the filesystem or the entire > filesystem. Snapshots can be a read-only or a read-write point in time copy > of the filesystem. There are several use cases for snapshots in HDFS. 
I will > post a detailed write-up soon with more information. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
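The "lazy INode creation" optimization discussed in the comment above can be sketched as a toy copy-on-write structure: creating a snapshot is O(1) because nothing is copied up front, and a node is duplicated into the snapshot only when it is modified afterwards. All names here are illustrative, not HDFS internals.

```java
import java.util.HashMap;
import java.util.Map;

public class LazySnapshot {
    final Map<String, String> live = new HashMap<>();
    // Holds pre-modification values only for entries changed after the
    // snapshot was taken; untouched entries are read from `live`.
    private Map<String, String> snapshotDiff = null;

    void put(String path, String value) {
        if (snapshotDiff != null && !snapshotDiff.containsKey(path)) {
            snapshotDiff.put(path, live.get(path));  // lazy copy on first change
        }
        live.put(path, value);
    }

    void createSnapshot() {            // O(1): nothing is copied here
        snapshotDiff = new HashMap<>();
    }

    String readSnapshot(String path) { // old value if changed, else live value
        if (snapshotDiff != null && snapshotDiff.containsKey(path)) {
            return snapshotDiff.get(path);
        }
        return live.get(path);
    }

    public static void main(String[] args) {
        LazySnapshot fs = new LazySnapshot();
        fs.put("/a", "v1");
        fs.createSnapshot();
        fs.put("/a", "v2");                        // triggers the lazy copy
        System.out.println(fs.readSnapshot("/a")); // v1
        System.out.println(fs.live.get("/a"));     // v2
    }
}
```

Snapshot creation time and memory usage then scale with the number of files modified after the snapshot, not with the total number of INodes, which is the distinction being argued in the comment.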
[jira] [Commented] (HDFS-4101) ZKFC should implement zookeeper.recovery.retry like HBase to connect to ZooKeeper
[ https://issues.apache.org/jira/browse/HDFS-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481565#comment-13481565 ] Steve Loughran commented on HDFS-4101: -- the good news is: it'll be easy to test > ZKFC should implement zookeeper.recovery.retry like HBase to connect to > ZooKeeper > - > > Key: HDFS-4101 > URL: https://issues.apache.org/jira/browse/HDFS-4101 > Project: Hadoop HDFS > Issue Type: Improvement > Components: auto-failover, ha >Affects Versions: 2.0.0-alpha, 3.0.0 > Environment: running CDH4.1.1 >Reporter: Damien Hardy >Priority: Minor > Labels: newbie > > When zkfc starts and zookeeper is not yet started, ZKFC fails and stops directly. > Maybe ZKFC should allow some retries on ZooKeeper services, like HBase does > with zookeeper.recovery.retry. > This particularly happens when I start my whole cluster on VirtualBox, for > example (every component nearly at the same time); ZKFC is the only one that fails > and stops ... > All the others can wait for each other for some time, independently of the start order > (NameNode/DataNode/JournalNode/Zookeeper/HBaseMaster/HBaseRS), so that the > system can be set up and stable in a few seconds -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4078) Handle replication in snapshots
[ https://issues.apache.org/jira/browse/HDFS-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481562#comment-13481562 ] Aaron T. Myers commented on HDFS-4078: -- I would really have preferred to see some automated tests included with this change, as an incorrect implementation of this could result in data loss from snapshots. At least, a follow-up JIRA should be filed to add some automated tests for this functionality. In general I've seen a bunch of the commits to the HDFS-2802 branch include no automated tests whatsoever. I realize that it's sometimes difficult to write automated tests at the beginning of a big project, before all of the scaffolding is in place, but I hope that there's a plan to have comprehensive test coverage of this work. > Handle replication in snapshots > --- > > Key: HDFS-4078 > URL: https://issues.apache.org/jira/browse/HDFS-4078 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Reporter: Tsz Wo (Nicholas), SZE >Assignee: Tsz Wo (Nicholas), SZE > Labels: needs-test > Fix For: Snapshot (HDFS-2802) > > Attachments: h4078_20121021.patch > > > Without snapshots, file replication is the same as block replication. > With snapshots, the file replication R_o of the original file and the file > replication R_s of the snapshot file could possibly be different. Since the > blocks are shared between the original file and the snapshot file, block > replication is max(R_o, R_s). If there is more than one snapshot, block > replication is the max file replication of the original file and all snapshot > files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
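The replication rule described in the issue above reduces to a simple maximum over the original file and all snapshot copies. A minimal sketch, with illustrative names rather than the actual BlockManager API:

```java
// Block replication for a block shared between the original file (R_o)
// and its snapshot copies (R_s for each snapshot): max(R_o, R_s, ...).
public class SnapshotReplication {
    static short blockReplication(short rOriginal, short... snapshotReplications) {
        short max = rOriginal;
        for (short r : snapshotReplications) {
            if (r > max) max = r;
        }
        return max;
    }

    public static void main(String[] args) {
        // Original file at replication 2, snapshots taken at replications 3
        // and 1: the shared blocks must be kept at replication 3, because
        // lowering them would under-replicate the snapshot taken at 3.
        System.out.println(blockReplication((short) 2, (short) 3, (short) 1));  // 3
    }
}
```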
[jira] [Commented] (HDFS-4090) getFileChecksum() result incompatible when called against zero-byte files.
[ https://issues.apache.org/jira/browse/HDFS-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481545#comment-13481545 ] Ravi Prakash commented on HDFS-4090: The only nit (and I don't care if you don't fix it), is that the comments could have taken 2 lines instead of 3. Also, when I removed the src/main part of the patch, the test fails with a NPE because zeroChecksum == null. Maybe check for null? But I'm not going to be a stickler for that either. Please feel free to check it in. +1 lgtm > getFileChecksum() result incompatible when called against zero-byte files. > -- > > Key: HDFS-4090 > URL: https://issues.apache.org/jira/browse/HDFS-4090 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs client >Affects Versions: 0.23.4, 2.0.2-alpha >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Attachments: hdfs-4090.patch > > > When getFileChecksum() is called against a zero-byte file, the branch-1 > client returns MD5MD5CRC32FileChecksum with crcPerBlock=0, bytePerCrc=0 and > md5=70bc8f4b72a86921468bf8e8441dce51, whereas a null is returned in trunk. > The null makes sense since there is no actual block checksums, but this > breaks the compatibility when doing distCp and calling getFileChecksum() via > webhdfs or hftp. > This JIRA is to make the client to return the same 'magic' value that the > branch-1 and earlier clients return. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4030) BlockManager excessBlocksCount and postponedMisreplicatedBlocksCount should be AtomicLongs
[ https://issues.apache.org/jira/browse/HDFS-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481510#comment-13481510 ] Kihwal Lee commented on HDFS-4030: -- +1 (nb) The patch looks good to me. > BlockManager excessBlocksCount and postponedMisreplicatedBlocksCount should > be AtomicLongs > -- > > Key: HDFS-4030 > URL: https://issues.apache.org/jira/browse/HDFS-4030 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: name-node >Affects Versions: 2.0.0-alpha >Reporter: Eli Collins >Assignee: Eli Collins > Attachments: hdfs-4030.txt > > > The BlockManager excessBlocksCount and postponedMisreplicatedBlocksCount > fields are currently volatile longs which are incremented, which isn't thread > safe. It looks like they're always incremented on paths that hold the NN > write lock but it would be easier and less error prone for future changes if > we made them AtomicLongs. The other volatile long members are just set in one > thread and read in another so they're fine as is. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
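The issue above hinges on why `count++` on a volatile long is not thread safe: the increment is a read-modify-write of three separate steps, so two threads can read the same value and one increment is lost. A short demonstration of the proposed fix, using a field name borrowed from the issue for illustration:

```java
import java.util.concurrent.atomic.AtomicLong;

public class CounterDemo {
    // AtomicLong.incrementAndGet() performs the read-modify-write atomically,
    // so no external lock (such as the NN write lock) is needed for safety.
    static final AtomicLong excessBlocksCount = new AtomicLong();

    public static void main(String[] args) throws InterruptedException {
        Runnable r = () -> {
            for (int i = 0; i < 100_000; i++) {
                excessBlocksCount.incrementAndGet();
            }
        };
        Thread t1 = new Thread(r), t2 = new Thread(r);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // With a plain volatile long and count++, this total would usually
        // come out below 200000; AtomicLong guarantees the exact count.
        System.out.println(excessBlocksCount.get());  // 200000
    }
}
```

This matches the issue's reasoning that the change is defensive: the counters happen to be updated under the write lock today, but AtomicLong keeps them correct even if a future change updates them outside it.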
[jira] [Commented] (HDFS-4056) Always start the NN's SecretManager
[ https://issues.apache.org/jira/browse/HDFS-4056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481474#comment-13481474 ] Daryn Sharp commented on HDFS-4056: --- bq. What combinations of initial and subsequent auth modes are we going to support? The current RPC client/server behavior is: * Insecure: ** SIMPLE: accept ** DIGEST-MD5: (secret manager enabled) accept ** DIGEST-MD5: (secret manager disabled) downgrade client to SIMPLE ** KERBEROS: downgrade client to SIMPLE * Secure: ** SIMPLE: reject ** DIGEST-MD5: (secret manager enabled) accept ** DIGEST-MD5: (secret manager disabled) reject ** KERBEROS: accept So today an insecure cluster is SIMPLE + SIMPLE, a secure cluster is KERBEROS + TOKEN. This patch enables SIMPLE + TOKEN by activating the secret manager, but still supports SIMPLE + SIMPLE. bq. Bottom line is the server should always be able to figure out by itself whether a connection is an initial connection or a subsequent one, based on the auth method (and type of credentials) used, since it needs to decide on whether tokens can be issued for that connection. The server already uses the auth the client sends in the rpc connection header to determine the sasl method the client wants to use. The auth to the server then determines the UGI's auth. The NN does not allow a UGI auth of token to issue, renew, or cancel tokens. bq. if we are going to support SIMPLE + SIMPLE then we shouldn't always start NN's SecretManager. If we want to allow compatibility with older clients, then both SIMPLE + SIMPLE and SIMPLE + TOKEN must both be supported. Enabling the option of SIMPLE + TOKEN means we need the secret manager enabled which is the aim of this patch. 
> Always start the NN's SecretManager > --- > > Key: HDFS-4056 > URL: https://issues.apache.org/jira/browse/HDFS-4056 > Project: Hadoop HDFS > Issue Type: Improvement > Components: name-node >Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Attachments: HDFS-4056.patch > > > To support the ability to use tokens regardless of whether kerberos is > enabled, the NN's secret manager should always be started. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
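The accept/downgrade/reject matrix Daryn lists in the comment above can be written out as a plain function so the combinations are easy to read. Enum and method names here are illustrative, not the actual Hadoop RPC classes:

```java
public class AuthMatrix {
    enum Auth { SIMPLE, DIGEST_MD5, KERBEROS }
    enum Decision { ACCEPT, DOWNGRADE_TO_SIMPLE, REJECT }

    // Server-side decision for a client connection, per the matrix:
    // insecure clusters downgrade what they cannot verify, secure
    // clusters reject it; DIGEST-MD5 (token) needs the secret manager.
    static Decision decide(boolean secureCluster, Auth clientAuth,
                           boolean secretManagerEnabled) {
        switch (clientAuth) {
            case SIMPLE:
                return secureCluster ? Decision.REJECT : Decision.ACCEPT;
            case DIGEST_MD5:
                if (secretManagerEnabled) return Decision.ACCEPT;
                return secureCluster ? Decision.REJECT
                                     : Decision.DOWNGRADE_TO_SIMPLE;
            case KERBEROS:
                return secureCluster ? Decision.ACCEPT
                                     : Decision.DOWNGRADE_TO_SIMPLE;
        }
        throw new AssertionError("unreachable");
    }

    public static void main(String[] args) {
        // The SIMPLE + TOKEN mode this patch enables: insecure cluster,
        // token (DIGEST-MD5) client, secret manager turned on.
        System.out.println(decide(false, Auth.DIGEST_MD5, true));  // ACCEPT
    }
}
```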
[jira] [Commented] (HDFS-3553) Hftp proxy tokens are broken
[ https://issues.apache.org/jira/browse/HDFS-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481432#comment-13481432 ] Daryn Sharp commented on HDFS-3553: --- My apologies, I overlooked my previous statement about branch-1 before closing. It does appear that proxy tokens cannot be used with hftp, unless that proxy token was obtained via hdfs. HDFS-3509 is attempting to fix proxy tokens for webhdfs. The two filesystems ultimately call into the same low level methods. The branch-1 patch here should apply if that jira makes the appropriate changes. > Hftp proxy tokens are broken > > > Key: HDFS-3553 > URL: https://issues.apache.org/jira/browse/HDFS-3553 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 1.0.2, 2.0.0-alpha, 3.0.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Blocker > Fix For: 0.23.4, 3.0.0, 2.0.3-alpha > > Attachments: HDFS-3553-1.branch-1.0.patch, > HDFS-3553-2.branch-1.0.patch, HDFS-3553-3.branch-1.0.patch, > HDFS-3553.branch-1.0.patch, HDFS-3553.branch-23.patch, HDFS-3553.trunk.patch > > > Proxy tokens are broken for hftp. The impact is systems using proxy tokens, > such as oozie jobs, cannot use hftp. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4046) ChecksumTypeProto use NULL as enum value which is illegal in C/C++
[ https://issues.apache.org/jira/browse/HDFS-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481431#comment-13481431 ] Kihwal Lee commented on HDFS-4046: -- > Suresh: You mean renaming it to NONE from the existing name NULL? Why does it > break compatibility? It shouldn't, as long as the enum ordering does not change. > Binglin: And there is a class named ChecksumNull that may also need to change given > that NULL is renamed to NONE? This is not critical, but I think it is okay to update it for consistency. > ChecksumTypeProto use NULL as enum value which is illegal in C/C++ > -- > > Key: HDFS-4046 > URL: https://issues.apache.org/jira/browse/HDFS-4046 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Binglin Chang >Assignee: Binglin Chang >Priority: Minor > Attachments: HDFS-4046-ChecksumType-NULL-and-TestAuditLogs-bug.patch, > HDFS-4046-ChecksumType-NULL.patch > > > I tried to write a native hdfs client using the protobuf based protocol; when I > generate C++ code from hdfs.proto, the generated file can not compile, > because NULL is an already defined macro. > I am thinking of two solutions: > 1. refactor all DataChecksum.Type.NULL references to NONE, which should be > fine for all languages, but this may break compatibility. > 2. only change the protobuf definition ChecksumTypeProto.NULL to NONE, and use > the enum integer value (DataChecksum.Type.id) to convert between ChecksumTypeProto > and DataChecksum.Type, and make sure the enum integer values match (they currently > already match). > I can make a patch for solution 2. > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4080) Add an option to disable block-level state change logging
[ https://issues.apache.org/jira/browse/HDFS-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481390#comment-13481390 ] Kihwal Lee commented on HDFS-4080: -- Many of these messages are logged while the lock is held by an object far up in the call tree, so this is not as simple as HDFS-4052. The ultimate solution may be finer-grained locking. > Add an option to disable block-level state change logging > - > > Key: HDFS-4080 > URL: https://issues.apache.org/jira/browse/HDFS-4080 > Project: Hadoop HDFS > Issue Type: Bug > Components: name-node >Affects Versions: 3.0.0, 2.0.3-alpha >Reporter: Kihwal Lee > > Although the block-level logging in the namenode is useful for debugging, it can > add a significant overhead to busy hdfs clusters since it is done while > the namespace write lock is held. One example is shown in HDFS-4075. In this > example, the write lock was held for 5 minutes while logging 11 million log > messages for 5.5 million block invalidation events. > It would be useful to have an option to disable these block-level log > messages and keep other state change messages going. If others feel that > they can be turned into DEBUG (with the addition of isDebugEnabled() checks), that > may also work, but there might be people depending on the messages. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
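The "turn them into DEBUG with isDebugEnabled() checks" option mentioned above can be sketched as follows. This uses java.util.logging so the example is self-contained; the actual HDFS code of this era uses commons-logging, and the logger name and message are illustrative:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class BlockStateLog {
    static final Logger LOG = Logger.getLogger("BlockStateChange");

    static void logInvalidated(String node, long numBlocks) {
        // The level check is cheap; the message string is only built when
        // debug logging is on, so holding the namespace write lock while
        // passing through here costs almost nothing in production.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("BLOCK* invalidated " + numBlocks
                    + " over-replicated blocks on " + node);
        }
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO);  // block-level logging disabled
        logInvalidated("127.0.0.1:50010", 42);  // no output, no string built
        System.out.println("done");
    }
}
```

The guard matters because, as the issue notes, the cost is not only I/O: even formatting 11 million messages while the write lock is held stalls the namenode.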
[jira] [Updated] (HDFS-4101) ZKFC should implement zookeeper.recovery.retry like HBase to connect to ZooKeeper
[ https://issues.apache.org/jira/browse/HDFS-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-4101: -- Labels: newbie (was: ) This would be a good ticket for anyone who's interested in getting started contributing to the HA code! Thanks for filing, Damien. > ZKFC should implement zookeeper.recovery.retry like HBase to connect to > ZooKeeper > - > > Key: HDFS-4101 > URL: https://issues.apache.org/jira/browse/HDFS-4101 > Project: Hadoop HDFS > Issue Type: Improvement > Components: auto-failover, ha >Affects Versions: 2.0.0-alpha, 3.0.0 > Environment: running CDH4.1.1 >Reporter: Damien Hardy >Priority: Minor > Labels: newbie > > When zkfc starts and zookeeper is not yet started, ZKFC fails and stops directly. > Maybe ZKFC should allow some retries on ZooKeeper services, like HBase does > with zookeeper.recovery.retry. > This particularly happens when I start my whole cluster on VirtualBox, for > example (every component nearly at the same time); ZKFC is the only one that fails > and stops ... > All the others can wait for each other for some time, independently of the start order > (NameNode/DataNode/JournalNode/Zookeeper/HBaseMaster/HBaseRS), so that the > system can be set up and stable in a few seconds -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3704) In the DFSClient, Add the node to the dead list when the ipc.Client calls fails
[ https://issues.apache.org/jira/browse/HDFS-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481316#comment-13481316 ] Yanbo Liang commented on HDFS-3704: --- If the ipc.Client call to a DataNode fails, it may have been caused by a temporary failure and the node may recover later. If we treat these DataNodes as failed nodes for this DFSInputStream, we can no longer use replicas on them, even if they recover later. Does this meet the requirements of other workflows? > In the DFSClient, Add the node to the dead list when the ipc.Client calls > fails > --- > > Key: HDFS-3704 > URL: https://issues.apache.org/jira/browse/HDFS-3704 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 1.0.3, 2.0.0-alpha >Reporter: nkeywal >Priority: Minor > > The DFSClient maintains a list of dead nodes per input stream. When creating > this DFSInputStream, it may connect to one of the nodes to check the final block > size. If this call fails, this datanode should be put in the dead nodes list > to save time. If not, it will be retried for the block transfer during the > read, and we're likely to get a timeout. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-4101) ZKFC should implement zookeeper.recovery.retry like HBase to connect to zookeeper
[ https://issues.apache.org/jira/browse/HDFS-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Hardy updated HDFS-4101: --- Summary: ZKFC should implement zookeeper.recovery.retry like HBase to connect to zookeeper (was: ZKFC should implement zookeeper.recovery.retry like Hbase to connect to zookeeper) > ZKFC should implement zookeeper.recovery.retry like HBase to connect to > zookeeper > - > > Key: HDFS-4101 > URL: https://issues.apache.org/jira/browse/HDFS-4101 > Project: Hadoop HDFS > Issue Type: Improvement > Components: auto-failover, ha >Affects Versions: 2.0.0-alpha, 3.0.0 > Environment: running CDH4.1.1 >Reporter: Damien Hardy >Priority: Minor > > When zkfc starts and zookeeper is not yet started, ZKFC fails and stops directly. > Maybe ZKFC should allow some retries on ZooKeeper services, like HBase does > with zookeeper.recovery.retry. > This particularly happens when I start my whole cluster on VirtualBox, for > example (every component nearly at the same time); ZKFC is the only one that fails > and stops ... > All the others can wait for each other for some time, independently of the start order > (NameNode/DataNode/JournalNode/Zookeeper/HBaseMaster/HBaseRS), so that the > system can be set up and stable in a few seconds -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira