[jira] [Resolved] (HDFS-3117) clean cache and can't start hadoop
[ https://issues.apache.org/jira/browse/HDFS-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3117. -- Resolution: Invalid Hi cldoltd, please post your question to common-u...@hadoop.apache.org. JIRA is for tracking known issues with Hadoop. clean cache and can't start hadoop -- Key: HDFS-3117 URL: https://issues.apache.org/jira/browse/HDFS-3117 Project: Hadoop HDFS Issue Type: Task Reporter: cldoltd I used the command echo 3 > /proc/sys/vm/drop_caches to clean the cache. Now I can't start Hadoop. Thanks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3004: --- Attachment: HDFS-3004.020.patch Implement Recovery Mode --- Key: HDFS-3004 URL: https://issues.apache.org/jira/browse/HDFS-3004 Project: Hadoop HDFS Issue Type: New Feature Components: tools Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004__namenode_recovery_tool.txt When the NameNode metadata is corrupt for some reason, we want to be able to fix it. Obviously, we would prefer never to get in this case. In a perfect world, we never would. However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and which can result in downtime. Recovery mode is initiated by the system administrator. When the NameNode starts up in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edits log, and then write out a new image. Then it will shut down. Unlike in the normal startup process, the recovery mode startup process will be interactive. When the NameNode finds something that is inconsistent, it will prompt the operator as to what it should do. The operator can also choose to take the first option for all prompts by starting up with the '-f' flag, or typing 'a' at one of the prompts. I have reused as much code as possible from the NameNode in this tool. Hopefully, the effort that was spent developing this will also make the NameNode editLog and image processing even more robust than it already is.
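The prompt behaviour described above ('-f' takes the first option everywhere; typing 'a' switches to that mode mid-run) can be sketched as follows. This is a hypothetical simplification for illustration only — RecoveryPrompt and its method names are invented, not code from the attached patch:

```java
import java.util.Scanner;

// Hypothetical sketch of a recovery-mode prompt: when an inconsistency is
// found, ask the operator which repair option to take. With force mode
// (the '-f' flag), the first option is always chosen; answering 'a' at a
// prompt switches to always-first for the rest of the run.
class RecoveryPrompt {
    private boolean alwaysFirst;          // set by '-f' or by answering 'a'
    private final Scanner in;

    RecoveryPrompt(boolean force, Scanner in) {
        this.alwaysFirst = force;
        this.in = in;
    }

    /** Returns the zero-based index of the chosen option. */
    int ask(String problem, String... options) {
        if (alwaysFirst) {
            return 0;                     // non-interactive: take first option
        }
        System.out.println(problem);
        for (int i = 0; i < options.length; i++) {
            System.out.println("  " + (i + 1) + ") " + options[i]);
        }
        System.out.print("Choose [1-" + options.length + "] or 'a' for always-first: ");
        String answer = in.next().trim();
        if (answer.equalsIgnoreCase("a")) {
            alwaysFirst = true;           // apply the first option from now on
            return 0;
        }
        return Integer.parseInt(answer) - 1;
    }
}
```

The point of the 'a' escape hatch is that an operator can start interactively, see a few prompts, and only then decide to let the tool finish unattended.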
[jira] [Updated] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3004: --- Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3004: --- Status: Patch Available (was: Open)
[jira] [Created] (HDFS-3118) wiki and hadoop templates provide wrong superusergroup property instead of supergroup
wiki and hadoop templates provide wrong superusergroup property instead of supergroup -- Key: HDFS-3118 URL: https://issues.apache.org/jira/browse/HDFS-3118 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 1.0.0 Environment: Used Debian package install Reporter: Olivier Sallou Priority: Minor The hdfs-site template and the wiki (http://hadoop.apache.org/hdfs/docs/current/hdfs_permissions_guide.html#The+Super-User) refer to the property dfs.permissions.superusergroup to define the superuser group. However, we must use the property dfs.permissions.supergroup, not superusergroup, to make it work. In the file src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java, supergroup is extracted with: this.supergroup = conf.get("dfs.permissions.supergroup", "supergroup"); It does not make use of DFS_PERMISSIONS_SUPERUSERGROUP_KEY.
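The mismatch can be seen from the lookup pattern alone. In the sketch below, a plain Map stands in for Hadoop's Configuration; everything except the two property names is invented for illustration:

```java
import java.util.Map;

// Sketch of the bug: the docs advertise "dfs.permissions.superusergroup",
// but the NameNode code reads "dfs.permissions.supergroup", so a value set
// under the documented key is silently ignored and the default wins.
class SupergroupLookup {
    static final String DOCUMENTED_KEY = "dfs.permissions.superusergroup";
    static final String CODE_KEY = "dfs.permissions.supergroup";

    /** Mimics conf.get(key, default): the value if present, else the default. */
    static String get(Map<String, String> conf, String key, String def) {
        return conf.getOrDefault(key, def);
    }

    /** Mirrors the pre-patch code path, which consults CODE_KEY only. */
    static String effectiveSupergroup(Map<String, String> conf) {
        return get(conf, CODE_KEY, "supergroup");
    }
}
```

Setting only the documented key therefore has no effect; the fix is to make code and documentation agree on a single key.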
[jira] [Updated] (HDFS-3118) wiki and hadoop templates provide wrong superusergroup property instead of supergroup
[ https://issues.apache.org/jira/browse/HDFS-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier Sallou updated HDFS-3118: - Affects Version/s: 1.0.1
[jira] [Updated] (HDFS-3118) wiki and hadoop templates provide wrong superusergroup property instead of supergroup
[ https://issues.apache.org/jira/browse/HDFS-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier Sallou updated HDFS-3118: - Status: Patch Available (was: Open)

--- src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java.orig 2012-03-20 09:54:33.0 +0100
+++ src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java 2012-03-20 09:55:13.0 +0100
@@ -473,7 +473,7 @@
 fsOwner = UserGroupInformation.getCurrentUser();
 LOG.info("fsOwner=" + fsOwner);
-this.supergroup = conf.get("dfs.permissions.supergroup", "supergroup");
+this.supergroup = conf.get(DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_KEY, "supergroup");
 this.isPermissionEnabled = conf.getBoolean("dfs.permissions", true);
 LOG.info("supergroup=" + supergroup);
 LOG.info("isPermissionEnabled=" + isPermissionEnabled);
--- src/test/org/apache/hadoop/mapred/TestMapredSystemDir.java.orig 2012-03-20 09:56:37.0 +0100
+++ src/test/org/apache/hadoop/mapred/TestMapredSystemDir.java 2012-03-20 09:58:14.0 +0100
@@ -30,6 +30,7 @@
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.security.*;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
 /**
  * Test if JobTracker is resilient to garbage in mapred.system.dir.
@@ -49,7 +50,7 @@
 MiniMRCluster mr = null;
 try {
 // start dfs
-conf.set("dfs.permissions.supergroup", "supergroup");
+conf.set(DFSConfigKeys.DFS_PERMISSIONS_SUPERUSERGROUP_KEY, "supergroup");
 conf.set("mapred.system.dir", "/mapred");
 dfs = new MiniDFSCluster(conf, 1, true, null);
 FileSystem fs = dfs.getFileSystem();
@@ -120,4 +121,4 @@
 if (mr != null) { mr.shutdown();}
 }
 }
-}
\ No newline at end of file
+}
--- src/hdfs/hdfs-default.xml.orig 2012-03-20 10:00:53.0 +0100
+++ src/hdfs/hdfs-default.xml 2012-03-20 10:01:04.0 +0100
@@ -184,7 +184,7 @@
 </property>
 <property>
-  <name>dfs.permissions.supergroup</name>
+  <name>dfs.permissions.superusergroup</name>
   <value>supergroup</value>
   <description>The name of the group of super-users.</description>
 </property>
--- src/docs/src/documentation/content/xdocs/hdfs_permissions_guide.xml.orig 2012-03-20 10:02:01.0 +0100
+++ src/docs/src/documentation/content/xdocs/hdfs_permissions_guide.xml 2012-03-20 10:02:14.0 +0100
@@ -227,7 +227,7 @@
 only those things visible using other permissions. Additional groups may be added to the comma-separated list.
 </li>
-<li><code>dfs.permissions.supergroup = supergroup</code>
+<li><code>dfs.permissions.superusergroup = supergroup</code>
 <br />The name of the group of super-users.
 </li>
--- src/docs/cn/src/documentation/content/xdocs/hdfs_permissions_guide.xml.orig 2012-03-20 10:03:17.0 +0100
+++ src/docs/cn/src/documentation/content/xdocs/hdfs_permissions_guide.xml 2012-03-20 10:03:30.0 +0100
@@ -170,7 +170,7 @@
 <dd>The user name used by the web server. If this parameter is set to the superuser's name, all web clients can see all information. If it is set to an otherwise unused user, web clients can only access resources visible under "other" permissions. Additional groups may be appended, forming a comma-separated list.</dd>
-<dt><code>dfs.permissions.supergroup = supergroup</code></dt>
+<dt><code>dfs.permissions.superusergroup = supergroup</code></dt>
 <dd>The name of the group of super-users.</dd>
[jira] [Updated] (HDFS-3118) wiki and hadoop templates provide wrong superusergroup property instead of supergroup
[ https://issues.apache.org/jira/browse/HDFS-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier Sallou updated HDFS-3118: - Release Note: Use DFS_PERMISSIONS_SUPERUSERGROUP_KEY in code and update documentation Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-3118) wiki and hadoop templates provide wrong superusergroup property instead of supergroup
[ https://issues.apache.org/jira/browse/HDFS-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier Sallou updated HDFS-3118: - Status: Open (was: Patch Available)
[jira] [Updated] (HDFS-3118) wiki and hadoop templates provide wrong superusergroup property instead of supergroup
[ https://issues.apache.org/jira/browse/HDFS-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier Sallou updated HDFS-3118: - Status: Patch Available (was: Open) Patch file attached.
[jira] [Updated] (HDFS-3118) wiki and hadoop templates provide wrong superusergroup property instead of supergroup
[ https://issues.apache.org/jira/browse/HDFS-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier Sallou updated HDFS-3118: - Attachment: supergroup.patch
[jira] [Created] (HDFS-3119) Overreplicated block is not deleted even after the replication factor is reduced after sync followed by closing that file
Overreplicated block is not deleted even after the replication factor is reduced after sync followed by closing that file Key: HDFS-3119 URL: https://issues.apache.org/jira/browse/HDFS-3119 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.24.0 Reporter: J.Andreina Priority: Minor Fix For: 0.24.0, 0.23.2
Cluster setup: 1 NN, 2 DN, replication factor 2, block report interval 3 sec, block size 256 MB.
Step 1: write a file filewrite.txt of size 90 bytes with sync (not closed).
Step 2: change the replication factor to 1 using the command: ./hdfs dfs -setrep 1 /filewrite.txt
Step 3: close the file.
* On the NN side, the log message "Decreasing replication from 2 to 1 for /filewrite.txt" occurred, but the overreplicated blocks are not deleted even after the block report is sent from the DN.
* When listing the file in the console using ./hdfs dfs -ls, the replication factor for that file is shown as 1.
* The fsck report for that file displays that the file is replicated to 2 datanodes.
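The behaviour the reporter expected — excess replicas being chosen for deletion once the replication factor drops below the number of block holders — can be modelled in miniature. ExcessReplicas and chooseExcess are invented names; the real NameNode logic is rack-aware and considerably more involved:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of over-replication handling: given the datanodes currently
// holding a block and the file's (reduced) replication factor, pick the
// surplus replicas that should be scheduled for deletion. This sketch
// simply trims from the end of the holder list.
class ExcessReplicas {
    static List<String> chooseExcess(List<String> holders, int replication) {
        List<String> excess = new ArrayList<>();
        for (int i = replication; i < holders.size(); i++) {
            excess.add(holders.get(i));    // everything beyond the target count
        }
        return excess;
    }
}
```

In the reported scenario (2 holders, target replication 1), one replica should land on the excess list after the DN block report; the bug is that it does not.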
[jira] [Commented] (HDFS-2834) ByteBuffer-based read API for DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233524#comment-13233524 ] jirapos...@reviews.apache.org commented on HDFS-2834: - bq. On 2012-03-20 01:27:50, Todd Lipcon wrote: bq. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java, line 44 bq. https://reviews.apache.org/r/4212/diff/2/?file=90213#file90213line44 bq. bq. shouldn't this be true? Oops, yes. Thankfully the test still passes when it's testing the right path... bq. On 2012-03-20 01:27:50, Todd Lipcon wrote: bq. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java, lines 81-82 bq. https://reviews.apache.org/r/4212/diff/2/?file=90213#file90213line81 bq. bq. no reason to use DFSClient here. Instead you can just use the filesystem, right? Then downcast the stream you get back? Good point - no need even to downcast since FSDataInputStream has the API. bq. On 2012-03-20 01:27:50, Todd Lipcon wrote: bq. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java, line 104 bq. https://reviews.apache.org/r/4212/diff/2/?file=90213#file90213line104 bq. bq. don't you want an assert on sawException here? You can also use GenericTestUtils.assertExceptionContains() if you want to check the text of it Good catch. No particular need to assert the content of the exception - any checksum error is good enough here. bq. On 2012-03-20 01:27:50, Todd Lipcon wrote: bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 562-564 bq. https://reviews.apache.org/r/4212/diff/2/?file=90207#file90207line562 bq. bq. this comment seems like it's in the wrong spot, since the code that comes after it doesn't reference offsetFromChunkBoundary. I removed the comment, it's covered by the comment at line 549. - Henry --- This is an automatically generated e-mail. 
To reply, visit: https://reviews.apache.org/r/4212/#review6103 --- On 2012-03-09 00:47:24, Henry Robinson wrote: bq. bq. --- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4212/ bq. --- bq. bq. (Updated 2012-03-09 00:47:24) bq. bq. bq. Review request for hadoop-hdfs and Todd Lipcon. bq. bq. bq. Summary bq. --- bq. bq. New patch for HDFS-2834 (I can't update the old review request). bq. bq. bq. This addresses bug HDFS-2834. bq. http://issues.apache.org/jira/browse/HDFS-2834 bq. bq. bq. Diffs bq. - bq. bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730 bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697 bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4 bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777 bq. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2 bq. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java PRE-CREATION bq. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012 bq. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8 bq. bq. Diff: https://reviews.apache.org/r/4212/diff bq. bq. bq. Testing bq. --- bq. bq. bq. Thanks, bq. bq. Henry bq. bq. 
[jira] [Updated] (HDFS-2834) ByteBuffer-based read API for DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Henry Robinson updated HDFS-2834: - Attachment: HDFS-2834.10.patch Review comments. ByteBuffer-based read API for DFSInputStream Key: HDFS-2834 URL: https://issues.apache.org/jira/browse/HDFS-2834 Project: Hadoop HDFS Issue Type: Improvement Reporter: Henry Robinson Assignee: Henry Robinson Attachments: HDFS-2834-no-common.patch, HDFS-2834.10.patch, HDFS-2834.3.patch, HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, HDFS-2834.7.patch, HDFS-2834.8.patch, HDFS-2834.9.patch, HDFS-2834.patch, HDFS-2834.patch, hdfs-2834-libhdfs-benchmark.png The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated {{byte[]}}. Although for many clients this is desired behaviour, in certain situations, such as native-reads through libhdfs, this imposes an extra copy penalty since the {{byte[]}} needs to be copied out again into a natively readable memory area. For these cases, it would be preferable to allow the client to supply its own buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead.
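The buffer-supplying pattern the issue asks for can be illustrated with plain java.nio: a read into a caller-supplied ByteBuffer (here via a channel) fills the caller's buffer directly, whereas an InputStream-style read would first land in a JVM byte[] that must then be copied out. This generic sketch is not the HDFS-2834 patch itself:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Generic illustration of a ByteBuffer-based read path: the caller supplies
// the buffer (which could be a direct buffer backed by native memory, as in
// the libhdfs case), and data is read into it with no intermediate byte[].
class ByteBufferRead {
    /** Read until the buffer is full or EOF; returns the number of bytes read. */
    static int readFully(ReadableByteChannel ch, ByteBuffer buf) throws IOException {
        int total = 0;
        while (buf.hasRemaining()) {
            int n = ch.read(buf);          // fills the caller's buffer directly
            if (n < 0) {
                break;                     // EOF before the buffer filled
            }
            total += n;
        }
        return total;
    }
}
```

With a direct buffer, native code can then consume the bytes in place, which is exactly the copy the issue wants to eliminate.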
[jira] [Commented] (HDFS-2834) ByteBuffer-based read API for DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233535#comment-13233535 ] jirapos...@reviews.apache.org commented on HDFS-2834: - --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4212/ --- (Updated 2012-03-20 16:29:56.616292) Review request for hadoop-hdfs and Todd Lipcon. Changes --- Review responses Summary --- New patch for HDFS-2834 (I can't update the old review request). This addresses bug HDFS-2834. http://issues.apache.org/jira/browse/HDFS-2834 Diffs (updated) - hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 4187f1c hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 71c8a50 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4 hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java PRE-CREATION hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012 hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java f4052bb Diff: https://reviews.apache.org/r/4212/diff Testing --- Thanks, Henry
[jira] [Commented] (HDFS-3083) HA+security: failed to run a mapred job from yarn after a manual failover
[ https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233550#comment-13233550 ] Mingjie Lai commented on HDFS-3083: --- Aaron, you're right about the root cause. The order of the configured namenodes does make a difference. Throwing StandbyException from SecretManager is not perfect, but okay for me. Good job. HA+security: failed to run a mapred job from yarn after a manual failover - Key: HDFS-3083 URL: https://issues.apache.org/jira/browse/HDFS-3083 Project: Hadoop HDFS Issue Type: Bug Components: ha, security Affects Versions: 0.24.0, 0.23.3 Reporter: Mingjie Lai Assignee: Aaron T. Myers Priority: Critical Fix For: 0.24.0, 0.23.3 Attachments: HDFS-3083-combined.patch Steps to reproduce: - turn on HA and security - run a mapred job and wait for it to finish - fail over to another namenode - run the mapred job again; it fails.
[jira] [Commented] (HDFS-3083) HA+security: failed to run a mapred job from yarn after a manual failover
[ https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233573#comment-13233573 ] Aaron T. Myers commented on HDFS-3083: -- Thanks a lot, Mingjie. Did you perhaps get a chance to apply the patch and test out the fix?
[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode
[ https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-3050: -- Status: Patch Available (was: Open) refactor OEV to share more code with the NameNode - Key: HDFS-3050 URL: https://issues.apache.org/jira/browse/HDFS-3050 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-3050.006.patch Currently, OEV (the offline edits viewer) re-implements all of the opcode parsing logic found in the NameNode. This duplicated code creates a maintenance burden for us. OEV should be refactored to simply use the normal EditLog parsing code, rather than rolling its own.
[jira] [Commented] (HDFS-3083) HA+security: failed to run a mapred job from yarn after a manual failover
[ https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233576#comment-13233576 ] Todd Lipcon commented on HDFS-3083: --- +1. This seems like the best way to fix this that I can think of as well. Can you run the HDFS tests locally to be sure before committing?
[jira] [Commented] (HDFS-3050) refactor OEV to share more code with the NameNode
[ https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233578#comment-13233578 ] Eli Collins commented on HDFS-3050: --- I think you need to regenerate the diff; this one nukes LdapGroupsMappings and modifies CHANGES.txt.
[jira] [Commented] (HDFS-3044) fsck move should be non-destructive by default
[ https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233591#comment-13233591 ] Eli Collins commented on HDFS-3044: --- - Still needs a test for the new behavior of fsck move, i.e. that it's not destructive (a test that covers move without delete and asserts the source files are still there) - Nit: if we're going to name the flag doMove (vs. e.g. salvageCorruptFiles), please add a comment by the declaration that doMove doesn't actually do a move anymore (since it no longer deletes, it's a copy now) Otherwise looks great! fsck move should be non-destructive by default -- Key: HDFS-3044 URL: https://issues.apache.org/jira/browse/HDFS-3044 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Eli Collins Assignee: Colin Patrick McCabe Attachments: HDFS-3044.002.patch The fsck move behavior in the code and originally articulated in HADOOP-101 is: {quote}Current failure modes for DFS involve blocks that are completely missing. The only way to fix them would be to recover chains of blocks and put them into lost+found{quote} A directory is created with the file name, the blocks that are accessible are created as individual files in this directory, then the original file is removed. I suspect the rationale for this behavior was that you can't use files that are missing locations, and copying the blocks as files at least makes part of the files accessible. However, this behavior can also result in permanent data loss. Eg: - Some datanodes don't come up (eg due to HW issues) and check in on cluster startup; files with blocks whose replicas are all on this set of datanodes are marked corrupt - Admin does fsck move, which deletes the corrupt files and saves whatever blocks were available - The HW issues with the datanodes are resolved; they are started and join the cluster. The NN tells them to delete their blocks for the corrupt files since the files were deleted.
I think we should: - Make fsck move non-destructive by default (eg just does a move into lost+found) - Make the destructive behavior optional (eg --destructive so admins think about what they're doing) - Provide better sanity checks and warnings, eg if you're running fsck and not all the slaves have checked in (if using dfs.hosts) then fsck should print a warning indicating this, which an admin should have to override if they want to do something destructive
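The distinction the proposal draws can be sketched in a few lines. This is an illustrative model only, using an in-memory dict as a stand-in filesystem; the function and structure names are invented here and are not HDFS code:

```python
# Toy model of the proposed fsck -move semantics. A "filesystem" is a dict of
# path -> list of block contents, where None marks a block whose replicas are
# all currently unavailable.

def salvage(fs, path, lost_found="/lost+found", destructive=False):
    """Copy the readable blocks of a corrupt file into lost+found.

    Non-destructive (proposed default): the original entry is kept, so if the
    missing replicas come back later the file can still be fully recovered.
    Destructive (the old behavior, now opt-in): the original entry is deleted,
    so replicas that rejoin the cluster would be told to drop their blocks.
    """
    for i, block in enumerate(fs[path]):
        if block is not None:  # block still has a live replica somewhere
            fs[f"{lost_found}{path}/blk_{i}"] = [block]
    if destructive:
        del fs[path]  # old behavior: risks permanent data loss
    return fs
```

With a file whose middle block is missing, the non-destructive path salvages blocks 0 and 2 into lost+found while keeping the original entry, so a later replica rejoin can still restore the file.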
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233603#comment-13233603 ] Tsz Wo (Nicholas), SZE commented on HDFS-3107: -- Easy, Milind. :) I do agree with Suresh that (2) is not a very good reason to have truncate. I think such accidents are rare. However, you made a good point that having append without truncate is a deficiency. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: data-node, name-node Reporter: Lei Chang Attachments: HDFS_truncate_semantics_Mar15.pdf Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX operation), the reverse operation of append, which makes upper-layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS.
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233609#comment-13233609 ] Milind Bhandarkar commented on HDFS-3107: - I must have missed a smiley :-) Nicholas, after appends were enabled in HDFS, we have seen a lot of cases where many (mainly text, or even compressed text) datasets were merged using appends. This is where customers realize their mistake immediately after starting to append, and do a ctrl-c. This is very common. -- Milind Bhandarkar Chief Architect, Greenplum Labs, Data Computing Division, EMC +1-650-523-3858 (W) +1-408-666-8483 (C)
[jira] [Updated] (HDFS-3100) failed to append data using webhdfs
[ https://issues.apache.org/jira/browse/HDFS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-3100: - Attachment: HDFS-3100.patch Regenerated the patch from the right directory. failed to append data using webhdfs --- Key: HDFS-3100 URL: https://issues.apache.org/jira/browse/HDFS-3100 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.24.0, 0.23.1 Reporter: Zhanwei.Wang Assignee: Brandon Li Attachments: HDFS-3100.patch, HDFS-3100.patch, hadoop-wangzw-datanode-ubuntu.log, hadoop-wangzw-namenode-ubuntu.log, test.sh, testAppend.patch STEPS: 1. Deploy a single-node hdfs 0.23.1 cluster and configure hdfs as follows: A) enable webhdfs B) enable append C) disable permissions 2. Start hdfs. 3. Run the attached test script. RESULT: Expected: a file named testFile should be created and populated with 32K * 5000 zeros, and HDFS should be OK. What I got: the script cannot finish; the file has been created but not populated as expected, because the append operation failed. The datanode log shows that the block scanner reported a bad replica and the namenode decided to delete it. Since it is a single-node cluster, the append fails. The script fails every time. Datanode and Namenode logs are attached.
[jira] [Commented] (HDFS-3108) [UI] Few Namenode links are not working
[ https://issues.apache.org/jira/browse/HDFS-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233637#comment-13233637 ] Brahma Reddy Battula commented on HDFS-3108: Ya.. Scenario 1 is the same as HDFS-2025.. [UI] Few Namenode links are not working --- Key: HDFS-3108 URL: https://issues.apache.org/jira/browse/HDFS-3108 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0, 0.23.1 Reporter: Brahma Reddy Battula Priority: Minor Fix For: 0.23.3 Attachments: Scenario2_Trace.txt Scenario 1 == After tailing a file from the UI and clicking on Go Back to File View, I get HTTP ERROR 404. Scenario 2 === Frequently I get the following exception if I click on BrowseFileSystem or any file: java.lang.IllegalArgumentException: java.net.UnknownHostException: HOST-10-18-40-24
[jira] [Commented] (HDFS-2617) Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution
[ https://issues.apache.org/jira/browse/HDFS-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233651#comment-13233651 ] Eli Collins commented on HDFS-2617: --- Cool, sometime soon? Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution -- Key: HDFS-2617 URL: https://issues.apache.org/jira/browse/HDFS-2617 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jakob Homan Assignee: Jakob Homan Attachments: HDFS-2617-a.patch The current approach to secure and authenticate nn web services is based on Kerberized SSL and was developed when a SPNEGO solution wasn't available. Now that we have one, we can get rid of the non-standard KSSL and use SPNEGO throughout. This will simplify setup and configuration. Also, Kerberized SSL is a non-standard approach with its own quirks and dark corners (HDFS-2386).
[jira] [Commented] (HDFS-3083) HA+security: failed to run a mapred job from yarn after a manual failover
[ https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233661#comment-13233661 ] Aaron T. Myers commented on HDFS-3083: -- I just ran the full HDFS test suite, and they all passed. I'll commit this shortly based on Todd's +1. Thanks a lot for the review.
[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode
[ https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3050: --- Attachment: HDFS-3050.007.patch rebase on latest trunk
[jira] [Updated] (HDFS-3083) Cannot run a MR job with HA and security enabled when second-listed NN active
[ https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3083: - Summary: Cannot run a MR job with HA and security enabled when second-listed NN active (was: HA+security: failed to run a mapred job from yarn after a manual failover)
[jira] [Updated] (HDFS-3083) Cannot run an MR job with HA and security enabled when second-listed NN active
[ https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-3083: - Summary: Cannot run an MR job with HA and security enabled when second-listed NN active (was: Cannot run a MR job with HA and security enabled when second-listed NN active)
[jira] [Commented] (HDFS-3050) refactor OEV to share more code with the NameNode
[ https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233666#comment-13233666 ] Hadoop QA commented on HDFS-3050: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519106/HDFS-3050.007.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2041//console This message is automatically generated.
[jira] [Resolved] (HDFS-3083) Cannot run an MR job with HA and security enabled when second-listed NN active
[ https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers resolved HDFS-3083. -- Resolution: Fixed Hadoop Flags: Reviewed I've just committed this to trunk and branch-0.23. Thanks a lot for the reviews, Todd and Mingjie.
[jira] [Commented] (HDFS-3100) failed to append data using webhdfs
[ https://issues.apache.org/jira/browse/HDFS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233680#comment-13233680 ] Hadoop QA commented on HDFS-3100: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519100/HDFS-3100.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 11 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.TestCrcCorruption org.apache.hadoop.hdfs.server.namenode.TestFsck org.apache.hadoop.hdfs.server.namenode.TestCorruptFilesJsp org.apache.hadoop.hdfs.TestClientBlockVerification org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks org.apache.hadoop.hdfs.TestDatanodeBlockScanner org.apache.hadoop.hdfs.TestDataTransferProtocol org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.server.namenode.TestListCorruptFileBlocks org.apache.hadoop.hdfs.TestDFSClientRetries +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2040//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2040//console This message is automatically generated. 
[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode
[ https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3050: --- Attachment: (was: HDFS-3050.008.patch)
[jira] [Updated] (HDFS-3086) Change Datanode not to send storage list in registration - it will be sent in block report
[ https://issues.apache.org/jira/browse/HDFS-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3086: - Attachment: h3086_20120320.patch h3086_20120320.patch: - remove the storages parameter from DatanodeProtocol.registerDatanode(..); - change storageID to DatanodeStorage in StorageBlockReport. Change Datanode not to send storage list in registration - it will be sent in block report -- Key: HDFS-3086 URL: https://issues.apache.org/jira/browse/HDFS-3086 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h3086_20120320.patch When a datanode is registered, it also sends the storage list. This is not useful since the storage list is already available in block reports.
[jira] [Updated] (HDFS-3086) Change Datanode not to send storage list in registration - it will be sent in block report
[ https://issues.apache.org/jira/browse/HDFS-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-3086: - Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode
[ https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3050: --- Attachment: HDFS-3050.008.patch
[jira] [Updated] (HDFS-3044) fsck move should be non-destructive by default
[ https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3044: --- Attachment: HDFS-3044.003.patch address eli's comments
[jira] [Updated] (HDFS-3094) add -nonInteractive and -force option to namenode -format command
[ https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-3094: -- Attachment: HDFS-3094.branch-1.0.patch Attached an updated patch for branch 1.0 with comments addressed. Here are the test-patch results: {code} BUILD SUCCESSFUL Total time: 7 minutes 22 seconds -1 overall. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 11 new Findbugs (version 1.3.9) warnings. {code} The Findbugs warnings are unrelated to this patch. add -nonInteractive and -force option to namenode -format command - Key: HDFS-3094 URL: https://issues.apache.org/jira/browse/HDFS-3094 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.24.0, 1.0.2 Reporter: Arpit Gupta Assignee: Arpit Gupta Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch Currently bin/hadoop namenode -format prompts the user for a Y/N to set up the directories in the local file system. -force: the namenode formats the directories without prompting. -nonInteractive: namenode -format will return with an exit code of 1 if the dirs exist.
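The flag behavior described in the issue can be captured as a small decision function. The flag names come from the issue itself; the function and its signature are hypothetical, written here only to make the prompt/exit-code contract concrete:

```python
# Toy model of the namenode -format prompting contract: -force skips the
# prompt entirely, -nonInteractive forbids the prompt and exits 1 if the name
# dirs already exist, and the default asks the operator (simulated by
# `answer`). Returns (action, exit_code).

def format_decision(dirs_exist, force=False, non_interactive=False, answer="N"):
    if not dirs_exist or force:
        return ("format", 0)          # nothing to lose, or operator forced it
    if non_interactive:
        return ("abort", 1)           # dirs exist and no prompt is allowed
    if answer.lower() == "y":
        return ("format", 0)          # operator confirmed at the prompt
    return ("abort", 1)
```

This shape makes the scripting use case clear: automation passes -nonInteractive and branches on the exit code instead of answering a prompt.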
[jira] [Commented] (HDFS-3105) Add DatanodeStorage information to block recovery
[ https://issues.apache.org/jira/browse/HDFS-3105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233755#comment-13233755 ] Hudson commented on HDFS-3105: -- Integrated in Hadoop-Mapreduce-0.23-Build #231 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/231/]) svn merge -c 1302683 from trunk for HDFS-3105. (Revision 1302685) Result = FAILURE szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1302685 Files : * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolServerSideTranslatorPB.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolServerSideTranslatorPB.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/InterDatanodeProtocolTranslatorPB.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDataset.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/FSDatasetInterface.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * 
/hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/DatanodeProtocol.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/InterDatanodeProtocol.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/InterDatanodeProtocol.proto * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestInterDatanodeProtocol.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestPipelinesFailover.java Add DatanodeStorage information to block recovery - Key: HDFS-3105 URL: https://issues.apache.org/jira/browse/HDFS-3105 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node, hdfs client Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3105_20120315.patch, h3105_20120315b.patch, h3105_20120316.patch, h3105_20120316b.patch, h3105_20120319.patch When recovering a block, the namenode and client do not have the datanode storage information of the block. So namenode cannot add the block to the corresponding datanode storge block list. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3094) add -nonInteractive and -force option to namenode -format command
[ https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-3094: -- Attachment: HDFS-3094.patch Attached patch for trunk with the comments addressed. add -nonInteractive and -force option to namenode -format command - Key: HDFS-3094 URL: https://issues.apache.org/jira/browse/HDFS-3094 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.24.0, 1.0.2 Reporter: Arpit Gupta Assignee: Arpit Gupta Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch Currently the bin/hadoop namenode -format prompts the user for a Y/N to setup the directories in the local file system. -force : namenode formats the directories without prompting -nonInterActive : namenode format will return with an exit code of 1 if the dir exists. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233762#comment-13233762 ] Hudson commented on HDFS-3091: -- Integrated in Hadoop-Mapreduce-0.23-Build #231 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/231/]) Merge HDFS-3091. Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. Contributed by Nicholas. (Revision 1302633) Result = FAILURE umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1302633 Files : * /hadoop/common/branches/branch-0.23 * /hadoop/common/branches/branch-0.23/hadoop-common-project * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-auth * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/docs * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/core * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/native * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/datanode * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/secondary * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/test/hdfs * 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/bin * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/conf * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-examples * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/c++ * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/block_forensics * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build-contrib.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/build.xml * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/data_join * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/eclipse-plugin * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/index * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/contrib/vaidya * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/examples * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/fs * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/hdfs * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/ipc * 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/src/webapps/job * /hadoop/common/branches/branch-0.23/hadoop-project * /hadoop/common/branches/branch-0.23/hadoop-project/src/site Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. --- Key: HDFS-3091 URL: https://issues.apache.org/jira/browse/HDFS-3091 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, hdfs client, name-node Affects Versions: 0.23.0, 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3091_20120319.patch When verifying the HDFS-1606 feature, Observed
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233768#comment-13233768 ] Suresh Srinivas commented on HDFS-3107: --- bq. I must have missed a smiley That's okay. You missed the smiley in the tweet too. bq. This is very common. I see. I was not aware it was that common. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: data-node, name-node Reporter: Lei Chang Attachments: HDFS_truncate_semantics_Mar15.pdf Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3094) add -nonInteractive and -force option to namenode -format command
[ https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-3094: -- Status: Patch Available (was: Open) add -nonInteractive and -force option to namenode -format command - Key: HDFS-3094 URL: https://issues.apache.org/jira/browse/HDFS-3094 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.24.0, 1.0.2 Reporter: Arpit Gupta Assignee: Arpit Gupta Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch Currently the bin/hadoop namenode -format prompts the user for a Y/N to setup the directories in the local file system. -force : namenode formats the directories without prompting -nonInterActive : namenode format will return with an exit code of 1 if the dir exists. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command
[ https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233769#comment-13233769 ] Arpit Gupta commented on HDFS-3094: --- bq. should be -nonInteractive (not a capital 'A') done bq. Rename getisForce to just isForce or isForceEnabled(). done Updated tests to not use a different thread and sleep bq. It looks like if you specify invalid options, it won't give any kind of useful error message. You should probably be throwing HadoopIllegalArgumentException instead of returning null in several of these cases. Left as is, as returning null causes usage to be printed which will show the correct format. add -nonInteractive and -force option to namenode -format command - Key: HDFS-3094 URL: https://issues.apache.org/jira/browse/HDFS-3094 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.24.0, 1.0.2 Reporter: Arpit Gupta Assignee: Arpit Gupta Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch Currently the bin/hadoop namenode -format prompts the user for a Y/N to setup the directories in the local file system. -force : namenode formats the directories without prompting -nonInterActive : namenode format will return with an exit code of 1 if the dir exists. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3083) Cannot run an MR job with HA and security enabled when second-listed NN active
[ https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233773#comment-13233773 ] Hudson commented on HDFS-3083: -- Integrated in Hadoop-Common-0.23-Commit #708 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/708/]) HDFS-3083. Cannot run an MR job with HA and security enabled when second-listed NN active. Contributed by Aaron T. Myers. (Revision 1303099) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1303099 Files : * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/SecretManager.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Cannot run an MR job with HA and security enabled when second-listed NN active -- Key: HDFS-3083 URL: https://issues.apache.org/jira/browse/HDFS-3083 Project: Hadoop HDFS Issue Type: Bug Components: ha, security Affects Versions: 0.24.0, 0.23.3 Reporter: Mingjie Lai Assignee: Aaron T. Myers Priority: Critical Fix For: 0.24.0, 0.23.3 Attachments: HDFS-3083-combined.patch Steps to reproduce: - turned on ha and security - run a mapred job, and wait to finish - failover to another namenode - run the mapred job again, it fails. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3083) Cannot run an MR job with HA and security enabled when second-listed NN active
[ https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233787#comment-13233787 ] Hudson commented on HDFS-3083: -- Integrated in Hadoop-Common-trunk-Commit #1907 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1907/]) HDFS-3083. Cannot run an MR job with HA and security enabled when second-listed NN active. Contributed by Aaron T. Myers. (Revision 1303098) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1303098 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/SecretManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Cannot run an MR job with HA and security enabled when second-listed NN active -- Key: HDFS-3083 URL: https://issues.apache.org/jira/browse/HDFS-3083 Project: Hadoop HDFS Issue Type: Bug Components: ha, security Affects Versions: 0.24.0, 0.23.3 Reporter: Mingjie Lai Assignee: Aaron T. Myers Priority: Critical Fix For: 0.24.0, 0.23.3 Attachments: HDFS-3083-combined.patch Steps to reproduce: - turned on ha and security - run a mapred job, and wait to finish - failover to another namenode - run the mapred job again, it fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233791#comment-13233791 ] Milind Bhandarkar commented on HDFS-3107: - bq. That's okay. You missed the smiley in the tweet too. I just copy-pasted, so it was expected :-) bq. I see I was not aware it was that common. Since appends were enabled only recently, the only users doing this now are those on Facebook's version of Hadoop or on Hadoop 1.0. Before that, users were creating multiple files. In any case, my interest in this feature is in implementing transactions over HDFS (as Lei and I have already discussed with Sanjay Radia and Hairong). And aborting a transaction means truncating to the last known good data across multiple files. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: data-node, name-node Reporter: Lei Chang Attachments: HDFS_truncate_semantics_Mar15.pdf Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3083) Cannot run an MR job with HA and security enabled when second-listed NN active
[ https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233796#comment-13233796 ] Hudson commented on HDFS-3083: -- Integrated in Hadoop-Hdfs-0.23-Commit #699 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/699/]) HDFS-3083. Cannot run an MR job with HA and security enabled when second-listed NN active. Contributed by Aaron T. Myers. (Revision 1303099) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1303099 Files : * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java * /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/SecretManager.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java * /hadoop/common/branches/branch-0.23/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Cannot run an MR job with HA and security enabled when second-listed NN active -- Key: HDFS-3083 URL: https://issues.apache.org/jira/browse/HDFS-3083 Project: Hadoop HDFS Issue Type: Bug Components: ha, security Affects Versions: 0.24.0, 0.23.3 Reporter: Mingjie Lai Assignee: Aaron T. Myers Priority: Critical Fix For: 0.24.0, 0.23.3 Attachments: HDFS-3083-combined.patch Steps to reproduce: - turned on ha and security - run a mapred job, and wait to finish - failover to another namenode - run the mapred job again, it fails. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3083) Cannot run an MR job with HA and security enabled when second-listed NN active
[ https://issues.apache.org/jira/browse/HDFS-3083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233803#comment-13233803 ] Hudson commented on HDFS-3083: -- Integrated in Hadoop-Hdfs-trunk-Commit #1981 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1981/]) HDFS-3083. Cannot run an MR job with HA and security enabled when second-listed NN active. Contributed by Aaron T. Myers. (Revision 1303098) Result = SUCCESS atm : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1303098 Files : * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/token/SecretManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Cannot run an MR job with HA and security enabled when second-listed NN active -- Key: HDFS-3083 URL: https://issues.apache.org/jira/browse/HDFS-3083 Project: Hadoop HDFS Issue Type: Bug Components: ha, security Affects Versions: 0.24.0, 0.23.3 Reporter: Mingjie Lai Assignee: Aaron T. Myers Priority: Critical Fix For: 0.24.0, 0.23.3 Attachments: HDFS-3083-combined.patch Steps to reproduce: - turned on ha and security - run a mapred job, and wait to finish - failover to another namenode - run the mapred job again, it fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3120) Provide ability to enable sync without append
Provide ability to enable sync without append - Key: HDFS-3120 URL: https://issues.apache.org/jira/browse/HDFS-3120 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 1.0.1 Reporter: Eli Collins Assignee: Eli Collins The work on branch-20-append was to support *sync*, for durable HBase WALs, not *append*. The branch-20-append implementation is known to be buggy. There's been confusion about this; we often answer queries on the list [like this|http://search-hadoop.com/m/wfed01VOIJ5]. Unfortunately, the way to enable correct sync on branch-1 for HBase is to set dfs.support.append to true in your config, which has the side effect of enabling append (which we don't want to do). Let's add a new *dfs.support.hsync* option that enables working sync (which is basically the current dfs.support.append flag modulo one place where it's not referring to sync). For compatibility, if dfs.support.append is set, dfs.support.hsync will be set as well. This way someone can enable sync for HBase and still keep the current behavior: if dfs.support.append is not set, then an append operation will result in an IOE indicating append is not supported. We should do this on trunk as well, as there's no reason to conflate hsync and append with a single config even if append works. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
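The compatibility rule proposed here (append implies hsync, but not the reverse) can be sketched as a small flag-derivation function. Illustrative Python only: dfs.support.append is a real branch-1 key, while dfs.support.hsync is the name proposed in this issue:

```python
# Sketch of the HDFS-3120 compatibility rule: enabling dfs.support.append
# also enables hsync, so existing HBase configs keep working, while
# dfs.support.hsync alone enables sync without allowing appends.
def effective_flags(conf):
    append = conf.get("dfs.support.append", "false") == "true"
    hsync = conf.get("dfs.support.hsync", "false") == "true"
    # Backward compatibility: append implies hsync.
    return {"append": append, "hsync": hsync or append}
```

With this rule an HBase deployment sets only dfs.support.hsync and still gets an IOException on append attempts, as described above.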
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233816#comment-13233816 ] Eli Collins commented on HDFS-3107: --- bq. Since appends were enabled very recently, only those with the facebook's version of hadoop, or hadoop 1.0 are users doing this now. Append doesn't work on hadoop 1.0, see HDFS-3120. I'm actually going to start a discussion about removing append entirely on hdfs-dev@. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: data-node, name-node Reporter: Lei Chang Attachments: HDFS_truncate_semantics_Mar15.pdf Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HDFS-3072) haadmin should have configurable timeouts for failover commands
[ https://issues.apache.org/jira/browse/HDFS-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reassigned HDFS-3072: - Assignee: Todd Lipcon haadmin should have configurable timeouts for failover commands --- Key: HDFS-3072 URL: https://issues.apache.org/jira/browse/HDFS-3072 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 0.24.0 Reporter: Philip Zeyliger Assignee: Todd Lipcon The HAAdmin failover code should time out reasonably aggressively and move on to the fencing strategies if it's dealing with a mostly-dead active namenode. Currently it uses what's probably the default, which is to say no timeout whatsoever.
{code}
/**
 * Return a proxy to the specified target service.
 */
protected HAServiceProtocol getProtocol(String serviceId)
    throws IOException {
  String serviceAddr = getServiceAddr(serviceId);
  InetSocketAddress addr = NetUtils.createSocketAddr(serviceAddr);
  return (HAServiceProtocol)RPC.getProxy(
      HAServiceProtocol.class, HAServiceProtocol.versionID, addr, getConf());
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
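The snippet above builds the proxy with no RPC timeout, so a call to a mostly-dead namenode blocks forever. The general pattern of bounding a blocking call is sketched below as a generic Python illustration only; Hadoop's actual fix would plumb a timeout through its RPC layer rather than wrap calls like this:

```python
# Generic illustration of bounding a blocking failover call: on timeout
# the caller can give up on the unresponsive node and proceed to fencing.
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def call_with_timeout(fn, timeout_secs, *args, **kwargs):
    """Run fn(*args, **kwargs), raising TimeoutError past timeout_secs."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(fn, *args, **kwargs).result(timeout=timeout_secs)
```

Note that the worker thread still runs to completion in this wrapper, which is one reason a timeout inside the RPC client itself is the better fix.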
[jira] [Commented] (HDFS-3091) Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters.
[ https://issues.apache.org/jira/browse/HDFS-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233839#comment-13233839 ] Hudson commented on HDFS-3091: -- Integrated in Hadoop-Mapreduce-trunk #1025 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1025/]) HDFS-3091. Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. Contributed by Nicholas. (Revision 1302624) Result = SUCCESS umamahesh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1302624 Files : * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml Update the usage limitations of ReplaceDatanodeOnFailure policy in the config description for the smaller clusters. --- Key: HDFS-3091 URL: https://issues.apache.org/jira/browse/HDFS-3091 Project: Hadoop HDFS Issue Type: Improvement Components: data-node, hdfs client, name-node Affects Versions: 0.23.0, 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.24.0, 0.23.3 Attachments: h3091_20120319.patch When verifying the HDFS-1606 feature, we observed a couple of issues. Presently the ReplaceDatanodeOnFailure policy is satisfied even though there are not enough datanodes in the cluster to replace with, which results in a write failure.
{quote}
12/03/13 14:27:12 WARN hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Failed to add a datanode: nodes.length != original.length + 1, nodes=[xx.xx.xx.xx:50010], original=[xx.xx.xx.xx1:50010]
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:778)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:834)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:930)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:741)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:416)
{quote}
Let's take some cases:
1) Replication factor 3 and cluster size also 3, and unfortunately the pipeline drops to 1. ReplaceDatanodeOnFailure will be satisfied because *existings(1) <= replication/2 (3/2==1)*. But when it tries to find a new node to replace the failed one, it obviously cannot, and the sanity check fails. This results in a write failure.
2) Replication factor 10 (the user accidentally sets the replication factor higher than the cluster size) and the cluster has only 5 datanodes. Here, even if only one node fails, the write will fail for the same reason: the pipeline maximum is 5, and after one datanode is killed, existings is 4, so *existings(4) <= replication/2 (10/2==5)* is satisfied; obviously it cannot replace with a new node, as no extra nodes exist in the cluster. This results in a write failure.
3) sync-related operations also fail in these situations (will post the exact scenarios)
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
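The two failure cases above reduce to a simple numeric check. The sketch below is illustrative Python of the quoted condition only (replacement wanted when the surviving replicas drop to half the replication factor or less); the real ReplaceDatanodeOnFailure.DEFAULT policy in Java also considers additional conditions:

```python
# Numeric check of the scenarios from the report: a write fails when the
# policy demands a replacement datanode but no spare node exists.
def wants_replacement(replication, existing):
    # Replacement is requested once the pipeline has shrunk to half the
    # replication factor or less (the quoted existings <= replication/2).
    return existing <= replication // 2

def write_fails(cluster_size, replication, existing, dead):
    # Spare nodes are those neither in the pipeline nor already failed.
    spares = cluster_size - existing - dead
    # With no spare, the nodes.length != original.length + 1 check fails.
    return wants_replacement(replication, existing) and spares <= 0
```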
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233846#comment-13233846 ] Milind Bhandarkar commented on HDFS-3107: - Yes, I am using the term append loosely, because of FB's 20-append branch. Our transaction work is done with 0.23.x. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: data-node, name-node Reporter: Lei Chang Attachments: HDFS_truncate_semantics_Mar15.pdf Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233857#comment-13233857 ] Milind Bhandarkar commented on HDFS-3107: - Suresh, Nicholas, Eli: any opinions about the proposed API and semantics? HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: data-node, name-node Reporter: Lei Chang Attachments: HDFS_truncate_semantics_Mar15.pdf Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233904#comment-13233904 ] Todd Lipcon commented on HDFS-3107: --- IMO adding truncate() adds a bunch of non-trivial complexity. It's not so much because truncating a block is that hard -- but rather because it breaks a serious invariant we have elsewhere that blocks only get longer after they are created. This means that we have to revisit code all over HDFS -- in particular some of the trickiest bits around block synchronization -- to get this to work. It's not insurmountable, but I would like to know a lot more about the use case before commenting on the API/semantics. Maybe you can open a JIRA or upload a design about your transactional HDFS feature, so we can understand the motivation better? Otherwise I'm more inclined to agree with Eli's suggestion to remove append entirely (please continue that discussion on-list, though). {quote} After appends were enabled in HDFS, we have seen a lot of cases where a lot of (mainly text, or even compressed text) datasets were merged using appends. This is where customers realize their mistake immediately after starting to append, and do a ctrl-c. {quote} I don't follow... we don't even expose append() via the shell. And if we did, would users actually be using fs -append to manually write new lines of data into their Hadoop systems?? HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: data-node, name-node Reporter: Lei Chang Attachments: HDFS_truncate_semantics_Mar15.pdf Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. 
Currently HDFS does not support truncate (a standard Posix operation) which is a reverse operation of append, which makes upper layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3004: --- Attachment: HDFS-3004.022.patch * remove some unnecessary whitespace changes * re-introduce EditLogInputException * edit log input stream: change API as we discussed. * FSEditLogLoader: re-organize this file. Fix some corner cases relating to out-of-order transaction IDs Implement Recovery Mode --- Key: HDFS-3004 URL: https://issues.apache.org/jira/browse/HDFS-3004 Project: Hadoop HDFS Issue Type: New Feature Components: tools Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch, HDFS-3004__namenode_recovery_tool.txt When the NameNode metadata is corrupt for some reason, we want to be able to fix it. Obviously, we would prefer never to get in this case. In a perfect world, we never would. However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and which can result in downtime. Recovery mode is initialized by the system administrator. When the NameNode starts up in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edits log, and then write out a new image. Then it will shut down. Unlike in the normal startup process, the recovery mode startup process will be interactive. When the NameNode finds something that is inconsistent, it will prompt the operator as to what it should do. The operator can also choose to take the first option for all prompts by starting up with the '-f' flag, or typing 'a' at one of the prompts. I have reused as much code as possible from the NameNode in this tool. 
Hopefully, the effort that was spent developing this will also make the NameNode editLog and image processing even more robust than it already is. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-3004: --- Attachment: HDFS-3004.023.patch * OpInstanceCache needs to be thread-local to work correctly * update exception text regex in TestFSEditLogLoader Implement Recovery Mode --- Key: HDFS-3004 URL: https://issues.apache.org/jira/browse/HDFS-3004 Project: Hadoop HDFS Issue Type: New Feature Components: tools Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch, HDFS-3004.023.patch, HDFS-3004__namenode_recovery_tool.txt When the NameNode metadata is corrupt for some reason, we want to be able to fix it. Obviously, we would prefer never to get in this case. In a perfect world, we never would. However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and which can result in downtime. Recovery mode is initialized by the system administrator. When the NameNode starts up in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edits log, and then write out a new image. Then it will shut down. Unlike in the normal startup process, the recovery mode startup process will be interactive. When the NameNode finds something that is inconsistent, it will prompt the operator as to what it should do. The operator can also choose to take the first option for all prompts by starting up with the '-f' flag, or typing 'a' at one of the prompts. I have reused as much code as possible from the NameNode in this tool. 
Hopefully, the effort that was spent developing this will also make the NameNode editLog and image processing even more robust than it already is. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3121) test for HADOOP-8194 (quota using viewfs)
[ https://issues.apache.org/jira/browse/HDFS-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John George updated HDFS-3121: -- Attachment: hdfs-3121.patch Attaching test for HADOOP-8194 test for HADOOP-8194 (quota using viewfs) - Key: HDFS-3121 URL: https://issues.apache.org/jira/browse/HDFS-3121 Project: Hadoop HDFS Issue Type: Bug Reporter: John George Assignee: John George Attachments: hdfs-3121.patch This JIRA is to write tests for viewing quota using viewfs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3121) test for HADOOP-8194 (quota using viewfs)
[ https://issues.apache.org/jira/browse/HDFS-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John George updated HDFS-3121: -- Status: Patch Available (was: Open) test for HADOOP-8194 (quota using viewfs) - Key: HDFS-3121 URL: https://issues.apache.org/jira/browse/HDFS-3121 Project: Hadoop HDFS Issue Type: Bug Reporter: John George Assignee: John George Attachments: hdfs-3121.patch This JIRA is to write tests for viewing quota using viewfs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3086) Change Datanode not to send storage list in registration - it will be sent in block report
[ https://issues.apache.org/jira/browse/HDFS-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233963#comment-13233963 ] Hadoop QA commented on HDFS-3086: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519114/h3086_20120320b.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 18 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2050//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2050//console This message is automatically generated. Change Datanode not to send storage list in registration - it will be sent in block report -- Key: HDFS-3086 URL: https://issues.apache.org/jira/browse/HDFS-3086 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h3086_20120320.patch, h3086_20120320b.patch When a datnode is registered, the datanode send also the storage lists. It is not useful since the storage list is already available in block reports. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3071) haadmin failover command does not provide enough detail for when target NN is not ready to be active
[ https://issues.apache.org/jira/browse/HDFS-3071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233977#comment-13233977 ] Aaron T. Myers commented on HDFS-3071: -- The patch looks like it will work to me, but I agree that we shouldn't concern ourselves yet with protocol compatibility of the HAServiceProtocol. As such, I think you should go ahead and revise the patch to have a more conventional API. haadmin failover command does not provide enough detail for when target NN is not ready to be active Key: HDFS-3071 URL: https://issues.apache.org/jira/browse/HDFS-3071 Project: Hadoop HDFS Issue Type: Improvement Components: ha Affects Versions: 0.24.0 Reporter: Philip Zeyliger Assignee: Todd Lipcon Attachments: hdfs-3071.txt When running the failover command, you can get an error message like the following: {quote} $ hdfs --config $(pwd) haadmin -failover namenode2 namenode1 Failover failed: xxx.yyy/1.2.3.4:8020 is not ready to become active {quote} Unfortunately, the error message doesn't describe why that node isn't ready to be active. In my case, the target namenode's logs don't indicate anything either. It turned out that the issue was "Safe mode is ON. Resources are low on NN. Safe mode must be turned off manually.", but ideally the user would be told that at the time of the failover. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3094) add -nonInteractive and -force option to namenode -format command
[ https://issues.apache.org/jira/browse/HDFS-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234005#comment-13234005 ] Hadoop QA commented on HDFS-3094: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519125/HDFS-3094.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2053//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2053//console This message is automatically generated. add -nonInteractive and -force option to namenode -format command - Key: HDFS-3094 URL: https://issues.apache.org/jira/browse/HDFS-3094 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 0.24.0, 1.0.2 Reporter: Arpit Gupta Assignee: Arpit Gupta Attachments: HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.branch-1.0.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch, HDFS-3094.patch Currently the bin/hadoop namenode -format prompts the user for a Y/N to setup the directories in the local file system. -force : namenode formats the directories without prompting -nonInterActive : namenode format will return with an exit code of 1 if the dir exists. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3100) failed to append data using webhdfs
[ https://issues.apache.org/jira/browse/HDFS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-3100: - Attachment: HDFS-3100.patch The previous patch missed one condition and didn't send the checksum to the client, so real corruption couldn't be detected. This problem is fixed in the new patch. failed to append data using webhdfs --- Key: HDFS-3100 URL: https://issues.apache.org/jira/browse/HDFS-3100 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.24.0, 0.23.1 Reporter: Zhanwei.Wang Assignee: Brandon Li Attachments: HDFS-3100.patch, HDFS-3100.patch, HDFS-3100.patch, hadoop-wangzw-datanode-ubuntu.log, hadoop-wangzw-namenode-ubuntu.log, test.sh, testAppend.patch STEP: 1, deploy a single node hdfs 0.23.1 cluster and configure hdfs as: A) enable webhdfs B) enable append C) disable permissions 2, start hdfs 3, run the test script as attached RESULT: expected: a file named testFile should be created and populated with 32K * 5000 zeros, HDFS should be OK. I got: the script could not finish; the file was created but not populated as expected; the append operation actually failed. Datanode log shows that the blockscanner reported a bad replica and the namenode decided to delete it. Since it is a single-node cluster, the append fails. It makes no sense that the script failed every time. Datanode and Namenode logs are attached. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3100) failed to append data using webhdfs
[ https://issues.apache.org/jira/browse/HDFS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234015#comment-13234015 ] Tsz Wo (Nicholas), SZE commented on HDFS-3100: -- Brandon, I think we could avoid the metaFileExists(..) and, as you mentioned, there is a race condition between two calls. failed to append data using webhdfs --- Key: HDFS-3100 URL: https://issues.apache.org/jira/browse/HDFS-3100 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.24.0, 0.23.1 Reporter: Zhanwei.Wang Assignee: Brandon Li Attachments: HDFS-3100.patch, HDFS-3100.patch, HDFS-3100.patch, hadoop-wangzw-datanode-ubuntu.log, hadoop-wangzw-namenode-ubuntu.log, test.sh, testAppend.patch STEP: 1, deploy a single node hdfs 0.23.1 cluster and configure hdfs as: A) enable webhdfs B) enable append C) disable permissions 2, start hdfs 3, run the test script as attached RESULT: expected: a file named testFile should be created and populated with 32K * 5000 zeros, HDFS should be OK. I got: script cannot be finished, file has been created but not be populated as expected, actually append operation failed. Datanode log shows that, blockscaner report a bad replica and nanenode decide to delete it. Since it is a single node cluster, append fail. It makes no sense that the script failed every time. Datanode and Namenode logs are attached. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234062#comment-13234062 ] Hadoop QA commented on HDFS-3004: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519152/HDFS-3004.023.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 21 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.TestDFSUpgradeFromImage org.apache.hadoop.hdfs.server.common.TestDistributedUpgrade org.apache.hadoop.hdfs.TestPersistBlocks +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2054//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2054//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2054//console This message is automatically generated. 
Implement Recovery Mode --- Key: HDFS-3004 URL: https://issues.apache.org/jira/browse/HDFS-3004 Project: Hadoop HDFS Issue Type: New Feature Components: tools Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch, HDFS-3004.023.patch, HDFS-3004__namenode_recovery_tool.txt When the NameNode metadata is corrupt for some reason, we want to be able to fix it. Obviously, we would prefer never to get in this case. In a perfect world, we never would. However, bad data on disk can happen from time to time, because of hardware errors or misconfigurations. In the past we have had to correct it manually, which is time-consuming and which can result in downtime. Recovery mode is initialized by the system administrator. When the NameNode starts up in Recovery Mode, it will try to load the FSImage file, apply all the edits from the edits log, and then write out a new image. Then it will shut down. Unlike in the normal startup process, the recovery mode startup process will be interactive. When the NameNode finds something that is inconsistent, it will prompt the operator as to what it should do. The operator can also choose to take the first option for all prompts by starting up with the '-f' flag, or typing 'a' at one of the prompts. I have reused as much code as possible from the NameNode in this tool. Hopefully, the effort that was spent developing this will also make the NameNode editLog and image processing even more robust than it already is. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3100) failed to append data using webhdfs
[ https://issues.apache.org/jira/browse/HDFS-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234079#comment-13234079 ] Hadoop QA commented on HDFS-3100: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519162/HDFS-3100.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 11 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed the unit tests build +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2056//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2056//console This message is automatically generated. failed to append data using webhdfs --- Key: HDFS-3100 URL: https://issues.apache.org/jira/browse/HDFS-3100 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.24.0, 0.23.1 Reporter: Zhanwei.Wang Assignee: Brandon Li Attachments: HDFS-3100.patch, HDFS-3100.patch, HDFS-3100.patch, hadoop-wangzw-datanode-ubuntu.log, hadoop-wangzw-namenode-ubuntu.log, test.sh, testAppend.patch STEP: 1, deploy a single node hdfs 0.23.1 cluster and configure hdfs as: A) enable webhdfs B) enable append C) disable permissions 2, start hdfs 3, run the test script as attached RESULT: expected: a file named testFile should be created and populated with 32K * 5000 zeros, HDFS should be OK. I got: script cannot be finished, file has been created but not be populated as expected, actually append operation failed. 
Datanode log shows that the blockscanner reported a bad replica and the namenode decided to delete it. Since it is a single-node cluster, the append fails. It makes no sense that the script failed every time. Datanode and Namenode logs are attached. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3086) Change Datanode not to send storage list in registration - it will be sent in block report
[ https://issues.apache.org/jira/browse/HDFS-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13234080#comment-13234080 ] Suresh Srinivas commented on HDFS-3086: --- Patch looks good. +1. Change Datanode not to send storage list in registration - it will be sent in block report -- Key: HDFS-3086 URL: https://issues.apache.org/jira/browse/HDFS-3086 Project: Hadoop HDFS Issue Type: Sub-task Components: data-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h3086_20120320.patch, h3086_20120320b.patch When a datanode is registered, it also sends the storage list. This is not useful, since the storage list is already available in block reports. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.
Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt. Key: HDFS-3122 URL: https://issues.apache.org/jira/browse/HDFS-3122 Project: Hadoop HDFS Issue Type: Bug Components: data-node, name-node Affects Versions: 0.23.0, 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Critical *Block Report* can *race* with *Block Recovery* with the closeFile flag true. Suppose the block report is generated at the DN side just before the recovery and, due to network delay, reaches the NN late. The recovery succeeds and the generation stamp is changed to a new one; the primary DN invokes commitBlockSynchronization and the block gets updated on the NN side with the new genstamp, and is also marked as complete since the closeFile flag is true. Now the delayed block report starts processing at the NN side. This particular block was in RBW state (when the DN generated the BR), but the file has since been completed at the NN side. Since the generation stamps mismatch, the block gets marked as corrupt.
{code}
case RWR:
  if (!storedBlock.isComplete()) {
    return null; // not corrupt
  } else if (storedBlock.getGenerationStamp() != iblk.getGenerationStamp()) {
    return new BlockToMarkCorrupt(storedBlock,
        "reported " + reportedState + " replica with genstamp " +
        iblk.getGenerationStamp() + " does not match COMPLETE block's " +
        "genstamp in block map " + storedBlock.getGenerationStamp());
  } else { // COMPLETE block, same genstamp
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.
[ https://issues.apache.org/jira/browse/HDFS-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234104#comment-13234104 ] Uma Maheswara Rao G commented on HDFS-3122: --- I reproduced this case with debug points. 1) Created a file and hsync'ed it. 2) Triggered one BR in a separate thread and blocked this call on the NN side just before acquiring the FSNamesystem lock. 3) Triggered one recoverLease call from a separate thread and let it complete. 4) After #3 completed successfully (after commitBlockSynchronization with the new genstamp), started processing the BR blocked in #2. 5) Since that old BR carries the older genstamp, the block gets marked as corrupt. Will attach the colored logs. Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt. Key: HDFS-3122 URL: https://issues.apache.org/jira/browse/HDFS-3122 Project: Hadoop HDFS Issue Type: Bug Components: data-node, name-node Affects Versions: 0.23.0, 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Critical
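The reproduction steps above can be condensed into a sequential sketch of the event ordering, with no real threads or debug points needed to see the outcome. All names below are hypothetical simplifications for illustration, not actual HDFS APIs:

```java
// Sequential sketch of the race: a block report snapshotted before recovery
// is only processed after recovery has bumped the genstamp and closed the file.
public class DelayedReportSketch {

    // Simplified stand-in for the NameNode-side block record.
    static class NamenodeBlock {
        long genStamp;
        boolean complete;
        NamenodeBlock(long genStamp) { this.genStamp = genStamp; }
    }

    // Stand-in for commitBlockSynchronization with closeFile == true:
    // bump the generation stamp and mark the block COMPLETE.
    static void commitBlockSynchronization(NamenodeBlock b, long newGenStamp) {
        b.genStamp = newGenStamp;
        b.complete = true;
    }

    // Stand-in for processing one reported RBW replica (the RWR case):
    // a COMPLETE block with a mismatched genstamp is flagged corrupt.
    static boolean processReportedReplica(NamenodeBlock stored, long reportedGenStamp) {
        if (!stored.complete) return false;
        return stored.genStamp != reportedGenStamp;
    }

    public static void main(String[] args) {
        NamenodeBlock block = new NamenodeBlock(1001);  // steps 1-2: file written,
        long delayedReportGenStamp = block.genStamp;    // BR snapshotted, then delayed
        commitBlockSynchronization(block, 1002);        // step 3: recovery succeeds
        // steps 4-5: the stale BR is finally processed and flags the replica corrupt
        System.out.println(processReportedReplica(block, delayedReportGenStamp)); // true
    }
}
```

Processing the same report before the commitBlockSynchronization call would return false, which is exactly the ordering the debug points in the reproduction force.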
[jira] [Commented] (HDFS-3121) test for HADOOP-8194 (quota using viewfs)
[ https://issues.apache.org/jira/browse/HDFS-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234119#comment-13234119 ] Hadoop QA commented on HDFS-3121: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519154/hdfs-3121.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 4 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.fs.viewfs.TestViewFsFileStatusHdfs +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2055//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2055//console This message is automatically generated. test for HADOOP-8194 (quota using viewfs) - Key: HDFS-3121 URL: https://issues.apache.org/jira/browse/HDFS-3121 Project: Hadoop HDFS Issue Type: Bug Reporter: John George Assignee: John George Attachments: hdfs-3121.patch This JIRA is to write tests for viewing quota using viewfs.
[jira] [Updated] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.
[ https://issues.apache.org/jira/browse/HDFS-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-3122: -- Attachment: blockCorrupt.txt Attached the grepped logs. Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt. Key: HDFS-3122 URL: https://issues.apache.org/jira/browse/HDFS-3122 Project: Hadoop HDFS Issue Type: Bug Components: data-node, name-node Affects Versions: 0.23.0, 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Critical Attachments: blockCorrupt.txt
[jira] [Updated] (HDFS-3122) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt.
[ https://issues.apache.org/jira/browse/HDFS-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-3122: -- Description: *Block Report* can *race* with *Block Recovery* when the closeFile flag is true. A block report is generated just before block recovery on the DN side and, due to network problems, the block report reaches the NN late. After this, recovery succeeds and the generation stamp is bumped to a new one. The primary DN invokes commitBlockSynchronization and the block is updated on the NN side; the block is also marked as complete, since the closeFile flag was true, and updated with the new genstamp. Now the NN starts processing the delayed blockReport. This particular block was in RBW state when the DN generated the BR, but the file has since been completed on the NN side. Finally the block is marked as corrupt because of the genstamp mismatch.
{code}
case RWR:
  if (!storedBlock.isComplete()) {
    return null; // not corrupt
  } else if (storedBlock.getGenerationStamp() != iblk.getGenerationStamp()) {
    return new BlockToMarkCorrupt(storedBlock,
        "reported " + reportedState + " replica with genstamp " +
        iblk.getGenerationStamp() + " does not match COMPLETE block's " +
        "genstamp in block map " + storedBlock.getGenerationStamp());
  } else { // COMPLETE block, same genstamp
{code}
was: *Block Report* can *race* with *Block Recovery* with closeFile flag true. IF block report generated just befor recovery at DN side and due to N/W. This block report got delayed to NN. Recovery success and generation stamp has been changed to new one. primary DN invokes the commitBlockSynchronization and block got updated in NN side. Also marked as complete, since the closeFile flag is true. Updated with new genstamp. Now blockReport started processing at NN side. This particular block from RBW (when it generated the BR at DN), and file was completed at NN side. Since the genartion stamps are mismatching, block is getting marked as corrupt.
{code}
case RWR:
  if (!storedBlock.isComplete()) {
    return null; // not corrupt
  } else if (storedBlock.getGenerationStamp() != iblk.getGenerationStamp()) {
    return new BlockToMarkCorrupt(storedBlock,
        "reported " + reportedState + " replica with genstamp " +
        iblk.getGenerationStamp() + " does not match COMPLETE block's " +
        "genstamp in block map " + storedBlock.getGenerationStamp());
  } else { // COMPLETE block, same genstamp
{code}
Target Version/s: 0.24.0, 0.23.3 (was: 0.23.3, 0.24.0) Block recovery with closeFile flag true can race with blockReport. Due to this blocks are getting marked as corrupt. Key: HDFS-3122 URL: https://issues.apache.org/jira/browse/HDFS-3122 Project: Hadoop HDFS Issue Type: Bug Components: data-node, name-node Affects Versions: 0.23.0, 0.24.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Critical Attachments: blockCorrupt.txt
[jira] [Commented] (HDFS-3004) Implement Recovery Mode
[ https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13234131#comment-13234131 ] Hadoop QA commented on HDFS-3004: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12519152/HDFS-3004.023.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 21 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: org.apache.hadoop.hdfs.TestDFSUpgradeFromImage org.apache.hadoop.hdfs.server.common.TestDistributedUpgrade org.apache.hadoop.hdfs.TestPersistBlocks +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/2057//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/2057//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2057//console This message is automatically generated. 
Implement Recovery Mode --- Key: HDFS-3004 URL: https://issues.apache.org/jira/browse/HDFS-3004 Project: Hadoop HDFS Issue Type: New Feature Components: tools Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch, HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch, HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch, HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch, HDFS-3004.023.patch, HDFS-3004__namenode_recovery_tool.txt