[jira] [Commented] (HDFS-2126) Improve Namenode startup time [umbrella task]
[ https://issues.apache.org/jira/browse/HDFS-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060310#comment-13060310 ] Matt Foley commented on HDFS-2126: -- The intent is to close this umbrella Jira after HDFS-1391 and HDFS-1732 are committed. Other ideas for speeding up Namenode startup have been proposed, and in some cases Jiras opened. We record some of them here for historical interest, but they are more speculative and may be pursued under other Jiras or other projects:
* Fully background the FSImage writes when in Safe Mode (HDFS-1798)
* Further optimization of FSImage reads (e.g. HDFS-1366)
* Concurrent FSImage read processing, by splitting the FSImage file into independently-processable partitions (speculative)
* Improvements for Edits log read processing, similar to the efficiency improvements obtained for FSImage reads (speculative)
* Concurrent Block Report processing (e.g. HDFS-1667)
* Fully background Termination Scan (taking the improvements of HDFS-1391 to their maximum)
Improve Namenode startup time [umbrella task] - Key: HDFS-2126 URL: https://issues.apache.org/jira/browse/HDFS-2126 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.20.2 Reporter: Matt Foley This is an umbrella task to group the improvements in Namenode startup latency made over the last few months, and track remaining ideas. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2010) Clean up and test behavior under failed edit streams
[ https://issues.apache.org/jira/browse/HDFS-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060326#comment-13060326 ] Matt Foley commented on HDFS-2010: -- Nope, lgtm! Clean up and test behavior under failed edit streams Key: HDFS-2010 URL: https://issues.apache.org/jira/browse/HDFS-2010 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Aaron T. Myers Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-2010.0.patch, hdfs-2010.1.patch, hdfs-2010.2.patch Right now there is very little test coverage of situations where one or more of the edits directories fails. In trunk, the behavior when all of the edits directories are dead is that the NN prints a fatal-level log message and calls Runtime.exit(-1). I don't think this is really the behavior we want. It needs a bit of thought, but I think something like the following would make more sense:
- any calls currently waiting on logSync should end up throwing an exception
- the NN should probably enter safe mode
- ops can restore edits directories and then ask the NN to restore storage, at which point it could exit safe mode
- alternatively, ops could ask the NN to do saveNamespace and then shut it down
[jira] [Commented] (HDFS-503) Implement erasure coding as a layer on HDFS
[ https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060343#comment-13060343 ] sri commented on HDFS-503: -- I have a couple of questions: 1) With Raid set up, I am not able to generate a DFSAdmin report (hadoop dfsadmin -report). Why is that? 2) I am not able to reduce the targetReplicationFactor to 0 (I want to run mapreduce where the BlockFixer retrieves the data from the raided disks). Is there any way to do this? Thanks in advance. Implement erasure coding as a layer on HDFS --- Key: HDFS-503 URL: https://issues.apache.org/jira/browse/HDFS-503 Project: Hadoop HDFS Issue Type: New Feature Components: contrib/raid Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.21.0 Attachments: raid1.txt, raid2.txt The goal of this JIRA is to discuss how the cost of raw storage for a HDFS file system can be reduced. Keeping three copies of the same data is very costly, especially when the size of storage is huge. One idea is to reduce the replication factor and do erasure coding of a set of blocks so that the overall probability of failure of a block remains the same as before. Many forms of error-correcting codes are available; see http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has described DiskReduce: https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt. My opinion is to discuss implementation strategies that are not part of base HDFS, but are a layer on top of HDFS.
[jira] [Commented] (HDFS-503) Implement erasure coding as a layer on HDFS
[ https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060450#comment-13060450 ] sri commented on HDFS-503: -- I would like to know whether the stripes just act as a recovery option (when other datanodes have failed), or whether they can act as input to mapreduce jobs (to satisfy locality). Implement erasure coding as a layer on HDFS --- Key: HDFS-503 URL: https://issues.apache.org/jira/browse/HDFS-503 Project: Hadoop HDFS Issue Type: New Feature Components: contrib/raid Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.21.0 Attachments: raid1.txt, raid2.txt The goal of this JIRA is to discuss how the cost of raw storage for a HDFS file system can be reduced. Keeping three copies of the same data is very costly, especially when the size of storage is huge. One idea is to reduce the replication factor and do erasure coding of a set of blocks so that the overall probability of failure of a block remains the same as before. Many forms of error-correcting codes are available; see http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has described DiskReduce: https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt. My opinion is to discuss implementation strategies that are not part of base HDFS, but are a layer on top of HDFS.
[jira] [Commented] (HDFS-1990) Resource leaks in HDFS
[ https://issues.apache.org/jira/browse/HDFS-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060540#comment-13060540 ] Hudson commented on HDFS-1990: -- Integrated in Hadoop-Hdfs-trunk #717 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/717/]) HDFS-1990. Fix resource leaks in BlockReceiver.close(). Contributed by Uma Maheswara Rao G szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1143147 Files : * /hadoop/common/trunk/hdfs/CHANGES.txt * /hadoop/common/trunk/hdfs/src/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java Resource leaks in HDFS -- Key: HDFS-1990 URL: https://issues.apache.org/jira/browse/HDFS-1990 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.0 Reporter: ramkrishna.s.vasudevan Assignee: Uma Maheswara Rao G Priority: Minor Fix For: 0.23.0 Attachments: HDFS-1990.patch, HDFS-1990.patch Possible resource leakage in HDFS.
[jira] [Commented] (HDFS-1753) Resource Leak in org.apache.hadoop.hdfs.server.namenode.StreamFile
[ https://issues.apache.org/jira/browse/HDFS-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060541#comment-13060541 ] Hudson commented on HDFS-1753: -- Integrated in Hadoop-Hdfs-trunk #717 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/717/]) HDFS-1753. Resource Leak in StreamFile. Contributed by Uma Maheswara Rao G eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1143106 Files : * /hadoop/common/trunk/hdfs/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestStreamFile.java * /hadoop/common/trunk/hdfs/CHANGES.txt * /hadoop/common/trunk/hdfs/src/java/org/apache/hadoop/hdfs/server/namenode/StreamFile.java Resource Leak in org.apache.hadoop.hdfs.server.namenode.StreamFile -- Key: HDFS-1753 URL: https://issues.apache.org/jira/browse/HDFS-1753 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20.1, 0.23.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Attachments: HDFS-1753.1.patch, HDFS-1753.2.patch, HDFS-1753.3.patch, HDFS-1753.4.patch, HDFS-1753.patch In the doGet method: final DFSInputStream in = dfs.open(filename); final long fileLen = in.getFileLength(); OutputStream os = response.getOutputStream(); These lines appear outside the try block. If response.getOutputStream() throws any exception, the DFSInputStream will not be closed. So it is better to move response.getOutputStream() into the try block.
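The restructuring described above can be sketched as follows. This is a minimal illustration only: OutputStreamSupplier and the stream types here are simplified stand-ins, not the real DFSInputStream/HttpServletResponse API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch of the HDFS-1753 fix: acquire the output stream inside the try
// block, so that if getOutputStream() throws, the already-opened input
// stream is still closed by the finally clause.
public class StreamFileSketch {

    interface OutputStreamSupplier {
        OutputStream getOutputStream() throws IOException;
    }

    static void doGet(InputStream in, OutputStreamSupplier response) throws IOException {
        try {
            // Previously this call sat outside the try block; if it threw,
            // 'in' leaked. Inside the try, the finally below always runs.
            OutputStream os = response.getOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                os.write(b);
            }
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("hello".getBytes());
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        doGet(in, () -> sink);
        System.out.println(sink.toString()); // prints "hello"
    }
}
```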
[jira] [Updated] (HDFS-2131) Tests for HADOOP-7361
[ https://issues.apache.org/jira/browse/HDFS-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2131: -- Attachment: HADOOP-7361-test.patch Tests for HADOOP-7361 - Key: HDFS-2131 URL: https://issues.apache.org/jira/browse/HDFS-2131 Project: Hadoop HDFS Issue Type: Test Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HADOOP-7361-test.patch
[jira] [Created] (HDFS-2131) Tests for HADOOP-7361
Tests for HADOOP-7361 - Key: HDFS-2131 URL: https://issues.apache.org/jira/browse/HDFS-2131 Project: Hadoop HDFS Issue Type: Test Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HADOOP-7361-test.patch
[jira] [Updated] (HDFS-2131) Tests for HADOOP-7361
[ https://issues.apache.org/jira/browse/HDFS-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-2131: -- Status: Patch Available (was: Open) Tests for HADOOP-7361 - Key: HDFS-2131 URL: https://issues.apache.org/jira/browse/HDFS-2131 Project: Hadoop HDFS Issue Type: Test Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HADOOP-7361-test.patch
[jira] [Commented] (HDFS-2126) Improve Namenode startup time [umbrella task]
[ https://issues.apache.org/jira/browse/HDFS-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060681#comment-13060681 ] Koji Noguchi commented on HDFS-2126: Is it only our (Yahoo) namenodes which hit a full GC right after the handlers first start up? That wastes 1 to 3 minutes. (We are giving a large heap size with -Xmx and -Xms.) Improve Namenode startup time [umbrella task] - Key: HDFS-2126 URL: https://issues.apache.org/jira/browse/HDFS-2126 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.20.2 Reporter: Matt Foley This is an umbrella task to group the improvements in Namenode startup latency made over the last few months, and track remaining ideas.
[jira] [Commented] (HDFS-2126) Improve Namenode startup time [umbrella task]
[ https://issues.apache.org/jira/browse/HDFS-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060690#comment-13060690 ] Todd Lipcon commented on HDFS-2126: --- Do you set CMSInitiatingOccupancyFraction in your GC options? I haven't noticed this behavior but haven't personally worked on startup time. Improve Namenode startup time [umbrella task] - Key: HDFS-2126 URL: https://issues.apache.org/jira/browse/HDFS-2126 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 0.20.2 Reporter: Matt Foley This is an umbrella task to group the improvements in Namenode startup latency made over the last few months, and track remaining ideas.
[jira] [Commented] (HDFS-2131) Tests for HADOOP-7361
[ https://issues.apache.org/jira/browse/HDFS-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060691#comment-13060691 ] Hadoop QA commented on HDFS-2131: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12485438/HADOOP-7361-test.patch against trunk revision 1143147.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these core unit tests: org.apache.hadoop.hdfs.server.namenode.TestBackupNode org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.hdfs.TestSeekBug
+1 contrib tests. The patch passed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/879//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/879//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/879//console This message is automatically generated. Tests for HADOOP-7361 - Key: HDFS-2131 URL: https://issues.apache.org/jira/browse/HDFS-2131 Project: Hadoop HDFS Issue Type: Test Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HADOOP-7361-test.patch
[jira] [Commented] (HDFS-2131) Tests for HADOOP-7361
[ https://issues.apache.org/jira/browse/HDFS-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060695#comment-13060695 ] Uma Maheswara Rao G commented on HDFS-2131: --- TestDFSShell will pass after committing the HADOOP-7361 patch. The remaining failures are not related to this patch. Tests for HADOOP-7361 - Key: HDFS-2131 URL: https://issues.apache.org/jira/browse/HDFS-2131 Project: Hadoop HDFS Issue Type: Test Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HADOOP-7361-test.patch
[jira] [Created] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close
Potential resource leak in EditLogFileOutputStream.close Key: HDFS-2132 URL: https://issues.apache.org/jira/browse/HDFS-2132 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Aaron T. Myers {{EditLogFileOutputStream.close(...)}} sequentially closes a series of underlying resources. If any of the calls to {{close()}} throw before the last one, the later resources will never be closed.
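The failure mode described in this report can be sketched as follows; the two fields here are illustrative stand-ins, not the real EditLogFileOutputStream internals.

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch of the HDFS-2132 leak: close() walks its resources in order, so
// an exception from the first close() means the second is never reached.
public class SequentialCloseSketch implements Closeable {
    private final Closeable buffer;     // stand-in for the first resource
    private final Closeable fileHandle; // stand-in for the later resource

    SequentialCloseSketch(Closeable buffer, Closeable fileHandle) {
        this.buffer = buffer;
        this.fileHandle = fileHandle;
    }

    @Override
    public void close() throws IOException {
        buffer.close();     // if this throws...
        fileHandle.close(); // ...this line never runs, leaking fileHandle
    }
}
```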
[jira] [Assigned] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close
[ https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers reassigned HDFS-2132: Assignee: Aaron T. Myers Potential resource leak in EditLogFileOutputStream.close Key: HDFS-2132 URL: https://issues.apache.org/jira/browse/HDFS-2132 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers {{EditLogFileOutputStream.close(...)}} sequentially closes a series of underlying resources. If any of the calls to {{close()}} throw before the last one, the later resources will never be closed.
[jira] [Commented] (HDFS-2131) Tests for HADOOP-7361
[ https://issues.apache.org/jira/browse/HDFS-2131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060738#comment-13060738 ] Daryn Sharp commented on HDFS-2131: --- These tests are probably better suited to {{TestHDFSCLI}}. Although I'm not fond of the custom framework and its weak integration with junit, testing for a -1 return is a feeble check since it can occur for any number of reasons. The {{TestHDFSCLI}} tests will let you verify that the exception output is correct. Come to think of it, I'm surprised the commands aren't failing with exit 1... -1 is usually a usage error. Tests for HADOOP-7361 - Key: HDFS-2131 URL: https://issues.apache.org/jira/browse/HDFS-2131 Project: Hadoop HDFS Issue Type: Test Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Attachments: HADOOP-7361-test.patch
[jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060745#comment-13060745 ] Todd Lipcon commented on HDFS-2011: --- I'm working on merging this with HDFS-1073, and had one question: when do we expect that an editlog stream would be closed twice? In 1073 there are some extra asserts, so instead of ignoring the second close, it now throws java.io.IOException: Trying to use aborted output stream. I'm debating whether to remove this exception like you've done in this patch, vs remove the patch, since it seems like it might be indicative of a bug to close a stream twice. Removal and restoration of storage directories on checkpointing failure doesn't work properly - Key: HDFS-2011 URL: https://issues.apache.org/jira/browse/HDFS-2011 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 0.23.0 Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take off a failed storage directory
[jira] [Updated] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close
[ https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2132: - Attachment: hdfs-2132.0.patch Patch which makes sure that {{EditLogFileOutputStream.close(...)}} cleans up after itself. Potential resource leak in EditLogFileOutputStream.close Key: HDFS-2132 URL: https://issues.apache.org/jira/browse/HDFS-2132 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Attachments: hdfs-2132.0.patch {{EditLogFileOutputStream.close(...)}} sequentially closes a series of underlying resources. If any of the calls to {{close()}} throw before the last one, the later resources will never be closed.
[jira] [Updated] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close
[ https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2132: - Fix Version/s: 0.23.0 Status: Patch Available (was: Open) Potential resource leak in EditLogFileOutputStream.close Key: HDFS-2132 URL: https://issues.apache.org/jira/browse/HDFS-2132 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: 0.23.0 Attachments: hdfs-2132.0.patch {{EditLogFileOutputStream.close(...)}} sequentially closes a series of underlying resources. If any of the calls to {{close()}} throw before the last one, the later resources will never be closed.
[jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060775#comment-13060775 ] Ravi Prakash commented on HDFS-2011: I had noticed close being called twice while testing this functionality. This was causing a NullPointerException the second time. The stack trace is given in the comment https://issues.apache.org/jira/browse/HDFS-2011?focusedCommentId=13041858&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13041858
{quote}
2011-04-05 17:36:56,187 INFO org.apache.hadoop.ipc.Server: IPC Server handler 87 on 8020, call getEditLogSize() from 98.137.97.99:35862: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.EditLogFileOutputStream.close(EditLogFileOutputStream.java:109)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.processIOError(FSEditLog.java:299)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.getEditLogSize(FSEditLog.java:849)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getEditLogSize(FSNamesystem.java:4270)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getEditLogSize(NameNode.java:1095)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:346)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1399)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1395)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1094)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1393)
{quote}
The bug itself is quite hard to reproduce.
I had to run my tests in an infinite loop, and the NullPointerException happened after 3-4 hours (each run of the test would take maybe 2 minutes). After the NullPointerException, the namenode would essentially be useless. Even hdfs dfs -ls would throw a NullPointerException. I am not sure myself which philosophy would be better. FileOutputStream itself ignores a second close. I checked this with the following program:
{noformat}
import java.io.*;

public class TestJAVA {
    public static void main(String[] args) {
        System.out.println("Hello World");
        try {
            FileOutputStream fos = new FileOutputStream("/tmp/ravi.txt");
            fos.write(50);
            fos.write(50);
            fos.write(50);
            fos.write(50);
            fos.write(50);
            fos.write(50);
            fos.close();
            fos.close();
        } catch (IOException ioe) {
            System.out.println("Hello California");
            System.out.println(ioe);
        }
        System.out.println("Hello Champaign");
    }
}
{noformat}
Removal and restoration of storage directories on checkpointing failure doesn't work properly - Key: HDFS-2011 URL: https://issues.apache.org/jira/browse/HDFS-2011 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 0.23.0 Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take off a failed storage directory
[jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060777#comment-13060777 ] Ravi Prakash commented on HDFS-2011: The program above output:
{noformat}
Hello World
Hello Champaign
{noformat}
Removal and restoration of storage directories on checkpointing failure doesn't work properly - Key: HDFS-2011 URL: https://issues.apache.org/jira/browse/HDFS-2011 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 0.23.0 Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take off a failed storage directory
[jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060780#comment-13060780 ] John George commented on HDFS-2011: --- If I remember right, it was a case of an incomplete create as opposed to close being called twice. So, the close() was being called on a stream that was not really created... Removal and restoration of storage directories on checkpointing failure doesn't work properly - Key: HDFS-2011 URL: https://issues.apache.org/jira/browse/HDFS-2011 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 0.23.0 Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take off a failed storage directory
[jira] [Commented] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close
[ https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060793#comment-13060793 ] Ravi Prakash commented on HDFS-2132: I am new to Hadoop so please forgive me if I do not understand the philosophies behind this patch. If any of the close methods fail, they will throw an IOException which will be propagated up the stack. Isn't this the way all JAVA works? Comments on your patch:
1. In normal operation all close methods within the try will be called once, and then once again in the IOUtils.cleanup method. What purpose does this serve? I would rather the methods be called only once.
2. In the finally block, all IOExceptions which might have been thrown are logged, and then programmatically swallowed. The upstream functions are never made aware of these IOExceptions and I am not sure this is the right behavior.
Potential resource leak in EditLogFileOutputStream.close Key: HDFS-2132 URL: https://issues.apache.org/jira/browse/HDFS-2132 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: 0.23.0 Attachments: hdfs-2132.0.patch {{EditLogFileOutputStream.close(...)}} sequentially closes a series of underlying resources. If any of the calls to {{close()}} throw before the last one, the later resources will never be closed.
[jira] [Commented] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close
[ https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060794#comment-13060794 ] Hadoop QA commented on HDFS-2132: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12485477/hdfs-2132.0.patch against trunk revision 1143147.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
+1 system test framework. The patch passed system test framework compile.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/880//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/880//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/880//console This message is automatically generated. Potential resource leak in EditLogFileOutputStream.close Key: HDFS-2132 URL: https://issues.apache.org/jira/browse/HDFS-2132 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: 0.23.0 Attachments: hdfs-2132.0.patch {{EditLogFileOutputStream.close(...)}} sequentially closes a series of underlying resources. If any of the calls to {{close()}} throw before the last one, the later resources will never be closed.
[jira] [Updated] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close
[ https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2132: - Attachment: hdfs-2132.1.patch Whoops, uploaded the wrong patch. Here's one with tests. Potential resource leak in EditLogFileOutputStream.close Key: HDFS-2132 URL: https://issues.apache.org/jira/browse/HDFS-2132 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: 0.23.0 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch {{EditLogFileOutputStream.close(...)}} sequentially closes a series of underlying resources. If any of the calls to {{close()}} throw before the last one, the later resources will never be closed.
[jira] [Commented] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close
[ https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060806#comment-13060806 ] Aaron T. Myers commented on HDFS-2132: -- bq. I am new to Hadoop so please forgive me if I do not understand the philosophies behind this patch. If any of the close methods fail, they will throw an IOException which will be propagated up the stack. Isn't this the way all JAVA works? This is indeed the way it works and is the desired behavior. The point of this patch is that when a close fails for any one of the {{Closeables}}, we should still make a last-ditch effort to close the others. If we can't close them then, there's nothing we can do. bq. 1. In normal operation all close methods within the try will be called once, and then once again in the IOUtils.cleanup method. What purpose does this serve? I would rather the methods be called only once. In the normal case all of the {{Closeables}} will be set to {{null}}. Note that {{IOUtils.cleanup(...)}} expressly handles {{nulls}}, and will not attempt to call {{close()}} again. bq. 2. In the finally block, all IOExceptions which might have been thrown are logged, and then programmatically swallowed. The upstream functions are never made aware of these IOExceptions and I am not sure this is the right behavior. It's true that in the exceptional case any failures to call {{close()}} in {{IOUtils.cleanup(...)}} will be logged and not propagated. This is exactly the intended behavior. Note that the original exception caused by the call to {{close()}} outside of {{IOUtils.cleanup(...)}} will still be propagated up. Potential resource leak in EditLogFileOutputStream.close Key: HDFS-2132 URL: https://issues.apache.org/jira/browse/HDFS-2132 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Aaron T. Myers Assignee: Aaron T. 
Myers Fix For: 0.23.0 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch {{EditLogFileOutputStream.close(...)}} sequentially closes a series of underlying resources. If any of the calls to {{close()}} throw before the last one, the later resources will never be closed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
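The pattern Aaron describes can be sketched as a small standalone model (this is not the actual EditLogFileOutputStream code; the Resource class and cleanup() helper below are illustrative stand-ins, with cleanup() mirroring the documented behavior of IOUtils.cleanup(...): skip nulls, swallow secondary exceptions):

```java
import java.io.Closeable;
import java.io.IOException;

// Simplified model of the close() pattern under discussion: try to close
// each resource in order, null out each field on success, and make a
// last-ditch attempt on whatever is left in the finally block.
class CloseSketch {
    static class Resource implements Closeable {
        boolean closed = false;
        final boolean failOnClose;
        Resource(boolean failOnClose) { this.failOnClose = failOnClose; }
        public void close() throws IOException {
            if (failOnClose) throw new IOException("injected failure");
            closed = true;
        }
    }

    Resource first = new Resource(true);   // this close() will throw
    Resource second = new Resource(false); // must still get closed

    void close() throws IOException {
        try {
            first.close();
            first = null;      // success: cleanup() below will skip it
            second.close();
            second = null;
        } finally {
            // Mirrors IOUtils.cleanup(log, closeables): nulls are skipped
            // and exceptions here are only logged, so the original
            // exception from the try block still propagates to the caller.
            cleanup(first, second);
        }
    }

    static void cleanup(Closeable... closeables) {
        for (Closeable c : closeables) {
            if (c == null) continue; // already closed successfully
            try {
                c.close();
            } catch (IOException e) {
                // would be logged in the real IOUtils.cleanup()
            }
        }
    }
}
```

With this shape, a failure in the first close() no longer leaks the second resource, and the caller still sees the original IOException.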
[jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060812#comment-13060812 ] Todd Lipcon commented on HDFS-2011: --- In the HDFS-1073 branch, EditLogOutputStream now has separate close() and abort() methods. abort() is used when there has been some error on the stream and we expect to do an unclean close (ie without flushing). close() is used for clean closes. If close() itself fails, it will then proceed to abort() when the IO error is handled. So, I think the correct test case on the branch is to call abort() twice and make sure that's ignored, or call close() and then abort() to make sure that's ignored. Does that sound reasonable? Removal and restoration of storage directories on checkpointing failure doesn't work properly - Key: HDFS-2011 URL: https://issues.apache.org/jira/browse/HDFS-2011 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 0.23.0 Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take off a failed storage directory -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2018) Move all journal stream management code into one place
[ https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060822#comment-13060822 ] Jitendra Nath Pandey commented on HDFS-2018: Some comments: 1. FileJournalManager.java getMaxLoadableTransaction can be made private. 2. JournalManager interface Instead of adding archiveLogsOlderThan to the interface, we could add purgeTransactions method (as in HDFS-1580 design). FileJournalManager could implement purgeTransactions using archives, i.e. instead of really deleting, it archives them. Since Checkpoints are being archived, we don't need to force any JournalManager to archive edit logs as well. Apart from the above, the patch looks good to me. +1 Move all journal stream management code into one place -- Key: HDFS-2018 URL: https://issues.apache.org/jira/browse/HDFS-2018 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: Edit log branch (HDFS-1073) Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff Currently in the HDFS-1073 branch, the code for creating output streams is in FileJournalManager and the code for input streams is in the inspectors. This change does a number of things. - Input and Output streams are now created by the JournalManager. - FSImageStorageInspectors now deals with URIs when referring to edit logs - Recovery of inprogress logs is performed by counting the number of transactions instead of looking at the length of the file. The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
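The alternative Jitendra suggests could look roughly like this (a hypothetical sketch of the interface shape only, not the HDFS-1073 branch code; every name other than purgeTransactions is invented for illustration):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical interface shape: callers only ask for transactions older
// than a cutoff to be purged; how that happens is up to the manager.
interface JournalManagerSketch {
    void purgeTransactions(long minTxIdToKeep);
}

// A file-based manager can implement purge-by-archiving: segments whose
// last transaction precedes the cutoff are moved aside, not deleted.
class FileJournalManagerSketch implements JournalManagerSketch {
    final List<Long> liveSegmentEndTxIds = new ArrayList<>();
    final List<Long> archivedSegmentEndTxIds = new ArrayList<>();

    FileJournalManagerSketch(List<Long> segmentEndTxIds) {
        liveSegmentEndTxIds.addAll(segmentEndTxIds);
    }

    @Override
    public void purgeTransactions(long minTxIdToKeep) {
        Iterator<Long> it = liveSegmentEndTxIds.iterator();
        while (it.hasNext()) {
            long endTxId = it.next();
            if (endTxId < minTxIdToKeep) {
                archivedSegmentEndTxIds.add(endTxId); // archive, don't delete
                it.remove();
            }
        }
    }
}
```

The point of the shape is that other JournalManager implementations are free to really delete, since the interface only promises the transactions become unavailable.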
[jira] [Commented] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close
[ https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060823#comment-13060823 ] Todd Lipcon commented on HDFS-2132: --- hrm, it doesn't look like bufReady should ever throw an IOE on close, right? it's just a memory buffer. But, fc.truncate() might throw an IOE - that seems like the more realistic case to worry about. Maybe that would be a better fault to inject for the test? Potential resource leak in EditLogFileOutputStream.close Key: HDFS-2132 URL: https://issues.apache.org/jira/browse/HDFS-2132 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: 0.23.0 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch {{EditLogFileOutputStream.close(...)}} sequentially closes a series of underlying resources. If any of the calls to {{close()}} throw before the last one, the later resources will never be closed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close
[ https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2132: - Attachment: hdfs-2132.2.patch Thanks a lot for the review, Todd. Here's a patch which addresses your comment. Potential resource leak in EditLogFileOutputStream.close Key: HDFS-2132 URL: https://issues.apache.org/jira/browse/HDFS-2132 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: 0.23.0 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch, hdfs-2132.2.patch {{EditLogFileOutputStream.close(...)}} sequentially closes a series of underlying resources. If any of the calls to {{close()}} throw before the last one, the later resources will never be closed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close
[ https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060847#comment-13060847 ] Hadoop QA commented on HDFS-2132: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12485480/hdfs-2132.1.patch against trunk revision 1143147. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/881//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/881//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/881//console This message is automatically generated. Potential resource leak in EditLogFileOutputStream.close Key: HDFS-2132 URL: https://issues.apache.org/jira/browse/HDFS-2132 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: 0.23.0 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch, hdfs-2132.2.patch {{EditLogFileOutputStream.close(...)}} sequentially closes a series of underlying resources. If any of the calls to {{close()}} throw before the last one, the later resources will never be closed. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060853#comment-13060853 ] John George commented on HDFS-2011: --- I think calling 1. abort() twice 2. close() twice 3. close() followed by an abort() would test most cases. Removal and restoration of storage directories on checkpointing failure doesn't work properly - Key: HDFS-2011 URL: https://issues.apache.org/jira/browse/HDFS-2011 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 0.23.0 Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take off a failed storage directory -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
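Assuming the desired semantics are that close() and abort() each become no-ops once the stream has already shut down, the three cases John lists can be checked against a toy state machine like this (EditLogOutputStream itself is not modeled; the states and no-op rule here are an assumption based on Todd's comment above):

```java
// Toy state machine for a stream with separate clean (close) and unclean
// (abort) shutdown paths. The real close() can throw IOException while
// flushing; that is omitted here for brevity.
class StreamSketch {
    enum State { OPEN, CLOSED, ABORTED }
    State state = State.OPEN;

    void close() {
        if (state != State.OPEN) return; // repeated close/abort is ignored
        // ... flush buffered edits here in the real stream ...
        state = State.CLOSED;
    }

    void abort() {
        if (state != State.OPEN) return; // already shut down: ignore
        state = State.ABORTED;
    }
}
```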
[jira] [Updated] (HDFS-1977) Stop using StringUtils.stringifyException()
[ https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bharath Mundlapudi updated HDFS-1977: - Attachment: HDFS-1977-2.patch Things changed since the last post; reattaching with new changes. Stop using StringUtils.stringifyException() --- Key: HDFS-1977 URL: https://issues.apache.org/jira/browse/HDFS-1977 Project: Hadoop HDFS Issue Type: Improvement Reporter: Joey Echeverria Assignee: Bharath Mundlapudi Priority: Minor Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch The old version of the logging APIs didn't support logging stack traces by passing exceptions to the logging methods (e.g. Log.error()). A number of log statements make use of StringUtils.stringifyException() to get around the old behavior. It would be nice if this could get cleaned up to make use of the logger's stack trace printing. This also gives users more control since you can configure how the stack traces are written to the logs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1979) HDFS-1073: Fix backupnode for new edits/image layout
[ https://issues.apache.org/jira/browse/HDFS-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1979: -- Attachment: hdfs-1979.txt Cleaned up the patch. I think this should be ready to go. Here's a summary of some of the changes to make the rather large patch easier to follow: BackupNode itself: - no longer uses the spool file. Instead, the state tracks whether the BN is in sync or journaling only. In essence, the next log segment is used as the spool file. - lots of refactoring so that checkpoint code is primarily shared with the SecondaryNameNode. We could pull this into a new CheckpointUtils class or something, but didn't want to make that change here since it would make the patch even larger. - moved the BN-specific RPCs into a new BackupNodeProtocol instead of sharing NameNodeProtocol. This makes sense since the NN as is was just throwing exceptions on those calls. - split the BN RPCs into several pieces, rather than using journal() for everything. This makes the API easier to follow - fixed bugs where the NN would send uncheckpointed txns to the BackupNode (BN is currently non-functional in trunk) EditLog: - added new BackupJournalManager to coordinate talking to BN - added new parameter to start/end log segment about whether to include the special START/END transactions. This was necessary since the BN will receive these replicated from the NN, and thus shouldn't add its own in addition to what the NN wrote. BackupImage/FSImage: - new concept of lastAppliedTxId which tracks the latest txnid reflected by the namesystem. Some refactoring done so that this is properly tracked during image loading, etc. We used to simply use the edit log's last written txid for this, but in the case of the BN the edit log may be writing ahead of where the NS actually reflects. Storage inspector: - refactored out the planning of loading logs from which image. 
This will probably get changed again by the work in HDFS-1579, but this was the minimal change to get this working. Used when the BN is synchronizing with the NN. Tests: - added new test for the BN that makes sure it can stay in sync with the NN, replicates edits identically, etc. - split CN test and BN tests into separate methods to be easier to run just one - removed testBackupRegistration since we no longer have to enforce only-one-backupnode HDFS-1073: Fix backupnode for new edits/image layout Key: HDFS-1979 URL: https://issues.apache.org/jira/browse/HDFS-1979 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-1979-prelim.txt, hdfs-1979.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close
[ https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060872#comment-13060872 ] Ravi Prakash commented on HDFS-2132: Thanks Aaron for the explanation! :) I agree. I might be missing a trick (again ;-) ), but are you sure the Closeables will be null after .close()? Won't they be references pointing to a closed stream, and so close() will be called twice on them? I don't see an easy way to avoid that though. So cool. Potential resource leak in EditLogFileOutputStream.close Key: HDFS-2132 URL: https://issues.apache.org/jira/browse/HDFS-2132 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: 0.23.0 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch, hdfs-2132.2.patch {{EditLogFileOutputStream.close(...)}} sequentially closes a series of underlying resources. If any of the calls to {{close()}} throw before the last one, the later resources will never be closed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close
[ https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060877#comment-13060877 ] Aaron T. Myers commented on HDFS-2132: -- bq. Thanks Aaron for the explanation! No problem. bq. I might be missing a trick (again ) , but are you sure the Closeables will be null after .close()? Well, now that I look at it, you've effectively caught a bug. :) The previous code was expressly setting {{bufReady}} and {{bufCurrent}} to {{null}}, but not {{fp}} or {{fc}}. My patch didn't touch that code, but it might as well fix it. I'll upload another patch in a moment. Potential resource leak in EditLogFileOutputStream.close Key: HDFS-2132 URL: https://issues.apache.org/jira/browse/HDFS-2132 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: 0.23.0 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch, hdfs-2132.2.patch {{EditLogFileOutputStream.close(...)}} sequentially closes a series of underlying resources. If any of the calls to {{close()}} throw before the last one, the later resources will never be closed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close
[ https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-2132: - Attachment: hdfs-2132.3.patch Patch which addresses the issue Ravi pointed out. Potential resource leak in EditLogFileOutputStream.close Key: HDFS-2132 URL: https://issues.apache.org/jira/browse/HDFS-2132 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: 0.23.0 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch, hdfs-2132.2.patch, hdfs-2132.3.patch {{EditLogFileOutputStream.close(...)}} sequentially closes a series of underlying resources. If any of the calls to {{close()}} throw before the last one, the later resources will never be closed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2065) Fix NPE in DFSClient.getFileChecksum
[ https://issues.apache.org/jira/browse/HDFS-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-2065: - Hadoop Flags: [Reviewed] Status: Patch Available (was: Open) +1 patch looks good. Fix NPE in DFSClient.getFileChecksum Key: HDFS-2065 URL: https://issues.apache.org/jira/browse/HDFS-2065 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.23.0 Attachments: HDFS-2065-1.patch The following code can throw an NPE if callGetBlockLocations returns null, i.e. if the server returns null: {code} List<LocatedBlock> locatedblocks = callGetBlockLocations(namenode, src, 0, Long.MAX_VALUE).getLocatedBlocks(); {code} The right fix is for the server to throw the right exception. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
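A defensive client-side guard would avoid the NPE until the server throws a proper exception. The standalone model below is a sketch of that idea, not the DFSClient code; the lookup stub and the choice of FileNotFoundException are assumptions:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Collections;
import java.util.List;

// Standalone model of the null-check: a lookup that may return null,
// converted into a meaningful exception instead of a later NPE.
class ChecksumSketch {
    // Stand-in for callGetBlockLocations(...); returns null for unknown paths.
    static List<String> getBlockLocations(String src) {
        return "/known".equals(src) ? Collections.singletonList("blk_1") : null;
    }

    static List<String> locatedBlocksOrThrow(String src) throws IOException {
        List<String> blocks = getBlockLocations(src);
        if (blocks == null) {
            // Surface the real problem rather than NPE'ing downstream.
            throw new FileNotFoundException("File does not exist: " + src);
        }
        return blocks;
    }
}
```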
[jira] [Updated] (HDFS-1979) HDFS-1073: Fix backupnode for new edits/image layout
[ https://issues.apache.org/jira/browse/HDFS-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1979: -- Attachment: hdfs-1979.txt Slight update to patch: I had forgotten to implement releaseBackupStreams properly. HDFS-1073: Fix backupnode for new edits/image layout Key: HDFS-1979 URL: https://issues.apache.org/jira/browse/HDFS-1979 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: hdfs-1979-prelim.txt, hdfs-1979.txt, hdfs-1979.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1977) Stop using StringUtils.stringifyException()
[ https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060904#comment-13060904 ] Hadoop QA commented on HDFS-1977: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12485490/HDFS-1977-2.patch against trunk revision 1143147. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/884//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/884//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/884//console This message is automatically generated. Stop using StringUtils.stringifyException() --- Key: HDFS-1977 URL: https://issues.apache.org/jira/browse/HDFS-1977 Project: Hadoop HDFS Issue Type: Improvement Reporter: Joey Echeverria Assignee: Bharath Mundlapudi Priority: Minor Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch The old version of the logging APIs didn't support logging stack traces by passing exceptions to the logging methods (e.g. Log.error()). 
A number of log statements make use of StringUtils.stringifyException() to get around the old behavior. It would be nice if this could get cleaned up to make use of the logger's stack trace printing. This also gives users more control since you can configure how the stack traces are written to the logs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2128) Support for pluggable Trash policies
[ https://issues.apache.org/jira/browse/HDFS-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060909#comment-13060909 ] Usman Masood commented on HDFS-2128: The issue is to choose the right interface for pluggable Trash modules. Currently the public methods are: - moveToTrash(..) - getEmptier() - checkpoint() - expunge() The first two methods should be part of the Trash interface, but I'm not sure about the last two. Not every Trash policy should be required to implement a checkpoint mechanism. Currently expunge() and checkpoint() are used by FsShell for the -expunge arg. Support for pluggable Trash policies Key: HDFS-2128 URL: https://issues.apache.org/jira/browse/HDFS-2128 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: dhruba borthakur Assignee: dhruba borthakur It would be beneficial to make the Trash policy pluggable. One primary use-case for this is to archive files (in some remote store) when they get removed by Trash emptier. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
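One way to split the interface along the lines Usman suggests is to keep the first two methods mandatory and make the checkpoint mechanism optional. This is a hypothetical shape, not the committed TrashPolicy API, and the names are illustrative (the real methods would also throw IOException):

```java
// Core contract every Trash policy must provide; the checkpoint mechanism
// is optional, so policies without checkpoints keep the defaults and
// FsShell -expunge can detect and report the lack of support.
interface TrashPolicySketch {
    boolean moveToTrash(String path);
    Runnable getEmptier();

    default void checkpoint() {
        throw new UnsupportedOperationException("no checkpoint support");
    }
    default void expunge() {
        throw new UnsupportedOperationException("no checkpoint support");
    }
}
```

A policy that archives to a remote store would then only implement moveToTrash() and getEmptier(), exactly the use-case in the issue description.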
[jira] [Commented] (HDFS-1977) Stop using StringUtils.stringifyException()
[ https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060910#comment-13060910 ] Bharath Mundlapudi commented on HDFS-1977: -- This patch doesn't include unit tests, since it's just adapting to the new logging API. No new tests are required. Stop using StringUtils.stringifyException() --- Key: HDFS-1977 URL: https://issues.apache.org/jira/browse/HDFS-1977 Project: Hadoop HDFS Issue Type: Improvement Reporter: Joey Echeverria Assignee: Bharath Mundlapudi Priority: Minor Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch The old version of the logging APIs didn't support logging stack traces by passing exceptions to the logging methods (e.g. Log.error()). A number of log statements make use of StringUtils.stringifyException() to get around the old behavior. It would be nice if this could get cleaned up to make use of the logger's stack trace printing. This also gives users more control since you can configure how the stack traces are written to the logs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
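The change itself is mechanical. Here java.util.logging stands in for the commons-logging Log used in HDFS (an assumption made so the sketch is self-contained; the shape is the same: pass the Throwable to the logger instead of stringifying it into the message):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LoggingSketch {
    private static final Logger LOG = Logger.getLogger("sketch");

    static void demo() {
        try {
            throw new IllegalStateException("boom");
        } catch (IllegalStateException e) {
            // Old style: stack trace flattened into the message string, e.g.
            // LOG.severe("Operation failed: " + StringUtils.stringifyException(e));

            // New style: hand the exception to the logger and let the
            // configured handler/formatter decide how to render the trace.
            LOG.log(Level.SEVERE, "Operation failed", e);
        }
    }
}
```

Because the Throwable travels with the log record, operators can configure how (or whether) traces are rendered, which is the user-control benefit the issue description mentions.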
[jira] [Updated] (HDFS-1780) reduce need to rewrite fsimage on startup
[ https://issues.apache.org/jira/browse/HDFS-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1780: -- Attachment: hdfs-1780.txt Here's a patch on the 1073 branch which implements this. reduce need to rewrite fsimage on startup -- Key: HDFS-1780 URL: https://issues.apache.org/jira/browse/HDFS-1780 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Daryn Sharp Attachments: hdfs-1780.txt On startup, the namenode will read the fs image, apply edits, then rewrite the fs image. This requires a non-trivial amount of time for very large directory structures. Perhaps the namenode should employ some logic to decide that the edits are simple enough that it doesn't warrant rewriting the image back out to disk. A few ideas: Use the size of the edit logs: if the size is below a threshold, assume it's cheaper to reprocess the edit log instead of writing the image back out. Time the processing of the edits, and if the time is below a defined threshold, the image isn't rewritten. Time the reading of the image and the processing of the edits, and base the decision on the time it would take to write the image (a multiplier is applied to the read time?) versus the time it would take to reprocess the edits. If a certain threshold (perhaps percentage or expected time to rewrite) is exceeded, rewrite the image. Something along the lines of the last suggestion may allow for defaults that adapt for any size cluster, thus eliminating the need to keep tweaking a cluster's settings based on its size. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
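The last idea in the description can be sketched as a simple predicate. The names, the multiplier, and the comparison below are illustrative assumptions, not the criteria actually used by the patch on the 1073 branch:

```java
// Decide whether replaying the edits again at the next startup is expected
// to be cheaper than rewriting the whole image back out now.
class SaveDecisionSketch {
    static boolean needToSave(long imageReadMillis, long editsReplayMillis,
                              double writeCostMultiplier) {
        // Estimated cost of writing the image, derived from how long it
        // took to read it (the multiplier is the tunable guess).
        double estimatedWriteMillis = imageReadMillis * writeCostMultiplier;
        // Rewrite only if replaying the edits costs more than writing a
        // fresh image would.
        return editsReplayMillis > estimatedWriteMillis;
    }
}
```

Basing the decision on measured times rather than fixed byte thresholds is what lets the defaults adapt to any cluster size, as the description suggests.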
[jira] [Updated] (HDFS-1780) reduce need to rewrite fsimage on startup
[ https://issues.apache.org/jira/browse/HDFS-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1780: -- Affects Version/s: Edit log branch (HDFS-1073) Fix Version/s: Edit log branch (HDFS-1073) Assignee: Todd Lipcon reduce need to rewrite fsimage on startup -- Key: HDFS-1780 URL: https://issues.apache.org/jira/browse/HDFS-1780 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: Edit log branch (HDFS-1073) Reporter: Daryn Sharp Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-1780.txt On startup, the namenode will read the fs image, apply edits, then rewrite the fs image. This requires a non-trivial amount of time for very large directory structures. Perhaps the namenode should employ some logic to decide that the edits are simple enough that it doesn't warrant rewriting the image back out to disk. A few ideas: Use the size of the edit logs: if the size is below a threshold, assume it's cheaper to reprocess the edit log instead of writing the image back out. Time the processing of the edits, and if the time is below a defined threshold, the image isn't rewritten. Time the reading of the image and the processing of the edits, and base the decision on the time it would take to write the image (a multiplier is applied to the read time?) versus the time it would take to reprocess the edits. If a certain threshold (perhaps percentage or expected time to rewrite) is exceeded, rewrite the image. Something along the lines of the last suggestion may allow for defaults that adapt for any size cluster, thus eliminating the need to keep tweaking a cluster's settings based on its size. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2132) Potential resource leak in EditLogFileOutputStream.close
[ https://issues.apache.org/jira/browse/HDFS-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060938#comment-13060938 ] Hadoop QA commented on HDFS-2132: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12485485/hdfs-2132.2.patch against trunk revision 1143147. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: -1 contrib tests. The patch failed contrib unit tests. -1 system test framework. The patch failed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/882//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/882//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/882//console This message is automatically generated. Potential resource leak in EditLogFileOutputStream.close Key: HDFS-2132 URL: https://issues.apache.org/jira/browse/HDFS-2132 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: 0.23.0 Attachments: hdfs-2132.0.patch, hdfs-2132.1.patch, hdfs-2132.2.patch, hdfs-2132.3.patch {{EditLogFileOutputStream.close(...)}} sequentially closes a series of underlying resources. If any of the calls to {{close()}} throw before the last one, the later resources will never be closed. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-2065) Fix NPE in DFSClient.getFileChecksum
[ https://issues.apache.org/jira/browse/HDFS-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060939#comment-13060939 ] Hadoop QA commented on HDFS-2065: - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12483562/HDFS-2065-1.patch against trunk revision 1143147. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: -1 contrib tests. The patch failed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/886//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/886//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/886//console This message is automatically generated. Fix NPE in DFSClient.getFileChecksum Key: HDFS-2065 URL: https://issues.apache.org/jira/browse/HDFS-2065 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.23.0 Reporter: Bharath Mundlapudi Assignee: Bharath Mundlapudi Fix For: 0.23.0 Attachments: HDFS-2065-1.patch The following code can throw NPE if callGetBlockLocations returns null. 
If the server returns null: {code} List<LocatedBlock> locatedblocks = callGetBlockLocations(namenode, src, 0, Long.MAX_VALUE).getLocatedBlocks(); {code} The right fix for this is that the server should throw the right exception. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
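A null guard of the kind being discussed could look like the following self-contained sketch. The stand-in types and the choice of FileNotFoundException are illustrative assumptions, not the committed fix (in DFSClient the RPC result type is LocatedBlocks):

```java
import java.io.FileNotFoundException;
import java.util.Collections;
import java.util.List;

class ChecksumGuard {
    // Stand-in for the namenode RPC; in DFSClient this returns LocatedBlocks,
    // which may be null when the file does not exist.
    static List<String> callGetBlockLocations(String src) {
        return src.startsWith("/exists") ? Collections.singletonList("blk_1") : null;
    }

    // Check the server response before dereferencing it, instead of
    // chaining .getLocatedBlocks() directly onto the RPC result --
    // that chained call is what triggered the NPE.
    static List<String> getBlocks(String src) throws FileNotFoundException {
        List<String> blocks = callGetBlockLocations(src);
        if (blocks == null) {
            throw new FileNotFoundException("File does not exist: " + src);
        }
        return blocks;
    }
}
```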
[jira] [Commented] (HDFS-1896) Additional QA tasks for Edit Log branch
[ https://issues.apache.org/jira/browse/HDFS-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060949#comment-13060949 ] Aaron T. Myers commented on HDFS-1896: -- I just took a manual pass through the commits to trunk which have occurred since the branch was created for HDFS-1073. Here's the list of JIRAs which I think may be relevant: * HDFS-2011 * HDFS-1955 * HDFS-988 * HDFS-2041 * HDFS-2030 * HDFS-2003 * HDFS-1948 * HDFS-1149 * HDFS-1969 * HDFS-1636 * HDFS-1936 I'm going to try to manually verify that these fixes did not regress in the HDFS-1073 branch. Additional QA tasks for Edit Log branch --- Key: HDFS-1896 URL: https://issues.apache.org/jira/browse/HDFS-1896 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) As we close out tasks in the HDFS-1073 branch, there are a few places where I've noticed that we lack some test coverage. Creating this ticket just as a place to jot down some notes on things that we ought to make sure are tested, preferably by automated (unit) tests. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1780) reduce need to rewrite fsimage on startup
[ https://issues.apache.org/jira/browse/HDFS-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1780: -- Attachment: hdfs-1780.txt One small fix -- needed to fix FSImageTransactionalStorageInspector.needToSave() so that it forces a save if any of the image directories are missing a VERSION file. This bug was causing TestNameEditsConfigs to fail. reduce need to rewrite fsimage on startup -- Key: HDFS-1780 URL: https://issues.apache.org/jira/browse/HDFS-1780 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: Edit log branch (HDFS-1073) Reporter: Daryn Sharp Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-1780.txt, hdfs-1780.txt On startup, the namenode will read the fs image, apply edits, then rewrite the fs image. This requires a non-trivial amount of time for very large directory structures. Perhaps the namenode should employ some logic to decide that the edits are simple enough that it doesn't warrant rewriting the image back out to disk. A few ideas:
* Use the size of the edit logs: if the size is below a threshold, assume it's cheaper to reprocess the edit log instead of writing the image back out.
* Time the processing of the edits: if the time is below a defined threshold, the image isn't rewritten.
* Time the reading of the image and the processing of the edits, and base the decision on the time it would take to write the image (a multiplier applied to the read time?) versus the time it would take to reprocess the edits. If a certain threshold (perhaps a percentage, or the expected time to rewrite) is exceeded, rewrite the image.
Something along the lines of the last suggestion may allow for defaults that adapt to any size cluster, thus eliminating the need to keep tweaking a cluster's settings based on its size. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
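The last, adaptive heuristic could be sketched as follows. The method name, the multiplier parameter, and the millisecond units are illustrative assumptions, not anything from the attached patch:

```java
class SaveImageHeuristic {
    // Rewrite the fsimage only when replaying the edits would cost more
    // than the estimated cost of writing the image, where the write cost
    // is modeled as a multiple of the measured image read time.
    static boolean shouldSaveImage(long imageReadMillis,
                                   long editsReplayMillis,
                                   double writeCostMultiplier) {
        long estimatedWriteMillis = (long) (imageReadMillis * writeCostMultiplier);
        return editsReplayMillis > estimatedWriteMillis;
    }
}
```

Because both inputs are measured on the cluster itself, the same default multiplier could work for small and large namespaces alike, which is the adaptive property the suggestion is after.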
[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2011: -- Attachment: elfos-close-patch-on-1073.txt Here's the patch I'm planning to commit to 1073 branch. Look good? I will also do some stress testing similar to what Ravi described on the branch to see if I can reproduce the issue he saw. Removal and restoration of storage directories on checkpointing failure doesn't work properly - Key: HDFS-2011 URL: https://issues.apache.org/jira/browse/HDFS-2011 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Fix For: 0.23.0 Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch, elfos-close-patch-on-1073.txt Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException, and sometimes it fails to remove a failed storage directory -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2133) 1073: address remaining TODOs and pre-merge cleanup
1073: address remaining TODOs and pre-merge cleanup --- Key: HDFS-2133 URL: https://issues.apache.org/jira/browse/HDFS-2133 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Todd Lipcon Assignee: Todd Lipcon -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2133) 1073: address remaining TODOs and pre-merge cleanup
[ https://issues.apache.org/jira/browse/HDFS-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2133: -- Component/s: name-node Description: There are a few TODOs still in the code and a bit of cleanup to be done before merging HDFS-1073. This JIRA is for this misc cleanup. Affects Version/s: Edit log branch (HDFS-1073) Fix Version/s: Edit log branch (HDFS-1073) 1073: address remaining TODOs and pre-merge cleanup --- Key: HDFS-2133 URL: https://issues.apache.org/jira/browse/HDFS-2133 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) There are a few TODOs still in the code and a bit of cleanup to be done before merging HDFS-1073. This JIRA is for this misc cleanup. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2133) 1073: address remaining TODOs and pre-merge cleanup
[ https://issues.apache.org/jira/browse/HDFS-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2133: -- Attachment: hdfs-2133.txt Patch addresses the following:
- removes an extra setReadyToFlush/flush call in EditLogFileOutputStream.close. This snuck in when we did some refactoring, but seems unnecessary, since we always flush before closing a stream anyway. Tests seem to be passing even when I remove this (and the next bit of that same function already verifies that there isn't any unflushed data in the buffer)
{code:title=FSImage.java}
-// TODO need to discuss what the correct logic is for determing which
-// storage directory to read properties from
 sdForProperties.read();
{code}
This TODO is invalid -- when inspecting the dirs at startup, we already call {{read()}} for each directory. That means that we've verified that they all contain the same data. Since VERSION files are now just namespace info, and nothing related to checkpoint times or versions, it doesn't matter which one we read() from here.
{code:title=FSImage.java}
-    storage.writeAll(); // TODO is this a good spot for this?
-
+    storage.writeAll();
{code}
Yes, I think it's a good spot :) Eli had commented that he agreed in an earlier code review, but I hadn't removed it at that point. This {{writeAll}} call is necessary when adding new directories to a NN, for example.
- various changes to remove checkpointTxId from NameNodeRegistration and CheckpointCommand. A checkpoint txid is no longer relevant when deciding whether to allow a checkpoint to take place, since we can distinguish between different checkpoints at different txids.
- various javadoc additions where things were incorrect or incomplete 1073: address remaining TODOs and pre-merge cleanup --- Key: HDFS-2133 URL: https://issues.apache.org/jira/browse/HDFS-2133 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-2133.txt There are a few TODOs still in the code and a bit of cleanup to be done before merging HDFS-1073. This JIRA is for this misc cleanup. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1896) Additional QA tasks for Edit Log branch
[ https://issues.apache.org/jira/browse/HDFS-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060964#comment-13060964 ] Aaron T. Myers commented on HDFS-1896: -- Oh, also, though it's not committed yet, HDFS-2132 likely will be soon. This will also need to be ported to the HDFS-1073 branch. Additional QA tasks for Edit Log branch --- Key: HDFS-1896 URL: https://issues.apache.org/jira/browse/HDFS-1896 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) As we close out tasks in the HDFS-1073 branch, there are a few places where I've noticed that we lack some test coverage. Creating this ticket just as a place to jot down some notes on things that we ought to make sure are tested, preferably by automated (unit) tests. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-503) Implement erasure coding as a layer on HDFS
[ https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060975#comment-13060975 ] dhruba borthakur commented on HDFS-503: --- 1. Raid has no impact on the dfsadmin -report command. 2. You won't be able to set a replication factor of 0. You would have to manually pull the plug on (kill) a datanode to see how raid works. 3. stripe locations do not contribute to the split locations of a block, thus they are not used for map-reduce locality. Implement erasure coding as a layer on HDFS --- Key: HDFS-503 URL: https://issues.apache.org/jira/browse/HDFS-503 Project: Hadoop HDFS Issue Type: New Feature Components: contrib/raid Reporter: dhruba borthakur Assignee: dhruba borthakur Fix For: 0.21.0 Attachments: raid1.txt, raid2.txt The goal of this JIRA is to discuss how the cost of raw storage for an HDFS file system can be reduced. Keeping three copies of the same data is very costly, especially when the size of storage is huge. One idea is to reduce the replication factor and do erasure coding of a set of blocks so that the overall probability of failure of a block remains the same as before. Many forms of error-correcting codes are available, see http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has described DiskReduce https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt. My opinion is to discuss implementation strategies that are not part of base HDFS, but rather a layer on top of HDFS. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
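For a rough sense of the storage savings being discussed: 3-way replication costs 3x the raw data size, while an erasure code over a stripe of data blocks plus parity blocks costs (data + parity) / data. A minimal sketch of that arithmetic; the stripe widths below are illustrative, not the actual contrib/raid defaults:

```java
class RaidOverhead {
    // Effective storage multiplier for a Reed-Solomon style stripe:
    // total blocks stored divided by useful data blocks.
    static double erasureOverhead(int dataBlocks, int parityBlocks) {
        return (double) (dataBlocks + parityBlocks) / dataBlocks;
    }
}
```

For example, a hypothetical stripe of 10 data blocks with 4 parity blocks stores 14 blocks for 10 blocks of data, a 1.4x multiplier versus 3.0x for triple replication, while still tolerating the loss of any 4 blocks in the stripe.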
[jira] [Commented] (HDFS-1896) Additional QA tasks for Edit Log branch
[ https://issues.apache.org/jira/browse/HDFS-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060981#comment-13060981 ] Aaron T. Myers commented on HDFS-1896: -- I've gone through the list. Most things appear to be fine. The only issues I found were the following: # HDFS-1955 - This appears to have entirely regressed in 1073. # HDFS-1149 - I bet the change to {{NNStorage.setFields(...)}} will cause the upgrade tests to break. At the very least, there are now some unused imports in {{NNStorage}} on the 1073 branch. Additional QA tasks for Edit Log branch --- Key: HDFS-1896 URL: https://issues.apache.org/jira/browse/HDFS-1896 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) As we close out tasks in the HDFS-1073 branch, there are a few places where I've noticed that we lack some test coverage. Creating this ticket just as a place to jot down some notes on things that we ought to make sure are tested, preferably by automated (unit) tests. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2134) Move DecommissionManager to block management
[ https://issues.apache.org/jira/browse/HDFS-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-2134: - Affects Version/s: 0.23.0 Status: Patch Available (was: Open) Move DecommissionManager to block management Key: HDFS-2134 URL: https://issues.apache.org/jira/browse/HDFS-2134 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 0.23.0 Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h2134_20110706.patch Datanode management including {{DecommissionManager}} should belong to block management. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2134) Move DecommissionManager to block management
[ https://issues.apache.org/jira/browse/HDFS-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-2134: - Attachment: h2134_20110706.patch h2134_20110706.patch: moving the code. Move DecommissionManager to block management Key: HDFS-2134 URL: https://issues.apache.org/jira/browse/HDFS-2134 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 0.23.0 Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h2134_20110706.patch Datanode management including {{DecommissionManager}} should belong to block management. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2134) Move DecommissionManager to block management
Move DecommissionManager to block management Key: HDFS-2134 URL: https://issues.apache.org/jira/browse/HDFS-2134 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Datanode management including {{DecommissionManager}} should belong to block management. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2104) 1073: Add a flag to 2NN to format its checkpoint dirs on startup
[ https://issues.apache.org/jira/browse/HDFS-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2104: -- Attachment: hdfs-2104.txt Added a -format flag. While I was at it, I replaced the ad-hoc parsing code with Commons-CLI. I elected _not_ to have it ask for confirmation, since this is only formatting the secondary and not a primary. Our old behavior was basically to overwrite the local image anyway, so this isn't a regression in safety. 1073: Add a flag to 2NN to format its checkpoint dirs on startup Key: HDFS-2104 URL: https://issues.apache.org/jira/browse/HDFS-2104 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Todd Lipcon Attachments: hdfs-2104.txt -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1896) Additional QA tasks for Edit Log branch
[ https://issues.apache.org/jira/browse/HDFS-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061008#comment-13061008 ] Todd Lipcon commented on HDFS-1896: --- bq. I bet the change to NNStorage.setFields(...) will cause the upgrade tests to break. hmm, you sure you mean 1149 (the lease reassignment fix?) I'm not sure I see how that relates to NNStorage. Additional QA tasks for Edit Log branch --- Key: HDFS-1896 URL: https://issues.apache.org/jira/browse/HDFS-1896 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) As we close out tasks in the HDFS-1073 branch, there are a few places where I've noticed that we lack some test coverage. Creating this ticket just as a place to jot down some notes on things that we ought to make sure are tested, preferably by automated (unit) tests. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1896) Additional QA tasks for Edit Log branch
[ https://issues.apache.org/jira/browse/HDFS-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061009#comment-13061009 ] Todd Lipcon commented on HDFS-1896: --- btw, thanks for looking into these details. I'll file a JIRA about fixing 1955. Additional QA tasks for Edit Log branch --- Key: HDFS-1896 URL: https://issues.apache.org/jira/browse/HDFS-1896 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) As we close out tasks in the HDFS-1073 branch, there are a few places where I've noticed that we lack some test coverage. Creating this ticket just as a place to jot down some notes on things that we ought to make sure are tested, preferably by automated (unit) tests. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HDFS-2135) 1073: fix regression of HDFS-1955 in branch
1073: fix regression of HDFS-1955 in branch --- Key: HDFS-2135 URL: https://issues.apache.org/jira/browse/HDFS-2135 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Todd Lipcon Assignee: Todd Lipcon -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2135) 1073: fix regression of HDFS-1955 in branch
[ https://issues.apache.org/jira/browse/HDFS-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2135: -- Description: atm went through the NN storage-related JIRAs committed in trunk since HDFS-1073 was branched, and noted that it looked like HDFS-1955 is regressed on the branch. This JIRA is to investigate and fix as necessary. Affects Version/s: Edit log branch (HDFS-1073) Fix Version/s: Edit log branch (HDFS-1073) 1073: fix regression of HDFS-1955 in branch --- Key: HDFS-2135 URL: https://issues.apache.org/jira/browse/HDFS-2135 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) atm went through the NN storage-related JIRAs committed in trunk since HDFS-1073 was branched, and noted that it looked like HDFS-1955 is regressed on the branch. This JIRA is to investigate and fix as necessary. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1896) Additional QA tasks for Edit Log branch
[ https://issues.apache.org/jira/browse/HDFS-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061011#comment-13061011 ] Aaron T. Myers commented on HDFS-1896: -- Sorry, I meant HDFS-1969 not HDFS-1149. Additional QA tasks for Edit Log branch --- Key: HDFS-1896 URL: https://issues.apache.org/jira/browse/HDFS-1896 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) As we close out tasks in the HDFS-1073 branch, there are a few places where I've noticed that we lack some test coverage. Creating this ticket just as a place to jot down some notes on things that we ought to make sure are tested, preferably by automated (unit) tests. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2135) 1073: fix regression of HDFS-1955 in branch
[ https://issues.apache.org/jira/browse/HDFS-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-2135: -- Attachment: hdfs-2135.txt Was a fairly trivial error - it was using the errorSDs list, which was getting cleared halfway through the function. I switched it to look at how many storage dirs got removed by asking the storage instead. Unfortunately it's difficult to write a unit test, as was observed in the original JIRA. So, I tested by hand by adding a fault in the saving code for one of the dirs. 1073: fix regression of HDFS-1955 in branch --- Key: HDFS-2135 URL: https://issues.apache.org/jira/browse/HDFS-2135 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-2135.txt atm went through the NN storage-related JIRAs committed in trunk since HDFS-1073 was branched, and noted that it looked like HDFS-1955 is regressed on the branch. This JIRA is to investigate and fix as necessary. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-1896) Additional QA tasks for Edit Log branch
[ https://issues.apache.org/jira/browse/HDFS-1896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061014#comment-13061014 ] Todd Lipcon commented on HDFS-1896: --- I just manually tested HDFS-1969's new functionality and it seems to be working: - formatted NN using 0.20 - ran NN from 1073 branch - it complained about wrong layout - ran 1073 NN with -upgrade flag, it started with upgrade - ran 1073 NN with -rollback flag, it correctly complained - ran 0.20 NN with -rollback flag, it rolled back to old namespace Once all of the currently outstanding patches are applied, the upgrade tests also seem to be passing, so I think we're OK on that one. I agree there are lots of unused imports. I'll do a pass to clean them up right before we merge. Additional QA tasks for Edit Log branch --- Key: HDFS-1896 URL: https://issues.apache.org/jira/browse/HDFS-1896 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) As we close out tasks in the HDFS-1073 branch, there are a few places where I've noticed that we lack some test coverage. Creating this ticket just as a place to jot down some notes on things that we ought to make sure are tested, preferably by automated (unit) tests. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1794) Add code to list which edit logs are available on a remote NN
[ https://issues.apache.org/jira/browse/HDFS-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1794: -- Hadoop Flags: [Reviewed] Add code to list which edit logs are available on a remote NN - Key: HDFS-1794 URL: https://issues.apache.org/jira/browse/HDFS-1794 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-1794.txt, hdfs-1794.txt When the 2NN or BN needs to sync up with the primary NN, it may need to download several different edits files since the NN may roll whenever it likes. This JIRA adds a new type called RemoteEditLogManifest to list the available edit log files since a given transaction ID. This may also be useful for monitoring or backup tools down the road. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1993) TestCheckpoint needs to clean up between cases
[ https://issues.apache.org/jira/browse/HDFS-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1993: -- Hadoop Flags: [Reviewed] TestCheckpoint needs to clean up between cases -- Key: HDFS-1993 URL: https://issues.apache.org/jira/browse/HDFS-1993 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node, test Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-1993.txt TestCheckpoint currently relies on some test ordering in order to pass correctly. Instead it should clean itself up in a setUp() method. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1792) Add code to detect valid length of an edits file
[ https://issues.apache.org/jira/browse/HDFS-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1792: -- Hadoop Flags: [Reviewed] Add code to detect valid length of an edits file Key: HDFS-1792 URL: https://issues.apache.org/jira/browse/HDFS-1792 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-1792.txt In some edit log corruption situations, it's useful to be able to determine the valid length of an edit log. For this JIRA we define valid as the length of the file excluding any trailing 0x00 bytes, usually left there by the preallocation done while writing. In the future this API can be extended to look at edit checksums, etc. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-1894) Add constants for LAYOUT_VERSIONs in edits log branch
[ https://issues.apache.org/jira/browse/HDFS-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-1894: -- Hadoop Flags: [Reviewed] Add constants for LAYOUT_VERSIONs in edits log branch - Key: HDFS-1894 URL: https://issues.apache.org/jira/browse/HDFS-1894 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: Edit log branch (HDFS-1073) Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: Edit log branch (HDFS-1073) Attachments: hdfs-1894.txt, hdfs-1894.txt When merging from trunk into branch, it's pretty difficult to resolve conflicts around the layout versions, since trunk keeps swallowing whatever layout version I've picked in the branch. Adding a couple of constants will make the merges much easier. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira