[jira] Commented: (HDFS-909) Race condition between rollEditLog or rollFSImage and FSEditsLog.write operations corrupts edits log
[ https://issues.apache.org/jira/browse/HDFS-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834158#action_12834158 ]

Cosmin Lehene commented on HDFS-909:

@Todd what's the state of this patch? This happens more often than I initially thought. Just hit it again.

Race condition between rollEditLog or rollFSImage and FSEditsLog.write operations corrupts edits log

Key: HDFS-909
URL: https://issues.apache.org/jira/browse/HDFS-909
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
Environment: CentOS
Reporter: Cosmin Lehene
Assignee: Todd Lipcon
Priority: Blocker
Fix For: 0.21.0, 0.22.0
Attachments: hdfs-909-unittest.txt, hdfs-909.txt, hdfs-909.txt, hdfs-909.txt, hdfs-909.txt

Closing the edits log file can race with a write to the edits log file, resulting in the OP_INVALID end-of-file marker first being overwritten by the concurrent threads (in setReadyToFlush) and then removed twice from the buffer, losing a good byte from the edits log.
Example:

{code}
FSNameSystem.rollEditLog() - FSEditLog.divertFileStreams() - FSEditLog.closeStream() - EditLogOutputStream.setReadyToFlush()
FSNameSystem.rollEditLog() - FSEditLog.divertFileStreams() - FSEditLog.closeStream() - EditLogOutputStream.flush() - EditLogFileOutputStream.flushAndSync()
OR
FSNameSystem.rollFSImage() - FSImage.rollFSImage() - FSEditLog.purgeEditLog() - FSEditLog.revertFileStreams() - FSEditLog.closeStream() - EditLogOutputStream.setReadyToFlush()
FSNameSystem.rollFSImage() - FSImage.rollFSImage() - FSEditLog.purgeEditLog() - FSEditLog.revertFileStreams() - FSEditLog.closeStream() - EditLogOutputStream.flush() - EditLogFileOutputStream.flushAndSync()
VERSUS
FSNameSystem.completeFile - FSEditLog.logSync() - EditLogOutputStream.setReadyToFlush()
FSNameSystem.completeFile - FSEditLog.logSync() - EditLogOutputStream.flush() - EditLogFileOutputStream.flushAndSync()
OR
Any FSEditLog.write
{code}

Access to the edits flush operations is synchronized only at the FSEditLog.logSync() method level. However, at a lower level, access to EditLogOutputStream setReadyToFlush(), flush() or flushAndSync() is NOT synchronized. These can be called from concurrent threads as in the example above. So if a rollEditLog or rollFSImage happens at the same time as a write operation, the threads can race in EditLogFileOutputStream.setReadyToFlush, which will overwrite the last byte (normally FSEditLog.OP_INVALID, the end-of-file marker) and then remove it twice (once from each thread) in flushAndSync()! Hence a valid byte goes missing from the edits log, which leads to a silent SecondaryNameNode failure and a full HDFS failure upon cluster restart.

We got to this point after investigating a corrupted edits file that made HDFS unable to start with:

{code:title=namenode.log}
java.io.IOException: Incorrect data format. logVersion is -20 but writables.length is 768.
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450)
{code}

EDIT: moved the logs to a comment to make this readable

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
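The essence of the bug is that the buffer swap and the flush are not mutually exclusive. A minimal sketch of the double-buffer hand-off with both operations guarded by one monitor, using illustrative names and a List standing in for the on-disk file (this is NOT the actual EditLogFileOutputStream code):

```java
import java.util.ArrayList;
import java.util.List;

public class EditLogBufferSketch {
    static final byte OP_INVALID = (byte) -1; // end-of-file marker

    private List<Byte> bufCurrent = new ArrayList<>(); // written by logging threads
    private List<Byte> bufReady = new ArrayList<>();   // being flushed
    private final List<Byte> file = new ArrayList<>(); // stands in for the edits file

    public EditLogBufferSketch() {
        file.add(OP_INVALID); // a fresh edits file ends with the marker
    }

    public synchronized void write(byte op) {
        bufCurrent.add(op);
    }

    // Swap the two buffers. If this were unsynchronized, a rolling thread
    // and a syncing thread could both swap, then both strip the trailing
    // marker in flushAndSync() -- losing a real byte, as the issue describes.
    public synchronized void setReadyToFlush() {
        List<Byte> tmp = bufReady;
        bufReady = bufCurrent;
        bufCurrent = tmp;
    }

    // Strip the old end-of-file marker, append the ready buffer, re-add marker.
    public synchronized void flushAndSync() {
        file.remove(file.size() - 1);   // remove trailing OP_INVALID exactly once
        file.addAll(bufReady);
        bufReady.clear();
        file.add(OP_INVALID);           // the marker always terminates the file
    }

    public synchronized int fileSize() {
        return file.size();
    }

    public static void main(String[] args) {
        EditLogBufferSketch log = new EditLogBufferSketch();
        log.write((byte) 9); // some op code
        log.setReadyToFlush();
        log.flushAndSync();
        System.out.println(log.fileSize()); // 1 op byte + 1 marker = 2
    }
}
```

Because swap, flush, and write all acquire the same lock, the marker can only be removed once per flush; the interleaving in the call traces above becomes impossible.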
[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834252#action_12834252 ]

Jay Booth commented on HDFS-918:

Yeah, I'll do my best to get benchmarks by the end of the weekend; it's been kind of a crazy week and I moved this past weekend, so I don't have a ton of time. Todd, if you feel like blasting a couple of stream-of-consciousness comments to me via email, go right ahead; otherwise I'll run the benchmarks this weekend and wait for the well-written version :).

Zlatin, I originally had an architecture similar to what you're describing, using a BlockingQueue to funnel the actual work to a threadpool, but I had some issues getting the locking quite right: either I wasn't getting things into the queue as fast as possible, or I was burning a lot of empty cycles in the selector thread. Specifically, I can't cancel a SelectionKey and then re-register with the same selector afterwards (it leads to exceptions), so my Selector thread was spinning in a tight loop verifying that, yes, all of these writable SelectionKeys are currently enqueued for work, whenever anything was being processed. But that was a couple of iterations ago; maybe I'll have better luck trying it now. What we really need is a libevent-like framework; I'll spend a little time reviewing the outward-facing API for that and scratching my noggin.

Ultimately, only so much I/O can actually happen at a time before the disk is swamped, so it might be that a set of, say, 32 selector threads gets the same performance as 1024 threads. In that case, we'd be taking up fewer resources for the same performance. At any rate, I need to benchmark before speculating further.
Use single Selector and small thread pool to replace many instances of BlockSender for reads

Key: HDFS-918
URL: https://issues.apache.org/jira/browse/HDFS-918
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node
Reporter: Jay Booth
Fix For: 0.22.0
Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, hdfs-918-20100211.patch, hdfs-multiplex.patch

Currently, on read requests, the DataXCeiver server allocates a new thread per request, which must allocate its own buffers and leads to higher-than-optimal CPU and memory usage by the sending threads. If we had a single selector and a small threadpool to multiplex request packets, we could theoretically achieve higher performance while taking up fewer resources and leaving more CPU on datanodes available for mapred, hbase or whatever. This can be done without changing any wire protocols.
[jira] Commented: (HDFS-909) Race condition between rollEditLog or rollFSImage and FSEditsLog.write operations corrupts edits log
[ https://issues.apache.org/jira/browse/HDFS-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834264#action_12834264 ]

Cosmin Lehene commented on HDFS-909:

@Todd, Thanks! We're using 0.21 :)
[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834274#action_12834274 ]

Zlatin Balevsky commented on HDFS-918:

Jay, the selector thread is likely busylooping because select() will return immediately if any channels are writable. Cancelling takes a select() call and you cannot re-register the channel until the key has been properly cancelled and removed from the selector key sets. It is easier to turn write interest off before passing the writable channel to the threadpool. When the threadpool is done with transferTo(), pass the channel back to the select()-ing thread and instruct it to turn write interest back on. (Do not change the interest outside the selecting thread.) Hope this helps.
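The recipe Zlatin describes can be sketched as follows. This is illustrative only: it uses a java.nio Pipe in place of DataNode sockets, single-byte writes in place of transferTo(), and names like `finished` and `run` that are mine, not Hadoop's. The two important moves are clearing OP_WRITE before the hand-off and flipping interest ops only from the selecting thread:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SelectorHandoffSketch {
    // Dispatch maxWrites writes through the pool; return bytes that arrived.
    public static int run(int maxWrites) throws Exception {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open();
        pipe.sink().configureBlocking(false);
        pipe.sink().register(selector, SelectionKey.OP_WRITE);

        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Keys the pool has finished with; only the selector thread may
        // touch interestOps, so workers just enqueue and wake the selector.
        Queue<SelectionKey> finished = new ConcurrentLinkedQueue<>();

        int dispatched = 0;
        while (dispatched < maxWrites) {
            selector.select(100);
            // Re-arm write interest for channels the pool handed back.
            for (SelectionKey f; (f = finished.poll()) != null; )
                f.interestOps(f.interestOps() | SelectionKey.OP_WRITE);
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext() && dispatched < maxWrites) {
                SelectionKey k = it.next();
                it.remove();
                // Turn write interest OFF before dispatching, so select()
                // does not busy-loop on a key that is already in flight.
                k.interestOps(k.interestOps() & ~SelectionKey.OP_WRITE);
                pool.execute(() -> {
                    try {
                        ((Pipe.SinkChannel) k.channel())
                                .write(ByteBuffer.wrap(new byte[] {1}));
                    } catch (IOException ignored) { }
                    finished.add(k);    // hand the channel back...
                    selector.wakeup();  // ...and make the selector notice
                });
                dispatched++;
            }
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);

        // Drain the pipe to verify the workers' bytes actually arrived.
        ByteBuffer dst = ByteBuffer.allocate(maxWrites);
        int total = 0;
        while (total < maxWrites) total += pipe.source().read(dst);
        selector.close();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(3));
    }
}
```

No key is ever cancelled, so nothing has to wait for the cancelled-key set to clear before re-registering, which is the exception Jay hit.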
[jira] Updated: (HDFS-939) libhdfs test is broken
[ https://issues.apache.org/jira/browse/HDFS-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-939:

Status: Open (was: Patch Available)

libhdfs test is broken

Key: HDFS-939
URL: https://issues.apache.org/jira/browse/HDFS-939
Project: Hadoop HDFS
Issue Type: Bug
Components: contrib/libhdfs
Reporter: Eli Collins
Assignee: Eli Collins
Attachments: hdfs-939-1.patch

The libhdfs test currently does not run because hadoop.tmp.dir is specified as a relative path, and it looks like a side effect of HDFS-873 was that relative paths get made absolute, so build/test/libhdfs gets turned into /test/libhdfs, which the NN cannot create. Let's make the test generate conf files that use the appropriate directory (build/test/libhdfs), specified by fully qualified URIs. Also, are relative paths in conf files supported? If not, rather than fail, we should detect this and print a warning.
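For reference, the proposed direction would have the test generate a conf entry along these lines. The /path/to/checkout prefix is hypothetical; only the build/test/libhdfs directory name comes from the issue:

```xml
<!-- Hypothetical generated conf fragment for the libhdfs test: a fully
     qualified file: URI, so path absolutization cannot strip the build
     prefix and turn build/test/libhdfs into /test/libhdfs. -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>file:///path/to/checkout/build/test/libhdfs</value>
</property>
```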
[jira] Updated: (HDFS-939) libhdfs test is broken
[ https://issues.apache.org/jira/browse/HDFS-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-939:

Status: Patch Available (was: Open)
[jira] Updated: (HDFS-940) libhdfs test uses UnixUserGroupInformation
[ https://issues.apache.org/jira/browse/HDFS-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-940:

Priority: Blocker (was: Major)
Affects Version/s: 0.22.0
Fix Version/s: 0.22.0

Bumping priority. libhdfs is not contrib.

libhdfs test uses UnixUserGroupInformation

Key: HDFS-940
URL: https://issues.apache.org/jira/browse/HDFS-940
Project: Hadoop HDFS
Issue Type: Bug
Components: contrib/libhdfs
Affects Versions: 0.22.0
Reporter: Eli Collins
Priority: Blocker
Fix For: 0.22.0

The libhdfs test fails with the following; it needs to be updated since UnixUserGroupInformation was removed.

[exec] failed to construct hadoop user unix group info object
[exec] Exception in thread main java.lang.NoSuchMethodError: org.apache.hadoop.security.UserGroupInformation: method init()V not found
[exec] at org.apache.hadoop.security.UnixUserGroupInformation.init(UnixUserGroupInformation.java:69)
[exec] Call to org/apache/hadoop/security/UnixUserGroupInformation failed!
[exec] Oops! Failed to connect to hdfs as user nobody!
[jira] Updated: (HDFS-939) libhdfs test is broken
[ https://issues.apache.org/jira/browse/HDFS-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-939:

Priority: Blocker (was: Major)
Affects Version/s: 0.22.0
Fix Version/s: 0.22.0
[jira] Commented: (HDFS-939) libhdfs test is broken
[ https://issues.apache.org/jira/browse/HDFS-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834311#action_12834311 ]

Eli Collins commented on HDFS-939:

Could a committer review this? I doubt this breaks a core or contrib test; the Hudson output doesn't give details.
[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834321#action_12834321 ]

Jay Booth commented on HDFS-918:

Thanks Zlatin, I think you're right. I'll look at finding a way to remove writable interest without cancelling the key; that could fix the busy-looping issue. Then I could use a condition to ensure wakeup when something is newly writable-interested (via completed packet or new request) and refactor back to a single selector thread and several executing threads. I'll make a copy of the patch and try benchmarking both methods.
[jira] Commented: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file
[ https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834396#action_12834396 ]

Suresh Srinivas commented on HDFS-946:

# TestDFSShell.java - not sure why the methods are named starting with caps. Also, is the change to this file needed?
# FSDirectory.createFileStatus - consider moving the isDirectory check outside. Also, the current code extends beyond 80 columns.
# HDFSFileStatus
#* consider naming it HdfsFileStatus
#* final static public should be public static final
#* since this is for HDFS, the comments in the code about different notions in the FS are not required in the methods getPermission(), getOwner(), getGroup()
#* some of the method parameters and other variables could be declared final
# getFullName() - the code is more readable without the unnecessary else. Same for getFullPath()

NameNode should not return full path name when listing a directory or getting the status of a file

Key: HDFS-946
URL: https://issues.apache.org/jira/browse/HDFS-946
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.22.0
Attachments: HDFSFileStatus.patch, HDFSFileStatus1.patch

FSDirectory#getListing(String src) has the following code:

{code}
int i = 0;
for (INode cur : contents) {
  listing[i] = createFileStatus(srcs+cur.getLocalName(), cur);
  i++;
}
{code}

So listing a directory will return an array of FileStatus. Each FileStatus element has the full path name. This increases the return message size and adds non-negligible CPU time to the operation. FSDirectory#getFileInfo(String) does not need to return the file name either.

Another optimization is that in the version of FileStatus that's used in the wire protocol, the field path does not need to be a Path; it could be a String or, ideally, a byte array. This could avoid unnecessary creation of Path objects at the NameNode, thus helping reduce the GC problem observed when a large number of getFileInfo or getListing operations hit the NameNode.
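A sketch of the optimization's shape, with illustrative names (the actual HDFSFileStatus patch may differ): the wire-side status carries only the file's local name as bytes, and the client rebuilds the full path from the directory it listed, so the NameNode never constructs per-entry Path objects.

```java
import java.nio.charset.StandardCharsets;

// Illustrative local-name-only file status, NOT the real HdfsFileStatus API.
public class LocalNameStatusSketch {

    static class StatusSketch {
        // Local name as bytes on the wire; empty for the listed dir itself.
        private final byte[] localName;

        StatusSketch(byte[] localName) {
            this.localName = localName;
        }

        String getLocalName() {
            return new String(localName, StandardCharsets.UTF_8);
        }

        // The client supplies the directory it listed; the full path is
        // only materialized on demand, client-side.
        String getFullName(String parent) {
            String name = getLocalName();
            if (name.isEmpty()) return parent; // status of the directory itself
            return parent.endsWith("/") ? parent + name : parent + "/" + name;
        }
    }

    public static void main(String[] args) {
        StatusSketch s = new StatusSketch(
                "part-00000".getBytes(StandardCharsets.UTF_8));
        System.out.println(s.getFullName("/user/hairong")); // /user/hairong/part-00000
    }
}
```

The return message then carries one short byte array per entry instead of one full path string per entry, which is where the message-size and GC savings come from.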
[jira] Commented: (HDFS-939) libhdfs test is broken
[ https://issues.apache.org/jira/browse/HDFS-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834447#action_12834447 ]

Hadoop QA commented on HDFS-939:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12434479/hdfs-939-1.patch
against trunk revision 908628.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 15 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/232/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/232/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/232/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/232/console

This message is automatically generated.
[jira] Commented: (HDFS-939) libhdfs test is broken
[ https://issues.apache.org/jira/browse/HDFS-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834465#action_12834465 ]

Eli Collins commented on HDFS-939:

The core test failure is HDFS-982 and the contrib test failure is HDFS-981.
[jira] Updated: (HDFS-520) Create new tests for block recovery
[ https://issues.apache.org/jira/browse/HDFS-520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HDFS-520:

Attachment: blockRecoveryPositive.patch

This patch implements the block recovery tests described in section 5.
1. TestBlockRecovery covers BlockRecovery_02.8 - 13.
2. TestLeaseRecovery2#testHardLeaseRecovery is a functional test that covers BlockRecovery_02 (02.1-02.5) and BlockRecovery_03, 03.1, 04.
3. I do not think BlockRecovery_02.6 is in the scope of this test.
4. BlockRecovery_01 is a negative test that will be covered in the negative test suite.

Create new tests for block recovery

Key: HDFS-520
URL: https://issues.apache.org/jira/browse/HDFS-520
Project: Hadoop HDFS
Issue Type: Sub-task
Components: test
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
Attachments: blockRecoveryPositive.patch

According to the test plan, a number of new features are going to be implemented as part of this umbrella (HDFS-265) JIRA. These new features have to be tested properly. Block recovery is one piece of new functionality that requires new tests to be developed.
[jira] Updated: (HDFS-894) DatanodeID.ipcPort is not updated when existing node re-registers
[ https://issues.apache.org/jira/browse/HDFS-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HDFS-894:

Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

I've just committed this. Thanks Todd!

DatanodeID.ipcPort is not updated when existing node re-registers

Key: HDFS-894
URL: https://issues.apache.org/jira/browse/HDFS-894
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.20.1, 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker
Attachments: hdfs-894.txt

In FSNamesystem.registerDatanode, it checks whether a registering node is a re-registration of an old one based on storage ID. If so, it simply updates the old one with the new registration info. However, the new ipcPort is lost when this happens. I reproduced this manually by setting up a DN with the IPC port set to 0 (so it picks an ephemeral port) and then restarting the DN. At this point, the NN's view of the ipcPort is stale, and clients will not be able to achieve pipeline recovery. This should be easy to fix and unit test, but I'm not sure when I'll get to it, so anyone else should feel free to grab it if they get to it first.
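The shape of the fix can be sketched as follows. The class and field names here are illustrative, not the actual FSNamesystem/DatanodeDescriptor code: the point is simply that the re-registration update must copy every field of the new registration, including ipcPort.

```java
// Illustrative sketch of updating a known node's registration info so
// the ephemeral IPC port a restarted DN picked is not lost.
public class ReRegisterSketch {

    static class NodeInfo {
        final String storageID; // identity survives restarts
        String host;
        int xferPort;
        int ipcPort;

        NodeInfo(String storageID, String host, int xferPort, int ipcPort) {
            this.storageID = storageID;
            this.host = host;
            this.xferPort = xferPort;
            this.ipcPort = ipcPort;
        }

        // On re-registration (same storage ID), refresh ALL mutable fields.
        void updateRegInfo(NodeInfo reg) {
            this.host = reg.host;
            this.xferPort = reg.xferPort;
            this.ipcPort = reg.ipcPort; // the field the bug dropped
        }
    }

    static int run() {
        NodeInfo known = new NodeInfo("DS-1", "dn1.example.com", 50010, 50020);
        // After a restart with the IPC port configured as 0, the DN
        // re-registers with a fresh ephemeral IPC port.
        NodeInfo reReg = new NodeInfo("DS-1", "dn1.example.com", 50010, 39217);
        known.updateRegInfo(reReg);
        return known.ipcPort; // now reflects the new port
    }

    public static void main(String[] args) {
        System.out.println(run()); // 39217
    }
}
```

With the port copied, clients asking the NN for the DN's IPC address after a restart get the live port rather than the stale one, so pipeline recovery can reach the node.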
[jira] Updated: (HDFS-245) Create symbolic links in HDFS
[ https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-245:

Attachment: symlink40-hdfs.patch

Patch attached. Removes testSetTimes since it was moved to common, but is otherwise the same.

Create symbolic links in HDFS

Key: HDFS-245
URL: https://issues.apache.org/jira/browse/HDFS-245
Project: Hadoop HDFS
Issue Type: New Feature
Reporter: dhruba borthakur
Assignee: Eli Collins
Attachments: 4044_20081030spi.java, design-doc-v4.txt, designdocv1.txt, designdocv2.txt, designdocv3.txt, HADOOP-4044-strawman.patch, symlink-0.20.0.patch, symlink-25-hdfs.patch, symlink-26-hdfs.patch, symlink-26-hdfs.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, symLink15.txt, symLink15.txt, symlink16-common.patch, symlink16-hdfs.patch, symlink16-mr.patch, symlink17-common.txt, symlink17-hdfs.txt, symlink18-common.txt, symlink19-common-delta.patch, symlink19-common.txt, symlink19-common.txt, symlink19-hdfs-delta.patch, symlink19-hdfs.txt, symlink20-common.patch, symlink20-hdfs.patch, symlink21-common.patch, symlink21-hdfs.patch, symlink22-common.patch, symlink22-hdfs.patch, symlink23-common.patch, symlink23-hdfs.patch, symlink24-hdfs.patch, symlink27-hdfs.patch, symlink28-hdfs.patch, symlink29-hdfs.patch, symlink29-hdfs.patch, symlink30-hdfs.patch, symlink31-hdfs.patch, symlink33-hdfs.patch, symlink35-hdfs.patch, symlink36-hdfs.patch, symlink37-hdfs.patch, symlink38-hdfs.patch, symlink39-hdfs.patch, symLink4.patch, symlink40-hdfs.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch

HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly.
[jira] Updated: (HDFS-245) Create symbolic links in HDFS
[ https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated HDFS-245: - Status: Patch Available (was: Open) Create symbolic links in HDFS - Key: HDFS-245 URL: https://issues.apache.org/jira/browse/HDFS-245 Project: Hadoop HDFS Issue Type: New Feature Reporter: dhruba borthakur Assignee: Eli Collins Attachments: 4044_20081030spi.java, design-doc-v4.txt, designdocv1.txt, designdocv2.txt, designdocv3.txt, HADOOP-4044-strawman.patch, symlink-0.20.0.patch, symlink-25-hdfs.patch, symlink-26-hdfs.patch, symlink-26-hdfs.patch, symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, symLink13.patch, symLink14.patch, symLink15.txt, symLink15.txt, symlink16-common.patch, symlink16-hdfs.patch, symlink16-mr.patch, symlink17-common.txt, symlink17-hdfs.txt, symlink18-common.txt, symlink19-common-delta.patch, symlink19-common.txt, symlink19-common.txt, symlink19-hdfs-delta.patch, symlink19-hdfs.txt, symlink20-common.patch, symlink20-hdfs.patch, symlink21-common.patch, symlink21-hdfs.patch, symlink22-common.patch, symlink22-hdfs.patch, symlink23-common.patch, symlink23-hdfs.patch, symlink24-hdfs.patch, symlink27-hdfs.patch, symlink28-hdfs.patch, symlink29-hdfs.patch, symlink29-hdfs.patch, symlink30-hdfs.patch, symlink31-hdfs.patch, symlink33-hdfs.patch, symlink35-hdfs.patch, symlink36-hdfs.patch, symlink37-hdfs.patch, symlink38-hdfs.patch, symlink39-hdfs.patch, symLink4.patch, symlink40-hdfs.patch, symLink5.patch, symLink6.patch, symLink8.patch, symLink9.patch HDFS should support symbolic links. A symbolic link is a special type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution. Programs which read or write to files named by a symbolic link will behave as if operating directly on the target file. However, archiving utilities can handle symbolic links specially and manipulate them directly. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
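The symlink semantics described above (a link component rewrites the path and resolution continues) can be sketched in a few lines. This is an illustrative toy, not HDFS code: the namespace map, class name, and single-level-per-component resolution are all simplifying assumptions for brevity.

```java
// Minimal sketch of component-by-component symlink resolution.
// "symlinks" is a toy stand-in for a namespace; real HDFS resolution
// (per the HDFS-245 design docs) lives in the server-side path walk.
import java.util.HashMap;
import java.util.Map;

public class SymlinkResolveSketch {
    // Toy namespace: maps an absolute path to its symlink target, if any.
    static final Map<String, String> symlinks = new HashMap<String, String>();

    static String resolve(String path) {
        StringBuilder cur = new StringBuilder();
        for (String part : path.split("/")) {
            if (part.isEmpty()) continue;
            cur.append('/').append(part);
            String target = symlinks.get(cur.toString());
            if (target == null) continue;
            if (target.startsWith("/")) {
                // Absolute target replaces everything resolved so far.
                cur = new StringBuilder(target);
            } else {
                // Relative target is interpreted against the link's parent.
                int slash = cur.lastIndexOf("/");
                cur = new StringBuilder(cur.substring(0, slash)).append('/').append(target);
            }
        }
        return cur.toString();
    }

    public static void main(String[] args) {
        symlinks.put("/user/alice/data", "/data/shared");
        System.out.println(resolve("/user/alice/data/file.txt")); // /data/shared/file.txt
        symlinks.put("/user/alice/latest", "logs/2010-02-16");
        System.out.println(resolve("/user/alice/latest/part-0")); // /user/alice/logs/2010-02-16/part-0
    }
}
```

Note how programs operating on `/user/alice/data/file.txt` end up addressing the target `/data/shared/file.txt`, which is the "behave as if operating directly on the target file" property the issue describes.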
[jira] Updated: (HDFS-894) DatanodeID.ipcPort is not updated when existing node re-registers
[ https://issues.apache.org/jira/browse/HDFS-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated HDFS-894: --- Fix Version/s: 0.22.0 DatanodeID.ipcPort is not updated when existing node re-registers - Key: HDFS-894 URL: https://issues.apache.org/jira/browse/HDFS-894 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20.1, 0.21.0, 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Blocker Fix For: 0.22.0 Attachments: hdfs-894.txt In FSNamesystem.registerDatanode, it checks whether a registering node is a re-registration of an old one based on storage ID. If so, it simply updates the old entry with the new registration info. However, the new ipcPort is lost when this happens. I reproduced this manually by setting up a DN with the IPC port set to 0 (so it picks an ephemeral port) and then restarting the DN. At that point, the NN's view of the ipcPort is stale, and clients will not be able to achieve pipeline recovery. This should be easy to fix and unit test, but I'm not sure when I'll get to it, so anyone else should feel free to grab it if they get to it first.
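The bug pattern is easy to see in miniature. The sketch below uses a simplified stand-in for the registration map keyed by storage ID; class and field names are hypothetical, not the actual FSNamesystem code, but the buggy/fixed pair mirrors the report: the update path copies some fields of the new registration and silently drops the freshly reported ipcPort.

```java
// Hypothetical miniature of FSNamesystem.registerDatanode's update-in-place
// logic, keyed by storage ID. Not Hadoop code; for illustration only.
import java.util.HashMap;
import java.util.Map;

public class ReRegisterSketch {
    static class NodeInfo {
        final String storageID;
        String host;
        int ipcPort;
        NodeInfo(String storageID, String host, int ipcPort) {
            this.storageID = storageID;
            this.host = host;
            this.ipcPort = ipcPort;
        }
    }

    static final Map<String, NodeInfo> nodesByStorageID = new HashMap<String, NodeInfo>();

    // Buggy variant: recognizes the re-registration but drops the new ipcPort.
    static void registerBuggy(NodeInfo reg) {
        NodeInfo old = nodesByStorageID.get(reg.storageID);
        if (old != null) { old.host = reg.host; return; } // ipcPort never copied
        nodesByStorageID.put(reg.storageID, reg);
    }

    // Fixed variant: carries the freshly reported ipcPort over as well.
    static void registerFixed(NodeInfo reg) {
        NodeInfo old = nodesByStorageID.get(reg.storageID);
        if (old != null) { old.host = reg.host; old.ipcPort = reg.ipcPort; return; }
        nodesByStorageID.put(reg.storageID, reg);
    }

    public static void main(String[] args) {
        registerBuggy(new NodeInfo("DS-1", "dn1", 50020));
        registerBuggy(new NodeInfo("DS-1", "dn1", 39123)); // DN restart, new ephemeral port
        System.out.println("buggy ipcPort=" + nodesByStorageID.get("DS-1").ipcPort);

        nodesByStorageID.clear();
        registerFixed(new NodeInfo("DS-1", "dn1", 50020));
        registerFixed(new NodeInfo("DS-1", "dn1", 39123));
        System.out.println("fixed ipcPort=" + nodesByStorageID.get("DS-1").ipcPort);
    }
}
```

In the buggy variant the NN keeps serving the stale port (50020) after the restart, which is exactly the state in which clients cannot reach the DN's IPC server for pipeline recovery.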
[jira] Commented: (HDFS-245) Create symbolic links in HDFS
[ https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834605#action_12834605 ] Hadoop QA commented on HDFS-245: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12436052/symlink40-hdfs.patch against trunk revision 910760. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 15 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 117 release audit warnings (more than the trunk's current 0 warnings). -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/233/testReport/ Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/233/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/233/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/233/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/233/console This message is automatically generated. 
[jira] Commented: (HDFS-245) Create symbolic links in HDFS
[ https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12834607#action_12834607 ] Eli Collins commented on HDFS-245: -- Test failures are due to HDFS-981, HDFS-982, and errors reading partially downloaded zip files.
[jira] Updated: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file
[ https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-946: --- Attachment: HdfsFileStatus3.patch This patch incorporates all review comments except for #2 (consider moving isDirectory outside). In addition, it 1. removes the method getPath(), which returns the byte array, from HdfsFileStatus. This makes the byte array immutable, which makes the change safer. 2. fixes a few more bugs in Jsp code that misuse getPath(). 3. adds comments to INode#name reminding that if this encoding is changed, the ClientProtocol changes and the decoding of HdfsFileStatus#name should change too. NameNode should not return full path name when listing a directory or getting the status of a file - Key: HDFS-946 URL: https://issues.apache.org/jira/browse/HDFS-946 Project: Hadoop HDFS Issue Type: Improvement Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 0.22.0 Attachments: HDFSFileStatus.patch, HDFSFileStatus1.patch, HdfsFileStatus3.patch FSDirectory#getListing(String src) has the following code: {code} int i = 0; for (INode cur : contents) { listing[i] = createFileStatus(srcs + cur.getLocalName(), cur); i++; } {code} So listing a directory returns an array of FileStatus objects, each carrying the full path name. This increases the size of the returned message and adds non-negligible CPU time to the operation. FSDirectory#getFileInfo(String) does not need to return the file name either. Another optimization is that, in the version of FileStatus used in the wire protocol, the path field does not need to be a Path; ideally it could be a String or a byte array. This would avoid unnecessary creation of Path objects at the NameNode, helping reduce the GC problem observed when a large number of getFileInfo or getListing operations hit the NameNode.
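The byte-array idea can be sketched as follows. This is a hypothetical simplified class, not the actual HdfsFileStatus from the patch: the point is only that the wire object keeps the last path component as immutable UTF-8 bytes and the full path is assembled by the caller, so the server never materializes per-entry Path objects or full path strings.

```java
// Illustrative stand-in for a wire-format file status that stores only the
// last path component as UTF-8 bytes (class and method names are assumptions).
import java.nio.charset.StandardCharsets;

public class WireStatusSketch {
    static class WireFileStatus {
        private final byte[] name; // last component only, UTF-8 encoded

        WireFileStatus(String localName) {
            this.name = localName.getBytes(StandardCharsets.UTF_8);
        }

        // Decode lazily; the parent path is supplied by the caller, so the
        // server side never builds full path strings for each listing entry.
        String getFullName(String parent) {
            String local = new String(name, StandardCharsets.UTF_8);
            return parent.endsWith("/") ? parent + local : parent + "/" + local;
        }
    }

    public static void main(String[] args) {
        WireFileStatus st = new WireFileStatus("part-00000");
        System.out.println(st.getFullName("/user/hairong")); // /user/hairong/part-00000
    }
}
```

With this shape, a getListing reply for a directory of N entries carries N short byte arrays instead of N full paths, which addresses both the message-size and the NameNode GC concerns described above.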
[jira] Updated: (HDFS-946) NameNode should not return full path name when listing a directory or getting the status of a file
[ https://issues.apache.org/jira/browse/HDFS-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-946: --- Hadoop Flags: [Incompatible change] Status: Patch Available (was: Open)