[jira] Updated: (HDFS-1557) Separate Storage from FSImage
[ https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1557: - Status: Open (was: Patch Available) Separate Storage from FSImage - Key: HDFS-1557 URL: https://issues.apache.org/jira/browse/HDFS-1557 Project: Hadoop HDFS Issue Type: Sub-task Components: name-node Affects Versions: 0.21.0 Reporter: Ivan Kelly Assignee: Ivan Kelly Fix For: 0.23.0 Attachments: HDFS-1557-branch-0.22.diff, HDFS-1557-branch-0.22.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff FSImage currently derives from Storage and FSEditLog has to call methods directly on FSImage to access the filesystem. This JIRA is to separate the Storage class out into NNStorage so that FSEditLog is less dependent on FSImage. From this point, the other parts of the circular dependency should be easy to fix. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
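The refactoring described above (extracting Storage into NNStorage so that FSEditLog no longer reaches through FSImage) can be sketched as follows. The class names follow the issue, but the bodies are illustrative stand-ins, not the actual Hadoop code:

```java
// Sketch of the dependency break described in HDFS-1557: instead of
// FSEditLog calling methods on FSImage to reach the filesystem storage,
// both FSImage and FSEditLog depend on an extracted NNStorage.
class NNStorage {                 // extracted from the old Storage/FSImage coupling
    String currentDir() { return "/dfs/name/current"; }
}

class FSImage {
    final NNStorage storage;      // no longer *is* the storage, merely uses it
    FSImage(NNStorage storage) { this.storage = storage; }
}

class FSEditLog {
    final NNStorage storage;      // no longer needs a reference to FSImage
    FSEditLog(NNStorage storage) { this.storage = storage; }
    String logDir() { return storage.currentDir(); }
}
```

With both classes depending on NNStorage, the circular FSImage/FSEditLog dependency mentioned in the description becomes straightforward to remove.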
[jira] Updated: (HDFS-1557) Separate Storage from FSImage
[ https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1557: - Attachment: HDFS-1557.diff
[jira] Updated: (HDFS-1557) Separate Storage from FSImage
[ https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Kelly updated HDFS-1557: - Status: Patch Available (was: Open) Addressed Suresh's comments.

{quote} NNStorage uses the synchronized method errorDirectory() to notify listeners of an error. The listeners implement a synchronized method to handle the error. Is it possible for a listener (say FSImage) to call, from its synchronized section, a synchronized method on NNStorage? This could cause deadlocks. {quote}
NNStorage now uses CopyOnWriteArrayLists, so errorDirectory is no longer synchronized. (see below)

{quote} format(), registerListener(): should these be synchronized, as they manipulate listeners? {quote}
listeners is now a CopyOnWriteArrayList.

{quote} Storage#storageDirs is manipulated in NNStorage and Storage. The way it is done is not thread-safe. Perhaps the existing code is thread-safe itself. This could be addressed in a separate jira. {quote}
Well spotted. I don't think any of this is thread-safe, given that storageDirs is modified in numerous places and is constantly being iterated over, which could trigger a ConcurrentModificationException. I've made storageDirs and removedStorageDirs CopyOnWriteArrayLists now.

{quote} Consider making the following methods package-private: isPreUpgradableLayout(), setRestoreFailedStorage() (both variants), attemptRestoreRemovedStorage()... There are other methods that can only be used within the package. This makes sure this is not a class for outside consumption. I would further consider making NNStorage a non-public class. {quote}
Tightened up all the access privileges I could on that class to package-private. Unfortunately, NNStorage itself must remain public because UpgradeUtilities in the tests is in a different package.
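The CopyOnWriteArrayList change described in the comment above can be sketched like this; StorageErrorSketch is a simplified stand-in for illustration, not the actual NNStorage class:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of the listener pattern discussed above: a CopyOnWriteArrayList
// lets errorDirectory() iterate and notify listeners without holding a
// lock, avoiding the nested-monitor deadlock risk that synchronized
// notification would carry.
class StorageErrorSketch {
    interface StorageListener {
        void errorOccurred(String directory);
    }

    // Safe for concurrent iteration; every mutation copies the backing array.
    private final List<StorageListener> listeners = new CopyOnWriteArrayList<>();

    void registerListener(StorageListener l) {
        listeners.add(l);  // no synchronization needed
    }

    void errorDirectory(String dir) {
        // Iteration sees a consistent snapshot even if a listener
        // registers or deregisters concurrently.
        for (StorageListener l : listeners) {
            l.errorOccurred(dir);
        }
    }
}
```

The trade-off of CopyOnWriteArrayList is that each add/remove copies the array, which is fine here because listener registration is rare compared to iteration.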
[jira] Assigned: (HDFS-1596) Move secondary namenode checkpoint configs from core-default.xml to hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria reassigned HDFS-1596: --- Assignee: Harsh J Chouraria

Move secondary namenode checkpoint configs from core-default.xml to hdfs-default.xml Key: HDFS-1596 URL: https://issues.apache.org/jira/browse/HDFS-1596 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Reporter: Patrick Angeles Assignee: Harsh J Chouraria Attachments: HDFS-7117.r1.diff

The following configs are in core-default.xml, but are really read by the Secondary Namenode. They should be moved to hdfs-default.xml for consistency.

<property>
  <name>fs.checkpoint.dir</name>
  <value>${hadoop.tmp.dir}/dfs/namesecondary</value>
  <description>Determines where on the local filesystem the DFS secondary
  name node should store the temporary images to merge. If this is a
  comma-delimited list of directories then the image is replicated in all
  of the directories for redundancy.</description>
</property>

<property>
  <name>fs.checkpoint.edits.dir</name>
  <value>${fs.checkpoint.dir}</value>
  <description>Determines where on the local filesystem the DFS secondary
  name node should store the temporary edits to merge. If this is a
  comma-delimited list of directories then the edits are replicated in all
  of the directories for redundancy. Default value is the same as
  fs.checkpoint.dir.</description>
</property>

<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
  <description>The number of seconds between two periodic checkpoints.</description>
</property>

<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>
  <description>The size of the current edit log (in bytes) that triggers a
  periodic checkpoint even if the fs.checkpoint.period hasn't expired.</description>
</property>

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1596) Move secondary namenode checkpoint configs from core-default.xml to hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J Chouraria updated HDFS-1596: Attachment: HDFS-7117.r1.diff Patch that updates all references of fs.checkpoint.* to their newer dfs.namenode.checkpoint.* keys.
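For reference, under the renamed keys mentioned in the patch note, the equivalent hdfs-site.xml entries would look roughly as follows. This is a sketch: only the dir, edits.dir, and period keys are shown with the default values quoted above, and the exact renamed key set should be checked against the patch itself.

<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>${hadoop.tmp.dir}/dfs/namesecondary</value>
</property>

<property>
  <name>dfs.namenode.checkpoint.edits.dir</name>
  <value>${dfs.namenode.checkpoint.dir}</value>
</property>

<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
</property>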
[jira] Commented: (HDFS-1600) editsStored.xml cause release audit warning
[ https://issues.apache.org/jira/browse/HDFS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987759#action_12987759 ] Erik Steffl commented on HDFS-1600: --- As explained in HDFS-1448 (test-patch comment), editsStored.xml should not be changed; it is a reference file for tests (test results are compared to this file). I did not know where to add it for it to be ignored (I asked around, but it seemed the test-patch warning can simply be ignored; there are other files that also lack the licence). editsStored.xml cause release audit warning --- Key: HDFS-1600 URL: https://issues.apache.org/jira/browse/HDFS-1600 Project: Hadoop HDFS Issue Type: Bug Components: build, test Reporter: Tsz Wo (Nicholas), SZE Assignee: Erik Steffl Attachments: h1600_20110126.patch The file {{src/test/hdfs/org/apache/hadoop/hdfs/tools/offlineEditsViewer/editsStored.xml}} causes a release audit warning for any new patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1335) HDFS side of HADOOP-6904: first step towards inter-version communications between dfs client and NameNode
[ https://issues.apache.org/jira/browse/HDFS-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hairong Kuang updated HDFS-1335: Attachment: hdfsRPC.patch Here is the patch that makes HDFS work with the new method-based RPC compatibility protocol. HDFS side of HADOOP-6904: first step towards inter-version communications between dfs client and NameNode - Key: HDFS-1335 URL: https://issues.apache.org/jira/browse/HDFS-1335 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs client, name-node Affects Versions: 0.22.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Attachments: hdfsRPC.patch, hdfsRpcVersion.patch The idea is that for getProtocolVersion, the NameNode checks whether the client and server versions are compatible if the server version is greater than the client version. If not, it throws a VersionIncompatible exception; otherwise, it returns the server version. On the dfs client side, when creating a NameNode proxy, the client catches the VersionMismatch exception and then checks whether the client and server versions are compatible if the client version is greater than the server version. If not compatible, it throws a VersionIncompatible exception; otherwise, it records the server version and continues. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1557) Separate Storage from FSImage
[ https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987865#action_12987865 ] Jitendra Nath Pandey commented on HDFS-1557: I ran test-patch on the latest patch. There is a findbugs error: Synchronization performed on java.util.concurrent.CopyOnWriteArrayList in org.apache.hadoop.hdfs.server.namenode.NNStorage.attemptRestoreRemovedStorage(boolean) at line synchronized (this.removedStorageDirs) in NNStorage. Overall test-patch results were:

     [exec] -1 overall.
     [exec]
     [exec] +1 @author. The patch does not contain any @author tags.
     [exec]
     [exec] +1 tests included. The patch appears to include 42 new or modified tests.
     [exec]
     [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
     [exec]
     [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec] -1 findbugs. The patch appears to introduce 1 new Findbugs warnings.
     [exec]
     [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
     [exec]
     [exec] +1 system tests framework. The patch passed system tests framework compile.
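The Findbugs warning quoted above fires because a CopyOnWriteArrayList already supports safe lock-free iteration, so wrapping it in a synchronized block is redundant at best and misleading at worst. A minimal illustration of the flagged pattern and the lock-free alternative (a simplified stand-in, not the actual NNStorage code):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustration of the Findbugs complaint above: locking on a
// CopyOnWriteArrayList is a smell, because the list already provides
// snapshot iteration without external synchronization.
class RemovedDirsSketch {
    private final List<String> removedStorageDirs = new CopyOnWriteArrayList<>();

    void record(String dir) {
        removedStorageDirs.add(dir);
    }

    // Flagged style:    synchronized (removedStorageDirs) { for (...) ... }
    // Lock-free style:  iterate over the list's snapshot directly.
    int attemptRestore() {
        int restored = 0;
        for (String dir : removedStorageDirs) {
            restored++;  // a real implementation would re-add the directory here
        }
        removedStorageDirs.clear();  // each mutation is itself atomic
        return restored;
    }
}
```

Note that the iterate-then-clear sequence above is not atomic as a whole; if that compound step must be atomic, a different lock object (not the list) would be needed.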
[jira] Commented: (HDFS-1598) ListPathsServlet excludes .*.crc files
[ https://issues.apache.org/jira/browse/HDFS-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987894#action_12987894 ] Tsz Wo (Nicholas), SZE commented on HDFS-1598: -- {noformat}
     [exec] -1 overall.
     [exec]
     [exec] +1 @author. The patch does not contain any @author tags.
     [exec]
     [exec] +1 tests included. The patch appears to include 3 new or modified tests.
     [exec]
     [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
     [exec]
     [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
     [exec]
     [exec] -1 release audit. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings).
     [exec]
     [exec] +1 system test framework. The patch passed system test framework compile.
{noformat} The release warning is not related to this. See HDFS-1600. ListPathsServlet excludes .*.crc files -- Key: HDFS-1598 URL: https://issues.apache.org/jira/browse/HDFS-1598 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.20.2 Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h1598_20110126.patch The {{.*.crc}} files are excluded by default. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1084) TestDFSShell fails in trunk.
[ https://issues.apache.org/jira/browse/HDFS-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987897#action_12987897 ] Konstantin Shvachko commented on HDFS-1084: --- We can fix it by using {{FileUtil.makeShellPath(File file, boolean makeCanonicalPath)}} here instead of {{getCanonicalPath()}}. A side note - the entire {{RawLocalFileSystem}} should probably invoke FileUtil methods rather than Shell. This would be a larger code cleanup, not for 0.22. TestDFSShell fails in trunk. Key: HDFS-1084 URL: https://issues.apache.org/jira/browse/HDFS-1084 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.22.0 Reporter: Konstantin Shvachko Assignee: Po Cheung Priority: Blocker Fix For: 0.22.0 {{TestDFSShell.testFilePermissions()}} fails on an assert attached below. I see it on my Linux box. Don't see it failing with Hudson, and the same test runs fine in 0.21 branch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HDFS-1602) Revert HADOOP-4885 for it doesn't work as expected.
Revert HADOOP-4885 for it doesn't work as expected. -- Key: HDFS-1602 URL: https://issues.apache.org/jira/browse/HDFS-1602 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.21.0 Reporter: Konstantin Boudnik NameNode storage restore functionality doesn't work (as HDFS-903 demonstrated). It needs to be disabled, removed, or fixed. This feature also fails HDFS-1496. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HDFS-1598) ListPathsServlet excludes .*.crc files
[ https://issues.apache.org/jira/browse/HDFS-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-1598: - Attachment: h1598_20110126_0.20.patch h1598_20110126_0.20.patch: for 0.20
[jira] Created: (HDFS-1603) Namenode gets sticky if one of namenode storage volumes disappears (removed, unmounted, etc.)
Namenode gets sticky if one of namenode storage volumes disappears (removed, unmounted, etc.) - Key: HDFS-1603 URL: https://issues.apache.org/jira/browse/HDFS-1603 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.21.0 Reporter: Konstantin Boudnik While investigating failures on HDFS-1602 it became apparent that once a namenode storage volume is pulled out, the NN becomes completely sticky until {{FSImage:processIOError: removing storage}} moves the storage from the active set. During this time none of the normal NN operations is possible (e.g. creating a directory on HDFS eventually times out). In the case of NFS this can be worked around with the soft,intr,timeo,retrans mount settings. However, better handling of the situation is apparently possible and needs to be implemented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1598) ListPathsServlet excludes .*.crc files
[ https://issues.apache.org/jira/browse/HDFS-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987904#action_12987904 ] Hadoop QA commented on HDFS-1598: -

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12469623/h1598_20110126_0.20.patch against trunk revision 1062052.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

-1 patch. The patch command could not apply the patch.

Console output: https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/134//console This message is automatically generated.
[jira] Updated: (HDFS-1602) Fix HADOOP-4885 for it doesn't work as expected.
[ https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated HDFS-1602: - Summary: Fix HADOOP-4885 for it doesn't work as expected. (was: Revert HADOOP-4885 for it doesn't work as expected.)
[jira] Updated: (HDFS-1598) ListPathsServlet excludes .*.crc files
[ https://issues.apache.org/jira/browse/HDFS-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-1598: - Resolution: Fixed Fix Version/s: 0.23.0 0.22.0 0.21.1 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I have committed this.
[jira] Updated: (HDFS-1496) TestStorageRestore is failing after HDFS-903 fix
[ https://issues.apache.org/jira/browse/HDFS-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated HDFS-1496: - Attachment: HDFS-1496.sh This system-level test reproduces the same issue with a local NFS server and NFS-mounted storage volumes. TestStorageRestore is failing after HDFS-903 fix Key: HDFS-1496 URL: https://issues.apache.org/jira/browse/HDFS-1496 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 0.22.0, 0.23.0 Reporter: Konstantin Boudnik Assignee: Hairong Kuang Priority: Blocker Fix For: 0.22.0 Attachments: HDFS-1496.sh TestStorageRestore seems to be failing after the HDFS-903 commit. Running git bisect confirms it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1595) DFSClient may incorrectly detect datanode failure
[ https://issues.apache.org/jira/browse/HDFS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987916#action_12987916 ] Tsz Wo (Nicholas), SZE commented on HDFS-1595: -- Todd, would you like to work on this with your idea?

DFSClient may incorrectly detect datanode failure - Key: HDFS-1595 URL: https://issues.apache.org/jira/browse/HDFS-1595 Project: Hadoop HDFS Issue Type: Bug Components: data-node, hdfs client Affects Versions: 0.20.4 Reporter: Tsz Wo (Nicholas), SZE Priority: Critical Attachments: hdfs-1595-idea.txt

Suppose a source datanode S is writing to a destination datanode D in a write pipeline. We have an implicit assumption that _if S catches an exception when it is writing to D, then D is faulty and S is fine._ As a result, DFSClient will take D out of the pipeline, reconstruct the write pipeline with the remaining datanodes, and then continue writing. However, we found a case where the faulty machine F is in fact S, not D. In the case we found, F has a faulty network interface (or a faulty switch port) such that the interface works fine when transferring a small amount of data, say 1MB, but often fails when transferring a large amount of data, say 100MB. It is even worse if F is the first datanode in the pipeline. Consider the following:
# DFSClient creates a pipeline with three datanodes. The first datanode is F.
# F catches an IOException when writing to the second datanode. Then, F reports that the second datanode has an error.
# DFSClient removes the second datanode from the pipeline and continues writing with the remaining datanode(s).
# The pipeline now has two datanodes, but (2) and (3) repeat.
# Now, only F remains in the pipeline. DFSClient continues writing with one replica in F.
# The write succeeds and DFSClient is able to *close the file successfully*.
# The block is under-replicated. The NameNode schedules replication from F to some other datanode D.
# The replication fails for the same reason. D reports to the NameNode that the replica in F is corrupted.
# The NameNode marks the replica in F as corrupted.
# The block is corrupted since no replica is available.

We were able to manually divide the replicas into small files and copy them out from F without fixing the hardware. The replicas seem uncorrupted. This is a *data availability problem*. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-1595) DFSClient may incorrectly detect datanode failure
[ https://issues.apache.org/jira/browse/HDFS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987920#action_12987920 ] Todd Lipcon commented on HDFS-1595: --- Yea, I can take this, but it may be a bit before I can get to it - mostly focusing on bug fixing at the moment (I would classify this as a missing feature more than a bug).
[jira] Updated: (HDFS-1496) TestStorageRestore is failing after HDFS-903 fix
[ https://issues.apache.org/jira/browse/HDFS-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated HDFS-1496: - Attachment: HDFS-1496.sh Fixing missed extra NN ops after the NFS mount is restored.
[jira] Created: (HDFS-1604) Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles
Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles -- Key: HDFS-1604 URL: https://issues.apache.org/jira/browse/HDFS-1604 Project: Hadoop HDFS Issue Type: New Feature Components: security Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur This JIRA is for the HDFS portion of HADOOP-7119 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Work started: (HDFS-1604) Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles
[ https://issues.apache.org/jira/browse/HDFS-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-1604 started by Alejandro Abdelnur.
[jira] Updated: (HDFS-1604) add Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles
[ https://issues.apache.org/jira/browse/HDFS-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur updated HDFS-1604: - Summary: add Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles (was: Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles)
[jira] Commented: (HDFS-1594) When the disk becomes full Namenode is getting shutdown and not able to recover
[ https://issues.apache.org/jira/browse/HDFS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987960#action_12987960 ] Konstantin Boudnik commented on HDFS-1594: -- bq. I am finding it hard to review this diff because it has lots of diffs that are not inherently connected with the attempted fix. That was exactly the point of asking for another round-trip ;)

When the disk becomes full Namenode is getting shutdown and not able to recover --- Key: HDFS-1594 URL: https://issues.apache.org/jira/browse/HDFS-1594 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: 0.21.0, 0.21.1, 0.22.0 Environment: Linux linux124 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 x86_64 x86_64 x86_64 GNU/Linux Reporter: Devaraj K Attachments: hadoop-root-namenode-linux124.log, HDFS-1594.patch

When the disk becomes full, the name node shuts down, and if we try to start it after making space available it does not start, throwing the exception below.

{code:xml}
2011-01-24 23:23:33,727 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:180)
	at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
	at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:284)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:577)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:570)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
2011-01-24 23:23:33,729 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:180)
	at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
	at org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:284)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:577)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:570)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
2011-01-24 23:23:33,730 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
{code}
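The java.io.EOFException in the log above is the signature of a record truncated mid-write (here, by the full disk): DataInputStream#readFully throws it when the stream ends before the requested number of bytes arrives, which aborts edit-log replay during NameNode startup. A minimal self-contained illustration of that failure shape:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Minimal illustration of the failure above: a record truncated mid-write
// makes DataInputStream#readFully throw EOFException, the same exception
// class seen in the NameNode log during edit-log replay.
class TruncatedRecordDemo {
    static boolean readTruncated() {
        byte[] truncated = {0x00, 0x01, 0x02};  // only 3 of the expected bytes survived
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(truncated));
        byte[] buf = new byte[8];               // the reader expects a full 8-byte record
        try {
            in.readFully(buf);
            return false;                       // not reached: the stream is too short
        } catch (EOFException e) {
            return true;                        // end-of-stream hit mid-record
        } catch (IOException e) {
            return false;                       // ByteArrayInputStream cannot fail this way
        }
    }
}
```

This is why recovery from a disk-full shutdown typically requires trimming or skipping the partial trailing record in the edit log rather than simply restarting the NameNode.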