[jira] [Updated] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-9129: Attachment: HDFS-9129.006.patch As per offline discussion, the v6 patch prefers a synchronized {{getSafeModeTip}} to volatile fields. Some test-only public methods in {{BlockManagerSafeMode}} were removed as well. > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task > Reporter: Haohui Mai > Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch, HDFS-9129.006.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of safe mode. These fields can be moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
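The v6 choice of a synchronized getter over volatile fields can be illustrated with a small sketch. {{SafeModeStatus}}, its fields, and the tip format below are hypothetical stand-ins, not the actual {{BlockManagerSafeMode}} code: the point is that a synchronized method reads a consistent snapshot of related counters, which independent volatile reads cannot guarantee.

```java
// Hypothetical sketch: a synchronized getter yields a consistent snapshot
// of two related counters. With two volatile fields, a reader could see a
// new blockSafe paired with a stale blockTotal.
public class SafeModeStatus {
    private long blockTotal;  // guarded by "this"
    private long blockSafe;   // guarded by "this"

    public synchronized void setCounts(long total, long safe) {
        blockTotal = total;
        blockSafe = safe;
    }

    // Reads both fields under the same lock, so the tip never mixes
    // values from two different updates.
    public synchronized String getSafeModeTip() {
        return "Safe mode: " + blockSafe + " of " + blockTotal
            + " blocks reported";
    }
}
```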
[jira] [Commented] (HDFS-9250) LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty
[ https://issues.apache.org/jira/browse/HDFS-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961728#comment-14961728 ] Hadoop QA commented on HDFS-9250: -
(x) *{color:red}-1 overall{color}*
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 26m 5s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 9m 25s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 52s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 3m 9s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 31s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 4m 38s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 18s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 50m 24s | Tests failed in hadoop-hdfs. |
| {color:green}+1{color} | hdfs tests | 0m 30s | Tests passed in hadoop-hdfs-client. |
| | | | 111m 0s | |
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.blockmanagement.TestBlockManager |
| | hadoop.hdfs.TestRenameWhileOpen |
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12767182/HDFS-9250.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 58590fe |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13035/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13035/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13035/artifact/patchprocess/testrun_hadoop-hdfs-client.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13035/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13035/console |
This message was automatically generated.
> LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty > --- > > Key: HDFS-9250 > URL: https://issues.apache.org/jira/browse/HDFS-9250 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS > Reporter: Xiao Chen > Assignee: Xiao Chen > Attachments: HDFS-9250.001.patch, HDFS-9250.002.patch > > > We may see the following exception: > {noformat} > java.lang.ArrayStoreException > at java.util.ArrayList.toArray(ArrayList.java:389) > at > org.apache.hadoop.hdfs.protocol.LocatedBlock.addCachedLoc(LocatedBlock.java:205) > at > org.apache.hadoop.hdfs.server.namenode.CacheManager.setCachedLocations(CacheManager.java:907) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1974) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873) > {noformat} > The cause is that in LocatedBlock.java, in {{addCachedLoc}}: > - The passed-in parameter {{loc}}, which is of type {{DatanodeDescriptor}}, is added > to {{cachedList}} > - {{cachedList}} was assigned {{EMPTY_LOCS}}, which is of type > {{DatanodeInfoWithStorage}}. > Both {{DatanodeDescriptor}} and {{DatanodeInfoWithStorage}} are subclasses of > {{DatanodeInfo}} but do not inherit from each other, resulting in the > ArrayStoreException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
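The failure mode in the stack trace above can be reproduced with a tiny self-contained example. {{Base}}/{{SubA}}/{{SubB}} below are hypothetical stand-ins for {{DatanodeInfo}}/{{DatanodeDescriptor}}/{{DatanodeInfoWithStorage}}: {{ArrayList.toArray(T[])}} allocates a result array with the runtime component type of the argument, so storing a sibling subclass into it throws {{ArrayStoreException}}.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class hierarchy mirroring the bug: two sibling subclasses
// of a common parent that do not inherit from each other.
class Base {}
class SubA extends Base {}
class SubB extends Base {}

public class ArrayStoreDemo {
    public static void main(String[] args) {
        List<Base> list = new ArrayList<>();
        list.add(new SubA());          // the list holds a SubA
        try {
            // toArray allocates a SubB[] (the runtime type of the argument)
            // and copies elements into it; storing a SubA there throws.
            list.toArray(new SubB[0]);
        } catch (ArrayStoreException e) {
            System.out.println("ArrayStoreException");
        }
    }
}
```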
[jira] [Updated] (HDFS-8671) Add client support for HTTP/2 stream channels
[ https://issues.apache.org/jira/browse/HDFS-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HDFS-8671: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Pushed to HDFS-7966 branch. Thanks [~wheat9] for reviewing. > Add client support for HTTP/2 stream channels > - > > Key: HDFS-8671 > URL: https://issues.apache.org/jira/browse/HDFS-8671 > Project: Hadoop HDFS > Issue Type: Sub-task > Reporter: Duo Zhang > Assignee: Duo Zhang > Fix For: HDFS-7966 > > Attachments: HDFS-8671-v0.patch, HDFS-8671-v1.patch > > > {{Http2StreamChannel}} was introduced in HDFS-8515 but can only be used at > the server side. > We currently implement Http2BlockReader using the jetty http2-client in the POC branch, > but the final version of jetty 9.3.0 requires Java 8. > So here we plan to extend {{Http2StreamChannel}} to support > client-side usage and then implement Http2BlockReader on top of it. We > still use the jetty http2-client in test cases to verify that our http2 > implementation is valid. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8836) Skip newline on empty files with getMerge -nl
[ https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961710#comment-14961710 ] Akira AJISAKA commented on HDFS-8836: - Some comments from me. {code} +skipEmptyFileDelimiter = cf.getOpt("skip-empty-file") ? true : false; {code} 1. {{? true : false}} is redundant and can be removed. {code} if (skipEmptyFileDelimiter && src.stat.getLen() == 0) { continue; } FSDataInputStream in = src.fs.open(src.path); try { IOUtils.copyBytes(in, out, getConf(), false); if (delimiter != null) { out.write(delimiter.getBytes("UTF-8")); } } finally { in.close(); } {code} 2. Can we skip opening an empty file if the file length is zero, as follows? {code} if (src.stat.getLen() != 0) { try (FSDataInputStream in = src.fs.open(src.path)) { IOUtils.copyBytes(in, out, getConf(), false); writeDelimiter(out); } } else if (!skipEmptyFileDelimiter) { writeDelimiter(out); } private void writeDelimiter(FSDataOutputStream out) { ... } {code} {code:title=TestFsShellCopy#testCopyMerge} // directory with 3 files, should skip subdir {code} 3. An empty file is added, so there are 4 files. > Skip newline on empty files with getMerge -nl > - > > Key: HDFS-8836 > URL: https://issues.apache.org/jira/browse/HDFS-8836 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client > Affects Versions: 2.6.0, 2.7.1 > Reporter: Jan Filipiak > Assignee: Kanaka Kumar Avvaru > Priority: Trivial > Attachments: HDFS-8836-01.patch, HDFS-8836-02.patch, > HDFS-8836-03.patch, HDFS-8836-04.patch, HDFS-8836-05.patch > > > Hello everyone, > I recently needed to use the newline option -nl with getMerge > because the files I needed to merge simply didn't have one. I was merging all > the files from one directory, and unfortunately this directory also included > empty files, which effectively led to multiple newlines appended after some > files. I needed to remove them manually afterwards. 
> In this situation it may be good to have another argument that allows > skipping empty files. > One way to implement this feature: > The call to IOUtils.copyBytes(in, out, getConf(), false) doesn't > return the number of bytes copied, which would be convenient, as one could > skip appending the newline when 0 bytes were copied; alternatively one could check the > file size beforehand. > I posted this idea on the mailing list > http://mail-archives.apache.org/mod_mbox/hadoop-user/201507.mbox/%3C55B25140.3060005%40trivago.com%3E > but I didn't really get many responses, so I thought I might try this way. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
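The refactor suggested in Akira's review comment could be fleshed out along these lines. This is a simplified sketch using byte arrays instead of Hadoop's {{FileSystem}} streams; {{MergeSketch}} and {{merge}} are hypothetical names, while {{writeDelimiter}} and the skip flag follow the names used in the review comment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Simplified sketch of the proposed getMerge logic: copy each non-empty
// source and append the delimiter; for a zero-length source, append the
// delimiter only when skip-empty-file is NOT set.
public class MergeSketch {
    static byte[] merge(List<byte[]> sources, String delimiter,
                        boolean skipEmptyFileDelimiter) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] src : sources) {
            if (src.length != 0) {
                // Only open/copy non-empty sources.
                try (InputStream in = new ByteArrayInputStream(src)) {
                    in.transferTo(out);
                }
                writeDelimiter(out, delimiter);
            } else if (!skipEmptyFileDelimiter) {
                writeDelimiter(out, delimiter);
            }
        }
        return out.toByteArray();
    }

    private static void writeDelimiter(OutputStream out, String delimiter)
            throws IOException {
        if (delimiter != null) {
            out.write(delimiter.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

With sources "a", "" and "b" and delimiter "\n", the skip flag yields "a\nb\n" instead of "a\n\nb\n".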
[jira] [Commented] (HDFS-8836) Skip newline on empty files with getMerge -nl
[ https://issues.apache.org/jira/browse/HDFS-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961682#comment-14961682 ] Akira AJISAKA commented on HDFS-8836: - Sorry for the late response. bq. One could set up many oozie coordinators that would wait for A/_SUCCESS and then start processing it. There would be no safe time to delete the file as one is always in danger of having one of the coordinators not executed as they didn't find its "dataset" file. Reasonable for me. I'll review your patch.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8880) NameNode metrics logging
[ https://issues.apache.org/jira/browse/HDFS-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961680#comment-14961680 ] Arpit Agarwal commented on HDFS-8880: - I'll address #3 by eliminating the extra thread. I am not opposed to a more general solution, pending which this is still useful. I added this to scratch a personal itch as I often missed textual records of NN metrics stored with the service logs for easy grep'ing by metric name or timestamp. There was no intention to add this to every service. Coda Hale [slf4jreporter|https://dropwizard.github.io/metrics/3.1.0/manual/core/#man-core-reporters-slf4j] looks particularly interesting but IIUC the reporters also use a polling thread and there'd be at least some code added to each service to instantiate reporters. We can file a Jira for a more general solution as there was some community interest from YARN, and perhaps downstream. > NameNode metrics logging > > > Key: HDFS-8880 > URL: https://issues.apache.org/jira/browse/HDFS-8880 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Fix For: 2.8.0 > > Attachments: HDFS-8880.01.patch, HDFS-8880.02.patch, > HDFS-8880.03.patch, HDFS-8880.04.patch, namenode-metrics.log > > > The NameNode can periodically log metrics to help debugging when the cluster > is not setup with another metrics monitoring scheme. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
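The periodic metrics-to-log reporting discussed here can be sketched with a self-contained toy. {{MetricsLogger}}, its method names, and the gauge names are hypothetical, not the HDFS-8880 patch or the Dropwizard API: a scheduled task snapshots registered gauges and emits one `name=value` text line per metric, grep-able by metric name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical sketch of a polling-thread metrics logger: gauges are
// registered once, then snapshotted on a fixed schedule and written as
// plain text lines suitable for grep'ing in service logs.
public class MetricsLogger {
    private final Map<String, LongSupplier> gauges = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "MetricsLogger");
            t.setDaemon(true);  // don't keep the service alive for logging
            return t;
        });

    public void register(String name, LongSupplier gauge) {
        gauges.put(name, gauge);
    }

    // One "name=value" line per gauge, read at snapshot time.
    public String snapshot() {
        StringBuilder sb = new StringBuilder();
        gauges.forEach((name, g) ->
            sb.append(name).append('=').append(g.getAsLong()).append('\n'));
        return sb.toString();
    }

    public void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(
            () -> System.out.print(snapshot()),
            periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }
}
```

The extra thread here is exactly the cost noted in the comment above; a shared scheduler across reporters would address it.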
[jira] [Updated] (HDFS-9250) LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty
[ https://issues.apache.org/jira/browse/HDFS-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-9250: Attachment: (was: HDFS-9250.002.patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9250) LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty
[ https://issues.apache.org/jira/browse/HDFS-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-9250: Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9250) LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty
[ https://issues.apache.org/jira/browse/HDFS-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961646#comment-14961646 ] Xiao Chen commented on HDFS-9250: - Hey [~andrew.wang], Thanks again for bringing up HDFS-8646, which looks complete to me. The version in which I encountered the {{ArrayStoreException}} predates your fix, so I think it's possible that the location was added without a disk replica. Patch 002 is attached. Your suggestion of adding a precondition check sounds great, since otherwise we know it will throw the {{ArrayStoreException}} for sure in that condition. I left the test case untouched so that it runs into the precondition block. Please review. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9250) LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty
[ https://issues.apache.org/jira/browse/HDFS-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-9250: Attachment: HDFS-9250.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9250) LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty
[ https://issues.apache.org/jira/browse/HDFS-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-9250: Attachment: HDFS-9250.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8671) Add client support for HTTP/2 stream channels
[ https://issues.apache.org/jira/browse/HDFS-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961635#comment-14961635 ] Haohui Mai commented on HDFS-8671: -- LGTM. +1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9251) Refactor TestWriteToReplica and TestFsDatasetImpl to avoid explicitly creating Files in tests code.
[ https://issues.apache.org/jira/browse/HDFS-9251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961607#comment-14961607 ] Colin Patrick McCabe commented on HDFS-9251: Thanks, [~eddyxu]. {code} Preconditions.checkArgument(volume instanceof FsVolumeImpl); {code} We should not have these lines. The class is {{FsDatasetImplTestUtils.java}}, so we know that the volume must be an instance of {{FsVolumeImpl}}. The only way it could not be is if there were a bug, which we don't want to hide. Looks good aside from that. > Refactor TestWriteToReplica and TestFsDatasetImpl to avoid explicitly > creating Files in tests code. > --- > > Key: HDFS-9251 > URL: https://issues.apache.org/jira/browse/HDFS-9251 > Project: Hadoop HDFS > Issue Type: Improvement > Components: HDFS > Affects Versions: 2.7.1 > Reporter: Lei (Eddy) Xu > Assignee: Lei (Eddy) Xu > Attachments: HDFS-9251.00.patch, HDFS-9251.01.patch > > > In {{TestWriteToReplica}} and {{TestFsDatasetImpl}}, tests directly create > block and metadata files: > {code} > replicaInfo.getBlockFile().createNewFile(); > replicaInfo.getMetaFile().createNewFile(); > {code} > This leaks the implementation details of {{FsDatasetImpl}}. This JIRA proposes > to use {{FsDatasetImplTestUtils}} (HDFS-9188) to create replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
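The review point can be illustrated with a hypothetical sketch. {{Volume}}/{{VolumeImpl}} below are illustrative stand-ins, not the real {{FsVolumeSpi}}/{{FsVolumeImpl}} types: in a test utility that by construction only ever handles one volume type, a plain cast already fails loudly on a bug, so a redundant {{instanceof}} precondition adds nothing.

```java
// Hypothetical stand-ins for a volume interface and its one implementation.
class Volume {}
class VolumeImpl extends Volume {
    long capacity() { return 100L; }
}

class VolumeTestUtils {
    // No instanceof precondition: if v is ever not a VolumeImpl, the cast
    // throws ClassCastException, which points straight at the bug rather
    // than hiding it behind a redundant check.
    static long capacityOf(Volume v) {
        return ((VolumeImpl) v).capacity();
    }
}
```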
[jira] [Updated] (HDFS-7087) Ability to list /.reserved
[ https://issues.apache.org/jira/browse/HDFS-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-7087: Status: Patch Available (was: Open) > Ability to list /.reserved > -- > > Key: HDFS-7087 > URL: https://issues.apache.org/jira/browse/HDFS-7087 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Andrew Wang >Assignee: Xiao Chen > Attachments: HDFS-7087.001.patch, HDFS-7087.002.patch, > HDFS-7087.draft.patch > > > We have two special paths within /.reserved now, /.reserved/.inodes and > /.reserved/raw. It seems like we should be able to list /.reserved to see > them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9253) Refactor tests of libhdfs into a directory
[ https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961598#comment-14961598 ] Hudson commented on HDFS-9253: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #507 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/507/]) HDFS-9253. Refactor tests of libhdfs into a directory. Contributed by (wheat9: rev 79b8d60d085ae196b05ff4ab511ff89f652e3c55)
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/include/hdfs/hdfs.h
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/expect.h
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_http_client.h
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/native_mini_dfs.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/CMakeLists.txt
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_read.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/test/test_fuse_dfs.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/hdfs_test.h
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_htable.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test_libhdfs_threaded.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/expect.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_read.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_write.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_write.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs_test.h
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_read.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_ops.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_context_handle.h
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/CMakeLists.txt
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_ops.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/expect.h
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/exception.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/expect.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_stat_struct.h
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_file_handle.h
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_htable.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_connect.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_ops.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_json_parser.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/pom.xml
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/test/fuse_workload.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_zerocopy.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_trash.h
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_trash.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/vecsum.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/CMakeLists.txt
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_native_mini_dfs.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/native_mini_dfs.h
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_zerocopy.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_write.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test_native_mini_dfs.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/native_mini_dfs.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.h
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_web.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_threaded.c
* hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/vecsum.c
* hadoop-hdfs-project/hadoop-hdfs-native-
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961593#comment-14961593 ] Jing Zhao commented on HDFS-7964: - Thanks for rebasing the patch, Daryn. The patch looks good to me. Some minor comments: # The following code uses whether the current thread holds the monitor to decide whether the edit should be async/sync. This approach may not be straightforward to follow, and it also makes it hard to guarantee the correctness of future code. Can we simply make the decision based on the op itself? {code} // only rpc calls not explicitly sync'ed on the log will be async. if (rpcCall != null && !Thread.holdsLock(this)) { edit = new AsyncEdit(this, op, rpcCall); } else { edit = new SyncEdit(this, op); } {code} # If requests keep coming but the traffic is slow, the sync will happen only when the buffer is full, which means the response may be delayed. This may be a rare case in practice, but maybe we should avoid it here. Can we make each iteration of the loop either fill the buffer or drain the pending queue? {code} if (edit != null) { // sync if requested by edit log. doSync = edit.logEdit(); syncWaitQ.add(edit); } else { // sync when editq runs dry, but have edits pending a sync. doSync = !syncWaitQ.isEmpty(); } {code} # The class InvalidOp is not used. We can either remove it or use it in {{OP_INVALID}}. # Maybe we can do some further cleanup for {{RollingUpgradeOp}}. E.g., after adding classes like {{RollingUpgradeStartOp}} and {{RollingUpgradeFinalizeOp}}, we can put {{getInstance}} methods there and remove {{getStartInstance}} and {{getFinalizeInstance}}. # Is the main reason for having {{OpInstanceCache#get}} to minimize the code change? # It would be helpful to add a comment explaining the calculation logic of {{editsBatchedInSync}}. 
> Add support for async edit logging > -- > > Key: HDFS-7964 > URL: https://issues.apache.org/jira/browse/HDFS-7964 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode > Affects Versions: 2.0.2-alpha > Reporter: Daryn Sharp > Assignee: Daryn Sharp > Attachments: HDFS-7964.patch, HDFS-7964.patch > > > Edit logging is a major source of contention within the NN. LogEdit is > called within the namespace write lock, while logSync is called outside of the > lock to allow greater concurrency. The handler thread remains busy until > logSync returns, to provide the client with a durability guarantee for the > response. > Write-heavy RPC load and/or slow IO causes handlers to stall in logSync. > Although the write lock is not held, readers are limited/starved and the call > queue fills. Combining an edit log thread with postponed RPC responses from > HADOOP-10300 will provide the same durability guarantee but immediately free > up the handlers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
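The batching behavior under discussion can be sketched with a self-contained toy. {{AsyncEditLog}}, {{Edit}}, and {{editsBatchedInSync}} below are illustrative (only the last name comes from the review comments), and this is not Daryn's actual patch: handler threads enqueue edits and block on a latch, while a single sync thread drains whatever is pending so that one durable sync covers many edits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;

// Toy sketch of batched async edit logging: N handler threads enqueue,
// one sync thread drains and syncs once per batch.
public class AsyncEditLog {
    static class Edit {
        final String op;
        final CountDownLatch synced = new CountDownLatch(1);
        Edit(String op) { this.op = op; }
    }

    private final BlockingQueue<Edit> editQueue = new ArrayBlockingQueue<>(1024);
    private volatile long editsBatchedInSync;  // single writer: sync thread

    public AsyncEditLog() {
        Thread syncThread = new Thread(() -> {
            List<Edit> batch = new ArrayList<>();
            try {
                while (true) {
                    // Block for the first edit, then drain everything else
                    // pending so one sync covers the whole batch.
                    batch.add(editQueue.take());
                    editQueue.drainTo(batch);
                    sync(batch);  // one durable sync for N edits
                    editsBatchedInSync += batch.size();
                    for (Edit e : batch) e.synced.countDown();
                    batch.clear();
                }
            } catch (InterruptedException ignored) { }
        });
        syncThread.setDaemon(true);
        syncThread.start();
    }

    private void sync(List<Edit> batch) { /* write + fsync elided */ }

    // Called by handler threads; returns once the edit is durable.
    public void logEdit(String op) throws InterruptedException {
        Edit e = new Edit(op);
        editQueue.put(e);
        e.synced.await();
    }

    public long getEditsBatchedInSync() { return editsBatchedInSync; }
}
```

Jing's second comment maps onto the drain step: draining the pending queue on every iteration keeps slow-but-steady traffic from waiting for a full buffer.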
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961589#comment-14961589 ] Mingliang Liu commented on HDFS-9184: - Thanks for your comment [~daijy]. To address this, I think we have several options. # One option is to set the max length of the caller context to 128 bytes. The {{CallerContext.Builder}} will throw an exception if the end user tries to set a context longer than 128 bytes. This works fine as long as we don't mind losing the _configurability_. # Another approach is to validate the length when we create an RPC {{Client$Connection}}. We can either truncate the caller context and log a warning, or throw an exception. We may have to change {{ProtoUtils#makeRpcRequestHeader}} for this validation, as we need to read the config keys. > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch, HDFS-9184.001.patch, > HDFS-9184.002.patch, HDFS-9184.003.patch, HDFS-9184.004.patch, > HDFS-9184.005.patch, HDFS-9184.006.patch, HDFS-9184.007.patch > > > For a given HDFS operation (e.g. delete file), it's very helpful to track > which upper level job issues it. The upper level callers may be specific > Oozie tasks, MR jobs, and hive queries. One scenario is that the namenode > (NN) is abused/spammed, the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. Currently the HDFS audit log tracks the users of the operation, which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. 
Even for a single top level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g. Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. The span is created in many places, interconnected > like a tree structure, which relies on offline analysis across the RPC boundary. > For this use case, {{htrace}} has to be enabled at a 100% sampling rate, which > introduces significant overhead. Moreover, passing additional information > (via annotations) other than the span id from the root of the tree to a leaf is > significant additional work. > 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there > are some related discussions on this topic. The final patch implemented the > tracking id as a part of the delegation token. This protects the tracking > information from being changed or impersonated. However, Kerberos-authenticated > connections or insecure connections don't have tokens. > [HADOOP-8779] proposes to use tokens in all the scenarios, but that might > mean changes to several upstream projects and is a major change in their > security implementation. > We propose another approach to address this problem. We also treat the HDFS audit > log as a good place for after-the-fact root cause analysis. We propose to put > the caller id (e.g. Hive query id) in threadlocals. Specifically, on the client side > the threadlocal object is passed to the NN as a part of the RPC header (optional), > while on the server side the NN retrieves it from the header and puts it into the {{Handler}}'s > threadlocals. Finally, in {{FSNamesystem}}, the HDFS audit logger will record the > caller context for each operation. In this way, the existing code is not > affected. > It is still challenging to keep a "lying" client from abusing the caller > context. Our proposal is to add a {{signature}} field to the caller context. > The client chooses to provide its signature along with the caller id. 
The > operator may need to validate the signature at the time of offline analysis. > The NN is not responsible for validating the signature online. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961574#comment-14961574 ] Daniel Dai commented on HDFS-9184: -- If we want to impose a limitation on the length, it is better to impose it on the client side explicitly rather than silently truncate on the datanode. This id will be used in other components for cross-referencing. If the HDFS audit log shows a truncated id, it would be hard to cross-reference it with the logs of other components. > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch, HDFS-9184.001.patch, > HDFS-9184.002.patch, HDFS-9184.003.patch, HDFS-9184.004.patch, > HDFS-9184.005.patch, HDFS-9184.006.patch, HDFS-9184.007.patch > > > For a given HDFS operation (e.g. delete file), it's very helpful to track > which upper level job issues it. The upper level callers may be specific > Oozie tasks, MR jobs, and hive queries. One scenario is that the namenode > (NN) is abused/spammed, the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. Currently the HDFS audit log tracks the users of the operation, which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. Even for a single top level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g. Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. 
The span is created in many places interconnected > like a tree structure which relies on offline analysis across RPC boundary. > For this use case, {{htrace}} has to be enabled at 100% sampling rate which > introduces significant overhead. Moreover, passing additional information > (via annotations) other than span id from root of the tree to leaf is a > significant additional work. > 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there > are some related discussion on this topic. The final patch implemented the > tracking id as a part of delegation token. This protects the tracking > information from being changed or impersonated. However, kerberos > authenticated connections or insecure connections don't have tokens. > [HADOOP-8779] proposes to use tokens in all the scenarios, but that might > mean changes to several upstream projects and is a major change in their > security implementation. > We propose another approach to address this problem. We also treat HDFS audit > log as a good place for after-the-fact root cause analysis. We propose to put > the caller id (e.g. Hive query id) in threadlocals. Specially, on client side > the threadlocal object is passed to NN as a part of RPC header (optional), > while on sever side NN retrieves it from header and put it to {{Handler}}'s > threadlocals. Finally in {{FSNamesystem}}, HDFS audit logger will record the > caller context for each operation. In this way, the existing code is not > affected. > It is still challenging to keep "lying" client from abusing the caller > context. Our proposal is to add a {{signature}} field to the caller context. > The client choose to provide its signature along with the caller id. The > operator may need to validate the signature at the time of offline analysis. > The NN is not responsible for validating the signature online. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Staffan Friberg updated HDFS-9260: -- Attachment: HDFS-7435.003.patch Add null check when creating iterator of storages > Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch > > > This patch changes the datastructures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC friendly handling of full > block reports. > Would like to hear peoples feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. > There seems to be some timing issues I hit when testing the patch, not sure > if it is a bug in the patch or something else (most likely the earlier)... > Tests that fail for me: >The issues seems to be that the blocks are not on any storage, so no > replication can occur causing the tests to fail in different ways. >TestDecomission.testDecommision >If I add a little sleep after the cleanup/delete things seem to work >TestDFSStripedOutputStreamWithFailure >A couple of tests fails in this class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
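To illustrate why the sorted structures in this patch help full block reports: when both the stored blocks and the reported blocks are ordered by block id, a report can be reconciled in a single linear pass with no per-block hash lookups and no temporary per-block objects. This is a toy sketch of that merge, not the patch's actual data structures; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class SortedReportMerge {
    /** With both arrays sorted ascending by block id, find stored blocks
     *  absent from the report in one O(n+m) pass. */
    static List<Long> missingFromReport(long[] stored, long[] reported) {
        List<Long> missing = new ArrayList<>();
        int i = 0, j = 0;
        while (i < stored.length) {
            if (j >= reported.length || stored[i] < reported[j]) {
                missing.add(stored[i]);  // stored block not in the report
                i++;
            } else if (stored[i] == reported[j]) {
                i++; j++;                // matched; advance both cursors
            } else {
                j++;                     // reported block unknown to the store
            }
        }
        return missing;
    }
}
```

The GC friendliness comes from the same property: a merge over sorted arrays allocates only for the differences, whereas a hash-based diff allocates per reported block.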
[jira] [Commented] (HDFS-9230) Report space overhead of unfinalized upgrade/rollingUpgrade
[ https://issues.apache.org/jira/browse/HDFS-9230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961561#comment-14961561 ] Andrew Wang commented on HDFS-9230: --- For hardlink upgrades, you could check the link count to see if a file in previous is still referenced in current. This is similar in cost to du. > Report space overhead of unfinalized upgrade/rollingUpgrade > --- > > Key: HDFS-9230 > URL: https://issues.apache.org/jira/browse/HDFS-9230 > Project: Hadoop HDFS > Issue Type: Improvement > Components: HDFS >Reporter: Xiaoyu Yao > > DataNodes do not delete block files during upgrades to allow rollback. This > is often confusing to administrators since they sometimes delete files before > finalize upgrade but don't see the DFS used space reduce. > Ideally, HDFS should report the un-finalized upgrade overhead along with its > message on NN UI "Upgrade in progress. Not yet finalized." Or, this can be > improve with better NN UI message and document that space won't be reclaimed > for deletion until upgrade is finalized. > For non-rolling upgrade, it is not easy to track this due to hard link. Say > NN initialized upgrade at T1, the block files on DNs that exist before T1 are > still under 'current' directory but is just a hard link to 'previous' > directory. When those files are deleted after T1 due to deletion, the block > file usage on DN won't get deleted until upgrade is finalized. > So we need to book keeping files created before T1 but deleted after T1 as > the un-finalized upgrade overhead here. > For rolling upgrade, it is relative easy to track space overhead as we are > not using hard link. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
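The link-count check Andrew suggests can be done from Java via the POSIX `unix:nlink` file attribute: a block file in 'previous' whose link count is greater than one is still hard-linked from 'current', so its space cannot yet be reclaimed. This is an illustrative sketch that assumes a POSIX filesystem; it is not the datanode's actual bookkeeping code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class UpgradeSpaceCheck {
    /** True if the block file under 'previous' is still referenced from
     *  'current', i.e. its hard-link count is > 1. POSIX filesystems only. */
    static boolean stillReferenced(Path blockFileInPrevious) throws IOException {
        Number nlink = (Number) Files.getAttribute(blockFileInPrevious, "unix:nlink");
        return nlink.intValue() > 1;
    }
}
```

Summing the sizes of files for which this returns false would approximate the un-finalized upgrade overhead, at roughly the cost of a du over the 'previous' tree, as noted above.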
[jira] [Commented] (HDFS-9241) HDFS clients can't construct HdfsConfiguration instances
[ https://issues.apache.org/jira/browse/HDFS-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961557#comment-14961557 ] Mingliang Liu commented on HDFS-9241: - {quote} Old applications can still depend on hadoop-hdfs and nothing will break. However, the application might need to change a couple lines of code if it only wants to depend on hadoop-hdfs-client. {quote} It makes sense to me. Do you think we need to make {{HdfsConfigurationLoader}} public so that code depending on {{hadoop-hdfs-client}} is able to force-load the default resources if needed? > HDFS clients can't construct HdfsConfiguration instances > > > Key: HDFS-9241 > URL: https://issues.apache.org/jira/browse/HDFS-9241 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Steve Loughran >Assignee: Mingliang Liu > Attachments: HDFS-9241.000.patch > > > the changes for the hdfs client classpath make instantiating > {{HdfsConfiguration}} from the client impossible; it only lives server side. > This breaks any app which creates one. > I know people will look at the {{@Private}} tag and say "don't do that then", > but it's worth considering precisely why I, at least, do this: it's the only > way to guarantee that the hdfs-default and hdfs-site resources get on the > classpath, including all the security settings. It's precisely the use case > which {{HdfsConfigurationLoader.init();}} offers internally to the hdfs code. > What am I meant to do now? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
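The loader pattern under discussion, a small class whose static initializer registers the default resources exactly once so that any caller can force the load via an idempotent `init()`, can be sketched in plain Java. The class name, method, and resource strings below are illustrative stand-ins, not the actual `HdfsConfigurationLoader` implementation.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public final class ConfigLoader {
    // Registered exactly once, when this class is first loaded. In the real
    // code this is where hdfs-default.xml / hdfs-site.xml would be added as
    // default resources of the Configuration class.
    private static final Set<String> DEFAULT_RESOURCES = new LinkedHashSet<>();
    static {
        DEFAULT_RESOURCES.add("hdfs-default.xml");
        DEFAULT_RESOURCES.add("hdfs-site.xml");
    }

    private ConfigLoader() {}

    /** Idempotent: merely touching this class runs the static block, so any
     *  client wanting the defaults loaded can safely call this. */
    public static void init() { /* registration already done in static block */ }

    public static Set<String> defaults() { return DEFAULT_RESOURCES; }
}
```

Making such a loader public would let hadoop-hdfs-client consumers get the same guarantee that constructing `HdfsConfiguration` used to give, without depending on the server-side artifact.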
[jira] [Commented] (HDFS-9236) Missing sanity check for block size during block recovery
[ https://issues.apache.org/jira/browse/HDFS-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961543#comment-14961543 ] Tony Wu commented on HDFS-9236: --- The checkstyle and pre-patch errors are not related to this patch. > Missing sanity check for block size during block recovery > - > > Key: HDFS-9236 > URL: https://issues.apache.org/jira/browse/HDFS-9236 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Affects Versions: 2.7.1 >Reporter: Tony Wu >Assignee: Tony Wu > Attachments: HDFS-9236.001.patch, HDFS-9236.002.patch, > HDFS-9236.003.patch > > > Ran into an issue while running tests against faulty data-node code. > Currently in DataNode.java: > {code:java} > /** Block synchronization */ > void syncBlock(RecoveringBlock rBlock, > List<BlockRecord> syncList) throws IOException { > … > // Calculate the best available replica state. > ReplicaState bestState = ReplicaState.RWR; > … > // Calculate list of nodes that will participate in the recovery > // and the new block size > List<BlockRecord> participatingList = new ArrayList<BlockRecord>(); > final ExtendedBlock newBlock = new ExtendedBlock(bpid, blockId, > -1, recoveryId); > switch(bestState) { > … > case RBW: > case RWR: > long minLength = Long.MAX_VALUE; > for(BlockRecord r : syncList) { > ReplicaState rState = r.rInfo.getOriginalReplicaState(); > if(rState == bestState) { > minLength = Math.min(minLength, r.rInfo.getNumBytes()); > participatingList.add(r); > } > } > newBlock.setNumBytes(minLength); > break; > … > } > … > nn.commitBlockSynchronization(block, > newBlock.getGenerationStamp(), newBlock.getNumBytes(), true, false, > datanodes, storages); > } > {code} > This code is called by the DN coordinating the block recovery. In the above > case, it is possible for none of the rState (reported by DNs with copies of > the replica being recovered) to match the bestState. This can either be > caused by faulty DN code or stale/modified/corrupted files on DN. 
When this > happens the DN will end up reporting a minLength of Long.MAX_VALUE. > Unfortunately there is no check on the NN for replica length. See > FSNamesystem.java: > {code:java} > void commitBlockSynchronization(ExtendedBlock oldBlock, > long newgenerationstamp, long newlength, > boolean closeFile, boolean deleteblock, DatanodeID[] newtargets, > String[] newtargetstorages) throws IOException { > … > if (deleteblock) { > Block blockToDel = ExtendedBlock.getLocalBlock(oldBlock); > boolean remove = iFile.removeLastBlock(blockToDel) != null; > if (remove) { > blockManager.removeBlock(storedBlock); > } > } else { > // update last block > if(!copyTruncate) { > storedBlock.setGenerationStamp(newgenerationstamp); > > // XXX block length is updated without any check <<< storedBlock.setNumBytes(newlength); > } > … > if (closeFile) { > LOG.info("commitBlockSynchronization(oldBlock=" + oldBlock > + ", file=" + src > + (copyTruncate ? ", newBlock=" + truncatedBlock > : ", newgenerationstamp=" + newgenerationstamp) > + ", newlength=" + newlength > + ", newtargets=" + Arrays.asList(newtargets) + ") successful"); > } else { > LOG.info("commitBlockSynchronization(" + oldBlock + ") successful"); > } > } > {code} > After this point the block length becomes Long.MAX_VALUE. Any subsequent > block report (even with the correct length) will cause the block to be marked as > corrupted. Since this block could be the last block of the file, if this > happens and the client goes away, the NN won't be able to recover the lease and > close the file because the last block is under-replicated. > I believe we need to have a sanity check for block size on both the DN and the NN to > prevent such a case from happening. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
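The missing sanity check described above could be as small as rejecting the sentinel before committing it: `Long.MAX_VALUE` is exactly what the coordinating DN reports when no replica matched the best state. The method name and placement below are hypothetical; this is a sketch of the check, not the patch.

```java
public class BlockRecoveryCheck {
    /** Reject an obviously bogus recovered block length before it is
     *  committed. Long.MAX_VALUE is the untouched initial value of minLength
     *  when no replica matched the best state during recovery. */
    static long checkedRecoveredLength(long minLength) {
        if (minLength < 0 || minLength == Long.MAX_VALUE) {
            throw new IllegalStateException(
                    "block recovery produced invalid length " + minLength
                    + "; no replica matched the best state");
        }
        return minLength;
    }
}
```

Applying the same guard on both sides, before `newBlock.setNumBytes(...)` on the DN and before `storedBlock.setNumBytes(...)` on the NN, matches the "both DN and NN" suggestion in the description.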
[jira] [Commented] (HDFS-9239) DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness
[ https://issues.apache.org/jira/browse/HDFS-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961540#comment-14961540 ] Jitendra Nath Pandey commented on HDFS-9239: bq. .. Well before node liveness is affected by inundation of IBRs and FBRs, the namenode performance will degrade to unacceptable level... Yes, indeed. But if datanodes are marked as dead in that situation, that completely destabilizes the system. At that point, even if we kill certain offending jobs, it takes a while before the NN can come back to an acceptable service level. This proposal should help prevent datanodes from being marked dead while the NN works its way past the overload scenario. I think ZKFC healthchecks should also be separated into a different queue or port so that they are not blocked by other messages in the NN's call queue. A failover because the NN is busy is not very helpful: the other NN also gets busy, and we end up seeing active-standby flip-flop between the namenodes. > DataNode Lifeline Protocol: an alternative protocol for reporting DataNode > liveness > --- > > Key: HDFS-9239 > URL: https://issues.apache.org/jira/browse/HDFS-9239 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: DataNode-Lifeline-Protocol.pdf > > > This issue proposes introduction of a new feature: the DataNode Lifeline > Protocol. This is an RPC protocol that is responsible for reporting liveness > and basic health information about a DataNode to a NameNode. Compared to the > existing heartbeat messages, it is lightweight and not prone to resource > contention problems that can harm accurate tracking of DataNode liveness > currently. The attached design document contains more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9253) Refactor tests of libhdfs into a directory
[ https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961530#comment-14961530 ] Hudson commented on HDFS-9253: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2444 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2444/]) HDFS-9253. Refactor tests of libhdfs into a directory. Contributed by (wheat9: rev 79b8d60d085ae196b05ff4ab511ff89f652e3c55) * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_trash.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_context_handle.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/include/hdfs/hdfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_htable.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/expect.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/native_mini_dfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs_test.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/exception.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_web.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/native_mini_dfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/pom.xml * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_htable.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/expect.c * 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_zerocopy.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_http_client.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test_native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/vecsum.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/vecsum.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/expect.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_zerocopy.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/test/fuse_workload.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_trash.c * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/test/test_fuse_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_stat_struct.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/hdfs_test.h * 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/expect.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_json_parser.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_connect.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_file_handle.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/native_mini
[jira] [Updated] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Staffan Friberg updated HDFS-9260: -- Description: This patch changes the datastructures used for BlockInfos and Replicas to keep them sorted. This allows faster and more GC friendly handling of full block reports. Would like to hear peoples feedback on this change and also some help investigating/understanding a few outstanding issues if we are interested in moving forward with this. There seems to be some timing issues I hit when testing the patch, not sure if it is a bug in the patch or something else (most likely the earlier)... Tests that fail for me: The issues seems to be that the blocks are not on any storage, so no replication can occur causing the tests to fail in different ways. TestDecomission.testDecommision If I add a little sleep after the cleanup/delete things seem to work TestDFSStripedOutputStreamWithFailure A couple of tests fails in this class. was: This patch changes the datastructures used for BlockInfos and Replicas to keep them sorted. This allows faster and more GC friendly handling of full block reports. Would like to hear peoples feedback on this change and also some help investigating/understanding a few outstanding issues if we are interested in moving forward with this. There seems to be some timing issues I hit when testing the patch, not sure if it is a bug in the patch or something else (most likely the earlier)... Tests that fail for me: The issues seems to be that the blocks is not on any storage, so no replication can occurs causing the tests to fail in different ways. TestDecomission.testDecommision If I add a little sleep after the cleanup/delete things seem to work TestDFSStripedOutputStreamWithFailure A couple of tests fails in this class. 
> Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch, HDFS-7435.002.patch > > > This patch changes the datastructures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC friendly handling of full > block reports. > Would like to hear peoples feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. > There seems to be some timing issues I hit when testing the patch, not sure > if it is a bug in the patch or something else (most likely the earlier)... > Tests that fail for me: >The issues seems to be that the blocks are not on any storage, so no > replication can occur causing the tests to fail in different ways. >TestDecomission.testDecommision >If I add a little sleep after the cleanup/delete things seem to work >TestDFSStripedOutputStreamWithFailure >A couple of tests fails in this class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961480#comment-14961480 ] Jitendra Nath Pandey commented on HDFS-9184: I will commit it to trunk if there are no objections. [~aw], I think the latest patch addresses your concern of change in audit log by keeping it disabled by default. If you are ok, I would like to commit this to branch-2 as well. > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch, HDFS-9184.001.patch, > HDFS-9184.002.patch, HDFS-9184.003.patch, HDFS-9184.004.patch, > HDFS-9184.005.patch, HDFS-9184.006.patch, HDFS-9184.007.patch > > > For a given HDFS operation (e.g. delete file), it's very helpful to track > which upper level job issues it. The upper level callers may be specific > Oozie tasks, MR jobs, and hive queries. One scenario is that the namenode > (NN) is abused/spammed, the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. Currently the HDFS audit log tracks the users of the the operation which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. Even for a single top level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g.Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. The span is created in many places interconnected > like a tree structure which relies on offline analysis across RPC boundary. 
> For this use case, {{htrace}} has to be enabled at 100% sampling rate which > introduces significant overhead. Moreover, passing additional information > (via annotations) other than span id from root of the tree to leaf is a > significant additional work. > 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there > are some related discussion on this topic. The final patch implemented the > tracking id as a part of delegation token. This protects the tracking > information from being changed or impersonated. However, kerberos > authenticated connections or insecure connections don't have tokens. > [HADOOP-8779] proposes to use tokens in all the scenarios, but that might > mean changes to several upstream projects and is a major change in their > security implementation. > We propose another approach to address this problem. We also treat HDFS audit > log as a good place for after-the-fact root cause analysis. We propose to put > the caller id (e.g. Hive query id) in threadlocals. Specially, on client side > the threadlocal object is passed to NN as a part of RPC header (optional), > while on sever side NN retrieves it from header and put it to {{Handler}}'s > threadlocals. Finally in {{FSNamesystem}}, HDFS audit logger will record the > caller context for each operation. In this way, the existing code is not > affected. > It is still challenging to keep "lying" client from abusing the caller > context. Our proposal is to add a {{signature}} field to the caller context. > The client choose to provide its signature along with the caller id. The > operator may need to validate the signature at the time of offline analysis. > The NN is not responsible for validating the signature online. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961475#comment-14961475 ] Jitendra Nath Pandey commented on HDFS-9184: +1 > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch, HDFS-9184.001.patch, > HDFS-9184.002.patch, HDFS-9184.003.patch, HDFS-9184.004.patch, > HDFS-9184.005.patch, HDFS-9184.006.patch, HDFS-9184.007.patch > > > For a given HDFS operation (e.g. delete file), it's very helpful to track > which upper level job issues it. The upper level callers may be specific > Oozie tasks, MR jobs, and hive queries. One scenario is that the namenode > (NN) is abused/spammed, the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. Currently the HDFS audit log tracks the users of the the operation which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. Even for a single top level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g.Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. The span is created in many places interconnected > like a tree structure which relies on offline analysis across RPC boundary. > For this use case, {{htrace}} has to be enabled at 100% sampling rate which > introduces significant overhead. Moreover, passing additional information > (via annotations) other than span id from root of the tree to leaf is a > significant additional work. > 3. 
In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there > is some related discussion on this topic. The final patch implemented the > tracking id as a part of the delegation token. This protects the tracking > information from being changed or impersonated. However, Kerberos-authenticated > connections or insecure connections don't have tokens. > [HADOOP-8779] proposes to use tokens in all scenarios, but that might > mean changes to several upstream projects and is a major change in their > security implementation. > We propose another approach to address this problem. We also treat the HDFS audit > log as a good place for after-the-fact root cause analysis. We propose to put > the caller id (e.g. the Hive query id) in threadlocals. Specifically, on the client side > the threadlocal object is passed to the NN as a part of the RPC header (optional), > while on the server side the NN retrieves it from the header and puts it into {{Handler}}'s > threadlocals. Finally, in {{FSNamesystem}}, the HDFS audit logger will record the > caller context for each operation. In this way, the existing code is not > affected. > It is still challenging to keep a "lying" client from abusing the caller > context. Our proposal is to add a {{signature}} field to the caller context. > The client may choose to provide its signature along with the caller id. The > operator may need to validate the signature at the time of offline analysis. > The NN is not responsible for validating the signature online. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
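The threadlocal mechanism proposed above can be sketched as follows. This is an illustrative sketch only: {{CallerContext}} here is a stand-in class (its name and methods are assumptions, not the actual HDFS API), and the RPC-header plumbing that carries the context from client to NameNode is omitted.

```java
// Illustrative sketch of the thread-local caller context described above.
// This CallerContext is a stand-in, not the real HDFS class; the RPC-header
// plumbing that carries it from the client to the NN is omitted.
class CallerContext {
    private static final ThreadLocal<CallerContext> CURRENT = new ThreadLocal<>();

    private final String id;        // application-dependent tracking id, e.g. a Hive query id
    private final byte[] signature; // optional; validated offline, not by the NN

    CallerContext(String id, byte[] signature) {
        this.id = id;
        this.signature = signature;
    }

    String getId() { return id; }
    byte[] getSignature() { return signature; }

    // The client sets the context before issuing RPCs; on the server side the
    // Handler thread would restore it before the audit logger reads it.
    static void setCurrent(CallerContext ctx) { CURRENT.set(ctx); }
    static CallerContext getCurrent() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }
}
```

Because the context lives in a threadlocal, existing call signatures are untouched, which matches the "existing code is not affected" point above.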
[jira] [Commented] (HDFS-4015) Safemode should count and report orphaned blocks
[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961458#comment-14961458 ] Arpit Agarwal commented on HDFS-4015: - bq. When the operator makes the name node leave safe mode manually, the -force option is not checked, even if there are orphaned blocks. Is this possible? If true, is it expected? [~liuml07], you are right. It's admittedly odd for an administrator to enter safe mode manually during startup, but we should guard against the sequence of steps you described. I need to think about this some more, but we should be able to remove the {{isInStartupSafeMode()}} check from the clause below, i.e. never exit safe mode without the force flag if there are bytes with future generation stamps. (The rollback exception is already handled elsewhere.) {code} private synchronized void leave(boolean force) { ... if (!force && isInStartupSafeMode() && (blockManager.getBytesInFuture() > 0)) { {code} > Safemode should count and report orphaned blocks > > > Key: HDFS-4015 > URL: https://issues.apache.org/jira/browse/HDFS-4015 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.0.0 >Reporter: Todd Lipcon >Assignee: Anu Engineer > Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch, > HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch > > > The safemode status currently reports the number of unique reported blocks > compared to the total number of blocks referenced by the namespace. However, > it does not report the inverse: blocks which are reported by datanodes but > not referenced by the namespace. > In the case that an admin accidentally starts up from an old image, this can > be confusing: safemode and fsck will show "corrupt files", which are the > files which actually have been deleted but got resurrected by restarting from > the old image.
This will convince them that they can safely force leave > safemode and remove these files -- after all, they know that those files > should really have been deleted. However, they're not aware that leaving > safemode will also unrecoverably delete a bunch of other block files which > have been orphaned due to the namespace rollback. > I'd like to consider reporting something like: "90 of expected 100 > blocks have been reported. Additionally, 1 blocks have been reported > which do not correspond to any file in the namespace. Forcing exit of > safemode will unrecoverably remove those data blocks" > Whether this statistic is also used for some kind of "inverse safe mode" is > the logical next step, but just reporting it as a warning seems easy enough > to accomplish and worth doing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
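The tightened guard suggested in the comment above can be sketched as a pure predicate. The names below ({{canLeave}}, {{bytesInFuture}}) are illustrative assumptions, not the actual FSNamesystem code; the point is simply that {{isInStartupSafeMode()}} is dropped from the clause, so the check also applies to manually entered safe mode.

```java
// Illustrative sketch of the tightened guard discussed above: refuse to
// leave safe mode without -force whenever bytes with future generation
// stamps (i.e. potentially orphaned blocks) exist. Names are assumptions.
class SafeModeGuard {
    /** @return true if leaving safe mode is allowed under this policy. */
    static boolean canLeave(boolean force, long bytesInFuture) {
        // Note: no isInStartupSafeMode() in the clause -- the guard holds
        // for manually entered safe mode as well as startup safe mode.
        if (!force && bytesInFuture > 0) {
            return false;
        }
        return true;
    }
}
```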
[jira] [Commented] (HDFS-9184) Logging HDFS operation's caller context into audit logs
[ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961413#comment-14961413 ] Mingliang Liu commented on HDFS-9184: - The failing tests seem unrelated and pass locally (Gentoo Linux and Mac). > Logging HDFS operation's caller context into audit logs > --- > > Key: HDFS-9184 > URL: https://issues.apache.org/jira/browse/HDFS-9184 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9184.000.patch, HDFS-9184.001.patch, > HDFS-9184.002.patch, HDFS-9184.003.patch, HDFS-9184.004.patch, > HDFS-9184.005.patch, HDFS-9184.006.patch, HDFS-9184.007.patch > > > For a given HDFS operation (e.g. delete file), it's very helpful to track > which upper-level job issued it. The upper-level callers may be specific > Oozie tasks, MR jobs, and Hive queries. One scenario is that when the namenode > (NN) is abused/spammed, the operator may want to know immediately which MR > job should be blamed so that she can kill it. To this end, the caller context > contains at least the application-dependent "tracking id". > There are several existing techniques that may be related to this problem. > 1. Currently the HDFS audit log tracks the user of the operation, which > is obviously not enough. It's common that the same user issues multiple jobs > at the same time. Even for a single top-level task, tracking back to a > specific caller in a chain of operations of the whole workflow (e.g. Oozie -> > Hive -> Yarn) is hard, if not impossible. > 2. HDFS integrated {{htrace}} support for providing tracing information > across multiple layers. The spans are created in many places, interconnected > like a tree structure, which relies on offline analysis across RPC boundaries. > For this use case, {{htrace}} has to be enabled at a 100% sampling rate, which > introduces significant overhead.
Moreover, passing additional information > (via annotations) other than the span id from the root of the tree to a leaf is > significant additional work. > 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there > is some related discussion on this topic. The final patch implemented the > tracking id as a part of the delegation token. This protects the tracking > information from being changed or impersonated. However, Kerberos-authenticated > connections or insecure connections don't have tokens. > [HADOOP-8779] proposes to use tokens in all scenarios, but that might > mean changes to several upstream projects and is a major change in their > security implementation. > We propose another approach to address this problem. We also treat the HDFS audit > log as a good place for after-the-fact root cause analysis. We propose to put > the caller id (e.g. the Hive query id) in threadlocals. Specifically, on the client side > the threadlocal object is passed to the NN as a part of the RPC header (optional), > while on the server side the NN retrieves it from the header and puts it into {{Handler}}'s > threadlocals. Finally, in {{FSNamesystem}}, the HDFS audit logger will record the > caller context for each operation. In this way, the existing code is not > affected. > It is still challenging to keep a "lying" client from abusing the caller > context. Our proposal is to add a {{signature}} field to the caller context. > The client may choose to provide its signature along with the caller id. The > operator may need to validate the signature at the time of offline analysis. > The NN is not responsible for validating the signature online. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9262) Reconfigure lazy writer interval on the fly
[ https://issues.apache.org/jira/browse/HDFS-9262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HDFS-9262: Affects Version/s: 2.7.0 > Reconfigure lazy writer interval on the fly > --- > > Key: HDFS-9262 > URL: https://issues.apache.org/jira/browse/HDFS-9262 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 2.7.0 >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > > This is to reconfigure > {code} > dfs.datanode.lazywriter.interval.sec > {code} > without restarting DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9262) Reconfigure lazy writer interval on the fly
[ https://issues.apache.org/jira/browse/HDFS-9262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HDFS-9262: Description: This is to reconfigure {code} dfs.datanode.lazywriter.interval.sec {code} without restarting DN. was: This is to reconfigure dfs.datanode.lazywriter.interval.sec without restarting DN. > Reconfigure lazy writer interval on the fly > --- > > Key: HDFS-9262 > URL: https://issues.apache.org/jira/browse/HDFS-9262 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 2.7.0 >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > > This is to reconfigure > {code} > dfs.datanode.lazywriter.interval.sec > {code} > without restarting DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9262) Reconfigure lazy writer interval on the fly
[ https://issues.apache.org/jira/browse/HDFS-9262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaobing Zhou updated HDFS-9262: Description: This is to reconfigure dfs.datanode.lazywriter.interval.sec without restarting DN. > Reconfigure lazy writer interval on the fly > --- > > Key: HDFS-9262 > URL: https://issues.apache.org/jira/browse/HDFS-9262 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Xiaobing Zhou >Assignee: Xiaobing Zhou > > This is to reconfigure > dfs.datanode.lazywriter.interval.sec > without restarting DN. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9262) Reconfigure lazy writer interval on the fly
Xiaobing Zhou created HDFS-9262: --- Summary: Reconfigure lazy writer interval on the fly Key: HDFS-9262 URL: https://issues.apache.org/jira/browse/HDFS-9262 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Xiaobing Zhou Assignee: Xiaobing Zhou -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9245) Fix findbugs warnings in hdfs-nfs/WriteCtx
[ https://issues.apache.org/jira/browse/HDFS-9245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961358#comment-14961358 ] Li Lu commented on HDFS-9245: - Yes, I think using volatile here is appropriate. Findbugs also turned green for the fix. > Fix findbugs warnings in hdfs-nfs/WriteCtx > -- > > Key: HDFS-9245 > URL: https://issues.apache.org/jira/browse/HDFS-9245 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Mingliang Liu >Assignee: Mingliang Liu > Attachments: HDFS-9245.000.patch > > > There are findbugs warnings as follows, brought by [HDFS-9092]. > It seems fine to ignore them by writing a filter rule in the > {{findbugsExcludeFile.xml}} file. > {code:xml} > instanceHash="592511935f7cb9e5f97ef4c99a6c46c2" instanceOccurrenceNum="0" > priority="2" abbrev="IS" type="IS2_INCONSISTENT_SYNC" cweid="366" > instanceOccurrenceMax="0"> > Inconsistent synchronization > > Inconsistent synchronization of > org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.offset; locked 75% of time > > > sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" > sourcefile="WriteCtx.java" end="314"> > At WriteCtx.java:[lines 40-314] > > In class org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx > > {code} > and > {code:xml} > instanceHash="4f3daa339eb819220f26c998369b02fe" instanceOccurrenceNum="0" > priority="2" abbrev="IS" type="IS2_INCONSISTENT_SYNC" cweid="366" > instanceOccurrenceMax="0"> > Inconsistent synchronization > > Inconsistent synchronization of > org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.originalCount; locked 50% of time > > > sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" > sourcefile="WriteCtx.java" end="314"> > At WriteCtx.java:[lines 40-314] > > In class org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx > > name="originalCount" primary="true" signature="I"> > sourcepath="org/apache/hadoop/hdfs/nfs/nfs3/WriteCtx.java" > sourcefile="WriteCtx.java"> > In WriteCtx.java > > > Field org.apache.hadoop.hdfs.nfs.nfs3.WriteCtx.originalCount > > >
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
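The volatile-based fix for the {{IS2_INCONSISTENT_SYNC}} warnings above follows a common pattern; a minimal stand-in sketch (not the real {{WriteCtx}} class) looks like this: fields that are written under a lock but read without one are marked volatile, so unsynchronized readers still observe the latest values.

```java
// Minimal stand-in illustrating the fix pattern discussed above, not the
// real WriteCtx: fields written under a lock but read without one are
// marked volatile so unsynchronized readers see up-to-date values.
class WriteCtxLike {
    private volatile long offset;
    private volatile int originalCount;

    synchronized void update(long newOffset, int newCount) {
        offset = newOffset;
        originalCount = newCount;
    }

    // Unsynchronized reads; volatile guarantees visibility per field.
    long getOffset() { return offset; }
    int getOriginalCount() { return originalCount; }
}
```

Note that volatile only guarantees visibility of each field individually, not atomicity of the pair, which is why Findbugs tolerates the partially synchronized access once the fields are volatile.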
[jira] [Commented] (HDFS-9253) Refactor tests of libhdfs into a directory
[ https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961338#comment-14961338 ] Hudson commented on HDFS-9253: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1279 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1279/]) HDFS-9253. Refactor tests of libhdfs into a directory. Contributed by (wheat9: rev 79b8d60d085ae196b05ff4ab511ff89f652e3c55) * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/vecsum.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/hdfs_test.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/test/test_fuse_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test_native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_connect.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/expect.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c * 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_web.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/expect.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_stat_struct.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/native_mini_dfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/expect.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_file_handle.h * hadoop-hdfs-project/hadoop-hdfs-native-client/pom.xml * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_trash.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs_test.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_htable.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/exception.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_trash.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/test/fuse_workload.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/native_mini_dfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_json_parser.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test_libhdfs_threaded.c * 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/include/hdfs/hdfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_htable.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/vecsum.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/expect.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_context_handle.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_http_client.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_zerocopy.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_native_mini_dfs.
[jira] [Commented] (HDFS-9257) improve error message for "Absolute path required" in INode.java to contain the rejected path
[ https://issues.apache.org/jira/browse/HDFS-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961317#comment-14961317 ] Hudson commented on HDFS-9257: -- ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #506 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/506/]) HDFS-9257. improve error message for "Absolute path required" in (harsh: rev 52ac73f344e822e41457582f82abb4f35eba9dec) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > improve error message for "Absolute path required" in INode.java to contain > the rejected path > - > > Key: HDFS-9257 > URL: https://issues.apache.org/jira/browse/HDFS-9257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.1 >Reporter: Marcell Szabo >Assignee: Marcell Szabo >Priority: Trivial > Fix For: 2.8.0 > > Attachments: HDFS-9257.000.patch > > > throw new AssertionError("Absolute path required"); > message should also show the path to help debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9205) Do not schedule corrupt blocks for replication
[ https://issues.apache.org/jira/browse/HDFS-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961316#comment-14961316 ] Hudson commented on HDFS-9205: -- ABORTED: Integrated in Hadoop-Hdfs-trunk-Java8 #506 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/506/]) Revert "Move HDFS-9205 to trunk in CHANGES.txt." (szetszwo: rev a554701fe4402ae30461e2ef165cb60970a202a0) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Do not schedule corrupt blocks for replication > -- > > Key: HDFS-9205 > URL: https://issues.apache.org/jira/browse/HDFS-9205 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Fix For: 2.8.0 > > Attachments: h9205_20151007.patch, h9205_20151007b.patch, > h9205_20151008.patch, h9205_20151009.patch, h9205_20151009b.patch, > h9205_20151013.patch, h9205_20151015.patch > > > Corrupt blocks by definition are blocks that cannot be read. As a consequence, > they cannot be replicated. In UnderReplicatedBlocks, there is a queue for > QUEUE_WITH_CORRUPT_BLOCKS and chooseUnderReplicatedBlocks may choose blocks > from it. Scheduling corrupt blocks for replication wastes > resources and potentially slows down replication for the higher-priority blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
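The change argued for above amounts to skipping the corrupt-blocks queue when picking replication work. A hedged sketch follows; the queue layout, index, and method name are illustrative assumptions, not the actual UnderReplicatedBlocks code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the idea above: iterate the priority queues in
// order but skip the corrupt-blocks queue entirely, since unreadable
// blocks cannot be replicated. Queue layout and names are assumptions.
class ReplicationChooser {
    static List<Long> chooseUnderReplicated(
            List<List<Long>> priorityQueues, int corruptQueueIdx, int max) {
        List<Long> chosen = new ArrayList<>();
        for (int p = 0; p < priorityQueues.size() && chosen.size() < max; p++) {
            if (p == corruptQueueIdx) {
                continue; // corrupt blocks cannot be read, so never schedule them
            }
            for (Long blockId : priorityQueues.get(p)) {
                if (chosen.size() >= max) break;
                chosen.add(blockId);
            }
        }
        return chosen;
    }
}
```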
[jira] [Updated] (HDFS-9259) Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario
[ https://issues.apache.org/jira/browse/HDFS-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9259: -- Assignee: Mingliang Liu Thanks [~liuml07]! I have assigned it to you. > Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario > -- > > Key: HDFS-9259 > URL: https://issues.apache.org/jira/browse/HDFS-9259 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma >Assignee: Mingliang Liu > > We recently found that cross-DC hdfs write could be really slow. Further > investigation identified that it is due to the SendBufferSize and ReceiveBufferSize > used for hdfs write. The test ran "hadoop -fs -copyFromLocal" of a 256MB file > across DC with different SendBufferSize and ReceiveBufferSize values. The > results showed that c is much faster than b, and b is faster than a. > a. SendBufferSize=128k, ReceiveBufferSize=128k (hdfs default setting). > b. SendBufferSize=128K, ReceiveBufferSize=not set (TCP auto tuning). > c. SendBufferSize=not set, ReceiveBufferSize=not set (TCP auto tuning for both) > HDFS-8829 has enabled scenario b. We would like to enable scenario c by > making SendBufferSize configurable at the DFSClient side. Cc: [~cmccabe] [~He > Tianyi] [~kanaka] [~vinayrpet]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9259) Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario
[ https://issues.apache.org/jira/browse/HDFS-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-9259: -- Description: We recently found that cross-DC hdfs write could be really slow. Further investigation identified that is due to SendBufferSize and ReceiveBufferSize used for hdfs write. The test ran "hadoop -fs -copyFromLocal" of a 256MB file across DC with different SendBufferSize and ReceiveBufferSize values. The results showed that c much faster than b; b is faster than a. a. SendBufferSize=128k, ReceiveBufferSize=128k (hdfs default setting). b. SendBufferSize=128K, ReceiveBufferSize=not set(TCP auto tuning). c. SendBufferSize=not set, ReceiveBufferSize=not set(TCP auto tuning for both) HDFS-8829 has enabled scenario b. We would like to enable scenario c by making SendBufferSize configurable at DFSClient side. Cc: [~cmccabe] [~He Tianyi] [~kanaka] [~vinayrpet]. was: We recently found that cross-DC hdfs write could be really slow. Further investigation identified that is due to SendBufferSize and ReceiveBufferSize used for hdfs write. The test is to do "hadoop -fs -copyFromLocal" of a 256MB file across DC with different SendBufferSize and ReceiveBufferSize values. The results showed that c much faster than b; b is faster than a. a. SendBufferSize=128k, ReceiveBufferSize=128k (hdfs default setting). b. SendBufferSize=128K, ReceiveBufferSize=not set(TCP auto tuning). c. SendBufferSize=not set, ReceiveBufferSize=not set(TCP auto tuning for both) HDFS-8829 has enabled scenario b. We would like to enable scenario c to make SendBufferSize configurable at DFSClient side. Cc: [~cmccabe] [~He Tianyi] [~kanaka] [~vinayrpet]. > Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario > -- > > Key: HDFS-9259 > URL: https://issues.apache.org/jira/browse/HDFS-9259 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma > > We recently found that cross-DC hdfs write could be really slow. 
Further > investigation identified that it is due to the SendBufferSize and ReceiveBufferSize > used for hdfs write. The test ran "hadoop -fs -copyFromLocal" of a 256MB file > across DC with different SendBufferSize and ReceiveBufferSize values. The > results showed that c is much faster than b, and b is faster than a. > a. SendBufferSize=128k, ReceiveBufferSize=128k (hdfs default setting). > b. SendBufferSize=128K, ReceiveBufferSize=not set (TCP auto tuning). > c. SendBufferSize=not set, ReceiveBufferSize=not set (TCP auto tuning for both) > HDFS-8829 has enabled scenario b. We would like to enable scenario c by > making SendBufferSize configurable at the DFSClient side. Cc: [~cmccabe] [~He > Tianyi] [~kanaka] [~vinayrpet]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
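Making SO_SNDBUF configurable on the client essentially means calling {{setSendBufferSize}} only when a positive value is configured, and otherwise leaving the socket alone so TCP auto-tuning applies (scenario c). A hedged sketch using the standard {{java.net.Socket}} API; the wrapper class and the "0 means unset" convention are illustrative assumptions, not the actual DFSClient code.

```java
import java.net.Socket;
import java.net.SocketException;

// Illustrative sketch of the proposal above: a configured value of 0 (or
// negative) means "leave SO_SNDBUF unset" so the OS auto-tunes the buffer;
// a positive value pins it explicitly. Not the actual DFSClient code.
class SendBufferConfig {
    static Socket configure(Socket s, int sendBufSize) throws SocketException {
        if (sendBufSize > 0) {
            s.setSendBufferSize(sendBufSize); // e.g. 128 * 1024 as in scenarios a/b
        }
        // sendBufSize <= 0: do nothing; TCP auto-tuning stays in effect (scenario c)
        return s;
    }
}
```

Note that the OS may round or scale the requested size, so callers should treat {{getSendBufferSize()}} as advisory rather than expecting the exact configured value back.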
[jira] [Commented] (HDFS-9253) Refactor tests of libhdfs into a directory
[ https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961301#comment-14961301 ] Hudson commented on HDFS-9253: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #558 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/558/]) HDFS-9253. Refactor tests of libhdfs into a directory. Contributed by (wheat9: rev 79b8d60d085ae196b05ff4ab511ff89f652e3c55) * hadoop-hdfs-project/hadoop-hdfs-native-client/pom.xml * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/exception.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/test/fuse_workload.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/native_mini_dfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/expect.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/expect.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs_test.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_context_handle.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_connect.c * 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/vecsum.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_file_handle.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/expect.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test_native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_http_client.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_zerocopy.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/vecsum.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_web.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/native_mini_dfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_trash.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_json_parser.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/include/hdfs/hdfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_trash.h * 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_zerocopy.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_stat_struct.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_htable.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_htable.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/hdfs_test.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/expect.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hd
[jira] [Commented] (HDFS-9253) Refactor tests of libhdfs into a directory
[ https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961279#comment-14961279 ] Hudson commented on HDFS-9253: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #543 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/543/]) HDFS-9253. Refactor tests of libhdfs into a directory. Contributed by (wheat9: rev 79b8d60d085ae196b05ff4ab511ff89f652e3c55) * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs_test.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test_native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/expect.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/vecsum.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_context_handle.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/expect.c * hadoop-hdfs-project/hadoop-hdfs-native-client/pom.xml * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_zerocopy.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_json_parser.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/hdfs_test.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_ops.c * 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_htable.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/exception.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/vecsum.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_stat_struct.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/expect.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/native_mini_dfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/expect.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/test/test_fuse_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_connect.c * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_http_client.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_htable.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_web.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_file_handle.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_read.c * 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/test/fuse_workload.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/native_mini_dfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_zerocopy.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_trash.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_trash.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/i
[jira] [Commented] (HDFS-9253) Refactor tests of libhdfs into a directory
[ https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961242#comment-14961242 ] Hudson commented on HDFS-9253: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2492 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2492/]) HDFS-9253. Refactor tests of libhdfs into a directory. Contributed by (wheat9: rev 79b8d60d085ae196b05ff4ab511ff89f652e3c55) * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/expect.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_zerocopy.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/expect.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_context_handle.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/expect.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_trash.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_htable.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_zerocopy.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_htable.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_connect.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs_test.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/CMakeLists.txt * 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_web.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/hdfs_test.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test_native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/test/fuse_workload.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/test/test_fuse_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_http_client.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_json_parser.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_trash.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/include/hdfs/hdfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_ops.c * 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/native_mini_dfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/exception.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_file_handle.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/vecsum.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/vecsum.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/expect.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-na
[jira] [Commented] (HDFS-9205) Do not schedule corrupt blocks for replication
[ https://issues.apache.org/jira/browse/HDFS-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961226#comment-14961226 ] Hudson commented on HDFS-9205: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2443 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2443/]) Revert "Move HDFS-9205 to trunk in CHANGES.txt." (szetszwo: rev a554701fe4402ae30461e2ef165cb60970a202a0) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Do not schedule corrupt blocks for replication > -- > > Key: HDFS-9205 > URL: https://issues.apache.org/jira/browse/HDFS-9205 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Fix For: 2.8.0 > > Attachments: h9205_20151007.patch, h9205_20151007b.patch, > h9205_20151008.patch, h9205_20151009.patch, h9205_20151009b.patch, > h9205_20151013.patch, h9205_20151015.patch > > > Corrupted blocks are, by definition, blocks that cannot be read. As a consequence, > they cannot be replicated. In UnderReplicatedBlocks, there is a queue for > QUEUE_WITH_CORRUPT_BLOCKS, and chooseUnderReplicatedBlocks may choose blocks > from it. Scheduling corrupted blocks for replication wastes > resources and potentially slows down replication of the higher-priority blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
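The queue-skipping idea behind HDFS-9205 can be sketched as follows. This is a simplified toy model, not the Hadoop source: the class and constant names mirror those mentioned in the issue, but the structure is an assumption. The point is that replication scheduling iterates only the priority levels strictly below the corrupt-block level.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of UnderReplicatedBlocks: several priority queues, where the
// last level holds corrupt blocks that replication should never draw from.
class UnderReplicatedQueues {
    static final int QUEUE_WITH_CORRUPT_BLOCKS = 4;  // lowest priority level
    final List<Deque<Long>> priorityQueues = new ArrayList<>();

    UnderReplicatedQueues() {
        for (int i = 0; i <= QUEUE_WITH_CORRUPT_BLOCKS; i++) {
            priorityQueues.add(new ArrayDeque<>());
        }
    }

    void add(long blockId, int priority) {
        priorityQueues.get(priority).add(blockId);
    }

    // The fix sketched here: iterate priorities strictly below the corrupt
    // level, so corrupt blocks are never scheduled for replication.
    List<Long> chooseUnderReplicatedBlocks(int blocksToProcess) {
        List<Long> chosen = new ArrayList<>();
        for (int p = 0; p < QUEUE_WITH_CORRUPT_BLOCKS && chosen.size() < blocksToProcess; p++) {
            Deque<Long> q = priorityQueues.get(p);
            while (!q.isEmpty() && chosen.size() < blocksToProcess) {
                chosen.add(q.poll());
            }
        }
        return chosen;
    }
}
```

A block placed at the corrupt level stays queued (so fsck can still report it) but is never handed to the replication monitor.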
[jira] [Commented] (HDFS-9257) improve error message for "Absolute path required" in INode.java to contain the rejected path
[ https://issues.apache.org/jira/browse/HDFS-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961227#comment-14961227 ] Hudson commented on HDFS-9257: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2443 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2443/]) HDFS-9257. improve error message for "Absolute path required" in (harsh: rev 52ac73f344e822e41457582f82abb4f35eba9dec) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > improve error message for "Absolute path required" in INode.java to contain > the rejected path > - > > Key: HDFS-9257 > URL: https://issues.apache.org/jira/browse/HDFS-9257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.1 >Reporter: Marcell Szabo >Assignee: Marcell Szabo >Priority: Trivial > Fix For: 2.8.0 > > Attachments: HDFS-9257.000.patch > > > throw new AssertionError("Absolute path required"); > message should also show the path to help debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
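The improvement committed in HDFS-9257 amounts to including the rejected path in the assertion message. A hedged sketch; the surrounding method shape is an assumption, only the error-message change mirrors the intent of the patch:

```java
// Hypothetical helper illustrating the HDFS-9257 message change: include the
// offending path so the failure is actionable without a debugger.
class AbsolutePathCheck {
    static void requireAbsolute(String path) {
        if (path == null || !path.startsWith("/")) {
            // Before: "Absolute path required" -- no hint what was passed in.
            throw new AssertionError("Absolute path required, but got \"" + path + "\"");
        }
    }
}
```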
[jira] [Commented] (HDFS-9249) NPE thrown if an IOException is thrown in NameNode.
[ https://issues.apache.org/jira/browse/HDFS-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961221#comment-14961221 ] Hadoop QA commented on HDFS-9249: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 16s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 52s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 37s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 46s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 46s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 35s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 53m 4s | Tests failed in hadoop-hdfs. 
| | | | 104m 37s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.blockmanagement.TestNodeCount | | | hadoop.hdfs.server.namenode.TestBackupNode | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12767098/HDFS-9249.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 52ac73f | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13034/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13034/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13034/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13034/console | This message was automatically generated. > NPE thrown if an IOException is thrown in NameNode. > - > > Key: HDFS-9249 > URL: https://issues.apache.org/jira/browse/HDFS-9249 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.1 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Attachments: HDFS-9249.001.patch, HDFS-9249.002.patch > > > This issue was found when running test case > TestBackupNode.testCheckpointNode, but upon closer look, the problem is not > due to the test case. > Looks like an IOException was thrown in > try { > initializeGenericKeys(conf, nsId, namenodeId); > initialize(conf); > try { > haContext.writeLock(); > state.prepareToEnterState(haContext); > state.enterState(haContext); > } finally { > haContext.writeUnlock(); > } > causing the namenode to stop, but the namesystem was not yet properly > instantiated, causing NPE. 
> I tried to reproduce locally, but to no avail. > Because I could not reproduce the bug, and the log does not indicate what > caused the IOException, I suggest making this a supportability JIRA to log the > exception for future improvement. > Stacktrace > java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906) > at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210) > at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:827) > at > org.apache.hadoop.hdfs.server.namenode.BackupNode.<init>(BackupNode.java:89) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474) > at > org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102) > at > org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298) > at > org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130) > The last few lines of log: > 2015-10-14 19:45:07,807 INFO namenode.NameNode > (NameNode.java:createNameNode(1422)) - createNameNode
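The supportability fix the reporter suggests for HDFS-9249 can be sketched in miniature. All names below are stand-ins, not the real NameNode code: the two ingredients are logging the exception that aborts startup (so the root cause is not lost) and making shutdown tolerate a partially initialized node instead of dereferencing a null namesystem.

```java
// Sketch of the suggested fix: log the startup failure, and null-check the
// namesystem in stop() since initialize() may have failed before creating it.
class NameNodeInit {
    Object namesystem;          // stands in for FSNamesystem; null until init succeeds
    static String lastLogged;   // stands in for a real logger

    void start() {
        try {
            initialize();
        } catch (Exception e) {
            // Log before stopping, so the cause of the IOException survives.
            lastLogged = "Failed to start namenode: " + e;
            stop();
        }
    }

    void initialize() throws Exception {
        throw new Exception("simulated IOException during initialize()");
    }

    void stop() {
        // Guard against a half-built node instead of NPE'ing in getFSImage().
        if (namesystem != null) {
            // ... shut down the namesystem, close storage, etc.
        }
    }
}
```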
[jira] [Commented] (HDFS-8766) Implement a libhdfs(3) compatible API
[ https://issues.apache.org/jira/browse/HDFS-8766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961217#comment-14961217 ] Haohui Mai commented on HDFS-8766: -- Thanks for updating the patch. Some comments: 1. Remove {{hdfs.h}}, {{c_api_test.cc}} in this patch and reuse the existing code in the repo, as HDFS-9207 and HDFS-9253 have landed. 2. Remove {{hdfs_macros.h}} and use {{unique_ptr}}. 3. Separate the bug fixes in {{hadoop-hdfs-project/hadoop-hdfs-client/src/main/native/libhdfspp/lib/reader/remote_block_reader_impl.h}} into another jira. 4. Rename {{HdfsInternal::pread}} to {{HdfsInternal::Pread}} to follow the Google C++ Style Guide. 5. Separate the implementation of the C API and the definition of {{HdfsInternal}} into different files. Haven't closely looked into the logic yet -- the patch should be much smaller and cleaner after the above changes. Will do so in the next round. > Implement a libhdfs(3) compatible API > - > > Key: HDFS-8766 > URL: https://issues.apache.org/jira/browse/HDFS-8766 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer > Attachments: HDFS-8766.HDFS-8707.000.patch, > HDFS-8766.HDFS-8707.001.patch, HDFS-8766.HDFS-8707.002.patch, > HDFS-8766.HDFS-8707.003.patch, HDFS-8766.HDFS-8707.004.patch, > HDFS-8766.HDFS-8707.005.patch, HDFS-8766.HDFS-8707.006.patch > > > Add a synchronous API that is compatible with the hdfs.h header used in > libhdfs and libhdfs3. This will make it possible for projects using > libhdfs/libhdfs3 to relink against libhdfspp with minimal changes. > This also provides a pure C interface that can be linked against projects > that aren't built in C++11 mode for various reasons but use the same > compiler. It also allows many other programming languages to access > libhdfspp through builtin FFI interfaces. 
> The libhdfs API is very similar to the posix file API which makes it easier > for programs built using posix filesystem calls to be modified to access HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9255) Consolidate block recovery related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961210#comment-14961210 ] Hadoop QA commented on HDFS-9255: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 18m 13s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 54s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 33s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 25s | The applied patch generated 8 new checkstyle issues (total was 512, now 493). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 29s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 34s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 13s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 50m 53s | Tests failed in hadoop-hdfs. 
| | | | 97m 15s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.TestReplaceDatanodeOnFailure | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12767097/HDFS-9255.03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 52ac73f | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13033/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/13033/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/13033/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13033/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13033/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13033/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13033/console | This message was automatically generated. > Consolidate block recovery related implementation into a single class > - > > Key: HDFS-9255 > URL: https://issues.apache.org/jira/browse/HDFS-9255 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Walter Su >Assignee: Walter Su >Priority: Minor > Attachments: HDFS-9255.01.patch, HDFS-9255.02.patch, > HDFS-9255.03.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9254) HDFS Secure Mode Documentation updates
[ https://issues.apache.org/jira/browse/HDFS-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961186#comment-14961186 ] Arpit Agarwal commented on HDFS-9254: - The test failures do seem to be caused by my patch, oddly. I'll take a look. bq. in the patch, you say that `d...@realm.tld` is allowed. I recall seeing some JIRAs where people were saying you get a stack trace unless you have the /HOST value of some kind or other Let me verify that in a test cluster, thanks for looking at the patch. > HDFS Secure Mode Documentation updates > -- > > Key: HDFS-9254 > URL: https://issues.apache.org/jira/browse/HDFS-9254 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.7.1 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-9254.01.patch > > > Some Kerberos configuration parameters are not documented well enough. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9253) Refactor tests of libhdfs into a directory
[ https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-9253: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~wheat9] for the contribution. > Refactor tests of libhdfs into a directory > -- > > Key: HDFS-9253 > URL: https://issues.apache.org/jira/browse/HDFS-9253 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.8.0 > > Attachments: HDFS-9253.000.patch, HDFS-9253.001.patch, > HDFS-9253.002.patch > > > This jira proposes to refactor the current tests in libhdfs into a separate > directory. The refactor opens up the opportunity to reuse tests in libhdfs, > libwebhdfs and libhdfspp in HDFS-8707 and to also allow cross validation of > different implementation of the libhdfs API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-9253) Refactor tests of libhdfs into a directory
[ https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961182#comment-14961182 ] Haohui Mai edited comment on HDFS-9253 at 10/16/15 6:42 PM: I've committed the patch to trunk and branch-2. Thanks Jing for the reviews. was (Author: wheat9): I've committed the patch to trunk and branch-2. Thanks [~wheat9] for the contribution. > Refactor tests of libhdfs into a directory > -- > > Key: HDFS-9253 > URL: https://issues.apache.org/jira/browse/HDFS-9253 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.8.0 > > Attachments: HDFS-9253.000.patch, HDFS-9253.001.patch, > HDFS-9253.002.patch > > > This jira proposes to refactor the current tests in libhdfs into a separate > directory. The refactor opens up the opportunity to reuse tests in libhdfs, > libwebhdfs and libhdfspp in HDFS-8707 and to also allow cross validation of > different implementation of the libhdfs API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Staffan Friberg updated HDFS-9260: -- Attachment: HDFS-7435.002.patch Merged with latest head > Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch, HDFS-7435.002.patch > > > This patch changes the data structures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC-friendly handling of full > block reports. > Would like to hear people's feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. > There seem to be some timing issues I hit when testing the patch; not sure > if it is a bug in the patch or something else (most likely the former)... > Tests that fail for me: >The issue seems to be that the blocks are not on any storage, so no > replication can occur, causing the tests to fail in different ways. >TestDecommission.testDecommission >If I add a little sleep after the cleanup/delete things seem to work >TestDFSStripedOutputStreamWithFailure >A couple of tests fail in this class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
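The core idea behind keeping BlockInfos and replicas sorted can be illustrated with a linear merge. This is a simplified sketch of the concept, not the HDFS-9260 patch: when both the stored replica list and an incoming full block report are sorted by block ID, they can be reconciled in one pass with no per-block hash lookups and few temporary allocations, which is what makes processing faster and gentler on the GC.

```java
import java.util.ArrayList;
import java.util.List;

// Reconcile a sorted stored-replica list against a sorted full block report
// in a single linear merge pass.
class SortedReportMerge {
    final List<Long> toAdd = new ArrayList<>();     // in report, not stored
    final List<Long> toRemove = new ArrayList<>();  // stored, not in report

    SortedReportMerge(long[] stored, long[] report) {
        int i = 0, j = 0;
        while (i < stored.length && j < report.length) {
            if (stored[i] == report[j]) {
                i++; j++;                       // block present on both sides
            } else if (stored[i] < report[j]) {
                toRemove.add(stored[i++]);      // no longer on the DataNode
            } else {
                toAdd.add(report[j++]);         // newly reported block
            }
        }
        while (i < stored.length) toRemove.add(stored[i++]);
        while (j < report.length) toAdd.add(report[j++]);
    }
}
```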
[jira] [Commented] (HDFS-9253) Refactor tests of libhdfs into a directory
[ https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961174#comment-14961174 ] Hudson commented on HDFS-9253: -- FAILURE: Integrated in Hadoop-trunk-Commit #8652 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8652/]) HDFS-9253. Refactor tests of libhdfs into a directory. Contributed by (wheat9: rev 79b8d60d085ae196b05ff4ab511ff89f652e3c55) * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/expect.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/test/fuse_workload.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/hdfs_test.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_trash.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test_native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/native_mini_dfs.h * 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_trash.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_http_client.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_web.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/expect.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_zerocopy.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/native_mini_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/hdfs_json_parser.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/vecsum.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_connect.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs_test.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/exception.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/contrib/libwebhdfs/src/test_libwebhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_stat_struct.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_zerocopy.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_file_handle.h * 
hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/test/test_fuse_dfs.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_htable.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_read.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_libhdfs_write.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/test/test_htable.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/expect.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/fuse-dfs/fuse_context_handle.h * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/expect.c * hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/native_mini_dfs.h * hadoop-hdfs-project/hadoop-hdfs-native-client/pom.xml * hadoop-hdfs-project/hadoop-hdfs-native-client/sr
[jira] [Commented] (HDFS-9208) Disabling atime may fail clients like distCp
[ https://issues.apache.org/jira/browse/HDFS-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961132#comment-14961132 ] Hadoop QA commented on HDFS-9208: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 20m 45s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 59s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 40s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 39s | The applied patch generated 1 new checkstyle issues (total was 14, now 15). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 42s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 53s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 41s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 69m 14s | Tests failed in hadoop-hdfs. 
| | | | 121m 37s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestReplaceDatanodeOnFailure | | | hadoop.hdfs.TestRecoverStripedFile | | | hadoop.hdfs.TestCrcCorruption | | | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12767090/HDFS-9208.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 52ac73f | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13032/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/13032/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13032/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13032/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13032/console | This message was automatically generated. > Disabling atime may fail clients like distCp > > > Key: HDFS-9208 > URL: https://issues.apache.org/jira/browse/HDFS-9208 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HDFS-9208.patch > > > When atime is disabled, {{setTimes()}} throws an exception if the passed-in > atime is not -1. But since atime is not -1, distCp fails when it tries to > set the mtime and atime. > There are several options: > 1) make distCp check for 0 atime and call {{setTimes()}} with -1. I am not > very enthusiastic about it. > 2) make NN also accept 0 atime in addition to -1, when the atime support is > disabled. > 3) support setting mtime & atime regardless of the atime support. 
The main > reason why atime is disabled is to avoid edit logging/syncing during > {{getBlockLocations()}} read calls. Explicit setting can be allowed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
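A minimal plain-Java sketch of option 1 above: a client such as distCp could map a zero atime (what it sees when atime tracking is disabled) to -1, which {{setTimes()}} accepts as "leave atime unchanged". This is not the actual DistCp code; the class and method names are illustrative only.

```java
// Sketch of option 1: normalize a disabled-atime value of 0 to -1 before
// calling setTimes(). All names here are hypothetical, not HDFS source.
public class AtimeNormalizer {
    /** -1 tells setTimes() to leave the access time unchanged. */
    static long normalizeAtime(long atime) {
        return atime == 0 ? -1 : atime;
    }

    public static void main(String[] args) {
        // atime of 0 (tracking disabled) becomes -1; real atimes pass through
        System.out.println(normalizeAtime(0));           // -1
        System.out.println(normalizeAtime(1444955107L)); // 1444955107
    }
}
```

Options 2 and 3 would instead change the NameNode-side check, which avoids touching every client.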
[jira] [Created] (HDFS-9261) Erasure Coding: Skip encoding the data cells if all the parity data streamers are failed for the current block group
Rakesh R created HDFS-9261: -- Summary: Erasure Coding: Skip encoding the data cells if all the parity data streamers are failed for the current block group Key: HDFS-9261 URL: https://issues.apache.org/jira/browse/HDFS-9261 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Priority: Minor {{DFSStripedOutputStream}} will continue writing with the minimum number (dataBlockNum) of live datanodes. It won't replace the failed datanodes immediately for the current block group. Consider a case where all the parity data streamers have failed; it is then unnecessary to encode the data block cells and generate the parity data. This is a corner case where the {{writeParityCells()}} step can be skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
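The proposed guard can be sketched in plain Java: before encoding parity cells, check whether any parity streamer is still healthy, and skip the encode entirely when none is. This is an illustrative sketch, not the {{DFSStripedOutputStream}} implementation; the names are hypothetical.

```java
// Hypothetical guard for the corner case described above: if every parity
// streamer has failed, encoding parity cells is wasted work and can be skipped.
public class ParityGuard {
    /** Returns true if at least one parity streamer is still alive. */
    static boolean shouldWriteParityCells(boolean[] parityStreamerHealthy) {
        for (boolean healthy : parityStreamerHealthy) {
            if (healthy) {
                return true;
            }
        }
        return false; // all parity streamers failed: skip writeParityCells()
    }

    public static void main(String[] args) {
        // e.g. RS(6,3): three parity streamers, all failed vs. one alive
        System.out.println(shouldWriteParityCells(new boolean[]{false, false, false})); // false
        System.out.println(shouldWriteParityCells(new boolean[]{false, true, false}));  // true
    }
}
```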
[jira] [Commented] (HDFS-9129) Move the safemode block count into BlockManager
[ https://issues.apache.org/jira/browse/HDFS-9129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961110#comment-14961110 ] Mingliang Liu commented on HDFS-9129: - The failing tests can pass locally. Addressing the findbugs and checkstyle warnings. > Move the safemode block count into BlockManager > --- > > Key: HDFS-9129 > URL: https://issues.apache.org/jira/browse/HDFS-9129 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Haohui Mai >Assignee: Mingliang Liu > Attachments: HDFS-9129.000.patch, HDFS-9129.001.patch, > HDFS-9129.002.patch, HDFS-9129.003.patch, HDFS-9129.004.patch, > HDFS-9129.005.patch > > > The {{SafeMode}} needs to track whether there are enough blocks so that the > NN can get out of the safemode. These fields can moved to the > {{BlockManager}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9098) Erasure coding: emulate race conditions among striped streamers in write pipeline
[ https://issues.apache.org/jira/browse/HDFS-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-9098: Affects Version/s: 3.0.0 > Erasure coding: emulate race conditions among striped streamers in write > pipeline > - > > Key: HDFS-9098 > URL: https://issues.apache.org/jira/browse/HDFS-9098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Zhe Zhang >Assignee: Zhe Zhang > > Apparently the interleaving of events among {{StripedDataStreamer}}s is very > tricky to handle. [~walter.k.su] and [~jingzhao] have discussed several race > conditions under HDFS-9040. > Let's use FaultInjector to emulate different combinations of interleaved > events. > In particular, we should consider injecting delays in the following places: > # {{Streamer#endBlock}} > # {{Streamer#locateFollowingBlock}} > # {{Streamer#updateBlockForPipeline}} > # {{Streamer#updatePipeline}} > # {{OutputStream#writeChunk}} > # {{OutputStream#close}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9098) Erasure coding: emulate race conditions among striped streamers in write pipeline
[ https://issues.apache.org/jira/browse/HDFS-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-9098: Component/s: erasure-coding > Erasure coding: emulate race conditions among striped streamers in write > pipeline > - > > Key: HDFS-9098 > URL: https://issues.apache.org/jira/browse/HDFS-9098 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding >Affects Versions: 3.0.0 >Reporter: Zhe Zhang >Assignee: Zhe Zhang > > Apparently the interleaving of events among {{StripedDataStreamer}}s is very > tricky to handle. [~walter.k.su] and [~jingzhao] have discussed several race > conditions under HDFS-9040. > Let's use FaultInjector to emulate different combinations of interleaved > events. > In particular, we should consider injecting delays in the following places: > # {{Streamer#endBlock}} > # {{Streamer#locateFollowingBlock}} > # {{Streamer#updateBlockForPipeline}} > # {{Streamer#updatePipeline}} > # {{OutputStream#writeChunk}} > # {{OutputStream#close}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9252) Change TestFileTruncate to use FsDatasetTestUtils to get block file size and genstamp.
[ https://issues.apache.org/jira/browse/HDFS-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-9252: Summary: Change TestFileTruncate to use FsDatasetTestUtils to get block file size and genstamp. (was: Change TestFileTruncate to FsDatasetTestUtils to get block file size and genstamp.) > Change TestFileTruncate to use FsDatasetTestUtils to get block file size and > genstamp. > -- > > Key: HDFS-9252 > URL: https://issues.apache.org/jira/browse/HDFS-9252 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.1 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-9252.00.patch > > > {{TestFileTruncate}} verifies block size and genstamp by directly accessing > the local filesystem, e.g.: > {code} > assertTrue(cluster.getBlockMetadataFile(dn0, >newBlock.getBlock()).getName().endsWith( >newBlock.getBlock().getGenerationStamp() + ".meta")); > {code} > Let's abstract the fsdataset-specific logic behind FsDatasetTestUtils. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9260: Assignee: Staffan Friberg > Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch > > > This patch changes the data structures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC-friendly handling of full > block reports. > Would like to hear people's feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. > There seem to be some timing issues I hit when testing the patch; not sure > if it is a bug in the patch or something else (most likely the former)... > Tests that fail for me: >The issue seems to be that the blocks are not on any storage, so no > replication can occur, causing the tests to fail in different ways. >TestDecommission.testDecommission >If I add a little sleep after the cleanup/delete, things seem to work >TestDFSStripedOutputStreamWithFailure >A couple of tests fail in this class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9259) Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario
[ https://issues.apache.org/jira/browse/HDFS-9259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961091#comment-14961091 ] Mingliang Liu commented on HDFS-9259: - Hi [~mingma], can I work on this, if we reach consensus on the issue itself? > Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario > -- > > Key: HDFS-9259 > URL: https://issues.apache.org/jira/browse/HDFS-9259 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Ming Ma > > We recently found that cross-DC hdfs write could be really slow. Further > investigation identified that this is due to the SendBufferSize and ReceiveBufferSize > used for hdfs write. The test is to do "hadoop fs -copyFromLocal" of a 256MB > file across DC with different SendBufferSize and ReceiveBufferSize values. > The results showed that c is much faster than b, and b is faster than a. > a. SendBufferSize=128k, ReceiveBufferSize=128k (hdfs default setting). > b. SendBufferSize=128K, ReceiveBufferSize=not set (TCP auto tuning). > c. SendBufferSize=not set, ReceiveBufferSize=not set (TCP auto tuning for both) > HDFS-8829 has enabled scenario b. We would like to enable scenario c to make > SendBufferSize configurable at DFSClient side. Cc: [~cmccabe] [~He Tianyi] > [~kanaka] [~vinayrpet]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
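The tradeoff behind scenarios a–c can be shown with plain `java.net` sockets: calling `setSendBufferSize()` pins SO_SNDBUF to a fixed value and (on Linux) disables the kernel's send-buffer auto-tuning for that socket, which is why leaving it unset can win on high-latency cross-DC links. The 128 KB value mirrors the HDFS default mentioned above; this is a standalone sketch, not DFSClient code.

```java
import java.net.Socket;

// Demonstrates the difference between a pinned SO_SNDBUF (scenario a/b) and
// leaving the buffer unset so the kernel auto-tunes it (scenario c).
public class SendBufferDemo {
    public static void main(String[] args) throws Exception {
        try (Socket pinned = new Socket(); Socket autoTuned = new Socket()) {
            // Fixed 128 KB send buffer: kernel auto-tuning is off for this socket.
            pinned.setSendBufferSize(128 * 1024);
            // autoTuned: SO_SNDBUF never set, so the OS grows it as needed.
            // Note the kernel may report a larger value than requested.
            System.out.println("pinned SO_SNDBUF:     " + pinned.getSendBufferSize());
            System.out.println("auto-tuned default:   " + autoTuned.getSendBufferSize());
        }
    }
}
```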
[jira] [Commented] (HDFS-9257) improve error message for "Absolute path required" in INode.java to contain the rejected path
[ https://issues.apache.org/jira/browse/HDFS-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961087#comment-14961087 ] Hudson commented on HDFS-9257: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2491 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2491/]) HDFS-9257. improve error message for "Absolute path required" in (harsh: rev 52ac73f344e822e41457582f82abb4f35eba9dec) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java > improve error message for "Absolute path required" in INode.java to contain > the rejected path > - > > Key: HDFS-9257 > URL: https://issues.apache.org/jira/browse/HDFS-9257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.1 >Reporter: Marcell Szabo >Assignee: Marcell Szabo >Priority: Trivial > Fix For: 2.8.0 > > Attachments: HDFS-9257.000.patch > > > throw new AssertionError("Absolute path required"); > message should also show the path to help debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
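The fix this issue asks for amounts to appending the rejected path to the assertion message. A plain-Java sketch (the method name is illustrative, not the actual INode.java signature):

```java
// Sketch of the improved error message: include the offending path so the
// failure is debuggable from the stack trace alone.
public class PathCheck {
    static void checkAbsolutePath(String path) {
        if (path == null || !path.startsWith("/")) {
            throw new AssertionError("Absolute path required, but got \"" + path + "\"");
        }
    }

    public static void main(String[] args) {
        checkAbsolutePath("/user/hadoop"); // absolute path: passes silently
        try {
            checkAbsolutePath("relative/path");
        } catch (AssertionError e) {
            System.out.println(e.getMessage()); // message now names the bad path
        }
    }
}
```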
[jira] [Commented] (HDFS-9205) Do not schedule corrupt blocks for replication
[ https://issues.apache.org/jira/browse/HDFS-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961086#comment-14961086 ] Hudson commented on HDFS-9205: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2491 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2491/]) Revert "Move HDFS-9205 to trunk in CHANGES.txt." (szetszwo: rev a554701fe4402ae30461e2ef165cb60970a202a0) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Do not schedule corrupt blocks for replication > -- > > Key: HDFS-9205 > URL: https://issues.apache.org/jira/browse/HDFS-9205 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Fix For: 2.8.0 > > Attachments: h9205_20151007.patch, h9205_20151007b.patch, > h9205_20151008.patch, h9205_20151009.patch, h9205_20151009b.patch, > h9205_20151013.patch, h9205_20151015.patch > > > Corrupted blocks by definition are blocks that cannot be read. As a consequence, > they cannot be replicated. In UnderReplicatedBlocks, there is a queue for > QUEUE_WITH_CORRUPT_BLOCKS and chooseUnderReplicatedBlocks may choose blocks > from it. It seems that scheduling corrupted blocks for replication is wasting > resources and potentially slowing down replication for the higher-priority blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9253) Refactor tests of libhdfs into a directory
[ https://issues.apache.org/jira/browse/HDFS-9253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961075#comment-14961075 ] Jing Zhao commented on HDFS-9253: - +1 > Refactor tests of libhdfs into a directory > -- > > Key: HDFS-9253 > URL: https://issues.apache.org/jira/browse/HDFS-9253 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Attachments: HDFS-9253.000.patch, HDFS-9253.001.patch, > HDFS-9253.002.patch > > > This jira proposes to refactor the current tests in libhdfs into a separate > directory. The refactor opens up the opportunity to reuse tests in libhdfs, > libwebhdfs and libhdfspp in HDFS-8707 and to also allow cross validation of > different implementation of the libhdfs API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9257) improve error message for "Absolute path required" in INode.java to contain the rejected path
[ https://issues.apache.org/jira/browse/HDFS-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961049#comment-14961049 ] Hudson commented on HDFS-9257: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #542 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/542/]) HDFS-9257. improve error message for "Absolute path required" in (harsh: rev 52ac73f344e822e41457582f82abb4f35eba9dec) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java > improve error message for "Absolute path required" in INode.java to contain > the rejected path > - > > Key: HDFS-9257 > URL: https://issues.apache.org/jira/browse/HDFS-9257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.1 >Reporter: Marcell Szabo >Assignee: Marcell Szabo >Priority: Trivial > Fix For: 2.8.0 > > Attachments: HDFS-9257.000.patch > > > throw new AssertionError("Absolute path required"); > message should also show the path to help debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9205) Do not schedule corrupt blocks for replication
[ https://issues.apache.org/jira/browse/HDFS-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961048#comment-14961048 ] Hudson commented on HDFS-9205: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #542 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/542/]) Revert "Move HDFS-9205 to trunk in CHANGES.txt." (szetszwo: rev a554701fe4402ae30461e2ef165cb60970a202a0) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Do not schedule corrupt blocks for replication > -- > > Key: HDFS-9205 > URL: https://issues.apache.org/jira/browse/HDFS-9205 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Fix For: 2.8.0 > > Attachments: h9205_20151007.patch, h9205_20151007b.patch, > h9205_20151008.patch, h9205_20151009.patch, h9205_20151009b.patch, > h9205_20151013.patch, h9205_20151015.patch > > > Corrupted blocks by definition are blocks that cannot be read. As a consequence, > they cannot be replicated. In UnderReplicatedBlocks, there is a queue for > QUEUE_WITH_CORRUPT_BLOCKS and chooseUnderReplicatedBlocks may choose blocks > from it. It seems that scheduling corrupted blocks for replication is wasting > resources and potentially slowing down replication for the higher-priority blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9208) Disabling atime may fail clients like distCp
[ https://issues.apache.org/jira/browse/HDFS-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9208: - Target Version/s: 2.8.0 Status: Patch Available (was: Open) > Disabling atime may fail clients like distCp > > > Key: HDFS-9208 > URL: https://issues.apache.org/jira/browse/HDFS-9208 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HDFS-9208.patch > > > When atime is disabled, {{setTimes()}} throws an exception if the passed-in > atime is not -1. But since atime is not -1, distCp fails when it tries to > set the mtime and atime. > There are several options: > 1) make distCp check for 0 atime and call {{setTimes()}} with -1. I am not > very enthusiastic about it. > 2) make NN also accept 0 atime in addition to -1, when the atime support is > disabled. > 3) support setting mtime & atime regardless of the atime support. The main > reason why atime is disabled is to avoid edit logging/syncing during > {{getBlockLocations()}} read calls. Explicit setting can be allowed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Staffan Friberg updated HDFS-9260: -- Attachment: HDFS Block and Replica Management 20151013.pdf > Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch > > > This patch changes the data structures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC-friendly handling of full > block reports. > Would like to hear people's feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. > There seem to be some timing issues I hit when testing the patch; not sure > if it is a bug in the patch or something else (most likely the former)... > Tests that fail for me: >The issue seems to be that the blocks are not on any storage, so no > replication can occur, causing the tests to fail in different ways. >TestDecommission.testDecommission >If I add a little sleep after the cleanup/delete, things seem to work >TestDFSStripedOutputStreamWithFailure >A couple of tests fail in this class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Staffan Friberg updated HDFS-9260: -- Attachment: (was: HDFS Block and Replica Management 20151013.pdf) > Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch > > > This patch changes the data structures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC-friendly handling of full > block reports. > Would like to hear people's feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. > There seem to be some timing issues I hit when testing the patch; not sure > if it is a bug in the patch or something else (most likely the former)... > Tests that fail for me: >The issue seems to be that the blocks are not on any storage, so no > replication can occur, causing the tests to fail in different ways. >TestDecommission.testDecommission >If I add a little sleep after the cleanup/delete, things seem to work >TestDFSStripedOutputStreamWithFailure >A couple of tests fail in this class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Staffan Friberg updated HDFS-9260: -- Attachment: HDFS Block and Replica Management 20151013.pdf > Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch > > > This patch changes the data structures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC-friendly handling of full > block reports. > Would like to hear people's feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. > There seem to be some timing issues I hit when testing the patch; not sure > if it is a bug in the patch or something else (most likely the former)... > Tests that fail for me: >The issue seems to be that the blocks are not on any storage, so no > replication can occur, causing the tests to fail in different ways. >TestDecommission.testDecommission >If I add a little sleep after the cleanup/delete, things seem to work >TestDFSStripedOutputStreamWithFailure >A couple of tests fail in this class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Staffan Friberg updated HDFS-9260: -- Attachment: HDFS-7435.001.patch > Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg > Attachments: HDFS-7435.001.patch > > > This patch changes the data structures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC-friendly handling of full > block reports. > Would like to hear people's feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. > There seem to be some timing issues I hit when testing the patch; not sure > if it is a bug in the patch or something else (most likely the former)... > Tests that fail for me: >The issue seems to be that the blocks are not on any storage, so no > replication can occur, causing the tests to fail in different ways. >TestDecommission.testDecommission >If I add a little sleep after the cleanup/delete, things seem to work >TestDFSStripedOutputStreamWithFailure >A couple of tests fail in this class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9257) improve error message for "Absolute path required" in INode.java to contain the rejected path
[ https://issues.apache.org/jira/browse/HDFS-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961037#comment-14961037 ] Hudson commented on HDFS-9257: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #557 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/557/]) HDFS-9257. improve error message for "Absolute path required" in (harsh: rev 52ac73f344e822e41457582f82abb4f35eba9dec) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java > improve error message for "Absolute path required" in INode.java to contain > the rejected path > - > > Key: HDFS-9257 > URL: https://issues.apache.org/jira/browse/HDFS-9257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.1 >Reporter: Marcell Szabo >Assignee: Marcell Szabo >Priority: Trivial > Fix For: 2.8.0 > > Attachments: HDFS-9257.000.patch > > > throw new AssertionError("Absolute path required"); > message should also show the path to help debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9205) Do not schedule corrupt blocks for replication
[ https://issues.apache.org/jira/browse/HDFS-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961036#comment-14961036 ] Hudson commented on HDFS-9205: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #557 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/557/]) Revert "Move HDFS-9205 to trunk in CHANGES.txt." (szetszwo: rev a554701fe4402ae30461e2ef165cb60970a202a0) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Do not schedule corrupt blocks for replication > -- > > Key: HDFS-9205 > URL: https://issues.apache.org/jira/browse/HDFS-9205 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Fix For: 2.8.0 > > Attachments: h9205_20151007.patch, h9205_20151007b.patch, > h9205_20151008.patch, h9205_20151009.patch, h9205_20151009b.patch, > h9205_20151013.patch, h9205_20151015.patch > > > Corrupted blocks by definition are blocks that cannot be read. As a consequence, > they cannot be replicated. In UnderReplicatedBlocks, there is a queue for > QUEUE_WITH_CORRUPT_BLOCKS and chooseUnderReplicatedBlocks may choose blocks > from it. It seems that scheduling corrupted blocks for replication is wasting > resources and potentially slowing down replication for the higher-priority blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
Staffan Friberg created HDFS-9260: - Summary: Improve performance and GC friendliness of startup and FBRs Key: HDFS-9260 URL: https://issues.apache.org/jira/browse/HDFS-9260 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode, performance Affects Versions: 2.7.1 Reporter: Staffan Friberg This patch changes the data structures used for BlockInfos and Replicas to keep them sorted. This allows faster and more GC-friendly handling of full block reports. Would like to hear people's feedback on this change and also some help investigating/understanding a few outstanding issues if we are interested in moving forward with this. There seem to be some timing issues I hit when testing the patch; not sure if it is a bug in the patch or something else (most likely the former)... Tests that fail for me: The issue seems to be that the blocks are not on any storage, so no replication can occur, causing the tests to fail in different ways. TestDecommission.testDecommission If I add a little sleep after the cleanup/delete, things seem to work TestDFSStripedOutputStreamWithFailure A couple of tests fail in this class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9249) NPE thrown if an IOException is thrown in NameNode.
[ https://issues.apache.org/jira/browse/HDFS-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-9249: -- Attachment: HDFS-9249.002.patch Attaching rev2. This patch adds a test case that verifies the fix for the NPE when the authentication of the backup node is incorrectly configured. Thanks [~steve_l] for thoughtful comments. > NPE thrown if an IOException is thrown in NameNode. > - > > Key: HDFS-9249 > URL: https://issues.apache.org/jira/browse/HDFS-9249 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.1 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Minor > Labels: supportability > Attachments: HDFS-9249.001.patch, HDFS-9249.002.patch > > > This issue was found when running test case > TestBackupNode.testCheckpointNode, but upon closer look, the problem is not > due to the test case. > Looks like an IOException was thrown in > try { > initializeGenericKeys(conf, nsId, namenodeId); > initialize(conf); > try { > haContext.writeLock(); > state.prepareToEnterState(haContext); > state.enterState(haContext); > } finally { > haContext.writeUnlock(); > } > causing the namenode to stop, but the namesystem was not yet properly > instantiated, causing the NPE. > I tried to reproduce locally, but to no avail. > Because I could not reproduce the bug, and the log does not indicate what > caused the IOException, I suggest making this a supportability JIRA to log the > exception for future improvement. 
> Stacktrace > java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906) > at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210) > at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:827) > at > org.apache.hadoop.hdfs.server.namenode.BackupNode.(BackupNode.java:89) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474) > at > org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102) > at > org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298) > at > org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130) > The last few lines of log: > 2015-10-14 19:45:07,807 INFO namenode.NameNode > (NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint] > 2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl > (MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started > (again) > 2015-10-14 19:45:07,808 INFO namenode.NameNode > (NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is > hdfs://localhost:37835 > 2015-10-14 19:45:07,808 INFO namenode.NameNode > (NameNode.java:setClientNamenodeAddress(422)) - Clients are to use > localhost:37835 to access this namenode/service. 
> 2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster > (MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster > 2015-10-14 19:45:07,810 INFO namenode.FSNamesystem > (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for > active state > 2015-10-14 19:45:07,811 INFO namenode.FSEditLog > (FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1 > 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem > (FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting > 2015-10-14 19:45:07,811 INFO namenode.FSEditLog > (FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time > for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of > syncs: 4 SyncTimes(ms): 2 1 > 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem > (FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted, > exiting > 2015-10-14 19:45:07,822 INFO namenode.FileJournalManager > (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file > /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_001 > -> > /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_001-003 > 2015-10-14 19:45:07,835 INFO namenode.FileJournalManager > (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file > /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_001 > -> > /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_
[jira] [Assigned] (HDFS-9098) Erasure coding: emulate race conditions among striped streamers in write pipeline
[ https://issues.apache.org/jira/browse/HDFS-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang reassigned HDFS-9098: --- Assignee: Zhe Zhang > Erasure coding: emulate race conditions among striped streamers in write > pipeline > - > > Key: HDFS-9098 > URL: https://issues.apache.org/jira/browse/HDFS-9098 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: Zhe Zhang > > Apparently the interleaving of events among {{StripedDataStreamer}}s is very > tricky to handle. [~walter.k.su] and [~jingzhao] have discussed several race > conditions under HDFS-9040. > Let's use FaultInjector to emulate different combinations of interleaved > events. > In particular, we should consider injecting delays in the following places: > # {{Streamer#endBlock}} > # {{Streamer#locateFollowingBlock}} > # {{Streamer#updateBlockForPipeline}} > # {{Streamer#updatePipeline}} > # {{OutputStream#writeChunk}} > # {{OutputStream#close}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9254) HDFS Secure Mode Documentation updates
[ https://issues.apache.org/jira/browse/HDFS-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960995#comment-14960995 ] Steve Loughran commented on HDFS-9254: -- I'd thought "underdocumented" is a complete summary of kerberos info —good to see you trying to fix this. in the patch, you say that `d...@realm.tld` is allowed. I recall seeing some JIRAs where people were saying you get a stack trace unless you have the /HOST value of some kind or other > HDFS Secure Mode Documentation updates > -- > > Key: HDFS-9254 > URL: https://issues.apache.org/jira/browse/HDFS-9254 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.7.1 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-9254.01.patch > > > Some Kerberos configuration parameters are not documented well enough. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9255) Consolidate block recovery related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-9255: Attachment: HDFS-9255.03.patch > Consolidate block recovery related implementation into a single class > - > > Key: HDFS-9255 > URL: https://issues.apache.org/jira/browse/HDFS-9255 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Walter Su >Assignee: Walter Su >Priority: Minor > Attachments: HDFS-9255.01.patch, HDFS-9255.02.patch, > HDFS-9255.03.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9241) HDFS clients can't construct HdfsConfiguration instances
[ https://issues.apache.org/jira/browse/HDFS-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960987#comment-14960987 ] Haohui Mai commented on HDFS-9241: -- bq. the changes for the hdfs client classpath make instantiating HdfsConfiguration from the client impossible; it only lives server side. This breaks any app which creates one. I'm trying to understand the use cases of applications creating a {{HdfsConfiguration}} instance. Is it because that the apps need a way to force the hdfs configurations to be loaded? Old applications can still depend on {{hadoop-hdfs}} and nothing will break. However, the application might need to change a couple lines of code if it only wants to depend on {{hadoop-hdfs-client}}. Thoughts? > HDFS clients can't construct HdfsConfiguration instances > > > Key: HDFS-9241 > URL: https://issues.apache.org/jira/browse/HDFS-9241 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Steve Loughran >Assignee: Mingliang Liu > Attachments: HDFS-9241.000.patch > > > the changes for the hdfs client classpath make instantiating > {{HdfsConfiguration}} from the client impossible; it only lives server side. > This breaks any app which creates one. > I know people will look at the {{@Private}} tag and say "don't do that then", > but it's worth considering precisely why I, at least, do this: it's the only > way to guarantee that the hdfs-default and hdfs-site resources get on the > classpath, including all the security settings. It's precisely the use case > which {{HdfsConfigurationLoader.init();}} offers internally to the hdfs code. > What am I meant to do now? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9241) HDFS clients can't construct HdfsConfiguration instances
[ https://issues.apache.org/jira/browse/HDFS-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960983#comment-14960983 ] Haohui Mai commented on HDFS-9241: -- bq. One other thing to consider is "would we expect thin clients to ever instantiate this class?". If so, should it be in that JAR. My answer is no -- the current implementation has a class {{HdfsConfigurationLoader}} to load the configurations that serves the original purposes of {{HdfsConfiguration}} on the client side. The reason is that {{HdfsConfiguration}} are used by both the client and the server side. It contains deprecated keys for the server side, which IMO should not be exposed to the clients at all. > HDFS clients can't construct HdfsConfiguration instances > > > Key: HDFS-9241 > URL: https://issues.apache.org/jira/browse/HDFS-9241 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Steve Loughran >Assignee: Mingliang Liu > Attachments: HDFS-9241.000.patch > > > the changes for the hdfs client classpath make instantiating > {{HdfsConfiguration}} from the client impossible; it only lives server side. > This breaks any app which creates one. > I know people will look at the {{@Private}} tag and say "don't do that then", > but it's worth considering precisely why I, at least, do this: it's the only > way to guarantee that the hdfs-default and hdfs-site resources get on the > classpath, including all the security settings. It's precisely the use case > which {{HdfsConfigurationLoader.init();}} offers internally to the hdfs code. > What am I meant to do now? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9250) LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty
[ https://issues.apache.org/jira/browse/HDFS-9250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-9250: Status: Open (was: Patch Available) > LocatedBlock#addCachedLoc may throw ArrayStoreException when cache is empty > --- > > Key: HDFS-9250 > URL: https://issues.apache.org/jira/browse/HDFS-9250 > Project: Hadoop HDFS > Issue Type: Bug > Components: HDFS >Reporter: Xiao Chen >Assignee: Xiao Chen > Attachments: HDFS-9250.001.patch > > > We may see the following exception: > {noformat} > java.lang.ArrayStoreException > at java.util.ArrayList.toArray(ArrayList.java:389) > at > org.apache.hadoop.hdfs.protocol.LocatedBlock.addCachedLoc(LocatedBlock.java:205) > at > org.apache.hadoop.hdfs.server.namenode.CacheManager.setCachedLocations(CacheManager.java:907) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1974) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873) > {noformat} > The cause is that in LocatedBlock.java, when {{addCachedLoc}}: > - Passed in parameter {{loc}}, which is type {{DatanodeDescriptor}}, is added > to {{cachedList}} > - {{cachedList}} was assigned to {{EMPTY_LOCS}}, which is type > {{DatanodeInfoWithStorage}}. > Both {{DatanodeDescriptor}} and {{DatanodeInfoWithStorage}} are subclasses of > {{DatanodeInfo}} but do not inherit from each other, resulting in the > ArrayStoreException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
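The type mismatch described in HDFS-9250 can be reproduced outside HDFS. The sketch below uses stand-in classes (hypothetical names mirroring the report, not the real Hadoop types) to show how {{ArrayList#toArray(T[])}} throws {{ArrayStoreException}} when a list element is not assignable to the array's component type:

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayStoreDemo {
    // Stand-in hierarchy: both subclasses extend the common parent
    // but do not inherit from each other, as in the report.
    static class DatanodeInfo {}
    static class DatanodeInfoWithStorage extends DatanodeInfo {}
    static class DatanodeDescriptor extends DatanodeInfo {}

    // Returns true if copying the list into a DatanodeInfoWithStorage[] fails.
    static boolean copyFails() {
        List<DatanodeInfo> cachedList = new ArrayList<>();
        cachedList.add(new DatanodeDescriptor());  // element of the sibling subtype
        try {
            // toArray allocates an array of the argument's runtime type
            // (DatanodeInfoWithStorage[]) and copies elements into it; storing
            // a DatanodeDescriptor there throws ArrayStoreException.
            cachedList.toArray(new DatanodeInfoWithStorage[0]);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ArrayStoreException thrown: " + copyFails());
    }
}
```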
[jira] [Commented] (HDFS-9205) Do not schedule corrupt blocks for replication
[ https://issues.apache.org/jira/browse/HDFS-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960974#comment-14960974 ] Hudson commented on HDFS-9205: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1278 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1278/]) Revert "Move HDFS-9205 to trunk in CHANGES.txt." (szetszwo: rev a554701fe4402ae30461e2ef165cb60970a202a0) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Do not schedule corrupt blocks for replication > -- > > Key: HDFS-9205 > URL: https://issues.apache.org/jira/browse/HDFS-9205 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Fix For: 2.8.0 > > Attachments: h9205_20151007.patch, h9205_20151007b.patch, > h9205_20151008.patch, h9205_20151009.patch, h9205_20151009b.patch, > h9205_20151013.patch, h9205_20151015.patch > > > Corrupted blocks by definition are blocks cannot be read. As a consequence, > they cannot be replicated. In UnderReplicatedBlocks, there is a queue for > QUEUE_WITH_CORRUPT_BLOCKS and chooseUnderReplicatedBlocks may choose blocks > from it. It seems that scheduling corrupted block for replication is wasting > resource and potentially slow down replication for the higher priority blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9257) improve error message for "Absolute path required" in INode.java to contain the rejected path
[ https://issues.apache.org/jira/browse/HDFS-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960975#comment-14960975 ] Hudson commented on HDFS-9257: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1278 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1278/]) HDFS-9257. improve error message for "Absolute path required" in (harsh: rev 52ac73f344e822e41457582f82abb4f35eba9dec) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java > improve error message for "Absolute path required" in INode.java to contain > the rejected path > - > > Key: HDFS-9257 > URL: https://issues.apache.org/jira/browse/HDFS-9257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.1 >Reporter: Marcell Szabo >Assignee: Marcell Szabo >Priority: Trivial > Fix For: 2.8.0 > > Attachments: HDFS-9257.000.patch > > > throw new AssertionError("Absolute path required"); > message should also show the path to help debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9208) Disabling atime may fail clients like distCp
[ https://issues.apache.org/jira/browse/HDFS-9208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9208: - Attachment: HDFS-9208.patch The {{setTimes()}} call through {{getBlockLocations()}} does not force update, while explicit call does. So it is a simple matter of removing the check. The behavior of {{getBlockLocations()}} is not changed. > Disabling atime may fail clients like distCp > > > Key: HDFS-9208 > URL: https://issues.apache.org/jira/browse/HDFS-9208 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HDFS-9208.patch > > > When atime is disabled, {{setTimes()}} throws an exception if the passed-in > atime is not -1. But since atime is not -1, distCp fails when it tries to > set the mtime and atime. > There are several options: > 1) make distCp check for 0 atime and call {{setTimes()}} with -1. I am not > very enthusiastic about it. > 2) make NN also accept 0 atime in addition to -1, when the atime support is > disabled. > 3) support setting mtime & atime regardless of the atime support. The main > reason why atime is disabled is to avoid edit logging/syncing during > {{getBlockLocations()}} read calls. Explicit setting can be allowed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
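The options weighed in HDFS-9208 can be sketched as follows (a hypothetical simplification of the validation, not the actual FSNamesystem code): option 2 relaxes the explicit-call check to accept 0 as well as -1 when atime support is disabled.

```java
public class AtimeCheckSketch {
    // Hypothetical simplification of the NN-side validation for an explicit
    // setTimes() call when atime support is disabled.
    static boolean acceptedStrict(long atime) {
        return atime == -1;                 // current behavior: only -1 passes
    }

    static boolean acceptedRelaxed(long atime) {
        return atime == -1 || atime == 0;   // option 2: also accept 0
    }

    public static void main(String[] args) {
        // distCp passes the source file's atime rather than -1,
        // which the strict check rejects.
        System.out.println("strict accepts 0:  " + acceptedStrict(0));
        System.out.println("relaxed accepts 0: " + acceptedRelaxed(0));
    }
}
```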
[jira] [Created] (HDFS-9259) Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario
Ming Ma created HDFS-9259: - Summary: Make SO_SNDBUF size configurable at DFSClient side for hdfs write scenario Key: HDFS-9259 URL: https://issues.apache.org/jira/browse/HDFS-9259 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma We recently found that cross-DC hdfs write could be really slow. Further investigation identified that this is due to the SendBufferSize and ReceiveBufferSize used for hdfs write. The test is to do "hadoop fs -copyFromLocal" of a 256MB file across DC with different SendBufferSize and ReceiveBufferSize values. The results showed that c is much faster than b, and b is faster than a. a. SendBufferSize=128k, ReceiveBufferSize=128k (hdfs default setting). b. SendBufferSize=128K, ReceiveBufferSize=not set (TCP auto tuning). c. SendBufferSize=not set, ReceiveBufferSize=not set (TCP auto tuning for both) HDFS-8829 has enabled scenario b. We would like to enable scenario c to make SendBufferSize configurable at the DFSClient side. Cc: [~cmccabe] [~He Tianyi] [~kanaka] [~vinayrpet]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
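The three scenarios can be illustrated with plain {{java.net.Socket}} (a sketch only, not DFSClient code): an explicit {{setSendBufferSize}} pins SO_SNDBUF, while leaving it unset lets the kernel auto-tune the send window.

```java
import java.io.IOException;
import java.net.Socket;

public class SendBufferDemo {
    public static void main(String[] args) throws IOException {
        // Scenarios a/b: pin SO_SNDBUF explicitly (the 128k value being
        // discussed); on Linux this disables send-window auto-tuning.
        try (Socket pinned = new Socket()) {
            pinned.setSendBufferSize(128 * 1024);
            System.out.println("pinned SO_SNDBUF: " + pinned.getSendBufferSize());
        }
        // Scenario c: leave SO_SNDBUF untouched so the kernel can grow the
        // send window as the bandwidth-delay product demands (cross-DC case).
        try (Socket auto = new Socket()) {
            System.out.println("kernel default SO_SNDBUF: " + auto.getSendBufferSize());
        }
    }
}
```

Note that {{setSendBufferSize}} is only a hint to the OS; the value reported back by {{getSendBufferSize}} may differ from the one requested.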
[jira] [Commented] (HDFS-9254) HDFS Secure Mode Documentation updates
[ https://issues.apache.org/jira/browse/HDFS-9254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960901#comment-14960901 ] Arpit Agarwal commented on HDFS-9254: - Documentation-only patch, needs no new tests. Test failures are unrelated to the patch. > HDFS Secure Mode Documentation updates > -- > > Key: HDFS-9254 > URL: https://issues.apache.org/jira/browse/HDFS-9254 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.7.1 >Reporter: Arpit Agarwal >Assignee: Arpit Agarwal > Attachments: HDFS-9254.01.patch > > > Some Kerberos configuration parameters are not documented well enough. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8287) DFSStripedOutputStream.writeChunk should not wait for writing parity
[ https://issues.apache.org/jira/browse/HDFS-8287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960884#comment-14960884 ] Rakesh R commented on HDFS-8287: Thank you [~kaisasak] for taking care of this; the latest patch mostly looks fine to me. There are a few more comments, could you please take a look? # There are a few minor checkstyle warnings; please fix them. # I failed to understand the purpose of the synchronization here. Is this required? {code} synchronized public CellBuffers flip() {code} # During DFSStripedOutputStream#closeImpl, I could see a corner case: the number of bytes reaches the stripe boundary. Assume writeParityCells() has submitted a parity generator task, and again assume the client has invoked the #close() function. Now, generateParityCellsForLastStripe() will return false and it's not waiting for the queued parity gen task of the previous cell, right? IMHO, we could have a mechanism to wait for any previously submitted parity gen task before closure. {code} private boolean generateParityCellsForLastStripe(){ final long lastStripeSize = currentBlockGroupBytes % stripeDataSize(); if (lastStripeSize == 0) { return false; } {code} # I think the executor service can be moved to DFSClient, rather than being created again and again for every DFSStripedOutputStream, isn't it? {code} private final ExecutorService executorService; {code} Also, I have one comment about {{Executors.newCachedThreadPool}} -> It's unbounded, which means that you're opening the door for anyone to cripple your JVM by simply injecting more work into the service (DoS attack). Any specific reason to use cachedThreadPool? If not, I prefer to use a fixed Executors.newFixedThreadPool or a ThreadPoolExecutor with a set maximum number of threads. # {{public}} class DoubleCellBuffer, please make this {{private}}. Also, you can give the methods private visibility. 
> DFSStripedOutputStream.writeChunk should not wait for writing parity > - > > Key: HDFS-8287 > URL: https://issues.apache.org/jira/browse/HDFS-8287 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Tsz Wo Nicholas Sze >Assignee: Kai Sasaki > Attachments: HDFS-8287-HDFS-7285.00.patch, > HDFS-8287-HDFS-7285.01.patch, HDFS-8287-HDFS-7285.02.patch, > HDFS-8287-HDFS-7285.03.patch, HDFS-8287-HDFS-7285.04.patch, > HDFS-8287-HDFS-7285.05.patch, HDFS-8287-HDFS-7285.06.patch, > HDFS-8287-HDFS-7285.07.patch, HDFS-8287-HDFS-7285.08.patch, > HDFS-8287-HDFS-7285.09.patch, HDFS-8287-HDFS-7285.10.patch, > HDFS-8287-HDFS-7285.11.patch, HDFS-8287-HDFS-7285.WIP.patch, > HDFS-8287-performance-report.pdf, HDFS-8287.12.patch, h8287_20150911.patch, > jstack-dump.txt > > > When a stripping cell is full, writeChunk computes and generates parity > packets. It sequentially calls waitAndQueuePacket so that user client cannot > continue to write data until it finishes. > We should allow user client to continue writing instead but not blocking it > when writing parity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
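The bounded alternative suggested in the review above can be sketched like this (a sketch only; the pool size and queue capacity are illustrative values, not numbers from the patch):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedParityPool {
    // Bounded alternative to Executors.newCachedThreadPool(): a fixed number
    // of workers plus a bounded queue; CallerRunsPolicy applies back-pressure
    // to the submitter instead of letting the thread count grow without limit.
    static ExecutorService newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = newBoundedPool(4, 16);
        Future<Integer> parity = pool.submit(() -> 42);  // stand-in for a parity task
        System.out.println("result: " + parity.get());
        pool.shutdown();
    }
}
```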
[jira] [Commented] (HDFS-9257) improve error message for "Absolute path required" in INode.java to contain the rejected path
[ https://issues.apache.org/jira/browse/HDFS-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960879#comment-14960879 ] Hudson commented on HDFS-9257: -- FAILURE: Integrated in Hadoop-trunk-Commit #8651 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8651/]) HDFS-9257. improve error message for "Absolute path required" in (harsh: rev 52ac73f344e822e41457582f82abb4f35eba9dec) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INode.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > improve error message for "Absolute path required" in INode.java to contain > the rejected path > - > > Key: HDFS-9257 > URL: https://issues.apache.org/jira/browse/HDFS-9257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.1 >Reporter: Marcell Szabo >Assignee: Marcell Szabo >Priority: Trivial > Fix For: 2.8.0 > > Attachments: HDFS-9257.000.patch > > > throw new AssertionError("Absolute path required"); > message should also show the path to help debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9205) Do not schedule corrupt blocks for replication
[ https://issues.apache.org/jira/browse/HDFS-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960870#comment-14960870 ] Hudson commented on HDFS-9205: -- FAILURE: Integrated in Hadoop-trunk-Commit #8650 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8650/]) Revert "Move HDFS-9205 to trunk in CHANGES.txt." (szetszwo: rev a554701fe4402ae30461e2ef165cb60970a202a0) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Do not schedule corrupt blocks for replication > -- > > Key: HDFS-9205 > URL: https://issues.apache.org/jira/browse/HDFS-9205 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze >Priority: Minor > Fix For: 2.8.0 > > Attachments: h9205_20151007.patch, h9205_20151007b.patch, > h9205_20151008.patch, h9205_20151009.patch, h9205_20151009b.patch, > h9205_20151013.patch, h9205_20151015.patch > > > Corrupted blocks by definition are blocks cannot be read. As a consequence, > they cannot be replicated. In UnderReplicatedBlocks, there is a queue for > QUEUE_WITH_CORRUPT_BLOCKS and chooseUnderReplicatedBlocks may choose blocks > from it. It seems that scheduling corrupted block for replication is wasting > resource and potentially slow down replication for the higher priority blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9257) improve error message for "Absolute path required" in INode.java to contain the rejected path
[ https://issues.apache.org/jira/browse/HDFS-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J updated HDFS-9257: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to trunk and branch-2. Thank you for the improvement contribution Marcell, hope to see more! > improve error message for "Absolute path required" in INode.java to contain > the rejected path > - > > Key: HDFS-9257 > URL: https://issues.apache.org/jira/browse/HDFS-9257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.1 >Reporter: Marcell Szabo >Assignee: Marcell Szabo >Priority: Trivial > Fix For: 2.8.0 > > Attachments: HDFS-9257.000.patch > > > throw new AssertionError("Absolute path required"); > message should also show the path to help debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9257) improve error message for "Absolute path required" in INode.java to contain the rejected path
[ https://issues.apache.org/jira/browse/HDFS-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960854#comment-14960854 ] Harsh J commented on HDFS-9257: --- +1, failed tests are unrelated. Tests shouldn't be necessary for the trivial message improvement. Committing shortly. > improve error message for "Absolute path required" in INode.java to contain > the rejected path > - > > Key: HDFS-9257 > URL: https://issues.apache.org/jira/browse/HDFS-9257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.1 >Reporter: Marcell Szabo >Assignee: Marcell Szabo >Priority: Trivial > Attachments: HDFS-9257.000.patch > > > throw new AssertionError("Absolute path required"); > message should also show the path to help debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9258) NN should indicate which nodes are stale
[ https://issues.apache.org/jira/browse/HDFS-9258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla reassigned HDFS-9258: - Assignee: Kuhu Shukla > NN should indicate which nodes are stale > > > Key: HDFS-9258 > URL: https://issues.apache.org/jira/browse/HDFS-9258 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Kuhu Shukla > > Determining why the NN is not coming out of safemode is difficult - is it a > bug or pending block reports? If the number of nodes appears sufficient, but > there are missing blocks, it would be nice to know which nodes haven't block > reported (stale). Instead of forcing the NN to leave safemode prematurely, > the SE can first force block reports from stale nodes. > The datanode report and the web ui's node list should contain this > information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9258) NN should indicate which nodes are stale
Daryn Sharp created HDFS-9258: - Summary: NN should indicate which nodes are stale Key: HDFS-9258 URL: https://issues.apache.org/jira/browse/HDFS-9258 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha Reporter: Daryn Sharp Determining why the NN is not coming out of safemode is difficult - is it a bug or pending block reports? If the number of nodes appears sufficient, but there are missing blocks, it would be nice to know which nodes haven't block reported (stale). Instead of forcing the NN to leave safemode prematurely, the SE can first force block reports from stale nodes. The datanode report and the web ui's node list should contain this information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8898) Create API and command-line argument to get quota without need to get file and directory counts
[ https://issues.apache.org/jira/browse/HDFS-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960823#comment-14960823 ] Ming Ma commented on HDFS-8898: --- Thanks [~kihwal]! I will update the patch with new unit tests. > Create API and command-line argument to get quota without need to get file > and directory counts > --- > > Key: HDFS-8898 > URL: https://issues.apache.org/jira/browse/HDFS-8898 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs >Reporter: Joep Rottinghuis > Attachments: HDFS-8898.patch > > > On large directory structures it takes significant time to iterate through > the file and directory counts recursively to get a complete ContentSummary. > When you want to just check for the quota on a higher level directory it > would be good to have an option to skip the file and directory counts. > Moreover, currently one can only check the quota if you have access to all > the directories underneath. For example, if I have a large home directory > under /user/joep and I host some files for another user in a sub-directory, > the moment they create an unreadable sub-directory under my home I can no > longer check what my quota is. Understood that I cannot check the current > file counts unless I can iterate through all the usage, but for > administrative purposes it is nice to be able to get the current quota > setting on a directory without the need to iterate through and run into > permission issues on sub-directories. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9257) improve error message for "Absolute path required" in INode.java to contain the rejected path
[ https://issues.apache.org/jira/browse/HDFS-9257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960788#comment-14960788 ] Hadoop QA commented on HDFS-9257: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 49s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 53s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 17s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 22s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 9s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 51m 44s | Tests failed in hadoop-hdfs. 
| | | | 97m 9s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.shortcircuit.TestShortCircuitCache | | | hadoop.hdfs.server.blockmanagement.TestNodeCount | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12767066/HDFS-9257.000.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / cf23f2c | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13031/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13031/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13031/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13031/console | This message was automatically generated. > improve error message for "Absolute path required" in INode.java to contain > the rejected path > - > > Key: HDFS-9257 > URL: https://issues.apache.org/jira/browse/HDFS-9257 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.1 >Reporter: Marcell Szabo >Assignee: Marcell Szabo >Priority: Trivial > Attachments: HDFS-9257.000.patch > > > throw new AssertionError("Absolute path required"); > message should also show the path to help debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9241) HDFS clients can't construct HdfsConfiguration instances
[ https://issues.apache.org/jira/browse/HDFS-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960785#comment-14960785 ] Steve Loughran commented on HDFS-9241: -- One other thing to consider is "would we expect thin clients to ever instantiate this class?". If so, should it be in that JAR. until now, creating it has been the way to force hdfs-site in, just as creating a {{YarnConfiguration()}} forced that in. After hitting problems with race conditions in UGI init, I now load all of these on startup. Should this be necessary? > HDFS clients can't construct HdfsConfiguration instances > > > Key: HDFS-9241 > URL: https://issues.apache.org/jira/browse/HDFS-9241 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Steve Loughran >Assignee: Mingliang Liu > Attachments: HDFS-9241.000.patch > > > the changes for the hdfs client classpath make instantiating > {{HdfsConfiguration}} from the client impossible; it only lives server side. > This breaks any app which creates one. > I know people will look at the {{@Private}} tag and say "don't do that then", > but it's worth considering precisely why I, at least, do this: it's the only > way to guarantee that the hdfs-default and hdfs-site resources get on the > classpath, including all the security settings. It's precisely the use case > which {{HdfsConfigurationLoader.init();}} offers internally to the hdfs code. > What am I meant to do now? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
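The pattern the clients above rely on is a static initializer that registers the hdfs resources on first use of the class. The self-contained mimic below (stand-in classes, not the Hadoop ones) illustrates that mechanism:

```java
import java.util.ArrayList;
import java.util.List;

public class ConfLoadingSketch {
    // Stand-in for org.apache.hadoop.conf.Configuration's default-resource list.
    static class Configuration {
        static final List<String> defaultResources = new ArrayList<>();
        static void addDefaultResource(String name) {
            if (!defaultResources.contains(name)) {
                defaultResources.add(name);
            }
        }
    }

    // Stand-in for HdfsConfiguration: touching the class runs the static
    // block, which registers hdfs-default.xml and hdfs-site.xml exactly once.
    static class HdfsConfiguration extends Configuration {
        static {
            Configuration.addDefaultResource("hdfs-default.xml");
            Configuration.addDefaultResource("hdfs-site.xml");
        }
    }

    public static void main(String[] args) {
        new HdfsConfiguration();  // the side effect clients depend on
        System.out.println(Configuration.defaultResources);
    }
}
```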
[jira] [Commented] (HDFS-9239) DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness
[ https://issues.apache.org/jira/browse/HDFS-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960776#comment-14960776 ] Kihwal Lee commented on HDFS-9239: -- It may not help much with the namenode side. Even on extremely busy clusters, I have not seen nodes missing heartbeats and considered dead because of the contention among heartbeats, incremental block reports (IBR) and full block reports (FBR). Well before node liveness is affected by an inundation of IBRs and FBRs, the namenode performance will degrade to an unacceptable level. It is really easy to test this. Create a wide job that creates a lot of small files. However, making it lighter on the datanode side is a good idea. We have seen many cases where nodes are declared dead because the service actor thread is delayed/blocked. > DataNode Lifeline Protocol: an alternative protocol for reporting DataNode > liveness > --- > > Key: HDFS-9239 > URL: https://issues.apache.org/jira/browse/HDFS-9239 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: DataNode-Lifeline-Protocol.pdf > > > This issue proposes introduction of a new feature: the DataNode Lifeline > Protocol. This is an RPC protocol that is responsible for reporting liveness > and basic health information about a DataNode to a NameNode. Compared to the > existing heartbeat messages, it is lightweight and not prone to resource > contention problems that can harm accurate tracking of DataNode liveness > currently. The attached design document contains more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-9239) DataNode Lifeline Protocol: an alternative protocol for reporting DataNode liveness
[ https://issues.apache.org/jira/browse/HDFS-9239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960776#comment-14960776 ] Kihwal Lee edited comment on HDFS-9239 at 10/16/15 2:24 PM: It may not help much with the namenode side. Even on extremely busy clusters, I have not seen nodes missing heartbeat and considered dead because of the contention among heartbeats, incremental block reports (IBR) and full block reports (FBR). Well before node liveness is affected by inundation of IBRs and FBRs, the namenode performance will degrade to unacceptable level. It is really easy to test this. Create a wide job that creates a lot of small files. However,making it lighter on the datanode side is a good idea. We have seen many cases where nodes are declared dead because the service actor thread is delayed/blocked. was (Author: kihwal): It may not help much with the namenode side. Even on extremely busy clusters, I have not seen nodes missing heartbeat and considered dead because of the contention among heartbeats, incremental block reports (IBR) and full block reports (FBR). Well before node liveness is affected by inundation of IBRs and FBRs, the namenode performance will degrade to unacceptable level. It is really easy to test this. Create a wide job that creates a lot small files. However,making it lighter on the datanode side is a good idea. We have seen many cases where nodes are declared dead because the service actor thread is delayed/blocked. > DataNode Lifeline Protocol: an alternative protocol for reporting DataNode > liveness > --- > > Key: HDFS-9239 > URL: https://issues.apache.org/jira/browse/HDFS-9239 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Reporter: Chris Nauroth >Assignee: Chris Nauroth > Attachments: DataNode-Lifeline-Protocol.pdf > > > This issue proposes introduction of a new feature: the DataNode Lifeline > Protocol. 
> This is an RPC protocol responsible for reporting liveness and basic health information about a DataNode to a NameNode. Compared to the existing heartbeat messages, it is lightweight and not prone to the resource contention problems that currently harm accurate tracking of DataNode liveness. The attached design document contains more details.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
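The core idea discussed above — a tiny liveness message serviced on its own thread, decoupled from the heartbeat/IBR/FBR path that can block the service actor — can be sketched roughly as below. This is a hypothetical minimal model, not the actual Hadoop protocol; the interface name `LifelineProtocol`, the tracker, and the scheduling are all assumptions, and the real design lives in the attached PDF.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a lifeline sender running on a dedicated daemon
// thread, so a blocked heartbeat/block-report path cannot delay it.
public class LifelineSketch {
    // Minimal stand-in for the lifeline RPC; the real protocol carries
    // basic health info as well (see the design document).
    interface LifelineProtocol {
        void sendLifeline(String datanodeId);
    }

    // NameNode-side stand-in: tracks the last time each node was heard from.
    static class LifelineTracker implements LifelineProtocol {
        final ConcurrentHashMap<String, Long> lastSeen = new ConcurrentHashMap<>();

        public void sendLifeline(String datanodeId) {
            lastSeen.put(datanodeId, System.nanoTime());
        }

        boolean isLive(String id, long staleNanos) {
            Long t = lastSeen.get(id);
            return t != null && System.nanoTime() - t < staleNanos;
        }
    }

    // DataNode-side stand-in: sends a tiny message on a fixed period,
    // independent of the (potentially blocked) service actor thread.
    static ScheduledExecutorService startLifeline(LifelineProtocol rpc,
                                                  String nodeId,
                                                  long periodMillis) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "lifeline-" + nodeId);
            t.setDaemon(true);
            return t;
        });
        exec.scheduleAtFixedRate(() -> rpc.sendLifeline(nodeId),
                                 0, periodMillis, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

The point Kihwal raises still holds in this model: it only changes what happens on the sender side; the receiver must still service the message promptly for liveness tracking to improve.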
[jira] [Commented] (HDFS-9249) NPE thrown if an IOException is thrown in NameNode.
[ https://issues.apache.org/jira/browse/HDFS-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960775#comment-14960775 ] Steve Loughran commented on HDFS-9249: --
Don't worry about being new: welcome to the fun of debugging Hadoop from stack traces. I agree, you may not have found the cause, but you have certainly found what would have triggered it, and verified that there's logging and a shutdown. Would you be able to derive a test from that? We shouldn't need to have a mini HDFS cluster spun up; just try to start an NN with those configuration options.
> NPE thrown if an IOException is thrown in NameNode.
> -
>
> Key: HDFS-9249
> URL: https://issues.apache.org/jira/browse/HDFS-9249
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.1
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Priority: Minor
> Labels: supportability
> Attachments: HDFS-9249.001.patch
>
> This issue was found when running test case TestBackupNode.testCheckpointNode, but upon closer look, the problem is not due to the test case.
> Looks like an IOException was thrown in
>   try {
>     initializeGenericKeys(conf, nsId, namenodeId);
>     initialize(conf);
>     try {
>       haContext.writeLock();
>       state.prepareToEnterState(haContext);
>       state.enterState(haContext);
>     } finally {
>       haContext.writeUnlock();
>     }
> causing the namenode to stop, but the namesystem was not yet properly instantiated, causing the NPE.
> I tried to reproduce locally, but to no avail.
> Because I could not reproduce the bug, and the log does not indicate what caused the IOException, I suggest making this a supportability JIRA to log the exception for future improvement.
> Stacktrace
> java.lang.NullPointerException: null
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906)
>   at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:827)
>   at org.apache.hadoop.hdfs.server.namenode.BackupNode.<init>(BackupNode.java:89)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474)
>   at org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102)
>   at org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298)
>   at org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130)
> The last few lines of log:
> 2015-10-14 19:45:07,807 INFO namenode.NameNode (NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint]
> 2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started (again)
> 2015-10-14 19:45:07,808 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is hdfs://localhost:37835
> 2015-10-14 19:45:07,808 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(422)) - Clients are to use localhost:37835 to access this namenode/service.
> 2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster
> 2015-10-14 19:45:07,810 INFO namenode.FSNamesystem (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for active state
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog (FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem (FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 4 SyncTimes(ms): 2 1
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem (FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted, exiting
> 2015-10-14 19:45:07,822 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_001 -> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_001-003
> 2015-10-14 19:45:07,835 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/had
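The failure mode and the suggested supportability fix can be reduced to a small model: initialization throws before the namesystem field is assigned, and the shutdown path then dereferences the still-null field. The sketch below is a hypothetical reduction (the class and method names are simplified stand-ins, not the actual NameNode code); the fix is to guard the field and log the root cause instead of assuming initialization completed.

```java
// Hypothetical reduction of HDFS-9249: init fails before the namesystem
// field is assigned, and stop() must tolerate the half-built node.
public class InitFailureSketch {
    static class Namesystem {
        void close() { /* release resources */ }
    }

    static class Node {
        private Namesystem namesystem;   // still null if initialize() fails early
        private Exception initFailure;

        Node(boolean failInit) {
            try {
                initialize(failInit);
            } catch (Exception e) {
                initFailure = e;         // keep the root cause for the log
                stop();                  // shutdown runs on a half-built node
            }
        }

        private void initialize(boolean fail) throws Exception {
            if (fail) {
                // corresponds to the IOException thrown inside initialize(conf)
                throw new Exception("simulated init failure");
            }
            namesystem = new Namesystem();
        }

        // The supportability fix: null-check the field and log the original
        // exception, rather than NPE-ing and hiding the real cause.
        void stop() {
            if (namesystem != null) {
                namesystem.close();
            } else if (initFailure != null) {
                System.err.println("Shutting down before namesystem was "
                    + "initialized; root cause: " + initFailure);
            }
        }
    }
}
```

Without the null check in `stop()`, the simulated failure path would throw the same NullPointerException seen in the stack trace above, masking the IOException that actually caused the shutdown.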
[jira] [Commented] (HDFS-7964) Add support for async edit logging
[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960721#comment-14960721 ] Daryn Sharp commented on HDFS-7964: ---
# The thread is the only one calling the real logEdit, so the latest txid is the one it last logged.
# I changed it because the bookkeeper tests emitted nothing at all, which makes it really hard to debug. I can undo it or change it to INFO (what I intended) if you like.
# It depends on the latest changes in HADOOP-12483. I renamed a new method in HADOOP-10300 to be more explicit about its purpose.
> Add support for async edit logging
> --
>
> Key: HDFS-7964
> URL: https://issues.apache.org/jira/browse/HDFS-7964
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 2.0.2-alpha
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
> Attachments: HDFS-7964.patch, HDFS-7964.patch
>
> Edit logging is a major source of contention within the NN. LogEdit is called within the namespace write lock, while logSync is called outside of the lock to allow greater concurrency. The handler thread remains busy until logSync returns, to provide the client with a durability guarantee for the response.
> Write-heavy RPC load and/or slow IO causes handlers to stall in logSync. Although the write lock is not held, readers are limited/starved and the call queue fills. Combining an edit log thread with postponed RPC responses from HADOOP-10300 will provide the same durability guarantee but immediately free up the handlers.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
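The mechanism described above — handler threads enqueue edits (with a txid assigned at enqueue time) and are freed immediately, while a single logger thread batches, syncs, and then completes a deferred response once the edit is durable — can be modeled with a queue and futures. This is an illustrative sketch under assumed names, not the actual patch; the real code integrates with FSEditLog and the postponed-RPC machinery of HADOOP-10300.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of async edit logging: callers enqueue edits (as they
// would under the namespace write lock) and get a future that completes once
// the edit is durable, mirroring the deferred-response idea from HADOOP-10300.
public class AsyncEditLogSketch {
    static class PendingEdit {
        final long txid;
        final CompletableFuture<Long> synced = new CompletableFuture<>();
        PendingEdit(long txid) { this.txid = txid; }
    }

    private final BlockingQueue<PendingEdit> queue = new LinkedBlockingQueue<>();
    private final AtomicLong txids = new AtomicLong();
    private volatile boolean running = true;

    AsyncEditLogSketch() {
        Thread logger = new Thread(() -> {
            List<PendingEdit> batch = new ArrayList<>();
            while (running || !queue.isEmpty()) {
                batch.clear();
                PendingEdit first;
                try {
                    first = queue.poll(10, TimeUnit.MILLISECONDS);
                } catch (InterruptedException e) {
                    return;
                }
                if (first == null) continue;
                batch.add(first);
                queue.drainTo(batch);   // batch whatever has piled up
                // A real implementation writes and fsyncs here; only this
                // single logger thread touches the underlying stream, which
                // is why "the latest txid is the one it last logged".
                for (PendingEdit e : batch) e.synced.complete(e.txid);
            }
        }, "edit-logger");
        logger.setDaemon(true);
        logger.start();
    }

    // Called by handler threads; returns immediately so the handler is freed.
    CompletableFuture<Long> logEdit() {
        PendingEdit e = new PendingEdit(txids.incrementAndGet());
        queue.add(e);
        return e.synced;
    }

    void shutdown() { running = false; }
}
```

The durability guarantee is preserved because the client's response is only released when the future completes, after the batch is synced, rather than by keeping the handler thread parked in logSync.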
[jira] [Commented] (HDFS-8880) NameNode metrics logging
[ https://issues.apache.org/jira/browse/HDFS-8880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960683#comment-14960683 ] Steve Loughran commented on HDFS-8880: --
I'd be in favour of improving our own general metrics sinks rather than adding new stuff to every service:
# it just adds more stuff to maintain, test, document and debug
# it's not broadly re-usable
# it adds one thread per service, which in test runs could mean many more per VM.
I note that Coda Hale metrics has a [stdout streamer|https://dropwizard.github.io/metrics/3.1.0/getting-started/] for such purposes. If we were to do things with metrics, Codahale integration would seem a good strategy (though the transitive LGPL dependency of the Ganglia reporter is something to be aware of).
> NameNode metrics logging
>
> Key: HDFS-8880
> URL: https://issues.apache.org/jira/browse/HDFS-8880
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Arpit Agarwal
> Assignee: Arpit Agarwal
> Fix For: 2.8.0
>
> Attachments: HDFS-8880.01.patch, HDFS-8880.02.patch, HDFS-8880.03.patch, HDFS-8880.04.patch, namenode-metrics.log
>
> The NameNode can periodically log metrics to help debugging when the cluster is not set up with another metrics monitoring scheme.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
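The feature and the thread-per-service objection can both be seen in a stdlib-only model of periodic metrics logging. This is a hypothetical sketch, not Hadoop's MetricsSystem or the HDFS-8880 patch: a scheduled task snapshots registered gauges and writes one line per period to a pluggable sink, at the cost of one scheduler thread per service instance, which is exactly what multiplies in test runs with many services in one JVM.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical model of periodic metrics logging: one scheduler thread per
// service instance snapshots gauges and emits a log line per period.
public class MetricsLoggerSketch {
    private final Map<String, Supplier<Object>> gauges = new ConcurrentHashMap<>();
    private final ScheduledExecutorService exec =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "metrics-logger");
            t.setDaemon(true);
            return t;
        });

    void register(String name, Supplier<Object> gauge) {
        gauges.put(name, gauge);
    }

    // The sink is pluggable so a test (or a Coda Hale-style console
    // reporter) can capture output instead of writing to a real logger.
    void start(long periodMillis, Consumer<String> sink) {
        exec.scheduleAtFixedRate(() -> {
            StringBuilder line = new StringBuilder();
            gauges.forEach((k, v) ->
                line.append(k).append('=').append(v.get()).append(' '));
            sink.accept(line.toString().trim());
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() {
        exec.shutdownNow();
    }
}
```

Coda Hale's ConsoleReporter does this generically for a MetricRegistry, which is the reason for preferring a shared reporter over one bespoke logging thread baked into each service.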