[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127238#comment-15127238 ]

Colin Patrick McCabe commented on HDFS-9260:

Thanks, [~sfriberg]. The unit test failures appear to be flaky tests; they succeed for me locally (except {{TestDirectoryScanner}}, which fails for me both with and without the patch applied).

+1. Will commit in a few hours if there are no further comments. We can do any additional refactoring or optimization in follow-ups.

> Improve performance and GC friendliness of startup and FBRs
> ---
>
> Key: HDFS-9260
> URL: https://issues.apache.org/jira/browse/HDFS-9260
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode, namenode, performance
> Affects Versions: 2.7.1
> Reporter: Staffan Friberg
> Assignee: Staffan Friberg
> Attachments: FBR processing.png, HDFS Block and Replica Management 20151013.pdf,
> HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch,
> HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch,
> HDFS-7435.007.patch, HDFS-9260.008.patch, HDFS-9260.009.patch,
> HDFS-9260.010.patch, HDFS-9260.011.patch, HDFS-9260.012.patch,
> HDFS-9260.013.patch, HDFS-9260.014.patch, HDFS-9260.015.patch,
> HDFS-9260.016.patch, HDFS-9260.017.patch, HDFS-9260.018.patch,
> HDFSBenchmarks.zip, HDFSBenchmarks2.zip
>
>
> This patch changes the data structures used for BlockInfos and Replicas to
> keep them sorted. This allows faster and more GC friendly handling of full
> block reports.
> Would like to hear people's feedback on this change.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127435#comment-15127435 ]

Hadoop QA commented on HDFS-9260:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 15 new or modified test files. |
| 0 | mvndep | 0m 9s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 12s | trunk passed |
| +1 | compile | 0m 42s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 44s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 30s | trunk passed |
| +1 | mvnsite | 0m 56s | trunk passed |
| +1 | mvneclipse | 0m 14s | trunk passed |
| +1 | findbugs | 2m 7s | trunk passed |
| +1 | javadoc | 1m 8s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 53s | trunk passed with JDK v1.7.0_91 |
| 0 | mvndep | 0m 8s | Maven dependency ordering for patch |
| +1 | mvninstall | 0m 50s | the patch passed |
| +1 | compile | 0m 41s | the patch passed with JDK v1.8.0_66 |
| +1 | cc | 0m 41s | the patch passed |
| +1 | javac | 0m 41s | the patch passed |
| +1 | compile | 0m 41s | the patch passed with JDK v1.7.0_91 |
| +1 | cc | 0m 41s | the patch passed |
| +1 | javac | 0m 41s | the patch passed |
| -1 | checkstyle | 0m 29s | hadoop-hdfs-project/hadoop-hdfs: patch generated 3 new + 685 unchanged - 10 fixed = 688 total (was 695) |
| +1 | mvnsite | 0m 53s | the patch passed |
| +1 | mvneclipse | 0m 11s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 2m 20s | the patch passed |
| +1 | javadoc | 1m 5s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 52s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 62m 20s | hadoop-hdfs in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 61m 27s | hadoop-hdfs in the patch failed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 25s | Patch does not generate ASF License warnings. |
| | | 151m 34s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests |
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126967#comment-15126967 ]

Hadoop QA commented on HDFS-9260:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 15 new or modified test files. |
| 0 | mvndep | 0m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 12m 21s | trunk passed |
| +1 | compile | 1m 51s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 1m 13s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 45s | trunk passed |
| +1 | mvnsite | 1m 35s | trunk passed |
| +1 | mvneclipse | 0m 19s | trunk passed |
| +1 | findbugs | 3m 18s | trunk passed |
| +1 | javadoc | 2m 18s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 3m 16s | trunk passed with JDK v1.7.0_91 |
| 0 | mvndep | 0m 13s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 30s | the patch passed |
| +1 | compile | 1m 51s | the patch passed with JDK v1.8.0_66 |
| +1 | cc | 1m 51s | the patch passed |
| +1 | javac | 1m 51s | the patch passed |
| +1 | compile | 1m 21s | the patch passed with JDK v1.7.0_91 |
| +1 | cc | 1m 21s | the patch passed |
| +1 | javac | 1m 21s | the patch passed |
| -1 | checkstyle | 0m 48s | hadoop-hdfs-project/hadoop-hdfs: patch generated 3 new + 684 unchanged - 10 fixed = 687 total (was 694) |
| +1 | mvnsite | 1m 38s | the patch passed |
| +1 | mvneclipse | 0m 20s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 3m 44s | the patch passed |
| +1 | javadoc | 2m 28s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 3m 25s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 141m 52s | hadoop-hdfs in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 127m 21s | hadoop-hdfs in the patch failed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 36s | Patch does not generate ASF License warnings. |
| | | 319m 11s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests |
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124362#comment-15124362 ]

Staffan Friberg commented on HDFS-9260:

Patch 17:
Added the timeout as a tunable.
Reduced the read-lock region and optimized the fill ratio calculation so that not all nodes need to be iterated (it still needs to find the last node).
Updated the tunable names as per Colin's suggestion.
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124565#comment-15124565 ]

Hadoop QA commented on HDFS-9260:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 15 new or modified test files. |
| +1 | mvninstall | 6m 52s | trunk passed |
| +1 | compile | 0m 45s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 42s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 30s | trunk passed |
| +1 | mvnsite | 0m 53s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 58s | trunk passed |
| +1 | javadoc | 1m 7s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 50s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 47s | the patch passed |
| +1 | compile | 0m 38s | the patch passed with JDK v1.8.0_66 |
| +1 | cc | 0m 38s | the patch passed |
| +1 | javac | 0m 38s | the patch passed |
| +1 | compile | 0m 39s | the patch passed with JDK v1.7.0_91 |
| +1 | cc | 0m 39s | the patch passed |
| +1 | javac | 0m 39s | the patch passed |
| -1 | checkstyle | 0m 26s | hadoop-hdfs-project/hadoop-hdfs: patch generated 3 new + 686 unchanged - 11 fixed = 689 total (was 697) |
| +1 | mvnsite | 0m 50s | the patch passed |
| +1 | mvneclipse | 0m 11s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 2m 5s | the patch passed |
| +1 | javadoc | 1m 3s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 42s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 54m 40s | hadoop-hdfs in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 51m 38s | hadoop-hdfs in the patch failed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 21s | Patch does not generate ASF License warnings. |
| | | 132m 17s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.datanode.TestBlockScanner |
| JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.TestEncryptedTransfer |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12785269/HDFS-9260.017.patch |
| JIRA
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124467#comment-15124467 ]

Colin Patrick McCabe commented on HDFS-9260:

Thanks, [~sfriberg].

{code}
  if (!storage.treeSetCompact(storageInfoDefragmentTimeout)) {
    // Compaction timed out, reset iterator to continue with
    // the same storage next iteration.
    i += 2;
  }
  LOG.info("StorageInfo TreeSet defragmented {} : {}",
      storage.getStorageID(), storage.treeSetFillRatio());
}
{code}

Hmm. The comment says "reset iterator." Is this intended to subtract 2 from i, rather than add? Also, shouldn't we log a message when compaction times out? Right now we log the success message whether or not compaction actually succeeded, right?
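Both questions above can be illustrated with a minimal sketch. It is not the patch's code: {{CompactableStorage}}, {{DefragmentLoop}} and the {{maxAttempts}} bound are hypothetical names invented here, standing in for the storage's {{treeSetCompact}} / {{treeSetFillRatio}} calls. The index only advances after a successful compaction, and a timeout gets its own log line instead of the success message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a storage whose tree set can be compacted. */
interface CompactableStorage {
  String getStorageID();

  /** Returns true if compaction finished, false if it hit the timeout. */
  boolean compact(long timeoutMs);

  double fillRatio();
}

final class DefragmentLoop {
  /**
   * Compacts each storage in turn. When a compaction times out, the index is
   * left unchanged so the same storage is retried on the next iteration.
   * maxAttempts bounds the total number of compact() calls (an assumption of
   * this sketch, so the loop cannot spin forever). Returns the log lines.
   */
  static List<String> run(List<CompactableStorage> storages, long timeoutMs,
      int maxAttempts) {
    List<String> log = new ArrayList<>();
    int i = 0;
    for (int attempt = 0; attempt < maxAttempts && i < storages.size(); attempt++) {
      CompactableStorage storage = storages.get(i);
      if (!storage.compact(timeoutMs)) {
        // Timed out: say so, and do not advance to the next storage.
        log.add("defragmentation timed out " + storage.getStorageID());
        continue;
      }
      // Only reached when compaction actually completed.
      log.add("defragmented " + storage.getStorageID() + " : " + storage.fillRatio());
      i++;
    }
    return log;
  }
}
```

Whether the real loop should retry immediately or on the next scheduled pass is a separate design choice; the sketch only shows the index handling and the split log messages.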
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124260#comment-15124260 ]

Colin Patrick McCabe commented on HDFS-9260:

Hmm. Let's make the compaction time a tunable, and then it can be adjusted if 4ms is too long. I think it's a pretty reasonable default, given that we've used it as such in the past.

Agree that we could make the compaction more efficient and low-latency. Let's do that in a follow-on JIRA so that we can get this patch into trunk and get some validation of the idea.
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124049#comment-15124049 ]

Jing Zhao commented on HDFS-9260:

Thanks for updating the patch, [~sfriberg]. Yeah, I think we can do #4 and #7 as follow-on.

For compaction, do you think we can track the total number of nodes in a treeset? We already know the total number of entries (through size). If we also track the total number of nodes and keep it updated along with the create/delete Node operations, the fill ratio calculation becomes O(1) per tree and the read lock holding time can be greatly decreased. Holding the read lock only around the innermost iteration should also work, I think.

For the second part, when doing the real compaction, I do not currently have an exact time limit in mind. Maybe we can first use 4ms as mentioned by HDFS-9198. It would be helpful if [~daryn] could comment here.
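The node-count idea can be sketched as follows. This is an illustration under assumptions, not the patch's implementation: {{FillRatioTracker}} and its hook methods are hypothetical, and it assumes every node has the same fixed capacity of entries, which may not match the actual tree set.

```java
/**
 * Hypothetical sketch: keeps entry and node counts up to date on every
 * add/remove and node create/delete, so the fill ratio needs no tree walk.
 */
final class FillRatioTracker {
  private final int nodeCapacity; // assumed fixed number of entry slots per node
  private long entries;
  private long nodes;

  FillRatioTracker(int nodeCapacity) {
    if (nodeCapacity <= 0) {
      throw new IllegalArgumentException("nodeCapacity must be positive");
    }
    this.nodeCapacity = nodeCapacity;
  }

  void onNodeCreated() { nodes++; }

  void onNodeDeleted() { nodes--; }

  void onEntryAdded() { entries++; }

  void onEntryRemoved() { entries--; }

  /** O(1): used slots divided by allocated slots; an empty tree counts as full. */
  double fillRatio() {
    if (nodes == 0) {
      return 1.0;
    }
    return (double) entries / ((double) nodes * nodeCapacity);
  }
}
```

With a capacity of 4, two nodes and six entries, {{fillRatio()}} is 6 / 8 = 0.75. The cost is two extra counters per tree and an increment on each structural change.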
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122433#comment-15122433 ]

Staffan Friberg commented on HDFS-9260:

Thanks for the comments. Patch 15 (and 16) should address all of them. I did not change protected to private, as the two subclasses access some of the members directly.
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122460#comment-15122460 ]

Staffan Friberg commented on HDFS-9260:

Hi [~jingzhao],

Thank you for your comments! Updated with the patch (version 16).

1. Done, moved to context.
2. Done.
3. Done, removed.
4. I have started to look at this multiple times while working on the patch, but have so far failed to find a simple way to separate it. The remove methods are so deeply linked when removing a block that I can't really figure out a clean way to lift it out, and even if it were possible, it would in itself be a fairly large change, I believe. Let me know if you have any ideas.
5. Done, looking up directly in the map with a new Block(replicaID).
6. Done, removed.
7. The reason for duplicating it is basically to avoid having the NN allocate 4 LinkedLists for each block that is reported in an IBR. Potentially one could change the fullBR to not rely on lists and simply add/remove as it finds entries. Two issues need to be thought about for this: how should logging be handled, since some counting is done as part of the number of handled blocks, and is it better to have multiple loops with a smaller code footprint than to expand the already large one with even more code to handle each case directly? I agree with you that having the two code paths is bad, but I think the reduction in allocation for IBRs could be worth it.
8. Done, I do the same checks I do in removeLeft/Right.
9, 10. Good point. Is it required to hold the read lock around the loops, or would it be enough to hold it just around the innermost iteration that calculates the fragmentation for a storage? That would help reduce the time significantly for the first iteration. I need to think a bit more about an abort mechanism for the second part, when actually doing the defragmentation. What is an OK time limit? I saw 4ms being mentioned in HDFS-9198.
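The narrower locking asked about in points 9 and 10 might look like the sketch below. It is only an illustration: {{FragmentationScan}} is a hypothetical helper, the generic item type stands in for a storage, and the {{fillRatio}} function stands in for whatever computes the fragmentation of one storage. It assumes the list being iterated is a snapshot that is safe to walk without the lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.ToDoubleFunction;

final class FragmentationScan {
  /**
   * Returns the items whose fill ratio is below the threshold. The read lock
   * is taken and released once per item, so a waiting writer can get in
   * between two items instead of waiting for the whole scan to finish.
   */
  static <T> List<T> findFragmented(List<T> storages,
      ToDoubleFunction<T> fillRatio, double threshold,
      ReentrantReadWriteLock lock) {
    List<T> fragmented = new ArrayList<>();
    for (T storage : storages) {
      double ratio;
      lock.readLock().lock();
      try {
        // Only the per-storage calculation runs under the lock.
        ratio = fillRatio.applyAsDouble(storage);
      } finally {
        lock.readLock().unlock();
      }
      if (ratio < threshold) {
        fragmented.add(storage);
      }
    }
    return fragmented;
  }
}
```

The trade-off is that the scan no longer sees one consistent view across all storages: a storage can change between the time its ratio is read and the time it is compacted, so the ratio would need to be rechecked under the write lock before defragmenting.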
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122693#comment-15122693 ]

Hadoop QA commented on HDFS-9260:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 15 new or modified test files. |
| +1 | mvninstall | 6m 49s | trunk passed |
| +1 | compile | 0m 41s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 43s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 28s | trunk passed |
| +1 | mvnsite | 0m 55s | trunk passed |
| +1 | mvneclipse | 0m 14s | trunk passed |
| +1 | findbugs | 1m 54s | trunk passed |
| +1 | javadoc | 1m 6s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 41s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 46s | the patch passed |
| +1 | compile | 0m 38s | the patch passed with JDK v1.8.0_66 |
| +1 | cc | 0m 38s | the patch passed |
| +1 | javac | 0m 38s | the patch passed |
| +1 | compile | 0m 39s | the patch passed with JDK v1.7.0_91 |
| +1 | cc | 0m 39s | the patch passed |
| +1 | javac | 0m 39s | the patch passed |
| -1 | checkstyle | 0m 26s | hadoop-hdfs-project/hadoop-hdfs: patch generated 5 new + 687 unchanged - 11 fixed = 692 total (was 698) |
| +1 | mvnsite | 0m 50s | the patch passed |
| +1 | mvneclipse | 0m 11s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 2m 5s | the patch passed |
| +1 | javadoc | 1m 7s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 43s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 65m 3s | hadoop-hdfs in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 63m 43s | hadoop-hdfs in the patch failed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 21s | Patch does not generate ASF License warnings. |
| | | 154m 30s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockManagerSafeMode |
| | hadoop.hdfs.server.datanode.TestBlockScanner |
| JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
| | hadoop.hdfs.TestDFSUpgradeFromImage |

|| Subsystem || Report/Notes ||
| Docker |
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122898#comment-15122898 ]

Colin Patrick McCabe commented on HDFS-9260:

Thanks, [~sfriberg].

bq. 7. The reason for duplicating it is basically to avoid having the NN allocate 4 LinkedLists for each block that is reported in an IBR. Potentially one could change the fullBR to not rely on lists and simply add/remove as it finds entries. Two issues need to be thought about for this: how should logging be handled, since some counting is done as part of the number of handled blocks; and is it better to have multiple loops with a smaller code footprint than to expand the already large one with even more code to handle each case directly? I agree with you that having the two code paths is bad, but I think the reduction in allocation for IBRs could be worth it.

Yeah, this seems like something that should be done in a follow-on JIRA.

{code}
+  public static final String DFS_NAMENODE_STORAGEINFO_EFFICIENCY_INTERVAL_MS_KEY
+      = "dfs.namenode.storageinfo.efficiency.interval.ms";
{code}
I appreciate that this now has time units associated with it, but I feel that "defragmenter" should be somewhere in the name. This is basically a configuration key for the defragmenter.
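As an illustration of the naming point, here is a minimal sketch of a key that carries both the component name and the time unit. It uses {{java.util.Properties}} as a stand-in for Hadoop's configuration class; the key string follows the reviewer's suggestion and the default value is an arbitrary placeholder, so neither should be read as what the patch finally used.

```java
import java.util.Properties;

// Sketch only: Properties stands in for the real configuration object.
final class DefragmenterConfigSketch {
  // Hypothetical key name: component ("defragmenter") plus unit suffix (".ms").
  static final String INTERVAL_MS_KEY =
      "dfs.namenode.storageinfo.defragmenter.interval.ms";
  // Placeholder default chosen for illustration, not taken from the patch.
  static final long INTERVAL_MS_DEFAULT = 600_000L;

  // Returns the configured scan interval in milliseconds, or the default
  // when the key is absent.
  static long intervalMs(Properties conf) {
    String raw = conf.getProperty(INTERVAL_MS_KEY);
    return raw == null ? INTERVAL_MS_DEFAULT : Long.parseLong(raw.trim());
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(intervalMs(conf));
    conf.setProperty(INTERVAL_MS_KEY, "1500");
    System.out.println(intervalMs(conf));
  }
}
```

With the unit in the key, a reader of the configuration file does not need to open the source to learn whether 600 means seconds or milliseconds.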
> Improve performance and GC friendliness of startup and FBRs
> ---
>
> Key: HDFS-9260
> URL: https://issues.apache.org/jira/browse/HDFS-9260
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode, namenode, performance
> Affects Versions: 2.7.1
> Reporter: Staffan Friberg
> Assignee: Staffan Friberg
> Attachments: FBR processing.png, HDFS Block and Replica Management 20151013.pdf, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch, HDFS-7435.007.patch, HDFS-9260.008.patch, HDFS-9260.009.patch, HDFS-9260.010.patch, HDFS-9260.011.patch, HDFS-9260.012.patch, HDFS-9260.013.patch, HDFS-9260.014.patch, HDFS-9260.015.patch, HDFS-9260.016.patch, HDFSBenchmarks.zip, HDFSBenchmarks2.zip
>
> This patch changes the data structures used for BlockInfos and Replicas to keep them sorted. This allows faster and more GC friendly handling of full block reports.
> Would like to hear people's feedback on this change.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120593#comment-15120593 ]

Hadoop QA commented on HDFS-9260:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 19 new or modified test files. |
| +1 | mvninstall | 9m 44s | trunk passed |
| +1 | compile | 1m 22s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 1m 6s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 36s | trunk passed |
| +1 | mvnsite | 1m 12s | trunk passed |
| +1 | mvneclipse | 0m 17s | trunk passed |
| +1 | findbugs | 2m 32s | trunk passed |
| +1 | javadoc | 1m 44s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 2m 31s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 1m 6s | the patch passed |
| +1 | compile | 1m 19s | the patch passed with JDK v1.8.0_66 |
| +1 | cc | 1m 19s | the patch passed |
| +1 | javac | 1m 19s | the patch passed |
| +1 | compile | 0m 57s | the patch passed with JDK v1.7.0_91 |
| +1 | cc | 0m 57s | the patch passed |
| +1 | javac | 0m 57s | the patch passed |
| -1 | checkstyle | 0m 33s | hadoop-hdfs-project/hadoop-hdfs: patch generated 6 new + 705 unchanged - 11 fixed = 711 total (was 716) |
| +1 | mvnsite | 1m 11s | the patch passed |
| +1 | mvneclipse | 0m 15s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 2m 47s | the patch passed |
| +1 | javadoc | 1m 40s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 2m 30s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 97m 7s | hadoop-hdfs in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 85m 48s | hadoop-hdfs in the patch failed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 24s | Patch does not generate ASF License warnings. |
| | | 220m 12s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.datanode.TestBlockScanner |
| | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
| | hadoop.hdfs.security.TestDelegationTokenForProxyUser |
| | hadoop.hdfs.TestFileAppend |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
| |
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120237#comment-15120237 ]

Jing Zhao commented on HDFS-9260:

Thanks for the great work, [~sfriberg]. The patch looks good overall. Some comments and questions:
# For {{DatanodeProtocol#blockReport}}, can we move "boolean sorted" inside of BlockReportContext and make its default value false? This way we can keep backward compatibility between DN and NN (though allowing an old DN to send reports to the NN can lead to bad performance).
# Need to update the javadoc of ReplicaMap#map.
# {{DatanodeStorageInfo#addBlockInitial(BlockInfo)}} is never called.
# In {{BlockManager#removeBlocksAssociatedTo}} and {{removeZombieReplicas}}, it looks like we're removing the block from the DatanodeStorageInfo twice? Can we modify {{removeStoredBlock}} to avoid the second remove op?
{code}
for (DatanodeStorageInfo storage : node.getStorageInfos()) {
  final Iterator<BlockInfo> it = storage.getBlockIterator();
  while (it.hasNext()) {
    BlockInfo block = it.next();
    // DatanodeStorageInfo must be removed using the iterator to avoid
    // ConcurrentModificationException in the underlying storage
    it.remove();
    removeStoredBlock(block, node);
  }
}
{code}
# In {{reportDiffSorted}}, since we already convert the replica id at the beginning, we can pass in the replicaID when we call {{getStoredBlock}}. Or we can directly call {{getStoredBlock}} at the beginning.
{code}
if (BlockIdManager.isStripedBlockID(replicaID)
    && (!hasNonEcBlockUsingStripedID
        || !blocksMap.containsBlock(replica))) {
  replicaID = BlockIdManager.convertToStripedID(replicaID);
}
{code}
# The following TODO has been deleted from the current trunk.
{code}
if (invalidateBlocks.contains(dn, replica)) {
  /*
   * TODO: following assertion is incorrect, see HDFS-2668 assert
   * storedBlock.findDatanode(dn) < 0 : "Block " + block +
   * " in recentInvalidatesSet should not appear in DN " + dn;
   */
  return;
}
{code}
# It's better to avoid duplicated code between {{processAndHandleReportedBlock}} and {{reportDiffSortedInner}}. The logic for processing a reported block is complicated. Having copies in two different places causes extra maintenance burden.
# In TreeSet#removeElementAt, the compaction currently only merges the node into next/prev. Do you also want to check whether we can merge next/prev into the current node if next/prev's size is 1? (It looks like Apache Harmony's implementation has this extra check.)
# Maybe we should have an upper bound on the total number of storage compactions done in each iteration? The next iteration can then continue from the stopping point of the previous iteration. Considering that calculating the fill ratio also needs to go through the whole tree, each compaction iteration will go through at least (total_number_blocks * replication_factor / 64) tree nodes. This may be a big workload, and even in the best scenario (i.e., no compaction is necessary) the read lock will still be held for a while.
# It will also be helpful if you can post performance numbers about the compaction.
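The suggestion in item 9 (cap the work per iteration and resume from the stopping point) can be sketched roughly as below. This is only an illustration under assumptions: {{BoundedScanner}}, {{maxPerPass}}, and the array of fill ratios are hypothetical names, and the real defragmenter walks storages and tree nodes rather than a flat array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: examine at most maxPerPass entries per call and
// resume from the stopping point on the next call, wrapping at the end.
final class BoundedScanner {
  private final int maxPerPass;
  private int cursor = 0; // index of the next entry to examine

  BoundedScanner(int maxPerPass) {
    if (maxPerPass <= 0) {
      throw new IllegalArgumentException("maxPerPass must be positive");
    }
    this.maxPerPass = maxPerPass;
  }

  // Returns the indices examined in this pass whose fill ratio is below
  // the threshold, i.e. the candidates for compaction.
  List<Integer> scanOnce(double[] fillRatios, double threshold) {
    List<Integer> candidates = new ArrayList<>();
    int n = fillRatios.length;
    int budget = Math.min(maxPerPass, n);
    for (int i = 0; i < budget; i++) {
      int idx = (cursor + i) % n;
      if (fillRatios[idx] < threshold) {
        candidates.add(idx);
      }
    }
    // Remember where to continue next time.
    cursor = (n == 0) ? 0 : (cursor + budget) % n;
    return candidates;
  }

  public static void main(String[] args) {
    BoundedScanner scanner = new BoundedScanner(2);
    double[] ratios = {0.9, 0.2, 0.8, 0.1, 0.3};
    for (int pass = 0; pass < 3; pass++) {
      System.out.println(scanner.scanOnce(ratios, 0.5));
    }
  }
}
```

In a setting like the one described, the caller would take the lock only around each bounded pass, so no single pass holds it for a full traversal.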
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118269#comment-15118269 ]

Colin Patrick McCabe commented on HDFS-9260:

bq. Should I convert storages field to private? (The triplets field was protected)

That sounds like a good idea. We have functions to manipulate these fields, so the subclasses don't need to directly poke at the fields. If that involves code changes to the subclasses, though, let's just do it in a follow-on JIRA. This is one case where checkstyle is not that useful, since it's warning about a problem that already existed before this patch (and isn't even really a "problem," just an infelicity).

{code}
-  public static final String DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_KEY = "dfs.namenode.replication.max-streams-hard-limit";
+  public static final String DFS_NAMENODE_REPLICATION_STREAMS_HARD_LIMIT_KEY =
+      "dfs.namenode.replication.max-streams-hard-limit";
{code}
Can we skip this change? It's not really part of this work and it makes the diff bigger.

{code}
+  public static final String DFS_NAMENODE_STORAGEINFO_EFFICIENCY_INTERVAL_KEY
+      = "dfs.namenode.storageinfo.efficiency.interval";
+  public static final int DFS_NAMENODE_STORAGEINFO_EFFICIENCY_INTERVAL_DEFAULT
+      = 600;
{code}
Should be something like {{dfs.namenode.storageinfo.defragmenter.interval.ms}} to indicate that it's a scan interval, and that it's in milliseconds.

{code}
-      String poolId, StorageBlockReport[] reports, BlockReportContext context)
+      String poolId, StorageBlockReport[] reports, boolean sorted,
+      BlockReportContext context)
{code}
Another unnecessary diff.

{code}
+    Object[] old = storages;
+    storages = new DatanodeStorageInfo[(last+num)];
+    System.arraycopy(old, 0, storages, 0, last);
{code}
Now "old" can have type {{DatanodeStorageInfo[]}} rather than {{Object[]}}, right?

{code}
+    storageInfoMonitorThread.interrupt();
{code}
Maybe this should be something like {{storageInfoDefragmenterThread}}? "monitor" suggests something like the block scanner, not defragmentation (at least in my mind?)

{code}
import org.apache.hadoop.hdfs.util.TreeSet;
{code}
I think this might be less confusing if you called it {{ChunkedTreeSet}}. If I were a new developer looking at the code (or even an experienced developer), I wouldn't really expect us to be using something called {{TreeSet}} that is actually completely different from {{java.util.TreeSet}}.

{code}
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto
@@ -246,6 +246,7 @@ message BlockReportRequestProto {
   required string blockPoolId = 2;
   repeated StorageBlockReportProto reports = 3;
   optional BlockReportContextProto context = 4;
+  optional bool sorted = 5 [default = false];
 }
{code}
The other fields in {{BlockReportRequestProto}} have comments explaining what they are. Let's add one for "sorted".
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115356#comment-15115356 ]

Staffan Friberg commented on HDFS-9260:

Fixed checkstyle on TreeSet.

Should I convert storages field to private? (The triplets field was protected)
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15115578#comment-15115578 ]

Hadoop QA commented on HDFS-9260:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 19 new or modified test files. |
| +1 | mvninstall | 7m 43s | trunk passed |
| +1 | compile | 0m 40s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 43s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 24s | trunk passed |
| +1 | mvnsite | 0m 51s | trunk passed |
| +1 | mvneclipse | 0m 14s | trunk passed |
| +1 | findbugs | 1m 55s | trunk passed |
| +1 | javadoc | 1m 7s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 47s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 46s | the patch passed |
| +1 | compile | 0m 38s | the patch passed with JDK v1.8.0_66 |
| +1 | cc | 0m 38s | the patch passed |
| +1 | javac | 0m 38s | the patch passed |
| +1 | compile | 0m 39s | the patch passed with JDK v1.7.0_91 |
| +1 | cc | 0m 39s | the patch passed |
| +1 | javac | 0m 39s | the patch passed |
| -1 | checkstyle | 0m 24s | hadoop-hdfs-project/hadoop-hdfs: patch generated 2 new + 704 unchanged - 12 fixed = 706 total (was 716) |
| +1 | mvnsite | 0m 50s | the patch passed |
| +1 | mvneclipse | 0m 11s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 2m 4s | the patch passed |
| +1 | javadoc | 1m 4s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 44s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 51m 34s | hadoop-hdfs in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 50m 34s | hadoop-hdfs in the patch failed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 22s | Patch does not generate ASF License warnings. |
| | | 128m 41s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.datanode.TestBlockScanner |
| | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
| JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.server.datanode.TestFsDatasetCache |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL |
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15112866#comment-15112866 ]

Colin Patrick McCabe commented on HDFS-9260:

bq. This comment is simply copied from the method "processAndHandleReportedBlock" in the same class and not mine (doesn't show up since I didn't edit that method). I kept it as part of the structure since I wanted to make sure the algorithm behaves in the same way. So it might be best to address it in a separate bug.

That's fair.

bq. Yes, this is a change in behavior compared to earlier. I started down this path since add on a Set doesn't replace, which unfortunately doesn't match what the Map API does. I added a "replace" method in the class to be used when a replace behavior is needed and went through the code to ensure the right method is called when needed. Not really happy about this choice; perhaps a cleaner way would be to have an addWithReplace method on the TreeSet and keep the old add behavior of the ReplicaMap. I believe it would reduce the size of the patch and only add one "ugly" method on the TreeSet.

That's a good idea. Yes, please, let's have an {{addWithReplace}} method. Otherwise, there is a risk that we will accidentally change the semantics of something in an incorrect way (I notice a lot of cases where "add" turns into "replace" in this patch and it makes me nervous).

I had some minor conflicts applying to trunk; I guess it needs a rebase anyway.

Thanks, [~sfriberg]. Very exciting to see this make progress.
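To make the add-versus-replace difference concrete, here is a minimal sketch using only {{java.util.TreeSet}}, not the patch's own TreeSet or ReplicaMap. The {{Replica}} class and the {{addWithReplace}} helper are hypothetical stand-ins; the point is only that {{Set.add}} keeps an existing equal element, whereas a replace-style insert swaps it out and hands back the old one, which is closer to what {{Map.put}} does.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-in for a replica: ordering and equality use blockId only.
final class Replica {
  final long blockId;
  final long genStamp;

  Replica(long blockId, long genStamp) {
    this.blockId = blockId;
    this.genStamp = genStamp;
  }
}

final class AddVsReplaceSketch {
  static final Comparator<Replica> BY_ID =
      Comparator.comparingLong((Replica r) -> r.blockId);

  // Hypothetical helper: insert r, displacing any element that compares
  // equal to it. Returns the displaced element, or null if there was none.
  static Replica addWithReplace(TreeSet<Replica> set, Replica r) {
    Replica old = set.floor(r); // greatest element <= r, if any
    if (old != null && BY_ID.compare(old, r) == 0) {
      set.remove(old);
    } else {
      old = null;
    }
    set.add(r);
    return old;
  }

  public static void main(String[] args) {
    TreeSet<Replica> set = new TreeSet<>(BY_ID);
    set.add(new Replica(1L, 100L));

    // Plain add: the set already holds an equal element, so nothing changes.
    boolean inserted = set.add(new Replica(1L, 200L));
    System.out.println(inserted + " " + set.first().genStamp);

    // Replace-style insert: the old element is returned and the new one stored.
    Replica old = addWithReplace(set, new Replica(1L, 300L));
    System.out.println(old.genStamp + " " + set.first().genStamp);
  }
}
```

Keeping the two operations as separately named methods makes each call site state which semantics it wants, which is the concern raised about "add" silently turning into "replace".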
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113195#comment-15113195 ]

Hadoop QA commented on HDFS-9260:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| -1 | patch | 0m 5s | HDFS-9260 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12783927/HDFS-9260.012.patch |
| JIRA Issue | HDFS-9260 |
| Powered by | Apache Yetus 0.2.0-SNAPSHOT http://yetus.apache.org |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/14210/console |

This message was automatically generated.
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15113600#comment-15113600 ]

Hadoop QA commented on HDFS-9260:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 19 new or modified test files. |
| +1 | mvninstall | 8m 45s | trunk passed |
| +1 | compile | 0m 48s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 50s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 28s | trunk passed |
| +1 | mvnsite | 1m 4s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 2m 15s | trunk passed |
| +1 | javadoc | 1m 16s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 2m 1s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 55s | the patch passed |
| +1 | compile | 0m 45s | the patch passed with JDK v1.8.0_66 |
| +1 | cc | 0m 45s | the patch passed |
| +1 | javac | 0m 45s | the patch passed |
| +1 | compile | 0m 46s | the patch passed with JDK v1.7.0_91 |
| +1 | cc | 0m 46s | the patch passed |
| +1 | javac | 0m 46s | the patch passed |
| -1 | checkstyle | 0m 26s | hadoop-hdfs-project/hadoop-hdfs: patch generated 4 new + 704 unchanged - 12 fixed = 708 total (was 716) |
| +1 | mvnsite | 1m 0s | the patch passed |
| +1 | mvneclipse | 0m 13s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 2m 27s | the patch passed |
| +1 | javadoc | 1m 18s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 2m 9s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 57m 25s | hadoop-hdfs in the patch failed with JDK v1.8.0_66. |
| +1 | unit | 53m 15s | hadoop-hdfs in the patch passed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 21s | Patch does not generate ASF License warnings. |
| | | 141m 31s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
| | hadoop.hdfs.server.datanode.TestDataNodeMetrics |
| | hadoop.hdfs.server.datanode.TestBlockScanner |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL |
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110816#comment-15110816 ]

Staffan Friberg commented on HDFS-9260:
---

Thanks for the comments, [~cmccabe]! I made all of your suggested changes except the two below, which need some further discussion.

{quote}
{code}
>         if (shouldPostponeBlocksFromFuture) {
>           // If the block is an out-of-date generation stamp or state,
>           // but we're the standby, we shouldn't treat it as corrupt,
>           // but instead just queue it for later processing.
>           // TODO: Pretty confident this should be s/storedBlock/block below,
>           // since we should be postponing the info of the reported block, not
>           // the stored block. See HDFS-6289 for more context.
>           queueReportedBlock(storageInfo, storedBlock, reportedState,
>               QUEUE_REASON_CORRUPT_STATE);
>         } else {
{code}
If we're really confident that this should be "block" rather than "storedBlock", let's fix it.
{quote}

This comment is copied from the method "processAndHandleReportedBlock" in the same class and is not mine (it doesn't show up in the diff since I didn't edit that method). I kept it as part of the structure because I wanted to make sure the algorithm behaves in the same way, so it might be best to address it in a separate bug.

{quote}
{code}
   /**
    * Add a replica's meta information into the map
    *
    * @param bpid block pool id
    * @param replicaInfo a replica's meta information
-   * @return previous meta information of the replica
+   * @return true if inserted into the set
    * @throws IllegalArgumentException if the input parameter is null
    */
-  ReplicaInfo add(String bpid, ReplicaInfo replicaInfo) {
+  boolean add(String bpid, ReplicaInfo replicaInfo) {
{code}
I would like to see some clear comments in this function on what happens if there is already a copy of the replicaInfo in the ReplicaMap. I might be wrong, but based on my reading of TreeSet.java, it seems like the new entry won't be added, which is a behavior change from what we did earlier. Unless I'm missing something, this doesn't seem quite right since the new ReplicaInfo might have a different genstamp, etc.
{quote}

Yes, this is a change in behavior compared to earlier. I started down this path because add on a Set doesn't replace an existing element, which unfortunately doesn't match what the Map API does. I added a "replace" method to the class, to be used when replace behavior is needed, and went through the code to ensure the right method is called in each case. I'm not really happy with this choice; perhaps a cleaner way would be to have an addWithReplace method on the TreeSet and keep the old add behavior of the ReplicaMap. I believe that would reduce the size of the patch and add only one "ugly" method to the TreeSet.
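The add-versus-replace difference discussed above can be illustrated with the JDK collections. This is only a sketch: {{java.util.TreeSet}} and {{java.util.HashMap}} stand in for the patch's own TreeSet class and the old ReplicaMap, and the {{Replica}} class and this {{addWithReplace}} helper are hypothetical names invented for the illustration, not code from the patch.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical stand-in for ReplicaInfo: ordered by block id only.
final class Replica {
  final long blockId;
  final long genStamp;

  Replica(long blockId, long genStamp) {
    this.blockId = blockId;
    this.genStamp = genStamp;
  }
}

final class AddVersusPutDemo {
  static final Comparator<Replica> BY_BLOCK_ID =
      Comparator.comparingLong(r -> r.blockId);

  // Set.add leaves the set unchanged when an equal element is present,
  // so the older generation stamp survives.
  static long genStampAfterSetAdd() {
    TreeSet<Replica> set = new TreeSet<>(BY_BLOCK_ID);
    set.add(new Replica(1L, 100L));
    set.add(new Replica(1L, 200L)); // returns false, nothing is replaced
    return set.first().genStamp;
  }

  // Map.put overwrites the previous value for the same key.
  static long genStampAfterMapPut() {
    Map<Long, Replica> map = new HashMap<>();
    map.put(1L, new Replica(1L, 100L));
    map.put(1L, new Replica(1L, 200L));
    return map.get(1L).genStamp;
  }

  // Hypothetical helper giving a set Map-like replace semantics.
  static boolean addWithReplace(TreeSet<Replica> set, Replica replica) {
    boolean replaced = set.remove(replica); // removes the equal element, if any
    set.add(replica);
    return replaced;
  }

  static long genStampAfterAddWithReplace() {
    TreeSet<Replica> set = new TreeSet<>(BY_BLOCK_ID);
    addWithReplace(set, new Replica(1L, 100L));
    addWithReplace(set, new Replica(1L, 200L));
    return set.first().genStamp;
  }

  public static void main(String[] args) {
    System.out.println("Set.add keeps genstamp:        " + genStampAfterSetAdd());
    System.out.println("Map.put keeps genstamp:        " + genStampAfterMapPut());
    System.out.println("addWithReplace keeps genstamp: " + genStampAfterAddWithReplace());
  }
}
```

With a helper of this kind, callers that relied on the old Map behavior would see the newer genstamp, while a plain add keeps the original entry.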
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1552#comment-1552 ]

Hadoop QA commented on HDFS-9260:
-

| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 19 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 1s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s {color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 26s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 7 new + 844 unchanged - 12 fixed = 851 total (was 856) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 16s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs generated 3 new + 0 unchanged - 0 fixed = 3 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 48s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 62m 6s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_66. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 63m 15s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_91. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 153m 21s {color} | {color:black} {color} |
\\ \\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs-project/hadoop-hdfs |
| | org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo$1.next() can't throw NoSuchElementException At BlockInfo.java:At BlockInfo.java:[line 117] |
| | Load of known null value in
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15102574#comment-15102574 ]

Colin Patrick McCabe commented on HDFS-9260:

Thanks for working on this, [~sfriberg]. It looks promising.

{code}
>   @Override
>   boolean removeStorage(DatanodeStorageInfo storage) {
>     int dnIndex = findStorageInfoFromEnd(storage);
>     if (dnIndex < 0) { // the node is not found
>       return false;
>     }
>     // set the triplet to null
>     setStorageInfo(dnIndex, null);
>     indices[dnIndex] = -1;
>     return true;
>   }
{code}
This still refers to "the triplet" but there are no more triplets, right? There are some other comments referencing "triplets" in {{BlockInfoStriped}} that should be fixed as well.

What is the strategy for shrinking {{BlockInfo#storages}}? It seems like right now {{setStorageInfo(, null)}} will create a "hole" in the array, but it is never actually shrunk.

{code}
>         // Remove here for now as removeStoredBlock will do it otherwise
>         // and cause concurrent modification exception
{code}
This comment could be clearer. How about "we must remove the block via the iterator"?

{code}
>         if (shouldPostponeBlocksFromFuture) {
>           // If the block is an out-of-date generation stamp or state,
>           // but we're the standby, we shouldn't treat it as corrupt,
>           // but instead just queue it for later processing.
>           // TODO: Pretty confident this should be s/storedBlock/block below,
>           // since we should be postponing the info of the reported block, not
>           // the stored block. See HDFS-6289 for more context.
>           queueReportedBlock(storageInfo, storedBlock, reportedState,
>               QUEUE_REASON_CORRUPT_STATE);
>         } else {
{code}
If we're really confident that this should be "block" rather than "storedBlock", let's fix it.

{code}
@@ -122,8 +91,9 @@ BlockInfo addBlockCollection(BlockInfo b, BlockCollection bc) {
    */
   void removeBlock(Block block) {
     BlockInfo blockInfo = blocks.remove(block);
-    if (blockInfo == null)
+    if (blockInfo == null) {
       return;
+    }
...
@@ -191,8 +177,9 @@ int numNodes(Block b) {
    */
   boolean removeNode(Block b, DatanodeDescriptor node) {
     BlockInfo info = blocks.get(b);
-    if (info == null)
+    if (info == null) {
       return false;
+    }
{code}
Let's try to avoid "no-op" changes like this in this patch, since it's already pretty big. We can fix whitespace and so forth in other JIRAs to avoid creating confusion about what was changed here.

{code}
    return set != null ? set.get(blockId, LONG_AND_BLOCK_COMPARATOR) : null;
{code}
This might be simpler as:
{code}
    if (set == null) {
      return null;
    }
    return set.get(blockId, LONG_AND_BLOCK_COMPARATOR);
{code}

{code}
   /**
    * Add a replica's meta information into the map
    *
    * @param bpid block pool id
    * @param replicaInfo a replica's meta information
-   * @return previous meta information of the replica
+   * @return true if inserted into the set
    * @throws IllegalArgumentException if the input parameter is null
    */
-  ReplicaInfo add(String bpid, ReplicaInfo replicaInfo) {
+  boolean add(String bpid, ReplicaInfo replicaInfo) {
{code}
I would like to see some clear comments in this function on what happens if there is already a copy of the replicaInfo in the ReplicaMap. I might be wrong, but based on my reading of TreeSet.java, it seems like the new entry won't be added, which is a behavior change from what we did earlier. Unless I'm missing something, this doesn't seem quite right since the new ReplicaInfo might have a different genstamp, etc.
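The point about removing "via the iterator" can be shown with plain JDK collections. This is a minimal sketch using {{java.util.ArrayList}} rather than the block collections touched by the patch; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

final class IteratorRemovalDemo {
  // Safe: the iterator itself performs the removal, so its internal
  // modification count stays in sync with the list.
  static List<Long> removeOddViaIterator(List<Long> ids) {
    List<Long> copy = new ArrayList<>(ids);
    Iterator<Long> it = copy.iterator();
    while (it.hasNext()) {
      if (it.next() % 2 != 0) {
        it.remove();
      }
    }
    return copy;
  }

  // Unsafe: removing through the collection while a for-each loop is
  // iterating over it makes the next call to next() fail fast.
  static boolean removeDirectlyThrows(List<Long> ids) {
    List<Long> copy = new ArrayList<>(ids);
    try {
      for (Long id : copy) {
        if (id % 2 != 0) {
          copy.remove(id); // remove(Object), not remove(int)
        }
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    List<Long> ids = List.of(1L, 2L, 3L, 4L);
    System.out.println("via iterator: " + removeOddViaIterator(ids));
    System.out.println("direct removal threw: " + removeDirectlyThrows(ids));
  }
}
```

The same reasoning is presumably why the patch removes the block inside the loop rather than letting removeStoredBlock do it.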
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15027676#comment-15027676 ]

Staffan Friberg commented on HDFS-9260:
---

Added a new benchmark that does IBRs on a blockmap/datastorageinfo that contains 2M entries and deletes/re-adds 20% of those entries. The updates are spread out over multiple IBRs, and each IBR contains between 50 and 350 changed blocks. The IntMapping version is again the patch from HDFS-6658.

{noformat}
Some further benchmarking of Incremental BR.

==> benchmarks_trunkMarch11_intMapping.jar.output <==
Benchmark                                  Mode  Cnt     Score    Error  Units
IncrementalBlockReport.receivedAndDeleted  avgt   50  3969.207 ± 14.979  ms/op

==> benchmarks_treeset_baseline.jar.output <==
Benchmark                                  Mode  Cnt     Score    Error  Units
IncrementalBlockReport.receivedAndDeleted  avgt   50   387.936 ± 25.634  ms/op

==> benchmarks_treeset.jar.output <==
Benchmark                                  Mode  Cnt     Score    Error  Units
IncrementalBlockReport.receivedAndDeleted  avgt   50  1205.779 ± 75.464  ms/op
{noformat}
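For readers who want a feel for the workload shape described above, here is a rough sketch. It is not the benchmark from HDFSBenchmarks.zip: it uses {{java.util.TreeSet<Long>}} as a stand-in for the block structures, assumes each simulated IBR deletes and then re-adds its own batch, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;

final class IbrWorkloadSketch {
  // Builds a sorted set holding block ids 0..entries-1.
  static TreeSet<Long> newBlockSet(int entries) {
    TreeSet<Long> blocks = new TreeSet<>();
    for (long id = 0; id < entries; id++) {
      blocks.add(id);
    }
    return blocks;
  }

  // Deletes and re-adds changedFraction of the entries, spread over
  // simulated IBRs of 50 to 350 blocks each. Returns the number of IBRs.
  static int simulate(TreeSet<Long> blocks, double changedFraction, long seed) {
    Random rnd = new Random(seed);
    List<Long> ids = new ArrayList<>(blocks);
    Collections.shuffle(ids, rnd);
    int toChange = (int) (ids.size() * changedFraction);
    int reports = 0;
    int pos = 0;
    while (pos < toChange) {
      // Batch size in [50, 350], capped by what is left to change.
      int batch = Math.min(50 + rnd.nextInt(301), toChange - pos);
      List<Long> changed = ids.subList(pos, pos + batch);
      for (Long id : changed) {
        blocks.remove(id); // "deleted" part of the report
      }
      for (Long id : changed) {
        blocks.add(id); // "received" part of the report
      }
      pos += batch;
      reports++;
    }
    return reports;
  }

  public static void main(String[] args) {
    // The benchmark described above used 2M entries; a smaller set keeps
    // this sketch light on memory.
    TreeSet<Long> blocks = newBlockSet(200_000);
    int reports = simulate(blocks, 0.20, 42L);
    System.out.println(reports + " simulated IBRs, " + blocks.size() + " blocks remain");
  }
}
```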
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990758#comment-14990758 ]

Staffan Friberg commented on HDFS-9260:
---

Hi Daryn,

Thanks for the comments and the additional data points. Interesting to learn more about the scale of HDFS instances. I wonder if the NN was running on older and slower hardware in my case compared to your setup; the cluster I was able to get my hands on for these runs has fairly old machines.

Adds of new blocks are relatively fast: since they will be at the far right of the tree, the number of lookups will be minimal. However, the current implementation only needs around two writes to insert something at the head/end of the list, and nothing with a more complicated data structure will be able to match that. It will be a question of trade-offs.

Also, to clarify, the microbenchmarks only measure the actual remove and insert of random values, not the whole process of copying files etc. I would expect the other parts to far outweigh the time it takes to update the data structures, so while the 4x sounds scary, it should be a minor part of the whole transaction.

I think the patch you are referring to is HDFS-6658. I applied it to the 3.0.0 branch from March 11, 2015, which is when the patch was created, and ran it on the same microbenchmarks I built to test my patch. I will attach the source code for the benchmarks so you can check that I used the right APIs for it to be comparable. From what I can tell, the benchmarks should do the same thing at a high level. The performance overhead for adding and removing is similar between our two implementations.

{noformat}
fbrAllExisting - Do a Full Block Report with the same 2M entries that are already registered for the Storage in the NN.
addRemoveBulk - Remove 32k random blocks from a StorageInfo that has 64k entries, then re-add them all.
addRemoveRandom - Remove and directly re-add a block from a Storage entry, repeat for 32k blocks from a StorageInfo with 64k blocks
iterate - Iterate and get blockID for 64k blocks associated with a particular StorageInfo

==> benchmarks_trunkMarch11_intMapping.jar.output <==
Benchmark                          Mode  Cnt    Score   Error  Units
FullBlockReport.fbrAllExisting     avgt   25  379.659 ± 5.463  ms/op
StorageInfoAccess.addRemoveBulk    avgt   25   16.426 ± 0.380  ms/op
StorageInfoAccess.addRemoveRandom  avgt   25   15.401 ± 0.196  ms/op
StorageInfoAccess.iterate          avgt   25    1.496 ± 0.004  ms/op

==> benchmarks_trunk_baseline.jar.output <==
Benchmark                          Mode  Cnt    Score   Error  Units
FullBlockReport.fbrAllExisting     avgt   25  288.974 ± 3.970  ms/op
StorageInfoAccess.addRemoveBulk    avgt   25    3.157 ± 0.046  ms/op
StorageInfoAccess.addRemoveRandom  avgt   25    2.815 ± 0.012  ms/op
StorageInfoAccess.iterate          avgt   25    0.788 ± 0.006  ms/op

==> benchmarks_trunk_treeset.jar.output <==
Benchmark                          Mode  Cnt    Score   Error  Units
FullBlockReport.fbrAllExisting     avgt   25  231.270 ± 3.450  ms/op
StorageInfoAccess.addRemoveBulk    avgt   25   11.596 ± 0.521  ms/op
StorageInfoAccess.addRemoveRandom  avgt   25   11.249 ± 0.101  ms/op
StorageInfoAccess.iterate          avgt   25    0.385 ± 0.010  ms/op
{noformat}

Do you have a good suggestion for some other perf test/stress test that would be good to try out? Is there any stress load on your end that it would be possible to try it out on?
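To put the "4x" figure in context, the ratios between the treeset and baseline scores reported above can be computed directly. The snippet below only restates the numbers from the benchmark output; the class and method names are hypothetical.

```java
final class BenchmarkRatios {
  // Ratio of a candidate's average time per operation to the baseline's.
  // Values above 1 mean the candidate is slower, below 1 faster.
  static double ratio(double candidateMsPerOp, double baselineMsPerOp) {
    return candidateMsPerOp / baselineMsPerOp;
  }

  public static void main(String[] args) {
    // Scores copied from benchmarks_trunk_treeset and benchmarks_trunk_baseline.
    System.out.printf("fbrAllExisting  treeset/baseline: %.2f%n", ratio(231.270, 288.974));
    System.out.printf("addRemoveBulk   treeset/baseline: %.2f%n", ratio(11.596, 3.157));
    System.out.printf("addRemoveRandom treeset/baseline: %.2f%n", ratio(11.249, 2.815));
    System.out.printf("iterate         treeset/baseline: %.2f%n", ratio(0.385, 0.788));
  }
}
```

On these numbers, add/remove is roughly 3.7x to 4x slower with the treeset, while the full block report and iteration are faster than the baseline.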
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14985524#comment-14985524 ]

Daryn Sharp commented on HDFS-9260:
---

I'll try to review the patch today. Only skimmed comments, and it's a big change. My initial questions:
# Performance. What impact does it have on FBRs? Especially startup.
# Time to initialize replication queue.
# Time to decommission.
# Does memory usage increase or decrease?
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986062#comment-14986062 ]

Staffan Friberg commented on HDFS-9260:
---

Hi Daryn,

Thanks for taking a look at the patch.

1. FBR and startup both improve; please see the attached PDF.
2. I will need to check what we do here (and whether I still have the old logs), but it doesn't feel like it should be affected.
3. We will be slightly slower when deleting a file or removing blocks. The current algorithm goes through the LightWeightGSet to first look up/remove each affected blockinfo, and after that removes it from the linked list. In my case it will be removed from the treeset, which requires a new lookup. However, while this is slower, I think the time that process takes is far outweighed by the time it takes to delete or redistribute blocks on all DNs. Deleting files with a large number of blocks seems to take on the order of hours, since we only send small parts of the total block list to each node on every heartbeat. I'm not too familiar with how aggressive the redistribution is in the event of a DN decommission.
4. It will decrease as long as the TreeSet is kept above a ~50% fill ratio, since the reference to each blockinfo is now a single pointer from the treeset instead of the doubly linked list.
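The ~50% break-even in point 4 follows from a simple reference count. The sketch below is a simplified model, assumed rather than taken from the patch: it counts only the per-block references (two for a doubly linked list, one array slot for the treeset) and ignores object headers and tree node overhead. All names are hypothetical.

```java
final class ReferenceCostModel {
  // Doubly linked list: one "previous" and one "next" reference per block.
  static double linkedListRefsPerBlock() {
    return 2.0;
  }

  // Array-backed sorted set: one slot per block, but only a fraction
  // fillRatio of the allocated slots is in use, so the effective cost
  // per stored block is 1 / fillRatio.
  static double treeSetRefsPerBlock(double fillRatio) {
    if (fillRatio <= 0.0 || fillRatio > 1.0) {
      throw new IllegalArgumentException("fillRatio must be in (0, 1]");
    }
    return 1.0 / fillRatio;
  }

  // True when the sorted set needs fewer references per block than the list.
  static boolean treeSetUsesLessMemory(double fillRatio) {
    return treeSetRefsPerBlock(fillRatio) < linkedListRefsPerBlock();
  }

  public static void main(String[] args) {
    for (double fill : new double[] {0.25, 0.50, 0.75, 1.00}) {
      System.out.printf("fill %.2f -> %.2f refs/block, saves memory: %b%n",
          fill, treeSetRefsPerBlock(fill), treeSetUsesLessMemory(fill));
    }
  }
}
```

Under this model the two structures cost the same at exactly 50% fill, and the treeset wins above that.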
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981442#comment-14981442 ]

Staffan Friberg commented on HDFS-9260:
---

I have been running through the other failed tests without being able to reproduce them locally. I was able to reproduce the failed test in hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes once on trunk, but not yet with my branch. So it seems like these might be intermittent issues.
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975804#comment-14975804 ]

Hadoop QA commented on HDFS-9260:
-

\\ \\
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 20m 25s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 19 new or modified test files. |
| {color:green}+1{color} | javac | 8m 2s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 36s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 25s | The applied patch generated 9 new checkstyle issues (total was 882, now 878). |
| {color:green}+1{color} | whitespace | 0m 36s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 32s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 2m 36s | The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 14s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 53m 59s | Tests failed in hadoop-hdfs. |
| | | 103m 27s | |
\\ \\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints |
| | hadoop.hdfs.TestDistributedFileSystem |
| | hadoop.hdfs.server.namenode.ha.TestHASafeMode |
| | hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate |
| Timed out tests | org.apache.hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks |
\\ \\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12768859/HDFS-9260.008.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 96677be |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13214/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/13214/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13214/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13214/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13214/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13214/console |

This message was automatically generated.
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977515#comment-14977515 ]

Hadoop QA commented on HDFS-9260:
-

\\ \\
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 25m 50s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 19 new or modified test files. |
| {color:green}+1{color} | javac | 11m 12s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 14m 24s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 33s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 58s | The applied patch generated 9 new checkstyle issues (total was 882, now 878). |
| {color:green}+1{color} | whitespace | 0m 42s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 2m 4s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 45s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 3m 40s | The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 4m 21s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 69m 7s | Tests failed in hadoop-hdfs. |
| | | 134m 43s | |
\\ \\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.hdfs.server.datanode.fsdataset.impl.TestScrLazyPersistFiles |
| | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
| | hadoop.hdfs.shortcircuit.TestShortCircuitCache |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyWriter |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles |
| | hadoop.hdfs.server.namenode.TestFSImageWithAcl |
| | hadoop.hdfs.TestReplaceDatanodeOnFailure |
| | hadoop.hdfs.server.namenode.ha.TestEditLogTailer |
| | hadoop.hdfs.TestEncryptionZones |
\\ \\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12769065/HDFS-9260.009.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 68ce93c |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13233/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/13233/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13233/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13233/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13233/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13233/console |

This message was automatically generated.
> Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, > HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch, > HDFS-7435.007.patch, HDFS-9260.008.patch, HDFS-9260.009.patch > > > This patch changes the datastructures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC friendly handling of full > block reports. > Would like to hear peoples feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974876#comment-14974876 ] Zhe Zhang commented on HDFS-9260: - Quick note: the patches are all named after HDFS-7435 > Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, > HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch, > HDFS-7435.007.patch > > > This patch changes the datastructures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC friendly handling of full > block reports. > Would like to hear peoples feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975040#comment-14975040 ]

Zhe Zhang commented on HDFS-9260:
---------------------------------

Thanks for the great work, Staffan. The only change related to erasure coding is the block ID translation below, and I think it is done correctly.
{code}
+      long replicaID = replica.getBlockId();
+      if (BlockIdManager.isStripedBlockID(replicaID)
+          && (!hasNonEcBlockUsingStripedID ||
+              !blocksMap.containsBlock(replica))) {
+        replicaID = BlockIdManager.convertToStripedID(replicaID);
       }
{code}
Will post a full review shortly.

> Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, > HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch, > HDFS-7435.007.patch > > > This patch changes the datastructures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC friendly handling of full > block reports. > Would like to hear peoples feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
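For readers unfamiliar with the ID translation in the snippet above, here is a minimal sketch of what it appears to do. It is not taken from the patch. It assumes that striped (erasure-coded) block IDs are negative and that the low 4 bits of an internal block's ID hold its index within the block group; the class {{StripedIdSketch}}, its methods, and the {{knownBlocks}} set (standing in for {{blocksMap}}) are hypothetical.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical stand-in for the BlockIdManager helpers used in the snippet above.
final class StripedIdSketch {
    // Assumption: the low 4 bits of a striped block ID are the index in the block group.
    private static final long BLOCK_GROUP_INDEX_MASK = 15L;

    // Assumption: striped (EC) block IDs are negative.
    static boolean isStripedBlockId(long id) {
        return id < 0;
    }

    // Clears the index bits, leaving the ID of the block group.
    static long convertToStripedId(long id) {
        return id & ~BLOCK_GROUP_INDEX_MASK;
    }

    // Mirrors the condition in the snippet: translate a negative ID to its group ID
    // unless legacy non-EC blocks with negative IDs exist and this is one of them.
    static long lookupId(long replicaId, boolean hasNonEcBlockUsingStripedId,
                         Set<Long> knownBlocks) {
        if (isStripedBlockId(replicaId)
                && (!hasNonEcBlockUsingStripedId || !knownBlocks.contains(replicaId))) {
            return convertToStripedId(replicaId);
        }
        return replicaId;
    }

    public static void main(String[] args) {
        long internalBlock = -1024L + 3;  // index 3 within group -1024
        // No legacy negative IDs: the replica is looked up by its group ID.
        System.out.println(lookupId(internalBlock, false, Collections.<Long>emptySet()));
        // A legacy non-EC block that really has this negative ID is left unchanged.
        System.out.println(lookupId(internalBlock, true, Collections.singleton(internalBlock)));
    }
}
```

The second call shows why the extra {{containsBlock}} check matters: without it, a legacy block that happens to have a negative ID would be looked up under the wrong key.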
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975042#comment-14975042 ]

Hadoop QA commented on HDFS-9260:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 29m 54s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 19 new or modified test files. |
| {color:green}+1{color} | javac | 12m 32s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 14m 36s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 33s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 3s | The applied patch generated 21 new checkstyle issues (total was 883, now 892). |
| {color:red}-1{color} | whitespace | 0m 45s | The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 2m 11s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 55s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 3m 8s | The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 36s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 57m 38s | Tests failed in hadoop-hdfs. |
| | | 128m 13s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.hdfs.TestReplaceDatanodeOnFailure |
| Timed out tests | org.apache.hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12768759/HDFS-7435.007.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 123b3db |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13195/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/13195/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/13195/artifact/patchprocess/whitespace.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13195/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13195/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13195/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13195/console |

This message was automatically generated.
> Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, > HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch, > HDFS-7435.007.patch > > > This patch changes the datastructures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC friendly handling of full > block reports. > Would like to hear peoples feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972281#comment-14972281 ]

Hadoop QA commented on HDFS-9260:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 18m 12s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | @author | 0m 0s | The patch appears to contain 1 @author tags which the Hadoop community has agreed to not allow in code contributions. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 19 new or modified test files. |
| {color:green}+1{color} | javac | 7m 55s | There were no new javac warning messages. |
| {color:red}-1{color} | javadoc | 10m 20s | The applied patch generated 3 additional warning messages. |
| {color:green}+1{color} | release audit | 0m 26s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 25s | The applied patch generated 77 new checkstyle issues (total was 883, now 952). |
| {color:red}-1{color} | whitespace | 0m 33s | The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 29s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. |
| {color:red}-1{color} | findbugs | 2m 33s | The patch appears to introduce 3 new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 9s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 54m 34s | Tests failed in hadoop-hdfs. |
| | | 101m 17s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-hdfs |
| Failed unit tests | hadoop.hdfs.server.balancer.TestBalancer |
| Timed out tests | org.apache.hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12768437/HDFS-7435.006.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 15eb84b |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13175/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html |
| javadoc | https://builds.apache.org/job/PreCommit-HDFS-Build/13175/artifact/patchprocess/diffJavadocWarnings.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/13175/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/13175/artifact/patchprocess/whitespace.txt |
| Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/13175/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/13175/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/13175/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13175/console |

This message was automatically generated.
> Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, > HDFS-7435.004.patch, HDFS-7435.005.patch, HDFS-7435.006.patch > > > This patch changes the datastructures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC friendly handling of full > block reports. > Would like to hear peoples feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971979#comment-14971979 ]

Hadoop QA commented on HDFS-9260:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12768368/HDFS-7435.005.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 15eb84b |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/13167/console |

This message was automatically generated.

> Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, > HDFS-7435.004.patch, HDFS-7435.005.patch > > > This patch changes the datastructures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC friendly handling of full > block reports. > Would like to hear peoples feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971575#comment-14971575 ] Staffan Friberg commented on HDFS-9260: --- Also handles negative non-striped (EC) entries as efficiently as possible. > Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, > HDFS-7435.004.patch, HDFS-7435.005.patch > > > This patch changes the datastructures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC friendly handling of full > block reports. > Would like to hear peoples feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964719#comment-14964719 ] Walter Su commented on HDFS-9260: - -TreeMap- TreeSet > Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch > > > This patch changes the datastructures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC friendly handling of full > block reports. > Would like to hear peoples feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. > There seems to be some timing issues I hit when testing the patch, not sure > if it is a bug in the patch or something else (most likely the earlier)... > Tests that fail for me: >The issues seems to be that the blocks are not on any storage, so no > replication can occur causing the tests to fail in different ways. >TestDecomission.testDecommision >If I add a little sleep after the cleanup/delete things seem to work >TestDFSStripedOutputStreamWithFailure >A couple of tests fails in this class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964712#comment-14964712 ]

Walter Su commented on HDFS-9260:
---------------------------------

The biggest part of BlockInfo is the implicit linked lists. If we move the lists to a TreeSet, does that go against the effort of moving BlockInfo off-heap? The TreeSet adds more on-heap references to off-heap BlockInfo objects. {{BlocksMap}} is a reference map that still needs to move off-heap (HDFS-7846). I think the TreeSet would need to be an off-heap set as well, since the set and the map hold the same number of blocks and the block count keeps growing. But is it worth it?

The point of the TreeSet is fast iteration when processing an FBR. As [~cmccabe] said [here|https://issues.apache.org/jira/browse/HDFS-7846?focusedCommentId=14355701=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14355701], iterating over all the blocks on a given datanode isn't really necessary during an FBR, although iteration is still needed in other use cases.

> Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch > > > This patch changes the datastructures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC friendly handling of full > block reports. > Would like to hear peoples feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. > There seems to be some timing issues I hit when testing the patch, not sure > if it is a bug in the patch or something else (most likely the earlier)... 
> Tests that fail for me: >The issues seems to be that the blocks are not on any storage, so no > replication can occur causing the tests to fail in different ways. >TestDecomission.testDecommision >If I add a little sleep after the cleanup/delete things seem to work >TestDFSStripedOutputStreamWithFailure >A couple of tests fails in this class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
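The issue description quoted above says that keeping BlockInfos and Replicas sorted allows faster handling of full block reports (FBRs). One way to picture why is a single merge pass over two sorted sequences of block IDs, sketched below. This illustrates the general idea only and is not the patch's code; the class {{SortedReportDiff}} and its fields are hypothetical, and both inputs are assumed to be sorted ascending with no duplicates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: diff the blocks the namenode has stored for a
// storage against the blocks a datanode reports, in one pass over sorted IDs.
final class SortedReportDiff {
    static final class Diff {
        final List<Long> toAdd = new ArrayList<>();     // reported but not stored
        final List<Long> toRemove = new ArrayList<>();  // stored but not reported
    }

    // Both arrays must be sorted ascending and free of duplicates.
    static Diff diff(long[] stored, long[] reported) {
        Diff d = new Diff();
        int i = 0;
        int j = 0;
        while (i < stored.length && j < reported.length) {
            if (stored[i] == reported[j]) {
                i++;                                // present on both sides
                j++;
            } else if (stored[i] < reported[j]) {
                d.toRemove.add(stored[i++]);        // missing from the report
            } else {
                d.toAdd.add(reported[j++]);         // new in the report
            }
        }
        while (i < stored.length) {
            d.toRemove.add(stored[i++]);
        }
        while (j < reported.length) {
            d.toAdd.add(reported[j++]);
        }
        return d;
    }

    public static void main(String[] args) {
        Diff d = diff(new long[] {1, 3, 5, 7}, new long[] {3, 4, 7, 9});
        System.out.println("toAdd=" + d.toAdd + " toRemove=" + d.toRemove);
        // prints: toAdd=[4, 9] toRemove=[1, 5]
    }
}
```

The pass is linear in the combined length of the two sequences and needs no per-block lookup, which is the property the sorted layout is meant to provide.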
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965242#comment-14965242 ]

Staffan Friberg commented on HDFS-9260:
---------------------------------------

The TreeSet reduces the number of references compared to the doubly linked list currently built using the triplets data structure. If we were to move things off the heap, the TreeSet, if still used, would contain memory addresses stored as longs rather than object references, and long arrays are trivial for the GC to handle (no need to scan the array).

Potentially the BlocksMap could work the same way: a large long array on the heap that contains the memory addresses of the off-heap BlockInfos, with collisions handled by the off-heap BlockInfos themselves (linked in the same way they are now).

> Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch > > > This patch changes the datastructures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC friendly handling of full > block reports. > Would like to hear peoples feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. > There seems to be some timing issues I hit when testing the patch, not sure > if it is a bug in the patch or something else (most likely the earlier)... > Tests that fail for me: >The issues seems to be that the blocks are not on any storage, so no > replication can occur causing the tests to fail in different ways. 
>TestDecomission.testDecommision >If I add a little sleep after the cleanup/delete things seem to work >TestDFSStripedOutputStreamWithFailure >A couple of tests fails in this class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
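The comment above argues that a collection backed by primitive longs is cheap for the garbage collector, because a {{long[]}} contains no references to trace. A minimal sketch of such a structure follows, assuming the values are block addresses or IDs represented as plain longs. The class {{LongSortedSet}} is hypothetical and is not the data structure from the patch; in particular, insertion here shifts array elements and is O(n), which a production structure would want to avoid.

```java
import java.util.Arrays;

// Hypothetical sorted set of primitive longs kept in one array, so the GC
// sees a single object with no outgoing references.
final class LongSortedSet {
    private long[] values = new long[8];
    private int size;

    // Inserts v in sorted position; returns false if it was already present.
    boolean add(long v) {
        int pos = Arrays.binarySearch(values, 0, size, v);
        if (pos >= 0) {
            return false;
        }
        int insertAt = -(pos + 1);
        if (size == values.length) {
            values = Arrays.copyOf(values, size * 2);
        }
        // Shift the tail one slot to the right to make room.
        System.arraycopy(values, insertAt, values, insertAt + 1, size - insertAt);
        values[insertAt] = v;
        size++;
        return true;
    }

    boolean contains(long v) {
        return Arrays.binarySearch(values, 0, size, v) >= 0;
    }

    // Returns the elements in ascending order.
    long[] toArray() {
        return Arrays.copyOf(values, size);
    }

    public static void main(String[] args) {
        LongSortedSet set = new LongSortedSet();
        set.add(30L);
        set.add(10L);
        set.add(20L);
        System.out.println(Arrays.toString(set.toArray()));
    }
}
```

Iterating the array visits the values in sorted order without touching any other heap object, which is the kind of fast, allocation-free iteration the thread is after.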
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963055#comment-14963055 ]

Walter Su commented on HDFS-9260:
---------------------------------

Excellent work [~sfriberg]!

1. Per your tests, the results look good. But I'm more interested in how the new data structure fits in with HDFS-7836.

2. To avoid moving entries in {{DatanodeStorageInfo}}, an easy way is to make blockReport a serializable HashMap. Instead of iterating over blockReport, we iterate over {{DatanodeStorageInfo}} and check each block against blockReport with an O(1) lookup.

> Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch > > > This patch changes the datastructures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC friendly handling of full > block reports. > Would like to hear peoples feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. > There seems to be some timing issues I hit when testing the patch, not sure > if it is a bug in the patch or something else (most likely the earlier)... > Tests that fail for me: >The issues seems to be that the blocks are not on any storage, so no > replication can occur causing the tests to fail in different ways. >TestDecomission.testDecommision >If I add a little sleep after the cleanup/delete things seem to work >TestDFSStripedOutputStreamWithFailure >A couple of tests fails in this class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
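The second suggestion above (treat the block report as a hash map and iterate over the stored blocks instead) can be sketched as follows. This is an illustration under assumptions, not code from HDFS: the class {{HashedReportDiff}} is hypothetical, the report is modelled as a map from block ID to generation stamp, and the stored blocks are a plain list of IDs standing in for the contents of {{DatanodeStorageInfo}}.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: iterate the stored blocks and probe the report
// with O(1) lookups, so the stored entries never need to be reordered.
final class HashedReportDiff {
    static final class Diff {
        final List<Long> toRemove = new ArrayList<>();  // stored but not reported
        final Map<Long, Long> toAdd;                    // reported but not stored

        Diff(Map<Long, Long> unmatched) {
            this.toAdd = unmatched;
        }
    }

    // report maps block ID to generation stamp (an assumed shape for the example).
    static Diff diff(List<Long> storedBlockIds, Map<Long, Long> report) {
        Map<Long, Long> unmatched = new HashMap<>(report);
        Diff d = new Diff(unmatched);
        for (Long id : storedBlockIds) {
            // remove() returns null when the stored block is absent from the report.
            if (unmatched.remove(id) == null) {
                d.toRemove.add(id);
            }
        }
        return d;  // whatever is left in unmatched was only in the report
    }

    public static void main(String[] args) {
        Map<Long, Long> report = new HashMap<>();
        report.put(3L, 1001L);
        report.put(4L, 1002L);
        List<Long> stored = new ArrayList<>();
        stored.add(1L);
        stored.add(3L);
        Diff d = diff(stored, report);
        System.out.println("toAdd=" + d.toAdd.keySet() + " toRemove=" + d.toRemove);
    }
}
```

One trade-off worth noting: a {{HashMap<Long, Long>}} boxes every key and value, so this approach creates many small objects per report, which works against the GC-friendliness goal of the issue.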
[jira] [Commented] (HDFS-9260) Improve performance and GC friendliness of startup and FBRs
[ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963610#comment-14963610 ]

Staffan Friberg commented on HDFS-9260:
---------------------------------------

Hi Walter,

1) I think this could in part reduce the need to go off heap. If we still see scalability issues and need to put the block map and BlockInfo off heap, this could potentially serve as an idea for how to structure the data off heap, since even when the data is off heap, continuously updating references will be costly because it invalidates the CPU cache. Potentially a version of the TreeSet holding primitives (BlockInfo addresses) could be used for fast iteration, but I need to think about that a bit further.

2) Interesting idea. I think the key point would be that you could do a quick lookup directly on the serialized data, so you don't need to instantiate the whole map, since it might be rather large. I'm not sure whether this is easily doable with ProtoBuf while still keeping the message as compact as possible.

> Improve performance and GC friendliness of startup and FBRs > --- > > Key: HDFS-9260 > URL: https://issues.apache.org/jira/browse/HDFS-9260 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode, namenode, performance >Affects Versions: 2.7.1 >Reporter: Staffan Friberg >Assignee: Staffan Friberg > Attachments: HDFS Block and Replica Management 20151013.pdf, > HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch > > > This patch changes the datastructures used for BlockInfos and Replicas to > keep them sorted. This allows faster and more GC friendly handling of full > block reports. > Would like to hear peoples feedback on this change and also some help > investigating/understanding a few outstanding issues if we are interested in > moving forward with this. > There seems to be some timing issues I hit when testing the patch, not sure > if it is a bug in the patch or something else (most likely the earlier)... 
> Tests that fail for me: >The issues seems to be that the blocks are not on any storage, so no > replication can occur causing the tests to fail in different ways. >TestDecomission.testDecommision >If I add a little sleep after the cleanup/delete things seem to work >TestDFSStripedOutputStreamWithFailure >A couple of tests fails in this class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
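Point 2 of the reply above raises the idea of looking blocks up directly in the serialized report instead of materializing a map. A rough sketch of that idea is below, with one important assumption that differs from a typical ProtoBuf message: the IDs are stored as fixed-width 8-byte values in sorted order. Variable-length (varint) encoding is more compact but does not allow jumping to the n-th entry, which is the trade-off the comment asks about. The class {{SerializedLookup}} and its {{encode}} helper are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration: binary search over sorted, fixed-width longs in a
// ByteBuffer, without deserializing them into objects.
final class SerializedLookup {
    // Writes the (already sorted) IDs as consecutive 8-byte big-endian values.
    static ByteBuffer encode(long... sortedIds) {
        ByteBuffer buf = ByteBuffer.allocate(sortedIds.length * Long.BYTES);
        for (long id : sortedIds) {
            buf.putLong(id);
        }
        buf.flip();
        return buf;
    }

    // Searches between the buffer's position and limit using absolute reads,
    // so the buffer's position is left unchanged.
    static boolean contains(ByteBuffer sortedLongs, long key) {
        int base = sortedLongs.position();
        int lo = 0;
        int hi = sortedLongs.remaining() / Long.BYTES - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long v = sortedLongs.getLong(base + mid * Long.BYTES);
            if (v < key) {
                lo = mid + 1;
            } else if (v > key) {
                hi = mid - 1;
            } else {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ByteBuffer report = encode(-5L, 2L, 9L, 40L);
        System.out.println(contains(report, 9L));
        System.out.println(contains(report, 3L));
    }
}
```

Each lookup costs O(log n) reads and allocates nothing, at the price of a larger message than a varint encoding would give.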