[jira] [Commented] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412902#comment-16412902 ]

genericqa commented on HDFS-13056:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 12s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 8 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 5m 1s | Maven dependency ordering for branch |
| +1 | mvninstall | 22m 35s | trunk passed |
| +1 | compile | 25m 50s | trunk passed |
| +1 | checkstyle | 2m 37s | trunk passed |
| +1 | mvnsite | 3m 9s | trunk passed |
| +1 | shadedclient | 14m 36s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 5m 10s | trunk passed |
| +1 | javadoc | 2m 25s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 15s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 33s | the patch passed |
| +1 | compile | 23m 44s | the patch passed |
| +1 | cc | 23m 44s | the patch passed |
| +1 | javac | 23m 44s | the patch passed |
| +1 | checkstyle | 2m 27s | root: The patch generated 0 new + 680 unchanged - 14 fixed = 680 total (was 694) |
| +1 | mvnsite | 2m 44s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 7m 59s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 5m 30s | the patch passed |
| +1 | javadoc | 2m 17s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 8m 20s | hadoop-common in the patch passed. |
| +1 | unit | 1m 26s | hadoop-hdfs-client in the patch passed. |
| -1 | unit | 80m 34s | hadoop-hdfs in the patch failed. |
| +1 | unit | 10m 15s | hadoop-distcp in the patch passed. |
| +1 | asflicense | 0m 28s | The patch does not generate ASF License warnings. |
| | | 226m 42s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.web.TestWebHdfsTimeouts |
| | hadoop.hdfs.server.blockmanagement.TestBlockStatsMXBean |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration |
[jira] [Updated] (HDFS-13204) RBF: Optimize name service safe mode icon
[ https://issues.apache.org/jira/browse/HDFS-13204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuhongtong updated HDFS-13204:
---
Attachment: HDFS-13204.007.patch

> RBF: Optimize name service safe mode icon
> -
>
> Key: HDFS-13204
> URL: https://issues.apache.org/jira/browse/HDFS-13204
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: liuhongtong
> Assignee: liuhongtong
> Priority: Minor
> Attachments: HDFS-13204.001.patch, HDFS-13204.002.patch, HDFS-13204.003.patch, HDFS-13204.004.patch, HDFS-13204.005.patch, HDFS-13204.006.patch, HDFS-13204.007.patch, Routers.png, Subclusters.png, image-2018-02-28-18-33-09-972.png, image-2018-02-28-18-33-47-661.png, image-2018-02-28-18-35-35-708.png, image-2018-03-23-18-06-54-354.png
>
>
> In the federation health webpage, the safe mode icons of Subclusters and Routers are inconsistent.
> The safe mode icon of Subclusters may mislead users into thinking the name service is under maintenance.
> !image-2018-02-28-18-33-09-972.png!
> The safe mode icon of Routers:
> !image-2018-02-28-18-33-47-661.png!
> In fact, if the name service is in safe mode, users can't perform write-related operations. So I think the safe mode icon in Subclusters should be changed, which would be more reasonable.
> !image-2018-02-28-18-35-35-708.png!

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13204) RBF: Optimize name service safe mode icon
[ https://issues.apache.org/jira/browse/HDFS-13204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412894#comment-16412894 ]

genericqa commented on HDFS-13204:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 25m 34s | trunk passed |
| +1 | shadedclient | 35m 43s | branch has no errors when building and testing our client artifacts. |
|| || || || Patch Compile Tests ||
| -1 | whitespace | 0m 0s | The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | shadedclient | 11m 22s | patch has no errors when building and testing our client artifacts. |
|| || || || Other Tests ||
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 48m 15s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | HDFS-13204 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12916084/HDFS-13204.006.patch |
| Optional Tests | asflicense shadedclient |
| uname | Linux 0e84677c471f 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 28790b8 |
| maven | version: Apache Maven 3.3.9 |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/23661/artifact/out/whitespace-eol.txt |
| Max. process+thread count | 315 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/23661/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-13291) RBF: Implement available space based OrderResolver
[ https://issues.apache.org/jira/browse/HDFS-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412883#comment-16412883 ]

genericqa commented on HDFS-13291:
--

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 23s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 27m 41s | trunk passed |
| +1 | compile | 0m 26s | trunk passed |
| +1 | checkstyle | 0m 17s | trunk passed |
| +1 | mvnsite | 0m 28s | trunk passed |
| +1 | shadedclient | 10m 48s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 46s | trunk passed |
| +1 | javadoc | 0m 29s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 26s | the patch passed |
| +1 | compile | 0m 22s | the patch passed |
| +1 | cc | 0m 22s | the patch passed |
| +1 | javac | 0m 22s | the patch passed |
| +1 | checkstyle | 0m 14s | the patch passed |
| +1 | mvnsite | 0m 24s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedclient | 11m 21s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 0m 49s | the patch passed |
| +1 | javadoc | 0m 26s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 11m 44s | hadoop-hdfs-rbf in the patch passed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 68m 14s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:8620d2b |
| JIRA Issue | HDFS-13291 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12916083/HDFS-13291.009.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle cc |
| uname | Linux 1b246834e9d6 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 28790b8 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23660/testReport/ |
| Max. process+thread count | 935 (vs. ulimit of 1) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/23660/console |
| Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.

> RBF: Implement available space based
[jira] [Commented] (HDFS-13204) RBF: Optimize name service safe mode icon
[ https://issues.apache.org/jira/browse/HDFS-13204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412878#comment-16412878 ]

liuhongtong commented on HDFS-13204:

[~elgoiri] Thx. I have modified rbf.css.
[jira] [Updated] (HDFS-13204) RBF: Optimize name service safe mode icon
[ https://issues.apache.org/jira/browse/HDFS-13204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liuhongtong updated HDFS-13204:
---
Attachment: HDFS-13204.006.patch
[jira] [Commented] (HDFS-13291) RBF: Implement available space based OrderResolver
[ https://issues.apache.org/jira/browse/HDFS-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412869#comment-16412869 ]

Yiqun Lin commented on HDFS-13291:
--

Fix checkstyle warning.

> RBF: Implement available space based OrderResolver
> --
>
> Key: HDFS-13291
> URL: https://issues.apache.org/jira/browse/HDFS-13291
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 3.0.0
> Reporter: Yiqun Lin
> Assignee: Yiqun Lin
> Priority: Major
> Attachments: HDFS-13291.001.patch, HDFS-13291.002.patch, HDFS-13291.003.patch, HDFS-13291.004.patch, HDFS-13291.005.patch, HDFS-13291.006.patch, HDFS-13291.007.patch, HDFS-13291.008.patch, HDFS-13291.009.patch
>
>
> Implement an available-space-based OrderResolver. This type of resolver will help balance the data across subclusters.
[jira] [Updated] (HDFS-13291) RBF: Implement available space based OrderResolver
[ https://issues.apache.org/jira/browse/HDFS-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-13291:
-
Attachment: HDFS-13291.009.patch
[jira] [Updated] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Huo updated HDFS-13056:
--
Attachment: HDFS-13056.012.patch

> Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
>
>
> Key: HDFS-13056
> URL: https://issues.apache.org/jira/browse/HDFS-13056
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, distcp, erasure-coding, federation, hdfs
> Affects Versions: 3.0.0
> Reporter: Dennis Huo
> Assignee: Dennis Huo
> Priority: Major
> Attachments: HDFS-13056-branch-2.8.001.patch, HDFS-13056-branch-2.8.002.patch, HDFS-13056-branch-2.8.003.patch, HDFS-13056-branch-2.8.004.patch, HDFS-13056-branch-2.8.005.patch, HDFS-13056-branch-2.8.poc1.patch, HDFS-13056.001.patch, HDFS-13056.002.patch, HDFS-13056.003.patch, HDFS-13056.003.patch, HDFS-13056.004.patch, HDFS-13056.005.patch, HDFS-13056.006.patch, HDFS-13056.007.patch, HDFS-13056.008.patch, HDFS-13056.009.patch, HDFS-13056.010.patch, HDFS-13056.011.patch, HDFS-13056.012.patch, Reference_only_zhen_PPOC_hadoop2.6.X.diff, hdfs-file-composite-crc32-v1.pdf, hdfs-file-composite-crc32-v2.pdf, hdfs-file-composite-crc32-v3.pdf
>
>
> FileChecksum was first introduced in [https://issues-test.apache.org/jira/browse/HADOOP-3981] and has ever since remained defined as MD5-of-MD5-of-CRC: per-512-byte chunk CRCs are already stored as part of datanode metadata, and the MD5 approach is used to compute an aggregate value in a distributed manner, with individual datanodes computing the MD5-of-CRCs per block in parallel and the HDFS client computing the second-level MD5.
>
> An often-raised shortcoming of this approach is that this FileChecksum is sensitive to the internal block-size and chunk-size configuration, so HDFS files with different block/chunk settings cannot be compared. More commonly, one might have different HDFS clusters which use different block sizes, in which case a data migration cannot use the FileChecksum for distcp's rsync functionality or for verifying end-to-end data integrity (on top of the low-level data integrity checks applied at data transfer time).
>
> This was also revisited in https://issues.apache.org/jira/browse/HDFS-8430 during the addition of checksum support for striped erasure-coded files; while there was some discussion of using CRC composability, it ultimately settled on a hierarchical MD5 approach, which adds the further problem that checksums of basic replicated files are not comparable to those of striped files.
>
> This feature proposes to add a "COMPOSITE-CRC" FileChecksum type which uses CRC composition to remain completely chunk/block agnostic, and which allows comparison between striped and replicated files, between different HDFS instances, and possibly even between HDFS and other external storage systems. This feature can also be added in place, compatible with existing block metadata, and doesn't need to change the normal path of chunk verification, so it is minimally invasive. This also means even large preexisting HDFS deployments could adopt this feature to retroactively sync data. A detailed design document can be found here: https://storage.googleapis.com/dennishuo/hdfs-file-composite-crc32-v1.pdf
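The chunk/block-agnostic property described in the HDFS-13056 summary can be illustrated with a small standalone sketch. This is not the code from the attached patches: it assumes plain CRC-32 via `java.util.zip.CRC32` and uses the well-known zlib-style GF(2) matrix method to combine two CRCs. The class and method names (`CrcCombineSketch`, `combine`, `chunkedCrc`) are hypothetical and chosen only for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative only: shows that per-chunk CRC-32 values can be composed into
// the CRC-32 of the whole byte sequence, independent of the chunk size.
final class CrcCombineSketch {
    private static final int GF2_DIM = 32;

    // Multiply a 32x32 GF(2) matrix (one column per array entry) by a vector.
    private static long matrixTimes(long[] mat, long vec) {
        long sum = 0;
        for (int i = 0; vec != 0; vec >>>= 1, i++) {
            if ((vec & 1L) != 0) {
                sum ^= mat[i];
            }
        }
        return sum;
    }

    // square = mat * mat over GF(2).
    private static void matrixSquare(long[] square, long[] mat) {
        for (int n = 0; n < GF2_DIM; n++) {
            square[n] = matrixTimes(mat, mat[n]);
        }
    }

    // Returns CRC32(A || B) given crc1 = CRC32(A), crc2 = CRC32(B), len2 = |B|.
    static long combine(long crc1, long crc2, long len2) {
        if (len2 <= 0) {
            return crc1;
        }
        long[] even = new long[GF2_DIM];
        long[] odd = new long[GF2_DIM];
        odd[0] = 0xEDB88320L;            // reflected CRC-32 polynomial
        long row = 1;
        for (int n = 1; n < GF2_DIM; n++) {
            odd[n] = row;
            row <<= 1;
        }
        matrixSquare(even, odd);         // operator for two zero bits
        matrixSquare(odd, even);         // operator for four zero bits
        do {
            // Apply len2 zero bytes to crc1, one bit of len2 at a time.
            matrixSquare(even, odd);
            if ((len2 & 1) != 0) {
                crc1 = matrixTimes(even, crc1);
            }
            len2 >>= 1;
            if (len2 == 0) {
                break;
            }
            matrixSquare(odd, even);
            if ((len2 & 1) != 0) {
                crc1 = matrixTimes(odd, crc1);
            }
            len2 >>= 1;
        } while (len2 != 0);
        return crc1 ^ crc2;
    }

    static long crcOf(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue();
    }

    // Computes a file-level CRC purely from per-chunk CRCs of the given size.
    static long chunkedCrc(byte[] data, int chunkSize) {
        long crc = 0; // CRC-32 of the empty prefix
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            crc = combine(crc, crcOf(data, off, len), len);
        }
        return crc;
    }

    public static void main(String[] args) {
        byte[] data = "the same bytes, stored with different chunk sizes"
                .getBytes(StandardCharsets.UTF_8);
        long whole = crcOf(data, 0, data.length);
        System.out.println(whole == chunkedCrc(data, 7));
        System.out.println(chunkedCrc(data, 7) == chunkedCrc(data, 16));
    }
}
```

In this sketch the composed value should match the CRC of the whole input for any chunk size, which is the property that would let checksums be compared across layouts. An MD5 over per-chunk CRCs has no such property, since changing the chunk boundaries changes the hashed input.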
[jira] [Commented] (HDFS-13300) Ozone: Remove DatanodeID dependency from HDSL and Ozone
[ https://issues.apache.org/jira/browse/HDFS-13300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412641#comment-16412641 ]

Nanda kumar commented on HDFS-13300:

Thanks for the explanation [~elek], I get your point. Good catch. I'm working on the patch to fix the issues you mentioned and will upload it soon.

> Ozone: Remove DatanodeID dependency from HDSL and Ozone
>
>
> Key: HDFS-13300
> URL: https://issues.apache.org/jira/browse/HDFS-13300
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone
> Reporter: Nanda kumar
> Assignee: Nanda kumar
> Priority: Major
> Attachments: HDFS-13300-HDFS-7240.000.patch, HDFS-13300-HDFS-7240.001.patch, HDFS-13300-HDFS-7240.002.patch, HDFS-13300-HDFS-7240.003.patch
>
>
> DatanodeID was previously modified to add HDSL/Ozone related information. This jira is to remove the DatanodeID dependency from HDSL/Ozone to make it truly pluggable without the need to modify DatanodeID.
[jira] [Commented] (HDFS-13195) DataNode conf page cannot display the current value after reconfig
[ https://issues.apache.org/jira/browse/HDFS-13195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412595#comment-16412595 ]

maobaolong commented on HDFS-13195:
---

[~kihwal] [~elgoiri] [~liuhongtong] [~bharatviswa] Thank you for reviewing. [~kihwal], thank you for helping me commit this patch.

> DataNode conf page cannot display the current value after reconfig
> ---
>
> Key: HDFS-13195
> URL: https://issues.apache.org/jira/browse/HDFS-13195
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.7.1
> Reporter: maobaolong
> Assignee: maobaolong
> Priority: Minor
> Fix For: 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.2, 3.2.0, 3.1.1
>
> Attachments: HDFS-13195-branch-2.7.001.patch, HDFS-13195-branch-2.7.002.patch, HDFS-13195.001.patch, HDFS-13195.002.patch
>
>
> branch-2.7 now supports reconfiguring dfs.datanode.data.dir, but after I reconfigure this key, the conf page still shows the old value.
> The reason is this:
> {code:java}
> public DatanodeHttpServer(final Configuration conf,
>     final DataNode datanode,
>     final ServerSocketChannel externalHttpChannel)
>     throws IOException {
>   this.conf = conf;
>   Configuration confForInfoServer = new Configuration(conf);
>   confForInfoServer.setInt(HttpServer2.HTTP_MAX_THREADS, 10);
>   HttpServer2.Builder builder = new HttpServer2.Builder()
>       .setName("datanode")
>       .setConf(confForInfoServer)
>       .setACL(new AccessControlList(conf.get(DFS_ADMIN, " ")))
>       .hostName(getHostnameForSpnegoPrincipal(confForInfoServer))
>       .addEndpoint(URI.create("http://localhost:0"))
>       .setFindPort(true);
>   this.infoServer = builder.build();
> {code}
> confForInfoServer is a new Configuration instance, so when dfsadmin reconfigures the datanode's config, the change is not reflected in confForInfoServer. We should use the datanode's conf instead.
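The stale-value behavior reported in HDFS-13195 can be reproduced without Hadoop. The sketch below is only an analogy: it uses a `HashMap` as a stand-in for Hadoop's `Configuration`, on the assumption (as the issue describes) that `new Configuration(conf)` takes a copy that no longer follows later changes to the original. The class and method names are hypothetical, and the paths are made-up sample values.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for the DataNode conf page problem: a copy taken at
// startup keeps the old value, while a shared reference sees the reconfig.
final class ConfSnapshotSketch {
    static final String KEY = "dfs.datanode.data.dir";

    // What a page backed by a startup-time copy reports after a reconfig.
    static String valueSeenByCopy(String oldValue, String newValue) {
        Map<String, String> live = new HashMap<>();
        live.put(KEY, oldValue);
        Map<String, String> copy = new HashMap<>(live); // like new Configuration(conf)
        live.put(KEY, newValue);                        // like a dfsadmin reconfig
        return copy.get(KEY);
    }

    // What a page backed by the live object itself reports after a reconfig.
    static String valueSeenByShared(String oldValue, String newValue) {
        Map<String, String> live = new HashMap<>();
        live.put(KEY, oldValue);
        Map<String, String> shared = live;              // same instance, no copy
        live.put(KEY, newValue);
        return shared.get(KEY);
    }

    public static void main(String[] args) {
        System.out.println("copy:   " + valueSeenByCopy("/data/old", "/data/new"));
        System.out.println("shared: " + valueSeenByShared("/data/old", "/data/new"));
    }
}
```

The copy keeps reporting the value it was created with, which matches the symptom in the issue. Serving the page from the live configuration object is the direction the reporter proposes.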
[jira] [Commented] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412536#comment-16412536 ]

genericqa commented on HDFS-13056:
--

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 25s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 8 new or modified test files. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 1m 22s | Maven dependency ordering for branch |
| +1 | mvninstall | 22m 33s | trunk passed |
| +1 | compile | 27m 5s | trunk passed |
| +1 | checkstyle | 2m 39s | trunk passed |
| +1 | mvnsite | 3m 8s | trunk passed |
| +1 | shadedclient | 14m 49s | branch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 5m 20s | trunk passed |
| +1 | javadoc | 2m 39s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 17s | Maven dependency ordering for patch |
| +1 | mvninstall | 2m 42s | the patch passed |
| +1 | compile | 26m 8s | the patch passed |
| +1 | cc | 26m 8s | the patch passed |
| +1 | javac | 26m 8s | the patch passed |
| -0 | checkstyle | 2m 46s | root: The patch generated 6 new + 689 unchanged - 4 fixed = 695 total (was 693) |
| +1 | mvnsite | 3m 8s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 | shadedclient | 8m 43s | patch has no errors when building and testing our client artifacts. |
| +1 | findbugs | 5m 57s | the patch passed |
| +1 | javadoc | 2m 36s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 8m 3s | hadoop-common in the patch failed. |
| +1 | unit | 1m 33s | hadoop-hdfs-client in the patch passed. |
| -1 | unit | 137m 48s | hadoop-hdfs in the patch failed. |
| +1 | unit | 12m 32s | hadoop-distcp in the patch passed. |
| +1 | asflicense | 0m 42s | The patch does not generate ASF License warnings. |
| | | 290m 8s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.fs.shell.TestCopyFromLocal |
| | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
| | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
| | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
[jira] [Updated] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Huo updated HDFS-13056: -- Attachment: HDFS-13056.011.patch > Expose file-level composite CRCs in HDFS which are comparable across > different instances/layouts > > > Key: HDFS-13056 > URL: https://issues.apache.org/jira/browse/HDFS-13056 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, distcp, erasure-coding, federation, hdfs >Affects Versions: 3.0.0 >Reporter: Dennis Huo >Assignee: Dennis Huo >Priority: Major > Attachments: HDFS-13056-branch-2.8.001.patch, > HDFS-13056-branch-2.8.002.patch, HDFS-13056-branch-2.8.003.patch, > HDFS-13056-branch-2.8.004.patch, HDFS-13056-branch-2.8.005.patch, > HDFS-13056-branch-2.8.poc1.patch, HDFS-13056.001.patch, HDFS-13056.002.patch, > HDFS-13056.003.patch, HDFS-13056.003.patch, HDFS-13056.004.patch, > HDFS-13056.005.patch, HDFS-13056.006.patch, HDFS-13056.007.patch, > HDFS-13056.008.patch, HDFS-13056.009.patch, HDFS-13056.010.patch, > HDFS-13056.011.patch, Reference_only_zhen_PPOC_hadoop2.6.X.diff, > hdfs-file-composite-crc32-v1.pdf, hdfs-file-composite-crc32-v2.pdf, > hdfs-file-composite-crc32-v3.pdf > > > FileChecksum was first introduced in > [https://issues-test.apache.org/jira/browse/HADOOP-3981] and ever since then > has remained defined as MD5-of-MD5-of-CRC, where per-512-byte chunk CRCs are > already stored as part of datanode metadata, and the MD5 approach is used to > compute an aggregate value in a distributed manner, with individual datanodes > computing the MD5-of-CRCs per-block in parallel, and the HDFS client > computing the second-level MD5. > > A shortcoming of this approach which is often brought up is the fact that > this FileChecksum is sensitive to the internal block-size and chunk-size > configuration, and thus different HDFS files with different block/chunk > settings cannot be compared. 
More commonly, one might have different HDFS > clusters which use different block sizes, in which case any data migration > won't be able to use the FileChecksum for distcp's rsync functionality or for > verifying end-to-end data integrity (on top of low-level data integrity > checks applied at data transfer time). > > This was also revisited in https://issues.apache.org/jira/browse/HDFS-8430 > during the addition of checksum support for striped erasure-coded files; > while there was some discussion of using CRC composability, it still > ultimately settled on a hierarchical MD5 approach, which also adds the problem > that checksums of basic replicated files are not comparable to striped files. > > This feature proposes to add a "COMPOSITE-CRC" FileChecksum type which uses > CRC composition to remain completely chunk/block agnostic, and allows > comparison between striped vs replicated files, between different HDFS > instances, and possibly even between HDFS and other external storage systems. > This feature can also be added in-place to be compatible with existing block > metadata, and doesn't need to change the normal path of chunk verification, > so it is minimally invasive. This also means even large preexisting HDFS > deployments could adopt this feature to retroactively sync data. A detailed > design document can be found here: > https://storage.googleapis.com/dennishuo/hdfs-file-composite-crc32-v1.pdf -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
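The issue description contrasts an MD5 over chunk CRCs, which changes with the chunk size, against CRC composition, which does not. The following is a minimal sketch of that contrast using only the JDK. It is not the code from the patch: it assumes plain CRC32 via `java.util.zip.CRC32`, collapses the two MD5 levels into one, and uses the zlib-style `crc32_combine` matrix method for composition. The class name `CompositeCrcSketch` and all method names are hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical sketch; none of these names come from the HDFS patch.
final class CompositeCrcSketch {
    private static final int GF2_DIM = 32;

    /** Plain CRC32 of data[off, off + len). */
    static long crcOf(byte[] data, int off, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, off, len);
        return crc.getValue();
    }

    /** Simplified, one-level stand-in for MD5-of-CRCs: MD5 over the per-chunk CRCs. */
    static byte[] md5OfChunkCrcs(byte[] data, int chunkSize) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (int off = 0; off < data.length; off += chunkSize) {
                int len = Math.min(chunkSize, data.length - off);
                int c = (int) crcOf(data, off, len);
                md5.update(new byte[] {
                    (byte) (c >>> 24), (byte) (c >>> 16), (byte) (c >>> 8), (byte) c});
            }
            return md5.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Multiplies a 32x32 GF(2) matrix (one int per column) by a bit vector. */
    private static int matrixTimes(int[] mat, int vec) {
        int sum = 0;
        for (int i = 0; vec != 0; vec >>>= 1, i++) {
            if ((vec & 1) != 0) {
                sum ^= mat[i];
            }
        }
        return sum;
    }

    private static void matrixSquare(int[] square, int[] mat) {
        for (int n = 0; n < GF2_DIM; n++) {
            square[n] = matrixTimes(mat, mat[n]);
        }
    }

    /** zlib-style combine: CRC32 of A followed by B, from crc(A), crc(B) and len(B). */
    static long combine(long crcA, long crcB, long lenB) {
        if (lenB <= 0) {
            return crcA;
        }
        int[] even = new int[GF2_DIM];
        int[] odd = new int[GF2_DIM];
        odd[0] = 0xEDB88320;              // reflected CRC32 polynomial
        int row = 1;
        for (int n = 1; n < GF2_DIM; n++) {
            odd[n] = row;
            row <<= 1;
        }
        matrixSquare(even, odd);          // operator for two zero bits
        matrixSquare(odd, even);          // operator for four zero bits
        int crc = (int) crcA;
        do {
            matrixSquare(even, odd);      // first pass: operator for one zero byte
            if ((lenB & 1) != 0) {
                crc = matrixTimes(even, crc);
            }
            lenB >>= 1;
            if (lenB == 0) {
                break;
            }
            matrixSquare(odd, even);
            if ((lenB & 1) != 0) {
                crc = matrixTimes(odd, crc);
            }
            lenB >>= 1;
        } while (lenB != 0);
        return (crc ^ (int) crcB) & 0xFFFFFFFFL;
    }

    /** Folds per-chunk CRCs into one whole-input CRC, whatever the chunk size. */
    static long composedCrc(byte[] data, int chunkSize) {
        long composite = 0;               // CRC32 of the empty input
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            composite = combine(composite, crcOf(data, off, len), len);
        }
        return composite;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31 + 7);
        }
        System.out.println("MD5-of-CRCs equal for chunk sizes 4 and 512: "
            + Arrays.equals(md5OfChunkCrcs(data, 4), md5OfChunkCrcs(data, 512)));
        System.out.println("composed CRC equal for chunk sizes 4 and 512: "
            + (composedCrc(data, 4) == composedCrc(data, 512)));
    }
}
```

The composed value depends only on the bytes, so two stores that chunk the same data differently would still agree on it, whereas the MD5 input itself changes whenever the chunk boundaries move.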
[jira] [Commented] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412452#comment-16412452 ] Dennis Huo commented on HDFS-13056: --- Merged trunk to produce [^HDFS-13056.011.patch]
[jira] [Commented] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412449#comment-16412449 ] Dennis Huo commented on HDFS-13056: ---
Thanks for taking a look [~ste...@apache.org]! Applied your suggestions in [^HDFS-13056.010.patch]:
- Mark getFileChecksumWithCombineMode as LimitedPrivate
- Add TestCopyMapperCompositeCrc, extending TestCopyMapper, to differentiate the behaviors of the checksum options in terms of which kinds of file layouts are supported
- Remove String.format from some LOG.debug statements
- Make ReplicatedFileChecksumComputer raise PathIOExceptions
- Switch TestCrcUtil and TestCrcComposer to use LambdaTestUtils.intercept instead of junit ExpectedException

There are a few places that will require followup work in LambdaTestUtils before switching over, namely:
1. Supporting checks on more messages in the causal chain and/or suppressed exceptions
2. Making it easy to check for multiple different string fragments in the exception text

I have some rudimentary parts of that in this followup: https://issues.apache.org/jira/browse/HDFS-13256

Re: [~xiaochen] - Removing or marking deprecated sounds good to me; I'll do that part in the followup Jira that also tracks adding WebHDFS support. Filed https://issues.apache.org/jira/browse/HDFS-13345 to track.

Re: distcp, I agree there are some significant shortcomings in the existing behaviors. It's worse than always trying to overwrite when clusters have different configs: right now the copy does all the work and then fails on commit, because the checksum check only happens after the copy has been performed, so the work is wasted. We could maybe file some followup work in DistCp to improve its behavior. As a user, I initially expected it to take into consideration whether FileChecksum#getAlgorithmName returns the same value for both sides before deciding if it's okay to compare them. I'd have expected mismatched algorithm names to be ignored the same way null FileChecksums are, so that syncing falls back to just file sizes. Instead, if the algorithm names differ right now, distcp tries to copy and then fails on commit. I guess we can discuss how to improve distcp semantics in a followup Jira.
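The behavior the comment above says a user might expect from distcp (treat mismatched algorithm names like a null checksum and fall back to file sizes) can be sketched as a small decision function. This is only an illustration of that expectation, not DistCp's actual logic: `Checksum` is a hypothetical stand-in for Hadoop's FileChecksum, and `decide`, `Decision` and the placeholder algorithm names are assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical sketch of the expected sync decision; not DistCp code.
final class ChecksumCompareSketch {

    /** Minimal stand-in for a file checksum: an algorithm name plus raw bytes. */
    static final class Checksum {
        final String algorithmName;
        final byte[] bytes;

        Checksum(String algorithmName, byte[] bytes) {
            this.algorithmName = algorithmName;
            this.bytes = bytes;
        }
    }

    enum Decision { SKIP, COPY }

    /**
     * Decides before copying. Checksums are compared only when both exist and
     * report the same algorithm name; otherwise only file sizes are used.
     */
    static Decision decide(long srcLen, Checksum src, long dstLen, Checksum dst) {
        if (srcLen != dstLen) {
            return Decision.COPY;
        }
        boolean comparable = src != null && dst != null
            && src.algorithmName.equals(dst.algorithmName);
        if (!comparable) {
            // Null or mismatched algorithms: fall back to sizes, which already match.
            return Decision.SKIP;
        }
        return Arrays.equals(src.bytes, dst.bytes) ? Decision.SKIP : Decision.COPY;
    }

    public static void main(String[] args) {
        Checksum md5Style = new Checksum("md5-placeholder", new byte[] {1, 2, 3});
        Checksum crcStyle = new Checksum("composite-placeholder", new byte[] {9, 9});
        System.out.println(decide(10, md5Style, 10, crcStyle));
        System.out.println(decide(10, md5Style, 10, null));
        System.out.println(decide(10, md5Style, 10, new Checksum("md5-placeholder", new byte[] {1, 2, 4})));
    }
}
```

Making the decision up front, rather than verifying after the transfer, is what would avoid the wasted copy described in the comment.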
[jira] [Created] (HDFS-13345) Add support for new COMPOSITE_CRC in WebHDFS
Dennis Huo created HDFS-13345:
-
Summary: Add support for new COMPOSITE_CRC in WebHDFS
Key: HDFS-13345
URL: https://issues.apache.org/jira/browse/HDFS-13345
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Dennis Huo

As a followup to https://issues.apache.org/jira/browse/HDFS-13056, plumbing through support for new FileChecksum types (in particular, COMPOSITE_CRC) in WebHDFS will help remove dependencies on the old DFSClient.getFileChecksum method that hard-codes MD5MD5CRC32FileChecksum as its return type. Once this is done, this work should also include marking the old getFileChecksum method as deprecated or removing it entirely from DFSClient.
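To illustrate why a method that hard-codes one concrete checksum class as its return type gets in the way of new checksum types, here is a small self-contained sketch. Every type and method name in it (`FileChecksumLike`, `Md5StyleChecksum`, `CompositeStyleChecksum`, `ClientLike`, `getFileChecksumAnyType`) is hypothetical, as are the placeholder algorithm names; it only models the shape of the problem and one possible deprecation path, not the real DFSClient.

```java
// Hypothetical model of the return-type problem; not Hadoop code.
final class DeprecationSketch {

    interface FileChecksumLike {
        String getAlgorithmName();
    }

    static final class Md5StyleChecksum implements FileChecksumLike {
        @Override
        public String getAlgorithmName() {
            return "md5-placeholder";
        }
    }

    static final class CompositeStyleChecksum implements FileChecksumLike {
        @Override
        public String getAlgorithmName() {
            return "composite-placeholder";
        }
    }

    static final class ClientLike {
        private final boolean compositeMode;

        ClientLike(boolean compositeMode) {
            this.compositeMode = compositeMode;
        }

        /** General method: callers see only the interface, so new types fit. */
        FileChecksumLike getFileChecksumAnyType() {
            return compositeMode ? new CompositeStyleChecksum() : new Md5StyleChecksum();
        }

        /** Old-style method: the concrete return type cannot carry a composite CRC. */
        @Deprecated
        Md5StyleChecksum getFileChecksum() {
            FileChecksumLike checksum = getFileChecksumAnyType();
            if (!(checksum instanceof Md5StyleChecksum)) {
                throw new UnsupportedOperationException(
                    "checksum type " + checksum.getAlgorithmName()
                        + " does not fit the legacy return type");
            }
            return (Md5StyleChecksum) checksum;
        }
    }

    public static void main(String[] args) {
        System.out.println(new ClientLike(true).getFileChecksumAnyType().getAlgorithmName());
        System.out.println(new ClientLike(false).getFileChecksum().getAlgorithmName());
    }
}
```

In this model, callers that go through the general method keep working when a new checksum type is configured, while callers of the narrow method have to be migrated before it can be removed.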
[jira] [Commented] (HDFS-13056) Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts
[ https://issues.apache.org/jira/browse/HDFS-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16412441#comment-16412441 ] genericqa commented on HDFS-13056: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s{color} | {color:red} HDFS-13056 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-13056 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12916033/HDFS-13056.010.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/23657/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated.