[jira] [Work logged] (HDDS-2259) Container Data Scrubber computes wrong checksum
[ https://issues.apache.org/jira/browse/HDDS-2259?focusedWorklogId=324883&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324883 ]

ASF GitHub Bot logged work on HDDS-2259:
Author: ASF GitHub Bot
Created on: 08/Oct/19 06:38
Start Date: 08/Oct/19 06:38
Worklog Time Spent: 10m

Work Description: adoroszlai commented on issue #1605: HDDS-2259. Container Data Scrubber computes wrong checksum
URL: https://github.com/apache/hadoop/pull/1605#issuecomment-539363289

Thanks @anuengineer for reviewing and committing it.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
---
Worklog Id: (was: 324883)
Time Spent: 1h (was: 50m)

> Container Data Scrubber computes wrong checksum
> ---
>
> Key: HDDS-2259
> URL: https://issues.apache.org/jira/browse/HDDS-2259
> Project: Hadoop Distributed Data Store
> Issue Type: Sub-task
> Components: Ozone Datanode
> Affects Versions: 0.5.0
> Reporter: Attila Doroszlai
> Assignee: Attila Doroszlai
> Priority: Critical
> Labels: pull-request-available
> Fix For: 0.5.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Chunk checksum verification fails for (almost) any file. This is caused by
> computing the checksum for the entire buffer, regardless of the actual size
> of the chunk.
> {code:title=https://github.com/apache/hadoop/blob/55c5436f39120da0d7dabf43d7e5e6404307123b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java#L259-L273}
> byte[] buffer = new byte[cData.getBytesPerChecksum()];
> ...
> v = fs.read(buffer);
> ...
> bytesRead += v;
> ...
> ByteString actual = cal.computeChecksum(buffer)
>     .getChecksums().get(0);
> {code}
> This results in marking all closed containers as unhealthy.
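The core of the fix is to checksum only the prefix of the buffer that `read()` actually filled. A self-contained sketch of the idea using `java.util.zip.CRC32` (the real scrubber uses Ozone's own checksum machinery, so the class and method names below are illustrative, not the project's API):

```java
import java.util.zip.CRC32;

public class ScrubberChecksumSketch {

    // Buggy behaviour: the checksum covers the whole buffer, including
    // whatever stale bytes sit past what the last read() returned.
    static long checksumWholeBuffer(byte[] buffer) {
        CRC32 crc = new CRC32();
        crc.update(buffer, 0, buffer.length);
        return crc.getValue();
    }

    // Fixed behaviour: the checksum covers only the bytesRead prefix.
    static long checksumReadPrefix(byte[] buffer, int bytesRead) {
        CRC32 crc = new CRC32();
        crc.update(buffer, 0, bytesRead);
        return crc.getValue();
    }

    public static void main(String[] args) {
        // bytesPerChecksum-sized buffer; the final chunk slice is shorter.
        byte[] buffer = new byte[16];
        byte[] chunk = {1, 2, 3, 4, 5};          // 5 bytes actually read
        System.arraycopy(chunk, 0, buffer, 0, chunk.length);

        // Reference checksum, as stored when the chunk was written.
        CRC32 expected = new CRC32();
        expected.update(chunk, 0, chunk.length);

        boolean buggyMatches = checksumWholeBuffer(buffer) == expected.getValue();
        boolean fixedMatches = checksumReadPrefix(buffer, chunk.length) == expected.getValue();
        System.out.println("buggy matches: " + buggyMatches + ", fixed matches: " + fixedMatches);
    }
}
```

When the chunk fills the buffer exactly, the two variants agree, which is why the bug only surfaces when a chunk's final slice comes up short.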
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[ https://issues.apache.org/jira/browse/HDDS-2259?focusedWorklogId=324624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324624 ]

ASF GitHub Bot logged work on HDDS-2259:
Author: ASF GitHub Bot
Created on: 07/Oct/19 21:36
Start Date: 07/Oct/19 21:36
Worklog Time Spent: 10m

Work Description: anuengineer commented on issue #1605: HDDS-2259. Container Data Scrubber computes wrong checksum
URL: https://github.com/apache/hadoop/pull/1605#issuecomment-539214958

+1. LGTM. Thank you for fixing this very important issue. I have committed this patch to the trunk.

Issue Time Tracking
---
Worklog Id: (was: 324624)
Time Spent: 40m (was: 0.5h)
[ https://issues.apache.org/jira/browse/HDDS-2259?focusedWorklogId=324625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324625 ]

ASF GitHub Bot logged work on HDDS-2259:
Author: ASF GitHub Bot
Created on: 07/Oct/19 21:36
Start Date: 07/Oct/19 21:36
Worklog Time Spent: 10m

Work Description: anuengineer commented on pull request #1605: HDDS-2259. Container Data Scrubber computes wrong checksum
URL: https://github.com/apache/hadoop/pull/1605

Issue Time Tracking
---
Worklog Id: (was: 324625)
Time Spent: 50m (was: 40m)
[ https://issues.apache.org/jira/browse/HDDS-2259?focusedWorklogId=324027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324027 ]

ASF GitHub Bot logged work on HDDS-2259:
Author: ASF GitHub Bot
Created on: 06/Oct/19 09:02
Start Date: 06/Oct/19 09:02
Worklog Time Spent: 10m

Work Description: hadoop-yetus commented on issue #1605: HDDS-2259. Container Data Scrubber computes wrong checksum
URL: https://github.com/apache/hadoop/pull/1605#issuecomment-538725824

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Comment |
|::|--:|:|:|
| 0 | reexec | 88 | Docker mode activated. |
||| _ Prechecks _ |
| +1 | dupname | 0 | No case conflicting files found. |
| +1 | @author | 0 | The patch does not contain any @author tags. |
| +1 | test4tests | 0 | The patch appears to include 1 new or modified test files. |
||| _ trunk Compile Tests _ |
| -1 | mvninstall | 55 | hadoop-hdds in trunk failed. |
| -1 | mvninstall | 41 | hadoop-ozone in trunk failed. |
| -1 | compile | 21 | hadoop-hdds in trunk failed. |
| -1 | compile | 16 | hadoop-ozone in trunk failed. |
| -0 | checkstyle | 37 | The patch fails to run checkstyle in hadoop-ozone |
| +1 | mvnsite | 0 | trunk passed |
| +1 | shadedclient | 865 | branch has no errors when building and testing our client artifacts. |
| -1 | javadoc | 22 | hadoop-hdds in trunk failed. |
| -1 | javadoc | 20 | hadoop-ozone in trunk failed. |
| 0 | spotbugs | 968 | Used deprecated FindBugs config; considering switching to SpotBugs. |
| -1 | findbugs | 38 | hadoop-hdds in trunk failed. |
| -1 | findbugs | 18 | hadoop-ozone in trunk failed. |
||| _ Patch Compile Tests _ |
| -1 | mvninstall | 36 | hadoop-hdds in the patch failed. |
| -1 | mvninstall | 37 | hadoop-ozone in the patch failed. |
| -1 | compile | 22 | hadoop-hdds in the patch failed. |
| -1 | compile | 17 | hadoop-ozone in the patch failed. |
| -1 | javac | 22 | hadoop-hdds in the patch failed. |
| -1 | javac | 17 | hadoop-ozone in the patch failed. |
| -0 | checkstyle | 30 | The patch fails to run checkstyle in hadoop-ozone |
| +1 | mvnsite | 0 | the patch passed |
| +1 | whitespace | 0 | The patch has no whitespace issues. |
| +1 | shadedclient | 730 | patch has no errors when building and testing our client artifacts. |
| -1 | javadoc | 20 | hadoop-hdds in the patch failed. |
| -1 | javadoc | 22 | hadoop-ozone in the patch failed. |
| -1 | findbugs | 32 | hadoop-hdds in the patch failed. |
| -1 | findbugs | 19 | hadoop-ozone in the patch failed. |
||| _ Other Tests _ |
| -1 | unit | 28 | hadoop-hdds in the patch failed. |
| -1 | unit | 26 | hadoop-ozone in the patch failed. |
| +1 | asflicense | 35 | The patch does not generate ASF License warnings. |
| | | 2457 | |

| Subsystem | Report/Notes |
|--:|:-|
| Docker | Client=19.03.2 Server=19.03.2 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/1605 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 018e3fba29bc 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 55c5436 |
| Default Java | 1.8.0_222 |
| mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/branch-mvninstall-hadoop-hdds.txt |
| mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/branch-mvninstall-hadoop-ozone.txt |
| compile | https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/branch-compile-hadoop-hdds.txt |
| compile | https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/branch-compile-hadoop-ozone.txt |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out//home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1605/out/maven-branch-checkstyle-hadoop-ozone.txt |
| javadoc | https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/branch-javadoc-hadoop-hdds.txt |
| javadoc | https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/branch-javadoc-hadoop-ozone.txt |
| findbugs | https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/branch-findbugs-hadoop-hdds.txt |
| findbugs | https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/branch-findbugs-hadoop-ozone.txt |
| mvninstall | https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1
[ https://issues.apache.org/jira/browse/HDDS-2259?focusedWorklogId=324025&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324025 ]

ASF GitHub Bot logged work on HDDS-2259:
Author: ASF GitHub Bot
Created on: 06/Oct/19 08:21
Start Date: 06/Oct/19 08:21
Worklog Time Spent: 10m

Work Description: adoroszlai commented on issue #1605: HDDS-2259. Container Data Scrubber computes wrong checksum
URL: https://github.com/apache/hadoop/pull/1605#issuecomment-538722906

/label ozone

Issue Time Tracking
---
Worklog Id: (was: 324025)
Time Spent: 20m (was: 10m)
[ https://issues.apache.org/jira/browse/HDDS-2259?focusedWorklogId=324024&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324024 ]

ASF GitHub Bot logged work on HDDS-2259:
Author: ASF GitHub Bot
Created on: 06/Oct/19 08:20
Start Date: 06/Oct/19 08:20
Worklog Time Spent: 10m

Work Description: adoroszlai commented on pull request #1605: HDDS-2259. Container Data Scrubber computes wrong checksum
URL: https://github.com/apache/hadoop/pull/1605

## What changes were proposed in this pull request?

Compute the checksum in the container scrubber only for the actual length of data read. Otherwise, if the actual chunk size is not an integer multiple of the number of bytes per checksum (i.e. the buffer size), leftover data in the buffer results in a wrong checksum and unhealthy containers.

```
Corruption detected in container: [1] Exception: [Inconsistent read for chunk=102914246583189504_chunk_1 len=671 expected checksum [0, 0, 0, 0, -14, -102, -99, -51] actual checksum [0, 0, 0, 0, 23, -23, 53, -79] for block conID: 1 locID: 102914246583189504 bcsId: 3]
```

https://issues.apache.org/jira/browse/HDDS-2259

## How was this patch tested?

1. Changed a unit test to reproduce the problem by making sure that "bytes per checksum" and "chunk size" are different.
2. Tested manually:
   1. Created and closed containers with small (<1KB), medium (~7MB) and large (100MB) files.
   2. Verified that the container scanner does not mark any of these unhealthy.
   3. Appended some garbage data to one of the chunk files.
   4. Verified that the container scanner marks the corrupted container as unhealthy.

```
ozone sh volume create vol1
ozone sh bucket create vol1/bucket1
ozone sh key put vol1/bucket1/small /etc/passwd
ozone scmcli container close 1
ozone sh key put vol1/bucket1/medium /opt/hadoop/share/ozone/lib/hadoop-hdfs-client-3.2.0.jar
ozone scmcli container close 2
ozone sh key put vol1/bucket1/large /opt/hadoop/share/ozone/lib/hadoop-ozone-filesystem-lib-legacy-0.5.0-SNAPSHOT.jar
ozone scmcli container close 3
# later
echo asdfasdf >> /data/hdds/hdds/*/current/containerDir0/2/chunks/*_chunk_1
```

Log:

```
Completed an iteration of container data scrubber in 1 minutes. Number of iterations (since the data-node restart) : 16, Number of containers scanned in this iteration : 3, Number of unhealthy containers found in this iteration : 0
...
Corruption detected in container: [2] Exception: [Inconsistent read for chunk=102914295727980545_chunk_1 len=5023516 expected checksum [0, 0, 0, 0, 21, 105, -33, 7] actual checksum [0, 0, 0, 0, -103, -121, 23, -96] for block conID: 2 locID: 102914295727980545 bcsId: 9]
Completed an iteration of container data scrubber in 1 minutes. Number of iterations (since the data-node restart) : 19, Number of containers scanned in this iteration : 3, Number of unhealthy containers found in this iteration : 1
```

Issue Time Tracking
---
Worklog Id: (was: 324024)
Remaining Estimate: 0h
Time Spent: 10m
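As the pull request description notes, the mismatch only appears when a chunk's length is not an exact multiple of bytes-per-checksum, because only a short final read leaves stale bytes in the buffer. A throwaway predicate makes the trigger condition concrete (hypothetical helper; the 1024-byte bytes-per-checksum is an assumed illustrative value, and the 671-byte chunk length is taken from the corruption message in the report):

```java
public class TriggerCondition {
    // The scrubber reads bytesPerChecksum bytes per slice; only the final
    // slice can come up short, so only then does leftover buffer data
    // leak into the computed checksum.
    static boolean finalReadIsShort(long chunkLen, int bytesPerChecksum) {
        return chunkLen % bytesPerChecksum != 0;
    }

    public static void main(String[] args) {
        int bytesPerChecksum = 1024;  // assumed value for illustration
        System.out.println(finalReadIsShort(671, bytesPerChecksum));   // short final read: bug triggers
        System.out.println(finalReadIsShort(4096, bytesPerChecksum));  // exact multiple: checksum happens to match
    }
}
```

This is also why the patch's unit test forces "bytes per checksum" and "chunk size" to differ: with equal or multiple sizes, the buggy and fixed code paths produce identical checksums and the regression would go undetected.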