[jira] [Work logged] (HDDS-2259) Container Data Scrubber computes wrong checksum

2019-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2259?focusedWorklogId=324883&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324883
 ]

ASF GitHub Bot logged work on HDDS-2259:


Author: ASF GitHub Bot
Created on: 08/Oct/19 06:38
Start Date: 08/Oct/19 06:38
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on issue #1605: HDDS-2259. 
Container Data Scrubber computes wrong checksum
URL: https://github.com/apache/hadoop/pull/1605#issuecomment-539363289
 
 
   Thanks @anuengineer for reviewing and committing it.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 324883)
Time Spent: 1h  (was: 50m)

> Container Data Scrubber computes wrong checksum
> ---
>
> Key: HDDS-2259
> URL: https://issues.apache.org/jira/browse/HDDS-2259
> Project: Hadoop Distributed Data Store
>  Issue Type: Sub-task
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Chunk checksum verification fails for (almost) any file.  This is caused by 
> computing checksum for the entire buffer, regardless of the actual size of 
> the chunk.
> {code:title=https://github.com/apache/hadoop/blob/55c5436f39120da0d7dabf43d7e5e6404307123b/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainerCheck.java#L259-L273}
> byte[] buffer = new byte[cData.getBytesPerChecksum()];
> ...
> v = fs.read(buffer);
> ...
> bytesRead += v;
> ...
> ByteString actual = cal.computeChecksum(buffer)
> .getChecksums().get(0);
> {code}
> This results in marking all closed containers as unhealthy.
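A minimal sketch of the problem and the shape of the fix, in plain JDK Java (the `CRC32` helper and all names here are illustrative stand-ins, not the actual Ozone `Checksum` API): the checksum must cover only the `v` bytes the read actually returned, not the whole `bytesPerChecksum`-sized buffer.

```java
import java.util.zip.CRC32;

public class ChecksumSketch {
    /** Checksum of exactly the first len bytes of buffer, not the whole array. */
    static long checksumOf(byte[] buffer, int len) {
        CRC32 crc = new CRC32();
        crc.update(buffer, 0, len);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[16];              // bytesPerChecksum-sized buffer
        byte[] chunk = "hello".getBytes();         // actual chunk data is shorter
        System.arraycopy(chunk, 0, buffer, 0, chunk.length);
        int v = chunk.length;                      // what the read call returned

        CRC32 wholeBuffer = new CRC32();
        wholeBuffer.update(buffer);                // the bug: trailing zero bytes included

        // The two values differ, so any chunk shorter than the buffer fails verification.
        System.out.println(checksumOf(buffer, v) != wholeBuffer.getValue());  // true
    }
}
```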



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDDS-2259) Container Data Scrubber computes wrong checksum

2019-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2259?focusedWorklogId=324624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324624
 ]

ASF GitHub Bot logged work on HDDS-2259:


Author: ASF GitHub Bot
Created on: 07/Oct/19 21:36
Start Date: 07/Oct/19 21:36
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on issue #1605: HDDS-2259. 
Container Data Scrubber computes wrong checksum
URL: https://github.com/apache/hadoop/pull/1605#issuecomment-539214958
 
 
   +1. LGTM. Thank you for fixing this very important issue. I have committed 
this patch to the trunk.
 



Issue Time Tracking
---

Worklog Id: (was: 324624)
Time Spent: 40m  (was: 0.5h)




[jira] [Work logged] (HDDS-2259) Container Data Scrubber computes wrong checksum

2019-10-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2259?focusedWorklogId=324625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324625
 ]

ASF GitHub Bot logged work on HDDS-2259:


Author: ASF GitHub Bot
Created on: 07/Oct/19 21:36
Start Date: 07/Oct/19 21:36
Worklog Time Spent: 10m 
  Work Description: anuengineer commented on pull request #1605: HDDS-2259. 
Container Data Scrubber computes wrong checksum
URL: https://github.com/apache/hadoop/pull/1605
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 324625)
Time Spent: 50m  (was: 40m)




[jira] [Work logged] (HDDS-2259) Container Data Scrubber computes wrong checksum

2019-10-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2259?focusedWorklogId=324027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324027
 ]

ASF GitHub Bot logged work on HDDS-2259:


Author: ASF GitHub Bot
Created on: 06/Oct/19 09:02
Start Date: 06/Oct/19 09:02
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on issue #1605: HDDS-2259. 
Container Data Scrubber computes wrong checksum
URL: https://github.com/apache/hadoop/pull/1605#issuecomment-538725824
 
 
   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | 0 | reexec | 88 | Docker mode activated. |
   ||| _ Prechecks _ |
   | +1 | dupname | 0 | No case conflicting files found. |
   | +1 | @author | 0 | The patch does not contain any @author tags. |
   | +1 | test4tests | 0 | The patch appears to include 1 new or modified test 
files. |
   ||| _ trunk Compile Tests _ |
   | -1 | mvninstall | 55 | hadoop-hdds in trunk failed. |
   | -1 | mvninstall | 41 | hadoop-ozone in trunk failed. |
   | -1 | compile | 21 | hadoop-hdds in trunk failed. |
   | -1 | compile | 16 | hadoop-ozone in trunk failed. |
   | -0 | checkstyle | 37 | The patch fails to run checkstyle in hadoop-ozone |
   | +1 | mvnsite | 0 | trunk passed |
   | +1 | shadedclient | 865 | branch has no errors when building and testing 
our client artifacts. |
   | -1 | javadoc | 22 | hadoop-hdds in trunk failed. |
   | -1 | javadoc | 20 | hadoop-ozone in trunk failed. |
   | 0 | spotbugs | 968 | Used deprecated FindBugs config; considering 
switching to SpotBugs. |
   | -1 | findbugs | 38 | hadoop-hdds in trunk failed. |
   | -1 | findbugs | 18 | hadoop-ozone in trunk failed. |
   ||| _ Patch Compile Tests _ |
   | -1 | mvninstall | 36 | hadoop-hdds in the patch failed. |
   | -1 | mvninstall | 37 | hadoop-ozone in the patch failed. |
   | -1 | compile | 22 | hadoop-hdds in the patch failed. |
   | -1 | compile | 17 | hadoop-ozone in the patch failed. |
   | -1 | javac | 22 | hadoop-hdds in the patch failed. |
   | -1 | javac | 17 | hadoop-ozone in the patch failed. |
   | -0 | checkstyle | 30 | The patch fails to run checkstyle in hadoop-ozone |
   | +1 | mvnsite | 0 | the patch passed |
   | +1 | whitespace | 0 | The patch has no whitespace issues. |
   | +1 | shadedclient | 730 | patch has no errors when building and testing 
our client artifacts. |
   | -1 | javadoc | 20 | hadoop-hdds in the patch failed. |
   | -1 | javadoc | 22 | hadoop-ozone in the patch failed. |
   | -1 | findbugs | 32 | hadoop-hdds in the patch failed. |
   | -1 | findbugs | 19 | hadoop-ozone in the patch failed. |
   ||| _ Other Tests _ |
   | -1 | unit | 28 | hadoop-hdds in the patch failed. |
   | -1 | unit | 26 | hadoop-ozone in the patch failed. |
   | +1 | asflicense | 35 | The patch does not generate ASF License warnings. |
   | | | 2457 | |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.2 Server=19.03.2 base: 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/1605 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient findbugs checkstyle |
   | uname | Linux 018e3fba29bc 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 
17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | personality/hadoop.sh |
   | git revision | trunk / 55c5436 |
   | Default Java | 1.8.0_222 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/branch-mvninstall-hadoop-hdds.txt
 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/branch-mvninstall-hadoop-ozone.txt
 |
   | compile | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/branch-compile-hadoop-hdds.txt
 |
   | compile | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/branch-compile-hadoop-ozone.txt
 |
   | checkstyle | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out//home/jenkins/jenkins-slave/workspace/hadoop-multibranch_PR-1605/out/maven-branch-checkstyle-hadoop-ozone.txt
 |
   | javadoc | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/branch-javadoc-hadoop-hdds.txt
 |
   | javadoc | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/branch-javadoc-hadoop-ozone.txt
 |
   | findbugs | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/branch-findbugs-hadoop-hdds.txt
 |
   | findbugs | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1/artifact/out/branch-findbugs-hadoop-ozone.txt
 |
   | mvninstall | 
https://builds.apache.org/job/hadoop-multibranch/job/PR-1605/1

[jira] [Work logged] (HDDS-2259) Container Data Scrubber computes wrong checksum

2019-10-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2259?focusedWorklogId=324025&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324025
 ]

ASF GitHub Bot logged work on HDDS-2259:


Author: ASF GitHub Bot
Created on: 06/Oct/19 08:21
Start Date: 06/Oct/19 08:21
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on issue #1605: HDDS-2259. 
Container Data Scrubber computes wrong checksum
URL: https://github.com/apache/hadoop/pull/1605#issuecomment-538722906
 
 
   /label ozone
 



Issue Time Tracking
---

Worklog Id: (was: 324025)
Time Spent: 20m  (was: 10m)




[jira] [Work logged] (HDDS-2259) Container Data Scrubber computes wrong checksum

2019-10-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2259?focusedWorklogId=324024&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-324024
 ]

ASF GitHub Bot logged work on HDDS-2259:


Author: ASF GitHub Bot
Created on: 06/Oct/19 08:20
Start Date: 06/Oct/19 08:20
Worklog Time Spent: 10m 
  Work Description: adoroszlai commented on pull request #1605: HDDS-2259. 
Container Data Scrubber computes wrong checksum
URL: https://github.com/apache/hadoop/pull/1605
 
 
   ## What changes were proposed in this pull request?
   
   Compute checksum in container scrubber only for the actual length of data 
read.  Otherwise, if the actual chunk size is not an integer multiple of the 
number of bytes per checksum (i.e. the buffer size), leftover data in the buffer 
results in a wrong checksum and unhealthy containers.
   
   ```
   Corruption detected in container: [1] Exception: [Inconsistent read for 
chunk=102914246583189504_chunk_1 len=671 expected checksum [0, 0, 0, 0, -14, 
-102, -99, -51] actual checksum [0, 0, 0, 0, 23, -23, 53, -79] for block conID: 
1 locID: 102914246583189504 bcsId: 3]
   ```
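   The corrected read loop can be sketched as follows (a hypothetical helper using JDK `CRC32`; the real code goes through Ozone's checksum classes, and the class and method names here are illustrative). Each unit's checksum now covers only the bytes the read returned, so a 671-byte chunk with 256 bytes per checksum yields three units of 256, 256, and 159 bytes:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

public class ScrubLoopSketch {
    /**
     * One checksum per bytesPerChecksum-sized unit; the final, short unit is
     * checksummed over only the v bytes actually read (the HDDS-2259 fix).
     */
    static List<Long> checksums(InputStream in, int bytesPerChecksum) throws IOException {
        byte[] buffer = new byte[bytesPerChecksum];
        List<Long> result = new ArrayList<>();
        int v;
        while ((v = in.read(buffer)) != -1) {
            CRC32 crc = new CRC32();
            crc.update(buffer, 0, v);  // not crc.update(buffer): stale trailing bytes would change the value
            result.add(crc.getValue());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // 671-byte chunk, 256 bytes per checksum: units of 256 + 256 + 159 bytes.
        List<Long> cs = checksums(new ByteArrayInputStream(new byte[671]), 256);
        System.out.println(cs.size());  // 3
    }
}
```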
   
   https://issues.apache.org/jira/browse/HDDS-2259
   
   ## How was this patch tested?
   
   1. Changed unit test to reproduce the problem by making sure that "bytes per 
checksum" and "chunk size" are different.
   2. Tested manually
  1. Created and closed containers with small (<1KB), medium (~7MB) and 
large (100MB) files.
  2. Verified that container scanner does not mark any of these unhealthy.
  3. Appended some garbage data to one of the chunk files.
  4. Verified that container scanner marks the corrupted container as 
unhealthy.
   
   ```
   ozone sh volume create vol1
   ozone sh bucket create vol1/bucket1
   ozone sh key put vol1/bucket1/small /etc/passwd
   ozone scmcli container close 1
   ozone sh key put vol1/bucket1/medium 
/opt/hadoop/share/ozone/lib/hadoop-hdfs-client-3.2.0.jar
   ozone scmcli container close 2
   ozone sh key put vol1/bucket1/large 
/opt/hadoop/share/ozone/lib/hadoop-ozone-filesystem-lib-legacy-0.5.0-SNAPSHOT.jar
   ozone scmcli container close 3
   # later
   echo asdfasdf >> /data/hdds/hdds/*/current/containerDir0/2/chunks/*_chunk_1 
   ```
   
   Log:
   
   ```
   Completed an iteration of container data scrubber in 1 minutes. Number of 
iterations (since the data-node restart) : 16, Number of containers scanned in 
this iteration : 3, Number of unhealthy containers found in this iteration : 0
   ...
   Corruption detected in container: [2] Exception: [Inconsistent read for 
chunk=102914295727980545_chunk_1 len=5023516 expected checksum [0, 0, 0, 0, 21, 
105, -33, 7] actual checksum [0, 0, 0, 0, -103, -121, 23, -96] for block conID: 
2 locID: 102914295727980545 bcsId: 9]
   Completed an iteration of container data scrubber in 1 minutes. Number of 
iterations (since the data-node restart) : 19, Number of containers scanned in 
this iteration : 3, Number of unhealthy containers found in this iteration : 1
   ```
 



Issue Time Tracking
---

Worklog Id: (was: 324024)
Remaining Estimate: 0h
Time Spent: 10m
