[ 
https://issues.apache.org/jira/browse/HDFS-16533?focusedWorklogId=771431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-771431
 ]

ASF GitHub Bot logged work on HDFS-16533:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 17/May/22 15:22
            Start Date: 17/May/22 15:22
    Worklog Time Spent: 10m 
      Work Description: ZanderXu commented on code in PR #4155:
URL: https://github.com/apache/hadoop/pull/4155#discussion_r874957305


##########
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FileChecksumHelper.java:
##########
@@ -316,18 +317,22 @@ FileChecksum makeCompositeCrcResult() throws IOException {
             "Added blockCrc 0x{} for block index {} of size {}",
             Integer.toString(blockCrc, 16), i, block.getBlockSize());
       }
-
-      // NB: In some cases the located blocks have their block size adjusted
-      // explicitly based on the requested length, but not all cases;
-      // these numbers may or may not reflect actual sizes on disk.
-      long reportedLastBlockSize =
-          blockLocations.getLastLocatedBlock().getBlockSize();
-      long consumedLastBlockLength = reportedLastBlockSize;
-      if (length - sumBlockLengths < reportedLastBlockSize) {
-        LOG.warn(
-            "Last block length {} is less than reportedLastBlockSize {}",
-            length - sumBlockLengths, reportedLastBlockSize);
-        consumedLastBlockLength = length - sumBlockLengths;
+      LocatedBlock nextBlock = locatedBlocks.get(i);
+      long consumedLastBlockLength = Math.min(length - sumBlockLengths,
+          nextBlock.getBlockSize());
+      LocatedBlock lastBlock = blockLocations.getLastLocatedBlock();
+      if (nextBlock.equals(lastBlock)) {

Review Comment:
   Whether it is a replicated file or a striped file, we obtain a 4-byte composite 
CRC for each block, and the actual data length that the CRC covers is very 
important, because line 336 uses it to compose the file-level composite CRC.
   
   Suppose a file has 4 blocks, block1, block2, block3 and block4, whose sizes are 
10MB, 10MB, 10MB and 7MB, and I call getFileChecksum(mockFile, 29MB). The correct 
consumedLastBlockLength at line 336 should be 9MB, but the current logic yields 
7MB, taken from the size of the file's last block, so we end up with a wrong 
composite CRC.
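   To make the arithmetic concrete, here is a minimal, self-contained sketch of the 
per-block consumption described above (the block sizes, the 29MB request and the 
class name are illustrative only; no HDFS classes are used):

```java
// Illustrative sketch only; block sizes and the requested length come from the
// example above, and no HDFS classes are involved.
public class ConsumedLengthExample {
  public static void main(String[] args) {
    final long mb = 1024L * 1024L;
    long[] blockSizes = {10 * mb, 10 * mb, 10 * mb, 7 * mb}; // block1..block4
    long requestedLength = 29 * mb;                          // getFileChecksum(mockFile, 29MB)

    long sumBlockLengths = 0;
    for (long blockSize : blockSizes) {
      if (sumBlockLengths >= requestedLength) {
        break;
      }
      // Fixed logic: consume at most what the requested length still needs,
      // bounded by the size of the block currently being read.
      long consumed = Math.min(requestedLength - sumBlockLengths, blockSize);
      System.out.println("consumed " + consumed / mb + "MB of a "
          + blockSize / mb + "MB block");
      sumBlockLengths += consumed;
    }
    // Prints 10MB, 10MB, 9MB. The old logic would have bounded the third block by
    // the file's last block size (7MB) and composed the CRC over the wrong length.
  }
}
```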





Issue Time Tracking
-------------------

    Worklog Id:     (was: 771431)
    Time Spent: 2h  (was: 1h 50m)

> COMPOSITE_CRC failed between replicated file and striped file.
> --------------------------------------------------------------
>
>                 Key: HDFS-16533
>                 URL: https://issues.apache.org/jira/browse/HDFS-16533
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, hdfs-client
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDFS-16533.001.patch
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> When comparing the COMPOSITE_CRC checksum computed over random lengths of a 
> replicated file against that of a striped file holding the same data, the 
> checksums do not match. The issue can be reproduced as follows:
> {code:java}
> @Test(timeout = 90000)
> public void testStripedAndReplicatedFileChecksum2() throws Exception {
>   int abnormalSize = (dataBlocks * 2 - 2) * blockSize +
>       (int) (blockSize * 0.5);
>   prepareTestFiles(abnormalSize, new String[] {stripedFile1, replicatedFile});
>   int loopNumber = 100;
>   while (loopNumber-- > 0) {
>     int verifyLength = ThreadLocalRandom.current()
>         .nextInt(10, abnormalSize);
>     FileChecksum stripedFileChecksum1 = getFileChecksum(stripedFile1,
>         verifyLength, false);
>     FileChecksum replicatedFileChecksum = getFileChecksum(replicatedFile,
>         verifyLength, false);
>     if (checksumCombineMode.equals(ChecksumCombineMode.COMPOSITE_CRC.name())) {
>       Assert.assertEquals(stripedFileChecksum1, replicatedFileChecksum);
>     } else {
>       Assert.assertNotEquals(stripedFileChecksum1, replicatedFileChecksum);
>     }
>   }
> } {code}
> After tracing the root cause, `FileChecksumHelper#makeCompositeCrcResult` may 
> compute an incorrect `consumedLastBlockLength` when updating the checksum for 
> the last block within the requested length, which may not be the last block of 
> the file.
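> Below is a minimal, hypothetical sketch (not the actual FileChecksumHelper code; 
> the variable names only mirror it) contrasting the old and fixed computations of 
> `consumedLastBlockLength` for a request that ends inside a non-final block:
> {code:java}
> // Hypothetical stand-alone illustration; no HDFS classes are involved.
> public class ConsumedLastBlockLengthSketch {
>   public static void main(String[] args) {
>     long mb = 1024L * 1024L;
>     long length = 29 * mb;               // requested checksum length
>     long sumBlockLengths = 20 * mb;      // two full 10MB blocks already consumed
>     long reportedLastBlockSize = 7 * mb; // size of the file's last block
>     long currentBlockSize = 10 * mb;     // size of the block being consumed
>
>     // Old logic: bounded by the file's last located block -> 7MB (wrong).
>     long oldValue = Math.min(length - sumBlockLengths, reportedLastBlockSize);
>     // Fixed logic: bounded by the block being consumed -> 9MB (expected).
>     long fixedValue = Math.min(length - sumBlockLengths, currentBlockSize);
>
>     System.out.println(oldValue / mb + "MB vs " + fixedValue / mb + "MB");
>   }
> }
> {code}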



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
