[PR] HADOOP-19604. ABFS: BlockId generation based on blockCount along with full blob md5 computation change [hadoop]

via GitHub Wed, 23 Jul 2025 00:51:32 -0700


anmolanmol1234 opened a new pull request, #7819:
URL: https://github.com/apache/hadoop/pull/7819


   Jira: https://issues.apache.org/jira/browse/HADOOP-19604
   
   Make BlockId computation consistent across clients for PutBlock and 
PutBlockList by using blockCount instead of offset.
   Block IDs were previously derived from the data offset, which could lead to 
inconsistent IDs across different clients. The change uses blockCount (i.e., 
the index of the block) to compute the Block ID, so that ID generation is 
deterministic and consistent for both PutBlock and PutBlockList operations 
across clients.
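
A minimal sketch of index-based block ID generation, not the code in this PR. The class name `BlockIdSketch`, the method `blockIdFromIndex`, the `streamPrefix` parameter, and the 10-digit zero padding are all assumptions made for illustration. The only external constraints relied on are that Azure block IDs are Base64-encoded and must have the same length for every block of a blob.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

// Hypothetical helper: derives a block ID from the block's index (blockCount)
// rather than from its byte offset. Names and ID layout are illustrative only.
final class BlockIdSketch {

    private BlockIdSketch() {
    }

    static String blockIdFromIndex(String streamPrefix, long blockIndex) {
        if (blockIndex < 0) {
            throw new IllegalArgumentException("blockIndex must be non-negative");
        }
        // Zero-pad the index so every ID for a given prefix has the same length.
        String raw = String.format(Locale.ROOT, "%s-%010d", streamPrefix, blockIndex);
        // Block IDs are sent to the service Base64-encoded.
        return Base64.getEncoder()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }
}
```

Because the ID depends only on the prefix and the block index, two clients that agree on the prefix produce the same ID for the Nth block regardless of how each one tracks offsets.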
   
   Restrict URL encoding of certain JSON metadata during setXAttr calls.
   When setting extended attributes (xAttrs), the JSON metadata 
(hdi_permission) was previously URL-encoded, which could cause unnecessary 
escaping or compatibility issues. This change ensures that only the metadata 
that requires encoding is URL-encoded.
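
The selective encoding described above could look roughly like the following sketch. It is an illustration under stated assumptions, not the PR's implementation: the class `XAttrEncodingSketch`, the method `encodeValues`, and the idea of passing a set of keys to skip are hypothetical; only `hdi_permission` is named in the description.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: URL-encodes attribute values except for keys that
// must be passed through unchanged (for example, JSON metadata).
final class XAttrEncodingSketch {

    private XAttrEncodingSketch() {
    }

    static Map<String, String> encodeValues(Map<String, String> attrs,
                                            Set<String> skipKeys) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : attrs.entrySet()) {
            String value = entry.getValue();
            if (!skipKeys.contains(entry.getKey())) {
                value = urlEncode(value);
            }
            result.put(entry.getKey(), value);
        }
        return result;
    }

    private static String urlEncode(String value) {
        try {
            return URLEncoder.encode(value, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `encodeValues(attrs, Collections.singleton("hdi_permission"))` would leave the JSON value of `hdi_permission` untouched while still encoding the other values.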
   
   Maintain the MD5 hash of the whole block to validate data integrity during 
flush.
   During flush operations, the MD5 hash of the entire block's data is computed 
and stored. This hash is later used to validate that the block was correctly 
persisted, ensuring data integrity and helping detect corruption or 
transmission errors.
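
As a rough illustration of the integrity check described above, the sketch below computes a Base64-encoded MD5 over a block's bytes and compares it against a previously stored value. The class `BlockMd5Sketch` and its methods are hypothetical names, and how the PR actually stores and compares the hash is not shown in this description.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper: computes and verifies an MD5 digest for a block of data.
final class BlockMd5Sketch {

    private BlockMd5Sketch() {
    }

    // Returns the Base64-encoded MD5 of data[offset, offset + length).
    static String md5Base64(byte[] data, int offset, int length) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, offset, length);
            return Base64.getEncoder().encodeToString(md.digest());
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Returns true when the stored hash matches the hash of the given bytes.
    static boolean matches(String storedMd5Base64, byte[] data) {
        String actual = md5Base64(data, 0, data.length);
        return MessageDigest.isEqual(
                storedMd5Base64.getBytes(StandardCharsets.US_ASCII),
                actual.getBytes(StandardCharsets.US_ASCII));
    }
}
```

A writer would record `md5Base64(...)` when the block is uploaded and call `matches(...)` at flush time; a mismatch indicates that the bytes differ from what was originally hashed.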
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


