[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773391#comment-17773391 ]
ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

steveloughran commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1350484904


##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##########
@@ -1412,6 +1444,102 @@ private void appendIfNotEmpty(StringBuilder sb, String regEx,
     }
   }
 
+  /**
+   * Add the MD5 hash as a request header to the append request.
+   * @param requestHeaders list of headers to be sent with the append request
+   * @param reqParams parameters of the append request
+   * @param buffer buffer holding the data to be written
+   */
+  private void addCheckSumHeaderForWrite(List<AbfsHttpHeader> requestHeaders,
+      final AppendRequestParameters reqParams, final byte[] buffer)
+      throws AbfsRestOperationException {
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);
+      byte[] dataToBeWritten = new byte[reqParams.getLength()];
+
+      if (reqParams.getoffset() == 0 && reqParams.getLength() == buffer.length) {
+        dataToBeWritten = buffer;
+      } else {
+        System.arraycopy(buffer, reqParams.getoffset(), dataToBeWritten, 0,
+            reqParams.getLength());
+      }
+
+      byte[] md5Bytes = md5Digest.digest(dataToBeWritten);
+      String md5Hash = Base64.getEncoder().encodeToString(md5Bytes);
+      requestHeaders.add(new AbfsHttpHeader(CONTENT_MD5, md5Hash));
+    } catch (NoSuchAlgorithmException ex) {
+      throw new AbfsRuntimeException(ex);
+    }
+  }
+
+  /**
+   * Verify the checksum information received from the server for the data read.
+   * @param buffer stores the data received from the server
+   * @param result HTTP operation result
+   * @param bufferOffset position in buffer where the data returned by the server is saved
+   * @throws AbfsRestOperationException
+   */
+  private void verifyCheckSumForRead(final byte[] buffer,
+      final AbfsHttpOperation result, final int bufferOffset)
+      throws AbfsRestOperationException {
+    // The number of bytes returned by the server could be less than or equal
+    // to what the caller requests.
+    // In case it is less, the extra bytes will be initialized to 0.
+    // The server-returned MD5 hash is computed on exactly what the server
+    // returned, so we must extract that exact data, compute its MD5 hash,
+    // and check that the computed hash equals what the server returned.
+    int numberOfBytesRead = (int) result.getBytesReceived();
+    if (numberOfBytesRead == 0) {
+      return;
+    }
+    byte[] dataRead = new byte[numberOfBytesRead];
+
+    if (bufferOffset == 0 && numberOfBytesRead == buffer.length) {
+      dataRead = buffer;
+    } else {
+      System.arraycopy(buffer, bufferOffset, dataRead, 0, numberOfBytesRead);
+    }
+
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);

Review Comment:
   ok

> ABFS: Adding Support for MD5 Hash based integrity verification of the request
> content during transport
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-18910
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18910
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>            Reporter: Anuj Modi
>            Assignee: Anuj Modi
>            Priority: Major
>              Labels: pull-request-available
>
> Azure Storage supports Content-MD5 request headers in both the Read and Append
> APIs.
> Read: [Path - Read - REST API (Azure Storage Services) | Microsoft
> Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read]
> Append: [Path - Update - REST API (Azure Storage Services) | Microsoft
> Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/update]
> This change makes the client-side changes needed to support them. In a Read
> request, the client sends the appropriate header, in response to which the
> server returns the MD5 hash of the data it sends back. The client tallies this
> against the MD5 hash computed from the data it received.
> In an Append request, the client computes the MD5 hash of the data being sent
> to the server and specifies it in the appropriate header.
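[Editorial aside: the append-side flow quoted above — hash exactly the byte range being written, Base64-encode the digest, send it as Content-MD5 — can be sketched standalone. The class and helper names below are illustrative only and do not appear in the PR.]

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/**
 * Minimal sketch (not AbfsClient code) of computing a Content-MD5 header
 * value for a sub-range of a write buffer.
 */
public class ContentMd5Example {

  // Hypothetical helper: hash only the [offset, offset + length) slice,
  // mirroring the System.arraycopy logic in addCheckSumHeaderForWrite.
  static String computeContentMd5(byte[] buffer, int offset, int length)
      throws NoSuchAlgorithmException {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    // update(byte[], int, int) hashes the slice without an intermediate copy.
    md5.update(buffer, offset, length);
    return Base64.getEncoder().encodeToString(md5.digest());
  }

  public static void main(String[] args) throws NoSuchAlgorithmException {
    byte[] data = "hello world".getBytes(StandardCharsets.UTF_8);
    // Hash the whole buffer ...
    System.out.println(computeContentMd5(data, 0, data.length));
    // ... and a slice, as happens when the write starts at a non-zero offset.
    System.out.println(computeContentMd5(data, 6, 5)); // hashes "world"
  }
}
```

Note that `MessageDigest.update(byte[], int, int)` hashes the slice in place, whereas the PR copies the slice into a fresh array first; both produce the same digest.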
> On finding that header, the server tallies it against the MD5 hash it
> computes on the data it received.
> This whole checksum validation support is guarded behind a config. The config
> is disabled by default, because with HTTPS the integrity of the data is
> preserved anyway. It is introduced as an additional data-integrity check,
> and enabling it has a performance impact.
> Users can decide whether to enable it by setting the following config to
> *"true"* or *"false"* respectively: *"fs.azure.enable.checksum.validation"*

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org