[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826958#comment-17826958 ]

ASF GitHub Bot commented on HADOOP-18910:

anujmodi2021 commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1996612955

Working on all the test fixes. Will create a common PR for all these related Jiras. If required, will create a new Jira and link all of these to it. Thanks for reporting and for your patience.

> ABFS: Adding Support for MD5 Hash based integrity verification of the request
> content during transport
> ---
>
> Key: HADOOP-18910
> URL: https://issues.apache.org/jira/browse/HADOOP-18910
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Reporter: Anuj Modi
> Assignee: Anuj Modi
> Priority: Major
> Labels: pull-request-available
>
> Azure Storage supports Content-MD5 request headers in both the Read and Append APIs.
> Read: [Path - Read - REST API (Azure Storage Services) | Microsoft Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read]
> Append: [Path - Update - REST API (Azure Storage Services) | Microsoft Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/update]
> This change adds the client-side support for them. In a Read request, we will send the appropriate header, in response to which the server will return the MD5 hash of the data it sends back. The client will compare this with the MD5 hash it computes over the data received.
> In an Append request, we will compute the MD5 hash of the data we are sending to the server and specify it in the appropriate header. On finding that header, the server will compare it with the MD5 hash it computes over the data received.
> This checksum validation support is guarded behind a config. The config is disabled by default because, with "https", data integrity is already preserved in transit. It is introduced as an additional data integrity check, and enabling it has a performance cost.
> Users can enable or disable it by setting the following config to *"true"* or *"false"* respectively. *Config: "fs.azure.enable.checksum.validation"*

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
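A minimal Java sketch of the two checks described in the issue, using only the JDK. It assumes the header carries the Base64-encoded 16-byte MD5 digest (the standard Content-MD5 encoding) and uses a plain `Map` as a stand-in for Hadoop's `Configuration`. The class and method names are hypothetical; this is not the ABFS driver's code, and the exact request/response headers for Read should be checked against the linked REST docs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Collections;
import java.util.Map;

/** Hypothetical helper sketching MD5 transport-integrity checks; not the ABFS driver code. */
public class Md5IntegritySketch {

    /** Config key from the issue description; treated as disabled when absent. */
    public static final String CHECKSUM_VALIDATION_KEY = "fs.azure.enable.checksum.validation";

    /** Reads the flag from a plain map standing in for Hadoop's Configuration (assumption). */
    public static boolean isChecksumValidationEnabled(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.getOrDefault(CHECKSUM_VALIDATION_KEY, "false"));
    }

    /** Base64 of the MD5 digest of buf[offset, offset + length). */
    public static String computeMd5Base64(byte[] buf, int offset, int length) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            md5.update(buf, offset, length);
            return Base64.getEncoder().encodeToString(md5.digest());
        } catch (NoSuchAlgorithmException e) {
            // Java platforms are required to provide MD5, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    /** Append: the value the client would place in the Content-MD5 request header. */
    public static String md5HeaderForAppend(byte[] payload, int offset, int length) {
        return computeMd5Base64(payload, offset, length);
    }

    /** Read: compare the MD5 reported by the server with one computed over the received bytes. */
    public static void verifyReadResponse(byte[] received, int offset, int length, String serverMd5Base64) {
        String clientMd5 = computeMd5Base64(received, offset, length);
        // A missing server hash is treated as a failure in this sketch.
        boolean matches = serverMd5Base64 != null && MessageDigest.isEqual(
                clientMd5.getBytes(StandardCharsets.US_ASCII),
                serverMd5Base64.getBytes(StandardCharsets.US_ASCII));
        if (!matches) {
            throw new IllegalStateException(
                    "MD5 mismatch: server sent " + serverMd5Base64 + ", client computed " + clientMd5);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = Collections.singletonMap(CHECKSUM_VALIDATION_KEY, "true");
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        if (isChecksumValidationEnabled(conf)) {
            String header = md5HeaderForAppend(data, 0, data.length);
            System.out.println("Content-MD5: " + header);
            // Simulates a read whose response echoes the same hash; no exception means it verified.
            verifyReadResponse(data, 0, data.length, header);
        }
    }
}
```

Running `main` prints `Content-MD5: XUFAKrxLKna5cZ2REBfFkg==`. The sketch compares digests with `MessageDigest.isEqual` and fails on a missing server hash; the real driver may treat a missing header differently. Hashing every buffer on both paths is one plausible source of the performance cost the issue mentions.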
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825768#comment-17825768 ]

ASF GitHub Bot commented on HADOOP-18910:

steveloughran commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1992362377

Think it's time for the default for namespace.enabled to become true? I'd support that.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825421#comment-17825421 ]

ASF GitHub Bot commented on HADOOP-18910:

mukund-thakur commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1989236236

> > > You also need to add the following test configuration to specify the account type you are using: `fs.azure.test.namespace.enabled`
> > >
> > > ```
> > > <property>
> > >   <name>fs.azure.test.namespace.enabled</name>
> > >   <value>true</value>
> > > </property>
> > > ```
> >
> > Yes, after adding this, ITestGetNameSpaceEnabled succeeds. But shouldn't tests be written so that they either succeed or get skipped if this value is not set in auth-keys.xml?
>
> Hi @mukund-thakur, that makes sense. We will take up a WI (work item) to skip tests if the required configs are not set.
>
> Having said that, we recommend that developers making changes in the driver follow the configuration template we have designed, so that they do not end up skipping a lot of tests. The recommendation is to follow the template, set all the configs mentioned there, and then use our test scripts to run the whole test suite. That makes it easy to run the tests with all combinations of account types and auth types.
>
> We will also make sure to update the ABFS testing documentation with this recommendation.
>
> Thanks for pointing this out.

Have you created the Jiras for these test issues? Found one more today: https://issues.apache.org/jira/browse/HADOOP-19106
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823874#comment-17823874 ]

ASF GitHub Bot commented on HADOOP-18910:

anujmodi2021 commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1980185382

@mukund-thakur Backmerge PR: https://github.com/apache/hadoop/pull/6611
Created a common PR for both commits as they tend to have conflicts.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821944#comment-17821944 ]

ASF GitHub Bot commented on HADOOP-18910:

anujmodi2021 commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1970491157

> > You also need to add the following test configuration to specify the account type you are using: `fs.azure.test.namespace.enabled`
> >
> > ```
> > <property>
> >   <name>fs.azure.test.namespace.enabled</name>
> >   <value>true</value>
> > </property>
> > ```
>
> Yes, after adding this, ITestGetNameSpaceEnabled succeeds. But shouldn't tests be written so that they either succeed or get skipped if this value is not set in auth-keys.xml?

Hi Mukund, that makes sense. We will take up a WI (work item) to skip tests if the required configs are not set.

Having said that, we recommend that developers making changes in the driver follow the configuration template we have designed, so that they do not end up skipping a lot of tests. The recommendation is to follow the template, set all the configs mentioned there, and then use our test scripts to run the whole test suite. That makes it easy to run the tests with all combinations of account types and auth types.

We will also make sure to update the ABFS testing documentation with this recommendation.

Thanks for pointing this out.
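The skip-instead-of-fail behaviour discussed in this exchange could look roughly like the JDK-only sketch that follows: report "not set" separately from true/false so the caller can skip. In a JUnit 4 test, the empty case would typically be passed to `org.junit.Assume.assumeTrue(message, condition)`, which marks the test as skipped rather than failed. The class and method names are hypothetical, and a plain `Map` stands in for the configuration loaded from auth-keys.xml.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

/** Hypothetical gate: decides whether a namespace-dependent test should run or be skipped. */
public class NamespaceTestGate {

    /** Test config key mentioned in the thread. */
    public static final String TEST_NAMESPACE_KEY = "fs.azure.test.namespace.enabled";

    /**
     * Returns the configured value, or empty when the key is absent or blank,
     * so a caller can skip the test instead of failing it.
     */
    public static Optional<Boolean> configuredNamespaceEnabled(Map<String, String> testConf) {
        String raw = testConf.get(TEST_NAMESPACE_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(Boolean.parseBoolean(raw.trim()));
    }

    public static void main(String[] args) {
        // An empty map stands in for an auth-keys.xml that omits the key.
        Map<String, String> conf = Collections.emptyMap();
        Optional<Boolean> hns = configuredNamespaceEnabled(conf);
        if (hns.isPresent()) {
            System.out.println("run test; namespace enabled = " + hns.get());
        } else {
            System.out.println("skip test: " + TEST_NAMESPACE_KEY + " is not set");
        }
    }
}
```

Keeping "absent" distinct from "false" is the point of the design: a missing key means the test environment is incomplete (skip), while an explicit value says which account type the assertions should expect.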
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821419#comment-17821419 ]

ASF GitHub Bot commented on HADOOP-18910:

mukund-thakur commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1967594293

> You also need to add the following test configuration to specify the account type you are using: `fs.azure.test.namespace.enabled`
>
> ```
> <property>
>   <name>fs.azure.test.namespace.enabled</name>
>   <value>true</value>
> </property>
> ```

Yes, after adding this, ITestGetNameSpaceEnabled succeeds. But shouldn't tests be written so that they either succeed or get skipped if this value is not set in auth-keys.xml?
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820993#comment-17820993 ]

ASF GitHub Bot commented on HADOOP-18910:

anujmodi2021 commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1965889640

You also need to add the following test configuration to specify the account type you are using: `fs.azure.test.namespace.enabled`

```
<property>
  <name>fs.azure.test.namespace.enabled</name>
  <value>true</value>
</property>
```
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820832#comment-17820832 ]

ASF GitHub Bot commented on HADOOP-18910:

mukund-thakur commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1965105331

```
<property>
  <name>fs.azure.abfs.account.name</name>
  <value>account_name.dfs.core.windows.net</value>
</property>
<property>
  <name>fs.azure.account.auth.type.account_name.dfs.core.windows.net</name>
  <value>SharedKey</value>
</property>
<property>
  <name>fs.azure.account.key.account_name.dfs.core.windows.net</name>
  <value>xxx</value>
</property>
<property>
  <name>fs.azure.wasb.account.name</name>
  <value>account_name.blob.core.windows.net</value>
</property>
<property>
  <name>fs.contract.test.fs.abfs</name>
  <value>abfs://abfs-test@account_name.dfs.core.windows.net</value>
  <description>A file system URI to be used by the contract tests.</description>
</property>
<property>
  <name>fs.azure.scale.test.enabled</name>
  <value>true</value>
</property>
```

This is my auth-keys.xml for an HNS account, and all three of the tests above fail for me even on trunk.

> ITestAzureBlobFileSystemLease.testTwoCreate
> ITestExponentialRetryPolicy.testThrottlingIntercept
> ITestGetNameSpaceEnabled
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820595#comment-17820595 ]

ASF GitHub Bot commented on HADOOP-18910:

anujmodi2021 commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1963472510

Will run the test suite again on the backport PR to 3.4 as well...
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820592#comment-17820592 ]

ASF GitHub Bot commented on HADOOP-18910:

anujmodi2021 commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1963467491

AGGREGATED TEST RESULT

On branch-3.4

HNS-OAuth
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5

HNS-SharedKey
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 586, Failures: 0, Errors: 0, Skipped: 266
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 44

NonHNS-SharedKey
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 11
[INFO] Results:
[INFO]
[WARNING] Tests run: 586, Failures: 0, Errors: 0, Skipped: 266
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 44

AppendBlob-HNS-OAuth
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 41

Time taken: 21 mins 25 secs.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820581#comment-17820581 ]

ASF GitHub Bot commented on HADOOP-18910:

anujmodi2021 commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1963452101

> can you please create a backport PR on branch-3.4 and run the tests?

Sure, Mukund. Will create one.

Regarding the failures you indicated above: I do not see these tests failing on either trunk or branch-3.4 (as of 26th Feb). Can you please share the configs you are using to run these tests, such as auth type, account type, etc.?
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820145#comment-17820145 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

mukund-thakur commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1961713969

   Can you please create a backport PR on branch-3.4 and run the tests?


> ABFS: Adding Support for MD5 Hash based integrity verification of the request
> content during transport
> ---
>
>                 Key: HADOOP-18910
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18910
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>            Reporter: Anuj Modi
>            Assignee: Anuj Modi
>            Priority: Major
>              Labels: pull-request-available
>
> Azure Storage supports Content-MD5 request headers in both the Read and
> Append APIs.
> Read: [Path - Read - REST API (Azure Storage Services) | Microsoft Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read]
> Append: [Path - Update - REST API (Azure Storage Services) | Microsoft Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/update]
> This change adds client-side support for them. In a Read request, we will
> send the appropriate header, in response to which the server will return the
> MD5 hash of the data it sends back. The client will compare this with the MD5
> hash it computes from the data received.
> In an Append request, we will compute the MD5 hash of the data that we are
> sending to the server and specify it in the appropriate header. On finding
> that header, the server will compare it with the MD5 hash it computes on the
> data received.
> This whole checksum validation support is guarded behind a config. The config
> is disabled by default because, when "https" is used, the integrity of the
> data is preserved anyway. This is introduced as an additional data integrity
> check, and it will also have a performance impact.
> Users can enable or disable it by setting the following config to *"true"* or
> *"false"* respectively. *Config: "fs.azure.enable.checksum.validation"*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
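The Read/Append mechanism described in the issue can be sketched with the JDK alone. The following is a minimal, hypothetical helper (the class `Md5ChecksumSketch` and its methods are illustrative, not the ABFS code). On the append side it computes the Base64-encoded MD5 that a `Content-MD5` header carries. On the read side it recomputes the hash over the received bytes and compares it with the value the server returned. The exact request headers ABFS sends are not given in this thread. This sketch assumes the server's hash arrives as a Base64 string and treats a missing value as a failure, which is a design choice of the sketch only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/**
 * Hypothetical helper sketching MD5-based transport integrity checks.
 * Not the ABFS implementation; class and method names are illustrative.
 */
public final class Md5ChecksumSketch {

  private Md5ChecksumSketch() {
  }

  /**
   * Append side: Base64-encoded MD5 of buf[offset, offset + length),
   * the encoding a Content-MD5 header carries (RFC 1864).
   */
  public static String computeMd5Base64(byte[] buf, int offset, int length) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      md5.update(buf, offset, length);
      return Base64.getEncoder().encodeToString(md5.digest());
    } catch (NoSuchAlgorithmException e) {
      // Every Java SE implementation is required to provide MD5.
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  /**
   * Read side: recompute the hash over the received bytes and compare it
   * with the value the server returned; a missing value is treated as a failure.
   */
  public static void verifyReceived(byte[] buf, int offset, int length,
      String serverMd5Base64) throws IOException {
    String local = computeMd5Base64(buf, offset, length);
    if (serverMd5Base64 == null || !serverMd5Base64.equals(local)) {
      throw new IOException("Checksum Validation Failed, MD5 Mismatch Error");
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
    String header = computeMd5Base64(payload, 0, payload.length);
    System.out.println(header); // prints XUFAKrxLKna5cZ2REBfFkg==
    verifyReceived(payload, 0, payload.length, header); // hashes match, no exception
  }
}
```

The offset/length parameters mirror the fact that an append usually sends a slice of a larger write buffer. The error text reuses the `ERROR_MESSAGE` string from `AbfsInvalidChecksumException` in the review thread below. A plain `IOException` stands in for that type here.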
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819910#comment-17819910 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

anujmodi2021 commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1960744484

> Seeing these failures in branch-3.4 after backporting this and #5881. These failures are happening even without these changes. @anujmodi2021 Can you figure out what other commits are missing in branch-3.4 or these are genuine failures? ITestExponentialRetryPolicy has been renamed recently in #5881 from UT to IT but it is still failing in older version. Do I need to add some extra keys in auth-keys.xml?
>
> `[ERROR] Failures:
> [ERROR] ITestAzureBlobFileSystemLease.testTwoCreate:142 Expected to find 'There is currently a lease on the resource and no lease ID was specified in the request' but got unexpected exception: org.apache.hadoop.fs.PathIOException: `abfs://abfs-testcontainer-5d2e6422-c3f1-4670-a9a7-4bb79a367...@mthakurdata.dfs.core.windows.net/fork-0001/test/testTwoCreate71defab45746/testfile': Input/output error: Parallel access to the create path detected. Failing request to honor single writer semantics
>     at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:1538)
>     at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.create(AzureBlobFileSystem.java:347)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1231)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1208)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1089)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1076)
>     at org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemLease.lambda$testTwoCreate$1(ITestAzureBlobFileSystemLease.java:144)
>     at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:498)
>     at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:384)
>     at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:453)
>     at org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemLease.testTwoCreate(ITestAzureBlobFileSystemLease.java:142)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.lang.Thread.run(Thread.java:750)
> Caused by: Parallel access to the create path detected. Failing request to honor single writer semantics
>     at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.conditionalCreateOverwriteFile(AzureBlobFileSystemStore.java:711)
>     at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.createFile(AzureBlobFileSystemStore.java:622)
>     at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.create(AzureBlobFileSystem.java:341)
>     ... 21 more
>
> [ERROR] ITestGetNameSpaceEnabled.testGetIsNamespaceEnabledWhenConfigIsFalse:98->unsetAndAssert:109 [getIsNamespaceEnabled should return the value configured for fs.azure.test.namespace.enabled] expected:<[fals]e> but was:<[tru]e>
> [ERROR] ITestGetNameSpaceEnabled.testGetIsNamespaceEnabledWhenConfigIsTrue:88->unsetAndAssert:109 [getIsNamespaceEnabled should return the value configured for fs.azure.test.namespace.enabled] expected:<[fals]e> but was:<[tru]e>
> [ERROR] ITestGetNameSpaceEnabled.testNonXNSAccount:77->Assert.assertFalse:65->Assert.assertTrue:42->Assert.fail:89 Expecting getIsNamespaceEnabled() return false
> [ERROR] Errors:
> [ERROR] ITestExponentialRetryPolicy.testThrottlingIntercept:106 » KeyProvider Failure ...
> [INFO]
> [ERROR] Tests run: 27, Failures: 4, Errors: 1, Skipped: 3
> [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 91.812 s <<< FAILURE! - in org.apache.hadoop.fs.azurebfs.services.ITestExponentialRetryPolicy
> [ERROR] testThrottlingIntercept(org.apache.hadoop.fs.azurebfs.services.ITestExponentialRetryPolicy) Time elapsed: 0.93 s <<< ERROR!
> Failure to initialize configuration for dummy.dfs.core.windows.net key ="null": Invalid configuration value detected for fs.azure.acc
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819814#comment-17819814 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

mukund-thakur commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1960305686

   Seeing these failures in branch-3.4 after backporting this and https://github.com/apache/hadoop/pull/5881. These failures are happening even without these changes. @anujmodi2021 Can you figure out what other commits are missing in branch-3.4 or these are genuine failures? ITestExponentialRetryPolicy has been renamed recently in https://github.com/apache/hadoop/pull/5881 from UT to IT but it is still failing in older version. Do I need to add some extra keys in auth-keys.xml?

   `[ERROR] Failures:
   [ERROR] ITestAzureBlobFileSystemLease.testTwoCreate:142 Expected to find 'There is currently a lease on the resource and no lease ID was specified in the request' but got unexpected exception: org.apache.hadoop.fs.PathIOException: `abfs://abfs-testcontainer-5d2e6422-c3f1-4670-a9a7-4bb79a367...@mthakurdata.dfs.core.windows.net/fork-0001/test/testTwoCreate71defab45746/testfile': Input/output error: Parallel access to the create path detected. Failing request to honor single writer semantics
       at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:1538)
       at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.create(AzureBlobFileSystem.java:347)
       at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1231)
       at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1208)
       at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1089)
       at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1076)
       at org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemLease.lambda$testTwoCreate$1(ITestAzureBlobFileSystemLease.java:144)
       at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:498)
       at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:384)
       at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:453)
       at org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemLease.testTwoCreate(ITestAzureBlobFileSystemLease.java:142)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:498)
       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
       at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
       at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
       at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.lang.Thread.run(Thread.java:750)
   Caused by: Parallel access to the create path detected. Failing request to honor single writer semantics
       at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.conditionalCreateOverwriteFile(AzureBlobFileSystemStore.java:711)
       at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.createFile(AzureBlobFileSystemStore.java:622)
       at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.create(AzureBlobFileSystem.java:341)
       ... 21 more
   [ERROR] ITestGetNameSpaceEnabled.testGetIsNamespaceEnabledWhenConfigIsFalse:98->unsetAndAssert:109 [getIsNamespaceEnabled should return the value configured for fs.azure.test.namespace.enabled] expected:<[fals]e> but was:<[tru]e>
   [ERROR] ITestGetNameSpaceEnabled.testGetIsNamespaceEnabledWhenConfigIsTrue:88->unsetAndAssert:109 [getIsNamespaceEnabled should return the value configured for fs.azure.test.namespace.enabled] expected:<[fals]e> but was:<[tru]e>
   [ERROR] ITestGetNameSpaceEnabled.testNonXNSAccount:77->Assert.assertFalse:65->Assert.assertTrue:42->Assert.fail:89 Expecting getIsNamespaceEnabled() return false
   [ERROR] Errors:
   [ERROR] ITestExponentialRetryPolicy.testThrottlingIntercept:106 » KeyProvider Failure ...
   [INFO]
   [ERROR] Tests run: 27, Failures: 4, Errors: 1, Skipped: 3
   [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 91.812 s <<< FAILURE! - in org.apache.hadoop.fs.azurebfs.services.ITestExponentialRetryPolicy
   [ERROR] testThrottlingIntercept(org.apache.hadoop.fs.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819753#comment-17819753 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

mukund-thakur merged PR #6069:
URL: https://github.com/apache/hadoop/pull/6069
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819536#comment-17819536 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

hadoop-yetus commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1958993080

   :confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 53s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 43m 55s | | trunk passed |
| +1 :green_heart: | compile | 0m 39s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 36s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 0m 32s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 42s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 38s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 36s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 1m 3s | | trunk passed |
| +1 :green_heart: | shadedclient | 32m 47s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 33m 8s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 28s | | the patch passed |
| +1 :green_heart: | compile | 0m 29s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 29s | | the patch passed |
| +1 :green_heart: | compile | 0m 28s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 0m 28s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 19s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 30s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 24s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 23s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 1m 3s | | the patch passed |
| +1 :green_heart: | shadedclient | 34m 33s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 2m 8s | | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 36s | | The patch does not generate ASF License warnings. |
| | | 127m 47s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/22/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6069 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux 014a619b3b58 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 59a905ff228277dfb90171db84634592d5411297 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/22/testReport/ |
| Max. process+thread count | 555 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819483#comment-17819483 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

anujmodi2021 commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1498704079


##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/AbfsInvalidChecksumException.java:
##
@@ -0,0 +1,56 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+package org.apache.hadoop.fs.azurebfs.contracts.exceptions;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.azurebfs.contracts.services.AzureServiceErrorCode;
+
+/**
+ * Exception to wrap invalid checksum verification on client side.
+ */
+@InterfaceAudience.Public
+@InterfaceStability.Evolving
+public class AbfsInvalidChecksumException extends AbfsRestOperationException {
+
+  private static final String ERROR_MESSAGE = "Checksum Validation Failed, MD5 Mismatch Error";
+
+  public AbfsInvalidChecksumException(final AbfsRestOperationException abfsRestOperationException) {
+    super(
+        abfsRestOperationException != null
+            ? abfsRestOperationException.getStatusCode()
+            : AzureServiceErrorCode.UNKNOWN.getStatusCode(),
+        abfsRestOperationException != null
+            ? abfsRestOperationException.getErrorCode().getErrorCode()
+            : AzureServiceErrorCode.UNKNOWN.getErrorCode(),
+        abfsRestOperationException != null
+            ? abfsRestOperationException.toString()

Review Comment:
   toString() will be better; it includes more information, such as the URL that was hit.
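The reviewers discuss whether the wrapper should take the wrapped exception's `toString()` or its `getMessage()`. The following self-contained sketch uses hypothetical stand-in classes (`RestOperationException` and `InvalidChecksumException` are not the real `AbfsRestOperationException` or `AbfsInvalidChecksumException`, and the fallback values are placeholders assumed in place of `AzureServiceErrorCode.UNKNOWN`). It shows the null-safe constructor pattern from the diff together with the generic JDK behaviour: `Throwable.toString()` prefixes the class name to the message, while `getMessage()` returns the message alone. Any extra detail that `AbfsRestOperationException` itself adds, such as the request URL mentioned in the reply, depends on its own implementation and is not reproduced here.

```java
import java.io.IOException;

/**
 * Hypothetical stand-ins illustrating the null-safe wrapping pattern.
 * None of these classes are the real ABFS types.
 */
public final class ChecksumExceptionSketch {

  // Placeholder fallbacks standing in for AzureServiceErrorCode.UNKNOWN (assumed values).
  static final int UNKNOWN_STATUS = -1;
  static final String UNKNOWN_ERROR_CODE = "Unknown";

  /** Plays the role of AbfsRestOperationException. */
  static class RestOperationException extends IOException {
    private final int statusCode;
    private final String errorCode;

    RestOperationException(int statusCode, String errorCode, String message) {
      super(message);
      this.statusCode = statusCode;
      this.errorCode = errorCode;
    }

    int getStatusCode() {
      return statusCode;
    }

    String getErrorCode() {
      return errorCode;
    }
  }

  /** Plays the role of AbfsInvalidChecksumException. */
  static final class InvalidChecksumException extends RestOperationException {
    private static final String ERROR_MESSAGE =
        "Checksum Validation Failed, MD5 Mismatch Error";

    // Copy status, error code and description from the wrapped exception,
    // falling back to placeholders when there is nothing to wrap.
    InvalidChecksumException(RestOperationException cause) {
      super(
          cause != null ? cause.getStatusCode() : UNKNOWN_STATUS,
          cause != null ? cause.getErrorCode() : UNKNOWN_ERROR_CODE,
          cause != null ? cause.toString() : ERROR_MESSAGE);
    }
  }

  private ChecksumExceptionSketch() {
  }

  public static void main(String[] args) {
    RestOperationException base =
        new RestOperationException(400, "SomeErrorCode", "bad request");
    System.out.println(base.getMessage()); // bad request
    System.out.println(base.toString());   // ChecksumExceptionSketch$RestOperationException: bad request
    System.out.println(new InvalidChecksumException(base).getMessage());
    System.out.println(new InvalidChecksumException(null).getMessage());
  }
}
```

Passing `toString()` keeps the class name (and anything the wrapped type adds to its string form) in the new message at the cost of a longer string; `getMessage()` keeps the message short but drops that context.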
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819386#comment-17819386 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

mukund-thakur commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1957878931

   `./hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:28:import java.nio.charset.StandardCharsets;:8: Unused import - java.nio.charset.StandardCharsets. [UnusedImports]`
   Please fix this checkstyle error.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819385#comment-17819385 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

mukund-thakur commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1498282476


##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsInputStream.java:
##
@@ -279,3 +283,17 @@ private void verifyAfterSeek(AbfsInputStream abfsInputStream, long seekPos) thro
     assertEquals(0, abfsInputStream.getBCursor());
   }
 }
+

Review Comment:
   cut this



##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsInputStream.java:
##
@@ -279,3 +283,17 @@ private void verifyAfterSeek(AbfsInputStream abfsInputStream, long seekPos) thro
     assertEquals(0, abfsInputStream.getBCursor());
   }
 }
+
+//
+//
+//<<< HEAD
+//private AzureBlobFileSystem getFileSystem(boolean optimizeFooterRead,
+//int fileSize) throws IOException {
+//final AzureBlobFileSystem fs = getFileSystem();
+//getAbfsStore(fs).getAbfsConfiguration()
+//.setOptimizeFooterRead(optimizeFooterRead);
+//getAbfsStore(fs).getAbfsConfiguration()
+//.setIsChecksumValidationEnabled(true);
+//if (fileSize <= getAbfsStore(fs).getAbfsConfiguration()
+//.getReadBufferSize()) {
+//===

Review Comment:
   cut this
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819384#comment-17819384 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

mukund-thakur commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1498282476


##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/ITestAbfsInputStream.java:
##
@@ -279,3 +283,17 @@ private void verifyAfterSeek(AbfsInputStream abfsInputStream, long seekPos) thro
     assertEquals(0, abfsInputStream.getBCursor());
   }
 }
+

Review Comment:
   cut this
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819379#comment-17819379 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

mukund-thakur commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1498253744

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/AbfsInvalidChecksumException.java: ##
@@ -0,0 +1,56 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+package org.apache.hadoop.fs.azurebfs.contracts.exceptions;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.azurebfs.contracts.services.AzureServiceErrorCode;
+
+/**
+ * Exception to wrap invalid checksum verification on client side.
+ */
+@InterfaceAudience.Public
+@InterfaceStability.Evolving
+public class AbfsInvalidChecksumException extends AbfsRestOperationException {
+
+  private static final String ERROR_MESSAGE = "Checksum Validation Failed, MD5 Mismatch Error";
+
+  public AbfsInvalidChecksumException(final AbfsRestOperationException abfsRestOperationException) {
+    super(
+        abfsRestOperationException != null
+            ? abfsRestOperationException.getStatusCode()
+            : AzureServiceErrorCode.UNKNOWN.getStatusCode(),
+        abfsRestOperationException != null
+            ? abfsRestOperationException.getErrorCode().getErrorCode()
+            : AzureServiceErrorCode.UNKNOWN.getErrorCode(),
+        abfsRestOperationException != null
+            ? abfsRestOperationException.toString()

Review Comment:
   Do we need the full toString() here, or just ex.getMessage()?
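The review question above turns on a JDK detail: Throwable.toString() returns the exception's class name followed by ": " and the message, while getMessage() returns only the message, so passing toString() into a wrapping exception's message repeats the wrapped class name. The sketch below uses hypothetical stand-ins (RestOperationException, InvalidChecksumException) rather than the real ABFS classes. The null fallbacks (-1 and "UNKNOWN") are placeholders, not the values of AzureServiceErrorCode.UNKNOWN, and using getMessage() reflects the reviewer's suggestion, not necessarily what the PR finally merged.

```java
import java.io.IOException;

// Demo class first, so the file can be run directly with the java source launcher.
final class ChecksumExceptionDemo {
    public static void main(String[] args) {
        RestOperationException original =
            new RestOperationException(400, "ExampleErrorCode", "example failure");
        InvalidChecksumException wrapped = new InvalidChecksumException(original);
        System.out.println(original.getMessage()); // prints example failure
        System.out.println(original.toString());   // class name, then ": example failure"
        System.out.println(wrapped.getMessage());  // prints example failure
    }
}

// Hypothetical stand-in for AbfsRestOperationException (not the real class).
class RestOperationException extends IOException {
    private static final long serialVersionUID = 1L;
    private final int statusCode;
    private final String errorCode;

    RestOperationException(int statusCode, String errorCode, String message) {
        super(message);
        this.statusCode = statusCode;
        this.errorCode = errorCode;
    }

    int getStatusCode() {
        return statusCode;
    }

    String getErrorCode() {
        return errorCode;
    }
}

// Hypothetical stand-in for AbfsInvalidChecksumException.
final class InvalidChecksumException extends RestOperationException {
    private static final long serialVersionUID = 1L;
    static final String ERROR_MESSAGE = "Checksum Validation Failed, MD5 Mismatch Error";
    // Placeholder fallbacks; not the values of AzureServiceErrorCode.UNKNOWN.
    static final int UNKNOWN_STATUS = -1;
    static final String UNKNOWN_ERROR = "UNKNOWN";

    InvalidChecksumException(RestOperationException cause) {
        super(cause != null ? cause.getStatusCode() : UNKNOWN_STATUS,
              cause != null ? cause.getErrorCode() : UNKNOWN_ERROR,
              // getMessage() is only the text; toString() would also prepend the class name.
              cause != null ? cause.getMessage() : ERROR_MESSAGE);
        if (cause != null) {
            initCause(cause); // keep the original exception and its stack trace
        }
    }
}
```

Chaining the original via initCause keeps its class and stack trace available to callers, so nothing is lost by dropping toString() from the message.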
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819069#comment-17819069 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

anujmodi2021 commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1955881493

@steveloughran @mukund-thakur Gentle reminder to review this PR and get it merged. Thanks a lot.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809753#comment-17809753 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

hadoop-yetus commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1905370470

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 16m 56s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 41m 19s | | trunk passed |
| +1 :green_heart: | compile | 0m 35s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 33s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 0m 30s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 38s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 37s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 33s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 1m 3s | | trunk passed |
| +1 :green_heart: | shadedclient | 32m 9s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 32m 28s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 27s | | the patch passed |
| +1 :green_heart: | compile | 0m 28s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 28s | | the patch passed |
| +1 :green_heart: | compile | 0m 27s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 0m 26s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 18s | [/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/21/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt) | hadoop-tools/hadoop-azure: The patch generated 1 new + 8 unchanged - 0 fixed = 9 total (was 8) |
| +1 :green_heart: | mvnsite | 0m 29s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 24s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 25s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 1m 3s | | the patch passed |
| +1 :green_heart: | shadedclient | 32m 13s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 2m 1s | | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 34s | | The patch does not generate ASF License warnings. |
| | | 137m 29s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/21/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6069 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux d6110f155599 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 5cb0d2bfc1131f9387033f299dac0113499640ed |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multi
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809745#comment-17809745 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

anujmodi2021 commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1905329385

AGGREGATED TEST RESULT

HNS-OAuth
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 41

HNS-SharedKey
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 41

NonHNS-SharedKey
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 11
[INFO] Results:
[INFO]
[WARNING] Tests run: 592, Failures: 0, Errors: 0, Skipped: 266
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 44

AppendBlob-HNS-OAuth
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 41

Time taken: 25 mins 26 secs.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802168#comment-17802168 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

anujmodi2021 commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1440430936

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ##
@@ -1074,11 +1079,14 @@ public AbfsRestOperation read(final String path,
       ContextEncryptionAdapter contextEncryptionAdapter,
       TracingContext tracingContext) throws AzureBlobFileSystemException {
     final List<AbfsHttpHeader> requestHeaders = createDefaultHeaders();
-    addCustomerProvidedKeyHeaders(requestHeaders);
     AbfsHttpHeader rangeHeader = new AbfsHttpHeader(RANGE,
         String.format("bytes=%d-%d", position, position + bufferLength - 1));
     requestHeaders.add(rangeHeader);
+    addEncryptionKeyRequestHeaders(path, requestHeaders, false,
+        contextEncryptionAdapter, tracingContext);
+    requestHeaders.add(new AbfsHttpHeader(RANGE,

Review Comment:
   Seems to be outdated. It was caused by merge conflicts, but it was fixed in the latest commit: 590a003048de696ff12490d87a2d6e6c2553b77d
   More merge conflicts need to be resolved now. Will take them up.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802152#comment-17802152 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

steveloughran commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1440414581

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ##
@@ -1074,11 +1079,14 @@ public AbfsRestOperation read(final String path,
       ContextEncryptionAdapter contextEncryptionAdapter,
       TracingContext tracingContext) throws AzureBlobFileSystemException {
     final List<AbfsHttpHeader> requestHeaders = createDefaultHeaders();
-    addCustomerProvidedKeyHeaders(requestHeaders);
     AbfsHttpHeader rangeHeader = new AbfsHttpHeader(RANGE,
         String.format("bytes=%d-%d", position, position + bufferLength - 1));
     requestHeaders.add(rangeHeader);
+    addEncryptionKeyRequestHeaders(path, requestHeaders, false,
+        contextEncryptionAdapter, tracingContext);
+    requestHeaders.add(new AbfsHttpHeader(RANGE,

Review Comment:
   Why is this going in here when line 1085 sets this header too?
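The read() hunk quoted above adds a RANGE header and then, after the encryption headers, starts adding RANGE again, which is what the review question points at. A hypothetical sketch of building the read headers so the range value is computed once and a repeated header name replaces the earlier value instead of appearing twice. It uses a plain case-insensitive map (HTTP field names are case-insensitive) instead of AbfsHttpHeader, and the class and method names are illustrative only, not AbfsClient code.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not AbfsClient code: build read-request headers once.
final class ReadHeadersSketch {

    private ReadHeadersSketch() {
    }

    // HTTP byte ranges include both endpoints, so reading bufferLength bytes
    // starting at position ends at position + bufferLength - 1.
    static String rangeHeaderValue(long position, int bufferLength) {
        if (position < 0 || bufferLength <= 0) {
            throw new IllegalArgumentException(
                "position must be >= 0 and bufferLength must be > 0");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "bytes=%d-%d", position, position + bufferLength - 1);
    }

    // A case-insensitive map keyed by header name: adding "Range" a second time
    // (in any letter case) replaces the earlier value instead of duplicating it.
    static Map<String, String> buildReadHeaders(long position, int bufferLength) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        headers.put("Range", rangeHeaderValue(position, bufferLength));
        // Other headers (e.g. encryption headers) would be added here, each once.
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> headers = buildReadHeaders(0, 4096);
        headers.put("range", rangeHeaderValue(0, 4096)); // replaces, does not duplicate
        System.out.println(headers); // prints {Range=bytes=0-4095}
    }
}
```

A list of header objects, as in the quoted code, permits duplicates silently; keying by name is one way to make an accidental second add harmless, though the real fix in the PR was simply removing the extra line.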
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802096#comment-17802096 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

anujmodi2021 commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1875106413

Thanks for the review @steveloughran
If it looks good, please get it merged to trunk.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802042#comment-17802042 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

hadoop-yetus commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1874956395

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 49s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 41m 25s | | trunk passed |
| +1 :green_heart: | compile | 0m 36s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 34s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 0m 31s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 39s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 37s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 34s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 1m 4s | | trunk passed |
| +1 :green_heart: | shadedclient | 32m 17s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 32m 37s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 28s | | the patch passed |
| +1 :green_heart: | compile | 0m 28s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 28s | | the patch passed |
| +1 :green_heart: | compile | 0m 26s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 0m 26s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 18s | [/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/20/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt) | hadoop-tools/hadoop-azure: The patch generated 1 new + 8 unchanged - 0 fixed = 9 total (was 8) |
| +1 :green_heart: | mvnsite | 0m 30s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 26s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 24s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 1m 3s | | the patch passed |
| +1 :green_heart: | shadedclient | 32m 15s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 1m 59s | | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 35s | | The patch does not generate ASF License warnings. |
| | | 121m 31s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/20/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6069 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux 9c1e84d3d107 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 590a003048de696ff12490d87a2d6e6c2553b77d |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multi
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17802026#comment-17802026 ]

ASF GitHub Bot commented on HADOOP-18910:
-----------------------------------------

anujmodi2021 commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1874921180

AGGREGATED TEST RESULT

HNS-OAuth
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 41

HNS-SharedKey
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 41

NonHNS-SharedKey
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 11
[INFO] Results:
[INFO]
[WARNING] Tests run: 590, Failures: 0, Errors: 0, Skipped: 266
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 44

AppendBlob-HNS-OAuth
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 41

Time taken: 28 mins 4 secs.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801770#comment-17801770 ] ASF GitHub Bot commented on HADOOP-18910: - hadoop-yetus commented on PR #6069: URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1873984948

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 16m 43s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 1s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 40m 57s | | trunk passed |
| +1 :green_heart: | compile | 0m 38s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 34s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 0m 32s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 38s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 37s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 35s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 1m 6s | | trunk passed |
| +1 :green_heart: | shadedclient | 32m 30s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 32m 49s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| -1 :x: | mvninstall | 0m 26s | [/patch-mvninstall-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/19/artifact/out/patch-mvninstall-hadoop-tools_hadoop-azure.txt) | hadoop-azure in the patch failed. |
| -1 :x: | compile | 0m 28s | [/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/19/artifact/out/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | hadoop-azure in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. |
| -1 :x: | javac | 0m 28s | [/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/19/artifact/out/patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | hadoop-azure in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. |
| -1 :x: | compile | 0m 25s | [/patch-compile-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/19/artifact/out/patch-compile-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt) | hadoop-azure in the patch failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. |
| -1 :x: | javac | 0m 25s | [/patch-compile-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/19/artifact/out/patch-compile-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt) | hadoop-azure in the patch failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 19s | | the patch passed |
| -1 :x: | mvnsite | 0m 27s | [/patch-mvnsite-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/19/artifact/out/patch-mvnsite-hadoop-tools_hadoop-azure.txt) | hadoop-azure in the patch failed. |
| +1 :green_heart: | javadoc | 0m 25s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 24s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| -1 :x: | spotbugs | 0m 26s | [/patch-spotbugs-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/19/artifact/out/patc
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801766#comment-17801766 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on PR #6069: URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1873952960

> LGTM +1
>
> does need rebase to trunk before merging.

Thanks for the review. Resolved the conflicts and did a sanity check. Please merge into trunk.
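The issue summary at the top of this thread says that, for an append, the client computes the MD5 hash of the bytes it sends and puts it in a request header that the service checks against its own hash. A minimal sketch of that computation, assuming the header carries the Base64-encoded 16-byte digest as the standard Content-MD5 header does (RFC 1864); the class and method names are hypothetical and not taken from the ABFS code:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper for illustration; not the ABFS client's implementation.
class AppendMd5Sketch {

  /**
   * Returns the Base64-encoded MD5 digest of buffer[offset, offset + length),
   * i.e. the value that would go into a Content-MD5 request header.
   */
  static String computeContentMd5(byte[] buffer, int offset, int length) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      // Hash only the slice being uploaded, not the whole buffer.
      md5.update(buffer, offset, length);
      return Base64.getEncoder().encodeToString(md5.digest());
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform implementation is required to provide MD5.
      throw new IllegalStateException("MD5 digest not available", e);
    }
  }

  public static void main(String[] args) {
    byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
    System.out.println("Content-MD5: " + computeContentMd5(data, 0, data.length));
  }
}
```

For the input `abc` this prints `Content-MD5: kAFQmDzST7DWlj99KOF/cg==`. Taking an offset and length mirrors how an append typically uploads a slice of a larger write buffer; hashing the whole buffer instead would produce a value the service rejects.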
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801765#comment-17801765 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on PR #6069: URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1873952335

AGGREGATED TEST RESULT

HNS-OAuth
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 41

HNS-SharedKey
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 41

NonHNS-SharedKey
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 11
[INFO] Results:
[INFO]
[WARNING] Tests run: 590, Failures: 0, Errors: 0, Skipped: 266
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 44

AppendBlob-HNS-OAuth
[INFO] Results:
[INFO]
[WARNING] Tests run: 141, Failures: 0, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 340, Failures: 0, Errors: 0, Skipped: 41

Time taken: 25 mins 17 secs.
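On the read side, the issue summary says the client asks the service to return the MD5 hash of the data it sends back and then compares it with a hash computed over the bytes actually received. A sketch of that comparison, assuming the returned value is Base64-encoded like the request header and treating a missing or malformed value as a failed check (both are assumptions, and the names are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical read-side check; not the ABFS client's implementation.
class ReadMd5VerifySketch {

  /** True only if MD5(received[offset, offset + length)) equals the server-reported hash. */
  static boolean md5Matches(byte[] received, int offset, int length, String serverContentMd5) {
    if (serverContentMd5 == null) {
      return false; // assumption: a missing hash counts as a failed check
    }
    byte[] expected;
    try {
      expected = Base64.getDecoder().decode(serverContentMd5.trim());
    } catch (IllegalArgumentException e) {
      return false; // header value is not valid Base64
    }
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      md5.update(received, offset, length);
      // Compare raw digest bytes rather than re-encoded strings.
      return MessageDigest.isEqual(md5.digest(), expected);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 digest not available", e);
    }
  }

  public static void main(String[] args) {
    String serverHash = "kAFQmDzST7DWlj99KOF/cg=="; // Base64 MD5 of "abc"
    byte[] good = "abc".getBytes(StandardCharsets.UTF_8);
    byte[] corrupted = "abd".getBytes(StandardCharsets.UTF_8);
    System.out.println("intact:    " + md5Matches(good, 0, good.length, serverHash));
    System.out.println("corrupted: " + md5Matches(corrupted, 0, corrupted.length, serverHash));
  }
}
```

On a mismatch a client would typically fail or retry the read rather than hand the corrupted bytes to the caller; which of those ABFS does is not stated in this thread.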
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789368#comment-17789368 ] ASF GitHub Bot commented on HADOOP-18910: - hadoop-yetus commented on PR #6069: URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1825308347

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 16m 43s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 43m 1s | | trunk passed |
| +1 :green_heart: | compile | 0m 36s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 34s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 30s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 40s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 37s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 33s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 3s | | trunk passed |
| +1 :green_heart: | shadedclient | 32m 39s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 32m 58s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 28s | | the patch passed |
| +1 :green_heart: | compile | 0m 29s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 29s | | the patch passed |
| +1 :green_heart: | compile | 0m 26s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 26s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 19s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 30s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 25s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 23s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 3s | | the patch passed |
| +1 :green_heart: | shadedclient | 32m 23s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 2m 1s | | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 35s | | The patch does not generate ASF License warnings. |
| | | 139m 15s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/18/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6069 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux 4ef7d5af7d87 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / cfa404f5059ef8de305d74b28a502cd416f8886e |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/18/testReport/ |
| Max. process+thread count | 555 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://ci-hadoop.apache.org/job/hadoop-m
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789337#comment-17789337 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on PR #6069: URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1825191803

@steveloughran, @mukund-thakur, @mehakmeet: gentle reminder to review this PR. All comments have been addressed. Thanks.
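According to the issue summary, the whole feature is guarded by `fs.azure.enable.checksum.validation`, which is disabled by default because HTTPS already protects data integrity and hashing adds cost. A sketch of that guard, in which a plain `Map` stands in for Hadoop's `Configuration` so it runs without Hadoop on the classpath; `appendHeaders` is a hypothetical method that adds `Content-MD5` only when the flag is true:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a Map stands in for org.apache.hadoop.conf.Configuration.
class ChecksumConfigSketch {
  static final String CHECKSUM_VALIDATION_KEY = "fs.azure.enable.checksum.validation";
  // The issue states the feature is off unless explicitly enabled.
  static final boolean DEFAULT_CHECKSUM_VALIDATION = false;

  static boolean isChecksumValidationEnabled(Map<String, String> conf) {
    String raw = conf.get(CHECKSUM_VALIDATION_KEY);
    // Boolean.parseBoolean accepts "true" in any letter case; anything else is false.
    return raw == null ? DEFAULT_CHECKSUM_VALIDATION : Boolean.parseBoolean(raw.trim());
  }

  /** Builds extra append headers; the MD5 is only computed when the flag is on. */
  static Map<String, String> appendHeaders(Map<String, String> conf, byte[] data) {
    Map<String, String> headers = new HashMap<>();
    if (isChecksumValidationEnabled(conf)) {
      headers.put("Content-MD5", md5Base64(data));
    }
    return headers;
  }

  static String md5Base64(byte[] data) {
    try {
      return Base64.getEncoder().encodeToString(MessageDigest.getInstance("MD5").digest(data));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 digest not available", e);
    }
  }

  public static void main(String[] args) {
    byte[] data = "abc".getBytes(StandardCharsets.UTF_8);
    Map<String, String> conf = new HashMap<>();
    System.out.println("default: " + appendHeaders(conf, data));
    conf.put(CHECKSUM_VALIDATION_KEY, "true");
    System.out.println("enabled: " + appendHeaders(conf, data));
  }
}
```

In a real deployment the key would be set in `core-site.xml` or programmatically with Hadoop's `Configuration.setBoolean`. Skipping the digest entirely when the flag is off is what keeps the default path free of the hashing overhead the summary mentions.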
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786350#comment-17786350 ] ASF GitHub Bot commented on HADOOP-18910: - anmolanmol1234 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1394192118

## hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemChecksum.java: ##

@@ -55,7 +54,7 @@ public class ITestAzureBlobFileSystemChecksum extends AbstractAbfsIntegrationTes
   private static final int MB_8 = 8 * ONE_MB;
   private static final int MB_15 = 15 * ONE_MB;
   private static final int MB_16 = 16 * ONE_MB;
-  private static final String invalidText = "Text for Invalid MD5 Computation";
+  private static final String TEXT_FOR_INVALID_MD5_COMPUTATION = "Text for Invalid MD5 Computation";

Review Comment:
Can we shorten this constant name to just INVALID_MD5_TEXT?
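Judging by its name, the constant under review holds text that is hashed to produce a deliberately wrong MD5, so the test can check that a mismatch is detected. A self-contained sketch of that negative-test idea, using the shorter name the reviewer suggests; `serverAccepts` is a hypothetical stand-in for the service-side comparison (the real integration test talks to Azure Storage):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical sketch of a negative checksum test; not the actual ITest code.
class InvalidMd5Sketch {
  static final String INVALID_MD5_TEXT = "Text for Invalid MD5 Computation";

  static String md5Base64(byte[] data) {
    try {
      return Base64.getEncoder().encodeToString(MessageDigest.getInstance("MD5").digest(data));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 digest not available", e);
    }
  }

  // Stand-in for the service recomputing the hash of the body and comparing it to the header.
  static boolean serverAccepts(byte[] body, String contentMd5Header) {
    return md5Base64(body).equals(contentMd5Header);
  }

  public static void main(String[] args) {
    byte[] payload = "actual append payload".getBytes(StandardCharsets.UTF_8);
    String correct = md5Base64(payload);
    // Hash of unrelated text: guaranteed in practice not to match the payload's hash.
    String wrong = md5Base64(INVALID_MD5_TEXT.getBytes(StandardCharsets.UTF_8));
    System.out.println("correct hash accepted: " + serverAccepts(payload, correct));
    System.out.println("wrong hash accepted:   " + serverAccepts(payload, wrong));
  }
}
```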
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786348#comment-17786348 ] ASF GitHub Bot commented on HADOOP-18910: - anmolanmol1234 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1394181958

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ##

@@ -829,6 +829,13 @@ && appendSuccessCheckOp(op, path,
       throw e;
     }
+    catch (AzureBlobFileSystemException e) {
+      /**
+       * Any server issue will be returned as {@link AbfsRestOperationException} handled above.
+       */
+      LOG.debug("Append request failed with non server issues");

Review Comment:
We can add the path and offset info here as well; otherwise it will be difficult to tell which path the append failed for.
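The reviewer asks for the path and offset in this debug message so that a failed append can be traced to a specific file position. A minimal sketch of such a message; `appendFailureMessage` and the sample path and cause are hypothetical, and the message is preformatted with `String.format` only so the sketch runs without a logging library:

```java
import java.io.IOException;

// Hypothetical helper; the real change would go into the LOG.debug call itself.
class AppendFailureLogSketch {

  /** Builds the debug text with the context the reviewer asked for. */
  static String appendFailureMessage(String path, long offset, Exception cause) {
    return String.format(
        "Append request failed with non-server issue for path %s at offset %d: %s",
        path, offset, cause.getMessage());
  }

  public static void main(String[] args) {
    Exception cause = new IOException("checksum mismatch"); // sample cause for illustration
    System.out.println(appendFailureMessage("/data/part-0000", 8388608L, cause));
  }
}
```

Hadoop code generally logs through SLF4J, where the idiomatic form is `LOG.debug("Append request failed for path {} at offset {}", path, offset, e)`: the placeholders are only rendered when debug logging is enabled, and a trailing `Throwable` argument is logged as the exception, so the stack trace is kept.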
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785990#comment-17785990 ] ASF GitHub Bot commented on HADOOP-18910: - hadoop-yetus commented on PR #6069: URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1810730410

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 1m 0s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 44m 16s | | trunk passed |
| +1 :green_heart: | compile | 0m 35s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 32s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 30s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 38s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 36s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 33s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 2s | | trunk passed |
| +1 :green_heart: | shadedclient | 33m 27s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 33m 46s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 27s | | the patch passed |
| +1 :green_heart: | compile | 0m 28s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 28s | | the patch passed |
| +1 :green_heart: | compile | 0m 24s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 24s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 18s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 29s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 23s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 25s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 2s | | the patch passed |
| +1 :green_heart: | shadedclient | 32m 49s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 2m 1s | | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 36s | | The patch does not generate ASF License warnings. |
| | | 125m 52s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/17/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6069 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux a55e0fa5c1ba 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / ff2d5abbf873d3b9bba7aebcb53ad02c913d3c82 |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/17/testReport/ |
| Max. process+thread count | 557 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://ci-hadoop.apache.org/job/hadoop-m
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785977#comment-17785977 ] ASF GitHub Bot commented on HADOOP-18910: - hadoop-yetus commented on PR #6069: URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1810687703

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 51s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 43m 55s | | trunk passed |
| +1 :green_heart: | compile | 0m 35s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 31s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 30s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 36s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 35s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 33s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 1s | | trunk passed |
| +1 :green_heart: | shadedclient | 33m 0s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 33m 18s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 27s | | the patch passed |
| +1 :green_heart: | compile | 0m 27s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 27s | | the patch passed |
| +1 :green_heart: | compile | 0m 25s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 25s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 17s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 28s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 24s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 24s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 2s | | the patch passed |
| +1 :green_heart: | shadedclient | 33m 48s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 1m 58s | | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 34s | | The patch does not generate ASF License warnings. |
| | | 125m 30s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/16/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6069 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux 05a1b198dbf2 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 45ccd087497855112c8cdb2e1bb8f880b73ba038 |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/16/testReport/ |
| Max. process+thread count | 727 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://ci-hadoop.apache.org/job/hadoop-m
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785890#comment-17785890 ] ASF GitHub Bot commented on HADOOP-18910: - hadoop-yetus commented on PR #6069: URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1810198092

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 1m 7s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 49m 0s | | trunk passed |
| +1 :green_heart: | compile | 0m 35s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 32s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 31s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 36s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 35s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 32s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 3s | | trunk passed |
| +1 :green_heart: | shadedclient | 37m 53s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 38m 13s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 30s | | the patch passed |
| +1 :green_heart: | compile | 0m 29s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 29s | | the patch passed |
| +1 :green_heart: | compile | 0m 27s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 27s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 20s | [/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/14/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt) | hadoop-tools/hadoop-azure: The patch generated 2 new + 7 unchanged - 0 fixed = 9 total (was 7) |
| +1 :green_heart: | mvnsite | 0m 30s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 26s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 25s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 6s | | the patch passed |
| +1 :green_heart: | shadedclient | 38m 52s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 2m 3s | | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 35s | | The patch does not generate ASF License warnings. |
| | | 141m 16s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/14/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6069 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux 5bc3c2154a64 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 7caa63f505acb7ca66c94e5c2e95452b2134d07e |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.apac
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785875#comment-17785875 ] ASF GitHub Bot commented on HADOOP-18910: - hadoop-yetus commented on PR #6069: URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1810129241

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 22s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 1s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
|||| _ trunk Compile Tests _ |
| -1 :x: | mvninstall | 29m 55s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/15/artifact/out/branch-mvninstall-root.txt) | root in trunk failed. |
| +1 :green_heart: | compile | 0m 23s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 22s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 20s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 27s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 26s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 22s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 0m 45s | | trunk passed |
| +1 :green_heart: | shadedclient | 20m 2s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 20m 15s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 20s | | the patch passed |
| +1 :green_heart: | compile | 0m 20s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 20s | | the patch passed |
| +1 :green_heart: | compile | 0m 17s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 17s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 13s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 21s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 18s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 17s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 0m 44s | | the patch passed |
| +1 :green_heart: | shadedclient | 20m 9s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 1m 39s | | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 25s | | The patch does not generate ASF License warnings. |
| | | 80m 29s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/15/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6069 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux 1c600ab6efd6 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / f5e0ae6d47cbdae32c91e5706c2ca71ceac0ab8a |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/15/testReport/ |
| Max. process+thread count | 556 (vs. ulimit of 5500) |
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785843#comment-17785843 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on PR #6069: URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1809989042

AGGREGATED TEST RESULT

HNS-OAuth
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

HNS-SharedKey
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

NonHNS-SharedKey
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 11
[INFO] Results:
[INFO]
[WARNING] Tests run: 595, Failures: 0, Errors: 0, Skipped: 277
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 44

AppendBlob-HNS-OAuth
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestAccountConfiguration.testConfigPropNotFound:386->testMissingConfigKey:399 Expected a org.apache.hadoop.fs.azurebfs.contracts.exceptions.TokenAccessProviderException to be thrown, but got the result: : "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
[INFO]
[ERROR] Tests run: 141, Failures: 1, Errors: 0, Skipped: 5
[INFO] Results:
[INFO]
[WARNING] Tests run: 339, Failures: 0, Errors: 0, Skipped: 41

Time taken: 26 mins 28 secs.

> ABFS: Adding Support for MD5 Hash based integrity verification of the request
> content during transport
> ---
>
> Key: HADOOP-18910
> URL: https://issues.apache.org/jira/browse/HADOOP-18910
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Reporter: Anuj Modi
> Assignee: Anuj Modi
> Priority: Major
> Labels: pull-request-available
>
> Azure Storage Supports Content-MD5 Request Headers in Both Read and Append
> APIs.
> Read: [Path - Read - REST API (Azure Storage Services) | Microsoft
> Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read]
> Append: [Path - Update - REST API (Azure Storage Services) | Microsoft
> Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/update]
> This change is to make client-side changes to support them. In Read request,
> we will send the appropriate header in response to which server will return
> the MD5 Hash of the data it sends back. On Client we will tally this with the
> MD5 hash computed from the data received.
> In Append request, we will compute the MD5 Hash of the data that we are
> sending to the server and specify that in appropriate header. Server on
> finding that header will tally this with the MD5 hash it will compute on the
> data received.
> This whole Checksum Validation Support is guarded behind a config, Config is
> by default disabled because with the use of "https" integrity of data is
> preserved anyways. This is introduced as an additional data integrity check
> which will have a performance impact as well.
> Users can decide if they want to enable this or not by setting the following
> config to *"true"* or *"false"* respectively. *Config:
> "fs.azure.enable.checksum.validation"*

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.
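The description says the client attaches an MD5 of the Append payload in a request header and, on Read, compares the MD5 the server returns with one computed over the received bytes. The Java below is a minimal sketch of that arithmetic only. It assumes the header value is the Base64-encoded 128-bit MD5 digest (the usual Content-MD5 encoding from RFC 1864). `ContentMd5Sketch` and its methods are hypothetical names, not the ABFS client code, and no HTTP calls are made.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical helper (not ABFS code): Content-MD5 computation and comparison. */
final class ContentMd5Sketch {

  private ContentMd5Sketch() {
  }

  /** Append side: Base64 of the MD5 digest of buffer[offset, offset + length). */
  static String computeContentMd5(byte[] buffer, int offset, int length) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      md5.update(buffer, offset, length);
      return Base64.getEncoder().encodeToString(md5.digest());
    } catch (NoSuchAlgorithmException e) {
      // Java platforms are required to provide MD5, so this is not expected.
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  /**
   * Read side: recompute the hash over the first {@code length} received bytes
   * and compare it with the value the server returned. A missing (null) server
   * value never matches.
   */
  static boolean matchesServerMd5(byte[] received, int length, String serverContentMd5) {
    return computeContentMd5(received, 0, length).equals(serverContentMd5);
  }

  public static void main(String[] args) {
    byte[] data = "hello abfs".getBytes(StandardCharsets.UTF_8);
    String header = computeContentMd5(data, 0, data.length);
    System.out.println("Content-MD5: " + header);
    System.out.println("intact: " + matchesServerMd5(data, data.length, header));
    data[0] ^= 1; // simulate one bit flipped in transit
    System.out.println("corrupted: " + matchesServerMd5(data, data.length, header));
  }
}
```

Running `main` prints the header value, then `intact: true` and `corrupted: false`. The extra digest pass over every buffer is the performance cost the description mentions as the reason the feature is off by default.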
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785842#comment-17785842 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on PR #6069: URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1809988857

Regarding the exception handling, I agree with what @steveloughran has suggested. The whole catch block only works when the exception is an AbfsRestOperationException and will blow up if it turns out to be anything else. It's better to catch only AbfsRestOperationException for that processing and to handle other AzureBlobFileSystemExceptions (those that are not AbfsRestOperationException) separately. I have made code changes along these lines. Let me know if this looks good, @saxenapranav. Thanks for the review.
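The restructuring agreed on above (catch the REST-level exception for the status-code processing, handle other filesystem exceptions separately) can be sketched in plain Java. `FsException` and `RestOperationException` are hypothetical stand-ins for AzureBlobFileSystemException and AbfsRestOperationException, not the real classes, and `doAppend` simply rethrows whatever failure it is given.

```java
import java.io.IOException;

/** Hypothetical stand-in for the filesystem-level base exception. */
class FsException extends IOException {
  FsException(String message) {
    super(message);
  }
}

/** Hypothetical stand-in for the REST-level exception that carries an HTTP status. */
class RestOperationException extends FsException {
  private final int statusCode;

  RestOperationException(int statusCode, String message) {
    super(message);
    this.statusCode = statusCode;
  }

  int getStatusCode() {
    return statusCode;
  }
}

final class CatchOrderingSketch {

  private CatchOrderingSketch() {
  }

  /** Simulated remote call that fails with the supplied exception. */
  static void doAppend(FsException failure) throws FsException {
    throw failure;
  }

  static String append(FsException failure) {
    try {
      doAppend(failure);
      return "ok";
    } catch (RestOperationException e) {
      // The server answered: status-code based processing needs no cast here.
      return "rest failure, status " + e.getStatusCode();
    } catch (FsException e) {
      // Any other filesystem failure is handled on its own path.
      return "other failure: " + e.getMessage();
    }
  }
}
```

The subclass clause has to come first: javac rejects a `catch (RestOperationException e)` placed after `catch (FsException e)` because it would be unreachable.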
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785840#comment-17785840 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1392403224

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##
@@ -879,9 +880,8 @@ private boolean checkUserError(int responseStatusCode) {
    * @return boolean whether exception is due to MD5Mismatch or not
    */
   protected boolean isMd5ChecksumError(final AzureBlobFileSystemException e) {
-    return ((AbfsRestOperationException) e).getStatusCode()
-        == HttpURLConnection.HTTP_BAD_REQUEST
-        && e.getMessage().contains(MD5_ERROR_SERVER_MESSAGE);
+    AzureServiceErrorCode storageErrorCode = ((AbfsRestOperationException) e).getErrorCode();

Review Comment:
In line with Steve's suggestion here.
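The diff above replaces a status-code-plus-message-substring test with a comparison on the parsed storage error code. Below is a sketch of that classification, combined with an `instanceof` guard so that a non-REST exception yields `false` instead of a ClassCastException. `ServiceErrorCode`, `RestFailure` and the `"Md5Mismatch"` code string are assumptions for illustration, not the actual AzureServiceErrorCode API.

```java
/** Hypothetical error-code enum; the code strings are illustrative assumptions. */
enum ServiceErrorCode {
  MD5_MISMATCH("Md5Mismatch"),
  INVALID_INPUT("InvalidInput"),
  UNKNOWN(null);

  private final String code;

  ServiceErrorCode(String code) {
    this.code = code;
  }

  /** Map the storage error code string from a response body to an enum value. */
  static ServiceErrorCode fromCode(String code) {
    for (ServiceErrorCode value : values()) {
      if (value.code != null && value.code.equals(code)) {
        return value;
      }
    }
    return UNKNOWN;
  }
}

/** Hypothetical REST failure that exposes the parsed error code. */
class RestFailure extends Exception {
  private final ServiceErrorCode errorCode;

  RestFailure(String storageErrorCode, String message) {
    super(message);
    this.errorCode = ServiceErrorCode.fromCode(storageErrorCode);
  }

  ServiceErrorCode getErrorCode() {
    return errorCode;
  }
}

final class Md5ErrorCheckSketch {

  private Md5ErrorCheckSketch() {
  }

  /** Guard the cast with instanceof, then compare the structured error code. */
  static boolean isMd5ChecksumError(Exception e) {
    if (!(e instanceof RestFailure)) {
      return false; // not a server response, so it cannot be an MD5 mismatch
    }
    return ((RestFailure) e).getErrorCode() == ServiceErrorCode.MD5_MISMATCH;
  }
}
```

Matching on the error code rather than on message text means a rewording of the server's human-readable message cannot silently change the classification.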
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785839#comment-17785839 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1392402611

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##
@@ -798,6 +806,11 @@ public AbfsRestOperation append(final String path, final byte[] buffer,
       if (!op.hasResult()) {
         throw e;
       }
+

Review Comment:
In line with this.

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##
@@ -861,6 +874,16 @@ private boolean checkUserError(int responseStatusCode) {
         && responseStatusCode < HttpURLConnection.HTTP_INTERNAL_ERROR);
   }
+  /**
+   * To check if the failure exception returned by server is due to MD5 Mismatch
+   * @param e Exception returned by AbfsRestOperation
+   * @return boolean whether exception is due to MD5Mismatch or not
+   */
+  protected boolean isMd5ChecksumError(final AzureBlobFileSystemException e) {

Review Comment:
Taken.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17785645#comment-17785645 ] ASF GitHub Bot commented on HADOOP-18910: - steveloughran commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1391613074

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##
@@ -798,6 +806,11 @@ public AbfsRestOperation append(final String path, final byte[] buffer,
       if (!op.hasResult()) {
         throw e;
       }
+

Review Comment:
On the topic of exception parsing: L797 will blow up with a ClassCastException if the exception caught is anything other than an AbfsRestOperationException, so the type of exception caught can be changed to AbfsRestOperationException.

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##
@@ -879,9 +880,8 @@ private boolean checkUserError(int responseStatusCode) {
    * @return boolean whether exception is due to MD5Mismatch or not
    */
   protected boolean isMd5ChecksumError(final AzureBlobFileSystemException e) {
-    return ((AbfsRestOperationException) e).getStatusCode()
-        == HttpURLConnection.HTTP_BAD_REQUEST
-        && e.getMessage().contains(MD5_ERROR_SERVER_MESSAGE);
+    AzureServiceErrorCode storageErrorCode = ((AbfsRestOperationException) e).getErrorCode();

Review Comment:
See my comment above about making the catch on L787 an AbfsRestOperationException.
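Steve's point is that casting inside a catch of the base type turns any other subtype into a ClassCastException, whereas narrowing the catch lets unrelated failures propagate as themselves. Both shapes are sketched below. `BaseFsException`, `ServerRestException` and `ClientSideException` are hypothetical stand-ins, not Hadoop classes.

```java
/** Hypothetical stand-in for the common filesystem exception base type. */
class BaseFsException extends Exception {
  BaseFsException(String message) {
    super(message);
  }
}

/** Hypothetical subtype produced when the server returned an HTTP error. */
class ServerRestException extends BaseFsException {
  final int statusCode;

  ServerRestException(int statusCode) {
    super("HTTP " + statusCode);
    this.statusCode = statusCode;
  }
}

/** Hypothetical sibling subtype raised before any response exists. */
class ClientSideException extends BaseFsException {
  ClientSideException(String message) {
    super(message);
  }
}

final class NarrowCatchSketch {

  private NarrowCatchSketch() {
  }

  /** Simulated operation that fails with the supplied exception. */
  static void operation(BaseFsException failure) throws BaseFsException {
    throw failure;
  }

  /** Shape under review: broad catch plus cast; a ClientSideException becomes a ClassCastException. */
  static int castInsideBroadCatch(BaseFsException failure) {
    try {
      operation(failure);
      return 200;
    } catch (BaseFsException e) {
      return ((ServerRestException) e).statusCode;
    }
  }

  /** Suggested shape: catch only the REST subtype; other failures propagate unchanged. */
  static int narrowCatch(BaseFsException failure) throws BaseFsException {
    try {
      operation(failure);
      return 200;
    } catch (ServerRestException e) {
      return e.statusCode;
    }
  }
}
```

With the narrow catch, the caller sees the original `ClientSideException` and its message rather than an unrelated ClassCastException.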
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783473#comment-17783473 ] ASF GitHub Bot commented on HADOOP-18910: - saxenapranav commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1384375876

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##
@@ -861,6 +874,16 @@ private boolean checkUserError(int responseStatusCode) {
         && responseStatusCode < HttpURLConnection.HTTP_INTERNAL_ERROR);
   }
+  /**
+   * To check if the failure exception returned by server is due to MD5 Mismatch
+   * @param e Exception returned by AbfsRestOperation
+   * @return boolean whether exception is due to MD5Mismatch or not
+   */
+  protected boolean isMd5ChecksumError(final AzureBlobFileSystemException e) {

Review Comment:
Let's make it private.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783472#comment-17783472 ] ASF GitHub Bot commented on HADOOP-18910: - saxenapranav commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1384373412

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##
@@ -879,9 +880,8 @@ private boolean checkUserError(int responseStatusCode) {
    * @return boolean whether exception is due to MD5Mismatch or not
    */
   protected boolean isMd5ChecksumError(final AzureBlobFileSystemException e) {
-    return ((AbfsRestOperationException) e).getStatusCode()
-        == HttpURLConnection.HTTP_BAD_REQUEST
-        && e.getMessage().contains(MD5_ERROR_SERVER_MESSAGE);
+    AzureServiceErrorCode storageErrorCode = ((AbfsRestOperationException) e).getErrorCode();

Review Comment:
Check whether `e` is an instance of AbfsRestOperationException before casting.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783108#comment-17783108 ] ASF GitHub Bot commented on HADOOP-18910:

hadoop-yetus commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1794163045

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 52s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 1s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 45m 40s | | trunk passed |
| +1 :green_heart: | compile | 0m 37s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 34s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 31s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 41s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 38s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 34s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 4s | | trunk passed |
| +1 :green_heart: | shadedclient | 35m 16s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 35m 34s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 29s | | the patch passed |
| +1 :green_heart: | compile | 0m 31s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 31s | | the patch passed |
| +1 :green_heart: | compile | 0m 28s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 28s | | the patch passed |
| +1 :green_heart: | blanks | 0m 1s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 19s | [/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/12/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt) | hadoop-tools/hadoop-azure: The patch generated 2 new + 7 unchanged - 0 fixed = 9 total (was 7) |
| +1 :green_heart: | mvnsite | 0m 29s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 25s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 25s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 3s | | the patch passed |
| +1 :green_heart: | shadedclient | 34m 38s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 2m 1s | | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 38s | | The patch does not generate ASF License warnings. |
| | | 132m 44s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/12/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6069 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux df4282a751fc 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 42fa529128377d010bc56c711053c3ca5e74beb0 |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.a
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783105#comment-17783105 ] ASF GitHub Bot commented on HADOOP-18910:

hadoop-yetus commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1794159415

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 38s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 43m 14s | | trunk passed |
| +1 :green_heart: | compile | 0m 42s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 39s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 35s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 44s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 43s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 38s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 9s | | trunk passed |
| +1 :green_heart: | shadedclient | 33m 27s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 33m 49s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 30s | | the patch passed |
| +1 :green_heart: | compile | 0m 31s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 31s | | the patch passed |
| +1 :green_heart: | compile | 0m 28s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 28s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 21s | [/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/13/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt) | hadoop-tools/hadoop-azure: The patch generated 2 new + 7 unchanged - 0 fixed = 9 total (was 7) |
| +1 :green_heart: | mvnsite | 0m 31s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 27s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 27s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 2s | | the patch passed |
| +1 :green_heart: | shadedclient | 33m 13s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 2m 3s | | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 40s | | The patch does not generate ASF License warnings. |
| | | 127m 53s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/13/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6069 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux a6e5b4123ba8 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 42fa529128377d010bc56c711053c3ca5e74beb0 |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.a
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783098#comment-17783098 ] ASF GitHub Bot commented on HADOOP-18910:

hadoop-yetus commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1794139979

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 52s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 1s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 44m 39s | | trunk passed |
| +1 :green_heart: | compile | 0m 39s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 37s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 34s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 41s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 40s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 35s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 6s | | trunk passed |
| +1 :green_heart: | shadedclient | 34m 11s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 34m 33s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 29s | | the patch passed |
| +1 :green_heart: | compile | 0m 31s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 31s | | the patch passed |
| +1 :green_heart: | compile | 0m 29s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 29s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 20s | [/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/11/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt) | hadoop-tools/hadoop-azure: The patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) |
| +1 :green_heart: | mvnsite | 0m 31s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 27s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 25s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 3s | | the patch passed |
| +1 :green_heart: | shadedclient | 34m 20s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 2m 3s | | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 39s | | The patch does not generate ASF License warnings. |
| | | 130m 44s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/11/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6069 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux 7c04362400ca 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 9bd995a28593d874c16338fa09d52df7ce8c3a41 |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.a
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783082#comment-17783082 ] ASF GitHub Bot commented on HADOOP-18910:

anujmodi2021 commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1382765456

##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemChecksum.java:
##
@@ -0,0 +1,259 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs;
+
+import java.security.SecureRandom;
+import java.util.Arrays;
+import java.util.HashSet;
+
+import org.assertj.core.api.Assertions;
+import org.junit.Test;
+import org.mockito.Mockito;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsInvalidChecksumException;
+import org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsRestOperationException;
+import org.apache.hadoop.fs.azurebfs.contracts.services.AppendRequestParameters;
+import org.apache.hadoop.fs.azurebfs.services.AbfsClient;
+import org.apache.hadoop.fs.impl.OpenFileParameters;
+
+import static org.apache.hadoop.fs.azurebfs.constants.AbfsHttpConstants.MD5_ERROR_SERVER_MESSAGE;
+import static org.apache.hadoop.fs.azurebfs.constants.ConfigurationKeys.FS_AZURE_BUFFERED_PREAD_DISABLE;
+import static org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations.ONE_MB;
+import static org.apache.hadoop.fs.azurebfs.contracts.services.AppendRequestParameters.Mode.APPEND_MODE;
+import static org.apache.hadoop.test.LambdaTestUtils.intercept;
+import static org.mockito.ArgumentMatchers.any;
+
+/**
+ * Test For Verifying Checksum Related Operations
+ */
+public class ITestAzureBlobFileSystemChecksum extends AbstractAbfsIntegrationTest {
+
+  public ITestAzureBlobFileSystemChecksum() throws Exception {
+    super();
+  }
+
+  @Test
+  public void testWriteReadWithChecksum() throws Exception {
+    testWriteReadWithChecksumInternal(true);
+    testWriteReadWithChecksumInternal(false);
+  }
+
+  @Test
+  public void testAppendWithChecksumAtDifferentOffsets() throws Exception {
+    AzureBlobFileSystem fs = getConfiguredFileSystem(4 * ONE_MB, 4 * ONE_MB, true);

Review Comment:
   Thanks for pointing this out. I have changed the code so that it no longer takes a new filesystem reference. It now uses the base class's createNewFilesystem, which saves the reference into abfs so that it is cleaned up automatically in teardown().
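The cleanup idea in this reply (create filesystems through a base-class factory that remembers them, then close them all in one shared teardown) can be sketched generically. This is a minimal, hypothetical illustration: `BaseTest`, `FakeFileSystem` and `createNewFileSystem` are stand-in names, not the actual `AbstractAbfsIntegrationTest` API.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: a test base class whose factory method records every
 * resource it creates, so a shared teardown() can close them all.
 * None of these names are the real ABFS test classes.
 */
public class TrackedResourcesSketch {

  /** Stand-in for a per-test filesystem instance. */
  static final class FakeFileSystem implements Closeable {
    private boolean closed;

    @Override
    public void close() {
      closed = true;
    }

    boolean isClosed() {
      return closed;
    }
  }

  /** Stand-in for an integration-test base class. */
  static class BaseTest {
    private final List<Closeable> created = new ArrayList<>();

    /** Tests call this instead of constructing instances themselves. */
    protected FakeFileSystem createNewFileSystem() {
      FakeFileSystem fs = new FakeFileSystem();
      created.add(fs);
      return fs;
    }

    /** Runs after every test: closes everything the factory handed out. */
    public void teardown() {
      IOException first = null;
      for (Closeable c : created) {
        try {
          c.close();
        } catch (IOException e) {
          if (first == null) {
            first = e; // keep closing the rest, report the first failure
          }
        }
      }
      created.clear();
      if (first != null) {
        throw new UncheckedIOException(first);
      }
    }
  }

  public static void main(String[] args) {
    BaseTest test = new BaseTest();
    FakeFileSystem a = test.createNewFileSystem();
    FakeFileSystem b = test.createNewFileSystem();
    test.teardown();
    System.out.println(a.isClosed() && b.isClosed());
  }
}
```

The point of routing creation through the factory is that a test cannot forget to close an instance: anything it obtained is already on the list that teardown walks.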
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783081#comment-17783081 ] ASF GitHub Bot commented on HADOOP-18910:

anujmodi2021 commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1794043163

Thank you @steveloughran for reviewing this PR again. I have addressed your comments. Please let me know if anything else is required.

> ABFS: Adding Support for MD5 Hash based integrity verification of the request
> content during transport
> ---
>
> Key: HADOOP-18910
> URL: https://issues.apache.org/jira/browse/HADOOP-18910
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Reporter: Anuj Modi
> Assignee: Anuj Modi
> Priority: Major
> Labels: pull-request-available
>
> Azure Storage supports Content-MD5 request headers in both the Read and Append APIs.
> Read: [Path - Read - REST API (Azure Storage Services) | Microsoft Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read]
> Append: [Path - Update - REST API (Azure Storage Services) | Microsoft Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/update]
> This change makes the client-side changes needed to support them. In a Read request,
> we send the appropriate header, in response to which the server returns the
> MD5 hash of the data it sends back. On the client, we compare this with the
> MD5 hash computed from the data received.
> In an Append request, we compute the MD5 hash of the data that we are
> sending to the server and specify it in the appropriate header. On finding
> that header, the server compares it with the MD5 hash it computes over the
> data received.
> This whole checksum validation support is guarded behind a config, which is
> disabled by default because, with the use of "https", the integrity of data is
> preserved anyway. It is introduced as an additional data integrity check,
> which will also have a performance impact.
> Users can decide whether to enable it by setting the following
> config to *"true"* or *"false"* respectively. *Config:
> "fs.azure.enable.checksum.validation"*

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
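The append/read handshake in the issue description comes down to hashing exactly the bytes on the wire and comparing digests. Below is a minimal sketch, not the ABFS client code. It assumes the header value uses the standard Content-MD5 encoding from RFC 1864 (Base64 of the 16-byte MD5 digest). `ContentMd5Sketch`, `contentMd5` and `verify` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/**
 * Hypothetical helper, not the ABFS client code: builds and checks a
 * Content-MD5 style value (Base64 of the 16-byte MD5 digest, as in RFC 1864).
 */
public final class ContentMd5Sketch {

  private ContentMd5Sketch() {
  }

  /** Raw MD5 digest of buffer[offset, offset + length). */
  static byte[] md5(byte[] buffer, int offset, int length) {
    try {
      MessageDigest digest = MessageDigest.getInstance("MD5");
      digest.update(buffer, offset, length);
      return digest.digest();
    } catch (NoSuchAlgorithmException e) {
      // Java platforms are required to provide MD5, so this should not happen.
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  /** Append side: header value for exactly the bytes being sent. */
  static String contentMd5(byte[] buffer, int offset, int length) {
    return Base64.getEncoder().encodeToString(md5(buffer, offset, length));
  }

  /** Read side: does the server-reported value match the bytes received? */
  static boolean verify(String headerValue, byte[] received, int offset, int length) {
    if (headerValue == null) {
      return false; // nothing to compare against
    }
    byte[] expected;
    try {
      expected = Base64.getDecoder().decode(headerValue);
    } catch (IllegalArgumentException e) {
      return false; // malformed header value
    }
    return MessageDigest.isEqual(expected, md5(received, offset, length));
  }

  public static void main(String[] args) {
    byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
    String header = contentMd5(data, 0, data.length);
    System.out.println(header);
    System.out.println(verify(header, data, 0, data.length));
    data[0] ^= 1; // simulate a bit flip in transit
    System.out.println(verify(header, data, 0, data.length));
  }
}
```

Running `main` prints `XUFAKrxLKna5cZ2REBfFkg==` (the Base64 form of MD5("hello"), `5d41402abc4b2a76b9719d911017c592`), then `true`, then `false` once a single bit of the payload is flipped.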
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783046#comment-17783046 ] ASF GitHub Bot commented on HADOOP-18910:

hadoop-yetus commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1793839119

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 17m 33s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 45m 6s | | trunk passed |
| +1 :green_heart: | compile | 0m 42s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 38s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 36s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 43s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 42s | | trunk passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 38s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 8s | | trunk passed |
| +1 :green_heart: | shadedclient | 33m 58s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 34m 18s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 29s | | the patch passed |
| +1 :green_heart: | compile | 0m 32s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 32s | | the patch passed |
| +1 :green_heart: | compile | 0m 27s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 27s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 20s | [/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/10/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt) | hadoop-tools/hadoop-azure: The patch generated 10 new + 7 unchanged - 0 fixed = 17 total (was 7) |
| +1 :green_heart: | mvnsite | 0m 31s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 28s | | the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 27s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| -1 :x: | spotbugs | 1m 5s | [/new-spotbugs-hadoop-tools_hadoop-azure.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/10/artifact/out/new-spotbugs-hadoop-tools_hadoop-azure.html) | hadoop-tools/hadoop-azure generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 :green_heart: | shadedclient | 33m 59s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 2m 3s | | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 40s | | The patch does not generate ASF License warnings. |
| | | 147m 34s | | |

| Reason | Tests |
|-------:|:------|
| SpotBugs | module:hadoop-tools/hadoop-azure |
| | Nullcheck of abfsRestOperationException at line 38 of value previously dereferenced in new org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsInvalidChecksumException(AbfsRestOperationException) At AbfsInvalidChecksumException.java:38 of value previously dereferenced in new org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsInvalidChecksumException(AbfsRestOperationException) At AbfsInvalidChecksumException.java:[line 37] |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/10/artifact/out/Dockerfile |
| GIT
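The SpotBugs finding above reports a constructor argument that is dereferenced (line 37) and only null-checked afterwards (line 38). The null check is therefore dead code: a null argument would already have thrown a `NullPointerException`. The sketch below reproduces that shape with stand-in classes and shows one plausible ordering that guards every use. It is not the actual `AbfsInvalidChecksumException` source or the fix that went into the PR. The class names, the `-1` status and the fallback message are all hypothetical.

```java
/**
 * Hypothetical sketch of the pattern SpotBugs reports as "nullcheck of value
 * previously dereferenced", and one way to order the checks. The classes are
 * stand-ins, not the real ABFS exception types.
 */
public class NullCheckOrderSketch {

  /** Stand-in for a REST-operation exception carrying an HTTP status. */
  static class RestOperationException extends Exception {
    private final int statusCode;

    RestOperationException(int statusCode, String message) {
      super(message);
      this.statusCode = statusCode;
    }

    int getStatusCode() {
      return statusCode;
    }
  }

  /** Stand-in for a checksum-mismatch wrapper exception. */
  static class InvalidChecksumException extends RestOperationException {
    // Flagged shape (dereference first, null check second):
    //   super(cause.getStatusCode(), cause != null ? cause.getMessage() : null);
    // If cause were null, getStatusCode() would already have thrown an NPE,
    // so the later null check could never take effect.

    InvalidChecksumException(RestOperationException cause) {
      // Every use of cause is guarded, so a null argument is handled consistently.
      super(cause == null ? -1 : cause.getStatusCode(),
          cause == null ? "Checksum mismatch" : cause.getMessage());
      if (cause != null) {
        initCause(cause);
      }
    }
  }

  public static void main(String[] args) {
    RestOperationException serverError = new RestOperationException(400, "MD5 mismatch");
    InvalidChecksumException wrapped = new InvalidChecksumException(serverError);
    System.out.println(wrapped.getStatusCode() + " " + wrapped.getMessage());
    System.out.println(new InvalidChecksumException(null).getStatusCode());
  }
}
```

If a null argument is never legitimate, the alternative is to drop the null check entirely, for example with `Objects.requireNonNull`, instead of guarding every use. Either choice removes the inconsistency that SpotBugs flags.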
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783007#comment-17783007 ] ASF GitHub Bot commented on HADOOP-18910:

anujmodi2021 commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1382595258

##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemChecksum.java:
##
@@ -0,0 +1,259 @@
[...]
+/**
+ * Test For Verifying Checksum Related Operations
+ */
+public class ITestAzureBlobFileSystemChecksum extends AbstractAbfsIntegrationTest {
+
+  public ITestAzureBlobFileSystemChecksum() throws Exception {
+    super();
+  }
+
+  @Test
+  public void testWriteReadWithChecksum() throws Exception {
+    testWriteReadWithChecksumInternal(true);
+    testWriteReadWithChecksumInternal(false);
+  }
+
+  @Test
+  public void testAppendWithChecksumAtDifferentOffsets() throws Exception {
+    AzureBlobFileSystem fs = getConfiguredFileSystem(4 * ONE_MB, 4 * ONE_MB, true);
+    AbfsClient client = fs.getAbfsStore().getClient();
+    Path path = path("testPath");

Review Comment:
   I have added the method name to the string "testPath", like this: `Path testPath = path("testPath" + getMethodName());`
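The naming scheme discussed here and in the next review reply combines a readable part (base name plus test method) with a random part (a UUID). The readable part shows which test left a file behind. The random part stops parallel or repeated runs from sharing a path. Below is a minimal sketch of that combination. `uniqueTestPath` is a hypothetical helper, not the `path()` method of the Hadoop test base classes.

```java
import java.util.UUID;

/**
 * Hypothetical helper: builds a per-test path name that stays readable
 * (base name + test method) but cannot collide across runs (random UUID).
 */
public final class UniqueTestPathSketch {

  private UniqueTestPathSketch() {
  }

  static String uniqueTestPath(String baseName, String methodName) {
    // The method name says which test left the file behind;
    // the UUID keeps parallel or repeated runs from sharing a path.
    return "/" + baseName + "-" + methodName + "-" + UUID.randomUUID();
  }

  public static void main(String[] args) {
    System.out.println(uniqueTestPath("testPath", "testAppendWithChecksumAtDifferentOffsets"));
  }
}
```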
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783002#comment-17783002 ] ASF GitHub Bot commented on HADOOP-18910:

anujmodi2021 commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1382594103

##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemChecksum.java:
##
@@ -0,0 +1,259 @@
[...]
+/**
+ * Test For Verifying Checksum Related Operations
+ */
+public class ITestAzureBlobFileSystemChecksum extends AbstractAbfsIntegrationTest {
+
+  public ITestAzureBlobFileSystemChecksum() throws Exception {
+    super();
+  }
+
+  @Test
+  public void testWriteReadWithChecksum() throws Exception {
+    testWriteReadWithChecksumInternal(true);
+    testWriteReadWithChecksumInternal(false);
+  }
+
+  @Test
+  public void testAppendWithChecksumAtDifferentOffsets() throws Exception {
+    AzureBlobFileSystem fs = getConfiguredFileSystem(4 * ONE_MB, 4 * ONE_MB, true);
+    AbfsClient client = fs.getAbfsStore().getClient();
+    Path path = path("testPath");
+    fs.create(path);
+    byte[] data= generateRandomBytes(4 * ONE_MB);
+
+    appendWithOffsetHelper(client, path, data, fs, 0);
+    appendWithOffsetHelper(client, path, data, fs, 1 * ONE_MB);
+    appendWithOffsetHelper(client, path, data, fs, 2 * ONE_MB);
+    appendWithOffsetHelper(client, path, data, fs, 4 * ONE_MB - 1);
+  }
+
+  @Test
+  public void testReadWithChecksumAtDifferentOffsets() throws Exception {
+    AzureBlobFileSystem fs = getConfiguredFileSystem(4 * ONE_MB, 4 * ONE_MB, true);
+    AbfsClient client = fs.getAbfsStore().getClient();
+    fs.getAbfsStore().setClient(client);
+    Path path = path("testPath");

Review Comment:
   Using path() to generate a unique path based on a UUID.
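The append test above sends the same 4 MB buffer at offsets 0, 1 MB, 2 MB and 4 MB - 1. The quoted hunk does not show what `appendWithOffsetHelper` actually sends. This sketch assumes each call sends the bytes from the offset to the end of the buffer. Under that assumption, the MD5 must cover exactly that slice. Hashing in place with `MessageDigest.update(buf, off, len)` gives the same digest as hashing a copied slice, so no copy is needed. `OffsetMd5Sketch` and its `md5` helper are hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical sketch: for an append that sends buffer[offset, end), the MD5
 * must cover exactly that slice. Hashing in place with update(buf, off, len)
 * gives the same digest as hashing a copied slice.
 */
public final class OffsetMd5Sketch {

  static final int ONE_MB = 1024 * 1024;

  private OffsetMd5Sketch() {
  }

  /** MD5 digest of buffer[offset, offset + length). */
  static byte[] md5(byte[] buffer, int offset, int length) {
    try {
      MessageDigest digest = MessageDigest.getInstance("MD5");
      digest.update(buffer, offset, length);
      return digest.digest();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  public static void main(String[] args) {
    byte[] data = new byte[4 * ONE_MB];
    new Random(42).nextBytes(data); // fixed seed so runs are repeatable
    int[] offsets = {0, ONE_MB, 2 * ONE_MB, 4 * ONE_MB - 1};
    for (int offset : offsets) {
      int length = data.length - offset; // assumed: send from offset to end of buffer
      byte[] inPlace = md5(data, offset, length);
      byte[] copied = md5(Arrays.copyOfRange(data, offset, data.length), 0, length);
      System.out.println(offset + " -> " + Arrays.equals(inPlace, copied));
    }
  }
}
```

Each printed line has the form `<offset> -> true`. The last offset, 4 MB - 1, is the edge case where only a single byte is hashed.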
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783001#comment-17783001 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1382593808 ## hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemChecksum.java: ## @@ -0,0 +1,259 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.fs.azurebfs; + +import java.security.SecureRandom; +import java.util.Arrays; +import java.util.HashSet; + +import org.assertj.core.api.Assertions; +import org.junit.Test; +import org.mockito.Mockito; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FSDataOutputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsInvalidChecksumException; +import org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsRestOperationException; +import org.apache.hadoop.fs.azurebfs.contracts.services.AppendRequestParameters; +import org.apache.hadoop.fs.azurebfs.services.AbfsClient; +import org.apache.hadoop.fs.impl.OpenFileParameters; + +import static org.apache.hadoop.fs.azurebfs.constants.AbfsHttpConstants.MD5_ERROR_SERVER_MESSAGE; +import static org.apache.hadoop.fs.azurebfs.constants.ConfigurationKeys.FS_AZURE_BUFFERED_PREAD_DISABLE; +import static org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations.ONE_MB; +import static org.apache.hadoop.fs.azurebfs.contracts.services.AppendRequestParameters.Mode.APPEND_MODE; +import static org.apache.hadoop.test.LambdaTestUtils.intercept; +import static org.mockito.ArgumentMatchers.any; + +/** + * Test For Verifying Checksum Related Operations + */ +public class ITestAzureBlobFileSystemChecksum extends AbstractAbfsIntegrationTest { + + public ITestAzureBlobFileSystemChecksum() throws Exception { +super(); + } + + @Test + public void testWriteReadWithChecksum() throws Exception { +testWriteReadWithChecksumInternal(true); +testWriteReadWithChecksumInternal(false); + } + + @Test + public void testAppendWithChecksumAtDifferentOffsets() throws Exception { +AzureBlobFileSystem fs = getConfiguredFileSystem(4 * ONE_MB, 4 * ONE_MB, true); +AbfsClient client = fs.getAbfsStore().getClient(); +Path path = path("testPath"); Review Comment: 
methodPath() is used in NativeAzureFileSystem. Here we are using AbstractAbfsIntegrationTest.path(), which generates the path from a UUID string, so there is no conflict between tests.
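A minimal sketch of the idea behind UUID-based test paths, for illustration only. The class and method names, the "/test/" prefix and the 12-hex-character suffix are assumptions (modelled on the "/test/testPathbc4069e5d586" path visible in a request URL quoted later in this thread); this is not the actual AbstractAbfsIntegrationTest.path() code. Because only part of a UUID is used, collisions are extremely unlikely rather than strictly impossible.

```java
import java.util.UUID;

// Hypothetical sketch, not AbstractAbfsIntegrationTest.path(): the "/test/"
// prefix and the 12-hex-character suffix are assumptions.
final class UniqueTestPath {

  static String uniquePath(String name) {
    // Take the first 12 hex characters of a random UUID (dashes removed);
    // these 48 bits are all random, so parallel test runs rarely collide.
    String suffix = UUID.randomUUID().toString().replace("-", "").substring(0, 12);
    return "/test/" + name + suffix;
  }

  public static void main(String[] args) {
    // Two calls with the same logical name yield different paths.
    System.out.println(uniquePath("testPath"));
    System.out.println(uniquePath("testPath"));
  }
}
```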
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783000#comment-17783000 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1382593149 ## hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemChecksum.java: ## @@ -0,0 +1,259 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.hadoop.fs.azurebfs; + +import java.security.SecureRandom; +import java.util.Arrays; +import java.util.HashSet; + +import org.assertj.core.api.Assertions; +import org.junit.Test; +import org.mockito.Mockito; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FSDataOutputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsInvalidChecksumException; +import org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsRestOperationException; +import org.apache.hadoop.fs.azurebfs.contracts.services.AppendRequestParameters; +import org.apache.hadoop.fs.azurebfs.services.AbfsClient; +import org.apache.hadoop.fs.impl.OpenFileParameters; + +import static org.apache.hadoop.fs.azurebfs.constants.AbfsHttpConstants.MD5_ERROR_SERVER_MESSAGE; +import static org.apache.hadoop.fs.azurebfs.constants.ConfigurationKeys.FS_AZURE_BUFFERED_PREAD_DISABLE; +import static org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations.ONE_MB; +import static org.apache.hadoop.fs.azurebfs.contracts.services.AppendRequestParameters.Mode.APPEND_MODE; +import static org.apache.hadoop.test.LambdaTestUtils.intercept; +import static org.mockito.ArgumentMatchers.any; + +/** + * Test For Verifying Checksum Related Operations + */ +public class ITestAzureBlobFileSystemChecksum extends AbstractAbfsIntegrationTest { + + public ITestAzureBlobFileSystemChecksum() throws Exception { +super(); + } + + @Test + public void testWriteReadWithChecksum() throws Exception { +testWriteReadWithChecksumInternal(true); +testWriteReadWithChecksumInternal(false); + } + + @Test + public void testAppendWithChecksumAtDifferentOffsets() throws Exception { +AzureBlobFileSystem fs = getConfiguredFileSystem(4 * ONE_MB, 4 * ONE_MB, true); +AbfsClient client = fs.getAbfsStore().getClient(); +Path path = path("testPath"); +fs.create(path); 
+byte[] data= generateRandomBytes(4 * ONE_MB); + +appendWithOffsetHelper(client, path, data, fs, 0); +appendWithOffsetHelper(client, path, data, fs, 1 * ONE_MB); +appendWithOffsetHelper(client, path, data, fs, 2 * ONE_MB); +appendWithOffsetHelper(client, path, data, fs, 4 * ONE_MB - 1); + } + + @Test + public void testReadWithChecksumAtDifferentOffsets() throws Exception { +AzureBlobFileSystem fs = getConfiguredFileSystem(4 * ONE_MB, 4 * ONE_MB, true); +AbfsClient client = fs.getAbfsStore().getClient(); +fs.getAbfsStore().setClient(client); +Path path = path("testPath"); + +byte[] data = generateRandomBytes(16 * ONE_MB); +FSDataOutputStream out = fs.create(path); +out.write(data); +out.hflush(); +out.close(); + +readWithOffsetAndPositionHelper(client, path, data, fs, 0, 0); +readWithOffsetAndPositionHelper(client, path, data, fs, 4 * ONE_MB, 0); +readWithOffsetAndPositionHelper(client, path, data, fs, 4 * ONE_MB, 1 * ONE_MB); +readWithOffsetAndPositionHelper(client, path, data, fs, 8 * ONE_MB, 2 * ONE_MB); +readWithOffsetAndPositionHelper(client, path, data, fs, 15 * ONE_MB, 4 * ONE_MB - 1); + } + + @Test + public void testWriteReadWithChecksumAndOptions() throws Exception { +testWriteReadWithChecksumAndOptionsInternal(true); +testWriteReadWithChecksumAndOptionsInternal(false); + } + + @Test + public void testAbfsInvalidChecksumExceptionInAppend() throws Exception { +AzureBlobFileSystem fs = getConfiguredFileSystem(4 * ONE_MB, 4 * ONE_MB, true); +AbfsClient spiedClient = Mockito.s
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781630#comment-17781630 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1378467294 ## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/AbfsRuntimeException.java: ## @@ -0,0 +1,54 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + +package org.apache.hadoop.fs.azurebfs.contracts.exceptions; + +import org.apache.hadoop.classification.InterfaceAudience; +import org.apache.hadoop.classification.InterfaceStability; +import org.apache.hadoop.fs.azurebfs.contracts.services.AzureServiceErrorCode; + +/** + * Exception to wrap invalid checksum verification on client side. Review Comment: Hmmm Sounds good. Will take this. 
> ABFS: Adding Support for MD5 Hash based integrity verification of the request > content during transport > --- > > Key: HADOOP-18910 > URL: https://issues.apache.org/jira/browse/HADOOP-18910 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Reporter: Anuj Modi >Assignee: Anuj Modi >Priority: Major > Labels: pull-request-available > > Azure Storage Supports Content-MD5 Request Headers in Both Read and Append > APIs. > Read: [Path - Read - REST API (Azure Storage Services) | Microsoft > Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read] > Append: [Path - Update - REST API (Azure Storage Services) | Microsoft > Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/update] > This change is to make client-side changes to support them. In Read request, > we will send the appropriate header in response to which server will return > the MD5 Hash of the data it sends back. On Client we will tally this with the > MD5 hash computed from the data received. > In Append request, we will compute the MD5 Hash of the data that we are > sending to the server and specify that in appropriate header. Server on > finding that header will tally this with the MD5 hash it will compute on the > data received. > This whole Checksum Validation Support is guarded behind a config, Config is > by default disabled because with the use of "https" integrity of data is > preserved anyways. This is introduced as an additional data integrity check > which will have a performance impact as well. > Users can decide if they want to enable this or not by setting the following > config to *"true"* or *"false"* respectively. *Config: > "fs.azure.enable.checksum.validation"* -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
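The append side described above can be sketched as follows, under stated assumptions: the boolean `checksumEnabled` stands in for the "fs.azure.enable.checksum.validation" setting, and the header map, class and method names are hypothetical rather than the ABFS client's real types. The sketch hashes only the slice of the buffer that is actually being sent and Base64-encodes the 16-byte digest, which is the usual encoding for a Content-MD5 header value.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of config-gated Content-MD5 on append; not the ABFS code.
final class AppendMd5Sketch {

  /** Base64-encoded MD5 of buffer[offset, offset + length). */
  static String computeMd5Base64(byte[] buffer, int offset, int length) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      md5.update(buffer, offset, length); // hash only the bytes being uploaded
      return Base64.getEncoder().encodeToString(md5.digest());
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  /** Adds Content-MD5 only when checksum validation is enabled (default: off). */
  static Map<String, String> appendHeaders(boolean checksumEnabled,
      byte[] buffer, int offset, int length) {
    Map<String, String> headers = new LinkedHashMap<>();
    if (checksumEnabled) {
      headers.put("Content-MD5", computeMd5Base64(buffer, offset, length));
    }
    return headers;
  }

  public static void main(String[] args) {
    byte[] data = "hello world".getBytes(StandardCharsets.UTF_8);
    System.out.println(appendHeaders(true, data, 0, data.length));  // prints {Content-MD5=XrY7u+Ae7tCTyyK7j1rNww==}
    System.out.println(appendHeaders(false, data, 0, data.length)); // prints {}
  }
}
```

Hashing an explicit (offset, length) slice matters because an append may send only part of a larger buffer; a digest over the whole array would not match what the server receives.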
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781628#comment-17781628 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1378466038 ## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ## @@ -1412,6 +1447,97 @@ private void appendIfNotEmpty(StringBuilder sb, String regEx, } } + /** + * Add MD5 hash as request header to the append request + * @param requestHeaders to be updated with checksum header + * @param reqParams for getting offset and length + * @param buffer for getting input data for MD5 computation + * @throws AbfsRestOperationException if Md5 computation fails + */ + private void addCheckSumHeaderForWrite(List requestHeaders, + final AppendRequestParameters reqParams, final byte[] buffer) + throws AbfsRestOperationException { +String md5Hash = computeMD5Hash(buffer, reqParams.getoffset(), +reqParams.getLength()); +requestHeaders.add(new AbfsHttpHeader(CONTENT_MD5, md5Hash)); + } + + /** + * To verify the checksum information received from server for the data read + * @param buffer stores the data received from server + * @param result HTTP Operation Result + * @param bufferOffset Position where data returned by server is saved in buffer + * @throws AbfsRestOperationException if Md5Mismatch + */ + private void verifyCheckSumForRead(final byte[] buffer, + final AbfsHttpOperation result, final int bufferOffset) + throws AbfsRestOperationException { +// Number of bytes returned by server could be less than or equal to what +// caller requests. In case it is less, extra bytes will be initialized to 0 +// Server returned MD5 Hash will be computed on what server returned. 
+// We need to get exact data that server returned and compute its md5 hash +// Computed hash should be equal to what server returned +int numberOfBytesRead = (int) result.getBytesReceived(); +if (numberOfBytesRead == 0) { + return; +} +String md5HashComputed = computeMD5Hash(buffer, bufferOffset, +numberOfBytesRead); +String md5HashActual = result.getResponseHeader(CONTENT_MD5); Review Comment: No, this won't be returned to the caller. The caller gets the data if the MD5 hashes match; otherwise an exception is thrown. The local variable is kept here for logging.
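The read-side check quoted above can be sketched like this. It is a simplified illustration, not the ABFS implementation: the class, the `ChecksumMismatchException` type (standing in for AbfsInvalidChecksumException) and the method signatures are hypothetical. As the code comments in the diff explain, the server may return fewer bytes than requested, so only the `bytesReceived` bytes written at `bufferOffset` are hashed before comparing with the server's Content-MD5 value.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical sketch of read-side MD5 verification; not the ABFS code.
final class ReadMd5Sketch {

  /** Stand-in for AbfsInvalidChecksumException. */
  static final class ChecksumMismatchException extends Exception {
    ChecksumMismatchException(String message) { super(message); }
  }

  static String md5Base64(byte[] buf, int off, int len) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      md5.update(buf, off, len);
      return Base64.getEncoder().encodeToString(md5.digest());
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  /** Hashes exactly the bytes the server returned and compares with its Content-MD5. */
  static void verifyChecksumForRead(byte[] buffer, int bufferOffset,
      int bytesReceived, String serverMd5) throws ChecksumMismatchException {
    if (bytesReceived == 0) {
      return; // nothing was read, so there is nothing to verify
    }
    String computed = md5Base64(buffer, bufferOffset, bytesReceived);
    if (!computed.equals(serverMd5)) { // a missing (null) header also counts as a mismatch
      throw new ChecksumMismatchException(
          "MD5 mismatch: computed=" + computed + ", server=" + serverMd5);
    }
  }

  public static void main(String[] args) throws ChecksumMismatchException {
    byte[] buffer = new byte[16];
    byte[] received = "hello world".getBytes(StandardCharsets.UTF_8);
    // The server returned 11 bytes, stored at offset 3; the trailing bytes stay 0.
    System.arraycopy(received, 0, buffer, 3, received.length);
    verifyChecksumForRead(buffer, 3, received.length,
        md5Base64(received, 0, received.length)); // matches, no exception
    try {
      verifyChecksumForRead(buffer, 3, received.length, "AAAAAAAAAAAAAAAAAAAAAA==");
    } catch (ChecksumMismatchException expected) {
      System.out.println("mismatch detected");
    }
  }
}
```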
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781615#comment-17781615 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1378439863 ## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ## @@ -875,10 +873,15 @@ private boolean checkUserError(int responseStatusCode) { && responseStatusCode < HttpURLConnection.HTTP_INTERNAL_ERROR); } - private boolean isMd5ChecksumError(final AzureBlobFileSystemException e) { + /** + * To check if the failure exception returned by server is due to MD5 Mismatch + * @param e Exception returned by AbfsRestOperation + * @return boolean whether exception is due to MD5Mismatch or not + */ + protected boolean isMd5ChecksumError(final AzureBlobFileSystemException e) { return ((AbfsRestOperationException) e).getStatusCode() == HttpURLConnection.HTTP_BAD_REQUEST -&& ((AbfsRestOperationException) e).getErrorMessage().contains(MD5_ERROR); +&& e.getMessage().contains(MD5_ERROR_SERVER_MESSAGE); Review Comment: This is how it works. The server returns the following: 1. Status Code: 400 2. Error Code: Md5Mismatch 3. Error Message: The MD5 value specified in the request did not match with the MD5 value calculated by the server. RequestId: Time:2023-11-01T05:43:38.0231383Z 4. Status Description: The MD5 value specified in the request did not match with the MD5 value calculated by the server. From these we create an AbfsRestOperationException object with the following fields: 1. Status Code: 400 2. errorCode: AzureServiceErrorCode.MD5_MISMATCH (a constant defined in the latest commit) 3. errorMessage: The MD5 value specified in the request did not match with the MD5 value calculated by the server.
RequestId: Time:2023-11-01T05:43:38.0231383Z The parent class of AbfsRestOperationException, AzureBlobFileSystemException, is also created with the following fields: 1. message: "Operation Failed" + statusDescription + statuscode + method + url + errorCode + errorMessage. 2. innerException: null. So e.getMessage() resolves to the AzureBlobFileSystemException message, which contains a lot of other information as well. e.getErrorMessage() resolves to the AbfsRestOperationException errorMessage, which does not contain the storage error code. The correct way is to use e.getErrorCode(), which resolves to the AbfsRestOperationException errorCode, which is exactly Md5Mismatch.
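To make the difference between the three fields concrete, here is a simplified, hypothetical stand-in for the exception described above; it is not the real AbfsRestOperationException, and the composite-message format only mirrors the "Operation failed: ..." text quoted in this thread. It shows that the composite getMessage() mixes the URL, method and error code into one string, whereas checking the status code together with the parsed error code targets exactly the Md5Mismatch case, as the comment recommends. The "OtherError" code in the usage is a placeholder.

```java
import java.net.HttpURLConnection;

// Hypothetical, simplified stand-in for the exception fields described above.
final class Md5ErrorClassifierSketch {

  static final String MD5_MISMATCH = "Md5Mismatch";

  static final class RestOperationError extends Exception {
    final int statusCode;
    final String errorCode;
    final String errorMessage;

    RestOperationError(int statusCode, String method, String url,
        String statusDescription, String errorCode, String errorMessage) {
      // Composite message, shaped like the "Operation failed: ..." text in this thread.
      super(String.format("Operation failed: \"%s\", %d, %s, %s, %s, \"%s\"",
          statusDescription, statusCode, method, url, errorCode, errorMessage));
      this.statusCode = statusCode;
      this.errorCode = errorCode;
      this.errorMessage = errorMessage;
    }
  }

  /** Classify on status code plus the parsed error code, not on message text. */
  static boolean isMd5ChecksumError(RestOperationError e) {
    return e.statusCode == HttpURLConnection.HTTP_BAD_REQUEST
        && MD5_MISMATCH.equals(e.errorCode);
  }

  public static void main(String[] args) {
    String text = "The MD5 value specified in the request did not match"
        + " with the MD5 value calculated by the server.";
    RestOperationError md5 = new RestOperationError(400, "PUT",
        "https://example.invalid/file?action=append", text, MD5_MISMATCH, text);
    RestOperationError other = new RestOperationError(400, "PUT",
        "https://example.invalid/file?action=append", "Bad request", "OtherError", "Bad request");
    System.out.println(isMd5ChecksumError(md5));   // true
    System.out.println(isMd5ChecksumError(other)); // false
    System.out.println(md5.getMessage());          // composite text, includes the URL and error code
  }
}
```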
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781614#comment-17781614 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1378439863 ## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ## @@ -875,10 +873,15 @@ private boolean checkUserError(int responseStatusCode) { && responseStatusCode < HttpURLConnection.HTTP_INTERNAL_ERROR); } - private boolean isMd5ChecksumError(final AzureBlobFileSystemException e) { + /** + * To check if the failure exception returned by server is due to MD5 Mismatch + * @param e Exception returned by AbfsRestOperation + * @return boolean whether exception is due to MD5Mismatch or not + */ + protected boolean isMd5ChecksumError(final AzureBlobFileSystemException e) { return ((AbfsRestOperationException) e).getStatusCode() == HttpURLConnection.HTTP_BAD_REQUEST -&& ((AbfsRestOperationException) e).getErrorMessage().contains(MD5_ERROR); +&& e.getMessage().contains(MD5_ERROR_SERVER_MESSAGE); Review Comment: This is how it works. The server returns the following: 1. Status Code: 400 2. Error Code: Md5Mismatch 3. Error Message: The MD5 value specified in the request did not match with the MD5 value calculated by the server. RequestId:605ff975-001f-0050-5d86-0c16aa00 Time:2023-11-01T05:43:38.0231383Z 4. Status Description: The MD5 value specified in the request did not match with the MD5 value calculated by the server. From these we create an AbfsRestOperationException object with the following fields: 1. Status Code: 400 2. errorCode: AzureServiceErrorCode.MD5_MISMATCH (a constant defined in the latest commit) 3. errorMessage: The MD5 value specified in the request did not match with the MD5 value calculated by the server.
RequestId:605ff975-001f-0050-5d86-0c16aa00 Time:2023-11-01T05:43:38.0231383Z The parent class of AbfsRestOperationException, AzureBlobFileSystemException, is also created with the following fields: 1. message: "Operation Failed" + statusDescription + statuscode + method + url + errorCode + errorMessage. 2. innerException: null. So e.getMessage() resolves to the AzureBlobFileSystemException message, which contains a lot of other information as well. e.getErrorMessage() resolves to the AbfsRestOperationException errorMessage, which does not contain the storage error code. The correct way is to use e.getErrorCode(), which resolves to the AbfsRestOperationException errorCode, which is exactly Md5Mismatch.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781601#comment-17781601 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1378415451 ## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ## @@ -875,10 +873,15 @@ private boolean checkUserError(int responseStatusCode) { && responseStatusCode < HttpURLConnection.HTTP_INTERNAL_ERROR); } - private boolean isMd5ChecksumError(final AzureBlobFileSystemException e) { + /** + * To check if the failure exception returned by server is due to MD5 Mismatch + * @param e Exception returned by AbfsRestOperation + * @return boolean whether exception is due to MD5Mismatch or not + */ + protected boolean isMd5ChecksumError(final AzureBlobFileSystemException e) { return ((AbfsRestOperationException) e).getStatusCode() == HttpURLConnection.HTTP_BAD_REQUEST -&& ((AbfsRestOperationException) e).getErrorMessage().contains(MD5_ERROR); +&& e.getMessage().contains(MD5_ERROR_SERVER_MESSAGE); Review Comment: e.getErrorMessage(): The MD5 value specified in the request did not match with the MD5 value calculated by the server. RequestId: Time:2023-11-01T05:43:38.0231383Z e.getMessage(): Operation failed: "The MD5 value specified in the request did not match with the MD5 value calculated by the server.", 400, PUT, https://accountName.dfs.core.windows.net/abfs-testcontainer-d887bab1-4bd2-4709-a701-e5dc004fe743/test/testPathbc4069e5d586?action=append&position=0&timeout=90, Md5Mismatch, "The MD5 value specified in the request did not match with the MD5 value calculated by the server. 
RequestId:605ff975-001f-0050-5d86-0c16aa00 Time:2023-11-01T05:43:38.0231383Z"
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17781600#comment-17781600 ]

ASF GitHub Bot commented on HADOOP-18910:
-

anujmodi2021 commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1378415451

##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##
@@ -875,10 +873,15 @@ private boolean checkUserError(int responseStatusCode) {
         && responseStatusCode < HttpURLConnection.HTTP_INTERNAL_ERROR);
   }
-  private boolean isMd5ChecksumError(final AzureBlobFileSystemException e) {
+  /**
+   * To check if the failure exception returned by server is due to MD5 Mismatch
+   * @param e Exception returned by AbfsRestOperation
+   * @return boolean whether exception is due to MD5Mismatch or not
+   */
+  protected boolean isMd5ChecksumError(final AzureBlobFileSystemException e) {
     return ((AbfsRestOperationException) e).getStatusCode() == HttpURLConnection.HTTP_BAD_REQUEST
-        && ((AbfsRestOperationException) e).getErrorMessage().contains(MD5_ERROR);
+        && e.getMessage().contains(MD5_ERROR_SERVER_MESSAGE);

Review Comment:
   e.getErrorMessage(): The MD5 value specified in the request did not match with the MD5 value calculated by the server. RequestId:605ff975-001f-0050-5d86-0c16aa00 Time:2023-11-01T05:43:38.0231383Z

   e.getMessage(): Operation failed: "The MD5 value specified in the request did not match with the MD5 value calculated by the server.", 400, PUT, https://anujtesthns.dfs.core.windows.net/abfs-testcontainer-d887bab1-4bd2-4709-a701-e5dc004fe743/test/testPathbc4069e5d586?action=append&position=0&timeout=90, Md5Mismatch, "The MD5 value specified in the request did not match with the MD5 value calculated by the server.
   RequestId:605ff975-001f-0050-5d86-0c16aa00 Time:2023-11-01T05:43:38.0231383Z"
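The check in this hunk matches on the server's human-readable sentence. The comment shows that this sentence appears in both `getErrorMessage()` and `getMessage()`.

Below is a minimal standalone sketch of the substring check. It assumes `MD5_ERROR_SERVER_MESSAGE` holds the sentence quoted in the comment; the constant's real value in the patch is not shown here. The class `Md5ErrorCheck` is hypothetical. A static method that takes a plain status code and message string can be tested without building an `AbfsClient` or an exception object.

```java
import java.net.HttpURLConnection;

/** Hypothetical standalone version of the MD5-mismatch check; not the ABFS implementation. */
final class Md5ErrorCheck {

  // Assumed value: the server sentence quoted in the review comment.
  static final String MD5_ERROR_SERVER_MESSAGE =
      "The MD5 value specified in the request did not match with the MD5 value calculated by the server.";

  private Md5ErrorCheck() {
  }

  /** True if the status is HTTP 400 and the message text contains the server's MD5 sentence. */
  static boolean isMd5ChecksumError(int statusCode, String message) {
    return statusCode == HttpURLConnection.HTTP_BAD_REQUEST
        && message != null
        && message.contains(MD5_ERROR_SERVER_MESSAGE);
  }

  public static void main(String[] args) {
    // Shortened copies of the two message forms shown in the review comment.
    String errorMessage = MD5_ERROR_SERVER_MESSAGE
        + "\nRequestId:605ff975-001f-0050-5d86-0c16aa00 Time:2023-11-01T05:43:38.0231383Z";
    String fullMessage = "Operation failed: \"" + MD5_ERROR_SERVER_MESSAGE
        + "\", 400, PUT, <url>, Md5Mismatch, \"" + errorMessage + "\"";
    System.out.println(isMd5ChecksumError(400, errorMessage)); // true
    System.out.println(isMd5ChecksumError(400, fullMessage));  // true
    System.out.println(isMd5ChecksumError(500, fullMessage));  // false: wrong status
  }
}
```

Matching on prose ties the client to the exact wording of the service's message. If the server ever rewords it, the check would silently stop matching.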
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773405#comment-17773405 ]

ASF GitHub Bot commented on HADOOP-18910:
-

steveloughran commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1350497313

##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##
@@ -1412,6 +1447,97 @@ private void appendIfNotEmpty(StringBuilder sb, String regEx,
       }
     }
+  /**
+   * Add MD5 hash as request header to the append request
+   * @param requestHeaders to be updated with checksum header
+   * @param reqParams for getting offset and length
+   * @param buffer for getting input data for MD5 computation
+   * @throws AbfsRestOperationException if Md5 computation fails
+   */
+  private void addCheckSumHeaderForWrite(List<AbfsHttpHeader> requestHeaders,
+      final AppendRequestParameters reqParams, final byte[] buffer)
+      throws AbfsRestOperationException {
+    String md5Hash = computeMD5Hash(buffer, reqParams.getoffset(),
+        reqParams.getLength());
+    requestHeaders.add(new AbfsHttpHeader(CONTENT_MD5, md5Hash));
+  }
+
+  /**
+   * To verify the checksum information received from server for the data read
+   * @param buffer stores the data received from server
+   * @param result HTTP Operation Result
+   * @param bufferOffset Position where data returned by server is saved in buffer
+   * @throws AbfsRestOperationException if Md5Mismatch
+   */
+  private void verifyCheckSumForRead(final byte[] buffer,
+      final AbfsHttpOperation result, final int bufferOffset)
+      throws AbfsRestOperationException {
+    // Number of bytes returned by server could be less than or equal to what
+    // caller requests. In case it is less, extra bytes will be initialized to 0
+    // Server returned MD5 Hash will be computed on what server returned.
+    // We need to get exact data that server returned and compute its md5 hash
+    // Computed hash should be equal to what server returned
+    int numberOfBytesRead = (int) result.getBytesReceived();
+    if (numberOfBytesRead == 0) {
+      return;
+    }
+    String md5HashComputed = computeMD5Hash(buffer, bufferOffset,
+        numberOfBytesRead);
+    String md5HashActual = result.getResponseHeader(CONTENT_MD5);

Review Comment:
   is this ever not going be returned?

##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##
@@ -1412,6 +1447,97 @@ private void appendIfNotEmpty(StringBuilder sb, String regEx,
       }
     }
+  /**
+   * Add MD5 hash as request header to the append request
+   * @param requestHeaders to be updated with checksum header
+   * @param reqParams for getting offset and length
+   * @param buffer for getting input data for MD5 computation
+   * @throws AbfsRestOperationException if Md5 computation fails
+   */
+  private void addCheckSumHeaderForWrite(List<AbfsHttpHeader> requestHeaders,
+      final AppendRequestParameters reqParams, final byte[] buffer)
+      throws AbfsRestOperationException {
+    String md5Hash = computeMD5Hash(buffer, reqParams.getoffset(),
+        reqParams.getLength());
+    requestHeaders.add(new AbfsHttpHeader(CONTENT_MD5, md5Hash));
+  }
+
+  /**
+   * To verify the checksum information received from server for the data read
+   * @param buffer stores the data received from server
+   * @param result HTTP Operation Result
+   * @param bufferOffset Position where data returned by server is saved in buffer
+   * @throws AbfsRestOperationException if Md5Mismatch
+   */
+  private void verifyCheckSumForRead(final byte[] buffer,
+      final AbfsHttpOperation result, final int bufferOffset)
+      throws AbfsRestOperationException {
+    // Number of bytes returned by server could be less than or equal to what
+    // caller requests. In case it is less, extra bytes will be initialized to 0
+    // Server returned MD5 Hash will be computed on what server returned.
+    // We need to get exact data that server returned and compute its md5 hash
+    // Computed hash should be equal to what server returned
+    int numberOfBytesRead = (int) result.getBytesReceived();
+    if (numberOfBytesRead == 0) {
+      return;
+    }
+    String md5HashComputed = computeMD5Hash(buffer, bufferOffset,
+        numberOfBytesRead);
+    String md5HashActual = result.getResponseHeader(CONTENT_MD5);
+    if (!md5HashComputed.equals(md5HashActual)) {
+      throw new AbfsInvalidChecksumException(result.getRequestId());
+    }
+  }
+
+  /**
+   * Conditions check for allowing checksum support for read operation.
+   * As per the azure documentation following conditions should be met before
+   * Sending MD5 Hash in request headers.
+   * {@link https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read";>}
+   * 1. Range header shou
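The reviewer asks whether the `CONTENT_MD5` response header can ever be absent. The hunk alone does not answer that, but a read-side check can make the choice explicit.

The sketch below is a hypothetical, self-contained version of the logic in `verifyCheckSumForRead`:

- It hashes only the `bytesReceived` bytes starting at `bufferOffset`, so any zero padding after them is excluded, as the code comments require.
- As an assumption, it treats a missing header as a mismatch rather than a pass.

`ReadChecksumSketch`, `computeMd5Base64` and `verifyRead` are illustrative names, not ABFS APIs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical sketch of the read-side MD5 check; names are not taken from the ABFS code. */
final class ReadChecksumSketch {

  private ReadChecksumSketch() {
  }

  /** Base64-encoded MD5 of buffer[offset, offset + length), the form used in a Content-MD5 header. */
  static String computeMd5Base64(byte[] buffer, int offset, int length) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5"); // fresh instance per call
      md5.update(buffer, offset, length);                     // hash only the bytes actually received
      return Base64.getEncoder().encodeToString(md5.digest());
    } catch (NoSuchAlgorithmException e) {
      // Java platform implementations are required to support MD5, so this is not expected.
      throw new IllegalStateException(e);
    }
  }

  /**
   * True if the received bytes match the server-supplied MD5 value.
   * Assumption: a missing header is treated as a failure instead of being silently accepted.
   */
  static boolean verifyRead(byte[] buffer, int bufferOffset, int bytesReceived, String contentMd5Header) {
    if (bytesReceived == 0) {
      return true; // nothing read, nothing to verify (mirrors the early return in the patch)
    }
    if (contentMd5Header == null) {
      return false;
    }
    return computeMd5Base64(buffer, bufferOffset, bytesReceived).equals(contentMd5Header);
  }

  public static void main(String[] args) {
    byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);
    byte[] buffer = new byte[16];                  // larger than the data; the tail stays zero
    System.arraycopy(data, 0, buffer, 4, data.length);
    String header = computeMd5Base64(data, 0, data.length);
    System.out.println(verifyRead(buffer, 4, data.length, header));       // true
    System.out.println(verifyRead(buffer, 4, buffer.length - 4, header)); // false: padding hashed too
    System.out.println(verifyRead(buffer, 4, data.length, null));         // false: header missing
  }
}
```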
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773393#comment-17773393 ]

ASF GitHub Bot commented on HADOOP-18910:
-

steveloughran commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1350488012

##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##
@@ -875,10 +873,15 @@ private boolean checkUserError(int responseStatusCode) {
         && responseStatusCode < HttpURLConnection.HTTP_INTERNAL_ERROR);
   }
-  private boolean isMd5ChecksumError(final AzureBlobFileSystemException e) {
+  /**
+   * To check if the failure exception returned by server is due to MD5 Mismatch
+   * @param e Exception returned by AbfsRestOperation
+   * @return boolean whether exception is due to MD5Mismatch or not
+   */
+  protected boolean isMd5ChecksumError(final AzureBlobFileSystemException e) {
     return ((AbfsRestOperationException) e).getStatusCode() == HttpURLConnection.HTTP_BAD_REQUEST
-        && ((AbfsRestOperationException) e).getErrorMessage().contains(MD5_ERROR);
+        && e.getMessage().contains(MD5_ERROR_SERVER_MESSAGE);

Review Comment:
   which do you think will be least brittle?
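One option that is less brittle than matching on message prose is to key on the machine-readable error code. The server response quoted in anujmodi2021's later review comment carries `Md5Mismatch` alongside the HTTP 400 status.

The sketch below assumes two things, both of which should be checked against the Azure Storage error-code documentation:

- the code is available to the caller as a separate string;
- its spelling is stable.

`Md5ErrorCodeCheck` is a hypothetical helper, not part of ABFS.

```java
import java.net.HttpURLConnection;

/** Hypothetical sketch: classify an MD5 failure by status and error code, not by message prose. */
final class Md5ErrorCodeCheck {

  // Error code string as it appears in the server response quoted in the review thread.
  static final String MD5_MISMATCH_ERROR_CODE = "Md5Mismatch";

  private Md5ErrorCodeCheck() {
  }

  static boolean isMd5ChecksumError(int statusCode, String errorCode) {
    return statusCode == HttpURLConnection.HTTP_BAD_REQUEST
        && MD5_MISMATCH_ERROR_CODE.equals(errorCode); // constant on the left keeps this null-safe
  }

  public static void main(String[] args) {
    System.out.println(isMd5ChecksumError(400, "Md5Mismatch")); // true
    System.out.println(isMd5ChecksumError(400, "OtherError"));  // false: hypothetical other code
    System.out.println(isMd5ChecksumError(400, null));          // false
  }
}
```

An exact match on a short identifier does not depend on the wording of the human-readable message. The trade-off is that the caller must be able to obtain the code on its own.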
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773391#comment-17773391 ]

ASF GitHub Bot commented on HADOOP-18910:
-

steveloughran commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1350484904

##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##
@@ -1412,6 +1444,102 @@ private void appendIfNotEmpty(StringBuilder sb, String regEx,
       }
     }
+  /**
+   * Add MD5 hash as request header to the append request
+   * @param requestHeaders
+   * @param reqParams
+   * @param buffer
+   */
+  private void addCheckSumHeaderForWrite(List<AbfsHttpHeader> requestHeaders,
+      final AppendRequestParameters reqParams, final byte[] buffer)
+      throws AbfsRestOperationException {
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);
+      byte[] dataToBeWritten = new byte[reqParams.getLength()];
+
+      if (reqParams.getoffset() == 0 && reqParams.getLength() == buffer.length) {
+        dataToBeWritten = buffer;
+      } else {
+        System.arraycopy(buffer, reqParams.getoffset(), dataToBeWritten, 0,
+            reqParams.getLength());
+      }
+
+      byte[] md5Bytes = md5Digest.digest(dataToBeWritten);
+      String md5Hash = Base64.getEncoder().encodeToString(md5Bytes);
+      requestHeaders.add(new AbfsHttpHeader(CONTENT_MD5, md5Hash));
+    } catch (NoSuchAlgorithmException ex) {
+      throw new AbfsRuntimeException(ex);
+    }
+  }
+
+  /**
+   * To verify the checksum information received from server for the data read
+   * @param buffer stores the data received from server
+   * @param result HTTP Operation Result
+   * @param bufferOffset Position where data returned by server is saved in buffer
+   * @throws AbfsRestOperationException
+   */
+  private void verifyCheckSumForRead(final byte[] buffer,
+      final AbfsHttpOperation result, final int bufferOffset)
+      throws AbfsRestOperationException {
+    // Number of bytes returned by server could be less than or equal to what
+    // caller requests. In case it is less, extra bytes will be initialized to 0
+    // Server returned MD5 Hash will be computed on what server returned.
+    // We need to get exact data that server returned and compute its md5 hash
+    // Computed hash should be equal to what server returned
+    int numberOfBytesRead = (int)result.getBytesReceived();
+    if (numberOfBytesRead == 0) {
+      return;
+    }
+    byte[] dataRead = new byte[numberOfBytesRead];
+
+    if (bufferOffset == 0 && numberOfBytesRead == buffer.length) {
+      dataRead = buffer;
+    } else {
+      System.arraycopy(buffer, bufferOffset, dataRead, 0, numberOfBytesRead);
+    }
+
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);

Review Comment:
   ok
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773392#comment-17773392 ]

ASF GitHub Bot commented on HADOOP-18910:
-

steveloughran commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1350485508

##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##
@@ -1412,6 +1444,102 @@ private void appendIfNotEmpty(StringBuilder sb, String regEx,
       }
     }
+  /**
+   * Add MD5 hash as request header to the append request
+   * @param requestHeaders
+   * @param reqParams
+   * @param buffer
+   */
+  private void addCheckSumHeaderForWrite(List<AbfsHttpHeader> requestHeaders,
+      final AppendRequestParameters reqParams, final byte[] buffer)
+      throws AbfsRestOperationException {
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);
+      byte[] dataToBeWritten = new byte[reqParams.getLength()];
+
+      if (reqParams.getoffset() == 0 && reqParams.getLength() == buffer.length) {
+        dataToBeWritten = buffer;
+      } else {
+        System.arraycopy(buffer, reqParams.getoffset(), dataToBeWritten, 0,
+            reqParams.getLength());
+      }
+
+      byte[] md5Bytes = md5Digest.digest(dataToBeWritten);
+      String md5Hash = Base64.getEncoder().encodeToString(md5Bytes);
+      requestHeaders.add(new AbfsHttpHeader(CONTENT_MD5, md5Hash));
+    } catch (NoSuchAlgorithmException ex) {
+      throw new AbfsRuntimeException(ex);
+    }
+  }
+
+  /**
+   * To verify the checksum information received from server for the data read
+   * @param buffer stores the data received from server
+   * @param result HTTP Operation Result
+   * @param bufferOffset Position where data returned by server is saved in buffer
+   * @throws AbfsRestOperationException
+   */
+  private void verifyCheckSumForRead(final byte[] buffer,
+      final AbfsHttpOperation result, final int bufferOffset)
+      throws AbfsRestOperationException {
+    // Number of bytes returned by server could be less than or equal to what
+    // caller requests. In case it is less, extra bytes will be initialized to 0
+    // Server returned MD5 Hash will be computed on what server returned.
+    // We need to get exact data that server returned and compute its md5 hash
+    // Computed hash should be equal to what server returned
+    int numberOfBytesRead = (int)result.getBytesReceived();
+    if (numberOfBytesRead == 0) {
+      return;
+    }
+    byte[] dataRead = new byte[numberOfBytesRead];
+
+    if (bufferOffset == 0 && numberOfBytesRead == buffer.length) {
+      dataRead = buffer;
+    } else {
+      System.arraycopy(buffer, bufferOffset, dataRead, 0, numberOfBytesRead);
+    }
+
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);

Review Comment:
   if its tested its fine.
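A round trip is a quick way to check the append-side idea:

1. Hash the slice that would be sent.
2. Recompute the hash over the bytes as received.

The hypothetical `Md5RoundTripCheck` below is a sketch rather than an ABFS test. It shows that an intact payload reproduces the `Content-MD5` value while a single flipped bit does not.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical self-check: one flipped bit in transit changes the MD5 value. */
final class Md5RoundTripCheck {

  private Md5RoundTripCheck() {
  }

  /** Base64-encoded MD5 of buf[off, off + len). */
  static String md5Base64(byte[] buf, int off, int len) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      md5.update(buf, off, len);
      return Base64.getEncoder().encodeToString(md5.digest());
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    byte[] block = new byte[64];
    for (int i = 0; i < block.length; i++) {
      block[i] = (byte) i;
    }
    int offset = 8;
    int length = 32;
    // Client side: hash the slice being appended and send it as the checksum header.
    String sentHeader = md5Base64(block, offset, length);

    // Receiving side, intact payload: the recomputed hash matches.
    byte[] received = Arrays.copyOfRange(block, offset, offset + length);
    System.out.println(sentHeader.equals(md5Base64(received, 0, received.length))); // true

    // Same payload with one bit flipped in transit: the hash no longer matches.
    received[5] ^= 0x01;
    System.out.println(sentHeader.equals(md5Base64(received, 0, received.length))); // false
  }
}
```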
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773390#comment-17773390 ]

ASF GitHub Bot commented on HADOOP-18910:
-

steveloughran commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1350484158

##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##
@@ -861,6 +875,12 @@ private boolean checkUserError(int responseStatusCode) {
         && responseStatusCode < HttpURLConnection.HTTP_INTERNAL_ERROR);
   }
+  private boolean isMd5ChecksumError(final AzureBlobFileSystemException e) {

Review Comment:
   static? just because its easier to test standalone. if you don't need that, don't worry
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770104#comment-17770104 ]

ASF GitHub Bot commented on HADOOP-18910:
-

hadoop-yetus commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1739474822

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 56s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 46m 55s | | trunk passed |
| +1 :green_heart: | compile | 0m 40s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | compile | 0m 35s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 31s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 40s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 39s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 33s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 6s | | trunk passed |
| +1 :green_heart: | shadedclient | 35m 9s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 35m 30s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 29s | | the patch passed |
| +1 :green_heart: | compile | 0m 30s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javac | 0m 30s | | the patch passed |
| +1 :green_heart: | compile | 0m 28s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 28s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 19s | [/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/9/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt) | hadoop-tools/hadoop-azure: The patch generated 4 new + 7 unchanged - 0 fixed = 11 total (was 7) |
| +1 :green_heart: | mvnsite | 0m 30s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 26s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 24s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 3s | | the patch passed |
| +1 :green_heart: | shadedclient | 34m 39s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 2m 3s | | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 40s | | The patch does not generate ASF License warnings. |
| | | 133m 35s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/9/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6069 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux 27ac266ec6cd 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / ed0ab72ce5153442074897eb7c5df50720d8476b |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/j
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770094#comment-17770094 ]

ASF GitHub Bot commented on HADOOP-18910:
-

hadoop-yetus commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1739446892

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 52s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 1s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
|||| _ trunk Compile Tests _ |
| -1 :x: | mvninstall | 45m 52s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/8/artifact/out/branch-mvninstall-root.txt) | root in trunk failed. |
| +1 :green_heart: | compile | 0m 38s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | compile | 0m 36s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 33s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 41s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 39s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 34s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 5s | | trunk passed |
| +1 :green_heart: | shadedclient | 34m 10s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 34m 32s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 29s | | the patch passed |
| +1 :green_heart: | compile | 0m 31s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javac | 0m 31s | | the patch passed |
| +1 :green_heart: | compile | 0m 27s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 27s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 20s | [/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/8/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt) | hadoop-tools/hadoop-azure: The patch generated 6 new + 7 unchanged - 0 fixed = 13 total (was 7) |
| +1 :green_heart: | mvnsite | 0m 30s | | the patch passed |
| -1 :x: | javadoc | 0m 28s | [/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/8/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04.txt) | hadoop-tools_hadoop-azure-jdkUbuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 generated 1 new + 15 unchanged - 0 fixed = 16 total (was 15) |
| -1 :x: | javadoc | 0m 27s | [/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_382-8u382-ga-1~20.04.1-b05.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/8/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_382-8u382-ga-1~20.04.1-b05.txt) | hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_382-8u382-ga-1~20.04.1-b05 with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 generated 1 new + 15 unchanged - 0 fixed = 16 total (was 15) |
| +1 :green_heart: | spotbugs | 1m 5s | | the patch passed |
| +1 :green_heart: | shadedclient | 34m 23s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 2m 0s | | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 36s | | The patch does not generate ASF License warnings. |
| | | 131m 12s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAP
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770047#comment-17770047 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on PR #6069: URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1739110837 @steveloughran We found a way to avoid doing explicit array copies in MD5 computation in case of non-zero offsets. MessageDigest class of java allows us to do so using update and digest mechanism. MessageDigest.update() has a version where we can specify buffer, offset and length and after setting this, if we call digest function, it will compute MD5Hash of the data updated. This also points to the fact that same object of MessageDigest class cannot be shared among different appends. Still, we think it's better to have MD5 computation in parallel in client.append() only instead of doing it sequentially in ABFSOutputStream while creating Datablocks. > ABFS: Adding Support for MD5 Hash based integrity verification of the request > content during transport > --- > > Key: HADOOP-18910 > URL: https://issues.apache.org/jira/browse/HADOOP-18910 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Reporter: Anuj Modi >Assignee: Anuj Modi >Priority: Major > Labels: pull-request-available > > Azure Storage Supports Content-MD5 Request Headers in Both Read and Append > APIs. > Read: [Path - Read - REST API (Azure Storage Services) | Microsoft > Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read] > Append: [Path - Update - REST API (Azure Storage Services) | Microsoft > Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/update] > This change is to make client-side changes to support them. In Read request, > we will send the appropriate header in response to which server will return > the MD5 Hash of the data it sends back. 
On Client we will tally this with the > MD5 hash computed from the data received. > In Append request, we will compute the MD5 Hash of the data that we are > sending to the server and specify that in appropriate header. Server on > finding that header will tally this with the MD5 hash it will compute on the > data received. > This whole Checksum Validation Support is guarded behind a config, Config is > by default disabled because with the use of "https" integrity of data is > preserved anyways. This is introduced as an additional data integrity check > which will have a performance impact as well. > Users can decide if they want to enable this or not by setting the following > config to *"true"* or *"false"* respectively. *Config: > "fs.azure.enable.checksum.validation"* -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
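The update()/digest() approach described in the comment above can be sketched with the JDK alone. This is a minimal illustration, not the code from PR #6069. The class and method names are hypothetical. It hashes a slice of a buffer without copying it and includes a copy-based version for comparison.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical helper: Base64-encoded MD5 of buffer[offset, offset + length). */
final class Md5SliceSketch {
  private Md5SliceSketch() {
  }

  private static MessageDigest newMd5() {
    try {
      // A new instance per call: MessageDigest is stateful and not thread-safe,
      // so one instance must not be shared by concurrent appends.
      return MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      // Unexpected on a standard JDK; rethrow unchecked so callers need not declare it.
      throw new IllegalStateException(e);
    }
  }

  /** Hashes only the requested slice; no intermediate array is allocated. */
  static String md5OfSlice(byte[] buffer, int offset, int length) {
    MessageDigest md5 = newMd5();
    md5.update(buffer, offset, length);
    return Base64.getEncoder().encodeToString(md5.digest());
  }

  /** Copy-based equivalent, shown only for comparison. */
  static String md5OfCopiedSlice(byte[] buffer, int offset, int length) {
    byte[] slice = Arrays.copyOfRange(buffer, offset, offset + length);
    return Base64.getEncoder().encodeToString(newMd5().digest(slice));
  }
}
```

Calling md5OfSlice on bytes 2 to 4 of "xxabcyy" gives the same value as hashing "abc" directly. That equivalence is the property the comment relies on for non-zero offsets.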
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770020#comment-17770020 ] ASF GitHub Bot commented on HADOOP-18910: - hadoop-yetus commented on PR #6069: URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1738998559

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 53s | | Docker mode activated. |
| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 1s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 44m 44s | | trunk passed |
| +1 :green_heart: | compile | 0m 43s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | compile | 0m 38s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 36s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 44s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 43s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 38s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 8s | | trunk passed |
| +1 :green_heart: | shadedclient | 34m 21s | | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 34m 44s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 30s | | the patch passed |
| +1 :green_heart: | compile | 0m 30s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javac | 0m 30s | | the patch passed |
| +1 :green_heart: | compile | 0m 28s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 28s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 21s | [/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/7/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt) | hadoop-tools/hadoop-azure: The patch generated 4 new + 7 unchanged - 0 fixed = 11 total (was 7) |
| +1 :green_heart: | mvnsite | 0m 31s | | the patch passed |
| -1 :x: | javadoc | 0m 27s | [/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/7/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04.txt) | hadoop-tools_hadoop-azure-jdkUbuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 generated 1 new + 15 unchanged - 0 fixed = 16 total (was 15) |
| -1 :x: | javadoc | 0m 27s | [/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_382-8u382-ga-1~20.04.1-b05.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/7/artifact/out/results-javadoc-javadoc-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_382-8u382-ga-1~20.04.1-b05.txt) | hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_382-8u382-ga-1~20.04.1-b05 with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 generated 1 new + 15 unchanged - 0 fixed = 16 total (was 15) |
| +1 :green_heart: | spotbugs | 1m 3s | | the patch passed |
| +1 :green_heart: | shadedclient | 34m 27s | | patch has no errors when building and testing our client artifacts. |
| _ Other Tests _ |
| +1 :green_heart: | unit | 2m 3s | | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 41s | | The patch does not generate ASF License warnings. |
| | | 131m 7s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/7/artifact/out/Dockerfile |
| GITHUB PR
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769980#comment-17769980 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1339883668 ## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ## @@ -875,10 +873,15 @@ private boolean checkUserError(int responseStatusCode) { && responseStatusCode < HttpURLConnection.HTTP_INTERNAL_ERROR); } - private boolean isMd5ChecksumError(final AzureBlobFileSystemException e) { + /** + * To check if the failure exception returned by server is due to MD5 Mismatch + * @param e Exception returned by AbfsRestOperation + * @return boolean whether exception is due to MD5Mismatch or not + */ + protected boolean isMd5ChecksumError(final AzureBlobFileSystemException e) { return ((AbfsRestOperationException) e).getStatusCode() == HttpURLConnection.HTTP_BAD_REQUEST -&& ((AbfsRestOperationException) e).getErrorMessage().contains(MD5_ERROR); +&& e.getMessage().contains(MD5_ERROR_SERVER_MESSAGE); Review Comment: There are two different messages: e.getErrorMessage() requires type casting, whereas e.getMessage() does not. e.getMessage() makes more sense to use, as it contains the whole message returned by the server.
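The review above weighs a cast to reach getErrorMessage() against using getMessage(), which needs no cast. A self-contained sketch helps show the difference. RestOperationException is a hypothetical stand-in for AbfsRestOperationException. The marker string is an assumption based on the prefix of the MD5_ERROR constant quoted elsewhere in this thread, and the real server message may differ. The instanceof guard is an addition in this sketch. It prevents a ClassCastException when another exception type is passed in.

```java
import java.net.HttpURLConnection;

/** Hypothetical stand-in for AbfsRestOperationException. */
class RestOperationException extends Exception {
  private final int statusCode;

  RestOperationException(int statusCode, String message) {
    super(message);
    this.statusCode = statusCode;
  }

  int getStatusCode() {
    return statusCode;
  }
}

final class Md5ErrorCheckSketch {
  /** Assumed marker text; not the verbatim server error message. */
  static final String MD5_MISMATCH_MARKER = "The MD5 value specified in the request";

  private Md5ErrorCheckSketch() {
  }

  /** True only for an HTTP 400 whose message mentions the MD5 mismatch. */
  static boolean isMd5ChecksumError(Exception e) {
    if (!(e instanceof RestOperationException)) {
      return false; // avoid a ClassCastException for other exception types
    }
    // The status code lives on the subclass, so this part still needs a cast.
    int status = ((RestOperationException) e).getStatusCode();
    // getMessage() is defined on Throwable, so no cast is needed here.
    String message = e.getMessage();
    return status == HttpURLConnection.HTTP_BAD_REQUEST
        && message != null
        && message.contains(MD5_MISMATCH_MARKER);
  }
}
```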
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769973#comment-17769973 ] ASF GitHub Bot commented on HADOOP-18910: - anmolanmol1234 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1339863566 ## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ## @@ -875,10 +873,15 @@ private boolean checkUserError(int responseStatusCode) { && responseStatusCode < HttpURLConnection.HTTP_INTERNAL_ERROR); } - private boolean isMd5ChecksumError(final AzureBlobFileSystemException e) { + /** + * To check if the failure exception returned by server is due to MD5 Mismatch + * @param e Exception returned by AbfsRestOperation + * @return boolean whether exception is due to MD5Mismatch or not + */ + protected boolean isMd5ChecksumError(final AzureBlobFileSystemException e) { return ((AbfsRestOperationException) e).getStatusCode() == HttpURLConnection.HTTP_BAD_REQUEST -&& ((AbfsRestOperationException) e).getErrorMessage().contains(MD5_ERROR); +&& e.getMessage().contains(MD5_ERROR_SERVER_MESSAGE); Review Comment: Does it not require type casting here?
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769970#comment-17769970 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on PR #6069: URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1738835515 > I've actually been thinking of adding a similar option to the s3a client for third party non-https support. > > In my head though, the generation of the upload MD5 hash could be done as data is written to the buffer/file in the org.apache.hadoop.fs.store.DataBlocks class, in the `org.apache.hadoop.fs.store.DataBlocks.DataBlock.write(byte[] buffer, int offset, int length)` call > > * the data comes in as an array: no need to reload/copy > * it's often intermingled with other work, so no end-of-block delays. > * if the application is mixing compute with write, it may not add any delay. > > I would suggest you add it there as it means I'd switch to that class for the s3a output stream and pick up your work too: no duplicate code and better test coverage. > > I guess the issue here is that abfs client appends in individual post requests, there's not enough of a match between DataBlock size and the http requests, except for very small files. Correct? > > Propose you use a DurationTracker to actually count time spent processing md5 headers; can add a new IOStatistic to the store for this. This allows for the cost of enabling to be measured/reported. Hi Steve, thanks for the review. Regarding your question of whether the MD5 computation should be moved to DataBlocks.write(): 1. I don't think it would reduce the cost of the array copy. Today in production code, when the output stream calls append, it always passes offset 0 and the length of the data block as the length. So there is effectively a one-to-one mapping of data blocks to append calls, and the check in client.append() avoids the array copy when the offset is 0. The array copy is there only in case we end up sending a non-zero offset at some point in the future. If that happens, computing the MD5 hash in DataBlocks.write() would be wrong, because the whole data block would not be appended in one request. 2. Also, DataBlocks.write() is called in the main thread, and computing the MD5 hash of all the data there would be more expensive than the current approach, where it is computed in parallel by the worker threads doing the appends. Let me know your thoughts on this.
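Steve's suggestion of hashing inside DataBlock.write(byte[] buffer, int offset, int length) amounts to feeding each write into a running digest. The sketch below is hypothetical. It is not the org.apache.hadoop.fs.store.DataBlocks API, and it uses a ByteArrayOutputStream as the block's storage. It also makes the limitation raised in the reply concrete. The digest covers everything written to the block, so it is only valid as a Content-MD5 header if the whole block is sent in a single append.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical in-memory block that hashes data as it is written. */
final class DigestingBlock {
  private final ByteArrayOutputStream data = new ByteArrayOutputStream();
  private final MessageDigest md5;

  DigestingBlock() {
    try {
      md5 = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Buffers the bytes and folds the same range into the running MD5. */
  void write(byte[] buffer, int offset, int length) {
    data.write(buffer, offset, length);
    md5.update(buffer, offset, length);
  }

  /** Bytes that would be uploaded for this block. */
  byte[] toByteArray() {
    return data.toByteArray();
  }

  /**
   * Finishes the hash of everything written so far. digest() resets the
   * MessageDigest, so call this once, when the block is complete.
   */
  String contentMd5() {
    return Base64.getEncoder().encodeToString(md5.digest());
  }
}
```

Because the hash accumulates across writes, it matches a one-shot hash of the block's contents. It would not match a request that uploads only part of the block, which is the case Anuj describes for non-zero offsets.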
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769964#comment-17769964 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1339835737 ## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ## @@ -1412,6 +1444,102 @@ private void appendIfNotEmpty(StringBuilder sb, String regEx, } } + /** + * Add MD5 hash as request header to the append request + * @param requestHeaders + * @param reqParams + * @param buffer + */ + private void addCheckSumHeaderForWrite(List requestHeaders, + final AppendRequestParameters reqParams, final byte[] buffer) + throws AbfsRestOperationException { +try { + MessageDigest md5Digest = MessageDigest.getInstance(MD5); + byte[] dataToBeWritten = new byte[reqParams.getLength()]; + + if (reqParams.getoffset() == 0 && reqParams.getLength() == buffer.length) { +dataToBeWritten = buffer; + } else { +System.arraycopy(buffer, reqParams.getoffset(), dataToBeWritten, 0, +reqParams.getLength()); + } + + byte[] md5Bytes = md5Digest.digest(dataToBeWritten); + String md5Hash = Base64.getEncoder().encodeToString(md5Bytes); + requestHeaders.add(new AbfsHttpHeader(CONTENT_MD5, md5Hash)); +} catch (NoSuchAlgorithmException ex) { + throw new AbfsRuntimeException(ex); +} + } + + /** + * To verify the checksum information received from server for the data read + * @param buffer stores the data received from server + * @param result HTTP Operation Result + * @param bufferOffset Position where data returned by server is saved in buffer + * @throws AbfsRestOperationException + */ + private void verifyCheckSumForRead(final byte[] buffer, + final AbfsHttpOperation result, final int bufferOffset) + throws AbfsRestOperationException { +// Number of bytes returned by server could be less than or equal to what +// caller requests. 
In case it is less, extra bytes will be initialized to 0 +// Server returned MD5 Hash will be computed on what server returned. +// We need to get exact data that server returned and compute its md5 hash +// Computed hash should be equal to what server returned +int numberOfBytesRead = (int)result.getBytesReceived(); +if (numberOfBytesRead == 0) { + return; +} +byte[] dataRead = new byte[numberOfBytesRead]; + +if (bufferOffset == 0 && numberOfBytesRead == buffer.length) { + dataRead = buffer; +} else { + System.arraycopy(buffer, bufferOffset, dataRead, 0, numberOfBytesRead); +} + +try { + MessageDigest md5Digest = MessageDigest.getInstance(MD5); Review Comment: I have added unit tests for its usage and isolated it for code reusability. Can you please elaborate on whether we still need this to be static, and why?
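The read-side check in verifyCheckSumForRead() can be written without the intermediate dataRead array by using the update(buffer, offset, length) overload discussed above. This is a hedged sketch, not the PR's code. The method name is hypothetical, and it returns a boolean instead of throwing an ABFS exception. It assumes the server's hash arrives as a Base64 string, as with the Content-MD5 header. A missing hash is treated as a mismatch here, which is a policy choice made only for this sketch.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class ReadChecksumSketch {
  private ReadChecksumSketch() {
  }

  /**
   * Checks buffer[bufferOffset, bufferOffset + bytesReceived) against the
   * Base64-encoded MD5 reported by the server.
   */
  static boolean matchesServerMd5(byte[] buffer, int bufferOffset,
      int bytesReceived, String serverMd5Base64) {
    if (bytesReceived == 0) {
      return true; // nothing was read, so there is nothing to verify
    }
    if (serverMd5Base64 == null) {
      return false; // sketch policy: a missing hash counts as a mismatch
    }
    MessageDigest md5;
    try {
      md5 = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
    // Hash only the bytes the server actually returned, which may be fewer
    // than the caller requested.
    md5.update(buffer, bufferOffset, bytesReceived);
    byte[] expected = Base64.getDecoder().decode(serverMd5Base64);
    return MessageDigest.isEqual(md5.digest(), expected);
  }
}
```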
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769965#comment-17769965 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1339816363 ## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ## @@ -861,6 +875,12 @@ private boolean checkUserError(int responseStatusCode) { && responseStatusCode < HttpURLConnection.HTTP_INTERNAL_ERROR); } + private boolean isMd5ChecksumError(final AzureBlobFileSystemException e) { Review Comment: Added a test to verify that this works and that we return proper exceptions to the caller in case of an MD5 mismatch. Can you please elaborate on why we need to make this static? Added javadoc.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769963#comment-17769963 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1339833668 ## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ## @@ -88,7 +91,8 @@ public class AbfsClient implements Closeable { public static final Logger LOG = LoggerFactory.getLogger(AbfsClient.class); public static final String HUNDRED_CONTINUE_USER_AGENT = SINGLE_WHITE_SPACE + HUNDRED_CONTINUE + SEMICOLON; - + public static final String MD5_ERROR = "The MD5 value specified in the request " Review Comment: Taken
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769962#comment-17769962 ] ASF GitHub Bot commented on HADOOP-18910: - anujmodi2021 commented on code in PR #6069: URL: https://github.com/apache/hadoop/pull/6069#discussion_r1339816363 ## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ## @@ -861,6 +875,12 @@ private boolean checkUserError(int responseStatusCode) { && responseStatusCode < HttpURLConnection.HTTP_INTERNAL_ERROR); } + private boolean isMd5ChecksumError(final AzureBlobFileSystemException e) { Review Comment: Added a test to verify that this works and that we return proper exceptions to the caller in case of an MD5 mismatch. Not sure if we still want to make this static.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769961#comment-17769961 ]

ASF GitHub Bot commented on HADOOP-18910:
-

anujmodi2021 commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1339814907

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ##
@@ -761,6 +764,11 @@ public AbfsRestOperation append(final String path, final byte[] buffer,
       requestHeaders.add(new AbfsHttpHeader(USER_AGENT, userAgentRetry));
     }
+    // Add MD5 Hash of request content as request header if feature is enabled
+    if (isChecksumValidationEnabled()) {

Review Comment:
   Yes

> ABFS: Adding Support for MD5 Hash based integrity verification of the request
> content during transport
> ---
>
> Key: HADOOP-18910
> URL: https://issues.apache.org/jira/browse/HADOOP-18910
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Reporter: Anuj Modi
> Assignee: Anuj Modi
> Priority: Major
> Labels: pull-request-available
>
> Azure Storage supports Content-MD5 request headers in both the Read and Append
> APIs.
> Read: [Path - Read - REST API (Azure Storage Services) | Microsoft Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read]
> Append: [Path - Update - REST API (Azure Storage Services) | Microsoft Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/update]
> This change adds the client-side support for them. In a Read request, we send
> the appropriate header, and the server responds with the MD5 hash of the data
> it sends back. The client compares this with the MD5 hash it computes over the
> data received.
> In an Append request, we compute the MD5 hash of the data being sent to the
> server and specify it in the appropriate header. On finding that header, the
> server compares it with the MD5 hash it computes over the data received.
> This checksum validation support is guarded behind a config that is disabled
> by default, because HTTPS already preserves data integrity in transit. It is
> an additional data integrity check and also has a performance impact.
> Users can enable or disable it by setting the following config to *"true"* or
> *"false"* respectively. *Config: "fs.azure.enable.checksum.validation"*

-- 
This message was sent by Atlassian Jira (v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
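The append side of the description can be sketched in a few lines. This sketch makes three assumptions. First, a plain `Map` stands in for the ABFS header list. Second, a boolean stands in for the value of `fs.azure.enable.checksum.validation`. Third, the class and method names are hypothetical, not the ABFS API. The header value is the Base64 encoding of the raw 16-byte MD5 digest, which is also what the patch produces with `Base64.getEncoder().encodeToString(...)`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: add Content-MD5 to an append only when validation is enabled. */
final class AppendChecksumSketch {

  static final String CONTENT_MD5 = "Content-MD5";

  private AppendChecksumSketch() {
  }

  /** Base64 of the raw MD5 digest, the format carried by the Content-MD5 header. */
  static String contentMd5(byte[] payload) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5").digest(payload);
      return Base64.getEncoder().encodeToString(digest);
    } catch (NoSuchAlgorithmException e) {
      // MD5 ships with the standard JDK providers; treat its absence as fatal.
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  /**
   * Builds the extra append headers. checksumValidationEnabled stands in for
   * the value of fs.azure.enable.checksum.validation (disabled by default).
   */
  static Map<String, String> appendHeaders(boolean checksumValidationEnabled,
      byte[] payload) {
    Map<String, String> headers = new LinkedHashMap<>();
    if (checksumValidationEnabled) {
      headers.put(CONTENT_MD5, contentMd5(payload));
    }
    return headers;
  }

  public static void main(String[] args) {
    byte[] payload = "abc".getBytes(StandardCharsets.US_ASCII);
    System.out.println(appendHeaders(true, payload));   // {Content-MD5=kAFQmDzST7DWlj99KOF/cg==}
    System.out.println(appendHeaders(false, payload));  // {}
  }
}
```

With the flag off, no header is added, so the request is the same as one sent before the feature existed. The hashing cost is paid only when validation is switched on.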
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769960#comment-17769960 ]

ASF GitHub Bot commented on HADOOP-18910:
-

anujmodi2021 commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1339814115

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ##
@@ -1412,6 +1444,102 @@ private void appendIfNotEmpty(StringBuilder sb, String regEx,
     }
   }
+  /**
+   * Add MD5 hash as request header to the append request
+   * @param requestHeaders
+   * @param reqParams
+   * @param buffer
+   */
+  private void addCheckSumHeaderForWrite(List requestHeaders,
+      final AppendRequestParameters reqParams, final byte[] buffer)
+      throws AbfsRestOperationException {
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);
+      byte[] dataToBeWritten = new byte[reqParams.getLength()];
+
+      if (reqParams.getoffset() == 0 && reqParams.getLength() == buffer.length) {
+        dataToBeWritten = buffer;
+      } else {
+        System.arraycopy(buffer, reqParams.getoffset(), dataToBeWritten, 0,
+            reqParams.getLength());
+      }
+
+      byte[] md5Bytes = md5Digest.digest(dataToBeWritten);
+      String md5Hash = Base64.getEncoder().encodeToString(md5Bytes);
+      requestHeaders.add(new AbfsHttpHeader(CONTENT_MD5, md5Hash));
+    } catch (NoSuchAlgorithmException ex) {
+      throw new AbfsRuntimeException(ex);
+    }
+  }
+
+  /**
+   * To verify the checksum information received from server for the data read
+   * @param buffer stores the data received from server
+   * @param result HTTP Operation Result
+   * @param bufferOffset Position where data returned by server is saved in buffer
+   * @throws AbfsRestOperationException
+   */
+  private void verifyCheckSumForRead(final byte[] buffer,
+      final AbfsHttpOperation result, final int bufferOffset)
+      throws AbfsRestOperationException {
+    // Number of bytes returned by server could be less than or equal to what
+    // caller requests. In case it is less, extra bytes will be initialized to 0
+    // Server returned MD5 Hash will be computed on what server returned.
+    // We need to get exact data that server returned and compute its md5 hash
+    // Computed hash should be equal to what server returned
+    int numberOfBytesRead = (int)result.getBytesReceived();
+    if (numberOfBytesRead == 0) {
+      return;
+    }
+    byte[] dataRead = new byte[numberOfBytesRead];
+
+    if (bufferOffset == 0 && numberOfBytesRead == buffer.length) {
+      dataRead = buffer;
+    } else {
+      System.arraycopy(buffer, bufferOffset, dataRead, 0, numberOfBytesRead);
+    }
+
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);

Review Comment:
   Makes sense. I will create a new function to compute the MD5 hash.
   I tried using the same md5Digest object for all appends spawned by a stream,
   but MessageDigest objects are not thread safe and should not be shared
   between parallel threads: https://github.com/pmd/pmd/issues/1862
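The read-side check in `verifyCheckSumForRead` can be sketched as follows. The sketch assumes the caller already has two values: the server's `Content-MD5` response value and the number of bytes actually received. The class and method names are hypothetical.

Only the received range is hashed, so the zero-filled tail of the caller's buffer after a short read does not affect the result. A fresh `MessageDigest` is created per call because instances are not thread safe. `update(buffer, offset, len)` hashes the range in place, so no copy is needed.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical sketch: verify a read against the server's Content-MD5 value. */
final class ReadChecksumSketch {

  private ReadChecksumSketch() {
  }

  /**
   * @param buffer caller's buffer; may be larger than the data returned
   * @param bufferOffset where the returned data starts in buffer
   * @param bytesReceived number of bytes the server actually returned
   * @param serverContentMd5 Base64 Content-MD5 value from the response
   * @return true if the digests match, or if there is nothing to verify
   */
  static boolean verifyRead(byte[] buffer, int bufferOffset, int bytesReceived,
      String serverContentMd5) {
    if (bytesReceived == 0) {
      return true;  // empty read: nothing to hash
    }
    MessageDigest md5;
    try {
      // New instance per call: MessageDigest is not thread safe.
      md5 = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
    // Hash only the bytes the server sent, directly from the buffer.
    md5.update(buffer, bufferOffset, bytesReceived);
    byte[] expected = Base64.getDecoder().decode(serverContentMd5);
    return MessageDigest.isEqual(md5.digest(), expected);
  }

  public static void main(String[] args) {
    byte[] buffer = new byte[16];                        // caller asked for 16 bytes
    byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
    System.arraycopy(data, 0, buffer, 4, data.length);   // server returned 3 bytes
    String serverMd5 = "kAFQmDzST7DWlj99KOF/cg==";       // MD5("abc")
    System.out.println(verifyRead(buffer, 4, 3, serverMd5));   // true
    System.out.println(verifyRead(buffer, 0, 16, serverMd5));  // false
  }
}
```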
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17769521#comment-17769521 ]

ASF GitHub Bot commented on HADOOP-18910:
-

steveloughran commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1338307275

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ##
@@ -88,7 +91,8 @@ public class AbfsClient implements Closeable {
   public static final Logger LOG = LoggerFactory.getLogger(AbfsClient.class);
   public static final String HUNDRED_CONTINUE_USER_AGENT = SINGLE_WHITE_SPACE + HUNDRED_CONTINUE + SEMICOLON;
-
+  public static final String MD5_ERROR = "The MD5 value specified in the request "

Review Comment:
   This should be placed in AbfsConstants so it's more visible, with javadocs
   explaining its purpose and that it comes with a 400.

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ##
@@ -1412,6 +1444,102 @@ private void appendIfNotEmpty(StringBuilder sb, String regEx,
     }
   }
+  /**
+   * Add MD5 hash as request header to the append request
+   * @param requestHeaders
+   * @param reqParams
+   * @param buffer
+   */
+  private void addCheckSumHeaderForWrite(List requestHeaders,
+      final AppendRequestParameters reqParams, final byte[] buffer)
+      throws AbfsRestOperationException {
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);
+      byte[] dataToBeWritten = new byte[reqParams.getLength()];
+
+      if (reqParams.getoffset() == 0 && reqParams.getLength() == buffer.length) {
+        dataToBeWritten = buffer;
+      } else {
+        System.arraycopy(buffer, reqParams.getoffset(), dataToBeWritten, 0,
+            reqParams.getLength());
+      }
+
+      byte[] md5Bytes = md5Digest.digest(dataToBeWritten);
+      String md5Hash = Base64.getEncoder().encodeToString(md5Bytes);
+      requestHeaders.add(new AbfsHttpHeader(CONTENT_MD5, md5Hash));
+    } catch (NoSuchAlgorithmException ex) {
+      throw new AbfsRuntimeException(ex);
+    }
+  }
+
+  /**
+   * To verify the checksum information received from server for the data read
+   * @param buffer stores the data received from server
+   * @param result HTTP Operation Result
+   * @param bufferOffset Position where data returned by server is saved in buffer
+   * @throws AbfsRestOperationException
+   */
+  private void verifyCheckSumForRead(final byte[] buffer,
+      final AbfsHttpOperation result, final int bufferOffset)
+      throws AbfsRestOperationException {
+    // Number of bytes returned by server could be less than or equal to what
+    // caller requests. In case it is less, extra bytes will be initialized to 0
+    // Server returned MD5 Hash will be computed on what server returned.
+    // We need to get exact data that server returned and compute its md5 hash
+    // Computed hash should be equal to what server returned
+    int numberOfBytesRead = (int)result.getBytesReceived();
+    if (numberOfBytesRead == 0) {
+      return;
+    }
+    byte[] dataRead = new byte[numberOfBytesRead];
+
+    if (bufferOffset == 0 && numberOfBytesRead == buffer.length) {
+      dataRead = buffer;
+    } else {
+      System.arraycopy(buffer, bufferOffset, dataRead, 0, numberOfBytesRead);
+    }
+
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);

Review Comment:
   This creation/exception handling is repeated a lot. Pull it out; maybe even
   create one per input stream.

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ##
@@ -861,6 +875,12 @@ private boolean checkUserError(int responseStatusCode) {
         && responseStatusCode < HttpURLConnection.HTTP_INTERNAL_ERROR);
   }
+  private boolean isMd5ChecksumError(final AzureBlobFileSystemException e) {

Review Comment:
   If you make this static and package-private, you can add a unit test to
   verify that it is handled. Also add javadocs.

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ##
@@ -1412,6 +1444,102 @@ private void appendIfNotEmpty(StringBuilder sb, String regEx,
     }
   }
+  /**
+   * Add MD5 hash as request header to the append request
+   * @param requestHeaders
+   * @param reqParams
+   * @param buffer
+   */
+  private void addCheckSumHeaderForWrite(List requestHeaders,
+      final AppendRequestParameters reqParams, final byte[] buffer)
+      throws AbfsRestOperationException {
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);
+      byte[] dataToBeWritten = new byte[reqParams.getLength()];
+
+      if (reqParams.getoffset() == 0 && reqParams.getLength() == buffer.length) {
+        dataToBeWritten = buffer;
+      } else {
+        System.arraycopy(bu
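One way to act on the "pull it out" suggestion is a single helper that owns the `getInstance` call and its exception handling. Because `MessageDigest` instances must not be shared between threads, this sketch keeps one instance per thread in a `ThreadLocal`. It is a hypothetical illustration, not what the patch does, and the names are made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical helper centralising MD5 creation and error handling. */
final class Md5Helper {

  // One digest per thread: MessageDigest is stateful and not thread safe,
  // so parallel appends must never share an instance.
  private static final ThreadLocal<MessageDigest> MD5 =
      ThreadLocal.withInitial(Md5Helper::newMd5);

  private Md5Helper() {
  }

  private static MessageDigest newMd5() {
    try {
      return MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  /** Base64-encoded MD5 of data[offset, offset + length). */
  static String computeMd5Base64(byte[] data, int offset, int length) {
    MessageDigest md5 = MD5.get();
    md5.reset();                       // drop state left by a call that failed part-way
    md5.update(data, offset, length);
    return Base64.getEncoder().encodeToString(md5.digest());  // digest() also resets
  }

  public static void main(String[] args) throws InterruptedException {
    byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
    Runnable task = () -> System.out.println(computeMd5Base64(data, 0, data.length));
    Thread first = new Thread(task);
    Thread second = new Thread(task);
    first.start();
    second.start();
    first.join();
    second.join();
  }
}
```

Calling `MessageDigest.getInstance` on every call is the simpler option. The `ThreadLocal` variant avoids repeated provider lookups at the cost of keeping one digest alive per thread. A per-stream instance would only be safe if that stream never hashed from more than one thread at a time.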
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767654#comment-17767654 ]

ASF GitHub Bot commented on HADOOP-18910:
-

hadoop-yetus commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1730009129

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 53s |  | Docker mode activated. |
|  |  |  |  | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s |  | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s |  | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s |  | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 1s |  | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 4 new or modified test files. |
|  |  |  |  | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 44m 55s |  | trunk passed |
| +1 :green_heart: | compile | 0m 43s |  | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | compile | 0m 39s |  | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 36s |  | trunk passed |
| +1 :green_heart: | mvnsite | 0m 43s |  | trunk passed |
| +1 :green_heart: | javadoc | 0m 42s |  | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 39s |  | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 10s |  | trunk passed |
| +1 :green_heart: | shadedclient | 34m 5s |  | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 34m 29s |  | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
|  |  |  |  | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 31s |  | the patch passed |
| +1 :green_heart: | compile | 0m 31s |  | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javac | 0m 31s |  | the patch passed |
| +1 :green_heart: | compile | 0m 28s |  | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 28s |  | the patch passed |
| +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 21s | [/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/6/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt) | hadoop-tools/hadoop-azure: The patch generated 7 new + 7 unchanged - 0 fixed = 14 total (was 7) |
| +1 :green_heart: | mvnsite | 0m 31s |  | the patch passed |
| +1 :green_heart: | javadoc | 0m 27s |  | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 26s |  | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 3s |  | the patch passed |
| +1 :green_heart: | shadedclient | 33m 34s |  | patch has no errors when building and testing our client artifacts. |
|  |  |  |  | _ Other Tests _ |
| +1 :green_heart: | unit | 2m 2s |  | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 41s |  | The patch does not generate ASF License warnings. |
|  |  | 130m 54s |  |  |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/6/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6069 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux 0e78fe0da613 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / c09259da1bc3062f938ee1cf023974f5a32d02d6 |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/j
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767507#comment-17767507 ]

ASF GitHub Bot commented on HADOOP-18910:
-

hadoop-yetus commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1729332375

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 52s |  | Docker mode activated. |
|  |  |  |  | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s |  | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s |  | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s |  | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 0s |  | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 4 new or modified test files. |
|  |  |  |  | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 43m 53s |  | trunk passed |
| +1 :green_heart: | compile | 0m 41s |  | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | compile | 0m 38s |  | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 36s |  | trunk passed |
| +1 :green_heart: | mvnsite | 0m 43s |  | trunk passed |
| +1 :green_heart: | javadoc | 0m 43s |  | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 38s |  | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 10s |  | trunk passed |
| +1 :green_heart: | shadedclient | 33m 56s |  | branch has no errors when building and testing our client artifacts. |
| -0 :warning: | patch | 34m 19s |  | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
|  |  |  |  | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 30s |  | the patch passed |
| +1 :green_heart: | compile | 0m 31s |  | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javac | 0m 31s |  | the patch passed |
| +1 :green_heart: | compile | 0m 28s |  | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 28s |  | the patch passed |
| +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 22s | [/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/5/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt) | hadoop-tools/hadoop-azure: The patch generated 4 new + 7 unchanged - 0 fixed = 11 total (was 7) |
| +1 :green_heart: | mvnsite | 0m 32s |  | the patch passed |
| +1 :green_heart: | javadoc | 0m 29s |  | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 27s |  | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 4s |  | the patch passed |
| +1 :green_heart: | shadedclient | 33m 40s |  | patch has no errors when building and testing our client artifacts. |
|  |  |  |  | _ Other Tests _ |
| +1 :green_heart: | unit | 2m 5s |  | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 39s |  | The patch does not generate ASF License warnings. |
|  |  | 128m 55s |  |  |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/5/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6069 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux 9159b8f38b1e 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / d4220129ea33f3af713f7f2d16b8846ec5ba0940 |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/j
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767465#comment-17767465 ]

ASF GitHub Bot commented on HADOOP-18910:
-

anujmodi2021 commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1332694254

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ##
@@ -1412,6 +1432,91 @@ private void appendIfNotEmpty(StringBuilder sb, String regEx,
     }
   }
+  /**
+   * Add MD5 hash as request header to the append request
+   * @param requestHeaders
+   * @param reqParams
+   * @param buffer
+   */
+  private void addCheckSumHeaderForWrite(List requestHeaders,
+      final AppendRequestParameters reqParams, final byte[] buffer)
+      throws AbfsRestOperationException {
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);
+      byte[] dataToBeWritten = new byte[reqParams.getLength()];
+      System.arraycopy(buffer, reqParams.getoffset(), dataToBeWritten, 0, reqParams.getLength());

Review Comment:
   Makes sense. Taken.
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767463#comment-17767463 ]

ASF GitHub Bot commented on HADOOP-18910:
-

anujmodi2021 commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1332692105

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ##
@@ -761,6 +764,11 @@ public AbfsRestOperation append(final String path, final byte[] buffer,
       requestHeaders.add(new AbfsHttpHeader(USER_AGENT, userAgentRetry));
     }
+    // Add MD5 Hash of request content as request header if feature is enabled
+    if (isChecksumValidationEnabled()) {

Review Comment:
   Yes, the exception message returned by the server clearly states that the
   MD5 hash does not match. Example:

   Operation failed: "The MD5 value specified in the request did not match with the MD5 value calculated by the server.", 400, PUT, _URL_, Md5Mismatch, "The MD5 value specified in the request did not match with the MD5 value calculated by the server. RequestId: _rid_ Time:2023-09-21T08:33:25.6135970Z"

   But if an invalid MD5 hash is sent (one that does not represent any data),
   the server message will be:

   Operation failed: "The value for one of the HTTP headers is not in the correct format.", 400, PUT, _URL_, InvalidHeaderValue, "The value for one of the HTTP headers is not in the correct format. RequestId: _rId_ Time:2023-09-21T08:35:29.2777203Z"
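The review suggested making `isMd5ChecksumError` static and package-private. Following that, a unit-testable predicate might look like the sketch below. It is hypothetical in three ways:

- It takes the HTTP status and the storage error code as plain values instead of an `AzureBlobFileSystemException`.
- It matches the `Md5Mismatch` error code from the example above rather than comparing message text. That is a design choice, not what the patch does.
- A 400 with `InvalidHeaderValue` is deliberately not treated as a checksum mismatch.

```java
/** Hypothetical, unit-testable classification of a failed append. */
final class Md5ErrorClassifier {

  /** Status returned with a Content-MD5 mismatch (HTTP 400 Bad Request). */
  static final int HTTP_BAD_REQUEST = 400;

  /** Storage error code seen in the server response for a digest mismatch. */
  static final String MD5_MISMATCH = "Md5Mismatch";

  private Md5ErrorClassifier() {
  }

  /**
   * Package-private and static so a unit test can call it without a live store.
   *
   * @param statusCode HTTP status code of the failed request
   * @param errorCode storage error code from the response; may be null
   * @return true only for a 400 response whose error code is Md5Mismatch
   */
  static boolean isMd5ChecksumError(int statusCode, String errorCode) {
    return statusCode == HTTP_BAD_REQUEST && MD5_MISMATCH.equals(errorCode);
  }

  public static void main(String[] args) {
    System.out.println(isMd5ChecksumError(400, "Md5Mismatch"));         // true
    System.out.println(isMd5ChecksumError(400, "InvalidHeaderValue"));  // false
    System.out.println(isMd5ChecksumError(500, "Md5Mismatch"));         // false
  }
}
```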
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767462#comment-17767462 ]

ASF GitHub Bot commented on HADOOP-18910:
-

anujmodi2021 commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1332692105

## hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java: ##
@@ -761,6 +764,11 @@ public AbfsRestOperation append(final String path, final byte[] buffer,
       requestHeaders.add(new AbfsHttpHeader(USER_AGENT, userAgentRetry));
     }
+    // Add MD5 Hash of request content as request header if feature is enabled
+    if (isChecksumValidationEnabled()) {

Review Comment:
   Yes, the exception message returned by the server clearly states that the
   MD5 hash does not match. Example:

   Operation failed: "The MD5 value specified in the request did not match with the MD5 value calculated by the server.", 400, PUT,, Md5Mismatch, "The MD5 value specified in the request did not match with the MD5 value calculated by the server. RequestId: Time:2023-09-21T08:33:25.6135970Z"

   But if an invalid MD5 hash is sent (one that does not represent any data),
   the server message will be:

   Operation failed: "The value for one of the HTTP headers is not in the correct format.", 400, PUT, , InvalidHeaderValue, "The value for one of the HTTP headers is not in the correct format. RequestId: Time:2023-09-21T08:35:29.2777203Z"
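The second error shows the service rejecting a `Content-MD5` value with `InvalidHeaderValue`, presumably because the value cannot be parsed as an MD5 digest at all. A client-side sanity check could catch such a value before the request is sent. The sketch assumes a well-formed value is standard Base64 that decodes to exactly 16 bytes, the MD5 digest length. The check and its name are hypothetical and not part of the patch.

```java
import java.util.Base64;

/** Hypothetical check that a Content-MD5 value encodes a 16-byte MD5 digest. */
final class ContentMd5Format {

  private static final int MD5_DIGEST_BYTES = 16;

  private ContentMd5Format() {
  }

  static boolean isWellFormed(String headerValue) {
    if (headerValue == null) {
      return false;
    }
    try {
      return Base64.getDecoder().decode(headerValue).length == MD5_DIGEST_BYTES;
    } catch (IllegalArgumentException e) {
      return false;  // not valid Base64
    }
  }

  public static void main(String[] args) {
    System.out.println(isWellFormed("kAFQmDzST7DWlj99KOF/cg=="));  // true: MD5("abc")
    System.out.println(isWellFormed("AAAA"));                      // false: decodes to 3 bytes
    System.out.println(isWellFormed("not base64!"));               // false: invalid characters
  }
}
```

A value that passes this check can still be rejected with `Md5Mismatch` if it describes different data. The two failure modes stay distinct.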
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767425#comment-17767425 ]

ASF GitHub Bot commented on HADOOP-18910:
-

hadoop-yetus commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1728992959

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 52s |  | Docker mode activated. |
|  |  |  |  | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s |  | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s |  | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s |  | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s |  | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s |  | The patch appears to include 4 new or modified test files. |
|  |  |  |  | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 45m 38s |  | trunk passed |
| +1 :green_heart: | compile | 0m 40s |  | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | compile | 0m 38s |  | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | checkstyle | 0m 33s |  | trunk passed |
| +1 :green_heart: | mvnsite | 0m 42s |  | trunk passed |
| +1 :green_heart: | javadoc | 0m 40s |  | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 35s |  | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 11s |  | trunk passed |
| +1 :green_heart: | shadedclient | 36m 55s |  | branch has no errors when building and testing our client artifacts. |
|  |  |  |  | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 29s |  | the patch passed |
| +1 :green_heart: | compile | 0m 32s |  | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javac | 0m 32s |  | the patch passed |
| +1 :green_heart: | compile | 0m 27s |  | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | javac | 0m 27s |  | the patch passed |
| +1 :green_heart: | blanks | 0m 0s |  | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 0m 20s | [/results-checkstyle-hadoop-tools_hadoop-azure.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/4/artifact/out/results-checkstyle-hadoop-tools_hadoop-azure.txt) | hadoop-tools/hadoop-azure: The patch generated 5 new + 7 unchanged - 0 fixed = 12 total (was 7) |
| +1 :green_heart: | mvnsite | 0m 30s |  | the patch passed |
| +1 :green_heart: | javadoc | 0m 26s |  | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 26s |  | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| +1 :green_heart: | spotbugs | 1m 4s |  | the patch passed |
| +1 :green_heart: | shadedclient | 36m 55s |  | patch has no errors when building and testing our client artifacts. |
|  |  |  |  | _ Other Tests _ |
| +1 :green_heart: | unit | 2m 6s |  | hadoop-azure in the patch passed. |
| +1 :green_heart: | asflicense | 0m 38s |  | The patch does not generate ASF License warnings. |
|  |  | 136m 27s |  |  |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6069 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 72e7f0a29a63 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 03a7453209b2daa3c5ecf60ab097e2933b1f50e1 |
| Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/4/testReport/ |
| Max. process+thread count | 691 (vs. ulimit of 5500) |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6069/4/console |
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767415#comment-17767415 ]

ASF GitHub Bot commented on HADOOP-18910:

snvijaya commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1332562001

##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##
@@ -1412,6 +1432,91 @@ private void appendIfNotEmpty(StringBuilder sb, String regEx,
       }
     }
+  /**
+   * Add MD5 hash as request header to the append request
+   * @param requestHeaders
+   * @param reqParams
+   * @param buffer
+   */
+  private void addCheckSumHeaderForWrite(List requestHeaders,
+      final AppendRequestParameters reqParams, final byte[] buffer)
+      throws AbfsRestOperationException {
+    try {
+      MessageDigest md5Digest = MessageDigest.getInstance(MD5);
+      byte[] dataToBeWritten = new byte[reqParams.getLength()];
+      System.arraycopy(buffer, reqParams.getoffset(), dataToBeWritten, 0, reqParams.getLength());

Review Comment:
   Arraycopies are costly. Given that the caller has the buffer offset and length, it might be possible to identify whether the whole buffer is relevant or only a subset. If possible, execute the array copy only when a subset is relevant, not when the whole buffer is. The same applies to read checksum validation.

> ABFS: Adding Support for MD5 Hash based integrity verification of the request
> content during transport
> ---
>
> Key: HADOOP-18910
> URL: https://issues.apache.org/jira/browse/HADOOP-18910
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Reporter: Anuj Modi
> Assignee: Anuj Modi
> Priority: Major
> Labels: pull-request-available
>
> Azure Storage Supports Content-MD5 Request Headers in Both Read and Append
> APIs.
> Read: [Path - Read - REST API (Azure Storage Services) | Microsoft
> Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read]
> Append: [Path - Update - REST API (Azure Storage Services) | Microsoft
> Learn|https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/update]
> This change is to make client-side changes to support them. In Read request,
> we will send the appropriate header in response to which server will return
> the MD5 Hash of the data it sends back. On Client we will tally this with the
> MD5 hash computed from the data received.
> In Append request, we will compute the MD5 Hash of the data that we are
> sending to the server and specify that in appropriate header. Server on
> finding that header will tally this with the MD5 hash it will compute on the
> data received.
> This whole Checksum Validation Support is guarded behind a config, Config is
> by default disabled because with the use of "https" integrity of data is
> preserved anyways. This is introduced as an additional data integrity check
> which will have a performance impact as well.
> Users can decide if they want to enable this or not by setting the following
> config to *"true"* or *"false"* respectively. *Config:
> "fs.azure.enable.checksum.validation"*

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
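Regarding the arraycopy concern in the review of `addCheckSumHeaderForWrite`: `java.security.MessageDigest` can hash a sub-range in place through `update(byte[] input, int offset, int len)`, which avoids allocating and copying the slice in every case, not only when the whole buffer is relevant. The following is a minimal sketch, not the patch's code; the class and method names are hypothetical, and it assumes the value sent in the `Content-MD5` header is the Base64-encoded MD5 digest, which is the usual format for that header.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;

/** Hypothetical helper: MD5 of buffer[offset, offset + length) computed without copying the slice. */
public final class Md5HeaderSketch {

  private Md5HeaderSketch() {
  }

  /** Returns the Base64-encoded MD5 of the sub-range (assumed Content-MD5 value format). */
  public static String computeMd5Base64(byte[] buffer, int offset, int length) {
    MessageDigest md5;
    try {
      md5 = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide MD5, so this should not happen.
      throw new IllegalStateException("MD5 not available", e);
    }
    // update(byte[], int, int) reads the range directly, so no System.arraycopy is needed.
    md5.update(buffer, offset, length);
    return Base64.getEncoder().encodeToString(md5.digest());
  }

  public static void main(String[] args) throws NoSuchAlgorithmException {
    byte[] buffer = "xxhello worldyy".getBytes(StandardCharsets.US_ASCII);
    // Hash only "hello world" (offset 2, length 11) in place.
    String inPlace = computeMd5Base64(buffer, 2, 11);
    // Same digest over an explicit copy of the slice, as the reviewed hunk does.
    byte[] copy = Arrays.copyOfRange(buffer, 2, 13);
    String copied = Base64.getEncoder().encodeToString(
        MessageDigest.getInstance("MD5").digest(copy));
    System.out.println(inPlace + " matches copy: " + inPlace.equals(copied));
  }
}
```

In the reviewed hunk, a call such as `computeMd5Base64(buffer, reqParams.getoffset(), reqParams.getLength())` could stand in for the copy-then-digest sequence; that wiring is illustrative only. The in-place and copied digests are identical, so the copy buys nothing for hashing.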
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767406#comment-17767406 ]

ASF GitHub Bot commented on HADOOP-18910:

snvijaya commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1332534729

##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/ConfigurationKeys.java:
##
@@ -241,6 +241,9 @@ public final class ConfigurationKeys {
   /** Add extra resilience to rename failures, at the expense of performance. */
   public static final String FS_AZURE_ABFS_RENAME_RESILIENCE = "fs.azure.enable.rename.resilience";
+  /** Add extra layer of verification of the integrity of the request content during transport. */
+  public static final String FS_AZURE_ABFS_ENABLE_CHECKSUM_VALIDATION = "fs.azure.enable.checksum.validation";

Review Comment:
   Add documentation for the config in https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/site/markdown/abfs.md. Also highlight that it will have a performance impact due to the MD5 recomputation on both client and server.

##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/contracts/exceptions/InvalidChecksumException.java:
##
@@ -0,0 +1,44 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+package org.apache.hadoop.fs.azurebfs.contracts.exceptions;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.classification.InterfaceStability;
+import org.apache.hadoop.fs.azurebfs.contracts.services.AzureServiceErrorCode;
+
+/**
+ * Exception to wrap invalid checksum verification on client side.
+ */
+@InterfaceAudience.Public
+@InterfaceStability.Evolving
+public class InvalidChecksumException extends AbfsRestOperationException {

Review Comment:
   Rename to AbfsInvalidChecksumException.

##
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java:
##
@@ -761,6 +764,11 @@ public AbfsRestOperation append(final String path, final byte[] buffer,
       requestHeaders.add(new AbfsHttpHeader(USER_AGENT, userAgentRetry));
     }
+    // Add MD5 Hash of request content as request header if feature is enabled
+    if (isChecksumValidationEnabled()) {

Review Comment:
   In case of appends, as per the REST API doc, the server will fail: "If the two hashes do not match, the operation will fail with error code 400 (Bad Request)." Is there any indication in the server's error-code response header that shows the failure is due to an MD5 mismatch, so that it can be converted to AbfsInvalidChecksumException too?
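On the question of recognising an MD5 failure on append: one option is to check both the HTTP status and the storage error code the service returns with the 400 response (Azure Storage reports it in the `x-ms-error-code` response header). This is a minimal sketch under stated assumptions: the error-code string `Md5Mismatch`, the class names, and the stand-in exception (in place of the proposed `AbfsInvalidChecksumException`) are all hypothetical, and the thread does not confirm which error code the service actually sends.

```java
import java.io.IOException;
import java.net.HttpURLConnection;

/** Hypothetical mapping of a failed append response to a checksum-specific exception. */
public final class AppendFailureClassifier {

  /** ASSUMPTION: error code the service is expected to return on an MD5 mismatch. */
  static final String MD5_MISMATCH_ERROR_CODE = "Md5Mismatch";

  /** Local stand-in for the proposed AbfsInvalidChecksumException. */
  public static final class InvalidChecksumSketchException extends IOException {
    public InvalidChecksumSketchException(String message) {
      super(message);
    }
  }

  private AppendFailureClassifier() {
  }

  /** True only for a 400 response whose storage error code signals an MD5 mismatch. */
  public static boolean isMd5Mismatch(int statusCode, String errorCode) {
    return statusCode == HttpURLConnection.HTTP_BAD_REQUEST
        && MD5_MISMATCH_ERROR_CODE.equals(errorCode);
  }

  /** Throws the checksum-specific exception when the response indicates an MD5 mismatch. */
  public static void throwIfMd5Mismatch(int statusCode, String errorCode)
      throws InvalidChecksumSketchException {
    if (isMd5Mismatch(statusCode, errorCode)) {
      throw new InvalidChecksumSketchException(
          "Server rejected append: Content-MD5 did not match the uploaded data");
    }
  }
}
```

Checking the error code as well as the status matters because a 400 on append can have other causes; keying on the status alone would misreport unrelated bad requests as checksum failures.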
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767403#comment-17767403 ]

ASF GitHub Bot commented on HADOOP-18910:

anmolanmol1234 commented on PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#issuecomment-1728935016

   LGTM!!
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767395#comment-17767395 ]

ASF GitHub Bot commented on HADOOP-18910:

anujmodi2021 commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1332485837

##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemChecksum.java:
##
@@ -0,0 +1,130 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs;
+
+import java.security.SecureRandom;
+import java.util.Arrays;
+import java.util.HashSet;
+
+import org.assertj.core.api.Assertions;
+import org.junit.Test;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.impl.OpenFileParameters;
+
+import static org.apache.hadoop.fs.azurebfs.constants.ConfigurationKeys.FS_AZURE_BUFFERED_PREAD_DISABLE;
+import static org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations.ONE_MB;
+
+/**
+ * Test For Verifying Checksum Related Operations
+ */
+public class ITestAzureBlobFileSystemChecksum extends AbstractAbfsIntegrationTest {

Review Comment:
   Right now, in the good case, any read call coming through this function will have offset 0 only. There are existing tests in ITestAbfsInputStream for this function, and this PR enabled checksum support for those tests as well. Moreover, as per your suggestion, I have added tests for client.read() sanity. The production code changes apply only to client.read() and client.append(), and we have tests for both that exercise different positions and offsets.
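The read path described in the issue (compare the MD5 returned by the server with one computed over the received data) has to hash exactly the bytes that landed in the caller's buffer, which is why reads at different positions and offsets need their own coverage. A minimal sketch, assuming the server's hash arrives Base64-encoded in a `Content-MD5` response header; the class and method names are hypothetical, skipping the check when the header is absent is a policy choice made for illustration, and `IllegalStateException` stands in for the exception type the patch defines.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical client-side check of read data against a server-supplied Content-MD5 value. */
public final class ReadChecksumSketch {

  private ReadChecksumSketch() {
  }

  /**
   * Verifies buffer[offset, offset + bytesRead) against the Base64 MD5 sent by the server.
   * Throws IllegalStateException on mismatch (stand-in for a dedicated exception type).
   */
  public static void verify(byte[] buffer, int offset, int bytesRead, String contentMd5Header) {
    if (contentMd5Header == null || contentMd5Header.isEmpty()) {
      // Nothing to compare against; a stricter policy could treat this as an error.
      return;
    }
    MessageDigest md5;
    try {
      md5 = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is required on every Java platform", e);
    }
    // Hash only the bytes this read actually filled, not the whole buffer.
    md5.update(buffer, offset, bytesRead);
    byte[] expected = Base64.getDecoder().decode(contentMd5Header);
    // MessageDigest.isEqual compares the two digests byte by byte.
    if (!MessageDigest.isEqual(expected, md5.digest())) {
      throw new IllegalStateException(
          "MD5 mismatch for " + bytesRead + " bytes at offset " + offset);
    }
  }
}
```

Hashing `buffer[offset, offset + bytesRead)` rather than the whole buffer is the detail that a test with non-zero offsets would catch: hashing from index 0 only happens to work when the offset is 0.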
[jira] [Commented] (HADOOP-18910) ABFS: Adding Support for MD5 Hash based integrity verification of the request content during transport
[ https://issues.apache.org/jira/browse/HADOOP-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767385#comment-17767385 ]

ASF GitHub Bot commented on HADOOP-18910:

anujmodi2021 commented on code in PR #6069:
URL: https://github.com/apache/hadoop/pull/6069#discussion_r1332485837

##
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemChecksum.java:
##
@@ -0,0 +1,130 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.fs.azurebfs;
+
+import java.security.SecureRandom;
+import java.util.Arrays;
+import java.util.HashSet;
+
+import org.assertj.core.api.Assertions;
+import org.junit.Test;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.fs.impl.OpenFileParameters;
+
+import static org.apache.hadoop.fs.azurebfs.constants.ConfigurationKeys.FS_AZURE_BUFFERED_PREAD_DISABLE;
+import static org.apache.hadoop.fs.azurebfs.constants.FileSystemConfigurations.ONE_MB;
+
+/**
+ * Test For Verifying Checksum Related Operations
+ */
+public class ITestAzureBlobFileSystemChecksum extends AbstractAbfsIntegrationTest {

Review Comment:
   Any read call coming through this function will also have offset 0 only. There are existing tests in ITestAbfsInputStream for this function, and this PR enabled checksum support for those tests as well. Moreover, as per your suggestion, I have added tests for client.read() sanity. The production code changes apply only to client.read() and client.append(), and we have tests for both that exercise different positions and offsets.