[jira] [Assigned] (HADOOP-19087) Release Hadoop 3.4.1
[ https://issues.apache.org/jira/browse/HADOOP-19087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur reassigned HADOOP-19087: -- Assignee: Mukund Thakur > Release Hadoop 3.4.1 > > > Key: HADOOP-19087 > URL: https://issues.apache.org/jira/browse/HADOOP-19087 > Project: Hadoop Common > Issue Type: Improvement > Components: build >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Mukund Thakur >Priority: Major > > Release a minor update to hadoop 3.4.0 with > * packaging enhancements > * updated dependencies (where viable) > * fixes for critical issues found after 3.4.0 released > * low-risk feature enhancements (those which don't impact schedule...) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-17826) ABFS: Transient failure of TestAbfsClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting
[ https://issues.apache.org/jira/browse/HADOOP-17826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866137#comment-17866137 ] Mukund Thakur commented on HADOOP-17826: I saw this today as well. cc [~pranavs] [~snvijaya] > ABFS: Transient failure of > TestAbfsClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting > -- > > Key: HADOOP-17826 > URL: https://issues.apache.org/jira/browse/HADOOP-17826 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure, test >Affects Versions: 3.4.0 >Reporter: Sumangala Patki >Priority: Major > > Transient failure of the below test observed for HNS OAuth, AppendBlob HNS > OAuth and Non-HNS SharedKey combinations. The value denoted by "actual value" > below varies across failures, and exceeds the upper limit of the expected > range. > _TestAbfsClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting:171->fuzzyValidate:49 > The actual value 10 is not within the expected range: [5.60, 8.40]._ > Verified failure with client and server in the same region to rule out > network issues.
[jira] [Updated] (HADOOP-18610) ABFS OAuth2 Token Provider to support Azure Workload Identity for AKS
[ https://issues.apache.org/jira/browse/HADOOP-18610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18610: --- Fix Version/s: 3.3.9 3.4.1 > ABFS OAuth2 Token Provider to support Azure Workload Identity for AKS > - > > Key: HADOOP-18610 > URL: https://issues.apache.org/jira/browse/HADOOP-18610 > Project: Hadoop Common > Issue Type: Improvement > Components: tools >Affects Versions: 3.3.4 >Reporter: Haifeng Chen >Assignee: Anuj Modi >Priority: Critical > Labels: pull-request-available > Fix For: 3.3.9, 3.4.1 > > Attachments: HADOOP-18610-preview.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > In Jan 2023, Microsoft Azure AKS replaced its original pod-managed identity > with [Azure Active Directory (Azure AD) workload > identities|https://learn.microsoft.com/en-us/azure/active-directory/develop/workload-identities-overview] > (preview), which integrate with the Kubernetes native capabilities to > federate with any external identity providers. This approach is simpler to > use and deploy. > Refer to > [https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview|https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview.] > and [https://azure.github.io/azure-workload-identity/docs/introduction.html] > for more details. > The basic use scenario is to access Azure cloud resources (such as cloud > storage) from a Kubernetes (such as AKS) workload using an Azure managed identity > federated with a Kubernetes service account. 
The credential environment > variables in the pod projected by Azure AD workload identity are like the following: > AZURE_AUTHORITY_HOST: (Injected by the webhook, > [https://login.microsoftonline.com/]) > AZURE_CLIENT_ID: (Injected by the webhook) > AZURE_TENANT_ID: (Injected by the webhook) > AZURE_FEDERATED_TOKEN_FILE: (Injected by the webhook, > /var/run/secrets/azure/tokens/azure-identity-token) > The token in the file pointed to by AZURE_FEDERATED_TOKEN_FILE is a JWT (JSON > Web Token) client assertion token which we can use in a request to > AZURE_AUTHORITY_HOST (url is AZURE_AUTHORITY_HOST + tenantId + > "/oauth2/v2.0/token") for an AD token which can be used to directly access > the Azure cloud resources. > This approach is very common and similar among cloud providers such as AWS > and GCP. The Hadoop AWS integration has WebIdentityTokenCredentialProvider to > handle the same case. > The existing MsiTokenProvider can only handle the managed identity associated > with an Azure VM instance. We need to implement a WorkloadIdentityTokenProvider > which handles the Azure Workload Identity case. For this, we need to add one > method (getTokenUsingJWTAssertion) in AzureADAuthenticator which will be used > by WorkloadIdentityTokenProvider.
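The token exchange described above can be sketched as follows. This is an illustrative Python sketch, not the proposed Hadoop WorkloadIdentityTokenProvider: it shows how the four injected environment variables combine into a standard OAuth2 client-credentials request with a client assertion. The scope and field names follow the generic Azure AD token-endpoint convention and are assumptions, not copied from the patch.

```python
import os

def build_token_request(env=None):
    """Return (url, form_fields) for the client-assertion token exchange.

    `env` is a mapping with the four variables the webhook injects;
    defaults to the process environment.
    """
    env = env if env is not None else os.environ
    authority = env["AZURE_AUTHORITY_HOST"].rstrip("/")
    tenant = env["AZURE_TENANT_ID"]
    # Token endpoint: AZURE_AUTHORITY_HOST + tenantId + "/oauth2/v2.0/token"
    url = f"{authority}/{tenant}/oauth2/v2.0/token"
    # The projected file holds the JWT client assertion
    with open(env["AZURE_FEDERATED_TOKEN_FILE"]) as f:
        assertion = f.read().strip()
    fields = {
        "client_id": env["AZURE_CLIENT_ID"],
        "grant_type": "client_credentials",
        # Scope for Azure Storage; other resources use their own scope.
        "scope": "https://storage.azure.com/.default",
        "client_assertion_type":
            "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
        "client_assertion": assertion,
    }
    return url, fields
```

POSTing `fields` as a form body to `url` would return the AD access token; the HTTP call itself is omitted here.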
[jira] [Resolved] (HADOOP-19196) Bulk delete api doesn't take the path to delete as the base path
[ https://issues.apache.org/jira/browse/HADOOP-19196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-19196. Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > Bulk delete api doesn't take the path to delete as the base path > > > Key: HADOOP-19196 > URL: https://issues.apache.org/jira/browse/HADOOP-19196 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 3.5.0, 3.4.1 >Reporter: Steve Loughran >Assignee: Mukund Thakur >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > If you use the path of the file you intend to delete as the base path, you > get an error. This is because the validation requires the list to be of > children, but the base path itself should be valid.
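The bug described above comes down to a membership check. The sketch below is illustrative Python, not the actual Hadoop validation code: it shows a page validator that, per the fix, accepts the base path itself as a legal entry rather than insisting every entry be a strict child.

```python
def validate_bulk_delete_paths(base, paths):
    """Reject any path outside `base`; the base path itself is allowed.

    Before the fix, passing `base` as an entry raised an error because
    only children of `base` were accepted.
    """
    base_norm = base.rstrip("/")
    prefix = base_norm + "/"
    for p in paths:
        if p.rstrip("/") == base_norm:
            continue  # deleting the base path itself is valid (the fix)
        if not p.startswith(prefix):
            raise ValueError(f"path {p!r} is not under base {base!r}")
```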
[jira] [Updated] (HADOOP-19137) [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if Customer-provided-key configs given.
[ https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-19137: --- Fix Version/s: 3.4.1 > [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if > Customer-provided-key configs given. > -- > > Key: HADOOP-19137 > URL: https://issues.apache.org/jira/browse/HADOOP-19137 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.4.0 >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > Store doesn't flow in the namespace information to the client. > In https://github.com/apache/hadoop/pull/6221, getIsNamespaceEnabled is added > in client methods which checks if namespace information is there or not, and > if not there, it will make getAcl call and set the field. Once the field is > set, it would be used in future getIsNamespaceEnabled method calls for a > given AbfsClient. > Since, CPK both global and encryptionContext are only for hns account, the > fix that is proposed is that we would fail fs init if its non-hns account and > cpk config is given.
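The proposed fix above is a fail-fast check at filesystem init. The sketch below is illustrative Python, not the ABFS driver code, and the config key names are placeholders (the real ABFS key names may differ): it only shows the shape of the check, i.e. reject init when a customer-provided-key config is present on a non-HNS account.

```python
# Placeholder key names for illustration; not the exact ABFS config keys.
CPK_KEYS = (
    "fs.azure.encryption.global.key",
    "fs.azure.encryption.context.provider",
)

def check_cpk_config(is_hns_enabled, conf):
    """Fail fs init if CPK config is set on a non-hierarchical-namespace account."""
    has_cpk = any(conf.get(k) for k in CPK_KEYS)
    if has_cpk and not is_hns_enabled:
        # CPK (global or encryptionContext) is HNS-only, so abort early
        # instead of failing later on a getAcl-derived namespace probe.
        raise ValueError("Customer-provided-key configs require a "
                         "hierarchical-namespace (HNS) account")
```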
[jira] [Resolved] (HADOOP-19137) [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if Customer-provided-key configs given.
[ https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-19137. Resolution: Fixed > [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if > Customer-provided-key configs given. > -- > > Key: HADOOP-19137 > URL: https://issues.apache.org/jira/browse/HADOOP-19137 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.4.0 >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > Store doesn't flow in the namespace information to the client. > In https://github.com/apache/hadoop/pull/6221, getIsNamespaceEnabled is added > in client methods which checks if namespace information is there or not, and > if not there, it will make getAcl call and set the field. Once the field is > set, it would be used in future getIsNamespaceEnabled method calls for a > given AbfsClient. > Since, CPK both global and encryptionContext are only for hns account, the > fix that is proposed is that we would fail fs init if its non-hns account and > cpk config is given.
[jira] [Updated] (HADOOP-19137) [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if Customer-provided-key configs given.
[ https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-19137: --- Fix Version/s: 3.5.0 > [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if > Customer-provided-key configs given. > -- > > Key: HADOOP-19137 > URL: https://issues.apache.org/jira/browse/HADOOP-19137 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.4.0 >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > Store doesn't flow in the namespace information to the client. > In https://github.com/apache/hadoop/pull/6221, getIsNamespaceEnabled is added > in client methods which checks if namespace information is there or not, and > if not there, it will make getAcl call and set the field. Once the field is > set, it would be used in future getIsNamespaceEnabled method calls for a > given AbfsClient. > Since, CPK both global and encryptionContext are only for hns account, the > fix that is proposed is that we would fail fs init if its non-hns account and > cpk config is given.
[jira] [Updated] (HADOOP-19137) [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if Customer-provided-key configs given.
[ https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-19137: --- Summary: [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if Customer-provided-key configs given. (was: [ABFS]:Extra getAcl call while calling the very first API of FileSystem) > [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if > Customer-provided-key configs given. > -- > > Key: HADOOP-19137 > URL: https://issues.apache.org/jira/browse/HADOOP-19137 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.4.0 >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Major > Labels: pull-request-available > > Store doesn't flow in the namespace information to the client. > In https://github.com/apache/hadoop/pull/6221, getIsNamespaceEnabled is added > in client methods which checks if namespace information is there or not, and > if not there, it will make getAcl call and set the field. Once the field is > set, it would be used in future getIsNamespaceEnabled method calls for a > given AbfsClient. > Since, CPK both global and encryptionContext are only for hns account, the > fix that is proposed is that we would fail fs init if its non-hns account and > cpk config is given.
[jira] [Updated] (HADOOP-18679) Add API for bulk/paged delete of files and objects
[ https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18679: --- Description: iceberg and hbase could benefit from being able to give a list of individual files to delete -files which may be scattered round the bucket for better read performance. Add some new optional interface for an object store which allows a caller to submit a list of paths to files to delete, where the expectation is * if a path is a file: delete * if a path is a dir, outcome undefined For s3 that'd let us build these into DeleteRequest objects, and submit, without any probes first. {quote}Cherrypicking {quote} when cherrypicking, you must include * followup commit #6854 * https://issues.apache.org/jira/browse/HADOOP-19196 * test fixes HADOOP-19184 and HADOOP-19188 was: iceberg and hbase could benefit from being able to give a list of individual files to delete -files which may be scattered round the bucket for better read performance. Add some new optional interface for an object store which allows a caller to submit a list of paths to files to delete, where the expectation is * if a path is a file: delete * if a path is a dir, outcome undefined For s3 that'd let us build these into DeleteRequest objects, and submit, without any probes first. bq. Cherrypicking when cherrypicking, you must include * followup commit #6854 * test fixes HADOOP-19184 and HADOOP-19188 > Add API for bulk/paged delete of files and objects > -- > > Key: HADOOP-18679 > URL: https://issues.apache.org/jira/browse/HADOOP-18679 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.5 >Reporter: Steve Loughran >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.4.1 > > > iceberg and hbase could benefit from being able to give a list of individual > files to delete -files which may be scattered round the bucket for better > read performance. 
> Add some new optional interface for an object store which allows a caller to > submit a list of paths to files to delete, where > the expectation is > * if a path is a file: delete > * if a path is a dir, outcome undefined > For s3 that'd let us build these into DeleteRequest objects, and submit, > without any probes first. > {quote}Cherrypicking > {quote} > when cherrypicking, you must include > * followup commit #6854 > * https://issues.apache.org/jira/browse/HADOOP-19196 > * test fixes HADOOP-19184 and HADOOP-19188
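The "build these into DeleteRequest objects, and submit, without any probes first" idea above can be sketched briefly. This is illustrative Python, not the Hadoop or AWS SDK API: it shows the page-splitting a client would do, since S3's bulk DeleteObjects call accepts at most 1000 keys per request.

```python
def batch_delete_requests(keys, page_size=1000):
    """Yield successive lists of at most `page_size` keys.

    Each yielded list would become one bulk DeleteRequest payload,
    submitted with no per-key existence probes beforehand.
    """
    for i in range(0, len(keys), page_size):
        yield keys[i:i + page_size]
```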
[jira] [Commented] (HADOOP-19196) Bulk delete api doesn't take the path to delete as the base path
[ https://issues.apache.org/jira/browse/HADOOP-19196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852954#comment-17852954 ] Mukund Thakur commented on HADOOP-19196: good catch. > Bulk delete api doesn't take the path to delete as the base path > > > Key: HADOOP-19196 > URL: https://issues.apache.org/jira/browse/HADOOP-19196 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 3.5.0, 3.4.1 >Reporter: Steve Loughran >Assignee: Mukund Thakur >Priority: Minor > > If you use the path of the file you intend to delete as the base path, you > get an error. This is because the validation requires the list to be of > children, but the base path itself should be valid.
[jira] [Resolved] (HADOOP-19190) Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes when bucket not encrypted with sse-kms
[ https://issues.apache.org/jira/browse/HADOOP-19190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-19190. Resolution: Fixed > Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes > when bucket not encrypted with sse-kms > > > Key: HADOOP-19190 > URL: https://issues.apache.org/jira/browse/HADOOP-19190 > Project: Hadoop Common > Issue Type: Test > Components: fs/s3 >Affects Versions: 3.4.1 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.1 > > > [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 12.80 > s <<< FAILURE! -- in > org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings > [ERROR] > org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes > -- Time elapsed: 5.065 s <<< FAILURE! > org.junit.ComparisonFailure: [Server side encryption algorithm must match] > expected:<"[aws:kms]"> but was:<"[AES256]"> > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at > org.apache.hadoop.fs.s3a.EncryptionTestUtils.validateEncryptionFileAttributes(EncryptionTestUtils.java:138) > at > org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes(ITestS3AEncryptionWithDefaultS3Settings.java:112) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) >
[jira] [Updated] (HADOOP-19190) Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes when bucket not encrypted with sse-kms
[ https://issues.apache.org/jira/browse/HADOOP-19190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-19190: --- Fix Version/s: 3.4.1 > Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes > when bucket not encrypted with sse-kms > > > Key: HADOOP-19190 > URL: https://issues.apache.org/jira/browse/HADOOP-19190 > Project: Hadoop Common > Issue Type: Test > Components: fs/s3 >Affects Versions: 3.4.1 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.1 > > > [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 12.80 > s <<< FAILURE! -- in > org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings > [ERROR] > org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes > -- Time elapsed: 5.065 s <<< FAILURE! > org.junit.ComparisonFailure: [Server side encryption algorithm must match] > expected:<"[aws:kms]"> but was:<"[AES256]"> > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at > org.apache.hadoop.fs.s3a.EncryptionTestUtils.validateEncryptionFileAttributes(EncryptionTestUtils.java:138) > at > org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes(ITestS3AEncryptionWithDefaultS3Settings.java:112) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) >
[jira] [Commented] (HADOOP-19013) fs.getXattrs(path) for S3FS doesn't have x-amz-server-side-encryption-aws-kms-key-id header.
[ https://issues.apache.org/jira/browse/HADOOP-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851217#comment-17851217 ] Mukund Thakur commented on HADOOP-19013: yes, you are right. thanks https://github.com/apache/hadoop/pull/6859/files > fs.getXattrs(path) for S3FS doesn't have > x-amz-server-side-encryption-aws-kms-key-id header. > > > Key: HADOOP-19013 > URL: https://issues.apache.org/jira/browse/HADOOP-19013 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.6 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.4.1 > > > Once a path while uploading has been encrypted with SSE-KMS with a key id and > then later when we try to read the attributes of the same file, it doesn't > contain the key id information as an attribute. should we add it? > > while cherry-picking please include > https://issues.apache.org/jira/browse/HADOOP-19190
[jira] [Updated] (HADOOP-19013) fs.getXattrs(path) for S3FS doesn't have x-amz-server-side-encryption-aws-kms-key-id header.
[ https://issues.apache.org/jira/browse/HADOOP-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-19013: --- Description: Once a path while uploading has been encrypted with SSE-KMS with a key id and then later when we try to read the attributes of the same file, it doesn't contain the key id information as an attribute. should we add it? while cherry-picking please include https://issues.apache.org/jira/browse/HADOOP-19190 was:Once a path while uploading has been encrypted with SSE-KMS with a key id and then later when we try to read the attributes of the same file, it doesn't contain the key id information as an attribute. should we add it? > fs.getXattrs(path) for S3FS doesn't have > x-amz-server-side-encryption-aws-kms-key-id header. > > > Key: HADOOP-19013 > URL: https://issues.apache.org/jira/browse/HADOOP-19013 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.6 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.4.1 > > > Once a path while uploading has been encrypted with SSE-KMS with a key id and > then later when we try to read the attributes of the same file, it doesn't > contain the key id information as an attribute. should we add it? > > while cherry-picking please include > https://issues.apache.org/jira/browse/HADOOP-19190
[jira] [Created] (HADOOP-19190) Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes when bucket not encrypted with sse-kms
Mukund Thakur created HADOOP-19190: -- Summary: Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes when bucket not encrypted with sse-kms Key: HADOOP-19190 URL: https://issues.apache.org/jira/browse/HADOOP-19190 Project: Hadoop Common Issue Type: Test Components: fs/s3 Affects Versions: 3.4.1 Reporter: Mukund Thakur Assignee: Mukund Thakur [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 12.80 s <<< FAILURE! -- in org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings [ERROR] org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes -- Time elapsed: 5.065 s <<< FAILURE! org.junit.ComparisonFailure: [Server side encryption algorithm must match] expected:<"[aws:kms]"> but was:<"[AES256]"> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at org.apache.hadoop.fs.s3a.EncryptionTestUtils.validateEncryptionFileAttributes(EncryptionTestUtils.java:138) at org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes(ITestS3AEncryptionWithDefaultS3Settings.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
[jira] [Updated] (HADOOP-19188) TestHarFileSystem and TestFilterFileSystem failing after bulk delete API added
[ https://issues.apache.org/jira/browse/HADOOP-19188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-19188: --- Fix Version/s: 3.4.1 > TestHarFileSystem and TestFilterFileSystem failing after bulk delete API added > -- > > Key: HADOOP-19188 > URL: https://issues.apache.org/jira/browse/HADOOP-19188 > Project: Hadoop Common > Issue Type: Bug > Components: fs, test >Affects Versions: 3.5.0, 3.4.1 >Reporter: Steve Loughran >Assignee: Mukund Thakur >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > oh, we need to update a couple of tests so they know not to worry about the > new interface/method. The details are in the javadocs of FileSystem. > Interesting these snuck through yetus, though they fail in PRs based atop > #6726 > {code} > [ERROR] Failures: > [ERROR] org.apache.hadoop.fs.TestFilterFileSystem.testFilterFileSystem > [ERROR] Run 1: TestFilterFileSystem.testFilterFileSystem:181 1 methods were > not overridden correctly - see log > [ERROR] Run 2: TestFilterFileSystem.testFilterFileSystem:181 1 methods were > not overridden correctly - see log > [ERROR] Run 3: TestFilterFileSystem.testFilterFileSystem:181 1 methods were > not overridden correctly - see log > [INFO] > [ERROR] org.apache.hadoop.fs.TestHarFileSystem.testInheritedMethodsImplemented > [ERROR] Run 1: TestHarFileSystem.testInheritedMethodsImplemented:402 1 > methods were not overridden correctly - see log > [ERROR] Run 2: TestHarFileSystem.testInheritedMethodsImplemented:402 1 > methods were not overridden correctly - see log > [ERROR] Run 3: TestHarFileSystem.testInheritedMethodsImplemented:402 1 > methods were not overridden correctly - see log > {code}
[jira] [Resolved] (HADOOP-18679) Add API for bulk/paged delete of files and objects
[ https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-18679. Resolution: Fixed > Add API for bulk/paged delete of files and objects > -- > > Key: HADOOP-18679 > URL: https://issues.apache.org/jira/browse/HADOOP-18679 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.5 >Reporter: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.1 > > > iceberg and hbase could benefit from being able to give a list of individual > files to delete -files which may be scattered round the bucket for better > read performance. > Add some new optional interface for an object store which allows a caller to > submit a list of paths to files to delete, where > the expectation is > * if a path is a file: delete > * if a path is a dir, outcome undefined > For s3 that'd let us build these into DeleteRequest objects, and submit, > without any probes first.
[jira] [Resolved] (HADOOP-19184) TestStagingCommitter.testJobCommitFailure failing
[ https://issues.apache.org/jira/browse/HADOOP-19184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-19184. Fix Version/s: 3.4.1 Resolution: Fixed > TestStagingCommitter.testJobCommitFailure failing > -- > > Key: HADOOP-19184 > URL: https://issues.apache.org/jira/browse/HADOOP-19184 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.1 > > > {code:java} > [INFO] > [ERROR] Failures: > [ERROR] TestStagingCommitter.testJobCommitFailure:662 [Committed objects > compared to deleted paths > org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase$ClientResults@1b4ab85{ > requests=12, uploads=12, parts=12, tagsByUpload=12, commits=5, aborts=7, > deletes=0}] > Expecting: > > <["s3a://bucket-name/output/path/r_0_0_0e1f4790-4d3f-4abb-ba98-2b39ec8b7566", > > "s3a://bucket-name/output/path/r_0_0_92306fea-0219-4ba5-a2b6-091d95547c11", > > "s3a://bucket-name/output/path/r_1_1_016c4a25-a1f7-4e01-918e-e24a32c7525f", > > "s3a://bucket-name/output/path/r_0_0_b2698dab-5870-4bdb-98ab-0ef5832eca45", > > "s3a://bucket-name/output/path/r_1_1_600b7e65-a7ff-4d07-b763-c4339a9164ad"]> > to contain exactly in any order: > <[]> > but the following elements were unexpected: > > <["s3a://bucket-name/output/path/r_0_0_0e1f4790-4d3f-4abb-ba98-2b39ec8b7566", > > "s3a://bucket-name/output/path/r_0_0_92306fea-0219-4ba5-a2b6-091d95547c11", > > "s3a://bucket-name/output/path/r_1_1_016c4a25-a1f7-4e01-918e-e24a32c7525f", > > "s3a://bucket-name/output/path/r_0_0_b2698dab-5870-4bdb-98ab-0ef5832eca45",{code}
[jira] [Updated] (HADOOP-18679) Add API for bulk/paged delete of files and objects
[ https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18679: --- Fix Version/s: 3.4.1 > Add API for bulk/paged delete of files and objects > -- > > Key: HADOOP-18679 > URL: https://issues.apache.org/jira/browse/HADOOP-18679 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.5 >Reporter: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.1 > > > iceberg and hbase could benefit from being able to give a list of individual > files to delete -files which may be scattered round the bucket for better > read performance. > Add some new optional interface for an object store which allows a caller to > submit a list of paths to files to delete, where > the expectation is > * if a path is a file: delete > * if a path is a dir, outcome undefined > For s3 that'd let us build these into DeleteRequest objects, and submit, > without any probes first.
[jira] [Updated] (HADOOP-19184) TestStagingCommitter.testJobCommitFailure failing
[ https://issues.apache.org/jira/browse/HADOOP-19184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-19184: --- Description: {code:java} [INFO] [ERROR] Failures: [ERROR] TestStagingCommitter.testJobCommitFailure:662 [Committed objects compared to deleted paths org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase$ClientResults@1b4ab85{ requests=12, uploads=12, parts=12, tagsByUpload=12, commits=5, aborts=7, deletes=0}] Expecting: <["s3a://bucket-name/output/path/r_0_0_0e1f4790-4d3f-4abb-ba98-2b39ec8b7566", "s3a://bucket-name/output/path/r_0_0_92306fea-0219-4ba5-a2b6-091d95547c11", "s3a://bucket-name/output/path/r_1_1_016c4a25-a1f7-4e01-918e-e24a32c7525f", "s3a://bucket-name/output/path/r_0_0_b2698dab-5870-4bdb-98ab-0ef5832eca45", "s3a://bucket-name/output/path/r_1_1_600b7e65-a7ff-4d07-b763-c4339a9164ad"]> to contain exactly in any order: <[]> but the following elements were unexpected: <["s3a://bucket-name/output/path/r_0_0_0e1f4790-4d3f-4abb-ba98-2b39ec8b7566", "s3a://bucket-name/output/path/r_0_0_92306fea-0219-4ba5-a2b6-091d95547c11", "s3a://bucket-name/output/path/r_1_1_016c4a25-a1f7-4e01-918e-e24a32c7525f", "s3a://bucket-name/output/path/r_0_0_b2698dab-5870-4bdb-98ab-0ef5832eca45",{code} was:[INFO] [ERROR] Failures: [ERROR] TestStagingCommitter.testJobCommitFailure:662 [Committed objects compared to deleted paths org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase$ClientResults@2de1acf4\{ requests=12, uploads=12, parts=12, tagsByUpload=12, commits=5, aborts=7, deletes=0}] Expecting: <["s3a://bucket-name/output/path/r_0_0_c055250c-58c7-47ea-8b14-215cb5462e89", "s3a://bucket-name/output/path/r_1_1_9111aa65-96c2-465c-b278-696aff7707e3", "s3a://bucket-name/output/path/r_0_0_dec7f398-ee4e-4a53-a783-6b72cead569a", "s3a://bucket-name/output/path/r_1_1_39ad0eba-1053-4217-aa63-ddc8edfa7c64", "s3a://bucket-name/output/path/r_0_0_6c0518f6-7c1b-418f-a3e4-7db568880e6a"]> to contain exactly in any order: <[]> but the 
following elements were unexpected: <["s3a://bucket-name/output/path/r_0_0_c055250c-58c7-47ea-8b14-215cb5462e89", "s3a://bucket-name/output/path/r_1_1_9111aa65-96c2-465c-b278-696aff7707e3", "s3a://bucket-name/output/path/r_0_0_dec7f398-ee4e-4a53-a783-6b72cead569a", "s3a://bucket-name/output/path/r_1_1_39ad0eba-1053-4217-aa63-ddc8edfa7c64", "s3a://bucket-name/output/path/r_0_0_6c0518f6-7c1b-418f-a3e4-7db568880e6a"]> > TestStagingCommitter.testJobCommitFailure failing > -- > > Key: HADOOP-19184 > URL: https://issues.apache.org/jira/browse/HADOOP-19184 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Critical > > {code:java} > [INFO] > [ERROR] Failures: > [ERROR] TestStagingCommitter.testJobCommitFailure:662 [Committed objects > compared to deleted paths > org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase$ClientResults@1b4ab85{ > requests=12, uploads=12, parts=12, tagsByUpload=12, commits=5, aborts=7, > deletes=0}] > Expecting: > > <["s3a://bucket-name/output/path/r_0_0_0e1f4790-4d3f-4abb-ba98-2b39ec8b7566", > > "s3a://bucket-name/output/path/r_0_0_92306fea-0219-4ba5-a2b6-091d95547c11", > > "s3a://bucket-name/output/path/r_1_1_016c4a25-a1f7-4e01-918e-e24a32c7525f", > > "s3a://bucket-name/output/path/r_0_0_b2698dab-5870-4bdb-98ab-0ef5832eca45", > > "s3a://bucket-name/output/path/r_1_1_600b7e65-a7ff-4d07-b763-c4339a9164ad"]> > to contain exactly in any order: > <[]> > but the following elements were unexpected: > > <["s3a://bucket-name/output/path/r_0_0_0e1f4790-4d3f-4abb-ba98-2b39ec8b7566", > > "s3a://bucket-name/output/path/r_0_0_92306fea-0219-4ba5-a2b6-091d95547c11", > > "s3a://bucket-name/output/path/r_1_1_016c4a25-a1f7-4e01-918e-e24a32c7525f", > > "s3a://bucket-name/output/path/r_0_0_b2698dab-5870-4bdb-98ab-0ef5832eca45",{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, 
[jira] [Created] (HADOOP-19184) TestStagingCommitter.testJobCommitFailure failing
Mukund Thakur created HADOOP-19184: -- Summary: TestStagingCommitter.testJobCommitFailure failing Key: HADOOP-19184 URL: https://issues.apache.org/jira/browse/HADOOP-19184 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Mukund Thakur Assignee: Mukund Thakur [INFO] [ERROR] Failures: [ERROR] TestStagingCommitter.testJobCommitFailure:662 [Committed objects compared to deleted paths org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase$ClientResults@2de1acf4\{ requests=12, uploads=12, parts=12, tagsByUpload=12, commits=5, aborts=7, deletes=0}] Expecting: <["s3a://bucket-name/output/path/r_0_0_c055250c-58c7-47ea-8b14-215cb5462e89", "s3a://bucket-name/output/path/r_1_1_9111aa65-96c2-465c-b278-696aff7707e3", "s3a://bucket-name/output/path/r_0_0_dec7f398-ee4e-4a53-a783-6b72cead569a", "s3a://bucket-name/output/path/r_1_1_39ad0eba-1053-4217-aa63-ddc8edfa7c64", "s3a://bucket-name/output/path/r_0_0_6c0518f6-7c1b-418f-a3e4-7db568880e6a"]> to contain exactly in any order: <[]> but the following elements were unexpected: <["s3a://bucket-name/output/path/r_0_0_c055250c-58c7-47ea-8b14-215cb5462e89", "s3a://bucket-name/output/path/r_1_1_9111aa65-96c2-465c-b278-696aff7707e3", "s3a://bucket-name/output/path/r_0_0_dec7f398-ee4e-4a53-a783-6b72cead569a", "s3a://bucket-name/output/path/r_1_1_39ad0eba-1053-4217-aa63-ddc8edfa7c64", "s3a://bucket-name/output/path/r_0_0_6c0518f6-7c1b-418f-a3e4-7db568880e6a"]> -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19013) fs.getXattrs(path) for S3FS doesn't have x-amz-server-side-encryption-aws-kms-key-id header.
[ https://issues.apache.org/jira/browse/HADOOP-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-19013. Resolution: Fixed > fs.getXattrs(path) for S3FS doesn't have > x-amz-server-side-encryption-aws-kms-key-id header. > > > Key: HADOOP-19013 > URL: https://issues.apache.org/jira/browse/HADOOP-19013 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.6 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.4.1 > > > Once a file has been uploaded and encrypted with SSE-KMS using a key id, > reading the attributes of the same file later does not return the key id > as an attribute. Should we add it?
[jira] [Updated] (HADOOP-19013) fs.getXattrs(path) for S3FS doesn't have x-amz-server-side-encryption-aws-kms-key-id header.
[ https://issues.apache.org/jira/browse/HADOOP-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-19013: --- Fix Version/s: 3.4.1 > fs.getXattrs(path) for S3FS doesn't have > x-amz-server-side-encryption-aws-kms-key-id header. > > > Key: HADOOP-19013 > URL: https://issues.apache.org/jira/browse/HADOOP-19013 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.6 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.4.1 > > > Once a file has been uploaded and encrypted with SSE-KMS using a key id, > reading the attributes of the same file later does not return the key id > as an attribute. Should we add it?
[jira] [Created] (HADOOP-19177) TestS3ACachingBlockManager fails intermittently in Yetus
Mukund Thakur created HADOOP-19177: -- Summary: TestS3ACachingBlockManager fails intermittently in Yetus Key: HADOOP-19177 URL: https://issues.apache.org/jira/browse/HADOOP-19177 Project: Hadoop Common Issue Type: Test Components: fs/s3 Affects Versions: 3.4.0 Reporter: Mukund Thakur {code:java} [ERROR] org.apache.hadoop.fs.s3a.prefetch.TestS3ACachingBlockManager.testCachingOfGet -- Time elapsed: 60.45 s <<< ERROR! java.lang.IllegalStateException: waitForCaching: expected: 1, actual: 0, read errors: 0, caching errors: 1 at org.apache.hadoop.fs.s3a.prefetch.TestS3ACachingBlockManager.waitForCaching(TestS3ACachingBlockManager.java:465) at org.apache.hadoop.fs.s3a.prefetch.TestS3ACachingBlockManager.testCachingOfGetHelper(TestS3ACachingBlockManager.java:435) at org.apache.hadoop.fs.s3a.prefetch.TestS3ACachingBlockManager.testCachingOfGet(TestS3ACachingBlockManager.java:398) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:750) [INFO] [INFO] Results: [INFO] [ERROR] Errors: [ERROR] 
org.apache.hadoop.fs.s3a.prefetch.TestS3ACachingBlockManager.testCachingFailureOfGet [ERROR] Run 1: TestS3ACachingBlockManager.testCachingFailureOfGet:405->testCachingOfGetHelper:435->waitForCaching:465 IllegalState waitForCaching: expected: 1, actual: 0, read errors: 0, caching errors: 1 [ERROR] Run 2: TestS3ACachingBlockManager.testCachingFailureOfGet:405->testCachingOfGetHelper:435->waitForCaching:465 IllegalState waitForCaching: expected: 1, actual: 0, read errors: 0, caching errors: 1 [ERROR] Run 3: TestS3ACachingBlockManager.testCachingFailureOfGet:405->testCachingOfGetHelper:435->waitForCaching:465 IllegalState waitForCaching: expected: 1, actual: 0, read errors: 0, caching errors: 1 {code} Discovered in [https://github.com/apache/hadoop/pull/6646#issuecomment-2111558054] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19150) Test ITestAbfsRestOperationException#testAuthFailException is broken.
[ https://issues.apache.org/jira/browse/HADOOP-19150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-19150. Fix Version/s: 3.4.1 Resolution: Fixed > Test ITestAbfsRestOperationException#testAuthFailException is broken. > -- > > Key: HADOOP-19150 > URL: https://issues.apache.org/jira/browse/HADOOP-19150 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Mukund Thakur >Assignee: Anuj Modi >Priority: Major > Labels: pull-request-available > Fix For: 3.4.1 > > > {code:java} > intercept(Exception.class, > () -> { > fs.getFileStatus(new Path("/")); > }); {code} > Intercept shouldn't be used as there are assertions in catch statements. > > CC [~ste...@apache.org] [~anujmodi2021] [~asrani_anmol] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19150) Test ITestAbfsRestOperationException#testAuthFailException is broken.
Mukund Thakur created HADOOP-19150: -- Summary: Test ITestAbfsRestOperationException#testAuthFailException is broken. Key: HADOOP-19150 URL: https://issues.apache.org/jira/browse/HADOOP-19150 Project: Hadoop Common Issue Type: Sub-task Reporter: Mukund Thakur {code:java} intercept(Exception.class, () -> { fs.getFileStatus(new Path("/")); }); {code} Intercept shouldn't be used as there are assertions in catch statements. CC [~ste...@apache.org] [~anujmodi2021] [~asrani_anmol]
[jira] [Created] (HADOOP-19149) ABFS: Implement ThreadLocal for ObjectMapper in AzureHttpOperation via config option with static shared instance as an alternative.
Mukund Thakur created HADOOP-19149: -- Summary: ABFS: Implement ThreadLocal for ObjectMapper in AzureHttpOperation via config option with static shared instance as an alternative. Key: HADOOP-19149 URL: https://issues.apache.org/jira/browse/HADOOP-19149 Project: Hadoop Common Issue Type: Sub-task Components: fs/azure Affects Versions: 3.4.0 Reporter: Mukund Thakur Assignee: Mukund Thakur While doing internal tests on Hive TPCDS queries we have seen many instances of ObjectMapper being created in an Application Master; sharing a thread-local ObjectMapper instance will improve performance. CC [~ste...@apache.org] [~harshit.gupta]
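The sharing pattern proposed above can be sketched as follows. ExpensiveParser is a hypothetical stand-in for Jackson's ObjectMapper so the example stays dependency-free; the real change would wrap ObjectMapper the same way:

```java
// Sketch of the thread-local sharing pattern described in the issue.
// ExpensiveParser is a hypothetical stand-in for Jackson's ObjectMapper.
class ThreadLocalMapperSketch {

    static class ExpensiveParser {
        static int instancesCreated = 0;
        ExpensiveParser() { instancesCreated++; }
        String parse(String json) { return json.trim(); }
    }

    // One instance per thread, created lazily, then reused across all calls
    // on that thread instead of being constructed per request.
    private static final ThreadLocal<ExpensiveParser> PARSER =
        ThreadLocal.withInitial(ExpensiveParser::new);

    static String handleResponse(String body) {
        return PARSER.get().parse(body);
    }
}
```

Repeated calls on the same thread hit the same instance, which is the saving observed in the Application Master.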
[jira] [Commented] (HADOOP-18296) Memory fragmentation in ChecksumFileSystem Vectored IO implementation.
[ https://issues.apache.org/jira/browse/HADOOP-18296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837408#comment-17837408 ] Mukund Thakur commented on HADOOP-18296: {quote}Mukund, do we actually need to coalesce ranges on local fs reads? because it is all local. we can just push out a list of independent regions. {quote} We are not merging in the default vectored read and raw local FS read implementations, although we do merge in the checksum FS. {quote}we do still need to deal with failures by adding the ability to return buffers to any pool on failure. {quote} If the read failed for any range, future.get() will throw an exception, and thus the caller can return the buffer to the pool. As per the design, the management of buffers in a pool is handled by the caller of the API. > Memory fragmentation in ChecksumFileSystem Vectored IO implementation. > -- > > Key: HADOOP-18296 > URL: https://issues.apache.org/jira/browse/HADOOP-18296 > Project: Hadoop Common > Issue Type: Sub-task > Components: common >Affects Versions: 3.4.0 >Reporter: Mukund Thakur >Priority: Minor > Labels: fs > > As we have implemented merging of ranges in the ChecksumFSInputChecker > implementation of the vectored IO API, it can lead to memory fragmentation. Let > me explain by example. > > Suppose the client requests 3 ranges: > 0-500, 700-1000 and 1200-1500. > Because of merging, all the above ranges will get merged into one, and we > will allocate one big byte buffer covering 0-1500 but return sliced byte buffers > for the desired ranges. > Once the client is done reading all the ranges, it will only be able to > free the memory for the requested ranges; the memory of the gaps will never be > released, e.g. here (500-700 and 1000-1200). > > Note this only happens for direct byte buffers. 
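The merged-allocation behaviour described in the issue can be sketched with plain java.nio buffers; the class and method names here are illustrative, not Hadoop code:

```java
import java.nio.ByteBuffer;

// Illustration of merged-range slicing: one 1500-byte backing buffer serves
// three requested ranges, so the gap bytes (500-700 and 1000-1200) remain
// part of the shared backing allocation.
class SlicedRangesSketch {

    /** Return a slice of the merged buffer covering [offset, offset+length). */
    static ByteBuffer sliceRange(ByteBuffer merged, int offset, int length) {
        ByteBuffer dup = merged.duplicate();
        dup.position(offset);
        dup.limit(offset + length);
        // The slice shares the merged buffer's memory: releasing one slice
        // cannot free the gap bytes of the underlying (direct) allocation.
        return dup.slice();
    }

    public static void main(String[] args) {
        // Ranges 0-500, 700-1000 and 1200-1500 merged into one allocation.
        ByteBuffer merged = ByteBuffer.allocateDirect(1500);
        ByteBuffer r1 = sliceRange(merged, 0, 500);
        ByteBuffer r2 = sliceRange(merged, 700, 300);
        ByteBuffer r3 = sliceRange(merged, 1200, 300);
        // 1100 bytes are handed back to the caller; the 400 gap bytes stay
        // pinned for as long as any slice is alive.
        System.out.println(r1.capacity() + r2.capacity() + r3.capacity());
    }
}
```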
[jira] [Commented] (HADOOP-18296) Memory fragmentation in ChecksumFileSystem Vectored IO implementation.
[ https://issues.apache.org/jira/browse/HADOOP-18296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836272#comment-17836272 ] Mukund Thakur commented on HADOOP-18296: Yes, it is, although direct buffers are not used in ORC/Parquet. Thinking if we should throw an exception if the user calls readVectored with direct buffers, something like {code:java} class ChecksumFSInputChecker { ... @Override public void readVectored(List<? extends FileRange> ranges, IntFunction<ByteBuffer> allocate) throws IOException { if (allocate.apply(0).isDirect()) { throw new UnsupportedOperationException("Direct buffer is not supported"); } } }{code} cc [~ste...@apache.org] > Memory fragmentation in ChecksumFileSystem Vectored IO implementation. > -- > > Key: HADOOP-18296 > URL: https://issues.apache.org/jira/browse/HADOOP-18296 > Project: Hadoop Common > Issue Type: Sub-task > Components: common >Affects Versions: 3.4.0 >Reporter: Mukund Thakur >Priority: Minor > Labels: fs > > As we have implemented merging of ranges in the ChecksumFSInputChecker > implementation of the vectored IO API, it can lead to memory fragmentation. Let > me explain by example. > > Suppose the client requests 3 ranges: > 0-500, 700-1000 and 1200-1500. > Because of merging, all the above ranges will get merged into one, and we > will allocate one big byte buffer covering 0-1500 but return sliced byte buffers > for the desired ranges. > Once the client is done reading all the ranges, it will only be able to > free the memory for the requested ranges; the memory of the gaps will never be > released, e.g. here (500-700 and 1000-1200). > > Note this only happens for direct byte buffers.
[jira] [Commented] (HADOOP-17826) ABFS: Transient failure of TestAbfsClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting
[ https://issues.apache.org/jira/browse/HADOOP-17826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833737#comment-17833737 ] Mukund Thakur commented on HADOOP-17826: I am seeing this now . {code:java} [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 25.862 s <<< FAILURE! - in org.apache.hadoop.fs.azurebfs.services.TestAbfsClientThrottlingAnalyzer [ERROR] testManySuccessAndErrorsAndWaiting(org.apache.hadoop.fs.azurebfs.services.TestAbfsClientThrottlingAnalyzer) Time elapsed: 1.154 s <<< FAILURE! java.lang.AssertionError: The actual value 9 is not within the expected range: [5.60, 8.40]. at org.junit.Assert.fail(Assert.java:89) at org.junit.Assert.assertTrue(Assert.java:42) at org.apache.hadoop.fs.azurebfs.services.TestAbfsClientThrottlingAnalyzer.fuzzyValidate(TestAbfsClientThrottlingAnalyzer.java:64) at org.apache.hadoop.fs.azurebfs.services.TestAbfsClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting(TestAbfsClientThrottlingAnalyzer.java:181) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) {code} > ABFS: Transient failure of > TestAbfsClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting > -- > > Key: HADOOP-17826 > URL: https://issues.apache.org/jira/browse/HADOOP-17826 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure, test >Affects Versions: 3.4.0 >Reporter: Sumangala Patki >Priority: Major > > Transient failure of the below test observed for HNS OAuth, AppendBlob HNS > OAuth and Non-HNS SharedKey combinations. The value denoted by "actual value" > below varies across failures, and exceeds the upper limit of the expected > range. > _TestAbfsClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting:171->fuzzyValidate:49 > The actual value 10 is not within the expected range: [5.60, 8.40]._ > Verified failure with client and server in the same region to rule out > network issues. 
[jira] [Created] (HADOOP-19110) ITestExponentialRetryPolicy failing in branch-3.4
Mukund Thakur created HADOOP-19110: -- Summary: ITestExponentialRetryPolicy failing in branch-3.4 Key: HADOOP-19110 URL: https://issues.apache.org/jira/browse/HADOOP-19110 Project: Hadoop Common Issue Type: Bug Components: fs/azure Affects Versions: 3.4.0 Reporter: Mukund Thakur Assignee: Anuj Modi {code:java} [ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 91.416 s <<< FAILURE! - in org.apache.hadoop.fs.azurebfs.services.ITestExponentialRetryPolicy [ERROR] testThrottlingIntercept(org.apache.hadoop.fs.azurebfs.services.ITestExponentialRetryPolicy) Time elapsed: 0.622 s <<< ERROR! Failure to initialize configuration for dummy.dfs.core.windows.net key ="null": Invalid configuration value detected for fs.azure.account.key at org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:53) at org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:646) at org.apache.hadoop.fs.azurebfs.services.ITestAbfsClient.createTestClientFromCurrentContext(ITestAbfsClient.java:339) at org.apache.hadoop.fs.azurebfs.services.ITestExponentialRetryPolicy.testThrottlingIntercept(ITestExponentialRetryPolicy.java:106) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19106) [ABFS] All tests of. ITestAzureBlobFileSystemAuthorization fails with NPE
[ https://issues.apache.org/jira/browse/HADOOP-19106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825849#comment-17825849 ] Mukund Thakur commented on HADOOP-19106: It fails because [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemAuthorization.java#L360] returns null. and this only gets initialized when authType is SAS [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java#L1733] > [ABFS] All tests of. ITestAzureBlobFileSystemAuthorization fails with NPE > - > > Key: HADOOP-19106 > URL: https://issues.apache.org/jira/browse/HADOOP-19106 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Affects Versions: 3.4.0 >Reporter: Mukund Thakur >Assignee: Anuj Modi >Priority: Major > > When below config set to true all of the tests fails else it skips. > > fs.azure.test.namespace.enabled > true > > > [*ERROR*] > testOpenFileAuthorized(org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization) > Time elapsed: 0.064 s <<< ERROR! 
> java.lang.NullPointerException > at > org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.runTest(ITestAzureBlobFileSystemAuthorization.java:273) > at > org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.testOpenFileAuthorized(ITestAzureBlobFileSystemAuthorization.java:132) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19106) [ABFS] All tests of. ITestAzureBlobFileSystemAuthorization fails with NPE
[ https://issues.apache.org/jira/browse/HADOOP-19106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825838#comment-17825838 ] Mukund Thakur commented on HADOOP-19106: It does fail for me with the same config mentioned in [https://github.com/apache/hadoop/pull/6069#issuecomment-1965105331] + fs.azure.test.namespace.enabled=true. > [ABFS] All tests of. ITestAzureBlobFileSystemAuthorization fails with NPE > - > > Key: HADOOP-19106 > URL: https://issues.apache.org/jira/browse/HADOOP-19106 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Affects Versions: 3.4.0 >Reporter: Mukund Thakur >Assignee: Anuj Modi >Priority: Major > > When below config set to true all of the tests fails else it skips. > > fs.azure.test.namespace.enabled > true > > > [*ERROR*] > testOpenFileAuthorized(org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization) > Time elapsed: 0.064 s <<< ERROR! > java.lang.NullPointerException > at > org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.runTest(ITestAzureBlobFileSystemAuthorization.java:273) > at > org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.testOpenFileAuthorized(ITestAzureBlobFileSystemAuthorization.java:132) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
[jira] [Commented] (HADOOP-18854) add options to disable range merging of vectored io
[ https://issues.apache.org/jira/browse/HADOOP-18854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825443#comment-17825443 ] Mukund Thakur commented on HADOOP-18854: There is already an option to disable merging {code:java} fs.s3a.vectored.read.max.merged.size 1M What is the largest merged read size in bytes such that we group ranges together during vectored read. Setting this value to 0 will disable merging of ranges. {code} > add options to disable range merging of vectored io > --- > > Key: HADOOP-18854 > URL: https://issues.apache.org/jira/browse/HADOOP-18854 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs, fs/s3 >Affects Versions: 3.3.5, 3.3.6 >Reporter: Steve Loughran >Priority: Major > > I'm seeing test failures in my PARQUET-2171 pr because assertions about the > #of bytes read isn't holding -small files are being read and the vector range > merging is pulling in the whole file. > ``` > [ERROR] TestInputOutputFormat.testReadWriteWithCounter:338 bytestotal != > bytesread expected:<5510> but was:<11020> > ``` > I think for parquet i will add an option to disable vector io, but really the > filesystems which support it should allow for merging to be disabled -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
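For reference, the quoted option can be set in the usual core-site.xml style; the 0 value disabling merging is taken from the config description quoted above:

```xml
<!-- Sketch of an s3a core-site.xml setting: a merge threshold of 0
     disables range merging during vectored reads. -->
<property>
  <name>fs.s3a.vectored.read.max.merged.size</name>
  <value>0</value>
</property>
```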
[jira] [Commented] (HADOOP-19106) [ABFS] All tests of. ITestAzureBlobFileSystemAuthorization fails with NPE
[ https://issues.apache.org/jira/browse/HADOOP-19106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825418#comment-17825418 ] Mukund Thakur commented on HADOOP-19106: CC [~snvijaya] [~pranavs] > [ABFS] All tests of. ITestAzureBlobFileSystemAuthorization fails with NPE > - > > Key: HADOOP-19106 > URL: https://issues.apache.org/jira/browse/HADOOP-19106 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Affects Versions: 3.4.0 >Reporter: Mukund Thakur >Priority: Major > > When below config set to true all of the tests fails else it skips. > > fs.azure.test.namespace.enabled > true > > > [*ERROR*] > testOpenFileAuthorized(org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization) > Time elapsed: 0.064 s <<< ERROR! > java.lang.NullPointerException > at > org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.runTest(ITestAzureBlobFileSystemAuthorization.java:273) > at > org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.testOpenFileAuthorized(ITestAzureBlobFileSystemAuthorization.java:132) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: 
[jira] [Created] (HADOOP-19106) [ABFS] All tests of ITestAzureBlobFileSystemAuthorization fail with NPE
Mukund Thakur created HADOOP-19106: -- Summary: [ABFS] All tests of. ITestAzureBlobFileSystemAuthorization fails with NPE Key: HADOOP-19106 URL: https://issues.apache.org/jira/browse/HADOOP-19106 Project: Hadoop Common Issue Type: Bug Components: fs/azure Affects Versions: 3.4.0 Reporter: Mukund Thakur When below config set to true all of the tests fails else it skips. fs.azure.test.namespace.enabled true [*ERROR*] testOpenFileAuthorized(org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization) Time elapsed: 0.064 s <<< ERROR! java.lang.NullPointerException at org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.runTest(ITestAzureBlobFileSystemAuthorization.java:273) at org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.testOpenFileAuthorized(ITestAzureBlobFileSystemAuthorization.java:132) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18759) [ABFS][Backoff-Optimization] Have a Static retry policy for connection timeout failures
[ https://issues.apache.org/jira/browse/HADOOP-18759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18759: --- Fix Version/s: 3.5.0 > [ABFS][Backoff-Optimization] Have a Static retry policy for connection > timeout failures > --- > > Key: HADOOP-18759 > URL: https://issues.apache.org/jira/browse/HADOOP-18759 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.4 >Reporter: Anuj Modi >Assignee: Anuj Modi >Priority: Major > Fix For: 3.5.0 > > > Today when a request fails with connection timeout, it falls back into the > loop for exponential retry. Unlike Azure Storage, there are no guarantees of > success on exponentially retried requests, or recommendations for ideal retry > policies for Azure network or other general failures. Faster failure and > retry might be more beneficial for such generic connection timeout failures. > This PR introduces a new Static Retry Policy which will currently be used > only for connection timeout failures. This means all requests failing with > connection timeout errors will be retried after a constant retry (sleep) > interval, independent of how many times that request has failed. The Max Retry > Count check will still be in place. > The following configurations will be introduced in the change: > # "fs.azure.static.retry.for.connection.timeout.enabled" - default: true. > true: static retry will be used for CT; false: exponential retry will be used. > # "fs.azure.static.retry.interval" - default: 1000ms. > This also introduces a new field in x-ms-client-request-id, only for requests > that are being retried after a connection timeout failure. The new field > will tell which retry policy was used to get the sleep interval before making > this request. > The header "x-ms-client-request-id" right now carries only the retryCount and > retryReason for this particular API call. 
For example: > :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1_CT. > Moving ahead, for retryReason "CT" it will include the retry policy abbreviation as > well. > For example: > :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1_CT_E. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
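The difference between the two policies described above can be sketched as a sleep-interval computation. This is an illustrative sketch only, not the ABFS implementation: the static interval comes from the stated config default, while the exponential base and cap are assumed values for demonstration.

```java
// Sketch: constant vs. exponential retry sleep intervals, as described in
// HADOOP-18759. Not the actual ABFS retry-policy code.
public class RetrySketch {
    static final long STATIC_INTERVAL_MS = 1000;  // fs.azure.static.retry.interval default
    static final long BASE_BACKOFF_MS = 500;      // assumed exponential base
    static final long MAX_BACKOFF_MS = 30_000;    // assumed backoff cap

    /** Static policy: the same sleep regardless of how often the request failed. */
    static long staticSleep(int retryCount) {
        return STATIC_INTERVAL_MS;
    }

    /** Exponential policy: doubles with each retry, capped at MAX_BACKOFF_MS. */
    static long exponentialSleep(int retryCount) {
        long backoff = BASE_BACKOFF_MS * (1L << Math.min(retryCount, 20));
        return Math.min(backoff, MAX_BACKOFF_MS);
    }
}
```

With the static policy a connection-timeout retry always waits the same constant interval, so a transient network blip is retried quickly instead of backing off into long exponential waits.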
[jira] [Updated] (HADOOP-19015) Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/HADOOP-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-19015: --- Fix Version/s: 3.3.7 > Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting > for connection from pool > -- > > Key: HADOOP-19015 > URL: https://issues.apache.org/jira/browse/HADOOP-19015 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.3.7, 3.5.0, 3.4.1 > > > Getting errors in jobs which can be fixed by increasing this > 2023-12-14 17:35:56,602 [ERROR] [TezChild] |tez.TezProcessor|: > java.lang.RuntimeException: java.io.IOException: > org.apache.hadoop.net.ConnectTimeoutException: getFileStatus on > s3a://aaa/cc-hive-jzv5y6/warehouse/tablespace/managed/hive/student/delete_delta_012_012_0001/bucket_1_0: > software.amazon.awssdk.core.exception.SdkClientException: Unable to execute > HTTP request: Timeout waiting for connection from pool at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptible -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-19013) fs.getXattrs(path) for S3FS doesn't have x-amz-server-side-encryption-aws-kms-key-id header.
[ https://issues.apache.org/jira/browse/HADOOP-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801945#comment-17801945 ] Mukund Thakur commented on HADOOP-19013: Well, this is an attribute, so setting it would be nice, though not mandatory. I think copy in S3A already updates the kms-key during a copy operation. > fs.getXattrs(path) for S3FS doesn't have > x-amz-server-side-encryption-aws-kms-key-id header. > > > Key: HADOOP-19013 > URL: https://issues.apache.org/jira/browse/HADOOP-19013 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.6 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > > Once a path has been encrypted during upload with SSE-KMS and a key id, and > we later try to read the attributes of the same file, they don't > contain the key id information as an attribute. Should we add it? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-19015) Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/HADOOP-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-19015: --- Component/s: fs/s3 > Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting > for connection from pool > -- > > Key: HADOOP-19015 > URL: https://issues.apache.org/jira/browse/HADOOP-19015 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.4.0 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > > Getting errors in jobs which can be fixed by increasing this > 2023-12-14 17:35:56,602 [ERROR] [TezChild] |tez.TezProcessor|: > java.lang.RuntimeException: java.io.IOException: > org.apache.hadoop.net.ConnectTimeoutException: getFileStatus on > s3a://aaa/cc-hive-jzv5y6/warehouse/tablespace/managed/hive/student/delete_delta_012_012_0001/bucket_1_0: > software.amazon.awssdk.core.exception.SdkClientException: Unable to execute > HTTP request: Timeout waiting for connection from pool at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84) > at > 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptible -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-19015) Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/HADOOP-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-19015: --- Affects Version/s: 3.4.0 > Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting > for connection from pool > -- > > Key: HADOOP-19015 > URL: https://issues.apache.org/jira/browse/HADOOP-19015 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.4.0 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > > Getting errors in jobs which can be fixed by increasing this > 2023-12-14 17:35:56,602 [ERROR] [TezChild] |tez.TezProcessor|: > java.lang.RuntimeException: java.io.IOException: > org.apache.hadoop.net.ConnectTimeoutException: getFileStatus on > s3a://aaa/cc-hive-jzv5y6/warehouse/tablespace/managed/hive/student/delete_delta_012_012_0001/bucket_1_0: > software.amazon.awssdk.core.exception.SdkClientException: Unable to execute > HTTP request: Timeout waiting for connection from pool at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84) > at > 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptible -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19015) Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool
Mukund Thakur created HADOOP-19015: -- Summary: Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool Key: HADOOP-19015 URL: https://issues.apache.org/jira/browse/HADOOP-19015 Project: Hadoop Common Issue Type: Sub-task Reporter: Mukund Thakur Getting errors in jobs which can be fixed by increasing this 2023-12-14 17:35:56,602 [ERROR] [TezChild] |tez.TezProcessor|: java.lang.RuntimeException: java.io.IOException: org.apache.hadoop.net.ConnectTimeoutException: getFileStatus on s3a://aaa/cc-hive-jzv5y6/warehouse/tablespace/managed/hive/student/delete_delta_012_012_0001/bucket_1_0: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptible -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-19015) Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/HADOOP-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur reassigned HADOOP-19015: -- Assignee: Mukund Thakur > Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting > for connection from pool > -- > > Key: HADOOP-19015 > URL: https://issues.apache.org/jira/browse/HADOOP-19015 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > > Getting errors in jobs which can be fixed by increasing this > 2023-12-14 17:35:56,602 [ERROR] [TezChild] |tez.TezProcessor|: > java.lang.RuntimeException: java.io.IOException: > org.apache.hadoop.net.ConnectTimeoutException: getFileStatus on > s3a://aaa/cc-hive-jzv5y6/warehouse/tablespace/managed/hive/student/delete_delta_012_012_0001/bucket_1_0: > software.amazon.awssdk.core.exception.SdkClientException: Unable to execute > HTTP request: Timeout waiting for connection from pool at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84) > at > 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at > com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptible -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19013) fs.getXattrs(path) for S3FS doesn't have x-amz-server-side-encryption-aws-kms-key-id header.
Mukund Thakur created HADOOP-19013: -- Summary: fs.getXattrs(path) for S3FS doesn't have x-amz-server-side-encryption-aws-kms-key-id header. Key: HADOOP-19013 URL: https://issues.apache.org/jira/browse/HADOOP-19013 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.3.6 Reporter: Mukund Thakur Assignee: Mukund Thakur Once a path has been encrypted during upload with SSE-KMS and a key id, and we later try to read the attributes of the same file, they don't contain the key id information as an attribute. Should we add it? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-11867) Add a high-performance vectored read API.
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17779666#comment-17779666 ] Mukund Thakur commented on HADOOP-11867: Hey [~yuanbo] Vectored IO is intelligent. It merges nearby ranges and thus reduces the number of outgoing HTTP calls to object storage. > Add a high-performance vectored read API. > - > > Key: HADOOP-11867 > URL: https://issues.apache.org/jira/browse/HADOOP-11867 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs, fs/azure, fs/s3, hdfs-client >Affects Versions: 3.0.0 >Reporter: Gopal Vijayaraghavan >Assignee: Mukund Thakur >Priority: Major > Labels: performance, pull-request-available > Fix For: 3.3.5 > > Time Spent: 13h > Remaining Estimate: 0h > > The most significant way to read from a filesystem efficiently is to > let the FileSystem implementation handle the seek behaviour underneath the > API, as efficiently as possible. > A better approach to the seek problem is to provide a sequence of read > locations as part of a single call, while letting the system schedule/plan > the reads ahead of time. > This is exceedingly useful for seek-heavy readers on HDFS, since this allows > for potentially optimizing away the seek-gaps within the FSDataInputStream > implementation. > For seek+read systems with even more latency than locally-attached disks, > something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would > take care of the seeks internally while reading chunk.remaining() bytes into each > chunk (which may be {{slice()}}ed off a bigger buffer). > The base implementation can stub this in as a sequence of seeks + read() into > ByteBuffers, without forcing each FS implementation to override this in any > way. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
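The range-coalescing behaviour mentioned in the comment above (merging nearby read ranges so fewer HTTP GETs are issued against object storage) can be sketched in a few lines. This is an illustrative sketch, not the Hadoop implementation; the class, method names, and the {offset, length} pair representation are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of vectored-read range coalescing: adjacent or near-adjacent
 * ranges are merged into one larger read when the gap between them is
 * small enough. Hypothetical code, not Hadoop's readVectored() internals.
 */
public class RangeMerge {
    /** Each range is {offset, length}; input must be sorted by offset. */
    static List<long[]> merge(List<long[]> ranges, long maxGap) {
        List<long[]> out = new ArrayList<>();
        for (long[] r : ranges) {
            if (!out.isEmpty()) {
                long[] last = out.get(out.size() - 1);
                long lastEnd = last[0] + last[1];
                if (r[0] - lastEnd <= maxGap) {
                    // Close enough: extend the previous range to cover this one.
                    last[1] = Math.max(lastEnd, r[0] + r[1]) - last[0];
                    continue;
                }
            }
            out.add(new long[]{r[0], r[1]});
        }
        return out;
    }
}
```

For example, reads at offsets 0 and 150 (100 bytes each) with a 1000-byte tolerated gap collapse into one 250-byte GET, while a read at offset 10000 stays separate, trading a little wasted bandwidth in the gap for one fewer round trip.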
[jira] [Resolved] (HADOOP-18929) Build failure while trying to create apache 3.3.7 release locally.
[ https://issues.apache.org/jira/browse/HADOOP-18929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-18929. Fix Version/s: 3.3.9 Resolution: Fixed > Build failure while trying to create apache 3.3.7 release locally. > -- > > Key: HADOOP-18929 > URL: https://issues.apache.org/jira/browse/HADOOP-18929 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.3.6 >Reporter: Mukund Thakur >Assignee: PJ Fanning >Priority: Critical > Labels: pull-request-available > Fix For: 3.3.9 > > > {noformat} > [ESC[1;34mINFOESC[m] ESC[1m---< > ESC[0;36morg.apache.hadoop:hadoop-client-check-test-invariantsESC[0;1m > >ESC[m > [ESC[1;34mINFOESC[m] ESC[1mBuilding Apache Hadoop Client Packaging Invariants > for Test 3.3.9-SNAPSHOT [105/111]ESC[m > [ESC[1;34mINFOESC[m] ESC[1m[ pom > ]-ESC[m > [ESC[1;34mINFOESC[m] > [ESC[1;34mINFOESC[m] ESC[1m--- > ESC[0;32mmaven-enforcer-plugin:3.0.0-M1:enforceESC[m > ESC[1m(enforce-banned-dependencies)ESC[m @ > ESC[36mhadoop-client-check-test-invariantsESC[0;1m ---ESC[m > [ESC[1;34mINFOESC[m] Adding ignorable dependency: > org.apache.hadoop:hadoop-annotations:null > [ESC[1;34mINFOESC[m] Adding ignore: * > [ESC[1;33mWARNINGESC[m] Rule 1: > org.apache.maven.plugins.enforcer.BanDuplicateClasses failed with message: > Duplicate classes found: > Found in: > org.apache.hadoop:hadoop-client-minicluster:jar:3.3.9-SNAPSHOT:compile > org.apache.hadoop:hadoop-client-runtime:jar:3.3.9-SNAPSHOT:compile > Duplicate classes: > META-INF/versions/9/module-info.class > {noformat} > CC [~ste...@apache.org] [~weichu] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-18929) Build failure while trying to create apache 3.3.7 release locally.
[ https://issues.apache.org/jira/browse/HADOOP-18929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur reassigned HADOOP-18929: -- Assignee: PJ Fanning > Build failure while trying to create apache 3.3.7 release locally. > -- > > Key: HADOOP-18929 > URL: https://issues.apache.org/jira/browse/HADOOP-18929 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.3.6 >Reporter: Mukund Thakur >Assignee: PJ Fanning >Priority: Critical > Labels: pull-request-available > > {noformat} > [ESC[1;34mINFOESC[m] ESC[1m---< > ESC[0;36morg.apache.hadoop:hadoop-client-check-test-invariantsESC[0;1m > >ESC[m > [ESC[1;34mINFOESC[m] ESC[1mBuilding Apache Hadoop Client Packaging Invariants > for Test 3.3.9-SNAPSHOT [105/111]ESC[m > [ESC[1;34mINFOESC[m] ESC[1m[ pom > ]-ESC[m > [ESC[1;34mINFOESC[m] > [ESC[1;34mINFOESC[m] ESC[1m--- > ESC[0;32mmaven-enforcer-plugin:3.0.0-M1:enforceESC[m > ESC[1m(enforce-banned-dependencies)ESC[m @ > ESC[36mhadoop-client-check-test-invariantsESC[0;1m ---ESC[m > [ESC[1;34mINFOESC[m] Adding ignorable dependency: > org.apache.hadoop:hadoop-annotations:null > [ESC[1;34mINFOESC[m] Adding ignore: * > [ESC[1;33mWARNINGESC[m] Rule 1: > org.apache.maven.plugins.enforcer.BanDuplicateClasses failed with message: > Duplicate classes found: > Found in: > org.apache.hadoop:hadoop-client-minicluster:jar:3.3.9-SNAPSHOT:compile > org.apache.hadoop:hadoop-client-runtime:jar:3.3.9-SNAPSHOT:compile > Duplicate classes: > META-INF/versions/9/module-info.class > {noformat} > CC [~ste...@apache.org] [~weichu] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18890) remove okhttp usage
[ https://issues.apache.org/jira/browse/HADOOP-18890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774965#comment-17774965 ] Mukund Thakur commented on HADOOP-18890: Yes. I see you have already merged. > remove okhttp usage > --- > > Key: HADOOP-18890 > URL: https://issues.apache.org/jira/browse/HADOOP-18890 > Project: Hadoop Common > Issue Type: Improvement > Components: build, common >Reporter: PJ Fanning >Assignee: PJ Fanning >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.9 > > > * relates to HADOOP-18496 > * simplifies the dependencies if hadoop doesn't use multiple 3rd party libs > to make http calls > * okhttp brings in other dependencies like the kotlin runtime > * hadoop already uses apache httpclient in some places -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18929) Build failure while trying to create apache 3.3.7 release locally.
[ https://issues.apache.org/jira/browse/HADOOP-18929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773811#comment-17773811 ] Mukund Thakur commented on HADOOP-18929: Oh okay. A quick followup PR will do. Thanks > Build failure while trying to create apache 3.3.7 release locally. > -- > > Key: HADOOP-18929 > URL: https://issues.apache.org/jira/browse/HADOOP-18929 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.3.6 >Reporter: Mukund Thakur >Priority: Critical > > {noformat} > [ESC[1;34mINFOESC[m] ESC[1m---< > ESC[0;36morg.apache.hadoop:hadoop-client-check-test-invariantsESC[0;1m > >ESC[m > [ESC[1;34mINFOESC[m] ESC[1mBuilding Apache Hadoop Client Packaging Invariants > for Test 3.3.9-SNAPSHOT [105/111]ESC[m > [ESC[1;34mINFOESC[m] ESC[1m[ pom > ]-ESC[m > [ESC[1;34mINFOESC[m] > [ESC[1;34mINFOESC[m] ESC[1m--- > ESC[0;32mmaven-enforcer-plugin:3.0.0-M1:enforceESC[m > ESC[1m(enforce-banned-dependencies)ESC[m @ > ESC[36mhadoop-client-check-test-invariantsESC[0;1m ---ESC[m > [ESC[1;34mINFOESC[m] Adding ignorable dependency: > org.apache.hadoop:hadoop-annotations:null > [ESC[1;34mINFOESC[m] Adding ignore: * > [ESC[1;33mWARNINGESC[m] Rule 1: > org.apache.maven.plugins.enforcer.BanDuplicateClasses failed with message: > Duplicate classes found: > Found in: > org.apache.hadoop:hadoop-client-minicluster:jar:3.3.9-SNAPSHOT:compile > org.apache.hadoop:hadoop-client-runtime:jar:3.3.9-SNAPSHOT:compile > Duplicate classes: > META-INF/versions/9/module-info.class > {noformat} > CC [~ste...@apache.org] [~weichu] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Reopened] (HADOOP-18895) upgrade to commons-compress 1.24.0 due to CVE
[ https://issues.apache.org/jira/browse/HADOOP-18895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur reopened HADOOP-18895: > upgrade to commons-compress 1.24.0 due to CVE > - > > Key: HADOOP-18895 > URL: https://issues.apache.org/jira/browse/HADOOP-18895 > Project: Hadoop Common > Issue Type: Improvement > Components: build >Reporter: PJ Fanning >Assignee: PJ Fanning >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.9 > > > Includes some important bug fixes including > https://lists.apache.org/thread/g9lrsz8j9nrgltcoc7v6cpkopg07czc9 - > CVE-2023-42503 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18895) upgrade to commons-compress 1.24.0 due to CVE
[ https://issues.apache.org/jira/browse/HADOOP-18895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773793#comment-17773793 ] Mukund Thakur commented on HADOOP-18895: We need to revert this as it is causing https://issues.apache.org/jira/browse/HADOOP-18929?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel=17773753#comment-17773753 > upgrade to commons-compress 1.24.0 due to CVE > - > > Key: HADOOP-18895 > URL: https://issues.apache.org/jira/browse/HADOOP-18895 > Project: Hadoop Common > Issue Type: Improvement > Components: build >Reporter: PJ Fanning >Assignee: PJ Fanning >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.9 > > > Includes some important bug fixes including > https://lists.apache.org/thread/g9lrsz8j9nrgltcoc7v6cpkopg07czc9 - > CVE-2023-42503 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18929) Build failure while trying to create apache 3.3.7 release locally.
[ https://issues.apache.org/jira/browse/HADOOP-18929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773753#comment-17773753 ] Mukund Thakur commented on HADOOP-18929: Thanks, [~ayushtkn] for checking quickly. Let me revert and try. > Build failure while trying to create apache 3.3.7 release locally. > -- > > Key: HADOOP-18929 > URL: https://issues.apache.org/jira/browse/HADOOP-18929 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.3.6 >Reporter: Mukund Thakur >Priority: Critical > > {noformat} > [ESC[1;34mINFOESC[m] ESC[1m---< > ESC[0;36morg.apache.hadoop:hadoop-client-check-test-invariantsESC[0;1m > >ESC[m > [ESC[1;34mINFOESC[m] ESC[1mBuilding Apache Hadoop Client Packaging Invariants > for Test 3.3.9-SNAPSHOT [105/111]ESC[m > [ESC[1;34mINFOESC[m] ESC[1m[ pom > ]-ESC[m > [ESC[1;34mINFOESC[m] > [ESC[1;34mINFOESC[m] ESC[1m--- > ESC[0;32mmaven-enforcer-plugin:3.0.0-M1:enforceESC[m > ESC[1m(enforce-banned-dependencies)ESC[m @ > ESC[36mhadoop-client-check-test-invariantsESC[0;1m ---ESC[m > [ESC[1;34mINFOESC[m] Adding ignorable dependency: > org.apache.hadoop:hadoop-annotations:null > [ESC[1;34mINFOESC[m] Adding ignore: * > [ESC[1;33mWARNINGESC[m] Rule 1: > org.apache.maven.plugins.enforcer.BanDuplicateClasses failed with message: > Duplicate classes found: > Found in: > org.apache.hadoop:hadoop-client-minicluster:jar:3.3.9-SNAPSHOT:compile > org.apache.hadoop:hadoop-client-runtime:jar:3.3.9-SNAPSHOT:compile > Duplicate classes: > META-INF/versions/9/module-info.class > {noformat} > CC [~ste...@apache.org] [~weichu] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18929) Build failure while trying to create apache 3.3.7 release locally.
Mukund Thakur created HADOOP-18929: -- Summary: Build failure while trying to create apache 3.3.7 release locally. Key: HADOOP-18929 URL: https://issues.apache.org/jira/browse/HADOOP-18929 Project: Hadoop Common Issue Type: Bug Affects Versions: 3.3.6 Reporter: Mukund Thakur {noformat} [ESC[1;34mINFOESC[m] ESC[1m---< ESC[0;36morg.apache.hadoop:hadoop-client-check-test-invariantsESC[0;1m >ESC[m [ESC[1;34mINFOESC[m] ESC[1mBuilding Apache Hadoop Client Packaging Invariants for Test 3.3.9-SNAPSHOT [105/111]ESC[m [ESC[1;34mINFOESC[m] ESC[1m[ pom ]-ESC[m [ESC[1;34mINFOESC[m] [ESC[1;34mINFOESC[m] ESC[1m--- ESC[0;32mmaven-enforcer-plugin:3.0.0-M1:enforceESC[m ESC[1m(enforce-banned-dependencies)ESC[m @ ESC[36mhadoop-client-check-test-invariantsESC[0;1m ---ESC[m [ESC[1;34mINFOESC[m] Adding ignorable dependency: org.apache.hadoop:hadoop-annotations:null [ESC[1;34mINFOESC[m] Adding ignore: * [ESC[1;33mWARNINGESC[m] Rule 1: org.apache.maven.plugins.enforcer.BanDuplicateClasses failed with message: Duplicate classes found: Found in: org.apache.hadoop:hadoop-client-minicluster:jar:3.3.9-SNAPSHOT:compile org.apache.hadoop:hadoop-client-runtime:jar:3.3.9-SNAPSHOT:compile Duplicate classes: META-INF/versions/9/module-info.class {noformat} CC [~ste...@apache.org] [~weichu] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18845) Add ability to configure ConnectionTTL of http connections while creating S3 Client.
[ https://issues.apache.org/jira/browse/HADOOP-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-18845. Resolution: Fixed > Add ability to configure ConnectionTTL of http connections while creating S3 > Client. > > > Key: HADOOP-18845 > URL: https://issues.apache.org/jira/browse/HADOOP-18845 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.6 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.3.9 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-18845) Add ability to configure ConnectionTTL of http connections while creating S3 Client.
Mukund Thakur created HADOOP-18845: -- Summary: Add ability to configure ConnectionTTL of http connections while creating S3 Client. Key: HADOOP-18845 URL: https://issues.apache.org/jira/browse/HADOOP-18845 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.3.6 Reporter: Mukund Thakur Assignee: Mukund Thakur Fix For: 3.3.9 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
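As a sketch of what the new knob looks like in core-site.xml — the key name `fs.s3a.connection.ttl` follows the issue title, but treat the exact key and the value shown as assumptions rather than authoritative documentation:

```xml
<!-- Sketch only: the property name and value are assumptions based on the
     HADOOP-18845 issue title, not shipped documentation. -->
<property>
  <name>fs.s3a.connection.ttl</name>
  <!-- maximum lifetime of a pooled HTTP connection before it is replaced;
       bounding this helps rotate connections across load-balanced endpoints -->
  <value>300000</value>
</property>
```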
[jira] [Resolved] (HADOOP-18763) Upgrade aws-java-sdk to 1.12.367+
[ https://issues.apache.org/jira/browse/HADOOP-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-18763. Fix Version/s: 3.3.6 Resolution: Fixed > Upgrade aws-java-sdk to 1.12.367+ > - > > Key: HADOOP-18763 > URL: https://issues.apache.org/jira/browse/HADOOP-18763 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.5 >Reporter: Steve Loughran >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.3.6 > > > aws sdk bundle < 1.12.367 uses a vulnerable version of netty which is > pulling in a high-severity CVE and creating unhappiness in security scans, even > if s3a doesn't use that lib. > The safe version of netty is netty:4.1.86.Final and this is used by > aws-java-sdk:1.12.367+ -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-18763) Upgrade aws-java-sdk to 1.12.367+
[ https://issues.apache.org/jira/browse/HADOOP-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur reassigned HADOOP-18763: -- Assignee: Viraj Jasani (was: Mukund Thakur) > Upgrade aws-java-sdk to 1.12.367+ > - > > Key: HADOOP-18763 > URL: https://issues.apache.org/jira/browse/HADOOP-18763 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.5 >Reporter: Steve Loughran >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > aws sdk bundle < 1.12.367 uses a vulnerable version of netty which is > pulling in a high-severity CVE and creating unhappiness in security scans, even > if s3a doesn't use that lib. > The safe version of netty is netty:4.1.86.Final and this is used by > aws-java-sdk:1.12.367+ -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Assigned] (HADOOP-18763) Upgrade aws-java-sdk to 1.12.367+
[ https://issues.apache.org/jira/browse/HADOOP-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur reassigned HADOOP-18763: -- Assignee: Mukund Thakur > Upgrade aws-java-sdk to 1.12.367+ > - > > Key: HADOOP-18763 > URL: https://issues.apache.org/jira/browse/HADOOP-18763 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.5 >Reporter: Steve Loughran >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > > aws sdk bundle < 1.12.367 uses a vulnerable version of netty which is > pulling in a high-severity CVE and creating unhappiness in security scans, even > if s3a doesn't use that lib. > The safe version of netty is netty:4.1.86.Final and this is used by > aws-java-sdk:1.12.367+ -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
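The upgrade itself is a version bump on the shaded SDK bundle. A hedged sketch of the dependency pin as it would appear in a pom — coordinates and the minimum version are taken from the Jira text; the authoritative place in Hadoop is a version property in hadoop-project/pom.xml, which is assumed here rather than quoted:

```xml
<!-- Sketch only: pins the shaded AWS SDK bundle at the first release whose
     embedded netty (4.1.86.Final) clears the CVE named in the issue. -->
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-bundle</artifactId>
  <version>1.12.367</version>
</dependency>
```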
[jira] [Commented] (HADOOP-17852) ABFS: Test with 100MB buffer size in ITestAbfsReadWriteAndSeek times out
[ https://issues.apache.org/jira/browse/HADOOP-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725998#comment-17725998 ] Mukund Thakur commented on HADOOP-17852: Seeing this in one of our customer prod clusters.
{code:java}
"Executor task launch worker for task 329" #94 daemon prio=5 os_prio=0 cpu=17344.66ms elapsed=2109.99s tid=0x7f7750026000 nid=0x6586 waiting on condition [0x7f77414fa000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@11.0.13/Native Method)
    - parking to wait for <0x0006c14aea10> (a com.google.common.util.concurrent.TrustedListenableFutureTask)
    at java.util.concurrent.locks.LockSupport.park(java.base@11.0.13/LockSupport.java:194)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:523)
    at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:86)
    at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.waitForAppendsToComplete(AbfsOutputStream.java:602)
    - locked <0x000512a667c8> (a org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream)
    at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.flushWrittenBytesToService(AbfsOutputStream.java:621)
    - locked <0x000512a667c8> (a org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream)
    at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.flushInternal(AbfsOutputStream.java:536)
    - locked <0x000512a667c8> (a org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream)
    at org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.close(AbfsOutputStream.java:495)
    - locked <0x000512a667c8> (a org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:76)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
    at org.apache.parquet.hadoop.util.HadoopPositionOutputStream.close(HadoopPositionOutputStream.java:64)
    at org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:829)
    at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:122)
    at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:165)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetOutputWriter.scala:42)
    at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:57)
    at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:74)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:252)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1368)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:253)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:174)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:413)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1334)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:419)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.13/ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.13/ThreadPoolExecutor.java:628)
    at java.lang.Thread.run(java.base@11.0.13/Thread.java:829)
{code}
CC [~snvijaya] [~ste...@apache.org]
> ABFS: Test with 100MB buffer size in ITestAbfsReadWriteAndSeek times out
> -
>
> Key: HADOOP-17852
> URL: https://issues.apache.org/jira/browse/HADOOP-17852
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.3.1
> Reporter: Sneha Vijayarajan
> Assignee: Sneha Vijayarajan
> Priority: Minor
>
> testReadAndWriteWithDifferentBufferSizesAndSeek with buffer size above 100 MB
> is failing with timeout. It is delaying the whole test run by 15-30 mins.
> [ERROR] >
[jira] [Commented] (HADOOP-18637) S3A to support upload of files greater than 2 GB using DiskBlocks
[ https://issues.apache.org/jira/browse/HADOOP-18637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691825#comment-17691825 ] Mukund Thakur commented on HADOOP-18637: As discussed offline, the following changes will be required:
* Introduce a new config to disable multipart upload everywhere and enable just large-file upload.
* Raise an error in the public S3AFS.createMultipartUploader based on the above config.
* Raise an error in the staging committer based on the above config.
* Raise an error in the magic committer based on the above config.
* Raise an error in the write operations helper based on the above config.
* Add hasCapability(isMultiPartAllowed, path) backed by the config.
* If multipart upload is disabled, we only upload via disk blocks; add a check for this.
> S3A to support upload of files greater than 2 GB using DiskBlocks > - > > Key: HADOOP-18637 > URL: https://issues.apache.org/jira/browse/HADOOP-18637 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/s3 >Reporter: Harshit Gupta >Assignee: Harshit Gupta >Priority: Major > > Use S3A DiskBlocks to support the upload of files greater than 2 GB. > Currently, the max upload size of a single block is ~2 GB. > cc: [~mthakur] [~ste...@apache.org] [~mehakmeet] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
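A sketch of how the first bullet might surface to users; the property name below is a hypothetical rendering of the proposed config, not a shipped option:

```xml
<!-- Hypothetical sketch of the "disable multipart everywhere" switch the
     comment proposes; the property name is assumed, not confirmed. -->
<property>
  <name>fs.s3a.multipart.uploads.enabled</name>
  <!-- when false, multipart uploaders/committers would fail fast and
       uploads would go through a single disk-buffered PUT -->
  <value>false</value>
</property>
```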
[jira] [Resolved] (HADOOP-18103) High performance vectored read API in Hadoop
[ https://issues.apache.org/jira/browse/HADOOP-18103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-18103. Fix Version/s: 3.3.5 (was: 3.4.0) Resolution: Fixed > High performance vectored read API in Hadoop > > > Key: HADOOP-18103 > URL: https://issues.apache.org/jira/browse/HADOOP-18103 > Project: Hadoop Common > Issue Type: New Feature > Components: common, fs, fs/adl, fs/s3 >Affects Versions: 3.3.4 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: performance, pull-request-available > Fix For: 3.3.5 > > Attachments: Vectored Read API for Hadoop FS.pdf > > Time Spent: 1.5h > Remaining Estimate: 0h > > Add support for a multi-range vectored read API in PositionedReadable. The > default iterates through the ranges to read each synchronously, but the > intent is that FSDataInputStream subclasses can provide more efficient readers, > especially object store implementations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18507) VectorIO FileRange type to support a "reference" field
[ https://issues.apache.org/jira/browse/HADOOP-18507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-18507. Fix Version/s: 3.3.5 Resolution: Fixed > VectorIO FileRange type to support a "reference" field > -- > > Key: HADOOP-18507 > URL: https://issues.apache.org/jira/browse/HADOOP-18507 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs >Affects Versions: 3.3.5 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.3.5 > > > To use in libraries, it is really good to be able to connect a FileRange back > to the application/library-level structure (chunk/split data, usually). > Proposed: add an {{Object reference}} field which can be given arbitrary data > or null, and queried for by the app. It is not used in the API at all. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
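The proposal amounts to a single opaque field on the range type. A hypothetical shape — the real interface is org.apache.hadoop.fs.FileRange; this record is only a sketch of the idea:

```java
// Hypothetical sketch (not the real org.apache.hadoop.fs.FileRange) of a
// range that carries an opaque application reference alongside offset/length.
record FileRangeSketch(long offset, int length, Object reference) {
    // the filesystem never interprets `reference`; callers use it to map a
    // completed range back to their own chunk/split bookkeeping
}
```

A caller would stash its split descriptor in the reference when building ranges, then recover it when each range's future completes.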
[jira] [Updated] (HADOOP-18460) ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing
[ https://issues.apache.org/jira/browse/HADOOP-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18460: --- Fix Version/s: (was: 3.4.0) > ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing > - > > Key: HADOOP-18460 > URL: https://issues.apache.org/jira/browse/HADOOP-18460 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3, test >Affects Versions: 3.4.0, 3.3.5 >Reporter: Steve Loughran >Assignee: Mukund Thakur >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.5 > > > seeing a test failure in both parallel and single test case runs of > {{ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer}} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Reopened] (HADOOP-11867) Add a high-performance vectored read API.
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur reopened HADOOP-11867: > Add a high-performance vectored read API. > - > > Key: HADOOP-11867 > URL: https://issues.apache.org/jira/browse/HADOOP-11867 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs, fs/azure, fs/s3, hdfs-client >Affects Versions: 3.0.0 >Reporter: Gopal Vijayaraghavan >Assignee: Mukund Thakur >Priority: Major > Labels: performance, pull-request-available > Fix For: 3.3.5 > > Time Spent: 13h > Remaining Estimate: 0h > > The most significant way to read from a filesystem efficiently is to > let the FileSystem implementation handle the seek behaviour underneath the > API, as efficiently as possible. > A better approach to the seek problem is to provide a sequence of read > locations as part of a single call, while letting the system schedule/plan > the reads ahead of time. > This is exceedingly useful for seek-heavy readers on HDFS, since this allows > for potentially optimizing away the seek-gaps within the FSDataInputStream > implementation. > For seek+read systems with even more latency than locally-attached disks, > something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would > take care of the seeks internally while reading chunk.remaining() bytes into each > chunk (which may be {{slice()}}ed off a bigger buffer). > The base implementation can stub in this as a sequence of seeks + read() into > ByteBuffers, without forcing each FS implementation to override this in any > way. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-11867) Add a high-performance vectored read API.
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-11867. Resolution: Fixed > Add a high-performance vectored read API. > - > > Key: HADOOP-11867 > URL: https://issues.apache.org/jira/browse/HADOOP-11867 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs, fs/azure, fs/s3, hdfs-client >Affects Versions: 3.0.0 >Reporter: Gopal Vijayaraghavan >Assignee: Mukund Thakur >Priority: Major > Labels: performance, pull-request-available > Fix For: 3.3.5 > > Time Spent: 13h > Remaining Estimate: 0h > > The most significant way to read from a filesystem efficiently is to > let the FileSystem implementation handle the seek behaviour underneath the > API, as efficiently as possible. > A better approach to the seek problem is to provide a sequence of read > locations as part of a single call, while letting the system schedule/plan > the reads ahead of time. > This is exceedingly useful for seek-heavy readers on HDFS, since this allows > for potentially optimizing away the seek-gaps within the FSDataInputStream > implementation. > For seek+read systems with even more latency than locally-attached disks, > something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would > take care of the seeks internally while reading chunk.remaining() bytes into each > chunk (which may be {{slice()}}ed off a bigger buffer). > The base implementation can stub in this as a sequence of seeks + read() into > ByteBuffers, without forcing each FS implementation to override this in any > way. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
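The base-implementation fallback described in HADOOP-11867 — iterate the requested ranges and read each one synchronously into its own buffer — can be sketched with plain JDK types. All names here (Range, readRange, the in-memory DATA array) are hypothetical stand-ins, not the real PositionedReadable API:

```java
import java.nio.ByteBuffer;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

// Pure-JDK sketch of the default vectored-read behaviour: one synchronous
// positioned read per range, each completing a future with its own buffer.
class VectoredReadSketch {
    record Range(long offset, int length) {}

    // Stand-in for file contents; a real implementation reads from the stream.
    static final byte[] DATA = new byte[64];
    static { for (int i = 0; i < DATA.length; i++) DATA[i] = (byte) i; }

    static CompletableFuture<ByteBuffer> readRange(Range r, IntFunction<ByteBuffer> alloc) {
        ByteBuffer buf = alloc.apply(r.length());
        buf.put(DATA, (int) r.offset(), r.length());   // "seek + read" for this range
        buf.flip();
        return CompletableFuture.completedFuture(buf);
    }

    static int totalBytesRead(List<Range> ranges) {
        int total = 0;
        for (Range r : ranges) {   // the "iterate and read each synchronously" default
            total += readRange(r, ByteBuffer::allocate).join().remaining();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalBytesRead(List.of(new Range(0, 4), new Range(16, 8)))); // prints 12
    }
}
```

Object-store implementations override this loop to issue the reads in parallel or to merge adjacent ranges into fewer GETs.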
[jira] [Resolved] (HADOOP-18320) Improve S3A delegations token documentation
[ https://issues.apache.org/jira/browse/HADOOP-18320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-18320. Fix Version/s: 3.3.9 Resolution: Fixed > Improve S3A delegations token documentation > --- > > Key: HADOOP-18320 > URL: https://issues.apache.org/jira/browse/HADOOP-18320 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Ahmar Suhail >Assignee: Ahmar Suhail >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.9 > > Time Spent: 20m > Remaining Estimate: 0h > > The current [delegations token > documentation|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_tokens.md] > has some typos; this task tracks fixing those. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-11867) Add a high-performance vectored read API.
[ https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-11867: --- Fix Version/s: 3.3.5 > Add a high-performance vectored read API. > - > > Key: HADOOP-11867 > URL: https://issues.apache.org/jira/browse/HADOOP-11867 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs, fs/azure, fs/s3, hdfs-client >Affects Versions: 3.0.0 >Reporter: Gopal Vijayaraghavan >Assignee: Mukund Thakur >Priority: Major > Labels: performance, pull-request-available > Fix For: 3.3.5 > > Time Spent: 13h > Remaining Estimate: 0h > > The most significant way to read from a filesystem efficiently is to > let the FileSystem implementation handle the seek behaviour underneath the > API, as efficiently as possible. > A better approach to the seek problem is to provide a sequence of read > locations as part of a single call, while letting the system schedule/plan > the reads ahead of time. > This is exceedingly useful for seek-heavy readers on HDFS, since this allows > for potentially optimizing away the seek-gaps within the FSDataInputStream > implementation. > For seek+read systems with even more latency than locally-attached disks, > something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would > take care of the seeks internally while reading chunk.remaining() bytes into each > chunk (which may be {{slice()}}ed off a bigger buffer). > The base implementation can stub in this as a sequence of seeks + read() into > ByteBuffers, without forcing each FS implementation to override this in any > way. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18104) Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads
[ https://issues.apache.org/jira/browse/HADOOP-18104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18104: --- Fix Version/s: 3.3.5 > Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads > > > Key: HADOOP-18104 > URL: https://issues.apache.org/jira/browse/HADOOP-18104 > Project: Hadoop Common > Issue Type: Sub-task > Components: common, fs >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.3.5 > > Time Spent: 3h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18107) Vectored IO support for large S3 files.
[ https://issues.apache.org/jira/browse/HADOOP-18107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18107: --- Fix Version/s: 3.3.5 > Vectored IO support for large S3 files. > > > Key: HADOOP-18107 > URL: https://issues.apache.org/jira/browse/HADOOP-18107 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.3.5 > > Time Spent: 4h > Remaining Estimate: 0h > > This effort would mostly be adding more tests for large files under scale > tests and seeing if any new issues surface. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18105) Implement a variant of ElasticByteBufferPool which uses weak references for garbage collection.
[ https://issues.apache.org/jira/browse/HADOOP-18105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18105: --- Fix Version/s: 3.3.5 > Implement a variant of ElasticByteBufferPool which uses weak references for > garbage collection. > --- > > Key: HADOOP-18105 > URL: https://issues.apache.org/jira/browse/HADOOP-18105 > Project: Hadoop Common > Issue Type: Sub-task > Components: common, fs >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.3.5 > > Time Spent: 4h > Remaining Estimate: 0h > > Currently in the hadoop codebase, we have two classes which implement byte > buffer pooling. > One is ElasticByteBufferPool, which doesn't use weak references and thus could > cause memory leaks in production environments. > The other is DirectBufferPool, which uses weak references but doesn't support > the caller's preference for either on-heap or off-heap buffers. > > The idea is to create an improved version of ElasticByteBufferPool by > subclassing it (as it is marked as public and stable and used widely in hdfs) > with the essential functionality required for effective buffer pooling. This > is important for the parent Vectored IO work. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18106) Handle memory fragmentation in S3 Vectored IO implementation.
[ https://issues.apache.org/jira/browse/HADOOP-18106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18106: --- Fix Version/s: 3.3.5 > Handle memory fragmentation in S3 Vectored IO implementation. > - > > Key: HADOOP-18106 > URL: https://issues.apache.org/jira/browse/HADOOP-18106 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.3.5 > > Time Spent: 4.5h > Remaining Estimate: 0h > > As we have implemented merging of ranges in the S3AInputStream implementation > of the vectored IO api, it can lead to memory fragmentation. Let me explain by > example. > > Suppose the client requests 3 ranges: > 0-500, 700-1000 and 1200-1500. > Now because of merging, all the above ranges will get merged into one, and we > will allocate a big byte buffer of 0-1500 size but return sliced byte buffers > for the desired ranges. > Now once the client is done reading all the ranges, it will only be able to > free the memory for the requested ranges; the memory of the gaps will never be > released, e.g. here (500-700 and 1000-1200). > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
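The arithmetic above can be illustrated with plain ByteBuffer views (illustrative only, not S3AInputStream code): every sliced view shares the merged buffer's backing storage, so the 400 bytes of gap memory stay pinned until every slice has been released.

```java
import java.nio.ByteBuffer;

// Sketch of the fragmentation described in HADOOP-18106: three ranges merged
// into one 1500-byte allocation, with per-range views sliced out of it.
class MergedRangeFragmentation {
    // offset/length pairs from the example: 0-500, 700-1000, 1200-1500
    static final int[][] RANGES = { {0, 500}, {700, 300}, {1200, 300} };

    static int requestedBytes() {
        ByteBuffer merged = ByteBuffer.allocate(1500);  // one allocation for the merged range
        int requested = 0;
        for (int[] r : RANGES) {
            ByteBuffer view = merged.duplicate();       // shares merged's backing array
            view.position(r[0]);
            view.limit(r[0] + r[1]);                    // view over just the requested sub-range
            requested += view.remaining();              // but it still pins all 1500 bytes
        }
        return requested;
    }

    public static void main(String[] args) {
        // 1100 of 1500 allocated bytes were requested; the 400 bytes of gaps
        // (500-700 and 1000-1200) cannot be freed while any view is alive.
        System.out.println(1500 - requestedBytes()); // prints 400
    }
}
```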
[jira] [Updated] (HADOOP-18227) Add input stream IOstats for vectored IO api in S3A.
[ https://issues.apache.org/jira/browse/HADOOP-18227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18227: --- Fix Version/s: 3.3.5 (was: 3.4.0) > Add input stream IOstats for vectored IO api in S3A. > > > Key: HADOOP-18227 > URL: https://issues.apache.org/jira/browse/HADOOP-18227 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.3.5 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18355) Update previous index properly while validating overlapping ranges.
[ https://issues.apache.org/jira/browse/HADOOP-18355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18355: --- Fix Version/s: 3.3.5 (was: 3.4.0) > Update previous index properly while validating overlapping ranges. > > > Key: HADOOP-18355 > URL: https://issues.apache.org/jira/browse/HADOOP-18355 > Project: Hadoop Common > Issue Type: Sub-task > Components: common, fs/s3 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.3.5 > > Time Spent: 1h > Remaining Estimate: 0h > > [https://github.com/apache/hadoop/blob/a55ace7bc0c173f609b51e46cb0d4d8bcda3d79d/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/VectoredReadUtils.java#L201] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18392) Propagate vectored s3a input stream stats to file system stats.
[ https://issues.apache.org/jira/browse/HADOOP-18392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18392: --- Fix Version/s: 3.3.5 (was: 3.4.0) > Propagate vectored s3a input stream stats to file system stats. > --- > > Key: HADOOP-18392 > URL: https://issues.apache.org/jira/browse/HADOOP-18392 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.3.5 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18407) Improve vectored IO api spec.
[ https://issues.apache.org/jira/browse/HADOOP-18407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18407: --- Fix Version/s: 3.3.5 (was: 3.4.0) > Improve vectored IO api spec. > -- > > Key: HADOOP-18407 > URL: https://issues.apache.org/jira/browse/HADOOP-18407 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs, fs/s3 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Minor > Labels: pull-request-available > Fix For: 3.3.5 > > > Let's add more details to the vectored IO api spec for better clarity: > * the position returned by getPos() is undefined afterwards. > * note that if a file is changed during a read, the output is again > undefined: some ranges may be old data, some may be new, and some may be both. > * note that while reads are active, normal fs api calls may block. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18391) Improve VectoredReadUtils#readVectored() for direct buffers
[ https://issues.apache.org/jira/browse/HADOOP-18391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18391: --- Fix Version/s: 3.3.5 (was: 3.4.0) > Improve VectoredReadUtils#readVectored() for direct buffers > --- > > Key: HADOOP-18391 > URL: https://issues.apache.org/jira/browse/HADOOP-18391 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs >Affects Versions: 3.3.5 >Reporter: Steve Loughran >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.3.5 > > > Harden the VectoredReadUtils methods for consistent and more robust use, > especially in those filesystems which don't have the api. > VectoredReadUtils.readInDirectBuffer should allocate a max buffer size, e.g. > 4 MB, then do repeated reads and copies; this ensures that you don't OOM with > many threads doing ranged requests. Other libs do this. > readVectored to call validateNonOverlappingAndReturnSortedRanges before > iterating; > this ensures the abfs/s3a requirements are always met, and that because > ranges will be read in order, prefetching by other clients will keep their > performance good. > readVectored to add special handling for 0-byte ranges. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
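The capped read-and-copy idea in HADOOP-18391's first point can be sketched in plain JDK code; the 4 MB cap and all names below are illustrative assumptions, not the shipped VectoredReadUtils logic:

```java
import java.nio.ByteBuffer;

// Sketch of filling a large direct buffer through a small, capped on-heap
// scratch buffer: repeated read+copy instead of one huge heap allocation.
class CappedDirectRead {
    static final int CAP = 4 * 1024 * 1024;   // illustrative 4 MB cap

    // how many read+copy passes a range of `length` bytes needs under the cap
    static int passes(long length) {
        return (int) ((length + CAP - 1) / CAP);
    }

    static ByteBuffer readInChunks(byte[] source) {
        ByteBuffer dest = ByteBuffer.allocateDirect(source.length);
        byte[] scratch = new byte[Math.min(CAP, source.length)];
        int pos = 0;
        while (pos < source.length) {          // never holds more than CAP on-heap
            int n = Math.min(scratch.length, source.length - pos);
            System.arraycopy(source, pos, scratch, 0, n);  // stands in for stream.read(...)
            dest.put(scratch, 0, n);
            pos += n;
        }
        dest.flip();
        return dest;
    }

    public static void main(String[] args) {
        System.out.println(passes(10L * 1024 * 1024)); // a 10 MB range takes 3 passes
    }
}
```

With many threads each reading multi-hundred-MB ranges, the cap bounds transient heap use per thread to CAP bytes rather than the full range length.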
[jira] [Commented] (HADOOP-18073) Upgrade AWS SDK to v2
[ https://issues.apache.org/jira/browse/HADOOP-18073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646318#comment-17646318 ] Mukund Thakur commented on HADOOP-18073: Looks good to me. Please re-run all the tests here [https://github.com/ahmarsuhail/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractVectoredRead.java] just to be sure. Also think about https://issues.apache.org/jira/browse/HADOOP-17338, an old related issue, as the response of getObject has changed. > Upgrade AWS SDK to v2 > - > > Key: HADOOP-18073 > URL: https://issues.apache.org/jira/browse/HADOOP-18073 > Project: Hadoop Common > Issue Type: Task > Components: auth, fs/s3 >Affects Versions: 3.3.1 >Reporter: xiaowei sun >Assignee: Ahmar Suhail >Priority: Major > Labels: pull-request-available > Attachments: Upgrading S3A to SDKV2.pdf > > > This task tracks upgrading Hadoop's AWS connector S3A from AWS SDK for Java > V1 to AWS SDK for Java V2. > Original use case: > {quote}We would like to access s3 with AWS SSO, which is supported in > software.amazon.awssdk:sdk-core:2.*. > In particular, from > [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html], > when setting 'fs.s3a.aws.credentials.provider', it must be > "com.amazonaws.auth.AWSCredentialsProvider". We would like to support > "software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider" which > supports AWS SSO, so users only need to authenticate once. > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-18311) Upgrade dependencies to address several CVEs
[ https://issues.apache.org/jira/browse/HADOOP-18311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18311: --- Target Version/s: 3.3.9 [~svaughan] Are we still planning to do this for the 3.3.5 release, as we will be releasing it in a week? > Upgrade dependencies to address several CVEs > > > Key: HADOOP-18311 > URL: https://issues.apache.org/jira/browse/HADOOP-18311 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Affects Versions: 3.3.3, 3.3.4 >Reporter: Steve Vaughan >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > The following CVEs can be addressed by upgrading dependencies within the > build. This includes a replacement of HTrace with a noop implementation. > * CVE-2018-7489 > * CVE-2020-10663 > * CVE-2020-28491 > * CVE-2020-35490 > * CVE-2020-35491 > * CVE-2020-36518 > * PRISMA-2021-0182 > This addresses all of the CVEs from 3.3.3 except for ones that would require > upgrading Netty to 4.x. I'll be submitting a pull request for 3.3.4.
[jira] [Updated] (HADOOP-18311) Upgrade dependencies to address several CVEs
[ https://issues.apache.org/jira/browse/HADOOP-18311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18311: --- Fix Version/s: (was: 3.3.5) > Upgrade dependencies to address several CVEs > > > Key: HADOOP-18311 > URL: https://issues.apache.org/jira/browse/HADOOP-18311 > Project: Hadoop Common > Issue Type: Improvement > Components: common >Affects Versions: 3.3.3, 3.3.4 >Reporter: Steve Vaughan >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > > The following CVEs can be addressed by upgrading dependencies within the > build. This includes a replacement of HTrace with a noop implementation. > * CVE-2018-7489 > * CVE-2020-10663 > * CVE-2020-28491 > * CVE-2020-35490 > * CVE-2020-35491 > * CVE-2020-36518 > * PRISMA-2021-0182 > This addresses all of the CVEs from 3.3.3 except for ones that would require > upgrading Netty to 4.x. I'll be submitting a pull request for 3.3.4.
[jira] [Resolved] (HADOOP-18408) [ABFS]: ITestAbfsManifestCommitProtocol fails on nonHNS configuration
[ https://issues.apache.org/jira/browse/HADOOP-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-18408. Resolution: Fixed > [ABFS]: ITestAbfsManifestCommitProtocol fails on nonHNS configuration > -- > > Key: HADOOP-18408 > URL: https://issues.apache.org/jira/browse/HADOOP-18408 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure, test >Reporter: Pranav Saxena >Assignee: Sree Bhattacharyya >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0, 3.3.5 > > > ITestAbfsRenameStageFailure fails for NonHNS-SharedKey configuration. > Failure: > [ERROR] > ITestAbfsRenameStageFailure>TestRenameStageFailure.testResilienceAsExpected:126 > [resilient commit support] expected:<[tru]e> but was:<[fals]e> > RCA: > ResilientCommit looks for whether etags are preserved in rename; if not, it > throws an exception and the flag for resilientCommitByRename stays null, > leading ultimately to the test failure. > Mitigation: > Since etags are not preserved on rename in a nonHNS account, the > required value for rename resilience should be False, as resilient commits > cannot be made. Thus, requiring a True value for requireRenameResilience for > a nonHNS account is not a valid case. Hence, as part of this task, we shall set > the correct value of False for requireRenameResilience for nonHNS accounts.
[jira] [Comment Edited] (HADOOP-7370) Optimize pread on ChecksumFileSystem
[ https://issues.apache.org/jira/browse/HADOOP-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17640787#comment-17640787 ] Mukund Thakur edited comment on HADOOP-7370 at 11/29/22 3:40 PM: - Vectored IO supersedes this feature. https://issues.apache.org/jira/browse/HADOOP-18103 was (Author: mthakur): Vectored IO supersedes this feature. > Optimize pread on ChecksumFileSystem > > > Key: HADOOP-7370 > URL: https://issues.apache.org/jira/browse/HADOOP-7370 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Major > Attachments: checksumfs-pread-0.20.txt > > > Currently the implementation of positional read in ChecksumFileSystem is > very inefficient - it actually re-opens the underlying file and checksum > file, then seeks and uses normal read. Instead, it can push down positional > read directly to the underlying FS and verify checksum.
[jira] [Resolved] (HADOOP-7370) Optimize pread on ChecksumFileSystem
[ https://issues.apache.org/jira/browse/HADOOP-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-7370. --- Resolution: Won't Fix Vectored IO supersedes this feature. > Optimize pread on ChecksumFileSystem > > > Key: HADOOP-7370 > URL: https://issues.apache.org/jira/browse/HADOOP-7370 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Reporter: Todd Lipcon >Assignee: Todd Lipcon >Priority: Major > Attachments: checksumfs-pread-0.20.txt > > > Currently the implementation of positional read in ChecksumFileSystem is > very inefficient - it actually re-opens the underlying file and checksum > file, then seeks and uses normal read. Instead, it can push down positional > read directly to the underlying FS and verify checksum.
[jira] [Commented] (HADOOP-18482) ITestS3APrefetchingInputStream does not skip if no CSV test file available
[ https://issues.apache.org/jira/browse/HADOOP-18482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620619#comment-17620619 ] Mukund Thakur commented on HADOOP-18482: No, this is not required for 3.3.5, as prefetching is not in 3.3.5. > ITestS3APrefetchingInputStream does not skip if no CSV test file available > -- > > Key: HADOOP-18482 > URL: https://issues.apache.org/jira/browse/HADOOP-18482 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.4 >Reporter: Daniel Carl Jones >Assignee: Daniel Carl Jones >Priority: Minor > Labels: pull-request-available > > We should use S3ATestUtils.getCSVTestFile(conf) to skip if the property is > empty (single space). > Today, when I set _fs.s3a.scale.test.csvfile_ to empty space, all tests that > rely on the file are skipped except this one.
[jira] [Commented] (HADOOP-18460) ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing
[ https://issues.apache.org/jira/browse/HADOOP-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17611265#comment-17611265 ] Mukund Thakur commented on HADOOP-18460: {code:java} /** * Get the S3 object from the S3 server for a specified range. * Also checks if the vectored io operation has been stopped before and after * the http get request such that we don't waste time populating the buffers. * @param operationName name of the operation for which get object on S3 is called. * @param position position of the object to be read from S3. * @param length length from position of the object to be read from S3. * @return result s3 object. * @throws IOException exception if any. */ private S3Object getS3ObjectAndValidateNotNull(final String operationName, final long position, final int length) throws IOException { checkIfVectoredIOStopped(); S3Object objectRange = getS3Object(operationName, position, length); if (objectRange.getObjectContent() == null) { throw new PathIOException(uri, "Null IO stream received during " + operationName); } checkIfVectoredIOStopped(); return objectRange; } {code} I made this change, but while making it I think I found one issue: suppose we interrupt after getting the stream from S3; we will never close the S3Object, thus leading to a memory leak? > ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing > - > > Key: HADOOP-18460 > URL: https://issues.apache.org/jira/browse/HADOOP-18460 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3, test >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Mukund Thakur >Priority: Major > > seeing a test failure in both parallel and single test case runs of > {{ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer}}
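One way to address the leak raised in the comment above is to close the object before the stop-check exception escapes. The following is a minimal sketch of that pattern, with the S3 types stubbed out as Closeable; checkIfVectoredIOStopped and the flag name are stand-ins for the real S3AInputStream members, not the actual code.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of "close before rethrowing": if the stop flag flips after the GET
// has returned, the object's stream must be closed before the exception
// propagates, otherwise the connection and its buffers leak.
// All names here are illustrative.
public class CloseOnInterrupt {
  final AtomicBoolean stopVectoredIO = new AtomicBoolean(false);

  void checkIfVectoredIOStopped() throws IOException {
    if (stopVectoredIO.get()) {
      throw new IOException("vectored IO operation stopped");
    }
  }

  <T extends Closeable> T validateAfterGet(T object) throws IOException {
    try {
      checkIfVectoredIOStopped();   // the second check, after the GET returned
      return object;
    } catch (IOException e) {
      object.close();               // release the stream instead of leaking it
      throw e;
    }
  }
}
```

The first checkIfVectoredIOStopped (before the GET) needs no such handling, because nothing has been acquired yet at that point.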
[jira] [Resolved] (HADOOP-18463) Add an integration test to process data asynchronously during vectored read.
[ https://issues.apache.org/jira/browse/HADOOP-18463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-18463. Fix Version/s: 3.3.5 Resolution: Fixed > Add an integration test to process data asynchronously during vectored read. > > > Key: HADOOP-18463 > URL: https://issues.apache.org/jira/browse/HADOOP-18463 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.3.5 > >
[jira] [Resolved] (HADOOP-18347) Restrict vectoredIO threadpool to reduce memory pressure
[ https://issues.apache.org/jira/browse/HADOOP-18347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-18347. Fix Version/s: 3.3.5 Resolution: Fixed > Restrict vectoredIO threadpool to reduce memory pressure > > > Key: HADOOP-18347 > URL: https://issues.apache.org/jira/browse/HADOOP-18347 > Project: Hadoop Common > Issue Type: Sub-task > Components: common, fs, fs/adl, fs/s3 >Reporter: Rajesh Balamohan >Assignee: Mukund Thakur >Priority: Major > Labels: performance, pull-request-available > Fix For: 3.3.5 > > > https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L964-L967 > Currently, it fetches all the ranges with unbounded threadpool. This will not > cause memory pressures with standard benchmarks like TPCDS. However, when > large number of ranges are present with large files, this could potentially > spike up memory usage of the task. Limiting the threadpool size could reduce > the memory usage.
[jira] [Commented] (HADOOP-18470) Release hadoop 3.3.5
[ https://issues.apache.org/jira/browse/HADOOP-18470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610146#comment-17610146 ] Mukund Thakur commented on HADOOP-18470: https://github.com/apache/hadoop/tree/branch-3.3.5 > Release hadoop 3.3.5 > > > Key: HADOOP-18470 > URL: https://issues.apache.org/jira/browse/HADOOP-18470 > Project: Hadoop Common > Issue Type: New Feature > Components: build >Affects Versions: 3.3.5 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major >
[jira] [Assigned] (HADOOP-18470) Release hadoop 3.3.5
[ https://issues.apache.org/jira/browse/HADOOP-18470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur reassigned HADOOP-18470: -- Assignee: Mukund Thakur > Release hadoop 3.3.5 > > > Key: HADOOP-18470 > URL: https://issues.apache.org/jira/browse/HADOOP-18470 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major >
[jira] [Created] (HADOOP-18470) Release hadoop 3.3.5
Mukund Thakur created HADOOP-18470: -- Summary: Release hadoop 3.3.5 Key: HADOOP-18470 URL: https://issues.apache.org/jira/browse/HADOOP-18470 Project: Hadoop Common Issue Type: New Feature Reporter: Mukund Thakur
[jira] [Commented] (HADOOP-18460) ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing
[ https://issues.apache.org/jira/browse/HADOOP-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17608894#comment-17608894 ] Mukund Thakur commented on HADOOP-18460: Actually, no, not over there but here [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L927] and [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L1034] The reason for not putting it inside populateBuffer() is that the buffer allocation has already been done, and it won't be released if we throw an interrupted exception while populating the buffer. > ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing > - > > Key: HADOOP-18460 > URL: https://issues.apache.org/jira/browse/HADOOP-18460 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3, test >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Mukund Thakur >Priority: Major > > seeing a test failure in both parallel and single test case runs of > {{ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer}}
[jira] [Created] (HADOOP-18463) Add an integration test to process data asynchronously during vectored read.
Mukund Thakur created HADOOP-18463: -- Summary: Add an integration test to process data asynchronously during vectored read. Key: HADOOP-18463 URL: https://issues.apache.org/jira/browse/HADOOP-18463 Project: Hadoop Common Issue Type: Sub-task Reporter: Mukund Thakur Assignee: Mukund Thakur
[jira] [Resolved] (HADOOP-18455) s3a prefetching Executor should be closed
[ https://issues.apache.org/jira/browse/HADOOP-18455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-18455. Resolution: Fixed > s3a prefetching Executor should be closed > - > > Key: HADOOP-18455 > URL: https://issues.apache.org/jira/browse/HADOOP-18455 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > This is the follow-up work for HADOOP-18186. The new executor service we use > for s3a prefetching should be closed while shutting down the file system.
[jira] [Commented] (HADOOP-18455) s3a prefetching Executor should be closed
[ https://issues.apache.org/jira/browse/HADOOP-18455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607936#comment-17607936 ] Mukund Thakur commented on HADOOP-18455: Merged into trunk as of now. > s3a prefetching Executor should be closed > - > > Key: HADOOP-18455 > URL: https://issues.apache.org/jira/browse/HADOOP-18455 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > > This is the follow-up work for HADOOP-18186. The new executor service we use > for s3a prefetching should be closed while shutting down the file system.
[jira] [Commented] (HADOOP-18460) ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing
[ https://issues.apache.org/jira/browse/HADOOP-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607393#comment-17607393 ] Mukund Thakur commented on HADOOP-18460: I think adding checkIfVectoredIOStopped() at [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L1057] should fix the issue. > ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing > - > > Key: HADOOP-18460 > URL: https://issues.apache.org/jira/browse/HADOOP-18460 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3, test >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Mukund Thakur >Priority: Major > > seeing a test failure in both parallel and single test case runs of > {{ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer}}
[jira] [Comment Edited] (HADOOP-18460) ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing
[ https://issues.apache.org/jira/browse/HADOOP-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607393#comment-17607393 ] Mukund Thakur edited comment on HADOOP-18460 at 9/20/22 8:40 PM: - Unable to reproduce this consistently but I think adding checkIfVectoredIOStopped() at [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L1057] should fix the issue. was (Author: mthakur): I think adding checkIfVectoredIOStopped() at [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L1057] should fix the issue. > ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing > - > > Key: HADOOP-18460 > URL: https://issues.apache.org/jira/browse/HADOOP-18460 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3, test >Affects Versions: 3.4.0 >Reporter: Steve Loughran >Assignee: Mukund Thakur >Priority: Major > > seeing a test failure in both parallel and single test case runs of > {{ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer}}
[jira] [Commented] (HADOOP-18439) Fix VectoredIO for LocalFileSystem when checksum is enabled.
[ https://issues.apache.org/jira/browse/HADOOP-18439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602464#comment-17602464 ] Mukund Thakur commented on HADOOP-18439: pushed to branch-3.3 > Fix VectoredIO for LocalFileSystem when checksum is enabled. > > > Key: HADOOP-18439 > URL: https://issues.apache.org/jira/browse/HADOOP-18439 > Project: Hadoop Common > Issue Type: Sub-task > Components: common >Affects Versions: 3.3.9 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > > While merging the ranges in CheckSumFs, they are rounded up based on the > checksum chunk size, which leads to some ranges crossing the EOF; > they need to be fixed, else they will cause an EOFException during actual reads.
[jira] [Resolved] (HADOOP-18439) Fix VectoredIO for LocalFileSystem when checksum is enabled.
[ https://issues.apache.org/jira/browse/HADOOP-18439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-18439. Resolution: Fixed > Fix VectoredIO for LocalFileSystem when checksum is enabled. > > > Key: HADOOP-18439 > URL: https://issues.apache.org/jira/browse/HADOOP-18439 > Project: Hadoop Common > Issue Type: Sub-task > Components: common >Affects Versions: 3.3.9 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.3.9 > > > While merging the ranges in CheckSumFs, they are rounded up based on the > checksum chunk size, which leads to some ranges crossing the EOF; > they need to be fixed, else they will cause an EOFException during actual reads.
[jira] [Updated] (HADOOP-18439) Fix VectoredIO for LocalFileSystem when checksum is enabled.
[ https://issues.apache.org/jira/browse/HADOOP-18439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur updated HADOOP-18439: --- Fix Version/s: 3.3.9 > Fix VectoredIO for LocalFileSystem when checksum is enabled. > > > Key: HADOOP-18439 > URL: https://issues.apache.org/jira/browse/HADOOP-18439 > Project: Hadoop Common > Issue Type: Sub-task > Components: common >Affects Versions: 3.3.9 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Major > Labels: pull-request-available > Fix For: 3.3.9 > > > While merging the ranges in CheckSumFs, they are rounded up based on the > checksum chunk size, which leads to some ranges crossing the EOF; > they need to be fixed, else they will cause an EOFException during actual reads.
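The EOF problem described in HADOOP-18439 comes down to clamping: after a merged range is rounded out to checksum-chunk boundaries, its end must be pulled back to the file length before the read is issued. A minimal sketch of that arithmetic (not the actual CheckSumFs code; plain longs stand in for Hadoop's FileRange, and the method name is illustrative):

```java
// Sketch of the fix described above: a range rounded out to checksum-chunk
// boundaries may run past EOF, so clamp the rounded end to the file size
// to avoid an EOFException during the actual read.
public class RangeClamp {
  /** Round [offset, offset+length) out to chunkSize boundaries, then clamp to fileLength. */
  static long[] roundAndClamp(long offset, long length, long chunkSize, long fileLength) {
    long start = (offset / chunkSize) * chunkSize;                            // round down
    long end = ((offset + length + chunkSize - 1) / chunkSize) * chunkSize;   // round up
    end = Math.min(end, fileLength);                                          // clamp to EOF
    return new long[] {start, end - start};
  }

  public static void main(String[] args) {
    // a 1000-byte file, 512-byte checksum chunks, a read of [600, 900):
    // rounding gives [512, 1024), which crosses EOF; clamping yields [512, 1000)
    long[] r = roundAndClamp(600, 300, 512, 1000);
    System.out.println(r[0] + " " + r[1]);   // prints "512 488"
  }
}
```

Without the Math.min, the rounded end of 1024 would be read against a 1000-byte file, which is exactly the EOFException the issue reports.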
[jira] [Assigned] (HADOOP-18347) Restrict vectoredIO threadpool to reduce memory pressure
[ https://issues.apache.org/jira/browse/HADOOP-18347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur reassigned HADOOP-18347: -- Assignee: Mukund Thakur > Restrict vectoredIO threadpool to reduce memory pressure > > > Key: HADOOP-18347 > URL: https://issues.apache.org/jira/browse/HADOOP-18347 > Project: Hadoop Common > Issue Type: Sub-task > Components: common, fs, fs/adl, fs/s3 >Reporter: Rajesh Balamohan >Assignee: Mukund Thakur >Priority: Major > Labels: performance > > https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L964-L967 > Currently, it fetches all the ranges with unbounded threadpool. This will not > cause memory pressures with standard benchmarks like TPCDS. However, when > large number of ranges are present with large files, this could potentially > spike up memory usage of the task. Limiting the threadpool size could reduce > the memory usage.
[jira] [Commented] (HADOOP-18447) Vectored IO: Threadpool should be closed on interrupts or during close calls
[ https://issues.apache.org/jira/browse/HADOOP-18447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601976#comment-17601976 ] Mukund Thakur commented on HADOOP-18447: Currently the threadpool is a shared unbounded one, but it will be moved to a bounded one. We are, however, terminating the running vectored IO operations when the stream is closed: [https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L114] https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L611 > Vectored IO: Threadpool should be closed on interrupts or during close calls > > > Key: HADOOP-18447 > URL: https://issues.apache.org/jira/browse/HADOOP-18447 > Project: Hadoop Common > Issue Type: Sub-task > Components: common, fs, fs/adl, fs/s3 >Reporter: Rajesh Balamohan >Priority: Major > Labels: performance, stability > Attachments: Screenshot 2022-09-08 at 9.22.07 AM.png > > > Vectored IO threadpool should be closed on any interrupts or during > S3AFileSystem/S3AInputStream close() calls. > E.g Query which got cancelled in the middle of the run. However, in > background (e.g LLAP) vectored IO threads continued to run. > > !Screenshot 2022-09-08 at 9.22.07 AM.png|width=537,height=164!
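The lifecycle discussed in this thread — a bounded pool running the range reads, with close() setting a stop flag that in-flight tasks check and shutting the pool down — can be sketched generically as follows. This is an illustrative pattern, not the S3AInputStream code; the field and method names are assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Generic sketch of the lifecycle discussed above: a *bounded* pool runs the
// range reads, close() flips a stop flag that each task checks before doing
// work, and shutdownNow() interrupts anything still queued or blocked.
public class VectoredIOLifecycle implements AutoCloseable {
  private final ExecutorService pool = Executors.newFixedThreadPool(4); // bounded, not unbounded
  private final AtomicBoolean stopVectoredIO = new AtomicBoolean(false);
  final AtomicInteger completed = new AtomicInteger();

  void readRange(int rangeId) {
    pool.submit(() -> {
      if (stopVectoredIO.get()) {
        return;                      // abandon the work once the stream is closed
      }
      completed.incrementAndGet();   // stands in for fetching and populating a buffer
    });
  }

  @Override
  public void close() {
    stopVectoredIO.set(true);        // running tasks see this at their next check
    pool.shutdownNow();              // interrupt queued or blocked tasks
  }
}
```

The bounded pool addresses the memory-pressure half of the complaint (HADOOP-18347); the flag plus shutdownNow addresses the "threads continued to run after cancellation" half.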