[jira] [Assigned] (HADOOP-19087) Release Hadoop 3.4.1

2024-07-17 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur reassigned HADOOP-19087:
--

Assignee: Mukund Thakur

> Release Hadoop 3.4.1
> 
>
> Key: HADOOP-19087
> URL: https://issues.apache.org/jira/browse/HADOOP-19087
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Mukund Thakur
>Priority: Major
>
> Release a minor update to hadoop 3.4.0 with
> * packaging enhancements
> * updated dependencies (where viable)
> * fixes for critical issues found after 3.4.0 released
> * low-risk feature enhancements (those which don't impact schedule...)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17826) ABFS: Transient failure of TestAbfsClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting

2024-07-15 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17866137#comment-17866137
 ] 

Mukund Thakur commented on HADOOP-17826:


I saw this today as well. 

cc [~pranavs]  [~snvijaya] 

> ABFS: Transient failure of 
> TestAbfsClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting
> --
>
> Key: HADOOP-17826
> URL: https://issues.apache.org/jira/browse/HADOOP-17826
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure, test
>Affects Versions: 3.4.0
>Reporter: Sumangala Patki
>Priority: Major
>
> Transient failure of the below test observed for HNS OAuth, AppendBlob HNS 
> OAuth and Non-HNS SharedKey combinations. The value denoted by "actual value" 
> below varies across failures, and exceeds the upper limit of the expected 
> range.
> _TestAbfsClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting:171->fuzzyValidate:49
>  The actual value 10 is not within the expected range: [5.60, 8.40]._
> Verified failure with client and server in the same region to rule out 
> network issues.
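
The failure message above can be read as a tolerance check. The reported range [5.60, 8.40] is consistent with an expected value of 7 and a 20% tolerance; that reading is an inference from the numbers, not something the issue states. A minimal sketch under that assumption follows. The class and method names are hypothetical and are not the test's real helper.

```java
// Hypothetical sketch of a "fuzzy" range check; not the actual ABFS test code.
final class FuzzyRange {
    private FuzzyRange() {}

    /** Returns true when actual lies within expected +/- (expected * tolerance). */
    static boolean withinTolerance(double expected, double actual, double tolerance) {
        double lower = expected * (1.0 - tolerance);
        double upper = expected * (1.0 + tolerance);
        return actual >= lower && actual <= upper;
    }

    public static void main(String[] args) {
        // Assumed values: expected 7, tolerance 20%, observed 10 as in the report.
        System.out.println(withinTolerance(7.0, 10.0, 0.2));
        System.out.println(withinTolerance(7.0, 8.0, 0.2));
    }
}
```

With a timing-based test, an observed value of 10 against this window suggests the measured sleep or wait exceeded the window on a loaded machine, which would explain why the failure is transient.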






[jira] [Updated] (HADOOP-18610) ABFS OAuth2 Token Provider to support Azure Workload Identity for AKS

2024-06-18 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18610:
---
Fix Version/s: 3.3.9
   3.4.1

> ABFS OAuth2 Token Provider to support Azure Workload Identity for AKS
> -
>
> Key: HADOOP-18610
> URL: https://issues.apache.org/jira/browse/HADOOP-18610
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 3.3.4
>Reporter: Haifeng Chen
>Assignee: Anuj Modi
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.4.1
>
> Attachments: HADOOP-18610-preview.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In Jan 2023, Microsoft Azure AKS replaced its original pod-managed identity 
> with [Azure Active Directory (Azure AD) workload 
> identities|https://learn.microsoft.com/en-us/azure/active-directory/develop/workload-identities-overview]
>  (preview), which integrate with the Kubernetes native capabilities to 
> federate with any external identity providers. This approach is simpler to 
> use and deploy.
> Refer to 
> [https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview|https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview.]
>  and [https://azure.github.io/azure-workload-identity/docs/introduction.html] 
> for more details.
> The basic use scenario is to access Azure cloud resources (such as cloud 
> storage) from a Kubernetes (such as AKS) workload using an Azure managed 
> identity federated with a Kubernetes service account. The credential 
> environment variables projected into the pod by Azure AD workload identity 
> are as follows:
> AZURE_AUTHORITY_HOST: (Injected by the webhook, 
> [https://login.microsoftonline.com/])
> AZURE_CLIENT_ID: (Injected by the webhook)
> AZURE_TENANT_ID: (Injected by the webhook)
> AZURE_FEDERATED_TOKEN_FILE: (Injected by the webhook, 
> /var/run/secrets/azure/tokens/azure-identity-token)
> The token in the file pointed to by AZURE_FEDERATED_TOKEN_FILE is a JWT (JSON 
> Web Token) client assertion token, which we can send to 
> AZURE_AUTHORITY_HOST (the URL is AZURE_AUTHORITY_HOST + tenantId + 
> "/oauth2/v2.0/token") to request an AD token that can be used to directly 
> access the Azure cloud resources.
> This approach is very common and similar among cloud providers such as AWS 
> and GCP. Hadoop AWS integration has WebIdentityTokenCredentialProvider to 
> handle the same case.
> The existing MsiTokenProvider can only handle the managed identity associated 
> with an Azure VM instance. We need to implement a WorkloadIdentityTokenProvider 
> that handles the Azure Workload Identity case. For this, we need to add one 
> method (getTokenUsingJWTAssertion) to AzureADAuthenticator, which will be used 
> by WorkloadIdentityTokenProvider.
>  
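
The token request described above can be sketched as follows. This only builds the endpoint URL and the form body; it sends nothing over the network. The class and method names are hypothetical and are not the WorkloadIdentityTokenProvider or AzureADAuthenticator code. The form fields follow the standard OAuth 2.0 client-credentials flow with a JWT client assertion, and the scope value in the usage note is an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical sketch: builds the request pieces only, performs no HTTP call.
final class WorkloadIdentityRequestSketch {
    private WorkloadIdentityRequestSketch() {}

    // Standard OAuth 2.0 assertion type for JWT client assertions.
    static final String JWT_BEARER =
        "urn:ietf:params:oauth:client-assertion-type:jwt-bearer";

    /** AZURE_AUTHORITY_HOST + tenantId + "/oauth2/v2.0/token", tolerating a missing trailing slash. */
    static String tokenEndpoint(String authorityHost, String tenantId) {
        String host = authorityHost.endsWith("/") ? authorityHost : authorityHost + "/";
        return host + tenantId + "/oauth2/v2.0/token";
    }

    /** URL-encoded form body carrying the federated token as the client assertion. */
    static String formBody(String clientId, String jwtAssertion, String scope) {
        return "grant_type=client_credentials"
            + "&client_id=" + enc(clientId)
            + "&client_assertion_type=" + enc(JWT_BEARER)
            + "&client_assertion=" + enc(jwtAssertion)
            + "&scope=" + enc(scope);
    }

    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Falls back to placeholder values when the webhook variables are absent.
        String host = System.getenv().getOrDefault(
            "AZURE_AUTHORITY_HOST", "https://login.microsoftonline.com/");
        String tenant = System.getenv().getOrDefault("AZURE_TENANT_ID", "example-tenant");
        System.out.println(tokenEndpoint(host, tenant));
    }
}
```

In a pod, the assertion would be the trimmed contents of the file named by AZURE_FEDERATED_TOKEN_FILE, and a scope such as "https://storage.azure.com/.default" would be a plausible choice for storage access.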






[jira] [Resolved] (HADOOP-19196) Bulk delete api doesn't take the path to delete as the base path

2024-06-11 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-19196.

Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> Bulk delete api doesn't take the path to delete as the base path
> 
>
> Key: HADOOP-19196
> URL: https://issues.apache.org/jira/browse/HADOOP-19196
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Steve Loughran
>Assignee: Mukund Thakur
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> If you use the path of the file you intend to delete as the base path, you 
> get an error. This is because the validation requires the list to be of 
> children, but the base path itself should be valid.
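
The difference between the two validation rules can be sketched as follows. This is an illustration only: it uses java.nio.file.Path rather than Hadoop's Path class, and the class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of the two validation rules; not the Hadoop implementation.
final class BulkDeletePathCheckSketch {
    private BulkDeletePathCheckSketch() {}

    /** Stricter rule: the candidate must be a descendant of the base path. */
    static boolean isStrictDescendant(Path base, Path candidate) {
        Path b = base.normalize();
        Path c = candidate.normalize();
        return !c.equals(b) && c.startsWith(b);
    }

    /** Relaxed rule: the base path itself is also accepted. */
    static boolean isBaseOrDescendant(Path base, Path candidate) {
        return candidate.normalize().startsWith(base.normalize());
    }

    public static void main(String[] args) {
        Path file = Paths.get("/data/part-0000");
        // Using the file to delete as its own base path.
        System.out.println(isStrictDescendant(file, file));
        System.out.println(isBaseOrDescendant(file, file));
    }
}
```

Path.startsWith compares whole name elements, so a sibling such as /database is not treated as being under /data.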






[jira] [Updated] (HADOOP-19137) [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if Customer-provided-key configs given.

2024-06-11 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-19137:
---
Fix Version/s: 3.4.1

> [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if 
> Customer-provided-key configs given.
> --
>
> Key: HADOOP-19137
> URL: https://issues.apache.org/jira/browse/HADOOP-19137
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> The store does not pass the namespace information to the client. 
> In https://github.com/apache/hadoop/pull/6221, getIsNamespaceEnabled was added 
> to the client methods. It checks whether the namespace information is present 
> and, if it is not, makes a getAcl call and sets the field. Once the field is 
> set, it is used by future getIsNamespaceEnabled calls for a given AbfsClient.
> Since CPK, both global and encryptionContext, applies only to HNS accounts, 
> the proposed fix is to fail fs init if the account is non-HNS and a CPK 
> config is given.
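
The proposed fail-fast check can be sketched as follows. This is a minimal illustration, not the ABFS implementation: the class name, method name, and parameters are hypothetical stand-ins for however the filesystem reads its CPK settings.

```java
// Hypothetical sketch of a fail-fast initialization check; not the ABFS code.
final class CpkInitCheckSketch {
    private CpkInitCheckSketch() {}

    /**
     * Throws when a customer-provided-key setting is present but the account
     * does not have the hierarchical namespace (HNS) enabled.
     */
    static void validate(boolean namespaceEnabled,
                         String globalEncryptionKey,
                         String encryptionContextProvider) {
        boolean cpkConfigured = isSet(globalEncryptionKey) || isSet(encryptionContextProvider);
        if (!namespaceEnabled && cpkConfigured) {
            throw new IllegalStateException(
                "Customer-provided-key configs are only supported on HNS accounts");
        }
    }

    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) {
        validate(true, "example-key", null); // HNS account with CPK: accepted
        try {
            validate(false, "example-key", null); // non-HNS account with CPK
        } catch (IllegalStateException e) {
            System.out.println("init rejected: " + e.getMessage());
        }
    }
}
```

Failing at initialization surfaces the misconfiguration once, up front, rather than on the first request that would have used the key.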






[jira] [Resolved] (HADOOP-19137) [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if Customer-provided-key configs given.

2024-06-11 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-19137.

Resolution: Fixed

> [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if 
> Customer-provided-key configs given.
> --
>
> Key: HADOOP-19137
> URL: https://issues.apache.org/jira/browse/HADOOP-19137
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> The store does not pass the namespace information to the client. 
> In https://github.com/apache/hadoop/pull/6221, getIsNamespaceEnabled was added 
> to the client methods. It checks whether the namespace information is present 
> and, if it is not, makes a getAcl call and sets the field. Once the field is 
> set, it is used by future getIsNamespaceEnabled calls for a given AbfsClient.
> Since CPK, both global and encryptionContext, applies only to HNS accounts, 
> the proposed fix is to fail fs init if the account is non-HNS and a CPK 
> config is given.






[jira] [Updated] (HADOOP-19137) [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if Customer-provided-key configs given.

2024-06-10 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-19137:
---
Fix Version/s: 3.5.0

> [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if 
> Customer-provided-key configs given.
> --
>
> Key: HADOOP-19137
> URL: https://issues.apache.org/jira/browse/HADOOP-19137
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> The store does not pass the namespace information to the client. 
> In https://github.com/apache/hadoop/pull/6221, getIsNamespaceEnabled was added 
> to the client methods. It checks whether the namespace information is present 
> and, if it is not, makes a getAcl call and sets the field. Once the field is 
> set, it is used by future getIsNamespaceEnabled calls for a given AbfsClient.
> Since CPK, both global and encryptionContext, applies only to HNS accounts, 
> the proposed fix is to fail fs init if the account is non-HNS and a CPK 
> config is given.






[jira] [Updated] (HADOOP-19137) [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if Customer-provided-key configs given.

2024-06-10 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-19137:
---
Summary: [ABFS]Prevent ABFS initialization for non-hierarchal-namespace 
account if Customer-provided-key configs given.  (was: [ABFS]:Extra getAcl call 
while calling the very first API of FileSystem)

> [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if 
> Customer-provided-key configs given.
> --
>
> Key: HADOOP-19137
> URL: https://issues.apache.org/jira/browse/HADOOP-19137
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
>
> The store does not pass the namespace information to the client. 
> In https://github.com/apache/hadoop/pull/6221, getIsNamespaceEnabled was added 
> to the client methods. It checks whether the namespace information is present 
> and, if it is not, makes a getAcl call and sets the field. Once the field is 
> set, it is used by future getIsNamespaceEnabled calls for a given AbfsClient.
> Since CPK, both global and encryptionContext, applies only to HNS accounts, 
> the proposed fix is to fail fs init if the account is non-HNS and a CPK 
> config is given.






[jira] [Updated] (HADOOP-18679) Add API for bulk/paged delete of files and objects

2024-06-06 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18679:
---
Description: 
iceberg and hbase could benefit from being able to give a list of individual 
files to delete -files which may be scattered round the bucket for better read 
performance.

Add some new optional interface for an object store which allows a caller to 
submit a list of paths to files to delete, where
the expectation is
 * if a path is a file: delete
 * if a path is a dir, outcome undefined
For s3 that'd let us build these into DeleteRequest objects, and submit, 
without any probes first.

{quote}Cherrypicking
{quote}
when cherrypicking, you must include
 * followup commit #6854
 * https://issues.apache.org/jira/browse/HADOOP-19196
 * test fixes HADOOP-19814 and HADOOP-19188

  was:
iceberg and hbase could benefit from being able to give a list of individual 
files to delete -files which may be scattered round the bucket for better read 
performance. 

Add some new optional interface for an object store which allows a caller to 
submit a list of paths to files to delete, where
the expectation is
* if a path is a file: delete
* if a path is a dir, outcome undefined
For s3 that'd let us build these into DeleteRequest objects, and submit, 
without any probes first.

bq. Cherrypicking

when cherrypicking, you must include

* followup commit #6854
* test fixes HADOOP-19814 and HADOOP-19188



> Add API for bulk/paged delete of files and objects
> --
>
> Key: HADOOP-18679
> URL: https://issues.apache.org/jira/browse/HADOOP-18679
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.5
>Reporter: Steve Loughran
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> iceberg and hbase could benefit from being able to give a list of individual 
> files to delete -files which may be scattered round the bucket for better 
> read performance.
> Add some new optional interface for an object store which allows a caller to 
> submit a list of paths to files to delete, where
> the expectation is
>  * if a path is a file: delete
>  * if a path is a dir, outcome undefined
> For s3 that'd let us build these into DeleteRequest objects, and submit, 
> without any probes first.
> {quote}Cherrypicking
> {quote}
> when cherrypicking, you must include
>  * followup commit #6854
>  * https://issues.apache.org/jira/browse/HADOOP-19196
>  * test fixes HADOOP-19814 and HADOOP-19188
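
The "paged" part of the proposal can be sketched as follows: a caller holding a long list of file paths splits it into pages no larger than the store's page size, and each page then becomes one delete request with no per-path probes. This is a minimal illustration with hypothetical names, not the API added by this issue. The page size of 1000 in the usage note reflects the S3 limit on keys per multi-object delete request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of paging a delete list; not the Hadoop bulk delete API.
final class BulkDeletePagingSketch {
    private BulkDeletePagingSketch() {}

    /** Splits paths into consecutive pages of at most pageSize entries each. */
    static <T> List<List<T>> pages(List<T> paths, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be at least 1");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < paths.size(); start += pageSize) {
            int end = Math.min(start + pageSize, paths.size());
            // Copy the sublist so each page is independent of the input list.
            result.add(new ArrayList<>(paths.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("data/a", "data/b", "data/c");
        for (List<String> page : pages(files, 2)) {
            System.out.println("one delete request for: " + page);
        }
    }
}
```

For S3 a page size of up to 1000 would fit, while a store without a native bulk call could report a page size of 1 and delete each file individually.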






[jira] [Commented] (HADOOP-19196) Bulk delete api doesn't take the path to delete as the base path

2024-06-06 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17852954#comment-17852954
 ] 

Mukund Thakur commented on HADOOP-19196:


good catch. 

> Bulk delete api doesn't take the path to delete as the base path
> 
>
> Key: HADOOP-19196
> URL: https://issues.apache.org/jira/browse/HADOOP-19196
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Steve Loughran
>Assignee: Mukund Thakur
>Priority: Minor
>
> If you use the path of the file you intend to delete as the base path, you 
> get an error. This is because the validation requires the list to be of 
> children, but the base path itself should be valid.






[jira] [Resolved] (HADOOP-19190) Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes when bucket not encrypted with sse-kms

2024-06-03 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-19190.

Resolution: Fixed

> Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes 
> when bucket not encrypted with sse-kms
> 
>
> Key: HADOOP-19190
> URL: https://issues.apache.org/jira/browse/HADOOP-19190
> Project: Hadoop Common
>  Issue Type: Test
>  Components: fs/s3
>Affects Versions: 3.4.1
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 12.80 s <<< FAILURE! -- in org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings
> [ERROR] org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes -- Time elapsed: 5.065 s <<< FAILURE!
> org.junit.ComparisonFailure: [Server side encryption algorithm must match] expected:<"[aws:kms]"> but was:<"[AES256]">
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at org.apache.hadoop.fs.s3a.EncryptionTestUtils.validateEncryptionFileAttributes(EncryptionTestUtils.java:138)
>     at org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes(ITestS3AEncryptionWithDefaultS3Settings.java:112)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  






[jira] [Updated] (HADOOP-19190) Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes when bucket not encrypted with sse-kms

2024-06-03 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-19190:
---
Fix Version/s: 3.4.1

> Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes 
> when bucket not encrypted with sse-kms
> 
>
> Key: HADOOP-19190
> URL: https://issues.apache.org/jira/browse/HADOOP-19190
> Project: Hadoop Common
>  Issue Type: Test
>  Components: fs/s3
>Affects Versions: 3.4.1
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 12.80 s <<< FAILURE! -- in org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings
> [ERROR] org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes -- Time elapsed: 5.065 s <<< FAILURE!
> org.junit.ComparisonFailure: [Server side encryption algorithm must match] expected:<"[aws:kms]"> but was:<"[AES256]">
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at org.apache.hadoop.fs.s3a.EncryptionTestUtils.validateEncryptionFileAttributes(EncryptionTestUtils.java:138)
>     at org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes(ITestS3AEncryptionWithDefaultS3Settings.java:112)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  






[jira] [Commented] (HADOOP-19013) fs.getXattrs(path) for S3FS doesn't have x-amz-server-side-encryption-aws-kms-key-id header.

2024-05-31 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17851217#comment-17851217
 ] 

Mukund Thakur commented on HADOOP-19013:


yes, you are right. thanks 

https://github.com/apache/hadoop/pull/6859/files

> fs.getXattrs(path) for S3FS doesn't have 
> x-amz-server-side-encryption-aws-kms-key-id header.
> 
>
> Key: HADOOP-19013
> URL: https://issues.apache.org/jira/browse/HADOOP-19013
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.6
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> When a file is uploaded with SSE-KMS encryption using a key id, reading the 
> attributes of that file later does not return the key id as an 
> attribute. Should we add it?
>  
> When cherry-picking, please include 
> https://issues.apache.org/jira/browse/HADOOP-19190






[jira] [Updated] (HADOOP-19013) fs.getXattrs(path) for S3FS doesn't have x-amz-server-side-encryption-aws-kms-key-id header.

2024-05-31 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-19013:
---
Description: 
When a file is uploaded with SSE-KMS encryption using a key id, reading the 
attributes of that file later does not return the key id as an 
attribute. Should we add it?

When cherry-picking, please include 
https://issues.apache.org/jira/browse/HADOOP-19190

  was:When a file is uploaded with SSE-KMS encryption using a key id, reading 
the attributes of that file later does not return the key id as an 
attribute. Should we add it?


> fs.getXattrs(path) for S3FS doesn't have 
> x-amz-server-side-encryption-aws-kms-key-id header.
> 
>
> Key: HADOOP-19013
> URL: https://issues.apache.org/jira/browse/HADOOP-19013
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.6
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> When a file is uploaded with SSE-KMS encryption using a key id, reading the 
> attributes of that file later does not return the key id as an 
> attribute. Should we add it?
>  
> When cherry-picking, please include 
> https://issues.apache.org/jira/browse/HADOOP-19190






[jira] [Created] (HADOOP-19190) Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes when bucket not encrypted with sse-kms

2024-05-31 Thread Mukund Thakur (Jira)
Mukund Thakur created HADOOP-19190:
--

 Summary: Skip 
ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes when 
bucket not encrypted with sse-kms
 Key: HADOOP-19190
 URL: https://issues.apache.org/jira/browse/HADOOP-19190
 Project: Hadoop Common
  Issue Type: Test
  Components: fs/s3
Affects Versions: 3.4.1
Reporter: Mukund Thakur
Assignee: Mukund Thakur


[ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 12.80 s <<< FAILURE! -- in org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings
[ERROR] org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes -- Time elapsed: 5.065 s <<< FAILURE!
org.junit.ComparisonFailure: [Server side encryption algorithm must match] expected:<"[aws:kms]"> but was:<"[AES256]">
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at org.apache.hadoop.fs.s3a.EncryptionTestUtils.validateEncryptionFileAttributes(EncryptionTestUtils.java:138)
    at org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes(ITestS3AEncryptionWithDefaultS3Settings.java:112)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
 






[jira] [Updated] (HADOOP-19188) TestHarFileSystem and TestFilterFileSystem failing after bulk delete API added

2024-05-30 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-19188:
---
Fix Version/s: 3.4.1

> TestHarFileSystem and TestFilterFileSystem failing after bulk delete API added
> --
>
> Key: HADOOP-19188
> URL: https://issues.apache.org/jira/browse/HADOOP-19188
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs, test
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Steve Loughran
>Assignee: Mukund Thakur
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> We need to update a couple of tests so they know not to worry about the 
> new interface/method. The details are in the javadocs of FileSystem.
> Interestingly, these snuck through Yetus, though they fail in PRs based atop 
> #6726
> {code}
> [ERROR] Failures: 
> [ERROR] org.apache.hadoop.fs.TestFilterFileSystem.testFilterFileSystem
> [ERROR]   Run 1: TestFilterFileSystem.testFilterFileSystem:181 1 methods were not overridden correctly - see log
> [ERROR]   Run 2: TestFilterFileSystem.testFilterFileSystem:181 1 methods were not overridden correctly - see log
> [ERROR]   Run 3: TestFilterFileSystem.testFilterFileSystem:181 1 methods were not overridden correctly - see log
> [INFO] 
> [ERROR] org.apache.hadoop.fs.TestHarFileSystem.testInheritedMethodsImplemented
> [ERROR]   Run 1: TestHarFileSystem.testInheritedMethodsImplemented:402 1 methods were not overridden correctly - see log
> [ERROR]   Run 2: TestHarFileSystem.testInheritedMethodsImplemented:402 1 methods were not overridden correctly - see log
> [ERROR]   Run 3: TestHarFileSystem.testInheritedMethodsImplemented:402 1 methods were not overridden correctly - see log
> {code}
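
The kind of check these tests perform can be sketched with reflection: every public method of a base class must be redeclared by the wrapping subclass unless it is listed in an exclusion interface. This is an illustration under that assumption, not the actual test code. Base, Filter, and MustNotImplement here are hypothetical stand-ins defined inside the sketch.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of an "all methods overridden" check; not the Hadoop tests.
final class OverrideCheckSketch {
    private OverrideCheckSketch() {}

    static class Base {
        public void open() {}
        public void bulkDelete() {}
    }

    static class Filter extends Base {
        @Override
        public void open() {}
    }

    /** Methods listed here are deliberately left to the base class. */
    interface MustNotImplement {
        void bulkDelete();
    }

    /** Names of public base methods the subclass neither redeclares nor excludes. */
    static List<String> notOverridden(Class<?> base, Class<?> sub, Class<?> exclusions) {
        List<String> missing = new ArrayList<>();
        for (Method m : base.getDeclaredMethods()) {
            if (!Modifier.isPublic(m.getModifiers()) || m.isSynthetic()) {
                continue;
            }
            if (declares(exclusions, m) || declares(sub, m)) {
                continue;
            }
            missing.add(m.getName());
        }
        Collections.sort(missing);
        return missing;
    }

    private static boolean declares(Class<?> type, Method m) {
        try {
            type.getDeclaredMethod(m.getName(), m.getParameterTypes());
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Without the exclusion the new method is reported; with it, nothing is.
        System.out.println(notOverridden(Base.class, Filter.class, Object.class));
        System.out.println(notOverridden(Base.class, Filter.class, MustNotImplement.class));
    }
}
```

Under this reading, adding a method to the base class without either overriding it in the wrapper or listing it as excluded produces the "1 methods were not overridden correctly" failure shown above.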






[jira] [Resolved] (HADOOP-18679) Add API for bulk/paged delete of files and objects

2024-05-28 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-18679.

Resolution: Fixed

> Add API for bulk/paged delete of files and objects
> --
>
> Key: HADOOP-18679
> URL: https://issues.apache.org/jira/browse/HADOOP-18679
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.5
>Reporter: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> iceberg and hbase could benefit from being able to give a list of individual 
> files to delete -files which may be scattered round the bucket for better 
> read performance. 
> Add some new optional interface for an object store which allows a caller to 
> submit a list of paths to files to delete, where
> the expectation is
> * if a path is a file: delete
> * if a path is a dir, outcome undefined
> For s3 that'd let us build these into DeleteRequest objects, and submit, 
> without any probes first.






[jira] [Resolved] (HADOOP-19184) TestStagingCommitter.testJobCommitFailure failing

2024-05-28 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-19184.

Fix Version/s: 3.4.1
   Resolution: Fixed

> TestStagingCommitter.testJobCommitFailure failing 
> --
>
> Key: HADOOP-19184
> URL: https://issues.apache.org/jira/browse/HADOOP-19184
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> {code:java}
> [INFO] 
> [ERROR] Failures: 
> [ERROR]   TestStagingCommitter.testJobCommitFailure:662 [Committed objects compared to deleted paths org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase$ClientResults@1b4ab85{ requests=12, uploads=12, parts=12, tagsByUpload=12, commits=5, aborts=7, deletes=0}] 
> Expecting:
>   <["s3a://bucket-name/output/path/r_0_0_0e1f4790-4d3f-4abb-ba98-2b39ec8b7566",
>     "s3a://bucket-name/output/path/r_0_0_92306fea-0219-4ba5-a2b6-091d95547c11",
>     "s3a://bucket-name/output/path/r_1_1_016c4a25-a1f7-4e01-918e-e24a32c7525f",
>     "s3a://bucket-name/output/path/r_0_0_b2698dab-5870-4bdb-98ab-0ef5832eca45",
>     "s3a://bucket-name/output/path/r_1_1_600b7e65-a7ff-4d07-b763-c4339a9164ad"]>
> to contain exactly in any order:
>   <[]>
> but the following elements were unexpected:
>   <["s3a://bucket-name/output/path/r_0_0_0e1f4790-4d3f-4abb-ba98-2b39ec8b7566",
>     "s3a://bucket-name/output/path/r_0_0_92306fea-0219-4ba5-a2b6-091d95547c11",
>     "s3a://bucket-name/output/path/r_1_1_016c4a25-a1f7-4e01-918e-e24a32c7525f",
>     "s3a://bucket-name/output/path/r_0_0_b2698dab-5870-4bdb-98ab-0ef5832eca45",{code}






[jira] [Updated] (HADOOP-18679) Add API for bulk/paged delete of files and objects

2024-05-28 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18679:
---
Fix Version/s: 3.4.1

> Add API for bulk/paged delete of files and objects
> --
>
> Key: HADOOP-18679
> URL: https://issues.apache.org/jira/browse/HADOOP-18679
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.5
>Reporter: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> Iceberg and HBase could benefit from being able to give a list of individual 
> files to delete: files which may be scattered around the bucket for better 
> read performance. 
> Add a new optional interface for an object store which allows a caller to 
> submit a list of paths to files to delete, where
> the expectation is
> * if a path is a file: delete
> * if a path is a dir: outcome undefined
> For S3 that would let us build these into DeleteRequest objects and submit 
> them without any probes first.






[jira] [Updated] (HADOOP-19184) TestStagingCommitter.testJobCommitFailure failing

2024-05-22 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-19184:
---
Description: 
{code:java}
[INFO] 
[ERROR] Failures: 
[ERROR]   TestStagingCommitter.testJobCommitFailure:662 [Committed objects 
compared to deleted paths 
org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase$ClientResults@1b4ab85{ 
requests=12, uploads=12, parts=12, tagsByUpload=12, commits=5, aborts=7, 
deletes=0}] 
Expecting:
  <["s3a://bucket-name/output/path/r_0_0_0e1f4790-4d3f-4abb-ba98-2b39ec8b7566",
    "s3a://bucket-name/output/path/r_0_0_92306fea-0219-4ba5-a2b6-091d95547c11",
    "s3a://bucket-name/output/path/r_1_1_016c4a25-a1f7-4e01-918e-e24a32c7525f",
    "s3a://bucket-name/output/path/r_0_0_b2698dab-5870-4bdb-98ab-0ef5832eca45",
    "s3a://bucket-name/output/path/r_1_1_600b7e65-a7ff-4d07-b763-c4339a9164ad"]>
to contain exactly in any order:
  <[]>
but the following elements were unexpected:
  <["s3a://bucket-name/output/path/r_0_0_0e1f4790-4d3f-4abb-ba98-2b39ec8b7566",
    "s3a://bucket-name/output/path/r_0_0_92306fea-0219-4ba5-a2b6-091d95547c11",
    "s3a://bucket-name/output/path/r_1_1_016c4a25-a1f7-4e01-918e-e24a32c7525f",
    
"s3a://bucket-name/output/path/r_0_0_b2698dab-5870-4bdb-98ab-0ef5832eca45",{code}

  was:[INFO] [ERROR] Failures: [ERROR] 
TestStagingCommitter.testJobCommitFailure:662 [Committed objects compared to 
deleted paths 
org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase$ClientResults@2de1acf4\{
 requests=12, uploads=12, parts=12, tagsByUpload=12, commits=5, aborts=7, 
deletes=0}] Expecting: 
<["s3a://bucket-name/output/path/r_0_0_c055250c-58c7-47ea-8b14-215cb5462e89", 
"s3a://bucket-name/output/path/r_1_1_9111aa65-96c2-465c-b278-696aff7707e3", 
"s3a://bucket-name/output/path/r_0_0_dec7f398-ee4e-4a53-a783-6b72cead569a", 
"s3a://bucket-name/output/path/r_1_1_39ad0eba-1053-4217-aa63-ddc8edfa7c64", 
"s3a://bucket-name/output/path/r_0_0_6c0518f6-7c1b-418f-a3e4-7db568880e6a"]> to 
contain exactly in any order: <[]> but the following elements were unexpected: 
<["s3a://bucket-name/output/path/r_0_0_c055250c-58c7-47ea-8b14-215cb5462e89", 
"s3a://bucket-name/output/path/r_1_1_9111aa65-96c2-465c-b278-696aff7707e3", 
"s3a://bucket-name/output/path/r_0_0_dec7f398-ee4e-4a53-a783-6b72cead569a", 
"s3a://bucket-name/output/path/r_1_1_39ad0eba-1053-4217-aa63-ddc8edfa7c64", 
"s3a://bucket-name/output/path/r_0_0_6c0518f6-7c1b-418f-a3e4-7db568880e6a"]>


> TestStagingCommitter.testJobCommitFailure failing 
> --
>
> Key: HADOOP-19184
> URL: https://issues.apache.org/jira/browse/HADOOP-19184
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Critical
>
> {code:java}
> [INFO] 
> [ERROR] Failures: 
> [ERROR]   TestStagingCommitter.testJobCommitFailure:662 [Committed objects 
> compared to deleted paths 
> org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase$ClientResults@1b4ab85{
>  requests=12, uploads=12, parts=12, tagsByUpload=12, commits=5, aborts=7, 
> deletes=0}] 
> Expecting:
>   
> <["s3a://bucket-name/output/path/r_0_0_0e1f4790-4d3f-4abb-ba98-2b39ec8b7566",
>     
> "s3a://bucket-name/output/path/r_0_0_92306fea-0219-4ba5-a2b6-091d95547c11",
>     
> "s3a://bucket-name/output/path/r_1_1_016c4a25-a1f7-4e01-918e-e24a32c7525f",
>     
> "s3a://bucket-name/output/path/r_0_0_b2698dab-5870-4bdb-98ab-0ef5832eca45",
>     
> "s3a://bucket-name/output/path/r_1_1_600b7e65-a7ff-4d07-b763-c4339a9164ad"]>
> to contain exactly in any order:
>   <[]>
> but the following elements were unexpected:
>   
> <["s3a://bucket-name/output/path/r_0_0_0e1f4790-4d3f-4abb-ba98-2b39ec8b7566",
>     
> "s3a://bucket-name/output/path/r_0_0_92306fea-0219-4ba5-a2b6-091d95547c11",
>     
> "s3a://bucket-name/output/path/r_1_1_016c4a25-a1f7-4e01-918e-e24a32c7525f",
>     
> "s3a://bucket-name/output/path/r_0_0_b2698dab-5870-4bdb-98ab-0ef5832eca45",{code}






[jira] [Created] (HADOOP-19184) TestStagingCommitter.testJobCommitFailure failing

2024-05-22 Thread Mukund Thakur (Jira)
Mukund Thakur created HADOOP-19184:
--

 Summary: TestStagingCommitter.testJobCommitFailure failing 
 Key: HADOOP-19184
 URL: https://issues.apache.org/jira/browse/HADOOP-19184
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Mukund Thakur
Assignee: Mukund Thakur


[INFO] [ERROR] Failures: [ERROR] TestStagingCommitter.testJobCommitFailure:662 
[Committed objects compared to deleted paths 
org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase$ClientResults@2de1acf4\{
 requests=12, uploads=12, parts=12, tagsByUpload=12, commits=5, aborts=7, 
deletes=0}] Expecting: 
<["s3a://bucket-name/output/path/r_0_0_c055250c-58c7-47ea-8b14-215cb5462e89", 
"s3a://bucket-name/output/path/r_1_1_9111aa65-96c2-465c-b278-696aff7707e3", 
"s3a://bucket-name/output/path/r_0_0_dec7f398-ee4e-4a53-a783-6b72cead569a", 
"s3a://bucket-name/output/path/r_1_1_39ad0eba-1053-4217-aa63-ddc8edfa7c64", 
"s3a://bucket-name/output/path/r_0_0_6c0518f6-7c1b-418f-a3e4-7db568880e6a"]> to 
contain exactly in any order: <[]> but the following elements were unexpected: 
<["s3a://bucket-name/output/path/r_0_0_c055250c-58c7-47ea-8b14-215cb5462e89", 
"s3a://bucket-name/output/path/r_1_1_9111aa65-96c2-465c-b278-696aff7707e3", 
"s3a://bucket-name/output/path/r_0_0_dec7f398-ee4e-4a53-a783-6b72cead569a", 
"s3a://bucket-name/output/path/r_1_1_39ad0eba-1053-4217-aa63-ddc8edfa7c64", 
"s3a://bucket-name/output/path/r_0_0_6c0518f6-7c1b-418f-a3e4-7db568880e6a"]>






[jira] [Resolved] (HADOOP-19013) fs.getXattrs(path) for S3FS doesn't have x-amz-server-side-encryption-aws-kms-key-id header.

2024-05-15 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-19013.

Resolution: Fixed

> fs.getXattrs(path) for S3FS doesn't have 
> x-amz-server-side-encryption-aws-kms-key-id header.
> 
>
> Key: HADOOP-19013
> URL: https://issues.apache.org/jira/browse/HADOOP-19013
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.6
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> Once a file has been uploaded with SSE-KMS encryption using a key id, 
> reading the attributes of the same file later doesn't return the key id 
> as an attribute. Should we add it?






[jira] [Updated] (HADOOP-19013) fs.getXattrs(path) for S3FS doesn't have x-amz-server-side-encryption-aws-kms-key-id header.

2024-05-15 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-19013:
---
Fix Version/s: 3.4.1

> fs.getXattrs(path) for S3FS doesn't have 
> x-amz-server-side-encryption-aws-kms-key-id header.
> 
>
> Key: HADOOP-19013
> URL: https://issues.apache.org/jira/browse/HADOOP-19013
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.6
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> Once a file has been uploaded with SSE-KMS encryption using a key id, 
> reading the attributes of the same file later doesn't return the key id 
> as an attribute. Should we add it?






[jira] [Created] (HADOOP-19177) TestS3ACachingBlockManager fails intermittently in Yetus

2024-05-15 Thread Mukund Thakur (Jira)
Mukund Thakur created HADOOP-19177:
--

 Summary: TestS3ACachingBlockManager fails intermittently in Yetus
 Key: HADOOP-19177
 URL: https://issues.apache.org/jira/browse/HADOOP-19177
 Project: Hadoop Common
  Issue Type: Test
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Mukund Thakur


{code:java}
[ERROR] 
org.apache.hadoop.fs.s3a.prefetch.TestS3ACachingBlockManager.testCachingOfGet 
-- Time elapsed: 60.45 s <<< ERROR!
java.lang.IllegalStateException: waitForCaching: expected: 1, actual: 0, read 
errors: 0, caching errors: 1
at 
org.apache.hadoop.fs.s3a.prefetch.TestS3ACachingBlockManager.waitForCaching(TestS3ACachingBlockManager.java:465)
at 
org.apache.hadoop.fs.s3a.prefetch.TestS3ACachingBlockManager.testCachingOfGetHelper(TestS3ACachingBlockManager.java:435)
at 
org.apache.hadoop.fs.s3a.prefetch.TestS3ACachingBlockManager.testCachingOfGet(TestS3ACachingBlockManager.java:398)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:750)

[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR] 
org.apache.hadoop.fs.s3a.prefetch.TestS3ACachingBlockManager.testCachingFailureOfGet
[ERROR]   Run 1: 
TestS3ACachingBlockManager.testCachingFailureOfGet:405->testCachingOfGetHelper:435->waitForCaching:465
 IllegalState waitForCaching: expected: 1, actual: 0, read errors: 0, caching 
errors: 1
[ERROR]   Run 2: 
TestS3ACachingBlockManager.testCachingFailureOfGet:405->testCachingOfGetHelper:435->waitForCaching:465
 IllegalState waitForCaching: expected: 1, actual: 0, read errors: 0, caching 
errors: 1
[ERROR]   Run 3: 
TestS3ACachingBlockManager.testCachingFailureOfGet:405->testCachingOfGetHelper:435->waitForCaching:465
 IllegalState waitForCaching: expected: 1, actual: 0, read errors: 0, caching 
errors: 1 {code}
Discovered in 
[https://github.com/apache/hadoop/pull/6646#issuecomment-2111558054] 






[jira] [Resolved] (HADOOP-19150) Test ITestAbfsRestOperationException#testAuthFailException is broken.

2024-04-29 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-19150.

Fix Version/s: 3.4.1
   Resolution: Fixed

> Test ITestAbfsRestOperationException#testAuthFailException is broken. 
> --
>
> Key: HADOOP-19150
> URL: https://issues.apache.org/jira/browse/HADOOP-19150
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Mukund Thakur
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> {code:java}
> intercept(Exception.class,
> () -> {
>   fs.getFileStatus(new Path("/"));
> }); {code}
> Intercept shouldn't be used as there are assertions in catch statements. 
>  
> CC [~ste...@apache.org]  [~anujmodi2021] [~asrani_anmol] 






[jira] [Created] (HADOOP-19150) Test ITestAbfsRestOperationException#testAuthFailException is broken.

2024-04-16 Thread Mukund Thakur (Jira)
Mukund Thakur created HADOOP-19150:
--

 Summary: Test 
ITestAbfsRestOperationException#testAuthFailException is broken. 
 Key: HADOOP-19150
 URL: https://issues.apache.org/jira/browse/HADOOP-19150
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Mukund Thakur


{code:java}
intercept(Exception.class,
() -> {
  fs.getFileStatus(new Path("/"));
}); {code}
Intercept shouldn't be used as there are assertions in catch statements. 
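
One reading of the problem, stated as an assumption rather than taken from the test source: if the failing call is wrapped in an intercept-style helper that returns the expected exception, the exception never reaches a surrounding catch block, so assertions placed there do not run. The sketch below uses a simplified stand-in helper (InterceptSketch.intercept is hypothetical, not Hadoop's test utility) to show that behaviour.

```java
import java.util.concurrent.Callable;

class InterceptSketch {
  /** Simplified stand-in: returns the expected exception instead of rethrowing it. */
  static <E extends Throwable> E intercept(Class<E> clazz, Callable<?> eval) {
    try {
      eval.call();
    } catch (Throwable t) {
      if (clazz.isInstance(t)) {
        return clazz.cast(t);
      }
      throw new AssertionError("unexpected exception type", t);
    }
    throw new AssertionError("expected " + clazz.getName() + ", nothing thrown");
  }

  /** Counts how often the catch block runs when the call is wrapped in intercept. */
  static int catchBlockRuns() {
    int runs = 0;
    try {
      intercept(IllegalStateException.class, () -> {
        throw new IllegalStateException("auth failed");
      });
    } catch (IllegalStateException e) {
      runs++; // assertions placed here are skipped: intercept swallowed the exception
    }
    return runs;
  }
}
```

Under this reading, the assertions belong either on the exception returned by the helper or in a plain try/catch without the helper.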

 

CC [~ste...@apache.org]  [~anujmodi2021] [~asrani_anmol] 






[jira] [Created] (HADOOP-19149) ABFS: Implement ThreadLocal for ObjectMapper in AzureHttpOperation via config option with static shared instance as an alternative.

2024-04-16 Thread Mukund Thakur (Jira)
Mukund Thakur created HADOOP-19149:
--

 Summary: ABFS: Implement ThreadLocal for ObjectMapper in 
AzureHttpOperation via config option with static shared instance as an 
alternative.
 Key: HADOOP-19149
 URL: https://issues.apache.org/jira/browse/HADOOP-19149
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Mukund Thakur
Assignee: Mukund Thakur


While doing internal tests on Hive TPCDS queries we have seen many instances of 
ObjectMapper being created in an Application Master, so sharing a thread-local 
ObjectMapper instance should improve performance.  
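
A rough sketch of the two options named in the summary: one instance per thread, or a single static shared instance, selected by a flag. SketchMapper is a hypothetical stand-in for an expensive-to-create mapper, and the boolean parameter stands in for the config option; neither is taken from the ABFS code.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an expensive-to-create object such as a JSON mapper. */
class SketchMapper {
  static final AtomicInteger CREATED = new AtomicInteger();

  SketchMapper() {
    CREATED.incrementAndGet();
  }
}

class ThreadLocalMapperSketch {
  /** One lazily created instance per thread. */
  private static final ThreadLocal<SketchMapper> PER_THREAD =
      ThreadLocal.withInitial(SketchMapper::new);

  /** Alternative: one instance shared by all threads. */
  private static final SketchMapper SHARED = new SketchMapper();

  /** The flag plays the role of the config option in this sketch. */
  static SketchMapper mapper(boolean useThreadLocal) {
    return useThreadLocal ? PER_THREAD.get() : SHARED;
  }
}
```

Repeated calls on the same thread return the same instance in either mode, so the number of mappers created no longer grows with the number of operations.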

 

CC [~ste...@apache.org]  [~harshit.gupta] 






[jira] [Commented] (HADOOP-18296) Memory fragmentation in ChecksumFileSystem Vectored IO implementation.

2024-04-15 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837408#comment-17837408
 ] 

Mukund Thakur commented on HADOOP-18296:


{quote}Mukund, do we actually need to coalesce ranges on local fs reads? 
because it is all local. we can just push out a list of independent regions.
{quote}
We are not merging in the default vectored read or in the raw local FS read 
implementation. We are, however, merging in the checksum FS. 

 
{quote}we do still need to deal with failures by adding the ability to return 
buffers to any pool on failure.
{quote}
 
If the read fails for any range, future.get() will throw an exception, and 
the caller can then return the buffer to the pool. As per the design, the 
management of buffers in a pool is handled by the caller of the API. 
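
A small sketch of that caller-side pattern, assuming a caller-owned pool. SketchPool and FailedReadSketch are hypothetical names, and the future is a plain CompletableFuture standing in for the per-range result.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

/** Hypothetical caller-owned buffer pool; not a Hadoop class. */
class SketchPool {
  private final Deque<ByteBuffer> free = new ArrayDeque<>();

  ByteBuffer take(int size) {
    ByteBuffer pooled = free.poll();
    return pooled != null && pooled.capacity() >= size ? pooled : ByteBuffer.allocate(size);
  }

  void release(ByteBuffer buffer) {
    buffer.clear();
    free.push(buffer);
  }

  int available() {
    return free.size();
  }
}

class FailedReadSketch {
  /** Waits for one range read; on failure the caller returns its buffer to the pool. */
  static boolean awaitRange(CompletableFuture<ByteBuffer> result, ByteBuffer buffer,
      SketchPool pool) {
    try {
      result.get();
      return true;
    } catch (ExecutionException e) {
      // The read failed: the buffer is still owned by the caller, so hand it back.
      pool.release(buffer);
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for the read", e);
    }
  }
}
```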

> Memory fragmentation in ChecksumFileSystem Vectored IO implementation.
> --
>
> Key: HADOOP-18296
> URL: https://issues.apache.org/jira/browse/HADOOP-18296
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: common
>Affects Versions: 3.4.0
>Reporter: Mukund Thakur
>Priority: Minor
>  Labels: fs
>
> As we have implemented merging of ranges in the ChecksumFSInputChecker 
> implementation of vectored IO api, it can lead to memory fragmentation. Let 
> me explain by example.
>  
> Suppose a client requests 3 ranges: 
> 0-500, 700-1000 and 1200-1500.
> Because of merging, all of the above ranges get merged into one and we 
> allocate one big byte buffer covering 0-1500, but return sliced byte buffers 
> for the desired ranges.
> Once the client is done reading all the ranges, it will only be able to 
> free the memory for the requested ranges; the memory of the gaps (here 
> 500-700 and 1000-1200) will never be released.
>  
> Note this only happens for direct byte buffers.
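
The example in the description can be reproduced with plain java.nio buffers. This is a sketch only: MergedBufferSketch and its methods are hypothetical helpers, not the ChecksumFSInputChecker code. It shows that the per-range buffers are views over one direct allocation, and how many bytes of that allocation belong to no requested range.

```java
import java.nio.ByteBuffer;

class MergedBufferSketch {
  /** Returns a view of [start, end) of the merged buffer; no bytes are copied. */
  static ByteBuffer sliceOf(ByteBuffer merged, int start, int end) {
    ByteBuffer view = merged.duplicate();
    view.position(start);
    view.limit(end);
    return view.slice();
  }

  /**
   * Bytes covered by the merged range but requested by no caller range.
   * Assumes the ranges are sorted and do not overlap; each entry is {start, end}.
   */
  static int gapBytes(int[][] ranges) {
    int mergedStart = ranges[0][0];
    int mergedEnd = ranges[ranges.length - 1][1];
    int requested = 0;
    for (int[] range : ranges) {
      requested += range[1] - range[0];
    }
    return (mergedEnd - mergedStart) - requested;
  }
}
```

For the ranges 0-500, 700-1000 and 1200-1500 the merged allocation is 1500 bytes, the requested ranges add up to 1100 bytes, and the two gaps account for the remaining 400.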






[jira] [Commented] (HADOOP-18296) Memory fragmentation in ChecksumFileSystem Vectored IO implementation.

2024-04-11 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836272#comment-17836272
 ] 

Mukund Thakur commented on HADOOP-18296:


Yes, it is, although direct buffers are not used in ORC/Parquet. I am thinking 
about whether we should throw an exception if the user calls readVectored with 
direct buffers, something like: 

 

 
{code:java}
class ChecksumFSInputChecker {
  ...
  ...
  @Override
  public void readVectored(List<? extends FileRange> ranges,
      IntFunction<ByteBuffer> allocate) throws IOException {
    // Probe the allocator and reject callers that hand out direct buffers.
    if (allocate.apply(0).isDirect()) {
      throw new UnsupportedOperationException("Direct buffer is not supported");
    }
  }
}{code}
cc [~ste...@apache.org] 

 

 

> Memory fragmentation in ChecksumFileSystem Vectored IO implementation.
> --
>
> Key: HADOOP-18296
> URL: https://issues.apache.org/jira/browse/HADOOP-18296
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: common
>Affects Versions: 3.4.0
>Reporter: Mukund Thakur
>Priority: Minor
>  Labels: fs
>
> As we have implemented merging of ranges in the ChecksumFSInputChecker 
> implementation of vectored IO api, it can lead to memory fragmentation. Let 
> me explain by example.
>  
> Suppose a client requests 3 ranges: 
> 0-500, 700-1000 and 1200-1500.
> Because of merging, all of the above ranges get merged into one and we 
> allocate one big byte buffer covering 0-1500, but return sliced byte buffers 
> for the desired ranges.
> Once the client is done reading all the ranges, it will only be able to 
> free the memory for the requested ranges; the memory of the gaps (here 
> 500-700 and 1000-1200) will never be released.
>  
> Note this only happens for direct byte buffers.






[jira] [Commented] (HADOOP-17826) ABFS: Transient failure of TestAbfsClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting

2024-04-03 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833737#comment-17833737
 ] 

Mukund Thakur commented on HADOOP-17826:


I am seeing this now. 
{code:java}
[ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 25.862 
s <<< FAILURE! - in 
org.apache.hadoop.fs.azurebfs.services.TestAbfsClientThrottlingAnalyzer
[ERROR] 
testManySuccessAndErrorsAndWaiting(org.apache.hadoop.fs.azurebfs.services.TestAbfsClientThrottlingAnalyzer)
  Time elapsed: 1.154 s  <<< FAILURE!
java.lang.AssertionError: The actual value 9 is not within the expected range: 
[5.60, 8.40].
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.assertTrue(Assert.java:42)
at 
org.apache.hadoop.fs.azurebfs.services.TestAbfsClientThrottlingAnalyzer.fuzzyValidate(TestAbfsClientThrottlingAnalyzer.java:64)
at 
org.apache.hadoop.fs.azurebfs.services.TestAbfsClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting(TestAbfsClientThrottlingAnalyzer.java:181)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) {code}

> ABFS: Transient failure of 
> TestAbfsClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting
> --
>
> Key: HADOOP-17826
> URL: https://issues.apache.org/jira/browse/HADOOP-17826
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure, test
>Affects Versions: 3.4.0
>Reporter: Sumangala Patki
>Priority: Major
>
> Transient failure of the below test observed for HNS OAuth, AppendBlob HNS 
> OAuth and Non-HNS SharedKey combinations. The value denoted by "actual value" 
> below varies across failures, and exceeds the upper limit of the expected 
> range.
> _TestAbfsClientThrottlingAnalyzer.testManySuccessAndErrorsAndWaiting:171->fuzzyValidate:49
>  The actual value 10 is not within the expected range: [5.60, 8.40]._
> Verified failure with client and server in the same region to rule out 
> network issues.






[jira] [Created] (HADOOP-19110) ITestExponentialRetryPolicy failing in branch-3.4

2024-03-13 Thread Mukund Thakur (Jira)
Mukund Thakur created HADOOP-19110:
--

 Summary: ITestExponentialRetryPolicy failing in branch-3.4
 Key: HADOOP-19110
 URL: https://issues.apache.org/jira/browse/HADOOP-19110
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Mukund Thakur
Assignee: Anuj Modi


{code:java}
[ERROR] Tests run: 6, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 91.416 
s <<< FAILURE! - in 
org.apache.hadoop.fs.azurebfs.services.ITestExponentialRetryPolicy
[ERROR] 
testThrottlingIntercept(org.apache.hadoop.fs.azurebfs.services.ITestExponentialRetryPolicy)
  Time elapsed: 0.622 s  <<< ERROR!
Failure to initialize configuration for dummy.dfs.core.windows.net key ="null": 
Invalid configuration value detected for fs.azure.account.key
at 
org.apache.hadoop.fs.azurebfs.services.SimpleKeyProvider.getStorageAccountKey(SimpleKeyProvider.java:53)
at 
org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:646)
at 
org.apache.hadoop.fs.azurebfs.services.ITestAbfsClient.createTestClientFromCurrentContext(ITestAbfsClient.java:339)
at 
org.apache.hadoop.fs.azurebfs.services.ITestExponentialRetryPolicy.testThrottlingIntercept(ITestExponentialRetryPolicy.java:106)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
 {code}






[jira] [Commented] (HADOOP-19106) [ABFS] All tests of. ITestAzureBlobFileSystemAuthorization fails with NPE

2024-03-12 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825849#comment-17825849
 ] 

Mukund Thakur commented on HADOOP-19106:


It fails because 
[https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemAuthorization.java#L360]
  returns null. 

That value only gets initialized when authType is SAS: 
[https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java#L1733]
 

 

 

> [ABFS] All tests of. ITestAzureBlobFileSystemAuthorization fails with NPE
> -
>
> Key: HADOOP-19106
> URL: https://issues.apache.org/jira/browse/HADOOP-19106
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Mukund Thakur
>Assignee: Anuj Modi
>Priority: Major
>
> When the config below is set to true, all of the tests fail; otherwise they are skipped.
> <property>
>     <name>fs.azure.test.namespace.enabled</name>
>     <value>true</value>
> </property>
>  
> [*ERROR*] 
> testOpenFileAuthorized(org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization)
>   Time elapsed: 0.064 s  <<< ERROR!
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.runTest(ITestAzureBlobFileSystemAuthorization.java:273)
>  at 
> org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.testOpenFileAuthorized(ITestAzureBlobFileSystemAuthorization.java:132)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)






[jira] [Commented] (HADOOP-19106) [ABFS] All tests of. ITestAzureBlobFileSystemAuthorization fails with NPE

2024-03-12 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825838#comment-17825838
 ] 

Mukund Thakur commented on HADOOP-19106:


It does fail for me with the same config mentioned in  
[https://github.com/apache/hadoop/pull/6069#issuecomment-1965105331] + 
fs.azure.test.namespace.enabled=true. 

 

> [ABFS] All tests of. ITestAzureBlobFileSystemAuthorization fails with NPE
> -
>
> Key: HADOOP-19106
> URL: https://issues.apache.org/jira/browse/HADOOP-19106
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Mukund Thakur
>Assignee: Anuj Modi
>Priority: Major
>
> When the config below is set to true, all of the tests fail; otherwise they are skipped.
> <property>
>     <name>fs.azure.test.namespace.enabled</name>
>     <value>true</value>
> </property>
>  
> [*ERROR*] 
> testOpenFileAuthorized(org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization)
>   Time elapsed: 0.064 s  <<< ERROR!
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.runTest(ITestAzureBlobFileSystemAuthorization.java:273)
>  at 
> org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.testOpenFileAuthorized(ITestAzureBlobFileSystemAuthorization.java:132)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>  at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)






[jira] [Commented] (HADOOP-18854) add options to disable range merging of vectored io

2024-03-11 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825443#comment-17825443
 ] 

Mukund Thakur commented on HADOOP-18854:


There is already an option to disable merging 
{code:java}

<property>
   <name>fs.s3a.vectored.read.max.merged.size</name>
   <value>1M</value>
   <description>
  What is the largest merged read size in bytes such
  that we group ranges together during vectored read.
  Setting this value to 0 will disable merging of ranges.
   </description>
</property>

 {code}
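
To illustrate why a maximum merged size of 0 turns merging off, here is a simplified sketch of range merging. It is an assumption-laden model, not the S3A implementation: RangeMergeSketch, the minSeek parameter and the exact merge rule are illustrative, and ranges are plain {start, end} pairs that are already sorted.

```java
import java.util.ArrayList;
import java.util.List;

class RangeMergeSketch {
  /**
   * Merges sorted [start, end) ranges. A range joins the previous one only if
   * the gap between them is at most minSeek and the merged length stays within
   * maxMergedSize. With maxMergedSize = 0 no merged range can fit, so every
   * input range is returned unchanged.
   */
  static List<long[]> merge(List<long[]> sorted, long minSeek, long maxMergedSize) {
    List<long[]> out = new ArrayList<>();
    for (long[] range : sorted) {
      if (!out.isEmpty()) {
        long[] last = out.get(out.size() - 1);
        boolean closeEnough = range[0] - last[1] <= minSeek;
        boolean fits = range[1] - last[0] <= maxMergedSize;
        if (closeEnough && fits) {
          last[1] = Math.max(last[1], range[1]);
          continue;
        }
      }
      out.add(new long[] {range[0], range[1]});
    }
    return out;
  }
}
```

With a 1M limit, the ranges 0-500, 700-1000 and 1200-1500 collapse into one read of 0-1500; with the limit set to 0 they stay as three separate reads.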

> add options to disable range merging of vectored io
> ---
>
> Key: HADOOP-18854
> URL: https://issues.apache.org/jira/browse/HADOOP-18854
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/s3
>Affects Versions: 3.3.5, 3.3.6
>Reporter: Steve Loughran
>Priority: Major
>
> I'm seeing test failures in my PARQUET-2171 PR because assertions about the 
> number of bytes read aren't holding: small files are being read and the vector 
> range merging is pulling in the whole file.
> ```
> [ERROR]   TestInputOutputFormat.testReadWriteWithCounter:338 bytestotal != 
> bytesread expected:<5510> but was:<11020>
> ```
> I think for Parquet I will add an option to disable vector IO, but really the 
> filesystems which support it should allow merging to be disabled.






[jira] [Commented] (HADOOP-19106) [ABFS] All tests of. ITestAzureBlobFileSystemAuthorization fails with NPE

2024-03-11 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17825418#comment-17825418
 ] 

Mukund Thakur commented on HADOOP-19106:


CC [~snvijaya]  [~pranavs] 

> [ABFS] All tests of. ITestAzureBlobFileSystemAuthorization fails with NPE
> -
>
> Key: HADOOP-19106
> URL: https://issues.apache.org/jira/browse/HADOOP-19106
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Mukund Thakur
>Priority: Major
>
> When the config below is set to true, all of the tests fail; otherwise they 
> are skipped.
> <property>
>     <name>fs.azure.test.namespace.enabled</name>
>     <value>true</value>
> </property>
>  
> [*ERROR*] testOpenFileAuthorized(org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization)  Time elapsed: 0.064 s  <<< ERROR!
> java.lang.NullPointerException
>  at org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.runTest(ITestAzureBlobFileSystemAuthorization.java:273)
>  at org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.testOpenFileAuthorized(ITestAzureBlobFileSystemAuthorization.java:132)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)






[jira] [Created] (HADOOP-19106) [ABFS] All tests of. ITestAzureBlobFileSystemAuthorization fails with NPE

2024-03-11 Thread Mukund Thakur (Jira)
Mukund Thakur created HADOOP-19106:
--

 Summary: [ABFS] All tests of. 
ITestAzureBlobFileSystemAuthorization fails with NPE
 Key: HADOOP-19106
 URL: https://issues.apache.org/jira/browse/HADOOP-19106
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Mukund Thakur


When the config below is set to true, all of the tests fail; otherwise they are skipped.

<property>
    <name>fs.azure.test.namespace.enabled</name>
    <value>true</value>
</property>

 

[*ERROR*] testOpenFileAuthorized(org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization)  Time elapsed: 0.064 s  <<< ERROR!
java.lang.NullPointerException
 at org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.runTest(ITestAzureBlobFileSystemAuthorization.java:273)
 at org.apache.hadoop.fs.azurebfs.ITestAzureBlobFileSystemAuthorization.testOpenFileAuthorized(ITestAzureBlobFileSystemAuthorization.java:132)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)






[jira] [Updated] (HADOOP-18759) [ABFS][Backoff-Optimization] Have a Static retry policy for connection timeout failures

2024-02-20 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18759:
---
Fix Version/s: 3.5.0

> [ABFS][Backoff-Optimization] Have a Static retry policy for connection 
> timeout failures
> ---
>
> Key: HADOOP-18759
> URL: https://issues.apache.org/jira/browse/HADOOP-18759
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.4
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
> Fix For: 3.5.0
>
>
> Today, when a request fails with a connection timeout, it falls back into the 
> loop for exponential retry. Unlike Azure Storage, there are no guarantees of 
> success on an exponentially retried request, or recommendations for ideal retry 
> policies, for Azure network or any other general failures. Faster failure and 
> retry might be more beneficial for such generic connection timeout failures. 
> This PR introduces a new Static Retry Policy which will currently be used 
> only for Connection Timeout (CT) failures. It means all requests failing with 
> Connection Timeout errors will be retried after a constant retry (sleep) 
> interval, independent of how many times that request has failed. The Max Retry 
> Count check will still be in place.
> The following configurations will be introduced in the change:
>  # "fs.azure.static.retry.for.connection.timeout.enabled" - default: true, 
> true: static retry will be used for CT, false: Exponential retry will be used.
>  # "fs.azure.static.retry.interval" - default: 1000ms.
> This also introduces a new field in x-ms-client-request-id, only for the 
> requests that are being retried after a connection timeout failure. The new 
> field will tell what retry policy was used to get the sleep interval before 
> making this request.
> The header "x-ms-client-request-id" currently carries only the retryCount and 
> retryReason of this particular API call. For example:
> :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1_CT.
> Going forward, for retryReason "CT" it will carry the retry policy 
> abbreviation as well.
> For example:
> :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1_CT_E.
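
To make the difference between the two policies concrete, here is a minimal Java sketch. The class and method names are hypothetical and this is not the ABFS code; the exponential variant is a plain doubling with a cap, assumed purely for illustration. Only the 1000 ms static interval is taken from the description above.

```java
/** Hypothetical illustration of static vs. exponential retry intervals; not the ABFS classes. */
final class RetryIntervalSketch {

    /** Static policy: the same sleep interval no matter how often the request has failed. */
    static long staticIntervalMs(int retryCount, long intervalMs) {
        return intervalMs;
    }

    /** Simplified exponential policy (an assumption): baseMs * 2^retryCount, capped at maxMs. */
    static long exponentialIntervalMs(int retryCount, long baseMs, long maxMs) {
        int shift = Math.min(retryCount, 30); // bound the shift so the value cannot overflow for modest bases
        long interval = baseMs << shift;
        return Math.min(interval, maxMs);
    }

    /** The max retry count check applies to both policies. */
    static boolean shouldRetry(int retryCount, int maxRetryCount) {
        return retryCount < maxRetryCount;
    }

    public static void main(String[] args) {
        int maxRetryCount = 4; // hypothetical limit for the demo
        for (int retry = 0; shouldRetry(retry, maxRetryCount); retry++) {
            System.out.println("retry " + retry
                + ": static=" + staticIntervalMs(retry, 1000L) + "ms"
                + ", exponential=" + exponentialIntervalMs(retry, 500L, 30000L) + "ms");
        }
    }
}
```

The static variant keeps a connection timeout from pushing the request into ever longer sleeps, which is the "faster failure and retry" behaviour the description argues for.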






[jira] [Updated] (HADOOP-19015) Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool

2024-01-25 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-19015:
---
Fix Version/s: 3.3.7

> Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting 
> for connection from pool
> --
>
> Key: HADOOP-19015
> URL: https://issues.apache.org/jira/browse/HADOOP-19015
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.7, 3.5.0, 3.4.1
>
>
> Getting errors in jobs which can be fixed by increasing 
> fs.s3a.connection.maximum:
> 2023-12-14 17:35:56,602 [ERROR] [TezChild] |tez.TezProcessor|: 
> java.lang.RuntimeException: java.io.IOException: 
> org.apache.hadoop.net.ConnectTimeoutException: getFileStatus on 
> s3a://aaa/cc-hive-jzv5y6/warehouse/tablespace/managed/hive/student/delete_delta_012_012_0001/bucket_1_0:
>  software.amazon.awssdk.core.exception.SdkClientException: Unable to execute 
> HTTP request: Timeout waiting for connection from pool
>   at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
>   at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152)
>   at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptible
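
The failure mode in the trace can be reproduced in miniature. The sketch below is a stand-in built on assumptions, not the S3A or AWS SDK connection pool: BoundedPoolSketch is a hypothetical class that models a pool as a Semaphore with a fixed number of permits, so a caller that cannot lease a connection within the timeout sees the equivalent of "Timeout waiting for connection from pool".

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical bounded pool standing in for an HTTP connection pool. */
final class BoundedPoolSketch {
    private final Semaphore permits;

    BoundedPoolSketch(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    /** Returns true if a connection was leased before the timeout expired, false otherwise. */
    boolean tryLease(long timeoutMs) {
        try {
            return permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Returns a previously leased connection to the pool. */
    void release() {
        permits.release();
    }

    public static void main(String[] args) {
        BoundedPoolSketch small = new BoundedPoolSketch(1);
        small.tryLease(10); // the only connection is now in use
        System.out.println("second lease, pool of 1: " + small.tryLease(10));

        BoundedPoolSketch larger = new BoundedPoolSketch(2);
        larger.tryLease(10);
        System.out.println("second lease, pool of 2: " + larger.tryLease(10));
    }
}
```

A larger maximum lets more callers hold connections at once, which is the effect the issue aims for by raising fs.s3a.connection.maximum.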






[jira] [Commented] (HADOOP-19013) fs.getXattrs(path) for S3FS doesn't have x-amz-server-side-encryption-aws-kms-key-id header.

2024-01-02 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801945#comment-17801945
 ] 

Mukund Thakur commented on HADOOP-19013:


Well, this is an attribute. So setting it would be nice. Not mandatory though. 

I think S3A already updates the kms-key during a copy operation. 

> fs.getXattrs(path) for S3FS doesn't have 
> x-amz-server-side-encryption-aws-kms-key-id header.
> 
>
> Key: HADOOP-19013
> URL: https://issues.apache.org/jira/browse/HADOOP-19013
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.6
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>
> When a file has been encrypted with SSE-KMS using a key id during upload, and 
> we later try to read the attributes of the same file, they don't contain the 
> key id information as an attribute. Should we add it?






[jira] [Updated] (HADOOP-19015) Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool

2023-12-19 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-19015:
---
Component/s: fs/s3

> Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting 
> for connection from pool
> --
>
> Key: HADOOP-19015
> URL: https://issues.apache.org/jira/browse/HADOOP-19015
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>
> Getting errors in jobs which can be fixed by increasing 
> fs.s3a.connection.maximum:
> 2023-12-14 17:35:56,602 [ERROR] [TezChild] |tez.TezProcessor|: 
> java.lang.RuntimeException: java.io.IOException: 
> org.apache.hadoop.net.ConnectTimeoutException: getFileStatus on 
> s3a://aaa/cc-hive-jzv5y6/warehouse/tablespace/managed/hive/student/delete_delta_012_012_0001/bucket_1_0:
>  software.amazon.awssdk.core.exception.SdkClientException: Unable to execute 
> HTTP request: Timeout waiting for connection from pool
>   at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
>   at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152)
>   at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptible






[jira] [Updated] (HADOOP-19015) Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool

2023-12-19 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-19015:
---
Affects Version/s: 3.4.0

> Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting 
> for connection from pool
> --
>
> Key: HADOOP-19015
> URL: https://issues.apache.org/jira/browse/HADOOP-19015
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.4.0
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>
> Getting errors in jobs which can be fixed by increasing 
> fs.s3a.connection.maximum:
> 2023-12-14 17:35:56,602 [ERROR] [TezChild] |tez.TezProcessor|: 
> java.lang.RuntimeException: java.io.IOException: 
> org.apache.hadoop.net.ConnectTimeoutException: getFileStatus on 
> s3a://aaa/cc-hive-jzv5y6/warehouse/tablespace/managed/hive/student/delete_delta_012_012_0001/bucket_1_0:
>  software.amazon.awssdk.core.exception.SdkClientException: Unable to execute 
> HTTP request: Timeout waiting for connection from pool
>   at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
>   at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152)
>   at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptible



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19015) Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool

2023-12-19 Thread Mukund Thakur (Jira)
Mukund Thakur created HADOOP-19015:
--

 Summary: Increase fs.s3a.connection.maximum to 500 to minimize 
risk of Timeout waiting for connection from pool
 Key: HADOOP-19015
 URL: https://issues.apache.org/jira/browse/HADOOP-19015
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Mukund Thakur


Getting errors in jobs which can be fixed by increasing 
fs.s3a.connection.maximum:
2023-12-14 17:35:56,602 [ERROR] [TezChild] |tez.TezProcessor|: 
java.lang.RuntimeException: java.io.IOException: 
org.apache.hadoop.net.ConnectTimeoutException: getFileStatus on 
s3a://aaa/cc-hive-jzv5y6/warehouse/tablespace/managed/hive/student/delete_delta_012_012_0001/bucket_1_0:
 software.amazon.awssdk.core.exception.SdkClientException: Unable to execute 
HTTP request: Timeout waiting for connection from pool
  at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
  at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152)
  at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
  at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
  at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
  at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
  at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
  at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
  at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84)
  at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
  at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70)
  at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40)
  at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
  at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptible






[jira] [Assigned] (HADOOP-19015) Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool

2023-12-19 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur reassigned HADOOP-19015:
--

Assignee: Mukund Thakur

> Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting 
> for connection from pool
> --
>
> Key: HADOOP-19015
> URL: https://issues.apache.org/jira/browse/HADOOP-19015
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>
> Getting errors in jobs which can be fixed by increasing 
> fs.s3a.connection.maximum:
> 2023-12-14 17:35:56,602 [ERROR] [TezChild] |tez.TezProcessor|: 
> java.lang.RuntimeException: java.io.IOException: 
> org.apache.hadoop.net.ConnectTimeoutException: getFileStatus on 
> s3a://aaa/cc-hive-jzv5y6/warehouse/tablespace/managed/hive/student/delete_delta_012_012_0001/bucket_1_0:
>  software.amazon.awssdk.core.exception.SdkClientException: Unable to execute 
> HTTP request: Timeout waiting for connection from pool
>   at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:206)
>   at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:152)
>   at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
>   at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:297)
>   at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:84)
>   at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:70)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:70)
>   at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:40)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptible






[jira] [Created] (HADOOP-19013) fs.getXattrs(path) for S3FS doesn't have x-amz-server-side-encryption-aws-kms-key-id header.

2023-12-14 Thread Mukund Thakur (Jira)
Mukund Thakur created HADOOP-19013:
--

 Summary: fs.getXattrs(path) for S3FS doesn't have 
x-amz-server-side-encryption-aws-kms-key-id header.
 Key: HADOOP-19013
 URL: https://issues.apache.org/jira/browse/HADOOP-19013
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.6
Reporter: Mukund Thakur
Assignee: Mukund Thakur


When a file has been encrypted with SSE-KMS using a key id during upload, and 
we later try to read the attributes of the same file, they don't contain the 
key id information as an attribute. Should we add it?






[jira] [Commented] (HADOOP-11867) Add a high-performance vectored read API.

2023-10-25 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17779666#comment-17779666
 ] 

Mukund Thakur commented on HADOOP-11867:


Hey [~yuanbo]  Vectored IO is intelligent. It merges the nearby ranges and thus 
reduces the number of outgoing HTTP calls to object storage. 
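
As a rough illustration of the range merging described above, here is a minimal, self-contained Java sketch. It is not Hadoop's actual vectored read code: the class RangeMergeSketch, the Range type and the maxGap/maxMergedSize parameters are hypothetical names chosen for this example. It sorts the requested ranges and coalesces neighbours whose gap is small enough, as long as the combined read stays within a size limit; each resulting range would correspond to one request to the store.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of coalescing nearby read ranges; not Hadoop's implementation. */
final class RangeMergeSketch {

    /** A byte range [offset, offset + length). */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        long end() {
            return offset + length;
        }
    }

    /**
     * Merges ranges whose gap is at most maxGap bytes, provided the merged
     * range is no longer than maxMergedSize bytes. A maxMergedSize of 0
     * therefore disables merging for any non-empty ranges.
     */
    static List<Range> merge(List<Range> ranges, long maxGap, long maxMergedSize) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((Range r) -> r.offset));

        List<Range> merged = new ArrayList<>();
        Range current = null;
        for (Range next : sorted) {
            if (current != null) {
                long gap = next.offset - current.end();
                long mergedLength = Math.max(current.end(), next.end()) - current.offset;
                if (gap <= maxGap && mergedLength <= maxMergedSize) {
                    current = new Range(current.offset, mergedLength); // extend the current read
                    continue;
                }
                merged.add(current); // too far away or too large: emit and start a new read
            }
            current = next;
        }
        if (current != null) {
            merged.add(current);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Range> requested = new ArrayList<>();
        requested.add(new Range(0, 100));
        requested.add(new Range(150, 100));
        requested.add(new Range(10_000, 100));
        System.out.println("reads with merging:    " + merge(requested, 100, 1024 * 1024).size());
        System.out.println("reads without merging: " + merge(requested, 100, 0).size());
    }
}
```

The trade-off is that a merged read also fetches the bytes in the gaps, so small gap and size limits keep the extra data bounded while still cutting the number of requests.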

> Add a high-performance vectored read API.
> -
>
> Key: HADOOP-11867
> URL: https://issues.apache.org/jira/browse/HADOOP-11867
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure, fs/s3, hdfs-client
>Affects Versions: 3.0.0
>Reporter: Gopal Vijayaraghavan
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: performance, pull-request-available
> Fix For: 3.3.5
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> The most significant way to read from a filesystem in an efficient way is to 
> let the FileSystem implementation handle the seek behaviour underneath the 
> API to be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read 
> locations as part of a single call, while letting the system schedule/plan 
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since this allows 
> for potentially optimizing away the seek-gaps within the FSDataInputStream 
> implementation.
> For seek+read systems with even more latency than locally-attached disks, 
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would 
> take care of the seeks internally while reading chunk.remaining() bytes into 
> each chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub in this as a sequence of seeks + read() into 
> ByteBuffers, without forcing each FS implementation to override this in any 
> way.






[jira] [Resolved] (HADOOP-18929) Build failure while trying to create apache 3.3.7 release locally.

2023-10-13 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-18929.

Fix Version/s: 3.3.9
   Resolution: Fixed

> Build failure while trying to create apache 3.3.7 release locally.
> --
>
> Key: HADOOP-18929
> URL: https://issues.apache.org/jira/browse/HADOOP-18929
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.3.6
>Reporter: Mukund Thakur
>Assignee: PJ Fanning
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.3.9
>
>
> {noformat}
> [INFO] ---< org.apache.hadoop:hadoop-client-check-test-invariants >
> [INFO] Building Apache Hadoop Client Packaging Invariants for Test 3.3.9-SNAPSHOT [105/111]
> [INFO] [ pom ]-
> [INFO] 
> [INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (enforce-banned-dependencies) @ hadoop-client-check-test-invariants ---
> [INFO] Adding ignorable dependency: org.apache.hadoop:hadoop-annotations:null
> [INFO]   Adding ignore: *
> [WARNING] Rule 1: org.apache.maven.plugins.enforcer.BanDuplicateClasses failed with message:
> Duplicate classes found:
>   Found in:
>     org.apache.hadoop:hadoop-client-minicluster:jar:3.3.9-SNAPSHOT:compile
>     org.apache.hadoop:hadoop-client-runtime:jar:3.3.9-SNAPSHOT:compile
>   Duplicate classes:
>     META-INF/versions/9/module-info.class
> {noformat}
> CC [~ste...@apache.org]  [~weichu] 






[jira] [Assigned] (HADOOP-18929) Build failure while trying to create apache 3.3.7 release locally.

2023-10-13 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur reassigned HADOOP-18929:
--

Assignee: PJ Fanning

> Build failure while trying to create apache 3.3.7 release locally.
> --
>
> Key: HADOOP-18929
> URL: https://issues.apache.org/jira/browse/HADOOP-18929
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.3.6
>Reporter: Mukund Thakur
>Assignee: PJ Fanning
>Priority: Critical
>  Labels: pull-request-available
>
> {noformat}
> [INFO] ---< org.apache.hadoop:hadoop-client-check-test-invariants >
> [INFO] Building Apache Hadoop Client Packaging Invariants for Test 3.3.9-SNAPSHOT [105/111]
> [INFO] [ pom ]-
> [INFO] 
> [INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (enforce-banned-dependencies) @ hadoop-client-check-test-invariants ---
> [INFO] Adding ignorable dependency: org.apache.hadoop:hadoop-annotations:null
> [INFO]   Adding ignore: *
> [WARNING] Rule 1: org.apache.maven.plugins.enforcer.BanDuplicateClasses failed with message:
> Duplicate classes found:
>   Found in:
>     org.apache.hadoop:hadoop-client-minicluster:jar:3.3.9-SNAPSHOT:compile
>     org.apache.hadoop:hadoop-client-runtime:jar:3.3.9-SNAPSHOT:compile
>   Duplicate classes:
>     META-INF/versions/9/module-info.class
> {noformat}
> CC [~ste...@apache.org]  [~weichu] 






[jira] [Commented] (HADOOP-18890) remove okhttp usage

2023-10-13 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17774965#comment-17774965
 ] 

Mukund Thakur commented on HADOOP-18890:


Yes. I see you have already merged. 

> remove okhttp usage
> ---
>
> Key: HADOOP-18890
> URL: https://issues.apache.org/jira/browse/HADOOP-18890
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build, common
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> * relates to HADOOP-18496
> * simplifies the dependencies if hadoop doesn't use multiple 3rd party libs 
> to make http calls
> * okhttp brings in other dependencies like the kotlin runtime
> * hadoop already uses apache httpclient in some places






[jira] [Commented] (HADOOP-18929) Build failure while trying to create apache 3.3.7 release locally.

2023-10-10 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773811#comment-17773811
 ] 

Mukund Thakur commented on HADOOP-18929:


Oh okay. A quick followup PR will do. Thanks

> Build failure while trying to create apache 3.3.7 release locally.
> --
>
> Key: HADOOP-18929
> URL: https://issues.apache.org/jira/browse/HADOOP-18929
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.3.6
>Reporter: Mukund Thakur
>Priority: Critical
>
> {noformat}
> [INFO] ---< org.apache.hadoop:hadoop-client-check-test-invariants >
> [INFO] Building Apache Hadoop Client Packaging Invariants for Test 3.3.9-SNAPSHOT [105/111]
> [INFO] [ pom ]-
> [INFO] 
> [INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (enforce-banned-dependencies) @ hadoop-client-check-test-invariants ---
> [INFO] Adding ignorable dependency: org.apache.hadoop:hadoop-annotations:null
> [INFO]   Adding ignore: *
> [WARNING] Rule 1: org.apache.maven.plugins.enforcer.BanDuplicateClasses failed with message:
> Duplicate classes found:
>   Found in:
>     org.apache.hadoop:hadoop-client-minicluster:jar:3.3.9-SNAPSHOT:compile
>     org.apache.hadoop:hadoop-client-runtime:jar:3.3.9-SNAPSHOT:compile
>   Duplicate classes:
>     META-INF/versions/9/module-info.class
> {noformat}
> CC [~ste...@apache.org]  [~weichu] 






[jira] [Reopened] (HADOOP-18895) upgrade to commons-compress 1.24.0 due to CVE

2023-10-10 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur reopened HADOOP-18895:


> upgrade to commons-compress 1.24.0 due to CVE
> -
>
> Key: HADOOP-18895
> URL: https://issues.apache.org/jira/browse/HADOOP-18895
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> Includes some important bug fixes including 
> https://lists.apache.org/thread/g9lrsz8j9nrgltcoc7v6cpkopg07czc9 - 
> CVE-2023-42503






[jira] [Commented] (HADOOP-18895) upgrade to commons-compress 1.24.0 due to CVE

2023-10-10 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773793#comment-17773793
 ] 

Mukund Thakur commented on HADOOP-18895:


We need to revert this as it is causing 
https://issues.apache.org/jira/browse/HADOOP-18929?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel=17773753#comment-17773753
 

> upgrade to commons-compress 1.24.0 due to CVE
> -
>
> Key: HADOOP-18895
> URL: https://issues.apache.org/jira/browse/HADOOP-18895
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> Includes some important bug fixes including 
> https://lists.apache.org/thread/g9lrsz8j9nrgltcoc7v6cpkopg07czc9 - 
> CVE-2023-42503






[jira] [Commented] (HADOOP-18929) Build failure while trying to create apache 3.3.7 release locally.

2023-10-10 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773753#comment-17773753
 ] 

Mukund Thakur commented on HADOOP-18929:


Thanks, [~ayushtkn]  for checking quickly. Let me revert and try. 

> Build failure while trying to create apache 3.3.7 release locally.
> --
>
> Key: HADOOP-18929
> URL: https://issues.apache.org/jira/browse/HADOOP-18929
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.3.6
>Reporter: Mukund Thakur
>Priority: Critical
>
> {noformat}
> [INFO] ---< org.apache.hadoop:hadoop-client-check-test-invariants >
> [INFO] Building Apache Hadoop Client Packaging Invariants for Test 3.3.9-SNAPSHOT [105/111]
> [INFO] [ pom ]-
> [INFO] 
> [INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (enforce-banned-dependencies) @ hadoop-client-check-test-invariants ---
> [INFO] Adding ignorable dependency: org.apache.hadoop:hadoop-annotations:null
> [INFO]   Adding ignore: *
> [WARNING] Rule 1: org.apache.maven.plugins.enforcer.BanDuplicateClasses failed with message:
> Duplicate classes found:
>   Found in:
>     org.apache.hadoop:hadoop-client-minicluster:jar:3.3.9-SNAPSHOT:compile
>     org.apache.hadoop:hadoop-client-runtime:jar:3.3.9-SNAPSHOT:compile
>   Duplicate classes:
>     META-INF/versions/9/module-info.class
> {noformat}
> CC [~ste...@apache.org]  [~weichu] 






[jira] [Created] (HADOOP-18929) Build failure while trying to create apache 3.3.7 release locally.

2023-10-10 Thread Mukund Thakur (Jira)
Mukund Thakur created HADOOP-18929:
--

 Summary: Build failure while trying to create apache 3.3.7 release 
locally.
 Key: HADOOP-18929
 URL: https://issues.apache.org/jira/browse/HADOOP-18929
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.3.6
Reporter: Mukund Thakur


{noformat}
[INFO] ---< org.apache.hadoop:hadoop-client-check-test-invariants >
[INFO] Building Apache Hadoop Client Packaging Invariants for Test 3.3.9-SNAPSHOT [105/111]
[INFO] [ pom ]-
[INFO] 
[INFO] --- maven-enforcer-plugin:3.0.0-M1:enforce (enforce-banned-dependencies) @ hadoop-client-check-test-invariants ---
[INFO] Adding ignorable dependency: org.apache.hadoop:hadoop-annotations:null
[INFO]   Adding ignore: *
[WARNING] Rule 1: org.apache.maven.plugins.enforcer.BanDuplicateClasses failed with message:
Duplicate classes found:


  Found in:
    org.apache.hadoop:hadoop-client-minicluster:jar:3.3.9-SNAPSHOT:compile
    org.apache.hadoop:hadoop-client-runtime:jar:3.3.9-SNAPSHOT:compile
  Duplicate classes:
    META-INF/versions/9/module-info.class

{noformat}
CC [~ste...@apache.org]  [~weichu] 






[jira] [Resolved] (HADOOP-18845) Add ability to configure ConnectionTTL of http connections while creating S3 Client.

2023-08-25 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-18845.

Resolution: Fixed

> Add ability to configure ConnectionTTL of http connections while creating S3 
> Client.
> 
>
> Key: HADOOP-18845
> URL: https://issues.apache.org/jira/browse/HADOOP-18845
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.6
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9
>
>







[jira] [Created] (HADOOP-18845) Add ability to configure ConnectionTTL of http connections while creating S3 Client.

2023-08-09 Thread Mukund Thakur (Jira)
Mukund Thakur created HADOOP-18845:
--

 Summary: Add ability to configure ConnectionTTL of http 
connections while creating S3 Client.
 Key: HADOOP-18845
 URL: https://issues.apache.org/jira/browse/HADOOP-18845
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.6
Reporter: Mukund Thakur
Assignee: Mukund Thakur
 Fix For: 3.3.9









[jira] [Resolved] (HADOOP-18763) Upgrade aws-java-sdk to 1.12.367+

2023-06-14 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-18763.

Fix Version/s: 3.3.6
   Resolution: Fixed

> Upgrade aws-java-sdk to 1.12.367+
> -
>
> Key: HADOOP-18763
> URL: https://issues.apache.org/jira/browse/HADOOP-18763
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.5
>Reporter: Steve Loughran
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.6
>
>
> aws sdk bundle < 1.12.367 uses a vulnerable version of netty, which 
> pulls in a high-severity CVE and creates unhappiness in security scans, even 
> if s3a doesn't use that lib. 
> The safe version for netty is netty:4.1.86.Final and this is used by 
> aws-java-sdk:1.12.367+






[jira] [Assigned] (HADOOP-18763) Upgrade aws-java-sdk to 1.12.367+

2023-06-13 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur reassigned HADOOP-18763:
--

Assignee: Viraj Jasani  (was: Mukund Thakur)

> Upgrade aws-java-sdk to 1.12.367+
> -
>
> Key: HADOOP-18763
> URL: https://issues.apache.org/jira/browse/HADOOP-18763
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.5
>Reporter: Steve Loughran
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>
> aws sdk bundle < 1.12.367 uses a vulnerable version of netty, which 
> pulls in a high-severity CVE and creates unhappiness in security scans, even 
> if s3a doesn't use that lib. 
> The safe version for netty is netty:4.1.86.Final and this is used by 
> aws-java-sdk:1.12.367+






[jira] [Assigned] (HADOOP-18763) Upgrade aws-java-sdk to 1.12.367+

2023-06-13 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur reassigned HADOOP-18763:
--

Assignee: Mukund Thakur

> Upgrade aws-java-sdk to 1.12.367+
> -
>
> Key: HADOOP-18763
> URL: https://issues.apache.org/jira/browse/HADOOP-18763
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.5
>Reporter: Steve Loughran
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
>
> aws sdk bundle < 1.12.367 uses a vulnerable version of netty, which 
> pulls in a high-severity CVE and creates unhappiness in security scans, even 
> if s3a doesn't use that lib. 
> The safe version for netty is netty:4.1.86.Final and this is used by 
> aws-java-sdk:1.12.367+






[jira] [Commented] (HADOOP-17852) ABFS: Test with 100MB buffer size in ITestAbfsReadWriteAndSeek times out

2023-05-24 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725998#comment-17725998
 ] 

Mukund Thakur commented on HADOOP-17852:


Seeing this in one of our customers' prod clusters.
{code:java}
"Executor task launch worker for task 329" #94 daemon prio=5 os_prio=0 
cpu=17344.66ms elapsed=2109.99s tid=0x7f7750026000 nid=0x6586 waiting on 
condition  [0x7f77414fa000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base@11.0.13/Native Method)
    - parking to wait for  <0x0006c14aea10> (a 
com.google.common.util.concurrent.TrustedListenableFutureTask)
    at 
java.util.concurrent.locks.LockSupport.park(java.base@11.0.13/LockSupport.java:194)
    at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:523)
    at 
com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:86)
    at 
org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.waitForAppendsToComplete(AbfsOutputStream.java:602)
    - locked <0x000512a667c8> (a 
org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream)
    at 
org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.flushWrittenBytesToService(AbfsOutputStream.java:621)
    - locked <0x000512a667c8> (a 
org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream)
    at 
org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.flushInternal(AbfsOutputStream.java:536)
    - locked <0x000512a667c8> (a 
org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream)
    at 
org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.close(AbfsOutputStream.java:495)
    - locked <0x000512a667c8> (a 
org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream)
    at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:76)
    at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
    at 
org.apache.parquet.hadoop.util.HadoopPositionOutputStream.close(HadoopPositionOutputStream.java:64)
    at 
org.apache.parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:829)
    at 
org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:122)
    at 
org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:165)
    at 
org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetOutputWriter.scala:42)
    at 
org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:57)
    at 
org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:74)
    at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:252)
    at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)
    at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1368)
    at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:253)
    at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:174)
    at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:413)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1334)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:419)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.13/ThreadPoolExecutor.java:1128)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.13/ThreadPoolExecutor.java:628)
    at java.lang.Thread.run(java.base@11.0.13/Thread.java:829) {code}
CC [~snvijaya]  [~ste...@apache.org] 

> ABFS: Test with 100MB buffer size in ITestAbfsReadWriteAndSeek times out 
> -
>
> Key: HADOOP-17852
> URL: https://issues.apache.org/jira/browse/HADOOP-17852
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.1
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Minor
>
> testReadAndWriteWithDifferentBufferSizesAndSeek with buffer size above 100 MB 
> is failing with timeout. It is delaying the whole test run by 15-30 mins. 
> [ERROR] 
> 

[jira] [Commented] (HADOOP-18637) S3A to support upload of files greater than 2 GB using DiskBlocks

2023-02-21 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691825#comment-17691825
 ] 

Mukund Thakur commented on HADOOP-18637:


As discussed offline, the following changes will be required. 
 * Introduce a new config to disable multipart upload everywhere and enable 
just a large file upload.
 * Error in public S3AFS.createMultipartUploader based on the above config.
 * Error in staging committer based on the above config.
 * Error in magic committer based on the above config.
 * Error in write operations helper based on the above config. 
 * Add hasCapability(isMultiPartAllowed, path), driven by the config.
 * If multipart upload is disabled, we only upload via disk. Add a check for this.
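
The checks listed above could look roughly like the sketch below. This is only an illustration under stated assumptions: the class name, the method names and the configuration key are all hypothetical, and a plain Map stands in for the Hadoop Configuration object.

```java
import java.util.Map;

// Hypothetical sketch; the class, method and key names are illustrative
// assumptions, not the actual S3A implementation.
final class MultipartGuardSketch {
    // Assumed configuration key name, chosen for this example only.
    static final String MULTIPART_ENABLED = "example.s3a.multipart.uploads.enabled";

    private final boolean multipartEnabled;

    MultipartGuardSketch(Map<String, String> conf) {
        // Multipart upload stays enabled unless the option is explicitly "false".
        this.multipartEnabled =
            Boolean.parseBoolean(conf.getOrDefault(MULTIPART_ENABLED, "true"));
    }

    // Capability probe: callers can ask before attempting a multipart operation.
    boolean isMultipartAllowed() {
        return multipartEnabled;
    }

    // Each multipart entry point (uploader, committers, write helper) would call this.
    void requireMultipart(String operation) {
        if (!multipartEnabled) {
            throw new UnsupportedOperationException(
                operation + " requires multipart upload, which is disabled");
        }
    }

    // With multipart disabled, only disk buffering is accepted.
    void validateBuffer(String bufferType) {
        if (!multipartEnabled && !"disk".equals(bufferType)) {
            throw new IllegalArgumentException(
                "buffer type '" + bufferType + "' needs multipart upload; use 'disk'");
        }
    }
}
```

Keeping the flag in one place means each of the listed call sites fails in the same way, and the capability probe reports the same value that the guards enforce.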

> S3A to support upload of files greater than 2 GB using DiskBlocks
> -
>
> Key: HADOOP-18637
> URL: https://issues.apache.org/jira/browse/HADOOP-18637
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Reporter: Harshit Gupta
>Assignee: Harshit Gupta
>Priority: Major
>
> Use S3A DiskBlocks to support the upload of files greater than 2 GB. 
> Currently, the max upload size of a single block is ~2GB. 
> cc: [~mthakur] [~ste...@apache.org] [~mehakmeet] 






[jira] [Resolved] (HADOOP-18103) High performance vectored read API in Hadoop

2023-01-09 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-18103.

Fix Version/s: 3.3.5
   (was: 3.4.0)
   Resolution: Fixed

> High performance vectored read API in Hadoop
> 
>
> Key: HADOOP-18103
> URL: https://issues.apache.org/jira/browse/HADOOP-18103
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: common, fs, fs/adl, fs/s3
>Affects Versions: 3.3.4
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: perfomance, pull-request-available
> Fix For: 3.3.5
>
> Attachments: Vectored Read API for Hadoop FS.pdf
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Add support for a multiple-range vectored read API in PositionedReadable. The 
> default implementation iterates through the ranges and reads each 
> synchronously, but the intent is that FSDataInputStream subclasses can provide 
> more efficient readers, especially object store implementations.






[jira] [Resolved] (HADOOP-18507) VectorIO FileRange type to support a "reference" field

2023-01-09 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-18507.

Fix Version/s: 3.3.5
   Resolution: Fixed

> VectorIO FileRange type to support a "reference" field
> --
>
> Key: HADOOP-18507
> URL: https://issues.apache.org/jira/browse/HADOOP-18507
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 3.3.5
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.5
>
>
> To use this in libraries, it is really good to be able to connect a FileRange 
> back to the application/library level structure (chunk/split data, usually). 
> Proposed: add an {{Object reference}} field which can be given arbitrary data 
> or null, and queried by the app. It is not used in the API at all.
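
The idea of an opaque reference can be illustrated with a small sketch. The class below is a hypothetical stand-in and not Hadoop's FileRange type; its names are assumptions made for this example, and the "chunk" labels are invented application data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a file range; not Hadoop's FileRange interface.
final class RangeSketch {
    private final long offset;
    private final int length;
    private final Object reference; // opaque to the reader; may be null

    RangeSketch(long offset, int length, Object reference) {
        this.offset = offset;
        this.length = length;
        this.reference = reference;
    }

    long getOffset() { return offset; }

    int getLength() { return length; }

    // The application gets back exactly what it attached; the read path never looks at it.
    Object getReference() { return reference; }

    // Example caller: tag each range with the name of the chunk it belongs to.
    static List<RangeSketch> rangesFor(String[] chunkNames, long[] offsets, int[] lengths) {
        List<RangeSketch> ranges = new ArrayList<>();
        for (int i = 0; i < chunkNames.length; i++) {
            ranges.add(new RangeSketch(offsets[i], lengths[i], chunkNames[i]));
        }
        return ranges;
    }
}
```

When a range completes, the caller can use getReference() to find the chunk or split it belongs to without keeping a separate lookup table keyed by offset.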






[jira] [Updated] (HADOOP-18460) ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing

2023-01-09 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18460:
---
Fix Version/s: (was: 3.4.0)

> ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing
> -
>
> Key: HADOOP-18460
> URL: https://issues.apache.org/jira/browse/HADOOP-18460
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0, 3.3.5
>Reporter: Steve Loughran
>Assignee: Mukund Thakur
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.5
>
>
> Seeing a test failure in both parallel and single test case runs of 
> {{ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer}}






[jira] [Reopened] (HADOOP-11867) Add a high-performance vectored read API.

2023-01-09 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur reopened HADOOP-11867:


> Add a high-performance vectored read API.
> -
>
> Key: HADOOP-11867
> URL: https://issues.apache.org/jira/browse/HADOOP-11867
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure, fs/s3, hdfs-client
>Affects Versions: 3.0.0
>Reporter: Gopal Vijayaraghavan
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: performance, pull-request-available
> Fix For: 3.3.5
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> The most effective way to read from a filesystem efficiently is to 
> let the FileSystem implementation handle the seek behaviour underneath the 
> API, so that it can be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read 
> locations as part of a single call, while letting the system schedule/plan 
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since it allows 
> for potentially optimizing away the seek-gaps within the FSDataInputStream 
> implementation.
> For seek+read systems with even more latency than locally-attached disks, 
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would 
> take care of the seeks internally while reading chunk.remaining() bytes into 
> each chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub this in as a sequence of seeks + read() into 
> ByteBuffers, without forcing each FS implementation to override this in any 
> way.
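
The fallback described in the issue text above (one positioned read per range, each filling a ByteBuffer) can be sketched with plain java.nio. This is a minimal illustration and not the Hadoop API: the class name is hypothetical, a FileChannel stands in for the input stream, and only the readFully(long[], ByteBuffer[]) shape is taken from the description.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not Hadoop's PositionedReadable.
final class VectoredReadSketch {

    // Naive fallback: fill each chunk from its offset, one range after another.
    static void readFully(FileChannel channel, long[] offsets, ByteBuffer[] chunks)
            throws IOException {
        if (offsets.length != chunks.length) {
            throw new IllegalArgumentException("offsets and chunks differ in length");
        }
        for (int i = 0; i < offsets.length; i++) {
            long position = offsets[i];
            ByteBuffer chunk = chunks[i];
            // Read chunk.remaining() bytes; a positioned read may return fewer bytes per call.
            while (chunk.hasRemaining()) {
                int n = channel.read(chunk, position);
                if (n < 0) {
                    throw new EOFException("EOF before range " + i + " was filled");
                }
                position += n;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("vectored-read", ".bin");
        try {
            Files.write(file, "0123456789abcdefghij".getBytes(StandardCharsets.US_ASCII));
            // Two 4-byte chunks sliced off one bigger backing buffer.
            ByteBuffer backing = ByteBuffer.allocate(8);
            backing.limit(4);
            ByteBuffer first = backing.slice();
            backing.limit(8);
            backing.position(4);
            ByteBuffer second = backing.slice();
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                readFully(channel, new long[] {2, 10}, new ByteBuffer[] {first, second});
            }
            // prints 2345abcd
            System.out.println(new String(backing.array(), StandardCharsets.US_ASCII));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

A store-specific implementation would be free to reorder, merge or parallelise the ranges; the sketch shows only the behaviour a caller could rely on when no such optimisation exists.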






[jira] [Resolved] (HADOOP-11867) Add a high-performance vectored read API.

2023-01-09 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-11867.

Resolution: Fixed

> Add a high-performance vectored read API.
> -
>
> Key: HADOOP-11867
> URL: https://issues.apache.org/jira/browse/HADOOP-11867
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure, fs/s3, hdfs-client
>Affects Versions: 3.0.0
>Reporter: Gopal Vijayaraghavan
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: performance, pull-request-available
> Fix For: 3.3.5
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> The most effective way to read from a filesystem efficiently is to 
> let the FileSystem implementation handle the seek behaviour underneath the 
> API, so that it can be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read 
> locations as part of a single call, while letting the system schedule/plan 
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since it allows 
> for potentially optimizing away the seek-gaps within the FSDataInputStream 
> implementation.
> For seek+read systems with even more latency than locally-attached disks, 
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would 
> take care of the seeks internally while reading chunk.remaining() bytes into 
> each chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub this in as a sequence of seeks + read() into 
> ByteBuffers, without forcing each FS implementation to override this in any 
> way.






[jira] [Resolved] (HADOOP-18320) Improve S3A delegations token documentation

2023-01-09 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-18320.

Fix Version/s: 3.3.9
   Resolution: Fixed

> Improve S3A delegations token documentation
> ---
>
> Key: HADOOP-18320
> URL: https://issues.apache.org/jira/browse/HADOOP-18320
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Ahmar Suhail
>Assignee: Ahmar Suhail
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.9
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The current [delegations token 
> documentation|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/delegation_tokens.md]
>  has some typos, this task tracks fixing those. 






[jira] [Updated] (HADOOP-11867) Add a high-performance vectored read API.

2022-12-22 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-11867:
---
Fix Version/s: 3.3.5

> Add a high-performance vectored read API.
> -
>
> Key: HADOOP-11867
> URL: https://issues.apache.org/jira/browse/HADOOP-11867
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure, fs/s3, hdfs-client
>Affects Versions: 3.0.0
>Reporter: Gopal Vijayaraghavan
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: performance, pull-request-available
> Fix For: 3.3.5
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>
> The most effective way to read from a filesystem efficiently is to 
> let the FileSystem implementation handle the seek behaviour underneath the 
> API, so that it can be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read 
> locations as part of a single call, while letting the system schedule/plan 
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since it allows 
> for potentially optimizing away the seek-gaps within the FSDataInputStream 
> implementation.
> For seek+read systems with even more latency than locally-attached disks, 
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would 
> take care of the seeks internally while reading chunk.remaining() bytes into 
> each chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub this in as a sequence of seeks + read() into 
> ByteBuffers, without forcing each FS implementation to override this in any 
> way.






[jira] [Updated] (HADOOP-18104) Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads

2022-12-22 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18104:
---
Fix Version/s: 3.3.5

> Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads
> 
>
> Key: HADOOP-18104
> URL: https://issues.apache.org/jira/browse/HADOOP-18104
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: common, fs
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.5
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>







[jira] [Updated] (HADOOP-18107) Vectored IO support for large S3 files.

2022-12-22 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18107:
---
Fix Version/s: 3.3.5

> Vectored IO support for large S3 files. 
> 
>
> Key: HADOOP-18107
> URL: https://issues.apache.org/jira/browse/HADOOP-18107
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.5
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> This effort would mostly be adding more tests for large files under scale 
> tests and see if any new issue surfaces. 






[jira] [Updated] (HADOOP-18105) Implement a variant of ElasticByteBufferPool which uses weak references for garbage collection.

2022-12-22 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18105:
---
Fix Version/s: 3.3.5

> Implement a variant of ElasticByteBufferPool which uses weak references for 
> garbage collection.
> ---
>
> Key: HADOOP-18105
> URL: https://issues.apache.org/jira/browse/HADOOP-18105
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: common, fs
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.5
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently in the Hadoop codebase, we have two classes which implement byte 
> buffer pooling.
> One is ElasticByteBufferPool, which doesn't use weak references and thus could 
> cause memory leaks in production environments. 
> The other is DirectBufferPool, which uses weak references but doesn't support 
> the caller's preference for either on-heap or off-heap buffers. 
>  
> The idea is to create an improved version of ElasticByteBufferPool by 
> subclassing it (as it is marked as public and stable and is used widely in 
> HDFS) with the essential functionality required for effective buffer pooling. 
> This is important for the parent Vectored IO work.
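
A minimal sketch of the idea described above: a pool that holds its buffers only through weak references and still honours the caller's choice of on-heap or off-heap memory. This is illustrative only. The class name WeakBufferPoolSketch, its methods and the key layout are assumptions made for this example; it does not subclass ElasticByteBufferPool and is not the code from the patch.

```java
import java.lang.ref.WeakReference;
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a weak-reference buffer pool; not a Hadoop class. */
final class WeakBufferPoolSketch {
  // Key layout (an assumption of this sketch): capacity in the high 32 bits,
  // insertion sequence in the low 32 bits, so equal capacities never collide.
  private final TreeMap<Long, WeakReference<ByteBuffer>> heapBuffers = new TreeMap<>();
  private final TreeMap<Long, WeakReference<ByteBuffer>> directBuffers = new TreeMap<>();
  private long sequence;

  /** Returns a pooled buffer of at least {@code length} bytes, or allocates one. */
  synchronized ByteBuffer getBuffer(boolean direct, int length) {
    TreeMap<Long, WeakReference<ByteBuffer>> pool = direct ? directBuffers : heapBuffers;
    Map.Entry<Long, WeakReference<ByteBuffer>> entry;
    while ((entry = pool.ceilingEntry((long) length << 32)) != null) {
      pool.remove(entry.getKey());
      ByteBuffer pooled = entry.getValue().get();
      if (pooled != null) {
        pooled.clear();
        return pooled;
      }
      // The referent was garbage collected; the stale entry is dropped
      // and the loop looks for the next candidate.
    }
    return direct ? ByteBuffer.allocateDirect(length) : ByteBuffer.allocate(length);
  }

  /** Returns a buffer to the pool; the pool keeps only a weak reference to it. */
  synchronized void putBuffer(ByteBuffer buffer) {
    TreeMap<Long, WeakReference<ByteBuffer>> pool =
        buffer.isDirect() ? directBuffers : heapBuffers;
    long key = ((long) buffer.capacity() << 32) | (sequence++ & 0xFFFFFFFFL);
    pool.put(key, new WeakReference<>(buffer));
  }
}
```

Because the pool never holds a strong reference, a buffer that nobody else references can be reclaimed by the garbage collector under memory pressure instead of staying pinned in the pool.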






[jira] [Updated] (HADOOP-18106) Handle memory fragmentation in S3 Vectored IO implementation.

2022-12-22 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18106:
---
Fix Version/s: 3.3.5

> Handle memory fragmentation in S3 Vectored IO implementation.
> -
>
> Key: HADOOP-18106
> URL: https://issues.apache.org/jira/browse/HADOOP-18106
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.5
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> As we have implemented merging of ranges in the S3AInputStream implementation 
> of the vectored IO api, it can lead to memory fragmentation. Let me explain by 
> example.
>  
> Suppose a client requests 3 ranges: 
> 0-500, 700-1000 and 1200-1500.
> Because of merging, all the above ranges will get merged into one and we 
> will allocate one big byte buffer covering 0-1500 but return sliced byte 
> buffers for the desired ranges.
> Once the client is done reading all the ranges, it will only be able to 
> free the memory for the requested ranges; the memory of the gaps (here 
> 500-700 and 1000-1200) will never be released.
>  
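
The example in the description can be reproduced with plain java.nio, as a rough sketch. The class and method names below are hypothetical and this is not the S3AInputStream code; it only shows that each slice shares the single merged allocation, and how many bytes of that allocation belong to no requested range.

```java
import java.nio.ByteBuffer;

/** Hypothetical demo of slices sharing one merged allocation. */
final class MergedRangeSliceDemo {

  /** Bytes of the merged allocation that no requested range covers. */
  static int gapBytes(int mergedLength, int[][] ranges) {
    int requested = 0;
    for (int[] range : ranges) {
      requested += range[1] - range[0];
    }
    return mergedLength - requested;
  }

  /** Returns a view of [start, end) that shares memory with {@code merged}. */
  static ByteBuffer sliceOf(ByteBuffer merged, int start, int end) {
    ByteBuffer view = merged.duplicate();
    view.limit(end);
    view.position(start);
    return view.slice();
  }

  public static void main(String[] args) {
    int[][] ranges = {{0, 500}, {700, 1000}, {1200, 1500}};
    ByteBuffer merged = ByteBuffer.allocate(1500);
    for (int[] range : ranges) {
      ByteBuffer slice = sliceOf(merged, range[0], range[1]);
      // Every slice is backed by the same 1500-byte array.
      System.out.println("slice length " + slice.remaining()
          + ", backing array length " + slice.array().length);
    }
    System.out.println("gap bytes in the merged buffer: " + gapBytes(1500, ranges));
  }
}
```

For the three ranges above the requested data is 1100 bytes, so 400 bytes of the 1500-byte allocation are gap bytes that the caller never asked for.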






[jira] [Updated] (HADOOP-18227) Add input stream IOstats for vectored IO api in S3A.

2022-12-22 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18227:
---
Fix Version/s: 3.3.5
   (was: 3.4.0)

> Add input stream IOstats for vectored IO api in S3A.
> 
>
> Key: HADOOP-18227
> URL: https://issues.apache.org/jira/browse/HADOOP-18227
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.5
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>







[jira] [Updated] (HADOOP-18355) Update previous index properly while validating overlapping ranges.

2022-12-22 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18355:
---
Fix Version/s: 3.3.5
   (was: 3.4.0)

> Update previous index properly while validating overlapping ranges. 
> 
>
> Key: HADOOP-18355
> URL: https://issues.apache.org/jira/browse/HADOOP-18355
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: common, fs/s3
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.5
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://github.com/apache/hadoop/blob/a55ace7bc0c173f609b51e46cb0d4d8bcda3d79d/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/VectoredReadUtils.java#L201]






[jira] [Updated] (HADOOP-18392) Propagate vectored s3a input stream stats to file system stats.

2022-12-22 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18392:
---
Fix Version/s: 3.3.5
   (was: 3.4.0)

> Propagate vectored s3a input stream stats to file system stats.
> ---
>
> Key: HADOOP-18392
> URL: https://issues.apache.org/jira/browse/HADOOP-18392
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.5
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>







[jira] [Updated] (HADOOP-18407) Improve vectored IO api spec.

2022-12-22 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18407:
---
Fix Version/s: 3.3.5
   (was: 3.4.0)

> Improve vectored IO api spec. 
> --
>
> Key: HADOOP-18407
> URL: https://issues.apache.org/jira/browse/HADOOP-18407
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/s3
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.5
>
>
> Let's add more details to the vectored IO api spec for better clarity. 
>  * The position returned by getPos() is undefined afterwards.
>  * Note that if a file is changed during a read, the output is again 
> undefined. Some ranges may be old data, some may be new, and some may be both.
>  * Note that while reads are active, normal fs api calls may block.






[jira] [Updated] (HADOOP-18391) Improve VectoredReadUtils#readVectored() for direct buffers

2022-12-22 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18391:
---
Fix Version/s: 3.3.5
   (was: 3.4.0)

> Improve VectoredReadUtils#readVectored() for direct buffers
> ---
>
> Key: HADOOP-18391
> URL: https://issues.apache.org/jira/browse/HADOOP-18391
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 3.3.5
>Reporter: Steve Loughran
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.5
>
>
> Harden the VectoredReadUtils methods for consistent and more robust use, 
> especially in those filesystems which don't have the api.
> VectoredReadUtils.readInDirectBuffer should allocate a max buffer size, e.g. 
> 4 MB, then do repeated reads and copies; this ensures that you don't OOM with 
> many threads doing ranged requests. Other libs do this.
> readVectored to call validateNonOverlappingAndReturnSortedRanges before 
> iterating;
> this ensures the abfs/s3a requirements are always met, and that because 
> ranges will be read in order, prefetching by other clients will keep their 
> performance good.
> readVectored to add special handling for 0 byte ranges.
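
A rough sketch of the "max buffer size, then repeated reads and copies" idea from the description. Everything here is an assumption made for illustration: BoundedDirectRead, PositionedReader and readInto are hypothetical names, not the VectoredReadUtils API, and the real change may be structured differently.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical sketch of filling a (direct) buffer through a capped temp array. */
final class BoundedDirectRead {

  /** Stand-in for a positioned read such as readFully(position, buf, off, len). */
  interface PositionedReader {
    void readFully(long position, byte[] dest, int offset, int length) throws IOException;
  }

  /**
   * Copies {@code length} bytes starting at {@code position} into {@code target},
   * never holding more than {@code maxTempSize} bytes on the heap at once.
   */
  static void readInto(ByteBuffer target, long position, int length,
                       int maxTempSize, PositionedReader reader) throws IOException {
    if (maxTempSize <= 0) {
      throw new IllegalArgumentException("maxTempSize must be positive");
    }
    if (length == 0) {
      return; // zero byte range: nothing to read
    }
    byte[] tmp = new byte[Math.min(length, maxTempSize)];
    int done = 0;
    while (done < length) {
      int chunk = Math.min(tmp.length, length - done);
      reader.readFully(position + done, tmp, 0, chunk);
      target.put(tmp, 0, chunk);
      done += chunk;
    }
  }
}
```

With a cap such as 4 MB, the temporary heap usage per request stays bounded by the cap rather than by the size of the requested range, which is the property the description asks for.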






[jira] [Commented] (HADOOP-18073) Upgrade AWS SDK to v2

2022-12-12 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17646318#comment-17646318
 ] 

Mukund Thakur commented on HADOOP-18073:


Looks good to me. Please re-run all the tests here 
[https://github.com/ahmarsuhail/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/ITestS3AContractVectoredRead.java]
  just to be sure. 

Also think about https://issues.apache.org/jira/browse/HADOOP-17338, an old 
related issue, as the response of getObject has changed.

> Upgrade AWS SDK to v2
> -
>
> Key: HADOOP-18073
> URL: https://issues.apache.org/jira/browse/HADOOP-18073
> Project: Hadoop Common
>  Issue Type: Task
>  Components: auth, fs/s3
>Affects Versions: 3.3.1
>Reporter: xiaowei sun
>Assignee: Ahmar Suhail
>Priority: Major
>  Labels: pull-request-available
> Attachments: Upgrading S3A to SDKV2.pdf
>
>
> This task tracks upgrading Hadoop's AWS connector S3A from AWS SDK for Java 
> V1 to AWS SDK for Java V2.
> Original use case:
> {quote}We would like to access s3 with AWS SSO, which is supported in 
> software.amazon.awssdk:sdk-core:2.*.
> In particular, from 
> [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html],
>  when to set 'fs.s3a.aws.credentials.provider', it must be 
> "com.amazonaws.auth.AWSCredentialsProvider". We would like to support 
> "software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider" which 
> supports AWS SSO, so users only need to authenticate once.
> {quote}






[jira] [Updated] (HADOOP-18311) Upgrade dependencies to address several CVEs

2022-11-29 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18311:
---
Target Version/s: 3.3.9

[~svaughan]  Are we still planning to do this for the 3.3.5 release? We will be 
releasing that in a week.

> Upgrade dependencies to address several CVEs
> 
>
> Key: HADOOP-18311
> URL: https://issues.apache.org/jira/browse/HADOOP-18311
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 3.3.3, 3.3.4
>Reporter: Steve Vaughan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The following CVEs can be addressed by upgrading dependencies within the 
> build.  This includes a replacement of HTrace with a noop implementation.
>  * CVE-2018-7489
>  * CVE-2020-10663
>  * CVE-2020-28491
>  * CVE-2020-35490
>  * CVE-2020-35491
>  * CVE-2020-36518
>  * PRISMA-2021-0182
> This addresses all of the CVEs from 3.3.3 except for ones that would require 
> upgrading Netty to 4.x.  I'll be submitting a pull request for 3.3.4.






[jira] [Updated] (HADOOP-18311) Upgrade dependencies to address several CVEs

2022-11-29 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18311:
---
Fix Version/s: (was: 3.3.5)

> Upgrade dependencies to address several CVEs
> 
>
> Key: HADOOP-18311
> URL: https://issues.apache.org/jira/browse/HADOOP-18311
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 3.3.3, 3.3.4
>Reporter: Steve Vaughan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The following CVEs can be addressed by upgrading dependencies within the 
> build.  This includes a replacement of HTrace with a noop implementation.
>  * CVE-2018-7489
>  * CVE-2020-10663
>  * CVE-2020-28491
>  * CVE-2020-35490
>  * CVE-2020-35491
>  * CVE-2020-36518
>  * PRISMA-2021-0182
> This addresses all of the CVEs from 3.3.3 except for ones that would require 
> upgrading Netty to 4.x.  I'll be submitting a pull request for 3.3.4.






[jira] [Resolved] (HADOOP-18408) [ABFS]: ITestAbfsManifestCommitProtocol fails on nonHNS configuration

2022-11-29 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-18408.

Resolution: Fixed

> [ABFS]: ITestAbfsManifestCommitProtocol  fails on nonHNS configuration
> --
>
> Key: HADOOP-18408
> URL: https://issues.apache.org/jira/browse/HADOOP-18408
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure, test
>Reporter: Pranav Saxena
>Assignee: Sree Bhattacharyya
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>
> ITestAbfsRenameStageFailure fails for NonHNS-SharedKey configuration.
> Failure:
> [ERROR] 
> ITestAbfsRenameStageFailure>TestRenameStageFailure.testResilienceAsExpected:126
>  [resilient commit support] expected:<[tru]e> but was:<[fals]e>
> RCA:
> ResilientCommit checks whether etags are preserved in rename; if not, 
> it throws an exception and the flag for resilientCommitByRename stays null, 
> ultimately leading to the test failure.
> Mitigation:
> Since etags are not preserved on rename in a nonHNS account, the 
> required value for rename resilience should be False, as resilient commits 
> cannot be made. Thus, requiring a True value for requireRenameResilience for 
> a nonHNS account is not a valid case. Hence, as part of this task, we shall 
> set the correct value of False for requireRenameResilience for nonHNS accounts.






[jira] [Comment Edited] (HADOOP-7370) Optimize pread on ChecksumFileSystem

2022-11-29 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17640787#comment-17640787
 ] 

Mukund Thakur edited comment on HADOOP-7370 at 11/29/22 3:40 PM:
-

Vectored IO supersedes this feature. 
https://issues.apache.org/jira/browse/HADOOP-18103


was (Author: mthakur):
Vectored IO supersedes this feature. 

> Optimize pread on ChecksumFileSystem
> 
>
> Key: HADOOP-7370
> URL: https://issues.apache.org/jira/browse/HADOOP-7370
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
> Attachments: checksumfs-pread-0.20.txt
>
>
> Currently the implementation of positional read in ChecksumFileSystem is 
> very inefficient - it actually re-opens the underlying file and checksum 
> file, then seeks and uses normal read. Instead, it can push down positional 
> read directly to the underlying FS and verify checksum.






[jira] [Resolved] (HADOOP-7370) Optimize pread on ChecksumFileSystem

2022-11-29 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-7370.
---
Resolution: Won't Fix

Vectored IO supersedes this feature. 

> Optimize pread on ChecksumFileSystem
> 
>
> Key: HADOOP-7370
> URL: https://issues.apache.org/jira/browse/HADOOP-7370
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
> Attachments: checksumfs-pread-0.20.txt
>
>
> Currently the implementation of positional read in ChecksumFileSystem is 
> very inefficient - it actually re-opens the underlying file and checksum 
> file, then seeks and uses normal read. Instead, it can push down positional 
> read directly to the underlying FS and verify checksum.






[jira] [Commented] (HADOOP-18482) ITestS3APrefetchingInputStream does not skip if no CSV test file available

2022-10-19 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620619#comment-17620619
 ] 

Mukund Thakur commented on HADOOP-18482:


No, this is not required for 3.3.5, as prefetching is not in 3.3.5. 

> ITestS3APrefetchingInputStream does not skip if no CSV test file available
> --
>
> Key: HADOOP-18482
> URL: https://issues.apache.org/jira/browse/HADOOP-18482
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.4
>Reporter: Daniel Carl Jones
>Assignee: Daniel Carl Jones
>Priority: Minor
>  Labels: pull-request-available
>
> We should use S3ATestUtils.getCSVTestFile(conf) to skip if the property is 
> empty (single space).
> Today, when I set _fs.s3a.scale.test.csvfile_ to a single space, all tests 
> that rely on the file are skipped except this one.






[jira] [Commented] (HADOOP-18460) ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing

2022-09-29 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17611265#comment-17611265
 ] 

Mukund Thakur commented on HADOOP-18460:


{code:java}
/**
 * Get the s3 object for S3 server for a specified range.
 * Also checks if the vectored io operation has been stopped before and after
 * the http get request such that we don't waste time populating the buffers.
 * @param operationName name of the operation for which get object on S3 is called.
 * @param position position of the object to be read from S3.
 * @param length length from position of the object to be read from S3.
 * @return result s3 object.
 * @throws IOException exception if any.
 */
private S3Object getS3ObjectAndValidateNotNull(final String operationName,
                                               final long position,
                                               final int length) throws IOException {
  checkIfVectoredIOStopped();
  S3Object objectRange = getS3Object(operationName, position, length);
  if (objectRange.getObjectContent() == null) {
    throw new PathIOException(uri,
        "Null IO stream received during " + operationName);
  }
  checkIfVectoredIOStopped();
  return objectRange;
}
{code}
I made this change, but while making it I noticed one possible issue: 
suppose we interrupt after getting the stream from S3; we will never close 
the S3Object, thus leading to a memory leak? 
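
One possible way to address the concern above, sketched with plain JDK types so it can run on its own: if the stop check fails after the resource has been obtained, close the resource before rethrowing. StopAwareFetch, Fetcher and the AtomicBoolean flag are hypothetical stand-ins for getS3Object(), S3Object and checkIfVectoredIOStopped(); whether the actual fix took this shape is not stated in this thread.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: never leak a resource obtained just before a stop. */
final class StopAwareFetch {

  /** Stand-in for the call that opens the remote object. */
  interface Fetcher<T extends Closeable> {
    T fetch() throws IOException;
  }

  static <T extends Closeable> T fetchUnlessStopped(AtomicBoolean stopped,
                                                    Fetcher<T> fetcher) throws IOException {
    checkNotStopped(stopped);
    T resource = fetcher.fetch();
    try {
      // Second check: the operation may have been stopped during the fetch.
      checkNotStopped(stopped);
    } catch (IOException e) {
      if (resource != null) {
        try {
          resource.close();
        } catch (IOException closeFailure) {
          e.addSuppressed(closeFailure);
        }
      }
      throw e;
    }
    return resource;
  }

  static void checkNotStopped(AtomicBoolean stopped) throws InterruptedIOException {
    if (stopped.get()) {
      throw new InterruptedIOException("operation was stopped");
    }
  }
}
```

The caller still sees the interruption, but the object obtained in between the two checks is closed rather than abandoned.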

> ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing
> -
>
> Key: HADOOP-18460
> URL: https://issues.apache.org/jira/browse/HADOOP-18460
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Mukund Thakur
>Priority: Major
>
> seeing a test failure in both parallel and single test case runs of 
> {{ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer}}






[jira] [Resolved] (HADOOP-18463) Add an integration test to process data asynchronously during vectored read.

2022-09-28 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-18463.

Fix Version/s: 3.3.5
   Resolution: Fixed

> Add an integration test to process data asynchronously during vectored read.
> 
>
> Key: HADOOP-18463
> URL: https://issues.apache.org/jira/browse/HADOOP-18463
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.5
>
>







[jira] [Resolved] (HADOOP-18347) Restrict vectoredIO threadpool to reduce memory pressure

2022-09-28 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-18347.

Fix Version/s: 3.3.5
   Resolution: Fixed

> Restrict vectoredIO threadpool to reduce memory pressure
> 
>
> Key: HADOOP-18347
> URL: https://issues.apache.org/jira/browse/HADOOP-18347
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: common, fs, fs/adl, fs/s3
>Reporter: Rajesh Balamohan
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: performance, pull-request-available
> Fix For: 3.3.5
>
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L964-L967
> Currently, it fetches all the ranges with an unbounded threadpool. This will 
> not cause memory pressure with standard benchmarks like TPCDS. However, when 
> a large number of ranges are present with large files, this could potentially 
> spike the memory usage of the task. Limiting the threadpool size could reduce 
> the memory usage.
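
As a small illustration of why a bounded pool caps memory, the sketch below submits one task per range to a fixed-size pool and records the peak number of tasks in flight. The class name, the sleep that stands in for a ranged GET, and the choice of Executors.newFixedThreadPool are assumptions of this example; the executor actually used in S3A may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: range fetches limited by a fixed-size thread pool. */
final class BoundedRangeFetch {

  /** Runs one simulated fetch per range and returns the peak concurrency seen. */
  static int fetchAll(int rangeCount, int maxThreads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
    AtomicInteger active = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (int i = 0; i < rangeCount; i++) {
        futures.add(pool.submit(() -> {
          int now = active.incrementAndGet();
          peak.accumulateAndGet(now, Math::max);
          try {
            Thread.sleep(5); // stands in for a ranged read filling a buffer
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          } finally {
            active.decrementAndGet();
          }
        }));
      }
      for (Future<?> future : futures) {
        future.get();
      }
    } finally {
      pool.shutdown();
    }
    return peak.get();
  }
}
```

Since at most maxThreads fetches run at once, at most that many range buffers are being filled at the same time, however many ranges were requested.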






[jira] [Commented] (HADOOP-18470) Release hadoop 3.3.5

2022-09-27 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610146#comment-17610146
 ] 

Mukund Thakur commented on HADOOP-18470:


https://github.com/apache/hadoop/tree/branch-3.3.5

> Release hadoop 3.3.5
> 
>
> Key: HADOOP-18470
> URL: https://issues.apache.org/jira/browse/HADOOP-18470
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: build
>Affects Versions: 3.3.5
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>







[jira] [Assigned] (HADOOP-18470) Release hadoop 3.3.5

2022-09-27 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur reassigned HADOOP-18470:
--

Assignee: Mukund Thakur

> Release hadoop 3.3.5
> 
>
> Key: HADOOP-18470
> URL: https://issues.apache.org/jira/browse/HADOOP-18470
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>







[jira] [Created] (HADOOP-18470) Release hadoop 3.3.5

2022-09-27 Thread Mukund Thakur (Jira)
Mukund Thakur created HADOOP-18470:
--

 Summary: Release hadoop 3.3.5
 Key: HADOOP-18470
 URL: https://issues.apache.org/jira/browse/HADOOP-18470
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Mukund Thakur









[jira] [Commented] (HADOOP-18460) ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing

2022-09-23 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17608894#comment-17608894
 ] 

Mukund Thakur commented on HADOOP-18460:


Actually no, not over there, but here: 
[https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L927]
 and 

[https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L1034]

The reason for not putting it inside populateBuffer() is that the buffer 
allocation has already been done by then, and the buffer won't be released if 
we throw an interrupted exception while populating it. 

> ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing
> -
>
> Key: HADOOP-18460
> URL: https://issues.apache.org/jira/browse/HADOOP-18460
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Mukund Thakur
>Priority: Major
>
> seeing a test failure in both parallel and single test case runs of 
> {{ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer}}






[jira] [Created] (HADOOP-18463) Add an integration test process data asynchronously during vectored read.

2022-09-21 Thread Mukund Thakur (Jira)
Mukund Thakur created HADOOP-18463:
--

 Summary: Add an integration test process data asynchronously 
during vectored read.
 Key: HADOOP-18463
 URL: https://issues.apache.org/jira/browse/HADOOP-18463
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Mukund Thakur
Assignee: Mukund Thakur









[jira] [Resolved] (HADOOP-18455) s3a prefetching Executor should be closed

2022-09-21 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-18455.

Resolution: Fixed

> s3a prefetching Executor should be closed
> -
>
> Key: HADOOP-18455
> URL: https://issues.apache.org/jira/browse/HADOOP-18455
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>
> This is the follow-up work for HADOOP-18186. The new executor service we use 
> for s3a prefetching should be closed while shutting down the file system.






[jira] [Commented] (HADOOP-18455) s3a prefetching Executor should be closed

2022-09-21 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607936#comment-17607936
 ] 

Mukund Thakur commented on HADOOP-18455:


merged into trunk as of now. 

> s3a prefetching Executor should be closed
> -
>
> Key: HADOOP-18455
> URL: https://issues.apache.org/jira/browse/HADOOP-18455
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>
> This is the follow-up work for HADOOP-18186. The new executor service we use 
> for s3a prefetching should be closed while shutting down the file system.






[jira] [Commented] (HADOOP-18460) ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing

2022-09-20 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607393#comment-17607393
 ] 

Mukund Thakur commented on HADOOP-18460:


I think adding checkIfVectoredIOStopped() at  
[https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L1057]
 should fix the issue. 

> ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing
> -
>
> Key: HADOOP-18460
> URL: https://issues.apache.org/jira/browse/HADOOP-18460
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Mukund Thakur
>Priority: Major
>
> seeing a test failure in both parallel and single test case runs of 
> {{ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer}}






[jira] [Comment Edited] (HADOOP-18460) ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing

2022-09-20 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17607393#comment-17607393
 ] 

Mukund Thakur edited comment on HADOOP-18460 at 9/20/22 8:40 PM:
-

Unable to reproduce this consistently but I think adding 
checkIfVectoredIOStopped() at  
[https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L1057]
 should fix the issue. 


was (Author: mthakur):
I think adding checkIfVectoredIOStopped() at  
[https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L1057]
 should fix the issue. 

> ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer failing
> -
>
> Key: HADOOP-18460
> URL: https://issues.apache.org/jira/browse/HADOOP-18460
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Mukund Thakur
>Priority: Major
>
> seeing a test failure in both parallel and single test case runs of 
> {{ITestS3AContractVectoredRead.testStopVectoredIoOperationsUnbuffer}}






[jira] [Commented] (HADOOP-18439) Fix VectoredIO for LocalFileSystem when checksum is enabled.

2022-09-09 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602464#comment-17602464
 ] 

Mukund Thakur commented on HADOOP-18439:


pushed to branch-3.3

> Fix VectoredIO for LocalFileSystem when checksum is enabled.
> 
>
> Key: HADOOP-18439
> URL: https://issues.apache.org/jira/browse/HADOOP-18439
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: common
>Affects Versions: 3.3.9
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
>
> While merging the ranges in CheckSumFs, they are rounded up based on the 
> checksum bytes size. This leads to some ranges crossing the EOF, so they 
> need to be corrected; otherwise the actual reads will fail with an 
> EOFException.
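
The rounding problem described above can be illustrated with a small sketch. This is not the code from the HADOOP-18439 patch; the class and method names (ChecksumRangeSketch, alignAndClamp) and the parameter bytesPerChecksum are hypothetical, and the sketch assumes the caller has already checked that the requested range lies within the file.

```java
/** Hypothetical sketch; names are illustrative and not Hadoop's API. */
final class ChecksumRangeSketch {
  private ChecksumRangeSketch() {}

  /** Round an offset down to the start of its checksum chunk. */
  static long roundDown(long offset, int bytesPerChecksum) {
    return offset - (offset % bytesPerChecksum);
  }

  /** Round an offset up to the next checksum chunk boundary. */
  static long roundUp(long offset, int bytesPerChecksum) {
    long rem = offset % bytesPerChecksum;
    return rem == 0 ? offset : offset + (bytesPerChecksum - rem);
  }

  /**
   * Align a requested range to checksum chunk boundaries, then clamp the
   * end to the file length so the aligned range never crosses EOF.
   * Returns {start, length} of the aligned range.
   */
  static long[] alignAndClamp(long offset, int length,
                              int bytesPerChecksum, long fileLength) {
    long start = roundDown(offset, bytesPerChecksum);
    long end = roundUp(offset + length, bytesPerChecksum);
    // Without this clamp, a range near the end of the file could extend
    // past EOF after rounding up.
    end = Math.min(end, fileLength);
    return new long[] {start, end - start};
  }
}
```

For example, with 512-byte checksum chunks and a 1000-byte file, a request for bytes 600 to 700 rounds to 512 to 1024; the clamp trims the end back to 1000.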






[jira] [Resolved] (HADOOP-18439) Fix VectoredIO for LocalFileSystem when checksum is enabled.

2022-09-09 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-18439.

Resolution: Fixed

> Fix VectoredIO for LocalFileSystem when checksum is enabled.
> 
>
> Key: HADOOP-18439
> URL: https://issues.apache.org/jira/browse/HADOOP-18439
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: common
>Affects Versions: 3.3.9
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9
>
>
> While merging the ranges in CheckSumFs, they are rounded up based on the 
> checksum bytes size. This leads to some ranges crossing the EOF, so they 
> need to be corrected; otherwise the actual reads will fail with an 
> EOFException.






[jira] [Updated] (HADOOP-18439) Fix VectoredIO for LocalFileSystem when checksum is enabled.

2022-09-09 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur updated HADOOP-18439:
---
Fix Version/s: 3.3.9

> Fix VectoredIO for LocalFileSystem when checksum is enabled.
> 
>
> Key: HADOOP-18439
> URL: https://issues.apache.org/jira/browse/HADOOP-18439
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: common
>Affects Versions: 3.3.9
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9
>
>
> While merging the ranges in CheckSumFs, they are rounded up based on the 
> checksum bytes size. This leads to some ranges crossing the EOF, so they 
> need to be corrected; otherwise the actual reads will fail with an 
> EOFException.






[jira] [Assigned] (HADOOP-18347) Restrict vectoredIO threadpool to reduce memory pressure

2022-09-09 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur reassigned HADOOP-18347:
--

Assignee: Mukund Thakur

> Restrict vectoredIO threadpool to reduce memory pressure
> 
>
> Key: HADOOP-18347
> URL: https://issues.apache.org/jira/browse/HADOOP-18347
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: common, fs, fs/adl, fs/s3
>Reporter: Rajesh Balamohan
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: performance
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L964-L967
> Currently, it fetches all the ranges with an unbounded threadpool. This will 
> not cause memory pressure with standard benchmarks like TPCDS. However, when 
> a large number of ranges is present with large files, this could potentially 
> spike the memory usage of the task. Limiting the threadpool size could reduce 
> the memory usage.
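
A minimal sketch of the idea of limiting the threadpool, using only java.util.concurrent. It is not necessarily the approach taken in Hadoop; the class name BoundedRangeReaderSketch, its methods, and the parameters maxThreads and queueCapacity are hypothetical, and the byte-array allocation stands in for a real range read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; names are illustrative and not Hadoop's API. */
final class BoundedRangeReaderSketch {
  private BoundedRangeReaderSketch() {}

  /**
   * Build a pool with at most maxThreads workers and a bounded queue.
   * When the queue is full, CallerRunsPolicy makes the submitting thread
   * run the task itself, which slows submission instead of queueing
   * an unbounded number of pending range reads.
   */
  static ThreadPoolExecutor newBoundedPool(int maxThreads, int queueCapacity) {
    return new ThreadPoolExecutor(
        maxThreads, maxThreads, 60L, TimeUnit.SECONDS,
        new ArrayBlockingQueue<>(queueCapacity),
        new ThreadPoolExecutor.CallerRunsPolicy());
  }

  /** Submit one task per range and return the total bytes "read". */
  static int readAll(ThreadPoolExecutor pool, int[] rangeLengths) {
    List<Future<byte[]>> futures = new ArrayList<>();
    for (int len : rangeLengths) {
      // Stand-in for a real range read: allocate a buffer of that size.
      Callable<byte[]> task = () -> new byte[len];
      futures.add(pool.submit(task));
    }
    int total = 0;
    try {
      for (Future<byte[]> f : futures) {
        total += f.get().length;
      }
    } catch (Exception e) {
      throw new IllegalStateException("range read failed", e);
    }
    return total;
  }
}
```

The bound limits how many reads are in flight at once; the buffers already returned are still held by the caller, so this addresses concurrency-driven spikes rather than total memory.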






[jira] [Commented] (HADOOP-18447) Vectored IO: Threadpool should be closed on interrupts or during close calls

2022-09-08 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601976#comment-17601976
 ] 

Mukund Thakur commented on HADOOP-18447:


Currently the threadpool is a shared unbounded one, but it will be moved to a bounded one. 

But we are terminating the running vectored IO operations when the stream is 
closed 
[https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L114]
 

https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInputStream.java#L611
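
A rough sketch of terminating in-progress reads when the stream is closed, assuming a stop flag that each read checks as it goes. The class StoppableVectoredStreamSketch and its members are hypothetical and simplified; this is not the S3AInputStream code at the links above.

```java
import java.io.Closeable;
import java.io.InterruptedIOException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; names are illustrative and not Hadoop's API. */
final class StoppableVectoredStreamSketch implements Closeable {
  /** Set once the stream is closed; polled by in-progress range reads. */
  private final AtomicBoolean stopVectoredIO = new AtomicBoolean(false);

  /** Throw if the stream has been closed since the read started. */
  void checkIfStopped() throws InterruptedIOException {
    if (stopVectoredIO.get()) {
      throw new InterruptedIOException("Stream closed during vectored read");
    }
  }

  /**
   * Copy dest.length bytes from source starting at offset, checking the
   * stop flag before each byte so a concurrent close() ends the read early.
   */
  int readRange(byte[] source, int offset, byte[] dest)
      throws InterruptedIOException {
    int copied = 0;
    while (copied < dest.length) {
      checkIfStopped();
      dest[copied] = source[offset + copied];
      copied++;
    }
    return copied;
  }

  /** Mark the stream closed; running reads fail at their next check. */
  @Override
  public void close() {
    stopVectoredIO.set(true);
  }
}
```

With this pattern the pool threads themselves stay alive in the shared pool; only the work belonging to the closed stream is abandoned.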

> Vectored IO: Threadpool should be closed on interrupts or during close calls
> 
>
> Key: HADOOP-18447
> URL: https://issues.apache.org/jira/browse/HADOOP-18447
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: common, fs, fs/adl, fs/s3
>Reporter: Rajesh Balamohan
>Priority: Major
>  Labels: performance, stability
> Attachments: Screenshot 2022-09-08 at 9.22.07 AM.png
>
>
> Vectored IO threadpool should be closed on any interrupts or during 
> S3AFileSystem/S3AInputStream close() calls.
> E.g. a query got cancelled in the middle of the run. However, in the 
> background (e.g. LLAP), vectored IO threads continued to run.
>  
> !Screenshot 2022-09-08 at 9.22.07 AM.png|width=537,height=164!





