[jira] [Commented] (HADOOP-19323) ABFS: Add a new API in AzureBlobFileSystem to allow listing with startFrom

2024-11-07 Thread Sneha Vijayarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-19323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17896281#comment-17896281
 ] 

Sneha Vijayarajan commented on HADOOP-19323:


Hi [~csun], the logic in the inner class is not reliable and hence should not 
be made public. Thanks for bringing this up; on the Azure Storage side we will 
look into this ask and work towards a reliable solution.

I had a query though: as the HDFS FileSystem base class does not have a 
listStatus(final Path f, final String startFrom) API, how would this API be 
called? Would callers invoke the AzureBlobFileSystem public API directly?
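
For illustration, a minimal caller-side sketch follows. It assumes the proposed 
public overload AzureBlobFileSystem#listStatus(Path, String startFrom) is added 
(today the logic lives only in AzureBlobFileSystemStore), so the caller 
down-casts the FileSystem handle; the path and start-from value are placeholders.

{code:java}
// Hypothetical usage of the proposed API (the two-argument listStatus overload
// does not exist on AzureBlobFileSystem today; it is the surface this Jira asks for).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem;

public class ListStatusStartFromSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path deltaLog = new Path("abfs://container@account.dfs.core.windows.net/table/_delta_log");

    FileSystem fs = deltaLog.getFileSystem(conf);
    if (fs instanceof AzureBlobFileSystem) {
      AzureBlobFileSystem abfs = (AzureBlobFileSystem) fs;
      // List only entries from this name onward, instead of the whole directory.
      FileStatus[] tail = abfs.listStatus(deltaLog, "00000000000000000100.json");
      for (FileStatus status : tail) {
        System.out.println(status.getPath());
      }
    }
  }
}
{code}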

> ABFS: Add a new API in AzureBlobFileSystem to allow listing with startFrom
> --
>
> Key: HADOOP-19323
> URL: https://issues.apache.org/jira/browse/HADOOP-19323
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>
> Currently the API to list from a certain starting point is hidden inside 
> {{AzureBlobFileSystemStore}} and is hard to access. It would be better to 
> surface it in places like {{AzureBlobFileSystem}} to make it easier to use.
>  
> The API is useful in scenarios such as Delta Lake, where transaction logs are 
> always indexed sequentially. With this API, Delta no longer needs to list the 
> whole {{_delta_log}} directory but only a small subset.






[jira] [Updated] (HADOOP-19179) ABFS: Support FNS Accounts over BlobEndpoint

2024-06-25 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-19179:
---
Attachment: Design Document for FNS Blob Support in ABFS Driver.pdf

> ABFS: Support FNS Accounts over BlobEndpoint
> 
>
> Key: HADOOP-19179
> URL: https://issues.apache.org/jira/browse/HADOOP-19179
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
> Fix For: 3.5.0, 3.4.1
>
> Attachments: Design Document for FNS Blob Support in ABFS Driver.pdf
>
>
> As a prerequisite to deprecating the WASB driver, the ABFS driver will need to 
> match the FNS account support provided by the WASB driver. This will give 
> customers still on the legacy driver an official migration path to the ABFS driver.
>  
> Parent Jira for WASB deprecation: [HADOOP-19178] WASB Driver Deprecation and 
> eventual removal - ASF JIRA (apache.org)






[jira] [Updated] (HADOOP-19179) ABFS: Support FNS Accounts over BlobEndpoint

2024-06-25 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-19179:
---
Attachment: (was: Design Document for FNS Blob Support in ABFS 
Driver.pdf)

> ABFS: Support FNS Accounts over BlobEndpoint
> 
>
> Key: HADOOP-19179
> URL: https://issues.apache.org/jira/browse/HADOOP-19179
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
> Fix For: 3.5.0, 3.4.1
>
> Attachments: Design Document for FNS Blob Support in ABFS Driver.pdf
>
>
> As a prerequisite to deprecating the WASB driver, the ABFS driver will need to 
> match the FNS account support provided by the WASB driver. This will give 
> customers still on the legacy driver an official migration path to the ABFS driver.
>  
> Parent Jira for WASB deprecation: [HADOOP-19178] WASB Driver Deprecation and 
> eventual removal - ASF JIRA (apache.org)






[jira] [Updated] (HADOOP-19179) ABFS: Support FNS Accounts over BlobEndpoint

2024-06-25 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-19179:
---
Attachment: Design Document for FNS Blob Support in ABFS Driver.pdf

> ABFS: Support FNS Accounts over BlobEndpoint
> 
>
> Key: HADOOP-19179
> URL: https://issues.apache.org/jira/browse/HADOOP-19179
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
> Fix For: 3.5.0, 3.4.1
>
> Attachments: Design Document for FNS Blob Support in ABFS Driver.pdf
>
>
> As a prerequisite to deprecating the WASB driver, the ABFS driver will need to 
> match the FNS account support provided by the WASB driver. This will give 
> customers still on the legacy driver an official migration path to the ABFS driver.
>  
> Parent Jira for WASB deprecation: [HADOOP-19178] WASB Driver Deprecation and 
> eventual removal - ASF JIRA (apache.org)






[jira] [Created] (HADOOP-19179) ABFS: Support FNS Accounts over BlobEndpoint

2024-05-15 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-19179:
--

 Summary: ABFS: Support FNS Accounts over BlobEndpoint
 Key: HADOOP-19179
 URL: https://issues.apache.org/jira/browse/HADOOP-19179
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan
 Fix For: 3.5.0, 3.4.1


As a prerequisite to deprecating the WASB driver, the ABFS driver will need to 
match the FNS account support provided by the WASB driver. This will give 
customers still on the legacy driver an official migration path to the ABFS driver.

 

Parent Jira for WASB deprecation: [HADOOP-19178] WASB Driver Deprecation and 
eventual removal - ASF JIRA (apache.org)






[jira] [Updated] (HADOOP-19178) WASB Driver Deprecation and eventual removal

2024-05-15 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-19178:
---
Description: 
*WASB Driver*

The WASB driver was developed to support FNS (Flat Namespace) Azure Storage 
accounts. FNS accounts do not honor file/folder semantics, so HDFS folder 
operations are mimicked on the client side by the WASB driver, and folder 
operations such as Rename and Delete can generate a lot of IOPS through 
client-side enumeration and blob-by-blob orchestration of the rename/delete. 
Other APIs are affected too, as the initial check for whether a path is a file 
or a folder requires multiple metadata calls. Together these lead to degraded 
performance.

To provide better service to analytics customers, Microsoft released ADLS Gen2, 
which is HNS (Hierarchical Namespace), i.e. a file/folder-aware store. The ABFS 
driver was designed to overcome the inherent deficiencies of WASB, and customers 
were advised to migrate to the ABFS driver.

*Customers who still use the legacy WASB driver and the challenges they face* 

Some of our customers have not migrated to the ABFS driver yet and continue to 
use the legacy WASB driver with FNS accounts.  

These customers face the following challenges: 
 * They cannot leverage the optimizations and benefits of the ABFS driver.
 * They have to deal with compatibility issues if files and folders are modified 
with the legacy WASB driver and the ABFS driver concurrently during a phased 
transition.
 * Supported features differ between FNS and HNS over the ABFS driver.
 * In certain cases, they must perform a significant amount of re-work on their 
workloads to migrate to the ABFS driver, which is available only on HNS-enabled 
accounts in a fully tested and supported scenario.

*Deprecation plans for WASB*

We are introducing a new feature that will enable the ABFS driver to support 
FNS accounts (over BlobEndpoint) using the ABFS scheme. This feature will 
enable customers to use the ABFS driver to interact with data stored in GPv2 
(General Purpose v2) storage accounts. 

With this feature, the customers who still use the legacy WASB driver will be 
able to migrate to the ABFS driver without much re-work on their workloads. 
They will however need to change the URIs from the WASB scheme to the ABFS 
scheme. 

Once the ABFS driver has the FNS support needed to migrate WASB customers, the 
WASB driver will be declared deprecated in the OSS documentation and marked for 
removal in the next major release. This removes any ambiguity for new customer 
onboarding, as there will be only one Microsoft driver for Azure Storage, and 
migrating customers will get SLA-bound support for both driver and service, 
which was not guaranteed for WASB.

 We anticipate that this feature will serve as a stepping stone for customers 
to move to HNS enabled accounts with the ABFS driver, which is our recommended 
stack for big data analytics on ADLS Gen2. 

*Any impact for existing customers who are using ADLS Gen2 (HNS-enabled accounts) with the ABFS driver?*

This feature does not impact the existing customers who are using ADLS Gen2 
(HNS enabled account) with ABFS driver.

They do not need to make any changes to their workloads or configurations. They 
will still enjoy the benefits of HNS, such as atomic operations, fine-grained 
access control, scalability, and performance. 

*Official recommendation*

Microsoft continues to recommend that all big data and analytics customers use 
Azure Data Lake Gen2 (ADLS Gen2) with the ABFS driver, and will continue to 
optimize this scenario going forward. We believe this new option will help those 
customers transition to a supported scenario immediately, while they plan their 
ultimate move to ADLS Gen2 (HNS-enabled accounts).

*New authentication options for customers migrating from WASB to the ABFS driver*

The auth types that WASB provides will continue to work for FNS over the ABFS 
driver, via configuration that accepts these SAS types (similar to WASB):
 * SharedKey
 * Account SAS
 * Service/Container SAS

Authentication types that were not supported by the WASB driver but are supported 
by the ABFS driver will also be available for FNS over the ABFS driver:
 * OAuth 2.0 Client Credentials
 * OAuth 2.0 Refresh Token
 * Azure Managed Identity
 * Custom OAuth 2.0 Token Provider

The ABFS driver's SAS token provider plugin, available today for User Delegation 
SAS and Directory SAS, will continue to work only for HNS accounts.
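
For customers planning the migration, a minimal configuration sketch is shown 
below. It assumes the standard ABFS account-level config keys; the account name 
and secret values are placeholders, and the exact account URI shape for 
FNS-over-Blob should be taken from the attached design document.

{code:java}
// Minimal sketch of wiring two of the auth options named above through Hadoop
// Configuration. Account names and secrets are placeholders; only the config keys
// (fs.azure.account.auth.type, fs.azure.account.key, fs.azure.account.oauth2.*)
// are standard ABFS settings.
import org.apache.hadoop.conf.Configuration;

public class AbfsAuthConfigSketch {

  /** SharedKey auth, one of the WASB-era options that remains available on ABFS. */
  public static Configuration sharedKeyConf() {
    Configuration conf = new Configuration();
    conf.set("fs.azure.account.auth.type.myaccount.dfs.core.windows.net", "SharedKey");
    conf.set("fs.azure.account.key.myaccount.dfs.core.windows.net", "<storage-account-key>");
    return conf;
  }

  /** OAuth 2.0 client credentials, supported by ABFS but not by the legacy WASB driver. */
  public static Configuration oauthConf() {
    Configuration conf = new Configuration();
    conf.set("fs.azure.account.auth.type", "OAuth");
    conf.set("fs.azure.account.oauth.provider.type",
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider");
    conf.set("fs.azure.account.oauth2.client.id", "<client-id>");
    conf.set("fs.azure.account.oauth2.client.secret", "<client-secret>");
    conf.set("fs.azure.account.oauth2.client.endpoint",
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token");
    return conf;
  }
}
{code}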

  was:
*WASB Driver*

WASB driver was developed to support FNS (FlatNameSpace) Azure Storage 
accounts. FNS accounts do not honor File-Folder syntax. HDFS Folder operations 
hence are mimicked at client side by WASB driver and certain folder operations 
like Rename and Delete can lead to lot of IOPs with client-side enumeration and 
orchestration of rename/delete operation blob by blob. It was not id

[jira] [Created] (HADOOP-19178) WASB Driver Deprecation and eventual removal

2024-05-15 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-19178:
--

 Summary: WASB Driver Deprecation and eventual removal
 Key: HADOOP-19178
 URL: https://issues.apache.org/jira/browse/HADOOP-19178
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan
 Fix For: 3.4.1


*WASB Driver*

The WASB driver was developed to support FNS (Flat Namespace) Azure Storage 
accounts. FNS accounts do not honor file/folder semantics, so HDFS folder 
operations are mimicked on the client side by the WASB driver, and folder 
operations such as Rename and Delete can generate a lot of IOPS through 
client-side enumeration and blob-by-blob orchestration of the rename/delete. 
Other APIs are affected too, as the initial check for whether a path is a file 
or a folder requires multiple metadata calls. Together these lead to degraded 
performance.

 

To provide better service to analytics customers, Microsoft released ADLS Gen2, 
which is HNS (Hierarchical Namespace), i.e. a file/folder-aware store. The ABFS 
driver was designed to overcome the inherent deficiencies of WASB, and customers 
were advised to migrate to the ABFS driver.

 

*Customers who still use the legacy WASB driver and the challenges they face* 

Some of our customers have not migrated to the ABFS driver yet and continue to 
use the legacy WASB driver with FNS accounts.  

These customers face the following challenges: 
 * They cannot leverage the optimizations and benefits of the ABFS driver.
 * They have to deal with compatibility issues if files and folders are modified 
with the legacy WASB driver and the ABFS driver concurrently during a phased 
transition.
 * Supported features differ between FNS and HNS over the ABFS driver.
 * In certain cases, they must perform a significant amount of re-work on their 
workloads to migrate to the ABFS driver, which is available only on HNS-enabled 
accounts in a fully tested and supported scenario.

*Deprecation plans for WASB* 

We are introducing a new feature that will enable the ABFS driver to support 
FNS accounts (over BlobEndpoint) using the ABFS scheme. This feature will 
enable customers to use the ABFS driver to interact with data stored in GPv2 
(General Purpose v2) storage accounts. 

With this feature, the customers who still use the legacy WASB driver will be 
able to migrate to the ABFS driver without much re-work on their workloads. 
They will however need to change the URIs from the WASB scheme to the ABFS 
scheme. 

Once the ABFS driver has the FNS support needed to migrate WASB customers, the 
WASB driver will be declared deprecated in the OSS documentation and marked for 
removal in the next major release. This removes any ambiguity for new customer 
onboarding, as there will be only one Microsoft driver for Azure Storage, and 
migrating customers will get SLA-bound support for both driver and service, 
which was not guaranteed for WASB.

 We anticipate that this feature will serve as a stepping stone for customers 
to move to HNS enabled accounts with the ABFS driver, which is our recommended 
stack for big data analytics on ADLS Gen2. 

*Any impact for existing customers who are using ADLS Gen2 (HNS-enabled accounts) with the ABFS driver?*

This feature does not impact the existing customers who are using ADLS Gen2 
(HNS enabled account) with ABFS driver. 

They do not need to make any changes to their workloads or configurations. They 
will still enjoy the benefits of HNS, such as atomic operations, fine-grained 
access control, scalability, and performance. 

*Official recommendation*

Microsoft continues to recommend that all big data and analytics customers use 
Azure Data Lake Gen2 (ADLS Gen2) with the ABFS driver, and will continue to 
optimize this scenario going forward. We believe this new option will help those 
customers transition to a supported scenario immediately, while they plan their 
ultimate move to ADLS Gen2 (HNS-enabled accounts).

*New authentication options for customers migrating from WASB to the ABFS driver*

The auth types that WASB provides will continue to work for FNS over the ABFS 
driver, via configuration that accepts these SAS types (similar to WASB):
 * SharedKey
 * Account SAS
 * Service/Container SAS

Authentication types that were not supported by the WASB driver but are supported 
by the ABFS driver will also be available for FNS over the ABFS driver:
 * OAuth 2.0 Client Credentials
 * OAuth 2.0 Refresh Token
 * Azure Managed Identity
 * Custom OAuth 2.0 Token Provider

The ABFS driver's SAS token provider plugin, available today for User Delegation 
SAS and Directory SAS, will continue to work only for HNS accounts.




[jira] [Assigned] (HADOOP-18610) ABFS OAuth2 Token Provider to support Azure Workload Identity for AKS

2024-04-30 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan reassigned HADOOP-18610:
--

Assignee: Anuj Modi

> ABFS OAuth2 Token Provider to support Azure Workload Identity for AKS
> -
>
> Key: HADOOP-18610
> URL: https://issues.apache.org/jira/browse/HADOOP-18610
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 3.3.4
>Reporter: Haifeng Chen
>Assignee: Anuj Modi
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HADOOP-18610-preview.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> In January 2023, Microsoft Azure AKS replaced its original pod-managed identity 
> with [Azure Active Directory (Azure AD) workload 
> identities|https://learn.microsoft.com/en-us/azure/active-directory/develop/workload-identities-overview]
>  (preview), which integrate with Kubernetes native capabilities to federate 
> with external identity providers. This approach is simpler to use and deploy.
> Refer to 
> [https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview] and 
> [https://azure.github.io/azure-workload-identity/docs/introduction.html] for 
> more details.
> The basic use scenario is to access Azure cloud resources (such as cloud 
> storage) from a Kubernetes (such as AKS) workload using an Azure managed 
> identity federated with a Kubernetes service account. The credential 
> environment variables projected into the pod by Azure AD workload identity are 
> as follows:
> AZURE_AUTHORITY_HOST: (Injected by the webhook, 
> [https://login.microsoftonline.com/])
> AZURE_CLIENT_ID: (Injected by the webhook)
> AZURE_TENANT_ID: (Injected by the webhook)
> AZURE_FEDERATED_TOKEN_FILE: (Injected by the webhook, 
> /var/run/secrets/azure/tokens/azure-identity-token)
> The token in the file pointed to by AZURE_FEDERATED_TOKEN_FILE is a JWT (JSON 
> Web Token) client assertion which we can send to AZURE_AUTHORITY_HOST (URL: 
> AZURE_AUTHORITY_HOST + tenantId + "/oauth2/v2.0/token") to request an AD token 
> that can be used to directly access the Azure cloud resources.
> This approach is common across cloud providers such as AWS and GCP; the Hadoop 
> AWS integration has WebIdentityTokenCredentialProvider to handle the same case.
> The existing MsiTokenProvider can only handle the managed identity associated 
> with an Azure VM instance. We need to implement a WorkloadIdentityTokenProvider 
> which handles the Azure Workload Identity case. For this, we need to add one 
> method (getTokenUsingJWTAssertion) in AzureADAuthenticator which will be used 
> by WorkloadIdentityTokenProvider.
>  
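
As a conceptual illustration of the token exchange described above (not the ABFS 
implementation itself), the sketch below reads the projected environment 
variables and exchanges the federated JWT for an AAD token; the storage scope 
value is an assumption.

{code:java}
// Conceptual sketch only: read the webhook-injected variables and exchange the
// federated token for an AAD access token. The scope below is an assumed value
// for Azure Storage, not something mandated by this Jira.
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class WorkloadIdentityTokenSketch {
  public static void main(String[] args) throws Exception {
    String authorityHost = System.getenv("AZURE_AUTHORITY_HOST"); // e.g. https://login.microsoftonline.com/
    String tenantId = System.getenv("AZURE_TENANT_ID");
    String clientId = System.getenv("AZURE_CLIENT_ID");
    String assertion = Files.readString(
        Paths.get(System.getenv("AZURE_FEDERATED_TOKEN_FILE"))).trim();

    String body = "grant_type=client_credentials"
        + "&client_id=" + URLEncoder.encode(clientId, StandardCharsets.UTF_8)
        + "&scope=" + URLEncoder.encode("https://storage.azure.com/.default", StandardCharsets.UTF_8)
        + "&client_assertion_type=" + URLEncoder.encode(
            "urn:ietf:params:oauth:client-assertion-type:jwt-bearer", StandardCharsets.UTF_8)
        + "&client_assertion=" + URLEncoder.encode(assertion, StandardCharsets.UTF_8);

    // URL shape from the description: AZURE_AUTHORITY_HOST + tenantId + "/oauth2/v2.0/token"
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(authorityHost + tenantId + "/oauth2/v2.0/token"))
        .header("Content-Type", "application/x-www-form-urlencoded")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.body()); // JSON containing access_token on success
  }
}
{code}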






[jira] [Commented] (HADOOP-17872) ABFS: Refactor read flow to include ReadRequestParameter

2022-10-11 Thread Sneha Vijayarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17615651#comment-17615651
 ] 

Sneha Vijayarajan commented on HADOOP-17872:


The work on this Jira is deferred to the next release.

> ABFS: Refactor read flow to include ReadRequestParameter
> 
>
> Key: HADOOP-17872
> URL: https://issues.apache.org/jira/browse/HADOOP-17872
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This Jira is to facilitate upcoming work as part of adding an alternate 
> connection:
>  HADOOP-17853 ABFS: Enable optional store connectivity over azure specific 
> protocol for data egress - ASF JIRA (apache.org)
> The scope of the change is to introduce a ReadRequestParameter that will carry 
> the various inputs needed for a read request to the AbfsClient class.






[jira] [Updated] (HADOOP-17767) ABFS: Improve test scripts

2022-10-11 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17767:
---
Status: Patch Available  (was: Open)

> ABFS: Improve test scripts
> --
>
> Key: HADOOP-17767
> URL: https://issues.apache.org/jira/browse/HADOOP-17767
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The current test run scripts need a manual account-name update for every 
> combination in runTests.sh and work off a single azure-auth-keys.xml file. 
> When testing across accounts that span various geos, the config file grows 
> large and also needs manual changes for configs such as 
> fs.contract.test.[abfs/abfss], which have to be set uniquely. To use the script 
> across the various combinations, the developer also has to know the names of 
> all the combinations defined in runTests.sh.
>  
> These concerns are addressed in the new version of the scripts.






[jira] [Updated] (HADOOP-17015) ABFS: Make PUT and POST operations idempotent

2022-09-27 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17015:
---
Description: 
Initially, changes were made as part of this PR to handle idempotency, including 
for the rename operation, with the understanding that the last modified time gets 
updated on rename. That assumption was wrong, and the rename idempotency handling 
has since evolved.


For job cleanup, if the Manifest Committer from the Jira below is used, rename 
idempotency works using the previously fetched eTag:

[HADOOP-18163] hadoop-azure support for the Manifest Committer of 
MAPREDUCE-7341 - ASF JIRA (apache.org)

 

The part of the commit tracked under the current Jira that handles DELETE 
idempotency is still relevant.

A means to handle idempotency inherently between the driver and the backend is 
being worked on.

-- Older notes


Currently, when a PUT or POST operation times out but the server has already 
successfully executed it, there is no check in the driver to see whether the 
operation succeeded; it simply retries the same operation. This can cause the 
driver to throw invalid errors to the user.

 

Sample scenario:
 # A rename request times out, though the server has successfully executed the 
operation.
 # The driver retries the rename and gets a source-not-found error.

In this scenario, the driver needs to detect that the rename is a retry and 
treat it as successful if the source is not found but the destination is 
present.
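
A minimal sketch of that check is shown below; it is illustrative only (the 
helper name and the FileSystem-level view are assumptions, not the driver's 
actual retry code).

{code:java}
// Illustrative check, not the driver code: on a retried rename, a missing source
// together with an existing destination is treated as the earlier (timed-out)
// attempt having succeeded on the server.
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class RenameRetryCheckSketch {
  private RenameRetryCheckSketch() {
  }

  public static boolean renameAlreadySucceeded(FileSystem fs, Path src, Path dst,
      boolean isRetriedRequest) throws IOException {
    if (!isRetriedRequest) {
      return false;
    }
    return !fs.exists(src) && fs.exists(dst);
  }
}
{code}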

 

  was:
Currently when a PUT or POST operation timeouts and the server has already 
successfully executed the operation, there is no check in driver to see if the 
operation did succeed or not and just retries the same operation again. This 
can cause driver to through invalid user errors.

 

Sample scenario:
 # Rename request times out. Though server has successfully executed the 
operation.
 # Driver retries rename and get source not found error.

In the scenario, driver needs to check if rename is being retried and success 
if source if not found, but destination is present.

 


> ABFS: Make PUT and POST operations idempotent
> -
>
> Key: HADOOP-17015
> URL: https://issues.apache.org/jira/browse/HADOOP-17015
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.1
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> Initially, changes were made as part of this PR to handle idempotency, 
> including for the rename operation, with the understanding that the last 
> modified time gets updated on rename. That assumption was wrong, and the 
> rename idempotency handling has since evolved.
> For job cleanup, if the Manifest Committer from the Jira below is used, rename 
> idempotency works using the previously fetched eTag:
> [HADOOP-18163] hadoop-azure support for the Manifest Committer of 
> MAPREDUCE-7341 - ASF JIRA (apache.org)
>  
> The part of the commit tracked under the current Jira that handles DELETE 
> idempotency is still relevant.
> A means to handle idempotency inherently between the driver and the backend is 
> being worked on.
> -- Older notes
> Currently, when a PUT or POST operation times out but the server has already 
> successfully executed it, there is no check in the driver to see whether the 
> operation succeeded; it simply retries the same operation. This can cause the 
> driver to throw invalid errors to the user.
>  
> Sample scenario:
>  # A rename request times out, though the server has successfully executed 
> the operation.
>  # The driver retries the rename and gets a source-not-found error.
> In this scenario, the driver needs to detect that the rename is a retry and 
> treat it as successful if the source is not found but the destination is 
> present.
>  






[jira] [Commented] (HADOOP-17015) ABFS: Make PUT and POST operations idempotent

2022-09-27 Thread Sneha Vijayarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609911#comment-17609911
 ] 

Sneha Vijayarajan commented on HADOOP-17015:


The RENAME idempotency code has been revoked.

Initially, changes were made as part of this PR to handle idempotency, including 
for the rename operation, with the understanding that the last modified time gets 
updated on rename. That assumption was wrong, and the rename idempotency handling 
has since evolved.

For job cleanup, if the Manifest Committer from the Jira below is used, rename 
idempotency works using the previously fetched eTag:

HADOOP-18163 hadoop-azure support for the Manifest Committer of MAPREDUCE-7341 
- ASF JIRA (apache.org)

 

The part of the commit tracked under the current Jira that handles DELETE 
idempotency is still relevant.

A means to handle idempotency inherently between the driver and the backend is 
being worked on.

> ABFS: Make PUT and POST operations idempotent
> -
>
> Key: HADOOP-17015
> URL: https://issues.apache.org/jira/browse/HADOOP-17015
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.2.1
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> Initially, changes were made as part of this PR to handle idempotency, 
> including for the rename operation, with the understanding that the last 
> modified time gets updated on rename. That assumption was wrong, and the 
> rename idempotency handling has since evolved.
> For job cleanup, if the Manifest Committer from the Jira below is used, rename 
> idempotency works using the previously fetched eTag:
> [HADOOP-18163] hadoop-azure support for the Manifest Committer of 
> MAPREDUCE-7341 - ASF JIRA (apache.org)
>  
> The part of the commit tracked under the current Jira that handles DELETE 
> idempotency is still relevant.
> A means to handle idempotency inherently between the driver and the backend is 
> being worked on.
> -- Older notes
> Currently, when a PUT or POST operation times out but the server has already 
> successfully executed it, there is no check in the driver to see whether the 
> operation succeeded; it simply retries the same operation. This can cause the 
> driver to throw invalid errors to the user.
>  
> Sample scenario:
>  # A rename request times out, though the server has successfully executed 
> the operation.
>  # The driver retries the rename and gets a source-not-found error.
> In this scenario, the driver needs to detect that the rename is a retry and 
> treat it as successful if the source is not found but the destination is 
> present.
>  






[jira] [Commented] (HADOOP-18081) FileNotFoundException in abfs mkdirs() call

2022-01-18 Thread Sneha Vijayarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478151#comment-17478151
 ] 

Sneha Vijayarajan commented on HADOOP-18081:


Hi [~ste...@apache.org], could you please share the request failure timestamp? 
In case you have a more recent instance, please do share the details.

> FileNotFoundException in abfs mkdirs() call
> ---
>
> Key: HADOOP-18081
> URL: https://issues.apache.org/jira/browse/HADOOP-18081
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.1
>Reporter: Steve Loughran
>Assignee: Sneha Vijayarajan
>Priority: Major
>
> seen in production: calling mkdirs in FileOutputCommitter setupJob is 
> triggering an FNFE
> {code}
>  java.io.FileNotFoundException: Operation failed: "The specified path does 
> not exist.", 404, PUT, 
> https://bcket.dfs.core.windows.net/table1/_temporary/0?resource=directory&timeout=90,
>  PathNotFound, "The specified path does not exist."
>   at 
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:1131)
>   at 
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.mkdirs(AzureBlobFileSystem.java:445)
>   at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2347)
> {code}
> I suspect what is happening is that while this job is setting up, a previous 
> job is doing cleanup/abort on the same path.
> Assuming that abfs mkdirs is like the posix one (non-atomic), as it goes 
> up/down the chain of parent dirs, something else gets in the way.
> If so, this is something which can be handled in the client: when we get an 
> FNFE we could warn and retry.
> In the manifest committer each job will have a unique id under _temporary and 
> there will be the option to skip deleting the temp dir entirely, for better 
> coexistence of active jobs.
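
A minimal sketch of the warn-and-retry idea is shown below (illustrative only; 
the helper is hypothetical and a single retry is an arbitrary choice).

{code:java}
// Illustrative client-side mitigation: warn and retry mkdirs once when a concurrent
// cleanup/abort on the same path surfaces as FileNotFoundException.
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class MkdirsRetrySketch {
  private static final Logger LOG = LoggerFactory.getLogger(MkdirsRetrySketch.class);

  private MkdirsRetrySketch() {
  }

  public static boolean mkdirs(FileSystem fs, Path dir) throws IOException {
    try {
      return fs.mkdirs(dir);
    } catch (FileNotFoundException e) {
      // Likely another job deleting an ancestor while this one was creating it.
      LOG.warn("mkdirs({}) hit FileNotFoundException; retrying once", dir, e);
      return fs.mkdirs(dir);
    }
  }
}
{code}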






[jira] [Assigned] (HADOOP-18081) FileNotFoundException in abfs mkdirs() call

2022-01-18 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan reassigned HADOOP-18081:
--

Assignee: Sneha Vijayarajan

> FileNotFoundException in abfs mkdirs() call
> ---
>
> Key: HADOOP-18081
> URL: https://issues.apache.org/jira/browse/HADOOP-18081
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.1
>Reporter: Steve Loughran
>Assignee: Sneha Vijayarajan
>Priority: Major
>
> seen in production: calling mkdirs in FileOutputCommitter setupJob is 
> triggering an FNFE
> {code}
>  java.io.FileNotFoundException: Operation failed: "The specified path does 
> not exist.", 404, PUT, 
> https://bcket.dfs.core.windows.net/table1/_temporary/0?resource=directory&timeout=90,
>  PathNotFound, "The specified path does not exist."
>   at 
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.checkException(AzureBlobFileSystem.java:1131)
>   at 
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.mkdirs(AzureBlobFileSystem.java:445)
>   at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:2347)
> {code}
> I suspect what is happening is that while this job is setting up, a previous 
> job is doing cleanup/abort on the same path.
> Assuming that abfs mkdirs is like the posix one (non-atomic), as it goes 
> up/down the chain of parent dirs, something else gets in the way.
> If so, this is something which can be handled in the client: when we get an 
> FNFE we could warn and retry.
> In the manifest committer each job will have a unique id under _temporary and 
> there will be the option to skip deleting the temp dir entirely, for better 
> coexistence of active jobs.






[jira] [Updated] (HADOOP-18012) ABFS: Enable config controlled ETag check for Rename idempotency

2021-11-21 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-18012:
---
Summary: ABFS: Enable config controlled ETag check for Rename idempotency  
(was: ABFS: Modify Rename idempotency code)

> ABFS: Enable config controlled ETag check for Rename idempotency
> 
>
> Key: HADOOP-18012
> URL: https://issues.apache.org/jira/browse/HADOOP-18012
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.1
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>
> The ABFS driver has rename idempotency handling that relies on the LMT of the 
> destination file to conclude whether a rename succeeded when the source file 
> is absent and the rename request had entered the retry loop.
> This handling is incorrect, as the LMT of the destination does not change on 
> rename.
> This Jira will track the change to undo the current implementation and add a 
> new one where, for an incoming rename operation, the source file's eTag is 
> fetched first and the rename is done only if the eTag matches the source file.
> As this is a costly operation, given that an extra HEAD request is added to 
> each rename, the implementation will be guarded by a config and can be enabled 
> by customers whose workloads do many renames.
> A long-term plan to handle rename idempotency without the HEAD request is 
> being discussed.






[jira] [Commented] (HADOOP-18012) ABFS: Modify Rename idempotency code

2021-11-15 Thread Sneha Vijayarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17444030#comment-17444030
 ] 

Sneha Vijayarajan commented on HADOOP-18012:


Need to fix test failures seen along with this (as they overlap):

[ERROR] Errors: 
[ERROR]   
ITestAzureBlobFileSystemDelete.testDeleteIdempotencyTriggerHttp404:269 » 
NullPointer
[ERROR]   
ITestAzureBlobFileSystemRename.testRenameIdempotencyTriggerHttpNotFound:232->testRenameIdempotencyTriggerChecks:268->lambda$testRenameIdempotencyTriggerChecks$0:269
 » NullPointer

> ABFS: Modify Rename idempotency code
> 
>
> Key: HADOOP-18012
> URL: https://issues.apache.org/jira/browse/HADOOP-18012
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.1
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>
> The ABFS driver has rename idempotency handling that relies on the LMT of the 
> destination file to conclude whether a rename succeeded when the source file 
> is absent and the rename request had entered the retry loop.
> This handling is incorrect, as the LMT of the destination does not change on 
> rename.
> This Jira will track the change to undo the current implementation and add a 
> new one where, for an incoming rename operation, the source file's eTag is 
> fetched first and the rename is done only if the eTag matches the source file.
> As this is a costly operation, given that an extra HEAD request is added to 
> each rename, the implementation will be guarded by a config and can be enabled 
> by customers whose workloads do many renames.
> A long-term plan to handle rename idempotency without the HEAD request is 
> being discussed.






[jira] [Created] (HADOOP-18012) ABFS: Modify Rename idempotency code

2021-11-15 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-18012:
--

 Summary: ABFS: Modify Rename idempotency code
 Key: HADOOP-18012
 URL: https://issues.apache.org/jira/browse/HADOOP-18012
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.1
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


The ABFS driver has rename idempotency handling that relies on the LMT of the 
destination file to conclude whether a rename succeeded when the source file is 
absent and the rename request had entered the retry loop.

This handling is incorrect, as the LMT of the destination does not change on 
rename.

This Jira will track the change to undo the current implementation and add a new 
one where, for an incoming rename operation, the source file's eTag is fetched 
first and the rename is done only if the eTag matches the source file.

As this is a costly operation, given that an extra HEAD request is added to each 
rename, the implementation will be guarded by a config and can be enabled by 
customers whose workloads do many renames.

A long-term plan to handle rename idempotency without the HEAD request is being 
discussed.
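
A conceptual sketch of the eTag-guarded flow is shown below (hypothetical helpers 
at the FileSystem level, not the driver's actual implementation; how the eTag is 
obtained from a status entry is left as a placeholder).

{code:java}
// Conceptual sketch only: capture the source eTag up front (the extra HEAD-equivalent
// call that the config flag is meant to gate), and on a failed/retried rename accept
// it as successful only if the destination now carries that same eTag.
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class EtagGuardedRenameSketch {
  private EtagGuardedRenameSketch() {
  }

  /** Placeholder for however the store surfaces the eTag of a status entry. */
  private static String etagOf(FileStatus status) {
    throw new UnsupportedOperationException("illustrative placeholder");
  }

  public static boolean rename(FileSystem fs, Path src, Path dst) throws IOException {
    String sourceEtag = etagOf(fs.getFileStatus(src));
    boolean renamed = fs.rename(src, dst);
    if (renamed) {
      return true;
    }
    // A retry that reports the source missing can still mean the first attempt
    // succeeded; the destination should then carry the eTag captured from the source.
    return !fs.exists(src) && fs.exists(dst)
        && sourceEtag.equals(etagOf(fs.getFileStatus(dst)));
  }
}
{code}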






[jira] [Updated] (HADOOP-18011) ABFS: Enable config control for default connection timeout

2021-11-15 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-18011:
---
Description: 
The ABFS driver has default connection and read timeout values of 30 seconds. 
For jobs that are time sensitive, the preference would be quick failure, with 
shorter HTTP connection and read timeouts.

This Jira is created to enable config control over the default connection and 
read timeouts.

New config names:

fs.azure.http.connection.timeout

fs.azure.http.read.timeout
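
A minimal sketch of wiring the two configs is shown below; the values are 
illustrative placeholders, and the units should be confirmed against the ABFS 
documentation.

{code:java}
// Minimal sketch: the config keys come from this Jira; the 5000 values are
// illustrative placeholders for a "fail fast" job, assumed to be in milliseconds.
import org.apache.hadoop.conf.Configuration;

public class AbfsTimeoutConfigSketch {
  public static Configuration withShortTimeouts() {
    Configuration conf = new Configuration();
    conf.setInt("fs.azure.http.connection.timeout", 5000);
    conf.setInt("fs.azure.http.read.timeout", 5000);
    return conf;
  }
}
{code}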

  was:
ABFS driver has a default connection timeout value of 30 secs. For jobs that 
are time sensitive, preference would be quick failure and would prefer a 
shorted connection timeout. 

This Jira is created enable config control over the default connection timeout. 

New config name: fs.azure.http.connection.timeout


> ABFS: Enable config control for default connection timeout 
> ---
>
> Key: HADOOP-18011
> URL: https://issues.apache.org/jira/browse/HADOOP-18011
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.1
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>
> The ABFS driver has default connection and read timeout values of 30 seconds. 
> For jobs that are time sensitive, the preference would be quick failure, with 
> shorter HTTP connection and read timeouts.
> This Jira is created to enable config control over the default connection and 
> read timeouts.
> New config names:
> fs.azure.http.connection.timeout
> fs.azure.http.read.timeout






[jira] [Updated] (HADOOP-18011) ABFS: Enable config control for default connection timeout

2021-11-15 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-18011:
---
Description: 
ABFS driver has a default connection timeout value of 30 secs. For jobs that 
are time sensitive, preference would be quick failure and would prefer a 
shorted connection timeout. 

This Jira is created enable config control over the default connection timeout. 

New config name: fs.azure.http.connection.timeout

  was:
ABFS driver has a default connection timeout value of 30 secs. For jobs that 
are time sensitive, preference would be quick failure and would prefer a 
shorted connection timeout. 

This Jira is created enable config control over the default connection timeout. 

New config name: fs.azure.connection.timeout


> ABFS: Enable config control for default connection timeout 
> ---
>
> Key: HADOOP-18011
> URL: https://issues.apache.org/jira/browse/HADOOP-18011
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.1
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>
> The ABFS driver has a default connection timeout value of 30 seconds. For 
> jobs that are time sensitive, the preference would be quick failure, with a 
> shorter connection timeout.
> This Jira is created to enable config control over the default connection 
> timeout.
> New config name: fs.azure.http.connection.timeout






[jira] [Created] (HADOOP-18011) ABFS: Enable config control for default connection timeout

2021-11-14 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-18011:
--

 Summary: ABFS: Enable config control for default connection 
timeout 
 Key: HADOOP-18011
 URL: https://issues.apache.org/jira/browse/HADOOP-18011
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.1
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


The ABFS driver has a default connection timeout value of 30 seconds. For jobs 
that are time sensitive, the preference would be quick failure, with a shorter 
connection timeout.

This Jira is created to enable config control over the default connection timeout.

New config name: fs.azure.connection.timeout






[jira] [Created] (HADOOP-17890) ABFS: Refactor HTTP request handling code

2021-09-03 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17890:
--

 Summary: ABFS: Refactor HTTP request handling code
 Key: HADOOP-17890
 URL: https://issues.apache.org/jira/browse/HADOOP-17890
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


Aims at refactoring the HTTP request handling code.






[jira] [Commented] (HADOOP-17853) ABFS: Enable optional store connectivity over azure specific protocol for data egress

2021-08-27 Thread Sneha Vijayarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405791#comment-17405791
 ] 

Sneha Vijayarajan commented on HADOOP-17853:


Hi [~ste...@apache.org],

The feature update in the ABFS driver is to pass the HTTP request URL, query 
params and headers to APIs exposed by a Maven artifact that takes responsibility 
for effectively using the feature path for reads. In addition to the changes 
that redirect the requests, a class that holds on to the session details will 
also be added. The feature will be dormant and will have to be enabled 
consciously in environments suitable for it; one such requirement is that the 
Hadoop cluster needs to be on Azure VMs.

The test update does not change or reduce the existing tests that run over REST. 
The read-related tests for the feature are triggered separately for the 
respective test scenario. Since ABFS driver developers may not have access to a 
feature-enabled environment, all read tests validating the feature will run even 
when the feature flag is off, relying on mock code to replicate the feature flow. 
This ensures that future check-ins by any developer will not break either the 
REST path or the feature path.

In an actual store-connected environment, close to 500 looped test runs have 
completed successfully with no functional failures. This test run count is from 
after the feature code freeze in the ABFS driver.

Counters that IOStatistics collects remain valid while the feature is 
exercised. 

If the feature were enabled in an environment that isn't suitable for it, the 
request will fall back to REST, and for the lifetime of that InputStream 
instance reads will only be attempted over REST. Fallbacks are also in place 
should the feature hit an irrecoverable issue, so callers are not exposed to any 
new error scenarios beyond what already exists.
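
As a conceptual illustration of that fallback behaviour (hypothetical names, not 
the feature code), a stream could pin itself to REST once the optional path fails:

{code:java}
// Conceptual sketch only: after one failure on the optional read path, this stream
// instance uses REST for the rest of its lifetime, so callers see no new errors.
import java.io.IOException;

public class FallbackReadSketch {
  private volatile boolean useRestOnly = false;

  public int read(byte[] buffer, int offset, int length) throws IOException {
    if (!useRestOnly) {
      try {
        return readOverOptionalPath(buffer, offset, length);
      } catch (IOException optionalPathFailure) {
        // Irrecoverable issue on the optional path: fall back and stay on REST.
        useRestOnly = true;
      }
    }
    return readOverRest(buffer, offset, length);
  }

  private int readOverOptionalPath(byte[] b, int off, int len) throws IOException {
    throw new IOException("illustrative placeholder");
  }

  private int readOverRest(byte[] b, int off, int len) throws IOException {
    return -1; // illustrative placeholder
  }
}
{code}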

Store connectivity over REST will continue to be a workflow that we optimize 
and support. 

Asserts in all the new tests added will use AssertJ.

I have closed the PR that I had initially linked to this JIRA. The changes 
included a few test fixes as well as some refactoring needed for the feature. 
[~sumangala] has split the test fixes into 4 PRs. Code updates that intend to 
refactor and make it easier to add the feature are split into 2 further PRs. The 
JIRAs for those PRs have been added here as children. Once those are checked in, 
we are left with just one PR that brings in the feature code.

The pending change is as atomic as possible for the feature. Committing it to a 
separate branch would still require a subsequent commit to trunk with no further 
add-ons from the driver, hence I would prefer to raise the PR against trunk 
directly.

 

> ABFS: Enable optional store connectivity over azure specific protocol for 
> data egress
> -
>
> Key: HADOOP-17853
> URL: https://issues.apache.org/jira/browse/HADOOP-17853
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This Jira is to provide an option to enable store access on read path over an 
> Azure specific protocol. This will only work on Azure VMs and hence will be 
> disabled by default.






[jira] [Updated] (HADOOP-17872) ABFS: Refactor read flow to include ReadRequestParameter

2021-08-26 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17872:
---
Description: 
This Jira is to facilitate upcoming work as part of adding an alternate 
connection:
 HADOOP-17853 ABFS: Enable optional store connectivity over azure specific 
protocol for data egress - ASF JIRA (apache.org)

The scope of the change is to introduce a ReadRequestParameter that will carry 
the various inputs needed for a read request to the AbfsClient class.
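
As an illustration of the kind of parameter object meant here (field names are 
assumptions, not the actual class), see the sketch below.

{code:java}
// Illustrative parameter object: bundles the inputs of a read request so the
// AbfsClient call sites do not grow a long positional argument list.
public final class ReadRequestParametersSketch {
  private final long position;     // offset in the remote file
  private final byte[] buffer;     // destination buffer
  private final int bufferOffset;  // where to start writing into the buffer
  private final int length;        // number of bytes requested
  private final String eTag;       // eTag for conditional reads, if any

  public ReadRequestParametersSketch(long position, byte[] buffer, int bufferOffset,
      int length, String eTag) {
    this.position = position;
    this.buffer = buffer;
    this.bufferOffset = bufferOffset;
    this.length = length;
    this.eTag = eTag;
  }

  public long getPosition() { return position; }
  public byte[] getBuffer() { return buffer; }
  public int getBufferOffset() { return bufferOffset; }
  public int getLength() { return length; }
  public String getETag() { return eTag; }
}
{code}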

  was:
This Jira is to facilitate upcoming work as part of adding an alternate 
connection :
[HADOOP-17853] ABFS: Enable optional store connectivity over azure specific 
protocol for data egress - ASF JIRA (apache.org)



The scope of the change is to make AbfsHttpOperation as abstract class and 
create a child class AbfsHttpConnection. Future connection types will be added 
as child of AbfsHttpOperation.


> ABFS: Refactor read flow to include ReadRequestParameter
> 
>
> Key: HADOOP-17872
> URL: https://issues.apache.org/jira/browse/HADOOP-17872
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>
> This Jira is to facilitate upcoming work as part of adding an alternate 
> connection:
>  HADOOP-17853 ABFS: Enable optional store connectivity over azure specific 
> protocol for data egress - ASF JIRA (apache.org)
> The scope of the change is to introduce a ReadRequestParameter that will carry 
> the various inputs needed for a read request to the AbfsClient class.






[jira] [Updated] (HADOOP-17872) ABFS: Refactor read flow to include ReadRequestParameter

2021-08-26 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17872:
---
Labels:   (was: pull-request-available)

> ABFS: Refactor read flow to include ReadRequestParameter
> 
>
> Key: HADOOP-17872
> URL: https://issues.apache.org/jira/browse/HADOOP-17872
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>
> This Jira is to facilitate upcoming work as part of adding an alternate 
> connection:
> [HADOOP-17853] ABFS: Enable optional store connectivity over azure specific 
> protocol for data egress - ASF JIRA (apache.org)
> The scope of the change is to make AbfsHttpOperation an abstract class and 
> create a child class, AbfsHttpConnection. Future connection types will be 
> added as children of AbfsHttpOperation.






[jira] [Created] (HADOOP-17872) ABFS: Refactor read flow to include ReadRequestParameter

2021-08-26 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17872:
--

 Summary: ABFS: Refactor read flow to include ReadRequestParameter
 Key: HADOOP-17872
 URL: https://issues.apache.org/jira/browse/HADOOP-17872
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


This Jira is to facilitate upcoming work as part of adding an alternate 
connection:
[HADOOP-17853] ABFS: Enable optional store connectivity over azure specific 
protocol for data egress - ASF JIRA (apache.org)



The scope of the change is to make AbfsHttpOperation an abstract class and 
create a child class, AbfsHttpConnection. Future connection types will be added 
as children of AbfsHttpOperation.






[jira] [Commented] (HADOOP-17853) ABFS: Enable optional store connectivity over azure specific protocol for data egress

2021-08-26 Thread Sneha Vijayarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405078#comment-17405078
 ] 

Sneha Vijayarajan commented on HADOOP-17853:


Thanks [~shv]. I see the issue with the patch diff from GitHub. As the prototype 
and development spanned months, rebasing on trunk at times ended up corrupting 
the dev branch, and merging looked like the quicker way forward at the time.
Now that we are breaking the commits up into various PRs, we will ensure that 
new PRs are rebased, so such issues will not be seen when applying the patch.

 

Currently, tests are run from a onebox setup to perform end-to-end functionality 
checks. These tests confirm that there are no regressions and that performance 
is on par with or slightly above REST. However, for a qualitative analysis, the 
team is working out the logistics of bringing up a cluster with all the setup 
needed for the feature to function ideally. The feature focuses on optimal 
resource utilization and achieving higher throughput, so once that setup is up 
we should be able to derive feature-relevant metrics.
The test setup also needs these driver feature changes added to a Hadoop distro, 
and we hope to backport the commits made to trunk for the feature by the time 
the test environment is ready.

> ABFS: Enable optional store connectivity over azure specific protocol for 
> data egress
> -
>
> Key: HADOOP-17853
> URL: https://issues.apache.org/jira/browse/HADOOP-17853
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This Jira is to provide an option to enable store access on read path over an 
> Azure specific protocol. This will only work on Azure VMs and hence will be 
> disabled by default.






[jira] [Updated] (HADOOP-17853) ABFS: Enable optional store connectivity over azure specific protocol for data egress

2021-08-26 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17853:
---
Labels:   (was: pull-request-available)

> ABFS: Enable optional store connectivity over azure specific protocol for 
> data egress
> -
>
> Key: HADOOP-17853
> URL: https://issues.apache.org/jira/browse/HADOOP-17853
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This Jira is to provide an option to enable store access on read path over an 
> Azure specific protocol. This will only work on Azure VMs and hence will be 
> disabled by default.






[jira] [Created] (HADOOP-17864) ABFS: Fork AbfsHttpOperation to add alternate connection

2021-08-25 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17864:
--

 Summary: ABFS: Fork AbfsHttpOperation to add alternate connection
 Key: HADOOP-17864
 URL: https://issues.apache.org/jira/browse/HADOOP-17864
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


This Jira is to facilitate upcoming work as part of adding an alternate 
connection:
[HADOOP-17853] ABFS: Enable optional store connectivity over azure specific 
protocol for data egress - ASF JIRA (apache.org)



The scope of the change is to make AbfsHttpOperation an abstract class and 
create a child class AbfsHttpConnection. Future connection types will be added 
as children of AbfsHttpOperation.
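
A rough sketch of the intended shape (the method names and bodies below are 
illustrative assumptions, not the actual ABFS code):

{code:java}
// Illustrative only: AbfsHttpOperation becomes the abstract base for all connection types.
abstract class AbfsHttpOperation {
  abstract void sendRequest(byte[] buffer, int offset, int length) throws java.io.IOException;
  abstract void processResponse(byte[] buffer, int offset, int length) throws java.io.IOException;
  abstract int getStatusCode();
}

// The existing JDK HttpURLConnection-based path would move into this child class;
// a future alternate connection type would be another sibling of it.
class AbfsHttpConnection extends AbfsHttpOperation {
  @Override
  void sendRequest(byte[] buffer, int offset, int length) {
    // write the request body over an HttpURLConnection (placeholder)
  }

  @Override
  void processResponse(byte[] buffer, int offset, int length) {
    // read the status line, headers and body from the connection (placeholder)
  }

  @Override
  int getStatusCode() {
    return 200; // placeholder
  }
}
{code}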



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17853) ABFS: Enable optional store connectivity over azure specific protocol for data egress

2021-08-22 Thread Sneha Vijayarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17402742#comment-17402742
 ] 

Sneha Vijayarajan commented on HADOOP-17853:


Hi [~shv], thank you for taking a look at the changes. 

I looked further into the PR to see how best I can break it down and would like 
to check with you before I proceed. Will touch base with you.

With regard to the conflicts with trunk, could you please check your repo once 
more? The PR conversation tab also shows whether there is any conflict with trunk 
and will not trigger any CI run if conflicts exist, which isn't the case with 
this PR.

 

> ABFS: Enable optional store connectivity over azure specific protocol for 
> data egress
> -
>
> Key: HADOOP-17853
> URL: https://issues.apache.org/jira/browse/HADOOP-17853
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This Jira is to provide an option to enable store access on the read path over 
> an Azure-specific protocol. This will only work on Azure VMs and hence will be 
> disabled by default.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17853) ABFS: Enable optional store connectivity over azure specific protocol for data egress

2021-08-17 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17853:
--

 Summary: ABFS: Enable optional store connectivity over azure 
specific protocol for data egress
 Key: HADOOP-17853
 URL: https://issues.apache.org/jira/browse/HADOOP-17853
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


This Jira is to provide an option to enable store access on the read path over an 
Azure-specific protocol. This will only work on Azure VMs and hence will be 
disabled by default.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17852) ABFS: Test with 100MB buffer size in ITestAbfsReadWriteAndSeek times out

2021-08-17 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17852:
--

 Summary: ABFS: Test with 100MB buffer size in 
ITestAbfsReadWriteAndSeek times out 
 Key: HADOOP-17852
 URL: https://issues.apache.org/jira/browse/HADOOP-17852
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.1
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


testReadAndWriteWithDifferentBufferSizesAndSeek with a 100 MB buffer size is 
failing with a timeout. It delays the whole test run by 15-30 minutes. 

[ERROR] 
testReadAndWriteWithDifferentBufferSizesAndSeek[Size=104,857,600](org.apache.hadoop.fs.azurebfs.ITestAbfsReadWriteAndSeek)
 Time elapsed: 1,800.041 s <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 1800000 milliseconds
 at sun.misc.Unsafe.park(Native Method)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:429)
 at java.util.concurrent.FutureTask.get(FutureTask.java:191)
 at 
org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.waitForAppendsToComplete(AbfsOutputStream.java:515)
 at 
org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.flushWrittenBytesToService(AbfsOutputStream.java:533)
 at 
org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.flushInternal(AbfsOutputStream.java:377)
 at 
org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.close(AbfsOutputStream.java:337)
 at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77)
 at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
 at 
org.apache.hadoop.fs.azurebfs.ITestAbfsReadWriteAndSeek.testReadWriteAndSeek(ITestAbfsReadWriteAndSeek.java:81)
 at 
org.apache.hadoop.fs.azurebfs.ITestAbfsReadWriteAndSeek.testReadAndWriteWithDifferentBufferSizesAndSeek(ITestAbfsReadWriteAndSeek.java:66)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17767) ABFS: Improve test scripts

2021-06-21 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17767:
--

 Summary: ABFS: Improve test scripts
 Key: HADOOP-17767
 URL: https://issues.apache.org/jira/browse/HADOOP-17767
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan
 Fix For: 3.4.0


The current test run scripts need manual updates across all combinations in 
runTests.sh for the account name and work off a single azure-auth-keys.xml 
file. When testing across accounts that span various geographies, the config 
file grows large and also needs manual changes for configs such as 
fs.contract.test.[abfs/abfss], which have to be set uniquely. To use the script 
across various combinations, the dev also has to be aware of the names of all 
the combinations defined in runTests.sh.

 

These concerns are addressed in the new version of the scripts.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17694) abfs: Unable to use OAuth authentication at storage account level if the default authn type is Custom

2021-05-17 Thread Sneha Vijayarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346249#comment-17346249
 ] 

Sneha Vijayarajan commented on HADOOP-17694:


[~arunravimv] - This is already fixed in trunk and branch-3.3:

[HADOOP-17053] ABFS: FS initialize fails for incompatible account-agnostic 
Token Provider setting - ASF JIRA (apache.org)

If you need this in any of the minor branches of 3.3, please cherry-pick the 
fix.

> abfs: Unable to use OAuth authentication at storage account level if the 
> default authn type is Custom
> -
>
> Key: HADOOP-17694
> URL: https://issues.apache.org/jira/browse/HADOOP-17694
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure, tools
>Affects Versions: 3.3.0
>Reporter: Arun Ravi M V
>Priority: Major
>
> If we set the default auth type as Custom and then decide to use the OAuth 
> type for some select storage accounts, then the fs initialization for storage 
> accounts with OAuth type authn fails.
> Steps to recreate
> {code:java}
> conf.set("fs.abfss.impl", "org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem")
> conf.set("fs.azure.account.auth.type", "Custom")
> conf.set("fs.azure.account.oauth.provider.type", "xxx.yyy.zzz.ADTokenAdaptee")
> conf.set("fs.azure.account.auth.type.abctest.dfs.core.windows.net", "OAuth")
> conf.set("fs.azure.account.oauth.provider.type.abctest.dfs.core.windows.net",
>   "org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider")
> val fs = FileSystem.get(
>   new URI("abfss://conatiner...@abctest.dfs.core.windows.net/arion-scribe-de-dev"),
>   conf)
> {code}
> Error: java.lang.RuntimeException: class xxx.yyy.zzz.ADTokenAdaptee not 
> org.apache.hadoop.fs.azurebfs.oauth2.AccessTokenProvider
> Cause:
> In 
> [AbfsConfiguration.getTokenProvider|https://github.com/apache/hadoop/blob/aa96f1871bfd858f9bac59cf2a81ec470da649af/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsConfiguration.java#L540], 
> after evaluating the auth type as OAuth, the program proceeds to get the 
> implementing class using the property 
> `fs.azure.account.auth.type.abctest.dfs.core.windows.net`; while doing so, the 
> first 
> [step|https://github.com/apache/hadoop/blob/aa96f1871bfd858f9bac59cf2a81ec470da649af/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsConfiguration.java#L321] 
> is to get the default auth class (`fs.azure.account.oauth.provider.type`), 
> which in our case is Custom. The problem here is that the default auth class is 
> a CustomTokenProviderAdaptee implementation and does not implement 
> AccessTokenProvider, hence the program fails.
> Proposed solution:
> In the getClass function in AbfsConfiguration, we split the logic and do not 
> pass the default value to the account-specific lookup:
> {code:java}
> public <U> Class<? extends U> getClass(String name,
>     Class<? extends U> defaultValue, Class<U> xface) {
>   Class<? extends U> klass = rawConfig.getClass(accountConf(name), null, xface);
>   if (klass != null) {
>     return klass;
>   } else {
>     return rawConfig.getClass(name, defaultValue, xface);
>   }
> }
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-17590) ABFS: Introduce Lease Operations with Append to provide single writer semantics

2021-03-16 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan reassigned HADOOP-17590:
--

Assignee: Sneha Varma

> ABFS: Introduce Lease Operations with Append to provide single writer 
> semantics
> ---
>
> Key: HADOOP-17590
> URL: https://issues.apache.org/jira/browse/HADOOP-17590
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Sneha Varma
>Assignee: Sneha Varma
>Priority: Major
>
> The lease operations will be introduced as part of Append and Flush to ensure 
> single-writer semantics.
>  
> Details:
> Acquire Lease will be introduced in Create; Auto-Renew and Acquire will be added 
> to Append; and Release, Auto-Renew and Acquire in Flush.
>  
> During the creation of the file the lease will be acquired, as part of appends 
> the lease will be auto-renewed, and the lease can be released as part of flush.
>  
> By default the lease duration will be 60 seconds.
> Two configs, "fs.azure.write.enforcelease" and "fs.azure.write.lease.duration", 
> will be introduced.
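
A minimal sketch of wiring up the two configs named above (the boolean/seconds 
value types shown here are assumptions based on the description, not confirmed 
defaults):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class LeaseConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    // enforce the single-writer lease on the write path (assumed boolean semantics)
    conf.setBoolean("fs.azure.write.enforcelease", true);
    // lease duration in seconds; 60 is the default mentioned in the description
    conf.setInt("fs.azure.write.lease.duration", 60);
    System.out.println(conf.get("fs.azure.write.enforcelease")
        + " / " + conf.get("fs.azure.write.lease.duration"));
  }
}
{code}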



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17413) ABFS: Release Elastic ByteBuffer pool memory at outputStream close

2020-12-07 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17413:
---
Status: Patch Available  (was: Open)

> ABFS: Release Elastic ByteBuffer pool memory at outputStream close
> --
>
> Key: HADOOP-17413
> URL: https://issues.apache.org/jira/browse/HADOOP-17413
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Each AbfsOutputStream holds on to an instance of the elastic ByteBuffer pool. 
> This instance needs to be released so that the memory can be given back to the 
> JVM's available memory pool.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17407) ABFS: Delete Idempotency handling can lead to NPE

2020-12-07 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17407:
---
Status: Patch Available  (was: Open)

> ABFS: Delete Idempotency handling can lead to NPE
> -
>
> Key: HADOOP-17407
> URL: https://issues.apache.org/jira/browse/HADOOP-17407
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Delete idempotency code returns success with a dummy success HttpOperation. 
> The calling code that checks the continuation token throws an NPE as the dummy 
> success instance does not have any response headers.
> In case of a non-HNS account, the server could return a continuation token. The 
> dummy success response code is modified to not fail while accessing response 
> headers.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17413) ABFS: Release Elastic ByteBuffer pool memory at outputStream close

2020-12-07 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17413:
--

 Summary: ABFS: Release Elastic ByteBuffer pool memory at 
outputStream close
 Key: HADOOP-17413
 URL: https://issues.apache.org/jira/browse/HADOOP-17413
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan
 Fix For: 3.3.0


Each AbfsOutputStream holds on to an instance of the elastic ByteBuffer pool. This 
instance needs to be released so that the memory can be given back to the JVM's 
available memory pool. 
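
A minimal sketch of the idea only (the class below is hypothetical, not the actual 
AbfsOutputStream change): drop the stream's reference to the pool on close() so 
the cached buffers can be garbage collected.

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.io.ElasticByteBufferPool;

class PooledWriterSketch implements Closeable {
  private ElasticByteBufferPool byteBufferPool = new ElasticByteBufferPool();

  void write(byte[] data) {
    // borrow a heap buffer from the pool, use it, and return it
    ByteBuffer buf = byteBufferPool.getBuffer(false, data.length);
    buf.put(data);
    byteBufferPool.putBuffer(buf);
  }

  @Override
  public void close() throws IOException {
    // releasing the reference lets the JVM reclaim the pooled buffers
    byteBufferPool = null;
  }
}
{code}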



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17407) ABFS: Delete Idempotency handling can lead to NPE

2020-12-07 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17407:
---
Description: 
Delete idempotency code returns success with a dummy success HttpOperation. The 
calling code that checks the continuation token throws an NPE as the dummy success 
instance does not have any response headers.

In case of a non-HNS account, the server could return a continuation token. The dummy 
success response code is modified to not fail while accessing response headers.

 

  was:
Delete idempotency code returns success with a dummy success HttpOperation. The 
calling code that checks the continuation token throws an NPE as the dummy success 
instance does not have any response headers.

 

ABFS server endpoint doesn't utilize the continuation token concept for delete and 
hence that code needs to be removed.


> ABFS: Delete Idempotency handling can lead to NPE
> -
>
> Key: HADOOP-17407
> URL: https://issues.apache.org/jira/browse/HADOOP-17407
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
> Fix For: 3.3.1
>
>
> Delete idempotency code returns success with a dummy success HttpOperation. 
> The calling code that checks the continuation token throws an NPE as the dummy 
> success instance does not have any response headers.
> In case of a non-HNS account, the server could return a continuation token. The 
> dummy success response code is modified to not fail while accessing response 
> headers.
>  
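
As a rough illustration of the fix described above (the class and header handling 
below are simplified assumptions, not the actual ABFS client code), the 
continuation check should tolerate a result that carries no response headers:

{code:java}
import java.util.Collections;
import java.util.Map;

class ListResultSketch {
  private final Map<String, String> responseHeaders;

  ListResultSketch(Map<String, String> responseHeaders) {
    // a dummy "success" operation may pass null here
    this.responseHeaders = responseHeaders == null
        ? Collections.<String, String>emptyMap() : responseHeaders;
  }

  String getContinuationToken() {
    // returns null ("no continuation") instead of throwing an NPE
    return responseHeaders.get("x-ms-continuation");
  }
}
{code}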



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17407) ABFS: Delete Idempotency handling can lead to NPE

2020-12-03 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17407:
---
Description: 
Delete idempotency code returns success with a dummy success HttpOperation. The 
calling code that checks the continuation token throws an NPE as the dummy success 
instance does not have any response headers.

 

ABFS server endpoint doesn't utilize the continuation token concept for delete and 
hence that code needs to be removed.

  was:
Delete idempotency code returns success with a dummy success HttpOperation. The 
calling code that checks the continuation token throws an NPE as the dummy success 
instance does not have any response headers.

 

Handling of idempotency needs to happen at a higher level to avoid this issue.


> ABFS: Delete Idempotency handling can lead to NPE
> -
>
> Key: HADOOP-17407
> URL: https://issues.apache.org/jira/browse/HADOOP-17407
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
> Fix For: 3.3.1
>
>
> Delete idempotency code returns success with a dummy success HttpOperation. 
> The calling code that checks the continuation token throws an NPE as the dummy 
> success instance does not have any response headers.
>  
> ABFS server endpoint doesn't utilize the continuation token concept for delete 
> and hence that code needs to be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17407) ABFS: Delete Idempotency handling can lead to NPE

2020-12-03 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17407:
--

 Summary: ABFS: Delete Idempotency handling can lead to NPE
 Key: HADOOP-17407
 URL: https://issues.apache.org/jira/browse/HADOOP-17407
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan
 Fix For: 3.3.1


Delete idempotency code returns success with a dummy success HttpOperation. The 
calling code that checks the continuation token throws an NPE as the dummy success 
instance does not have any response headers.

 

Handling of idempotency needs to happen at a higher level to avoid this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17404) ABFS: Piggyback flush on Append calls for short writes

2020-12-01 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17404:
---
Status: Patch Available  (was: Open)

> ABFS: Piggyback flush on Append calls for short writes
> --
>
> Key: HADOOP-17404
> URL: https://issues.apache.org/jira/browse/HADOOP-17404
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When Hflush or Hsync APIs are called, a call is made to the store backend to 
> commit the data that was appended. 
> If the data size written by the Hadoop app is small, i.e. the data size:
>  * before any HFlush/HSync call is made, or
>  * between 2 HFlush/Hsync API calls
> is less than the write buffer size, 2 separate calls are made, one for append 
> and another for flush.
> Apps that do such small writes eventually end up with almost the same number 
> of calls for flush and append.
> This PR enables Flush to be piggybacked onto the append call for such short 
> write scenarios.
>  
> NOTE: The change is guarded by a config, and is disabled by default until the 
> relevant supporting changes are made available on all store production clusters.
> New Config added: fs.azure.write.enableappendwithflush
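
A minimal sketch of turning on the guard config named above; it is disabled by 
default per the description, and the boolean value type is an assumption:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class AppendWithFlushConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    // opt in to piggybacking flush onto append for short writes (off by default)
    conf.setBoolean("fs.azure.write.enableappendwithflush", true);
    System.out.println(conf.getBoolean("fs.azure.write.enableappendwithflush", false));
  }
}
{code}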



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17404) ABFS: Piggyback flush on Append calls for short writes

2020-12-01 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17404:
---
Description: 
When Hflush or Hsync APIs are called, a call is made to the store backend to 
commit the data that was appended. 

If the data size written by the Hadoop app is small, i.e. the data size:
 * before any HFlush/HSync call is made, or

 * between 2 HFlush/Hsync API calls

is less than the write buffer size, 2 separate calls are made, one for append and 
another for flush.

Apps that do such small writes eventually end up with almost the same number of 
calls for flush and append.

This PR enables Flush to be piggybacked onto the append call for such short write 
scenarios.

 

NOTE: The change is guarded by a config, and is disabled by default until the 
relevant supporting changes are made available on all store production clusters.

New Config added: fs.azure.write.enableappendwithflush

  was:
When Hflush or Hsync APIs are called, a call is made to the store backend to 
commit the data that was appended. 

If the data size written by the Hadoop app is small, i.e. the data size:
 * before any HFlush/HSync call is made, or

 * between 2 HFlush/Hsync API calls

is less than the write buffer size, 2 separate calls are made, one for append and 
another for flush.

Apps that do such small writes eventually end up with almost the same number of 
calls for flush and append.

This PR enables Flush to be piggybacked onto the append call for such short write 
scenarios.

 

NOTE: The change is guarded by a config, and is disabled by default until the 
relevant supporting changes are made available on all store production clusters.


> ABFS: Piggyback flush on Append calls for short writes
> --
>
> Key: HADOOP-17404
> URL: https://issues.apache.org/jira/browse/HADOOP-17404
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
> Fix For: 3.3.1
>
>
> When Hflush or Hsync APIs are called, a call is made to the store backend to 
> commit the data that was appended. 
> If the data size written by the Hadoop app is small, i.e. the data size:
>  * before any HFlush/HSync call is made, or
>  * between 2 HFlush/Hsync API calls
> is less than the write buffer size, 2 separate calls are made, one for append 
> and another for flush.
> Apps that do such small writes eventually end up with almost the same number 
> of calls for flush and append.
> This PR enables Flush to be piggybacked onto the append call for such short 
> write scenarios.
>  
> NOTE: The change is guarded by a config, and is disabled by default until the 
> relevant supporting changes are made available on all store production clusters.
> New Config added: fs.azure.write.enableappendwithflush



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17404) ABFS: Piggyback flush on Append calls for short writes

2020-12-01 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17404:
--

 Summary: ABFS: Piggyback flush on Append calls for short writes
 Key: HADOOP-17404
 URL: https://issues.apache.org/jira/browse/HADOOP-17404
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan
 Fix For: 3.3.1


When Hflush or Hsync APIs are called, a call is made to the store backend to 
commit the data that was appended. 

If the data size written by the Hadoop app is small, i.e. the data size:
 * before any HFlush/HSync call is made, or

 * between 2 HFlush/Hsync API calls

is less than the write buffer size, 2 separate calls are made, one for append and 
another for flush.

Apps that do such small writes eventually end up with almost the same number of 
calls for flush and append.

This PR enables Flush to be piggybacked onto the append call for such short write 
scenarios.

 

NOTE: The change is guarded by a config, and is disabled by default until the 
relevant supporting changes are made available on all store production clusters.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17397) ABFS: SAS Test updates for version and permission update

2020-11-25 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17397:
---
Status: Patch Available  (was: Open)

> ABFS: SAS Test updates for version and permission update
> 
>
> Key: HADOOP-17397
> URL: https://issues.apache.org/jira/browse/HADOOP-17397
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This Jira will track the below 2 updates to SAS test code:
>  # Upgrading the SAS version in Service SAS generator (test code)
>  # Updating the permission in Delegation SAS to "op" from "p" for ACL 
> operation as identities added as suoid/saoid by the tests are not owners of the 
> test path (again, test code).
>  [Relevant public documentation: 
> https://docs.microsoft.com/en-us/rest/api/storageservices/create-user-delegation-sas#specify-a-signed-object-id-for-a-security-principal-preview|https://docs.microsoft.com/en-us/rest/api/storageservices/create-user-delegation-sas#specify-a-signed-object-id-for-a-security-principal-preview]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17397) ABFS: SAS Test updates for version and permission update

2020-11-25 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17397:
---
Description: 
This Jira will track the below 2 updates to SAS test code:
 # Upgrading the SAS version in Service SAS generator (test code)
 # Updating the permission in Delegation SAS to "op" from "p" for ACL operation 
as identities added as suoid/saoid by the tests are not owners of the test path 
(again, test code).
 [Relevant public documentation: 
https://docs.microsoft.com/en-us/rest/api/storageservices/create-user-delegation-sas#specify-a-signed-object-id-for-a-security-principal-preview|https://docs.microsoft.com/en-us/rest/api/storageservices/create-user-delegation-sas#specify-a-signed-object-id-for-a-security-principal-preview]

  was:
This Jira will track the below 2 updates to SAS test code:
 # Upgrading the SAS version
 # Updating the permission in SAS to "op" from "p" for ACL operation as 
identities added as suoid/saoid by the tests are not owners of the test path
[Relevant public documentation: 
https://docs.microsoft.com/en-us/rest/api/storageservices/create-user-delegation-sas#specify-a-signed-object-id-for-a-security-principal-preview|https://docs.microsoft.com/en-us/rest/api/storageservices/create-user-delegation-sas#specify-a-signed-object-id-for-a-security-principal-preview]


> ABFS: SAS Test updates for version and permission update
> 
>
> Key: HADOOP-17397
> URL: https://issues.apache.org/jira/browse/HADOOP-17397
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
> Fix For: 3.3.0
>
>
> This Jira will track the below 2 updates to SAS test code:
>  # Upgrading the SAS version in Service SAS generator (test code)
>  # Updating the permission in Delegation SAS to "op" from "p" for ACL 
> operation as identities added as suoid/saoid by the tests are not owners of the 
> test path (again, test code).
>  [Relevant public documentation: 
> https://docs.microsoft.com/en-us/rest/api/storageservices/create-user-delegation-sas#specify-a-signed-object-id-for-a-security-principal-preview|https://docs.microsoft.com/en-us/rest/api/storageservices/create-user-delegation-sas#specify-a-signed-object-id-for-a-security-principal-preview]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17396) ABFS: testRenameFileOverExistingFile Fails after Contract test update

2020-11-25 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17396:
---
Status: Patch Available  (was: Open)

> ABFS: testRenameFileOverExistingFile Fails after Contract test update
> -
>
> Key: HADOOP-17396
> URL: https://issues.apache.org/jira/browse/HADOOP-17396
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Post updates to the rename-over-existing-file test, the ABFS contract test is 
> failing.
> Updates were made in the AbstractContractTest class in 
> https://issues.apache.org/jira/browse/HADOOP-17365.
> To align with the test expectation, ABFS tests need the config 
> "fs.contract.rename-returns-false-if-dest-exists" set to true. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17397) ABFS: SAS Test updates for version and permission update

2020-11-25 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17397:
--

 Summary: ABFS: SAS Test updates for version and permission update
 Key: HADOOP-17397
 URL: https://issues.apache.org/jira/browse/HADOOP-17397
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan
 Fix For: 3.3.0


This Jira will track the below 2 updates to SAS test code:
 # Upgrading the SAS version
 # Updating the permission in SAS to "op" from "p" for ACL operation as 
identities added as suoid/saoid by the tests are not owners of the test path
[Relevant public documentation: 
https://docs.microsoft.com/en-us/rest/api/storageservices/create-user-delegation-sas#specify-a-signed-object-id-for-a-security-principal-preview|https://docs.microsoft.com/en-us/rest/api/storageservices/create-user-delegation-sas#specify-a-signed-object-id-for-a-security-principal-preview]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17396) ABFS: testRenameFileOverExistingFile Fails after Contract test update

2020-11-25 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17396:
---
Description: 
Post updates to the rename-over-existing-file test, the ABFS contract test is 
failing.

Updates were made in the AbstractContractTest class in 
https://issues.apache.org/jira/browse/HADOOP-17365.

To align with the test expectation, ABFS tests need the config 
"fs.contract.rename-returns-false-if-dest-exists" set to true. 

 

  was:
Post updates to the rename-over-existing-file test, the ABFS contract test is 
failing.

Updates made in https://issues.apache.org/jira/browse/HADOOP-17365.

To align with the expectation, the ABFS test config needs 
"fs.contract.rename-returns-false-if-dest-exists" set to true. 

 


> ABFS: testRenameFileOverExistingFile Fails after Contract test update
> -
>
> Key: HADOOP-17396
> URL: https://issues.apache.org/jira/browse/HADOOP-17396
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>
> Post updates to the rename-over-existing-file test, the ABFS contract test is 
> failing.
> Updates were made in the AbstractContractTest class in 
> https://issues.apache.org/jira/browse/HADOOP-17365.
> To align with the test expectation, ABFS tests need the config 
> "fs.contract.rename-returns-false-if-dest-exists" set to true. 
>  
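
A minimal illustration of the contract option named above (in the real test suite 
this option typically lives in the ABFS contract resource XML rather than Java 
code; the snippet below just shows the key and value):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class RenameContractOptionSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.setBoolean("fs.contract.rename-returns-false-if-dest-exists", true);
    System.out.println(
        conf.getBoolean("fs.contract.rename-returns-false-if-dest-exists", false));
  }
}
{code}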



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17396) ABFS: testRenameFileOverExistingFile Fails after Contract test update

2020-11-25 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17396:
--

 Summary: ABFS: testRenameFileOverExistingFile Fails after Contract 
test update
 Key: HADOOP-17396
 URL: https://issues.apache.org/jira/browse/HADOOP-17396
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


Post updates to the rename-over-existing-file test, the ABFS contract test is 
failing.

Updates made in https://issues.apache.org/jira/browse/HADOOP-17365.

To align with the expectation, the ABFS test config needs 
"fs.contract.rename-returns-false-if-dest-exists" set to true. 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17296) ABFS: Allow Random Reads to be of Buffer Size

2020-10-23 Thread Sneha Vijayarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219636#comment-17219636
 ] 

Sneha Vijayarajan commented on HADOOP-17296:


Hi [~mukund-thakur],

Please find answers inline from the TPCDs runs we did internally:

1) How many worker threads per spark process?

- 20

2) What type of data, parquet/orc?

- Parquet

3) Size of TPCD datasets?

- 1 TB and 10 TB datasets

4) If you could share more information about the Job1 to Job4? Also if we can 
extract the query planning time separately, it would be easier to compare the 
read times.  

 - We do not have access to the customer job URLs or the job source code, so we 
don't have that data to share.

 

Thanks. 

> ABFS: Allow Random Reads to be of Buffer Size
> -
>
> Key: HADOOP-17296
> URL: https://issues.apache.org/jira/browse/HADOOP-17296
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive
>
> ADLS Gen2/ABFS driver is optimized to read only the bytes that are requested 
> for when the read pattern is random. 
> It was observed in some spark jobs that though the reads are random, the next 
> read doesn't skip ahead by a lot and can be served by the earlier read if that 
> read is done at buffer size. As a result the job triggered a higher count of 
> read calls/higher IOPS, resulting in more IOPS throttling and hence a higher 
> job runtime.
> When these jobs were run against Gen1, which always reads at buffer size, the 
> jobs fared well. 
> This Jira attempts to give a Gen1 customer migrating to Gen2 the same overall 
> I/O pattern as Gen1 and the same perf characteristics.
> *+Stats from Customer Job:+*
>  
> |*Customer Job*|*Gen 1 timing*|*Gen 2 Without patch*|*Gen2 with patch and 
> RAH=0*|
> |Job1|2 h 47 m|3 h 45 m|2 h 27 mins|
> |Job2|2 h 17 m|3 h 24 m|2 h 39 mins|
> |Job3|3 h 16 m|4 h 29 m|3 h 21 mins|
> |Job4|1 h 59 m|3 h 12 m|2 h 28 mins|
>  
> *+Stats from Internal TPCDs runs+* 
> [Total number of TPCDs queries per suite run = 80  
> Full suite repeat run count per config = 3]
> | |*Gen1*|Gen2 Without patch|*Gen2 With patch and RAH=0*
> *(Gen2 in Gen1 config)*|*Gen2 With patch and RAH=2*|
> |%Run Duration|100|140|213|70-90|
> |%Read IOPS|100|106|98|110-115|
>  
> *Without patch = default Jar with random read logic
> *With patch=Modified Jar with change to always read buffer size
> *RAH=ReadAheadQueueDepth
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17325) WASB: Test failure in trunk

2020-10-22 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17325:
--

 Summary: WASB: Test failure in trunk
 Key: HADOOP-17325
 URL: https://issues.apache.org/jira/browse/HADOOP-17325
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.0
Reporter: Sneha Vijayarajan
Assignee: Esfandiar Manii


WASB tests are failing in Apache trunk resulting in Yetus run failures for PRs.

 
||Reason||Tests||
|Failed junit tests|hadoop.fs.azure.TestNativeAzureFileSystemMocked|
| |hadoop.fs.azure.TestNativeAzureFileSystemConcurrency|
| |hadoop.fs.azure.TestWasbFsck|
| |hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked|
| |hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck|
| |hadoop.fs.azure.TestNativeAzureFileSystemContractMocked|
| |hadoop.fs.azure.TestOutOfBandAzureBlobOperations|
| |hadoop.fs.azure.TestBlobMetadata|

Many PRs are hit by this. Test report link from one of the PRs:
[https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2368/5/testReport/]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17311) ABFS: Logs should redact SAS signature

2020-10-18 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17311:
--

 Summary: ABFS: Logs should redact SAS signature
 Key: HADOOP-17311
 URL: https://issues.apache.org/jira/browse/HADOOP-17311
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


The signature part of the SAS should be redacted for security purposes.
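
A tiny illustration of the idea (a hypothetical helper, not the actual ABFS masking 
code): blank out the value of the sig query parameter before a SAS URL is logged.

{code:java}
public class SasRedactSketch {
  static String redactSasSignature(String url) {
    // replace everything after "sig=" up to the next '&' with a placeholder
    return url.replaceAll("(?i)(sig=)[^&]*", "$1XXXXX");
  }

  public static void main(String[] args) {
    String url = "https://account.dfs.core.windows.net/container/file"
        + "?sv=2020-02-10&sig=SECRETVALUE&se=2021-01-01";
    System.out.println(redactSasSignature(url)); // ...sig=XXXXX&se=...
  }
}
{code}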



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17301) ABFS: Fix bug introduced in HADOOP-16852 which reports read-ahead error back

2020-10-12 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17301:
---
Status: Patch Available  (was: Open)

> ABFS: Fix bug introduced in HADOOP-16852 which reports read-ahead error back
> 
>
> Key: HADOOP-17301
> URL: https://issues.apache.org/jira/browse/HADOOP-17301
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> When reads done by read-ahead buffers failed, the exceptions were dropped and 
> the failure was not getting reported to the calling app. 
> Jira HADOOP-16852: Report read-ahead error back
> tried to handle the scenario by reporting the error back to the calling app. But 
> the commit introduced a bug which can lead to a ReadBuffer being injected into 
> the read-completed queue twice. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17296) ABFS: Allow Random Reads to be of Buffer Size

2020-10-08 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17296:
---
Description: 
ADLS Gen2/ABFS driver is optimized to read only the bytes that are requested 
for when the read pattern is random. 

It was observed in some spark jobs that though the reads are random, the next 
read doesn't skip by a lot and can be served by the earlier read if read was 
done in buffer size. As a result the job triggered a higher count of read 
calls/higher IOPS, resulting in higher IOPS throttling and hence resulted in 
higher job runtime.

When these jobs were run against Gen1 which always reads in buffer size , the 
jobs fared well. 

This Jira attempts to get a Gen1 customer migrating to Gen2 get the same 
overall i/o pattern as gen1 and the same perf characteristics.

*+Stats from Customer Job:+*

 
|*Customer Job*|*Gen 1 timing*|*Gen 2 Without patch*|*Gen2 with patch and 
RAH=0*|
|Job1|2 h 47 m|3 h 45 m|2 h 27 mins|
|Job2|2 h 17 m|3 h 24 m|2 h 39 mins|
|Job3|3 h 16 m|4 h 29 m|3 h 21 mins|
|Job4|1 h 59 m|3 h 12 m|2 h 28 mins|

 

*+Stats from Internal TPCDs runs+* 

[Total number of TPCDs queries per suite run = 80  

Full suite repeat run count per config = 3]
| |*Gen1*|Gen2 Without patch|*Gen2 With patch and RAH=0*
*(Gen2 in Gen1 config)*|*Gen2 With patch and RAH=2*|
|%Run Duration|100|140|213|70-90|
|%Read IOPS|100|106|98|110-115|

 

*Without patch = default Jar with random read logic

*With patch=Modified Jar with change to always read buffer size

*RAH=ReadAheadQueueDepth

 

  was:
ADLS Gen2/ABFS driver is optimized to read only the bytes that are requested 
for when the read pattern is random. 

It was observed in some spark jobs that though the reads are random, the next 
read doesn't skip ahead by a lot and can be served by the earlier read if that 
read is done at buffer size. As a result the job triggered a higher count of read 
calls and resulted in higher job runtime.

When these jobs were run against Gen1, which always reads at buffer size, the 
jobs fared well. 

In this Jira we try to provide a config to control whether a random read is of the 
requested size or the buffer size.


> ABFS: Allow Random Reads to be of Buffer Size
> -
>
> Key: HADOOP-17296
> URL: https://issues.apache.org/jira/browse/HADOOP-17296
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive
>
> ADLS Gen2/ABFS driver is optimized to read only the bytes that are requested 
> for when the read pattern is random. 
> It was observed in some spark jobs that though the reads are random, the next 
> read doesn't skip ahead by a lot and can be served by the earlier read if that 
> read is done at buffer size. As a result the job triggered a higher count of 
> read calls/higher IOPS, resulting in more IOPS throttling and hence a higher 
> job runtime.
> When these jobs were run against Gen1, which always reads at buffer size, the 
> jobs fared well. 
> This Jira attempts to give a Gen1 customer migrating to Gen2 the same overall 
> I/O pattern as Gen1 and the same perf characteristics.
> *+Stats from Customer Job:+*
>  
> |*Customer Job*|*Gen 1 timing*|*Gen 2 Without patch*|*Gen2 with patch and 
> RAH=0*|
> |Job1|2 h 47 m|3 h 45 m|2 h 27 mins|
> |Job2|2 h 17 m|3 h 24 m|2 h 39 mins|
> |Job3|3 h 16 m|4 h 29 m|3 h 21 mins|
> |Job4|1 h 59 m|3 h 12 m|2 h 28 mins|
>  
> *+Stats from Internal TPCDs runs+* 
> [Total number of TPCDs queries per suite run = 80  
> Full suite repeat run count per config = 3]
> | |*Gen1*|Gen2 Without patch|*Gen2 With patch and RAH=0*
> *(Gen2 in Gen1 config)*|*Gen2 With patch and RAH=2*|
> |%Run Duration|100|140|213|70-90|
> |%Read IOPS|100|106|98|110-115|
>  
> *Without patch = default Jar with random read logic
> *With patch=Modified Jar with change to always read buffer size
> *RAH=ReadAheadQueueDepth
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-17296) ABFS: Allow Random Reads to be of Buffer Size

2020-10-08 Thread Sneha Vijayarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209897#comment-17209897
 ] 

Sneha Vijayarajan edited comment on HADOOP-17296 at 10/8/20, 9:03 AM:
--

[~mukund-thakur] - 

Readahead.range will provide a static increase to whatever the read size will 
be for the requested read, which makes the read to the store be of a different 
size. 

The specific case mentioned in the description was a pattern observed for a 
parquet file which had a very small row group size, which I gather isn't an 
optimal structure for a parquet file. The Gen1 job run was more performant as it 
was reading a full buffer, and a buffer-sized read ended up reading more row groups.

Gen2's random read logic ended up triggering more IOPS as, when the pattern is 
random, it reads only the requested bytes. To highlight that it was the randomness 
of the read pattern that led to the high job runtime and more IOPS, forcing Gen2 
to read a full buffer like Gen1 helped. 

But reading a full buffer for every random read is definitely not ideal, especially 
for a blocking read call from the app. Hence the configs that enforce a full buffer 
read will be set to false by default. We get similar asks for comparisons between 
Gen1 and Gen2 for the same workloads, and this Jira's configs will give a Gen1 
customer migrating to Gen2 the same overall I/O pattern as Gen1 and the same perf 
characteristics.

Readahead.range, which will be a consistent amount of data read ahead on top of 
the varying requested read size, is definitely a better solution for a performant 
random read on Gen2 and we should pursue that. And this change won't override the 
range update. 

 

 


was (Author: snvijaya):
[~mukund-thakur] - 

Readahead.range will provide a static increase to whatever the read size will 
be for the requested read, which makes the read to the store be of a different 
size. 

The specific case mentioned in the description was a pattern observed for a 
parquet file which had a very small row group size, which I gather isn't an 
optimal structure for a parquet file. The Gen1 job run was more performant as it 
was reading a full buffer, and a buffer-sized read ended up reading more row groups. 

Gen2's random read logic ended up triggering more IOPS as, when the pattern is 
random, it reads only the requested bytes. To highlight that it was the randomness 
of the read pattern that led to the high job runtime and more IOPS, forcing Gen2 
to read a full buffer like Gen1 helped. 

But reading a full buffer for every random read is definitely not ideal, especially 
for a blocking read call from the app. Hence the configs that enforce a full buffer 
read will be set to false by default. We get similar asks for comparisons between 
Gen1 and Gen2 for the same workloads, and we are hoping that a rerun of the workload 
with this config turned on will make it easier to get that information with the 
I/O pattern matching.

Readahead.range, which will be a consistent amount of data read ahead on top of 
the varying requested read size, is definitely a better solution for a performant 
random read on Gen2 and we should pursue that. 

 

 

> ABFS: Allow Random Reads to be of Buffer Size
> -
>
> Key: HADOOP-17296
> URL: https://issues.apache.org/jira/browse/HADOOP-17296
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive
>
> ADLS Gen2/ABFS driver is optimized to read only the bytes that are requested 
> for when the read pattern is random. 
> It was observed in some spark jobs that though the reads are random, the next 
> read doesn't skip ahead by a lot and can be served by the earlier read if that 
> read is done at buffer size. As a result the job triggered a higher count of 
> read calls and resulted in higher job runtime.
> When these jobs were run against Gen1, which always reads at buffer size, the 
> jobs fared well. 
> In this Jira we try to provide a config to control whether a random read is of 
> the requested size or the buffer size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17301) ABFS: Fix bug introduced in HADOOP-16852 which reports read-ahead error back

2020-10-07 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17301:
--

 Summary: ABFS: Fix bug introduced in HADOOP-16852 which reports 
read-ahead error back
 Key: HADOOP-17301
 URL: https://issues.apache.org/jira/browse/HADOOP-17301
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


When reads done by read-ahead buffers failed, the exceptions were dropped and 
the failure was not getting reported to the calling app. 

Jira HADOOP-16852: Report read-ahead error back

tried to handle the scenario by reporting the error back to the calling app. But 
the commit introduced a bug which can lead to a ReadBuffer being injected into 
the read-completed queue twice. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17296) ABFS: Allow Random Reads to be of Buffer Size

2020-10-07 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17296:
---
Status: Patch Available  (was: Open)

> ABFS: Allow Random Reads to be of Buffer Size
> -
>
> Key: HADOOP-17296
> URL: https://issues.apache.org/jira/browse/HADOOP-17296
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive
>
> ADLS Gen2/ABFS driver is optimized to read only the bytes that are requested 
> for when the read pattern is random. 
> It was observed in some spark jobs that though the reads are random, the next 
> read doesn't skip ahead by a lot and can be served by the earlier read if that 
> read is done at buffer size. As a result the job triggered a higher count of 
> read calls and resulted in higher job runtime.
> When these jobs were run against Gen1, which always reads at buffer size, the 
> jobs fared well. 
> In this Jira we try to provide a config to control whether a random read is of 
> the requested size or the buffer size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17296) ABFS: Allow Random Reads to be of Buffer Size

2020-10-07 Thread Sneha Vijayarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209897#comment-17209897
 ] 

Sneha Vijayarajan commented on HADOOP-17296:


[~mukund-thakur] - 

Readahead.range will provide a static increase to whatever the read size will 
be for the requested read, which makes the read to the store be of a different 
size. 

The specific case mentioned in the description was a pattern observed for a 
parquet file which had a very small row group size, which I gather isn't an 
optimal structure for a parquet file. The Gen1 job run was more performant as it 
was reading a full buffer, and a buffer-sized read ended up reading more row groups. 

Gen2's random read logic ended up triggering more IOPS as, when the pattern is 
random, it reads only the requested bytes. To highlight that it was the randomness 
of the read pattern that led to the high job runtime and more IOPS, forcing Gen2 
to read a full buffer like Gen1 helped. 

But reading a full buffer for every random read is definitely not ideal, especially 
for a blocking read call from the app. Hence the configs that enforce a full buffer 
read will be set to false by default. We get similar asks for comparisons between 
Gen1 and Gen2 for the same workloads, and we are hoping that a rerun of the workload 
with this config turned on will make it easier to get that information with the 
I/O pattern matching.

Readahead.range, which will be a consistent amount of data read ahead on top of 
the varying requested read size, is definitely a better solution for a performant 
random read on Gen2 and we should pursue that. 

 

 

> ABFS: Allow Random Reads to be of Buffer Size
> -
>
> Key: HADOOP-17296
> URL: https://issues.apache.org/jira/browse/HADOOP-17296
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive
>
> ADLS Gen2/ABFS driver is optimized to read only the bytes that are requested 
> for when the read pattern is random. 
> It was observed in some spark jobs that though the reads are random, the next 
> read doesn't skip ahead by a lot and can be served by the earlier read if that 
> read is done at buffer size. As a result the job triggered a higher count of 
> read calls and resulted in higher job runtime.
> When these jobs were run against Gen1, which always reads at buffer size, the 
> jobs fared well. 
> In this Jira we try to provide a config to control whether a random read is of 
> the requested size or the buffer size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-17250) ABFS: Random read perf improvement

2020-10-03 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan reassigned HADOOP-17250:
--

Assignee: Mukund Thakur  (was: Sneha Vijayarajan)

> ABFS: Random read perf improvement
> --
>
> Key: HADOOP-17250
> URL: https://issues.apache.org/jira/browse/HADOOP-17250
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: abfsactive, pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> A random read, if marginally read ahead, was seen to improve perf for a TPCH 
> query. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17296) ABFS: Allow Random Reads to be of Buffer Size

2020-10-03 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17296:
--

 Summary: ABFS: Allow Random Reads to be of Buffer Size
 Key: HADOOP-17296
 URL: https://issues.apache.org/jira/browse/HADOOP-17296
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


The ADLS Gen2/ABFS driver is optimized to read only the requested bytes when 
the read pattern is random. 

It was observed in some Spark jobs that although the reads are random, the next 
read doesn't skip far ahead and could have been served by the earlier read if 
that read had been done at buffer size. As a result the job triggered a higher 
count of read calls and a higher job runtime.

When these jobs were run against Gen1, which always reads at buffer size, the 
jobs fared well. 

In this Jira we provide a config to control whether a random read is done at 
the requested size or at buffer size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17250) ABFS: Random read perf improvement

2020-10-03 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17250:
---
Description: 
Random reads, when marginally read ahead, were seen to improve perf for a TPCH query. 

 

  was:
The ADLS Gen2/ABFS driver is optimized to read only the requested bytes when 
the read pattern is random. 

It was observed in some Spark jobs that although the reads are random, the next 
read doesn't skip far ahead and could have been served by the earlier read if 
that read had been done at buffer size. As a result the job triggered a higher 
count of read calls and a higher job runtime.

When these jobs were run against Gen1, which always reads at buffer size, the 
jobs fared well. 

In this Jira we provide a config to control whether a random read is done at 
the requested size or at buffer size.


> ABFS: Random read perf improvement
> --
>
> Key: HADOOP-17250
> URL: https://issues.apache.org/jira/browse/HADOOP-17250
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive, pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Random reads, when marginally read ahead, were seen to improve perf for a 
> TPCH query. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17250) ABFS: Random read perf improvement

2020-10-03 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17250:
---
Summary: ABFS: Random read perf improvement  (was: ABFS: Allow random read 
sizes to be of buffer size)

> ABFS: Random read perf improvement
> --
>
> Key: HADOOP-17250
> URL: https://issues.apache.org/jira/browse/HADOOP-17250
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive, pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The ADLS Gen2/ABFS driver is optimized to read only the requested bytes when 
> the read pattern is random. 
> It was observed in some Spark jobs that although the reads are random, the 
> next read doesn't skip far ahead and could have been served by the earlier 
> read if that read had been done at buffer size. As a result the job triggered 
> a higher count of read calls and a higher job runtime.
> When these jobs were run against Gen1, which always reads at buffer size, the 
> jobs fared well. 
> In this Jira we provide a config to control whether a random read is done at 
> the requested size or at buffer size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17279) ABFS: Test testNegativeScenariosForCreateOverwriteDisabled fails for non-HNS account

2020-09-23 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17279:
---
Status: Patch Available  (was: Open)

> ABFS: Test testNegativeScenariosForCreateOverwriteDisabled fails for non-HNS 
> account
> 
>
> Key: HADOOP-17279
> URL: https://issues.apache.org/jira/browse/HADOOP-17279
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Test testNegativeScenariosForCreateOverwriteDisabled fails when run against a 
> non-HNS account. The test creates a mock AbfsClient to mimic negative 
> scenarios.
> The mock is triggered on the valid permission and umask values passed in 
> while creating a file. For a non-HNS account the driver defaults permission 
> and umask to null when creating a file, and the mock trigger was not enabled 
> for these null parameters.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17279) ABFS: Test testNegativeScenariosForCreateOverwriteDisabled fails for non-HNS account

2020-09-22 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17279:
--

 Summary: ABFS: Test 
testNegativeScenariosForCreateOverwriteDisabled fails for non-HNS account
 Key: HADOOP-17279
 URL: https://issues.apache.org/jira/browse/HADOOP-17279
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


Test testNegativeScenariosForCreateOverwriteDisabled fails when run against a 
non-HNS account. The test creates a mock AbfsClient to mimic negative scenarios.

The mock is triggered on the valid permission and umask values passed in while 
creating a file. For a non-HNS account the driver defaults permission and umask 
to null when creating a file, and the mock trigger was not enabled for these 
null parameters.
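
A generic illustration of the matcher issue described above (this is not the 
actual AbfsClient mock; the interface and values here are made up). With 
Mockito 2+, anyString() does not match null arguments, so a stub written only 
with anyString() never fires when the code under test passes null 
permission/umask, whereas nullable(String.class) matches both.

{code:java}
import static org.mockito.ArgumentMatchers.anyString;
import static org.mockito.ArgumentMatchers.nullable;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

public class NullMatcherExample {

  // Hypothetical client interface standing in for the mocked AbfsClient call.
  interface PathClient {
    String createPath(String path, String permission, String umask);
  }

  public static void main(String[] args) {
    PathClient client = mock(PathClient.class);

    // Only matches non-null permission/umask, e.g. the HNS code path.
    when(client.createPath(anyString(), anyString(), anyString()))
        .thenReturn("mocked: non-null permission/umask");

    // Also matches null permission/umask, e.g. the non-HNS code path.
    when(client.createPath(anyString(), nullable(String.class), nullable(String.class)))
        .thenReturn("mocked: null permission/umask allowed");

    // Prints the second answer: the nullable() stub fires for null arguments.
    System.out.println(client.createPath("/file", null, null));
  }
}
{code}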

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17260) ABFS: Test testAbfsStreamOps timing out

2020-09-13 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17260:
--

 Summary: ABFS: Test testAbfsStreamOps timing out 
 Key: HADOOP-17260
 URL: https://issues.apache.org/jira/browse/HADOOP-17260
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.3.0
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


Test testAbfsStreamOps is timing out when log4j settings are at DEBUG/TRACE 
level for AbfsInputStream.

log4j.logger.org.apache.hadoop.fs.azurebfs.services.AbfsInputStream=TRACE

 

org.junit.runners.model.TestTimedOutException: test timed out after 90 
milliseconds
 at java.lang.Throwable.getStackTraceElement(Native Method) at 
java.lang.Throwable.getOurStackTrace(Throwable.java:828) at 
java.lang.Throwable.getStackTrace(Throwable.java:817) at 
sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.log4j.spi.LocationInfo.(LocationInfo.java:139) at 
org.apache.log4j.spi.LoggingEvent.getLocationInformation(LoggingEvent.java:253) 
at 
org.apache.log4j.helpers.PatternParser$LocationPatternConverter.convert(PatternParser.java:500)
 at org.apache.log4j.helpers.PatternConverter.format(PatternConverter.java:65) 
at org.apache.log4j.PatternLayout.format(PatternLayout.java:506) at 
org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310) at 
org.apache.log4j.WriterAppender.append(WriterAppender.java:162) at 
org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251) at 
org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
 at org.apache.log4j.Category.callAppenders(Category.java:206) at 
org.apache.log4j.Category.forcedLog(Category.java:391) at 
org.apache.log4j.Category.log(Category.java:856) at 
org.slf4j.impl.Log4jLoggerAdapter.debug(Log4jLoggerAdapter.java:273) at 
org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.readOneBlock(AbfsInputStream.java:150)
 at 
org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.read(AbfsInputStream.java:131)
 at 
org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.read(AbfsInputStream.java:104)
 at java.io.FilterInputStream.read(FilterInputStream.java:83) at 
org.apache.hadoop.fs.azurebfs.AbstractAbfsTestWithTimeout.validateContent(AbstractAbfsTestWithTimeout.java:117)
 at 
org.apache.hadoop.fs.azurebfs.ITestAbfsStreamStatistics.testAbfsStreamOps(ITestAbfsStreamStatistics.java:155)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
 at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17250) ABFS: Allow random read sizes to be of buffer size

2020-09-08 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17250:
--

 Summary: ABFS: Allow random read sizes to be of buffer size
 Key: HADOOP-17250
 URL: https://issues.apache.org/jira/browse/HADOOP-17250
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.1.4
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


The ADLS Gen2/ABFS driver is optimized to read only the requested bytes when 
the read pattern is random. 

It was observed in some Spark jobs that although the reads are random, the next 
read doesn't skip far ahead and could have been served by the earlier read if 
that read had been done at buffer size. As a result the job triggered a higher 
count of read calls and a higher job runtime.

When these jobs were run against Gen1, which always reads at buffer size, the 
jobs fared well. 

In this Jira we provide a config to control whether a random read is done at 
the requested size or at buffer size.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17229) Test failure as failed request body counted in byte received metric - ITestAbfsNetworkStatistics#testAbfsHttpResponseStatistics

2020-08-27 Thread Sneha Vijayarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17185666#comment-17185666
 ] 

Sneha Vijayarajan commented on HADOOP-17229:


[~mehakmeetSingh] - the scenario mentioned in the description can reproduce 
this. 
 # Create 0 byte file
 # Create with overwrite = false on same path. This request will fail.

 

The second step will end up adding bytes received from the error response body.
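
A hypothetical, self-contained illustration of the counting behaviour being 
discussed (the class and field names are made up; this is not the ABFS code): 
the unguarded variant counts the 409 error body, while the guarded variant only 
counts bytes from successful responses.

{code:java}
public class BytesReceivedExample {

  private long bytesReceived;

  // Current behaviour under discussion: every response body is counted,
  // including the error body of a failed request (e.g. a 409 Conflict).
  void recordResponseUnguarded(int statusCode, long bodyLength) {
    bytesReceived += bodyLength;
  }

  // Possible alternative: only count bytes for successful (2xx) responses.
  void recordResponseGuarded(int statusCode, long bodyLength) {
    if (statusCode >= 200 && statusCode < 300) {
      bytesReceived += bodyLength;
    }
  }

  public static void main(String[] args) {
    BytesReceivedExample unguarded = new BytesReceivedExample();
    unguarded.recordResponseUnguarded(409, 168);  // 168-byte error body is counted
    System.out.println("unguarded counter: " + unguarded.bytesReceived);

    BytesReceivedExample guarded = new BytesReceivedExample();
    guarded.recordResponseGuarded(409, 168);      // error body ignored
    System.out.println("guarded counter: " + guarded.bytesReceived);
  }
}
{code}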

> Test failure as failed request body counted in byte received metric - 
> ITestAbfsNetworkStatistics#testAbfsHttpResponseStatistics
> ---
>
> Key: HADOOP-17229
> URL: https://issues.apache.org/jira/browse/HADOOP-17229
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure, test
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Mehakmeet Singh
>Priority: Major
>  Labels: abfsactive
>
> Bytes received counter increments for every request response received. 
> [https://github.com/apache/hadoop/blob/d23cc9d85d887f01d72180bdf1af87dfdee15c5a/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsRestOperation.java#L251]
> This increments even for failed requests. 
> Observed during testing done for HADOOP-17215. A request failed with 409 
> Conflict contains response body as below:
> {"error":\{"code":"PathAlreadyExists","message":"The specified path already 
> exists.\nRequestId:c3b2c55c-b01f-0061-7b31-7b6ee300\nTime:2020-08-25T22:44:07.2356054Z"}}
> The error body of 168 size is incremented in bytes_received counter. 
> This also breaks the testcase testAbfsHttpResponseStatistics. 
> {code:java}
> [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 22.746 s <<< FAILURE! - in 
> org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics[ERROR] Tests run: 2, 
> Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 22.746 s <<< FAILURE! - in 
> org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics[ERROR] 
> testAbfsHttpResponseStatistics(org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics)
>   Time elapsed: 13.183 s  <<< FAILURE!java.lang.AssertionError: Mismatch in 
> bytes_received expected:<143> but was:<311> at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.failNotEquals(Assert.java:834) at 
> org.junit.Assert.assertEquals(Assert.java:645) at 
> org.apache.hadoop.fs.azurebfs.AbstractAbfsIntegrationTest.assertAbfsStatistics(AbstractAbfsIntegrationTest.java:445)
>  at 
> org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics.testAbfsHttpResponseStatistics(ITestAbfsNetworkStatistics.java:291)
> {code}
> [~mehakmeetSingh] - is the bytes_received counter increment for failed 
> requests an expected behaviour ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17215) ABFS: Excessive Create Overwrites leads to race conditions

2020-08-26 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17215:
---
Status: Patch Available  (was: Open)

> ABFS: Excessive Create Overwrites leads to race conditions
> --
>
> Key: HADOOP-17215
> URL: https://issues.apache.org/jira/browse/HADOOP-17215
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive
>
> Filesystem create APIs that do not accept an argument for the overwrite flag 
> end up defaulting it to true. 
> We are observing that the request count of creates with overwrite=true is 
> high, primarily because the called create API defaults the flag to true. When 
> a create with overwrite times out, we have observed that it can lead to race 
> conditions between the first create and the retried one running almost in 
> parallel.
> To avoid this scenario for create requests with overwrite=true, the ABFS 
> driver will always first attempt the create without overwrite. If the create 
> fails due to fileAlreadyPresent, it will resend the request with overwrite=true. 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17215) ABFS: Excessive Create Overwrites leads to race conditions

2020-08-26 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17215:
---
Component/s: fs/azure

> ABFS: Excessive Create Overwrites leads to race conditions
> --
>
> Key: HADOOP-17215
> URL: https://issues.apache.org/jira/browse/HADOOP-17215
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>
> Filesystem create APIs that do not accept an argument for the overwrite flag 
> end up defaulting it to true. 
> We are observing that the request count of creates with overwrite=true is 
> high, primarily because the called create API defaults the flag to true. When 
> a create with overwrite times out, we have observed that it can lead to race 
> conditions between the first create and the retried one running almost in 
> parallel.
> To avoid this scenario for create requests with overwrite=true, the ABFS 
> driver will always first attempt the create without overwrite. If the create 
> fails due to fileAlreadyPresent, it will resend the request with overwrite=true. 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17215) ABFS: Excessive Create Overwrites leads to race conditions

2020-08-26 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17215:
---
Labels: abfsactive  (was: )

> ABFS: Excessive Create Overwrites leads to race conditions
> --
>
> Key: HADOOP-17215
> URL: https://issues.apache.org/jira/browse/HADOOP-17215
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive
>
> Filesystem create APIs that do not accept an argument for the overwrite flag 
> end up defaulting it to true. 
> We are observing that the request count of creates with overwrite=true is 
> high, primarily because the called create API defaults the flag to true. When 
> a create with overwrite times out, we have observed that it can lead to race 
> conditions between the first create and the retried one running almost in 
> parallel.
> To avoid this scenario for create requests with overwrite=true, the ABFS 
> driver will always first attempt the create without overwrite. If the create 
> fails due to fileAlreadyPresent, it will resend the request with overwrite=true. 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17215) ABFS: Excessive Create Overwrites leads to race conditions

2020-08-26 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17215:
---
Affects Version/s: 3.3.0

> ABFS: Excessive Create Overwrites leads to race conditions
> --
>
> Key: HADOOP-17215
> URL: https://issues.apache.org/jira/browse/HADOOP-17215
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>
> Filesystem create APIs that do not accept an argument for the overwrite flag 
> end up defaulting it to true. 
> We are observing that the request count of creates with overwrite=true is 
> high, primarily because the called create API defaults the flag to true. When 
> a create with overwrite times out, we have observed that it can lead to race 
> conditions between the first create and the retried one running almost in 
> parallel.
> To avoid this scenario for create requests with overwrite=true, the ABFS 
> driver will always first attempt the create without overwrite. If the create 
> fails due to fileAlreadyPresent, it will resend the request with overwrite=true. 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17229) Test failure as failed request body counted in byte received metric - ITestAbfsInputStreamStatistics#testAbfsHttpResponseStatistics

2020-08-25 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17229:
---
Labels: abfsactive  (was: )

> Test failure as failed request body counted in byte received metric - 
> ITestAbfsInputStreamStatistics#testAbfsHttpResponseStatistics
> ---
>
> Key: HADOOP-17229
> URL: https://issues.apache.org/jira/browse/HADOOP-17229
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Mehakmeet Singh
>Priority: Major
>  Labels: abfsactive
>
> Bytes received counter increments for every request response received. 
> [https://github.com/apache/hadoop/blob/d23cc9d85d887f01d72180bdf1af87dfdee15c5a/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsRestOperation.java#L251]
> This increments even for failed requests. 
> Observed during testing done for HADOOP-17215. A request failed with 409 
> Conflict contains response body as below:
> {"error":\{"code":"PathAlreadyExists","message":"The specified path already 
> exists.\nRequestId:c3b2c55c-b01f-0061-7b31-7b6ee300\nTime:2020-08-25T22:44:07.2356054Z"}}
> The error body of 168 size is incremented in bytes_received counter. 
> This also breaks the testcase testAbfsHttpResponseStatistics. 
> {code:java}
> [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 22.746 s <<< FAILURE! - in 
> org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics[ERROR] Tests run: 2, 
> Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 22.746 s <<< FAILURE! - in 
> org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics[ERROR] 
> testAbfsHttpResponseStatistics(org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics)
>   Time elapsed: 13.183 s  <<< FAILURE!java.lang.AssertionError: Mismatch in 
> bytes_received expected:<143> but was:<311> at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.failNotEquals(Assert.java:834) at 
> org.junit.Assert.assertEquals(Assert.java:645) at 
> org.apache.hadoop.fs.azurebfs.AbstractAbfsIntegrationTest.assertAbfsStatistics(AbstractAbfsIntegrationTest.java:445)
>  at 
> org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics.testAbfsHttpResponseStatistics(ITestAbfsNetworkStatistics.java:291)
> {code}
> [~mehakmeetSingh] - is the bytes_received counter increment for failed 
> requests an expected behaviour ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17229) Test failure as failed request body counted in byte received metric - ITestAbfsInputStreamStatistics#testAbfsHttpResponseStatistics

2020-08-25 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17229:
---
Description: 
The bytes received counter increments for every response received. 

[https://github.com/apache/hadoop/blob/d23cc9d85d887f01d72180bdf1af87dfdee15c5a/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsRestOperation.java#L251]

It increments even for failed requests. 

Observed during testing for HADOOP-17215. A request that failed with 409 
Conflict contained the response body below:

{"error":\{"code":"PathAlreadyExists","message":"The specified path already 
exists.\nRequestId:c3b2c55c-b01f-0061-7b31-7b6ee300\nTime:2020-08-25T22:44:07.2356054Z"}}

The 168-byte error body is added to the bytes_received counter. 

This also breaks the test case testAbfsHttpResponseStatistics. 
{code:java}
[ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 22.746 
s <<< FAILURE! - in 
org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics[ERROR] Tests run: 2, 
Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 22.746 s <<< FAILURE! - in 
org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics[ERROR] 
testAbfsHttpResponseStatistics(org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics)
  Time elapsed: 13.183 s  <<< FAILURE!java.lang.AssertionError: Mismatch in 
bytes_received expected:<143> but was:<311> at 
org.junit.Assert.fail(Assert.java:88) at 
org.junit.Assert.failNotEquals(Assert.java:834) at 
org.junit.Assert.assertEquals(Assert.java:645) at 
org.apache.hadoop.fs.azurebfs.AbstractAbfsIntegrationTest.assertAbfsStatistics(AbstractAbfsIntegrationTest.java:445)
 at 
org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics.testAbfsHttpResponseStatistics(ITestAbfsNetworkStatistics.java:291)
{code}
[~mehakmeetSingh] - is the bytes_received counter increment for failed requests 
an expected behaviour ?

  was:
Bytes received counter increments for every request response received. 

[https://github.com/apache/hadoop/blob/d23cc9d85d887f01d72180bdf1af87dfdee15c5a/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsRestOperation.java#L251]

This increments even for failed requests. 

Observed during testing done for HADOOP-17215. A request failed with 409 
Conflict contains response body as below:

{"error":\{"code":"PathAlreadyExists","message":"The specified path already 
exists.\nRequestId:c3b2c55c-b01f-0061-7b31-7b6ee300\nTime:2020-08-25T22:44:07.2356054Z"}}

The error body of 168 size is incremented in bytes_received counter. 

This also breaks the testcase testAbfsHttpResponseStatistics. 

[~mehakmeetSingh] - is the bytes_received counter increment for failed requests 
an expected behaviour ?


> Test failure as failed request body counted in byte received metric - 
> ITestAbfsInputStreamStatistics#testAbfsHttpResponseStatistics
> ---
>
> Key: HADOOP-17229
> URL: https://issues.apache.org/jira/browse/HADOOP-17229
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Mehakmeet Singh
>Priority: Major
>
> Bytes received counter increments for every request response received. 
> [https://github.com/apache/hadoop/blob/d23cc9d85d887f01d72180bdf1af87dfdee15c5a/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsRestOperation.java#L251]
> This increments even for failed requests. 
> Observed during testing done for HADOOP-17215. A request failed with 409 
> Conflict contains response body as below:
> {"error":\{"code":"PathAlreadyExists","message":"The specified path already 
> exists.\nRequestId:c3b2c55c-b01f-0061-7b31-7b6ee300\nTime:2020-08-25T22:44:07.2356054Z"}}
> The error body of 168 size is incremented in bytes_received counter. 
> This also breaks the testcase testAbfsHttpResponseStatistics. 
> {code:java}
> [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
> 22.746 s <<< FAILURE! - in 
> org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics[ERROR] Tests run: 2, 
> Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 22.746 s <<< FAILURE! - in 
> org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics[ERROR] 
> testAbfsHttpResponseStatistics(org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics)
>   Time elapsed: 13.183 s  <<< FAILURE!java.lang.AssertionError: Mismatch in 
> bytes_received expected:<143> but was:<311> at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.failNotEquals(Assert.java:834) at 
> org.junit.Assert.assertEquals(Assert.java:645) at 
> org.apache.hadoop.fs.azurebfs.AbstractAbfsIntegrationTest.asser

[jira] [Updated] (HADOOP-17229) Test failure as failed request body counted in byte received metric - ITestAbfsInputStreamStatistics#testAbfsHttpResponseStatistics

2020-08-25 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17229:
---
Description: 
The bytes received counter increments for every response received. 

[https://github.com/apache/hadoop/blob/d23cc9d85d887f01d72180bdf1af87dfdee15c5a/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsRestOperation.java#L251]

It increments even for failed requests. 

Observed during testing for HADOOP-17215. A request that failed with 409 
Conflict contained the response body below:

{"error":\{"code":"PathAlreadyExists","message":"The specified path already 
exists.\nRequestId:c3b2c55c-b01f-0061-7b31-7b6ee300\nTime:2020-08-25T22:44:07.2356054Z"}}

The 168-byte error body is added to the bytes_received counter. 

This also breaks the test case testAbfsHttpResponseStatistics. 

[~mehakmeetSingh] - is the bytes_received counter increment for failed requests 
an expected behaviour ?

> Test failure as failed request body counted in byte received metric - 
> ITestAbfsInputStreamStatistics#testAbfsHttpResponseStatistics
> ---
>
> Key: HADOOP-17229
> URL: https://issues.apache.org/jira/browse/HADOOP-17229
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Mehakmeet Singh
>Priority: Major
>
> Bytes received counter increments for every request response received. 
> [https://github.com/apache/hadoop/blob/d23cc9d85d887f01d72180bdf1af87dfdee15c5a/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsRestOperation.java#L251]
> This increments even for failed requests. 
> Observed during testing done for HADOOP-17215. A request failed with 409 
> Conflict contains response body as below:
> {"error":\{"code":"PathAlreadyExists","message":"The specified path already 
> exists.\nRequestId:c3b2c55c-b01f-0061-7b31-7b6ee300\nTime:2020-08-25T22:44:07.2356054Z"}}
> The error body of 168 size is incremented in bytes_received counter. 
> This also breaks the testcase testAbfsHttpResponseStatistics. 
> [~mehakmeetSingh] - is the bytes_received counter increment for failed 
> requests an expected behaviour ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17229) Test failure as failed request body counted in byte received metric - ITestAbfsInputStreamStatistics#testAbfsHttpResponseStatistics

2020-08-25 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17229:
---
Description: (was: Intermittent test timeout for 
ITestAbfsInputStreamStatistics#testReadAheadCounters happening due to race 
conditions in readAhead threads.

Test error:


{code:java}
[ERROR] 
testReadAheadCounters(org.apache.hadoop.fs.azurebfs.ITestAbfsInputStreamStatistics)
  Time elapsed: 30.723 s  <<< 
ERROR!org.junit.runners.model.TestTimedOutException: test timed out after 3 
milliseconds    at java.lang.Thread.sleep(Native Method)    at 
org.apache.hadoop.fs.azurebfs.ITestAbfsInputStreamStatistics.testReadAheadCounters(ITestAbfsInputStreamStatistics.java:346)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)    
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)   
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)    at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
    at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)    at 
java.lang.Thread.run(Thread.java:748) {code}
Possible Reasoning:

- ReadAhead queue doesn't get completed and hence the counter values are not 
satisfied in 30 seconds time for some systems.

- The condition that readAheadBytesRead and remoteBytesRead counter values need 
to be greater than or equal to 4KB and 32KB respectively doesn't occur in some 
machines due to the fact that sometimes instead of reading for readAhead 
Buffer, remote reads are performed due to Threads still being in the readAhead 
queue to fill that buffer. Thus resulting in either of the 2 counter values to 
be not satisfying the condition and getting in an infinite loop and hence 
timing out the test eventually.

Possible Fixes:

- Write better test(That would pass under all conditions).
- Maybe UT instead of IT?

Possible fix to better the test would be preferable and UT as the last resort.)

> Test failure as failed request body counted in byte received metric - 
> ITestAbfsInputStreamStatistics#testAbfsHttpResponseStatistics
> ---
>
> Key: HADOOP-17229
> URL: https://issues.apache.org/jira/browse/HADOOP-17229
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Mehakmeet Singh
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17229) Test failure as failed request body counted in byte received metric - ITestAbfsInputStreamStatistics#testAbfsHttpResponseStatistics

2020-08-25 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17229:
--

 Summary: Test failure as failed request body counted in byte 
received metric - ITestAbfsInputStreamStatistics#testAbfsHttpResponseStatistics
 Key: HADOOP-17229
 URL: https://issues.apache.org/jira/browse/HADOOP-17229
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Affects Versions: 3.3.0
Reporter: Sneha Vijayarajan
Assignee: Mehakmeet Singh


Intermittent test timeout for 
ITestAbfsInputStreamStatistics#testReadAheadCounters happening due to race 
conditions in readAhead threads.

Test error:


{code:java}
[ERROR] 
testReadAheadCounters(org.apache.hadoop.fs.azurebfs.ITestAbfsInputStreamStatistics)
  Time elapsed: 30.723 s  <<< 
ERROR!org.junit.runners.model.TestTimedOutException: test timed out after 3 
milliseconds    at java.lang.Thread.sleep(Native Method)    at 
org.apache.hadoop.fs.azurebfs.ITestAbfsInputStreamStatistics.testReadAheadCounters(ITestAbfsInputStreamStatistics.java:346)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)    
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)   
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)    at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
    at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)    at 
java.lang.Thread.run(Thread.java:748) {code}
Possible Reasoning:

- The readAhead queue doesn't get completed, so the counter values are not 
reached within the 30-second window on some systems.

- The condition that the readAheadBytesRead and remoteBytesRead counters reach 
at least 4KB and 32KB respectively doesn't occur on some machines: sometimes, 
instead of reads being served from the readAhead buffer, remote reads are 
performed because threads are still in the readAhead queue waiting to fill that 
buffer. Either of the two counters can then fail to satisfy the condition, the 
test loops indefinitely, and it eventually times out.

Possible Fixes:

- Write a better test that would pass under all conditions.
- Maybe a unit test (UT) instead of an integration test (IT)?

Improving the test (for example with a bounded polling wait, as sketched below) 
is the preferable fix, with a unit test as the last resort.
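
A possible shape for the "better test" option, assuming a hypothetical helper 
(none of these names come from the actual test): poll the counters with an 
explicit deadline instead of looping until a value that may never be reached on 
some machines.

{code:java}
import java.util.function.LongSupplier;

public final class BoundedWait {

  private BoundedWait() {
  }

  /**
   * Polls until the supplied counter reaches the expected minimum, or fails
   * with a descriptive error once the deadline passes (no infinite loop).
   */
  public static void awaitAtLeast(String name, LongSupplier counter, long expectedMin,
      long timeoutMillis, long pollMillis) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    long last = counter.getAsLong();
    while (last < expectedMin) {
      if (System.currentTimeMillis() > deadline) {
        throw new AssertionError(name + " reached only " + last
            + " of expected " + expectedMin + " within " + timeoutMillis + " ms");
      }
      Thread.sleep(pollMillis);
      last = counter.getAsLong();
    }
  }
}
{code}

For example, the test could call 
{{awaitAtLeast("readAheadBytesRead", stats::readAheadBytesRead, 4096, 30_000, 100)}} 
(with {{stats}} being whatever statistics object the test already holds) and get 
a clear failure message instead of a timeout.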



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17215) ABFS: Excessive Create Overwrites leads to race conditions

2020-08-19 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17215:
---
Summary: ABFS: Excessive Create Overwrites leads to race conditions  (was: 
ABFS: Excessive Create overwrites leading to unnecessary extra transactions)

> ABFS: Excessive Create Overwrites leads to race conditions
> --
>
> Key: HADOOP-17215
> URL: https://issues.apache.org/jira/browse/HADOOP-17215
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>
> Filesystem create APIs that do not accept an argument for the overwrite flag 
> end up defaulting it to true. 
> We are observing that the request count of creates with overwrite=true is 
> high, primarily because the called create API defaults the flag to true. When 
> a create with overwrite times out, we have observed that it can lead to race 
> conditions between the first create and the retried one running almost in 
> parallel.
> To avoid this scenario for create requests with overwrite=true, the ABFS 
> driver will always first attempt the create without overwrite. If the create 
> fails due to fileAlreadyPresent, it will resend the request with overwrite=true. 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17215) ABFS: Excessive Create overwrites leading to unnecessary extra transactions

2020-08-19 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17215:
--

 Summary: ABFS: Excessive Create overwrites leading to unnecessary 
extra transactions
 Key: HADOOP-17215
 URL: https://issues.apache.org/jira/browse/HADOOP-17215
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


Filesystem create APIs that do not accept an argument for the overwrite flag 
end up defaulting it to true. 

We are observing that the request count of creates with overwrite=true is high, 
primarily because the called create API defaults the flag to true. When a 
create with overwrite times out, we have observed that it can lead to race 
conditions between the first create and the retried one running almost in 
parallel.

To avoid this scenario for create requests with overwrite=true, the ABFS driver 
will always first attempt the create without overwrite. If the create fails due 
to fileAlreadyPresent, it will resend the request with overwrite=true. 
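
A minimal sketch of the flow described above, written against the public 
FileSystem API purely for illustration (the actual change lives in the ABFS 
driver's request layer, not in caller code):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileAlreadyExistsException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class CreateOverwriteFallback {

  private CreateOverwriteFallback() {
  }

  public static FSDataOutputStream create(FileSystem fs, Path path) throws IOException {
    try {
      // First attempt: create with overwrite disabled.
      return fs.create(path, false);
    } catch (FileAlreadyExistsException e) {
      // The path already exists, so the overwrite is now explicit rather than
      // an accidental default; resend with overwrite enabled.
      return fs.create(path, true);
    }
  }
}
{code}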

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16966) ABFS: Upgrade Store REST API Version to 2019-12-12

2020-08-17 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-16966:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> ABFS: Upgrade Store REST API Version to 2019-12-12
> --
>
> Key: HADOOP-16966
> URL: https://issues.apache.org/jira/browse/HADOOP-16966
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Ishani
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive
>
> Store REST API version on the backend clusters has been upgraded to 
> 2019-12-12. This Jira will align the Driver requests to reflect this latest 
> API version.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-17195) Intermittent OutOfMemory error while performing hdfs CopyFromLocal to abfs

2020-08-16 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan reassigned HADOOP-17195:
--

Assignee: Bilahari T H  (was: Mehakmeet Singh)

> Intermittent OutOfMemory error while performing hdfs CopyFromLocal to abfs 
> ---
>
> Key: HADOOP-17195
> URL: https://issues.apache.org/jira/browse/HADOOP-17195
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Mehakmeet Singh
>Assignee: Bilahari T H
>Priority: Major
>  Labels: abfsactive
>
> OutOfMemory error caused by a new thread pool being created each time an 
> AbfsOutputStream is created. Since the thread pools aren't limited, a lot of 
> data gets loaded into buffers, which causes the OutOfMemory error.
> Possible fixes:
> - Limit the thread count while performing hdfs copyFromLocal (using the -t 
> property).
> - Reduce OUTPUT_BUFFER_SIZE significantly, which would limit the amount of 
> buffer data loaded in threads.
> - Don't create new thread pools each time an AbfsOutputStream is created, and 
> limit the number of thread pools each AbfsOutputStream can create (see the 
> sketch below).
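
A rough sketch of the third option, purely illustrative (the names are made up 
and this is not the ABFS implementation): a single bounded executor shared 
across output streams, with a bounded queue and caller-runs back-pressure so 
buffered data cannot grow without limit.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public final class SharedUploadExecutor {

  private static final int MAX_THREADS = 4 * Runtime.getRuntime().availableProcessors();

  // One pool for the whole process instead of one per output stream.
  private static final ThreadPoolExecutor POOL = new ThreadPoolExecutor(
      MAX_THREADS, MAX_THREADS,
      60L, TimeUnit.SECONDS,
      // Bounded queue: caps how many upload buffers can be pending at once.
      new LinkedBlockingQueue<>(2 * MAX_THREADS),
      // When the queue is full, the submitting thread runs the task itself,
      // which throttles producers instead of accumulating buffers until OOM.
      new ThreadPoolExecutor.CallerRunsPolicy());

  static {
    POOL.allowCoreThreadTimeOut(true);
  }

  private SharedUploadExecutor() {
  }

  public static ExecutorService get() {
    return POOL;
  }
}
{code}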



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17195) Intermittent OutOfMemory error while performing hdfs CopyFromLocal to abfs

2020-08-16 Thread Sneha Vijayarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178716#comment-17178716
 ] 

Sneha Vijayarajan commented on HADOOP-17195:


Assigning it to [~bilahari.th] as he is already working on a similar JIRA. 

> Intermittent OutOfMemory error while performing hdfs CopyFromLocal to abfs 
> ---
>
> Key: HADOOP-17195
> URL: https://issues.apache.org/jira/browse/HADOOP-17195
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Mehakmeet Singh
>Assignee: Mehakmeet Singh
>Priority: Major
>  Labels: abfsactive
>
> OutOfMemory error caused by a new thread pool being created each time an 
> AbfsOutputStream is created. Since the thread pools aren't limited, a lot of 
> data gets loaded into buffers, which causes the OutOfMemory error.
> Possible fixes:
> - Limit the thread count while performing hdfs copyFromLocal (using the -t 
> property).
> - Reduce OUTPUT_BUFFER_SIZE significantly, which would limit the amount of 
> buffer data loaded in threads.
> - Don't create new thread pools each time an AbfsOutputStream is created, and 
> limit the number of thread pools each AbfsOutputStream can create.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17195) Intermittent OutOfMemory error while performing hdfs CopyFromLocal to abfs

2020-08-16 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17195:
---
Labels: abfsactive  (was: )

> Intermittent OutOfMemory error while performing hdfs CopyFromLocal to abfs 
> ---
>
> Key: HADOOP-17195
> URL: https://issues.apache.org/jira/browse/HADOOP-17195
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Mehakmeet Singh
>Assignee: Mehakmeet Singh
>Priority: Major
>  Labels: abfsactive
>
> OutOfMemory error caused by a new thread pool being created each time an 
> AbfsOutputStream is created. Since the thread pools aren't limited, a lot of 
> data gets loaded into buffers, which causes the OutOfMemory error.
> Possible fixes:
> - Limit the thread count while performing hdfs copyFromLocal (using the -t 
> property).
> - Reduce OUTPUT_BUFFER_SIZE significantly, which would limit the amount of 
> buffer data loaded in threads.
> - Don't create new thread pools each time an AbfsOutputStream is created, and 
> limit the number of thread pools each AbfsOutputStream can create.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-17203) Test failures in ITestAzureBlobFileSystemCheckAccess in ABFS

2020-08-16 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan reassigned HADOOP-17203:
--

Assignee: Bilahari T H

> Test failures in ITestAzureBlobFileSystemCheckAccess in ABFS
> 
>
> Key: HADOOP-17203
> URL: https://issues.apache.org/jira/browse/HADOOP-17203
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Mehakmeet Singh
>Assignee: Bilahari T H
>Priority: Major
>  Labels: abfsactive
>
> ITestAzureBlobFileSystemCheckAccess gives test failures both when run in 
> parallel and when run stand-alone (in the IDE).
> Tested by:  mvn -T 1C -Dparallel-tests=abfs clean verify
>  Region: East US



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16791) ABFS: Have all external dependent module execution tracked with DurationInfo

2020-08-16 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-16791:
---
Labels: abfsbacklog  (was: )

> ABFS: Have all external dependent module execution tracked with DurationInfo
> 
>
> Key: HADOOP-16791
> URL: https://issues.apache.org/jira/browse/HADOOP-16791
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsbacklog
>
> To be able to break down the perf impact of the external module executions 
> within the ABFS driver, add execution time computation using DurationInfo in 
> all the relevant places. 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-16937) Multiple Tests failure in hadoop-azure component

2020-08-16 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan resolved HADOOP-16937.

Resolution: Cannot Reproduce

The latest ABFS driver test run from the Apache trunk branch is not seeing any 
failures for the open tests. Hence resolving this JIRA.

> Multiple Tests failure in hadoop-azure component
> ---
>
> Key: HADOOP-16937
> URL: https://issues.apache.org/jira/browse/HADOOP-16937
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure, test
>Affects Versions: 3.3.0
>Reporter: Mukund Thakur
>Assignee: Ishani
>Priority: Major
>
> I am seeing many test failures in hadoop azure in trunk. Posting some samples:
> [*ERROR*] *Failures:* 
> [*ERROR*]   
> *ITestAbfsContractUnbuffer>AbstractContractUnbufferTest.testMultipleUnbuffers:100->AbstractContractUnbufferTest.validateFullFileContents:132->AbstractContractUnbufferTest.validateFileContents:139->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88
>  failed to read expected number of bytes from stream expected:<1024> but 
> was:<-1>*
> [*ERROR*]   
> *ITestAbfsContractUnbuffer>AbstractContractUnbufferTest.testUnbufferAfterRead:53->AbstractContractUnbufferTest.validateFullFileContents:132->AbstractContractUnbufferTest.validateFileContents:139->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88
>  failed to read expected number of bytes from stream expected:<1024> but 
> was:<-1>*
> [*ERROR*]   
> *ITestAbfsContractUnbuffer>AbstractContractUnbufferTest.testUnbufferBeforeRead:63->AbstractContractUnbufferTest.validateFullFileContents:132->AbstractContractUnbufferTest.validateFileContents:139->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88
>  failed to read expected number of bytes from stream expected:<1024> but 
> was:<-1>*
> [*ERROR*]   
> *ITestAbfsContractUnbuffer>AbstractContractUnbufferTest.testUnbufferMultipleReads:111->AbstractContractUnbufferTest.validateFileContents:139->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88
>  failed to read expected number of bytes from stream expected:<128> but 
> was:<-1>*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-17057) ABFS driver enhancement - Allow customizable translation from AAD SPNs and security groups to Linux user and group

2020-08-16 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan resolved HADOOP-17057.

Resolution: Fixed

> ABFS driver enhancement - Allow customizable translation from AAD SPNs and 
> security groups to Linux user and group
> --
>
> Key: HADOOP-17057
> URL: https://issues.apache.org/jira/browse/HADOOP-17057
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/azure
>Reporter: Karthik Amarnath
>Assignee: Karthik Amarnath
>Priority: Major
> Fix For: 3.3.1
>
>
> The ABFS driver does not support translating an AAD service principal (SPN) 
> to a Linux identity, causing metadata operation failures. The Hadoop 
> MapReduce client 
> [[JobSubmissionFiles|https://github.com/apache/hadoop/blob/d842dfffa53c8b565f3d65af44ccd7e1cc706733/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmissionFiles.java#L138]]
>  expects the file owner to be the Linux identity, but the underlying ABFS 
> driver returns the AAD object identity. Hence the ABFS driver needs this 
> enhancement.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17203) Test failures in ITestAzureBlobFileSystemCheckAccess in ABFS

2020-08-16 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17203:
---
Labels: abfsactive  (was: )

> Test failures in ITestAzureBlobFileSystemCheckAccess in ABFS
> 
>
> Key: HADOOP-17203
> URL: https://issues.apache.org/jira/browse/HADOOP-17203
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Mehakmeet Singh
>Priority: Major
>  Labels: abfsactive
>
> ITestAzureBlobFileSystemCheckAccess gives test failures both when run in 
> parallel and when run stand-alone (in the IDE).
> Tested by:  mvn -T 1C -Dparallel-tests=abfs clean verify
>  Region: East US



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16966) ABFS: Upgrade Store REST API Version to 2019-12-12

2020-08-12 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-16966:
---
Status: Patch Available  (was: Open)

> ABFS: Upgrade Store REST API Version to 2019-12-12
> --
>
> Key: HADOOP-16966
> URL: https://issues.apache.org/jira/browse/HADOOP-16966
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Ishani
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive
>
> Store REST API version on the backend clusters has been upgraded to 
> 2019-12-12. This Jira will align the Driver requests to reflect this latest 
> API version.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-16842) abfs can't access storage account if soft delete is enabled

2020-07-26 Thread Sneha Vijayarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-16842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17165417#comment-17165417
 ] 

Sneha Vijayarajan commented on HADOOP-16842:


ABFS currently doesn't have soft delete support; it is in the planning phase.

> abfs can't access storage account if soft delete is enabled
> ---
>
> Key: HADOOP-16842
> URL: https://issues.apache.org/jira/browse/HADOOP-16842
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.1.3
>Reporter: Raghvendra Singh
>Priority: Minor
>  Labels: abfsactive
>
> Facing the issue in which if soft delete is enabled on storage account.
> Hadoop fs -ls command fails with 
> {noformat}
>  Operation failed: "This endpoint does not support BlobStorageEvents or 
> SoftDelete. Please disable these account features if you would like to use 
> this endpoint.", 409, HEAD, 
> https://.[dfs.core.windows.net/test-container-1//?upn=false&action=getAccessControl&timeout=90|http://dfs.core.windows.net/test-container-1//?upn=false&action=getAccessControl&timeout=90]
> {noformat}
> The storage account is accessed by issuing the command below:
> {noformat}
>  hadoop fs 
> -Dfs.azure.account.auth.type..[dfs.core.windows.net|http://dfs.core.windows.net/]=OAuth
>  
> -Dfs.azure.account.oauth.provider.type..[dfs.core.windows.net|http://dfs.core.windows.net/]=org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider
>  -ls 
> [abfs://test-container-1]@.[dfs.core.windows.net/|http://dfs.core.windows.net/]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17149) ABFS: Test failure: testFailedRequestWhenCredentialsNotCorrect fails when run with SharedKey

2020-07-24 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17149:
---
Description: 
When authentication is set to SharedKey, below test fails.

 

[ERROR]   
ITestGetNameSpaceEnabled.testFailedRequestWhenCredentialsNotCorrect:161 
Expecting 
org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsRestOperationException 
with text "Server failed to authenticate the request. Make sure the value of 
Authorization header is formed correctly including the signature.", 403 but got 
: "void"

 

This test fails when the newly introduced config "fs.azure.account.hns.enabled" 
is set. This config avoids the network call that checks whether the namespace is 
enabled, whereas the test expects this call to be made.

 

The assert on 403 in the test also needs review; the expected status should 
ideally be 401.

  was:
When authentication is set to SharedKey, below test fails.

 

[ERROR]   
ITestGetNameSpaceEnabled.testFailedRequestWhenCredentialsNotCorrect:161 
Expecting 
org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsRestOperationException 
with text "Server failed to authenticate the request. Make sure the value of 
Authorization header is formed correctly including the signature.", 403 but got 
: "void"

 

2 problems:
 # This test should probably be disabled for SharedKey
 # Assert is wrong. The expected HTTP status code should be 401.


> ABFS: Test failure: testFailedRequestWhenCredentialsNotCorrect fails when run 
> with SharedKey
> 
>
> Key: HADOOP-17149
> URL: https://issues.apache.org/jira/browse/HADOOP-17149
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Minor
>  Labels: abfsactive
> Fix For: 3.4.0
>
>
> When authentication is set to SharedKey, below test fails.
>  
> [ERROR]   
> ITestGetNameSpaceEnabled.testFailedRequestWhenCredentialsNotCorrect:161 
> Expecting 
> org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsRestOperationException 
> with text "Server failed to authenticate the request. Make sure the value of 
> Authorization header is formed correctly including the signature.", 403 but 
> got : "void"
>  
> This test fails when the newly introduced config 
> "fs.azure.account.hns.enabled" is set. This config avoids the network call that 
> checks whether the namespace is enabled, whereas the test expects this call to 
> be made. 
>  
> The assert on 403 in the test also needs review; the expected status should 
> ideally be 401.
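
For context, a minimal sketch of how such a config short-circuits the namespace probe, assuming the account-agnostic form of the key (illustrative only, not the test's actual code):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class HnsConfigSketch {
  public static void main(String[] args) {
    // Sketch only: when this key is set, the driver trusts the configured value
    // and skips the GetAcl round trip that the failing assertion expects to observe.
    Configuration conf = new Configuration();
    conf.setBoolean("fs.azure.account.hns.enabled", true);
    System.out.println(conf.get("fs.azure.account.hns.enabled"));
  }
}
{code}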



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17137) ABFS: Tests ITestAbfsNetworkStatistics need to be config setting agnostic

2020-07-23 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17137:
---
Labels: abfsactive  (was: )

> ABFS: Tests ITestAbfsNetworkStatistics need to be config setting agnostic
> -
>
> Key: HADOOP-17137
> URL: https://issues.apache.org/jira/browse/HADOOP-17137
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure, test
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Minor
>  Labels: abfsactive
>
> Tests in ITestAbfsNetworkStatistics assert on a static number of network calls 
> made from the start of filesystem instance creation. But this call count depends 
> on certain config settings, such as whether container creation is allowed or the 
> account is flagged as HNS-enabled to avoid the GetAcl call.
>  
> The tests need to be modified so that the count asserts cover only the requests 
> made by the tests themselves (a sketch of this pattern follows below).
>  
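
A sketch of the delta-based pattern the description calls for; the helper names (getFileSystem, path, connectionsMade) and the expected count are assumptions for illustration, not the real test code:

{code:java}
// Illustrative only: capture the statistic before the operation under test and
// assert on the delta, so calls made during filesystem setup (container creation,
// namespace probe) cannot shift the expected numbers.
@Test
public void testConnectionsMadeByCreateAlone() throws Exception {
  AzureBlobFileSystem fs = getFileSystem();        // assumed to come from the test base class
  long before = connectionsMade(fs);               // hypothetical helper reading the counter
  fs.create(path("testDeltaFile")).close();        // the only operation being measured
  long after = connectionsMade(fs);
  assertEquals("connections made by the create call alone", 2, after - before);
}
{code}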
> {code:java}
> [INFO] Running org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics[INFO] 
> Running org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics[ERROR] Tests 
> run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 4.148 s <<< 
> FAILURE! - in org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics[ERROR] 
> testAbfsHttpResponseStatistics(org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics)
>   Time elapsed: 4.148 s  <<< FAILURE!java.lang.AssertionError: Mismatch in 
> get_responses expected:<8> but was:<7> at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.failNotEquals(Assert.java:834) at 
> org.junit.Assert.assertEquals(Assert.java:645) at 
> org.apache.hadoop.fs.azurebfs.AbstractAbfsIntegrationTest.assertAbfsStatistics(AbstractAbfsIntegrationTest.java:445)
>  at 
> org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics.testAbfsHttpResponseStatistics(ITestAbfsNetworkStatistics.java:207)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:748)
> [ERROR] 
> testAbfsHttpSendStatistics(org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics)
>   Time elapsed: 2.987 s  <<< FAILURE!java.lang.AssertionError: Mismatch in 
> connections_made expected:<6> but was:<5> at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.failNotEquals(Assert.java:834) at 
> org.junit.Assert.assertEquals(Assert.java:645) at 
> org.apache.hadoop.fs.azurebfs.AbstractAbfsIntegrationTest.assertAbfsStatistics(AbstractAbfsIntegrationTest.java:445)
>  at 
> org.apache.hadoop.fs.azurebfs.ITestAbfsNetworkStatistics.testAbfsHttpSendStatistics(ITestAbfsNetworkStatistics.java:91)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(Fa

[jira] [Updated] (HADOOP-17149) ABFS: Test failure: testFailedRequestWhenCredentialsNotCorrect fails when run with SharedKey

2020-07-23 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17149:
---
Labels: abfsactive  (was: )

> ABFS: Test failure: testFailedRequestWhenCredentialsNotCorrect fails when run 
> with SharedKey
> 
>
> Key: HADOOP-17149
> URL: https://issues.apache.org/jira/browse/HADOOP-17149
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Minor
>  Labels: abfsactive
> Fix For: 3.4.0
>
>
> When authentication is set to SharedKey, below test fails.
>  
> [ERROR]   
> ITestGetNameSpaceEnabled.testFailedRequestWhenCredentialsNotCorrect:161 
> Expecting 
> org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsRestOperationException 
> with text "Server failed to authenticate the request. Make sure the value of 
> Authorization header is formed correctly including the signature.", 403 but 
> got : "void"
>  
> 2 problems:
>  # This test should probably be disabled for SharedKey
>  # Assert is wrong. The expected HTTP status code should be 401.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17150) ABFS: Test failure: Disable ITestAzureBlobFileSystemDelegationSAS tests

2020-07-23 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17150:
---
Labels: abfsactive  (was: )

> ABFS: Test failure: Disable ITestAzureBlobFileSystemDelegationSAS tests
> ---
>
> Key: HADOOP-17150
> URL: https://issues.apache.org/jira/browse/HADOOP-17150
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive
>
> ITestAzureBlobFileSystemDelegationSAS has tests for the SAS feature, which is 
> still in preview. The tests should not run until the driver's API version matches 
> the preview version, as they will fail when run against production clusters.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-16966) ABFS: Upgrade Store REST API Version to 2019-12-12

2020-07-23 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-16966:
---
Description: 
Store REST API version on the backend clusters has been upgraded to 2019-12-12. 
This Jira will align the Driver requests to reflect this latest API version.

 

  was:
When the new REST version (2019-12-12) is enabled in the backend, enable it in 
the driver, along with documentation for the append blob config values that 
become possible with the new REST version.

 Configs:

fs.azure.appendblob.directories

 

Summary: ABFS: Upgrade Store REST API Version to 2019-12-12  (was: 
ABFS: Enable new Rest Version and add documentation for appendblob)

> ABFS: Upgrade Store REST API Version to 2019-12-12
> --
>
> Key: HADOOP-16966
> URL: https://issues.apache.org/jira/browse/HADOOP-16966
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Ishani
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive
>
> Store REST API version on the backend clusters has been upgraded to 
> 2019-12-12. This Jira will align the Driver requests to reflect this latest 
> API version.
>  
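
For reference, a hedged sketch of how the append blob directories config from the earlier description might be supplied; the paths, account, and container names below are placeholders, not values from this issue:

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class AppendBlobConfigSketch {
  public static void main(String[] args) throws Exception {
    // Sketch only: directories listed here are intended to be handled as append
    // blobs once the driver speaks the 2019-12-12 REST version.
    Configuration conf = new Configuration();
    conf.set("fs.azure.appendblob.directories", "/logs,/checkpoints");
    FileSystem fs = FileSystem.get(
        URI.create("abfs://container@account.dfs.core.windows.net/"), conf);
    System.out.println("Initialized " + fs.getUri());
  }
}
{code}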



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17150) ABFS: Test failure: Disable ITestAzureBlobFileSystemDelegationSAS tests

2020-07-23 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17150:
--

 Summary: ABFS: Test failure: Disable 
ITestAzureBlobFileSystemDelegationSAS tests
 Key: HADOOP-17150
 URL: https://issues.apache.org/jira/browse/HADOOP-17150
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan


ITestAzureBlobFileSystemDelegationSAS has tests for the SAS feature, which is 
still in preview. The tests should not run until the driver's API version matches 
the preview version, as they will fail when run against production clusters.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17149) ABFS: Test failure: testFailedRequestWhenCredentialsNotCorrect fails when run with SharedKey

2020-07-23 Thread Sneha Vijayarajan (Jira)
Sneha Vijayarajan created HADOOP-17149:
--

 Summary: ABFS: Test failure: 
testFailedRequestWhenCredentialsNotCorrect fails when run with SharedKey
 Key: HADOOP-17149
 URL: https://issues.apache.org/jira/browse/HADOOP-17149
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Sneha Vijayarajan
Assignee: Sneha Vijayarajan
 Fix For: 3.4.0


When authentication is set to SharedKey, below test fails.

 

[ERROR]   
ITestGetNameSpaceEnabled.testFailedRequestWhenCredentialsNotCorrect:161 
Expecting 
org.apache.hadoop.fs.azurebfs.contracts.exceptions.AbfsRestOperationException 
with text "Server failed to authenticate the request. Make sure the value of 
Authorization header is formed correctly including the signature.", 403 but got 
: "void"

 

2 problems:
 # This test should probably be disabled for SharedKey
 # Assert is wrong. The expected HTTP status code should be 401.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17132) ABFS: Fix For Idempotency code

2020-07-22 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17132:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> ABFS: Fix For Idempotency code
> --
>
> Key: HADOOP-17132
> URL: https://issues.apache.org/jira/browse/HADOOP-17132
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive
> Fix For: 3.4.0
>
>
> Trigger to handle the idempotency code introduced in 
> https://issues.apache.org/jira/browse/HADOOP-17015 is incomplete. 
> This PR is to fix the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17132) ABFS: Fix For Idempotency code

2020-07-20 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17132:
---
Fix Version/s: 3.4.0
Affects Version/s: 3.4.0
   Status: Patch Available  (was: Open)

> ABFS: Fix For Idempotency code
> --
>
> Key: HADOOP-17132
> URL: https://issues.apache.org/jira/browse/HADOOP-17132
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive
> Fix For: 3.4.0
>
>
> Trigger to handle the idempotency code introduced in 
> https://issues.apache.org/jira/browse/HADOOP-17015 is incomplete. 
> This PR is to fix the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-17132) ABFS: Fix For Idempotency code

2020-07-20 Thread Sneha Vijayarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sneha Vijayarajan updated HADOOP-17132:
---
Description: 
Trigger to handle the idempotency code introduced in 
https://issues.apache.org/jira/browse/HADOOP-17015 is incomplete. 

This PR is to fix the issue.

  was:
Trigger to handle the idempotency code introduced in 
https://issues.apache.org/jira/browse/HADOOP-17137 is incomplete. 

This PR is to fix the issue.


> ABFS: Fix For Idempotency code
> --
>
> Key: HADOOP-17132
> URL: https://issues.apache.org/jira/browse/HADOOP-17132
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive
>
> Trigger to handle the idempotency code introduced in 
> https://issues.apache.org/jira/browse/HADOOP-17015 is incomplete. 
> This PR is to fix the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17132) ABFS: Fix For Idempotency code

2020-07-20 Thread Sneha Vijayarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161510#comment-17161510
 ] 

Sneha Vijayarajan commented on HADOOP-17132:


Thanks. Fixed description.

> ABFS: Fix For Idempotency code
> --
>
> Key: HADOOP-17132
> URL: https://issues.apache.org/jira/browse/HADOOP-17132
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Sneha Vijayarajan
>Assignee: Sneha Vijayarajan
>Priority: Major
>  Labels: abfsactive
>
> Trigger to handle the idempotency code introduced in 
> https://issues.apache.org/jira/browse/HADOOP-17015 is incomplete. 
> This PR is to fix the issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



  1   2   3   >